Tag Archives: apt

Confused About the Hybrid Cloud? You’re Not Alone

Post Syndicated from Roderick Bauer original https://www.backblaze.com/blog/confused-about-the-hybrid-cloud-youre-not-alone/

Hybrid Cloud. What is it?

Do you have a clear understanding of the hybrid cloud? If you don’t, it’s not surprising.

Hybrid cloud has been applied to a greater and more varied number of IT solutions than almost any other recent data management term. About the only thing that’s clear about the hybrid cloud is that the term hybrid cloud wasn’t invented by customers, but by vendors who wanted to hawk whatever solution du jour they happened to be pushing.

Let’s be honest. We’re in an industry that loves hype. We can’t resist grafting hyper, multi, ultra, and super and other prefixes onto the beginnings of words to entice customers with something new and shiny. The alphabet soup of cloud-related terms can include various options for where the cloud is located (on-premises, off-premises), whether the resources are private or shared in some degree (private, community, public), what type of services are offered (storage, computing), and what type of orchestrating software is used to manage the workflow and the resources. With so many moving parts, it’s no wonder potential users are confused.

Let’s take a step back, try to clear up the misconceptions, and come up with a basic understanding of what the hybrid cloud is. To be clear, this is our viewpoint. Others are free to do what they like, so bear that in mind.

So, What is the Hybrid Cloud?

The hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud resources combined with third-party public cloud resources that use some kind of orchestration between them.

To get beyond the hype, let’s start with Forrester Research‘s idea of the hybrid cloud: “One or more public clouds connected to something in my data center. That thing could be a private cloud; that thing could just be traditional data center infrastructure.”

To put it simply, a hybrid cloud is a mash-up of on-premises and off-premises IT resources.

To expand on that a bit, we can say that the hybrid cloud refers to a cloud environment made up of a mixture of on-premises private cloud[1] resources combined with third-party public cloud resources that use some kind of orchestration[2] between them. The advantage of the hybrid cloud model is that it allows workloads and data to move between private and public clouds in a flexible way as demands, needs, and costs change, giving businesses greater flexibility and more options for data deployment and use.

In other words, if you have some IT resources in-house that you are replicating or augmenting with an external vendor, congrats, you have a hybrid cloud!

Private Cloud vs. Public Cloud

The cloud is really just a collection of purpose built servers. In a private cloud, the servers are dedicated to a single tenant or a group of related tenants. In a public cloud, the servers are shared between multiple unrelated tenants (customers). A public cloud is off-site, while a private cloud can be on-site or off-site — or on-prem or off-prem.

As an example, let’s look at a hybrid cloud meant for data storage, a hybrid data cloud. A company might set up a rule that says all accounting files that have not been touched in the last year are automatically moved off-prem to cloud storage to save cost and reduce the amount of storage needed on-site. The files are still available; they are just no longer stored on your local systems. The rules can be defined to fit an organization’s workflow and data retention policies.

The hybrid cloud concept also contains cloud computing. For example, at the end of the quarter, order processing application instances can be spun up off-premises in a hybrid computing cloud as needed to add to on-premises capacity.

Hybrid Cloud Benefits

If we accept that the hybrid cloud combines the best elements of private and public clouds, then the benefits of hybrid cloud solutions are clear, and we can identify the primary two benefits that result from the blending of private and public clouds.

Diagram of the Components of the Hybrid Cloud

Benefit 1: Flexibility and Scalability

Undoubtedly, the primary advantage of the hybrid cloud is its flexibility. It takes time and money to manage in-house IT infrastructure and adding capacity requires advance planning.

The cloud is ready and able to provide IT resources whenever needed on short notice. The term cloud bursting refers to the on-demand and temporary use of the public cloud when demand exceeds resources available in the private cloud. For example, some businesses experience seasonal spikes that can put an extra burden on private clouds. These spikes can be taken up by a public cloud. Demand also can vary with geographic location, events, or other variables. The public cloud provides the elasticity to deal with these and other anticipated and unanticipated IT loads. The alternative would be fixed cost investments in on-premises IT resources that might not be efficiently utilized.

For a data storage user, the on-premises private cloud storage provides, among other benefits, the highest speed access. For data that is not frequently accessed, or needed with the absolute lowest levels of latency, it makes sense for the organization to move it to a location that is secure, but less expensive. The data is still readily available, and the public cloud provides a better platform for sharing the data with specific clients, users, or with the general public.

Benefit 2: Cost Savings

The public cloud component of the hybrid cloud provides cost-effective IT resources without incurring capital expenses and labor costs. IT professionals can determine the best configuration, service provider, and location for each service, thereby cutting costs by matching the resource with the task best suited to it. Services can be easily scaled, redeployed, or reduced when necessary, saving costs through increased efficiency and avoiding unnecessary expenses.

Comparing Private vs Hybrid Cloud Storage Costs

To get an idea of the difference in storage costs between a purely on-premises solutions and one that uses a hybrid of private and public storage, we’ll present two scenarios. For each scenario we’ll use data storage amounts of 100 terabytes, 1 petabyte, and 2 petabytes. Each table is the same format, all we’ve done is change how the data is distributed: private (on-premises) cloud or public (off-premises) cloud. We are using the costs for our own B2 Cloud Storage in this example. The math can be adapted for any set of numbers you wish to use.

Scenario 1    100% of data on-premises storage

Data Stored
Data stored On-Premises: 100% 100 TB 1,000 TB 2,000 TB
On-premises cost range Monthly Cost
Low — $12/TB/Month $1,200 $12,000 $24,000
High — $20/TB/Month $2,000 $20,000 $40,000

Scenario 2    20% of data on-premises with 80% public cloud storage (B2)

Data Stored
Data stored On-Premises: 20% 20 TB 200 TB 400 TB
Data stored in Cloud: 80% 80 TB 800 TB 1,600 TB
On-premises cost range Monthly Cost
Low — $12/TB/Month $240 $2,400 $4,800
High — $20/TB/Month $400 $4,000 $8,000
Public cloud cost range Monthly Cost
Low — $5/TB/Month (B2) $400 $4,000 $8,000
High — $20/TB/Month $1,600 $16,000 $32,000
On-premises + public cloud cost range Monthly Cost
Low $640 $6,400 $12,800
High $2,000 $20,000 $40,000

As can be seen in the numbers above, using a hybrid cloud solution and storing 80% of the data in the cloud with a provider such as Backblaze B2 can result in significant savings over storing only on-premises. For other cost scenarios, see the B2 Cost Calculator.

When Hybrid Might Not Always Be the Right Fit

There are circumstances where the hybrid cloud might not be the best solution. Smaller organizations operating on a tight IT budget might best be served by a purely public cloud solution. The cost of setting up and running private servers is substantial.

An application that requires the highest possible speed might not be suitable for hybrid, depending on the specific cloud implementation. While latency does play a factor in data storage for some users, it is less of a factor for uploading and downloading data than it is for organizations using the hybrid cloud for computing. Because Backblaze recognized the importance of speed and low-latency for customers wishing to use computing on data stored in B2, we directly connected our data centers with those of our computing partners, ensuring that latency would not be an issue even for a hybrid cloud computing solution.

It is essential to have a good understanding of workloads and their essential characteristics in order to make the hybrid cloud work well for you. Each application needs to be examined for the right mix of private cloud, public cloud, and traditional IT resources that fit the particular workload in order to benefit most from a hybrid cloud architecture.

The Hybrid Cloud Can Be a Win-Win Solution

From the high altitude perspective, any solution that enables an organization to respond in a flexible manner to IT demands is a win. Avoiding big upfront capital expenses for in-house IT infrastructure will appeal to the CFO. Being able to quickly spin up IT resources as they’re needed will appeal to the CTO and VP of Operations.

Should You Go Hybrid?

We’ve arrived at the bottom line and the question is, should you or your organization embrace hybrid cloud infrastructures?

According to 451 Research, by 2019, 69% of companies will operate in hybrid cloud environments, and 60% of workloads will be running in some form of hosted cloud service (up from 45% in 2017). That indicates that the benefits of the hybrid cloud appeal to a broad range of companies.

In Two Years, More Than Half of Workloads Will Run in Cloud

Clearly, depending on an organization’s needs, there are advantages to a hybrid solution. While it might have been possible to dismiss the hybrid cloud in the early days of the cloud as nothing more than a buzzword, that’s no longer true. The hybrid cloud has evolved beyond the marketing hype to offer real solutions for an increasingly complex and challenging IT environment.

If an organization approaches the hybrid cloud with sufficient planning and a structured approach, a hybrid cloud can deliver on-demand flexibility, empower legacy systems and applications with new capabilities, and become a catalyst for digital transformation. The result can be an elastic and responsive infrastructure that has the ability to quickly respond to changing demands of the business.

As data management professionals increasingly recognize the advantages of the hybrid cloud, we can expect more and more of them to embrace it as an essential part of their IT strategy.

Tell Us What You’re Doing with the Hybrid Cloud

Are you currently embracing the hybrid cloud, or are you still uncertain or hanging back because you’re satisfied with how things are currently? Maybe you’ve gone totally hybrid. We’d love to hear your comments below on how you’re dealing with the hybrid cloud.


[1] Private cloud can be on-premises or a dedicated off-premises facility.

[2] Hybrid cloud orchestration solutions are often proprietary, vertical, and task dependent.

The post Confused About the Hybrid Cloud? You’re Not Alone appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Pirates Taunt Amazon Over New “Turd Sandwich” Prime Video Quality

Post Syndicated from Andy original https://torrentfreak.com/pirates-taunt-amazon-over-new-turd-sandwich-prime-video-quality-180419/

Even though they generally aren’t paying for the content they consume, don’t fall into the trap of believing that all pirates are eternally grateful for even poor quality media.

Without a doubt, some of the most quality-sensitive individuals are to be found in pirate communities and they aren’t scared to make their voices known when release groups fail to come up with the best possible goods.

This week there’s been a sustained chorus of disapproval over the quality of pirate video releases sourced from Amazon Prime. The anger is usually directed at piracy groups who fail to capture content in the correct manner but according to a number of observers, the problem is actually at Amazon’s end.

Discussions on Reddit, for example, report that episodes in a single TV series have been declining in filesize and bitrate, from 1.56 GB in 720p at a 3073 kb/s video bitrate for episode 1, down to 907 MB in 720p at just 1514 kb/s video bitrate for episode 10.

Numerous theories as to why this may be the case are being floated around, including that Amazon is trying to save on bandwidth expenses. While this is a possibility, the company hasn’t made any announcements to that end.

Indeed, one legitimate customer reported that he’d raised the quality issue with Amazon and they’d said that the problem was “probably on his end”.

“I have Amazon Prime Video and I noticed the quality was always great for their exclusive shows, so I decided to try buying the shows on Amazon instead of iTunes this year. I paid for season pass subscriptions for Legion, Billions and Homeland this year,” he wrote.

“Just this past weekend, I have noticed a significant drop in details compared to weeks before! So naturally I assumed it was an issue on my end. I started trying different devices, calling support, etc, but nothing really helped.

“Billions continued to look like a blurry mess, almost like I was watching a standard definition DVD instead of the crystal clear HD I paid for and have experienced in the past! And when I check the previous episodes, sure enough, they look fantastic again. What the heck??”

With Amazon distancing itself from the issues, piracy groups have already begun to dig in the knife. Release group DEFLATE has been particularly critical.

“Amazon, in their infinite wisdom, have decided to start fucking with the quality of their encodes. They’re now reaching Netflix’s subpar 1080p.H264 levels, and their H265 encodes aren’t even close to what Netflix produces,” the group said in a file attached to S02E07 of The Good Fight released on Sunday.

“Netflix is able to produce drastic visual improvements with their H265 encodes compared to H264 across every original. In comparison, Amazon can’t decide whether H265 or H264 is going to produce better results, and as a result we suffer for it.”

Arrr! The quality be fallin’

So what’s happening exactly?

A TorrentFreak source (who tells us he’s been working in the BluRay/DCP authoring business for the last 10 years) was kind enough to give us two opinions, one aimed at the techies and another at us mere mortals.

“In technical terms, it appears [Amazon has] increased the CRF [Constant Rate Factor] value they use when encoding for both the HEVC [H265] and H264 streams. Previously, their H264 streams were using CRF 18 and a max bitrate of 15Mbit/s, which usually resulted in file sizes of roughly 3GB, or around 10Mbit/s. Similarly with their HEVC streams, they were using CRF 20 and resulting in streams which were around the same size,” he explained.

“In the past week, the H264 streams have decreased by up to 50% for some streams. While there are no longer any x264 headers embedded in the H264 streams, the HEVC streams still retain those headers and the CRF value used has been increased, so it does appear this change has been done on purpose.”

In layman’s terms, our source believes that Amazon had previously been using an encoding profile that was “right on the edge of relatively good quality” which kept bitrates relatively low but high enough to ensure no perceivable loss of quality.

“H264 streams encoded with CRF 18 could provide an acceptable compromise between quality and file size, where the loss of detail is often negligible when watched at regular viewing distances, at a desk, or in a lounge room on a larger TV,” he explained.

“Recently, it appears these values have been intentionally changed in order to lower the bitrate and file sizes for reasons unknown. As a result, the quality of some streams has been reduced by up to 50% of their previous values. This has introduced a visual loss of quality, comparable to that of viewing something in standard definition versus high definition.”

With the situation failing to improve during the week, by the time piracy group DEFLATE released S03E14 of Supergirl on Tuesday their original criticism had transformed into flat-out insults.

“These are only being done in H265 because Amazon have shit the bed, and it’s a choice between a turd sandwich and a giant douche,” they wrote, offering these images as illustrative of the problem and these indicating what should be achievable.

With DEFLATE advising customers to start complaining to Amazon, the memes have already begun, with unfavorable references to now-defunct group YIFY (which was often chastized for its low quality rips) and even a spin on one of the most well known anti-piracy campaigns.

You wouldn’t download stream….

TorrentFreak contacted Amazon Prime for comment on both the recent changes and growing customer complaints but at the time of publication we were yet to receive a response.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

Notes on setting up Raspberry Pi 3 as WiFi hotspot

Post Syndicated from Robert Graham original https://blog.erratasec.com/2018/04/notes-on-setting-up-raspberry-pi-3-as.html

I want to sniff the packets for IoT devices. There are a number of ways of doing this, but one straightforward mechanism is configuring a “Raspberry Pi 3 B” as a WiFi hotspot, then running tcpdump on it to record all the packets that pass through it. Google gives lots of results on how to do this, but they all demand that you have the precise hardware, WiFi hardware, and software that the authors do, so that’s a pain.

I got it working using the instructions here. There are a few additional notes, which is why I’m writing this blogpost, so I remember them.
https://www.raspberrypi.org/documentation/configuration/wireless/access-point.md

I’m using the RPi-3-B and not the RPi-3-B+, and the latest version of Raspbian at the time of this writing, “Raspbian Stretch Lite 2018-3-13”.

Some things didn’t work as described. The first is that it couldn’t find the package “hostapd”. That solution was to run “apt-get update” a second time.

The second problem was error message about the NAT not working when trying to set the masquerade rule. That’s because the ‘upgrade’ updates the kernel, making the running system out-of-date with the files on the disk. The solution to that is make sure you reboot after upgrading.

Thus, what you do at the start is:

apt-get update
apt-get upgrade
apt-get update
shutdown -r now

Then it’s just “apt-get install tcpdump” and start capturing on wlan0. This will get the non-monitor-mode Ethernet frames, which is what I want.

WHOIS Limits Under GDPR Will Make Pirates Harder to Catch, Groups Fear

Post Syndicated from Andy original https://torrentfreak.com/whois-limits-under-gdpr-will-make-pirates-harder-to-catch-groups-fear-180413/

The General Data Protection Regulation (GDPR) is a regulation in EU law covering data protection and privacy for all individuals within the European Union.

As more and more personal data is gathered, stored and (ab)used online, the aim of the GDPR is to protect EU citizens from breaches of privacy. The regulation applies to all companies processing the personal data of subjects residing in the Union, no matter where in the world the company is located.

Penalties for non-compliance can be severe. While there is a tiered approach according to severity, organizations can be fined up to 4% of annual global turnover or €20 million, whichever is greater. Needless to say, the regulations will need to be taken seriously.

Among those affected are domain name registries and registrars who publish the personal details of domain name owners in the public WHOIS database. In a full entry, a person or organization’s name, address, telephone numbers and email addresses can often be found.

This raises a serious issue. While registries and registrars are instructed and contractually obliged to publish data in the WHOIS database by global domain name authority ICANN, in millions of cases this conflicts with the requirements of the GDPR, which prevents the details of private individuals being made freely available on the Internet.

As explained in detail by the EFF, ICANN has been trying to resolve this clash. Its proposed interim model for GDPR compliance (pdf) envisions registrars continuing to collect full WHOIS data but not necessarily publishing it, to “allow the existing data
to be preserved while the community discussions continue on the next generation of WHOIS.”

But the proposed changes that will inevitably restrict free access to WHOIS information has plenty of people spooked, including thousands of companies belonging to entertainment industry groups such as the MPAA, IFPI, RIAA and the Copyright Alliance.

In a letter sent to Vice President Andrus Ansip of the European Commission, these groups and dozens of others warn that restricted access to WHOIS will have a serious effect on their ability to protect their intellectual property rights from “cybercriminals” which pose a threat to their businesses.

Signed by 50 organizations involved in IP protection and other areas of online security, the letter expresses concern that in attempting to comply with the GDPR, ICANN is on a course to “over-correct” while disregarding proportionality, accountability and transparency.

A small sample of the groups calling on ICANN

“We strongly assert that this model does not properly account for the critical public and legitimate interests served by maintaining a sufficient amount of data publicly available while respecting privacy interests of registrants by instituting a tiered or layered access system for the vast majority of personal data as defined by the GDPR,” the groups write.

The letter focuses on two aspects of “over-correction”, the first being ICANN’s proposal that no personal data whatsoever of a domain name registrant will be made available “without appropriate consideration or balancing of the countervailing interests in public disclosure of a limited amount of such data.”

In response to ICANN’s proposal that only the province/state and country of a domain name registrant be made publicly available, the groups advise the organization that publishing “a natural person registrant’s e-mail address” in a publicly accessible WHOIS directory will not constitute a breach of the GDPR.

“[W]e strongly believe that the continued public availability of the registrant’s e-mail address – specifically the e-mail address that the registrant supplies to the registrar at the time the domain name is purchased and which e-mail address the registrar is required to validate – is critical for several reasons,” the groups write.

“First, it is the data element that is typically the most important to have readily available for law enforcement, consumer protection, particularly child protection, intellectual property enforcement and cybersecurity/anti-malware purposes.

“Second, the public accessibility of the registrant’s e-mail address permits a broad array of threats and illegal activities to be addressed quickly and the damage from such threats mitigated and contained in a timely manner, particularly where the abusive/illegal activity may be spawned from a variety of different domain names on different generic Top Level Domains,” they add.

The groups also argue that since making email addresses is effectively required in light of Article 5.1(c) ECD, “there is no legitimate justification to discontinue public availability of the registrant’s e-mail address in the WHOIS directory and especially not in light of other legitimate purposes.”

The EFF, on the other hand, says that being able to contact a domain owner wouldn’t necessarily require an email address to be made public.

“There are other cases in which it makes sense to allow members of the public to contact the owner of a domain, without having to obtain a court order,” EFF writes.

“But this could be achieved very simply if ICANN were simply to provide something like a CAPTCHA-protected contact form, which would deliver email to the appropriate contact point with no need to reveal the registrant’s actual email address.”

The groups’ second main concern is that ICANN reportedly makes no distinction between name registrants that are “natural persons versus those that are legal entities” and intends to treat them all as if they are subject to the GDPR, despite the fact that the regulation only applies to data associated with an “identified or identifiable natural person”.

They say it is imperative that EU Data Protection Authorities are made to understand that when registrants obtain a domain for illegal purposes, they often only register it as a “natural person” when registering as a legal person (legal entity) would be more appropriate, despite that granting them less privacy.

“Consequently, the test for differentiating between a legal and natural person should not merely be the legal status of the registrant, but also whether the registrant is, in fact, acting as a legal or natural person vis a vis the use of the domain name,” the groups note.

“We therefore urge that ICANN be given appropriate guidance as to the importance of maintaining a distinction between natural person and legal person registrants and keeping as much data about legal person domain name registrants as publicly accessible as possible,” they conclude.

What will happen with WHOIS on May 25 still isn’t clear. It wasn’t until October 2017 that ICANN finally determined that it would be affected by the GDPR, meaning that it’s been scrambling ever since to meet the compliance date. And it still is, according to the latest available documentation (pdf).

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

How to retain system tables’ data spanning multiple Amazon Redshift clusters and run cross-cluster diagnostic queries

Post Syndicated from Karthik Sonti original https://aws.amazon.com/blogs/big-data/how-to-retain-system-tables-data-spanning-multiple-amazon-redshift-clusters-and-run-cross-cluster-diagnostic-queries/

Amazon Redshift is a data warehouse service that logs the history of the system in STL log tables. The STL log tables manage disk space by retaining only two to five days of log history, depending on log usage and available disk space.

To retain STL tables’ data for an extended period, you usually have to create a replica table for every system table. Then, for each you load the data from the system table into the replica at regular intervals. By maintaining replica tables for STL tables, you can run diagnostic queries on historical data from the STL tables. You then can derive insights from query execution times, query plans, and disk-spill patterns, and make better cluster-sizing decisions. However, refreshing replica tables with live data from STL tables at regular intervals requires schedulers such as Cron or AWS Data Pipeline. Also, these tables are specific to one cluster and they are not accessible after the cluster is terminated. This is especially true for transient Amazon Redshift clusters that last for only a finite period of ad hoc query execution.

In this blog post, I present a solution that exports system tables from multiple Amazon Redshift clusters into an Amazon S3 bucket. This solution is serverless, and you can schedule it as frequently as every five minutes. The AWS CloudFormation deployment template that I provide automates the solution setup in your environment. The system tables’ data in the Amazon S3 bucket is partitioned by cluster name and query execution date to enable efficient joins in cross-cluster diagnostic queries.

I also provide another CloudFormation template later in this post. This second template helps to automate the creation of tables in the AWS Glue Data Catalog for the system tables’ data stored in Amazon S3. After the system tables are exported to Amazon S3, you can run cross-cluster diagnostic queries on the system tables’ data and derive insights about query executions in each Amazon Redshift cluster. You can do this using Amazon QuickSight, Amazon Athena, Amazon EMR, or Amazon Redshift Spectrum.

You can find all the code examples in this post, including the CloudFormation templates, AWS Glue extract, transform, and load (ETL) scripts, and the resolution steps for common errors you might encounter in this GitHub repository.

Solution overview

The solution in this post uses AWS Glue to export system tables’ log data from Amazon Redshift clusters into Amazon S3. The AWS Glue ETL jobs are invoked at a scheduled interval by AWS Lambda. AWS Systems Manager, which provides secure, hierarchical storage for configuration data management and secrets management, maintains the details of Amazon Redshift clusters for which the solution is enabled. The last-fetched time stamp values for the respective cluster-table combination are maintained in an Amazon DynamoDB table.

The following diagram covers the key steps involved in this solution.

The solution as illustrated in the preceding diagram flows like this:

  1. The Lambda function, invoke_rs_stl_export_etl, is triggered at regular intervals, as controlled by Amazon CloudWatch. It’s triggered to look up the AWS Systems Manager parameter store to get the details of the Amazon Redshift clusters for which the system table export is enabled.
  2. The same Lambda function, based on the Amazon Redshift cluster details obtained in step 1, invokes the AWS Glue ETL job designated for the Amazon Redshift cluster. If an ETL job for the cluster is not found, the Lambda function creates one.
  3. The ETL job invoked for the Amazon Redshift cluster gets the cluster credentials from the parameter store. It gets from the DynamoDB table the last exported time stamp of when each of the system tables was exported from the respective Amazon Redshift cluster.
  4. The ETL job unloads the system tables’ data from the Amazon Redshift cluster into an Amazon S3 bucket.
  5. The ETL job updates the DynamoDB table with the last exported time stamp value for each system table exported from the Amazon Redshift cluster.
  6. The Amazon Redshift cluster system tables’ data is available in Amazon S3 and is partitioned by cluster name and date for running cross-cluster diagnostic queries.

Understanding the configuration data

This solution uses AWS Systems Manager parameter store to store the Amazon Redshift cluster credentials securely. The parameter store also securely stores other configuration information that the AWS Glue ETL job needs for extracting and storing system tables’ data in Amazon S3. Systems Manager comes with a default AWS Key Management Service (AWS KMS) key that it uses to encrypt the password component of the Amazon Redshift cluster credentials.

The following table explains the global parameters and cluster-specific parameters required in this solution. The global parameters are defined once and applicable at the overall solution level. The cluster-specific parameters are specific to an Amazon Redshift cluster and repeat for each cluster for which you enable this post’s solution. The CloudFormation template explained later in this post creates these parameters as part of the deployment process.

Parameter name Type Description
Global parametersdefined once and applied to all jobs
redshift_query_logs.global.s3_prefix String The Amazon S3 path where the query logs are exported. Under this path, each exported table is partitioned by cluster name and date.
redshift_query_logs.global.tempdir String The Amazon S3 path that AWS Glue ETL jobs use for temporarily staging the data.
redshift_query_logs.global.role> String The name of the role that the AWS Glue ETL jobs assume. Just the role name is sufficient. The complete Amazon Resource Name (ARN) is not required.
redshift_query_logs.global.enabled_cluster_list StringList A comma-separated list of cluster names for which system tables’ data export is enabled. This gives flexibility for a user to exclude certain clusters.
Cluster-specific parametersfor each cluster specified in the enabled_cluster_list parameter
redshift_query_logs.<<cluster_name>>.connection String The name of the AWS Glue Data Catalog connection to the Amazon Redshift cluster. For example, if the cluster name is product_warehouse, the entry is redshift_query_logs.product_warehouse.connection.
redshift_query_logs.<<cluster_name>>.user String The user name that AWS Glue uses to connect to the Amazon Redshift cluster.
redshift_query_logs.<<cluster_name>>.password Secure String The password that AWS Glue uses to connect the Amazon Redshift cluster’s encrypted-by key that is managed in AWS KMS.

For example, suppose that you have two Amazon Redshift clusters, product-warehouse and category-management, for which the solution described in this post is enabled. In this case, the parameters shown in the following screenshot are created by the solution deployment CloudFormation template in the AWS Systems Manager parameter store.

Solution deployment

To make it easier for you to get started, I created a CloudFormation template that automatically configures and deploys the solution—only one step is required after deployment.

Prerequisites

To deploy the solution, you must have one or more Amazon Redshift clusters in a private subnet. This subnet must have a network address translation (NAT) gateway or a NAT instance configured, and also a security group with a self-referencing inbound rule for all TCP ports. For more information about why AWS Glue ETL needs the configuration it does, described previously, see Connecting to a JDBC Data Store in a VPC in the AWS Glue documentation.

To start the deployment, launch the CloudFormation template:

CloudFormation stack parameters

The following table lists and describes the parameters for deploying the solution to export query logs from multiple Amazon Redshift clusters.

Property Default Description
S3Bucket mybucket The bucket this solution uses to store the exported query logs, stage code artifacts, and perform unloads from Amazon Redshift. For example, the mybucket/extract_rs_logs/data bucket is used for storing all the exported query logs for each system table partitioned by the cluster. The mybucket/extract_rs_logs/temp/ bucket is used for temporarily staging the unloaded data from Amazon Redshift. The mybucket/extract_rs_logs/code bucket is used for storing all the code artifacts required for Lambda and the AWS Glue ETL jobs.
ExportEnabledRedshiftClusters Requires Input A comma-separated list of cluster names from which the system table logs need to be exported.
DataStoreSecurityGroups Requires Input A list of security groups with an inbound rule to the Amazon Redshift clusters provided in the parameter, ExportEnabledClusters. These security groups should also have a self-referencing inbound rule on all TCP ports, as explained on Connecting to a JDBC Data Store in a VPC.

After you launch the template and create the stack, you see that the following resources have been created:

  1. AWS Glue connections for each Amazon Redshift cluster you provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  2. All parameters required for this solution created in the parameter store.
  3. The Lambda function that invokes the AWS Glue ETL jobs for each configured Amazon Redshift cluster at a regular interval of five minutes.
  4. The DynamoDB table that captures the last exported time stamps for each exported cluster-table combination.
  5. The AWS Glue ETL jobs to export query logs from each Amazon Redshift cluster provided in the CloudFormation stack parameter, ExportEnabledRedshiftClusters.
  6. The IAM roles and policies required for the Lambda function and AWS Glue ETL jobs.

After the deployment

For each Amazon Redshift cluster for which you enabled the solution through the CloudFormation stack parameter, ExportEnabledRedshiftClusters, the automated deployment includes temporary credentials that you must update after the deployment:

  1. Go to the parameter store.
  2. Note the parameters <<cluster_name>>.user and redshift_query_logs.<<cluster_name>>.password that correspond to each Amazon Redshift cluster for which you enabled this solution. Edit these parameters to replace the placeholder values with the right credentials.

For example, if product-warehouse is one of the clusters for which you enabled system table export, you edit these two parameters with the right user name and password and choose Save parameter.

Querying the exported system tables

Within a few minutes after the solution deployment, you should see Amazon Redshift query logs being exported to the Amazon S3 location, <<S3Bucket_you_provided>>/extract_redshift_query_logs/data/. In that bucket, you should see the eight system tables partitioned by customer name and date: stl_alert_event_log, stl_dlltext, stl_explain, stl_query, stl_querytext, stl_scan, stl_utilitytext, and stl_wlm_query.

To run cross-cluster diagnostic queries on the exported system tables, create external tables in the AWS Glue Data Catalog. To make it easier for you to get started, I provide a CloudFormation template that creates an AWS Glue crawler, which crawls the exported system tables stored in Amazon S3 and builds the external tables in the AWS Glue Data Catalog.

Launch this CloudFormation template to create external tables that correspond to the Amazon Redshift system tables. S3Bucket is the only input parameter required for this stack deployment. Provide the same Amazon S3 bucket name where the system tables’ data is being exported. After you successfully create the stack, you can see the eight tables in the database, redshift_query_logs_db, as shown in the following screenshot.

Now, navigate to the Athena console to run cross-cluster diagnostic queries. The following screenshot shows a diagnostic query executed in Athena that retrieves query alerts logged across multiple Amazon Redshift clusters.

You can build the following example Amazon QuickSight dashboard by running cross-cluster diagnostic queries on Athena to identify the hourly query count and the key query alert events across multiple Amazon Redshift clusters.

How to extend the solution

You can extend this post’s solution in two ways:

  • Add any new Amazon Redshift clusters that you spin up after you deploy the solution.
  • Add other system tables or custom query results to the list of exports from an Amazon Redshift cluster.

Extend the solution to other Amazon Redshift clusters

To extend the solution to more Amazon Redshift clusters, add the three cluster-specific parameters in the AWS Systems Manager parameter store following the guidelines earlier in this post. Modify the redshift_query_logs.global.enabled_cluster_list parameter to append the new cluster to the comma-separated string.

Extend the solution to add other tables or custom queries to an Amazon Redshift cluster

The current solution ships with the export functionality for the following Amazon Redshift system tables:

  • stl_alert_event_log
  • stl_dlltext
  • stl_explain
  • stl_query
  • stl_querytext
  • stl_scan
  • stl_utilitytext
  • stl_wlm_query

You can easily add another system table or custom query by adding a few lines of code to the AWS Glue ETL job, <<cluster-name>_extract_rs_query_logs. For example, suppose that from the product-warehouse Amazon Redshift cluster you want to export orders greater than $2,000. To do so, add the following five lines of code to the AWS Glue ETL job product-warehouse_extract_rs_query_logs, where product-warehouse is your cluster name:

  1. Get the last-processed time-stamp value. The function creates a value if it doesn’t already exist.

salesLastProcessTSValue = functions.getLastProcessedTSValue(trackingEntry=”mydb.sales_2000",job_configs=job_configs)

  1. Run the custom query with the time stamp.

returnDF=functions.runQuery(query="select * from sales s join order o where o.order_amnt > 2000 and sale_timestamp > '{}'".format (salesLastProcessTSValue) ,tableName="mydb.sales_2000",job_configs=job_configs)

  1. Save the results to Amazon S3.

functions.saveToS3(dataframe=returnDF,s3Prefix=s3Prefix,tableName="mydb.sales_2000",partitionColumns=["sale_date"],job_configs=job_configs)

  1. Get the latest time-stamp value from the returned data frame in Step 2.

latestTimestampVal=functions.getMaxValue(returnDF,"sale_timestamp",job_configs)

  1. Update the last-processed time-stamp value in the DynamoDB table.

functions.updateLastProcessedTSValue(“mydb.sales_2000",latestTimestampVal[0],job_configs)

Conclusion

In this post, I demonstrate a serverless solution to retain the system tables’ log data across multiple Amazon Redshift clusters. By using this solution, you can incrementally export the data from system tables into Amazon S3. By performing this export, you can build cross-cluster diagnostic queries, build audit dashboards, and derive insights into capacity planning by using services such as Athena. I also demonstrate how you can extend this solution to other ad hoc query use cases or tables other than system tables by adding a few lines of code.


Additional Reading

If you found this post useful, be sure to check out Using Amazon Redshift Spectrum, Amazon Athena, and AWS Glue with Node.js in Production and Amazon Redshift – 2017 Recap.


About the Author

Karthik Sonti is a senior big data architect at Amazon Web Services. He helps AWS customers build big data and analytical solutions and provides guidance on architecture and best practices.

 

 

 

 

More power to your Pi

Post Syndicated from James Adams original https://www.raspberrypi.org/blog/pi-power-supply-chip/

It’s been just over three weeks since we launched the new Raspberry Pi 3 Model B+. Although the product is branded Raspberry Pi 3B+ and not Raspberry Pi 4, a serious amount of engineering was involved in creating it. The wireless networking, USB/Ethernet hub, on-board power supplies, and BCM2837 chip were all upgraded: together these represent almost all the circuitry on the board! Today, I’d like to tell you about the work that has gone into creating a custom power supply chip for our newest computer.

Raspberry Pi 3 Model B+, with custome power supply chip

The new Raspberry Pi 3B+, sporting a new, custom power supply chip (bottom left-hand corner)

Successful launch

The Raspberry Pi 3B+ has been well received, and we’ve enjoyed hearing feedback from the community as well as reading the various reviews and articles highlighting the solid improvements in wireless networking, Ethernet, CPU, and thermal performance of the new board. Gareth Halfacree’s post here has some particularly nice graphs showing the increased performance as well as how the Pi 3B+ keeps cool under load due to the new CPU package that incorporates a metal heat spreader. The Raspberry Pi production lines at the Sony UK Technology Centre are running at full speed, and it seems most people who want to get hold of the new board are able to find one in stock.

Powering your Pi

One of the most critical but often under-appreciated elements of any electronic product, particularly one such as Raspberry Pi with lots of complex on-board silicon (processor, networking, high-speed memory), is the power supply. In fact, the Raspberry Pi 3B+ has no fewer than six different voltage rails: two at 3.3V — one special ‘quiet’ one for audio, and one for everything else; 1.8V; 1.2V for the LPDDR2 memory; and 1.2V nominal for the CPU core. Note that the CPU voltage is actually raised and lowered on the fly as the speed of the CPU is increased and decreased depending on how hard the it is working. The sixth rail is 5V, which is the master supply that all the others are created from, and the output voltage for the four downstream USB ports; this is what the mains power adaptor is supplying through the micro USB power connector.

Power supply primer

There are two common classes of power supply circuits: linear regulators and switching regulators. Linear regulators work by creating a lower, regulated voltage from a higher one. In simple terms, they monitor the output voltage against an internally generated reference and continually change their own resistance to keep the output voltage constant. Switching regulators work in a different way: they ‘pump’ energy by first storing the energy coming from the source supply in a reactive component (usually an inductor, sometimes a capacitor) and then releasing it to the regulated output supply. The switches in switching regulators effect this energy transfer by first connecting the inductor (or capacitor) to store the source energy, and then switching the circuit so the energy is released to its destination.

Linear regulators produce smoother, less noisy output voltages, but they can only convert to a lower voltage, and have to dissipate energy to do so. The higher the output current and the voltage difference across them is, the more energy is lost as heat. On the other hand, switching supplies can, depending on their design, convert any voltage to any other voltage and can be much more efficient (efficiencies of 90% and above are not uncommon). However, they are more complex and generate noisier output voltages.

Designers use both types of regulators depending on the needs of the downstream circuit: for low-voltage drops, low current, or low noise, linear regulators are usually the right choice, while switching regulators are used for higher power or when efficiency of conversion is required. One of the simplest switching-mode power supply circuits is the buck converter, used to create a lower voltage from a higher one, and this is what we use on the Pi.

A history lesson

The BCM2835 processor chip (found on the original Raspberry Pi Model B and B+, as well as on the Zero products) has on-chip power supplies: one switch-mode regulator for the core voltage, as well as a linear one for the LPDDR2 memory supply. This meant that in addition to 5V, we only had to provide 3.3V and 1.8V on the board, which was relatively simple to do using cheap, off-the-shelf parts.

Pi Zero sporting a BCM2835 processor which only needs 2 external switchers (the components clustered behind the camera port)

When we moved to the BCM2836 for Raspberry Pi Model 2 (and subsequently to the BCM2837A1 and B0 for Raspberry Pi 3B and 3B+), the core supply and the on-chip LPDDR2 memory supply were not up to the job of supplying the extra processor cores and larger memory, so we removed them. (We also used the recovered chip area to help fit in the new quad-core ARM processors.) The upshot of this was that we had to supply these power rails externally for the Raspberry Pi 2 and models thereafter. Moreover, we also had to provide circuitry to sequence them correctly in order to control exactly when they power up compared to the other supplies on the board.

Power supply design is tricky (but critical)

Raspberry Pi boards take in 5V from the micro USB socket and have to generate the other required supplies from this. When 5V is first connected, each of these other supplies must ‘start up’, meaning go from ‘off’, or 0V, to their correct voltage in some short period of time. The order of the supplies starting up is often important: commonly, there are structures inside a chip that form diodes between supply rails, and bringing supplies up in the wrong order can sometimes ‘turn on’ these diodes, causing them to conduct, with undesirable consequences. Silicon chips come with a data sheet specifying what supplies (voltages and currents) are needed and whether they need to be low-noise, in what order they must power up (and in some cases down), and sometimes even the rate at which the voltages must power up and down.

A Pi3. Power supply components are clustered bottom left next to the micro USB, middle (above LPDDR2 chip which is on the bottom of the PCB) and above the A/V jack.

In designing the power chain for the Pi 2 and 3, the sequencing was fairly straightforward: power rails power up in order of voltage (5V, 3.3V, 1.8V, 1.2V). However, the supplies were all generated with individual, discrete devices. Therefore, I spent quite a lot of time designing circuitry to control the sequencing — even with some design tricks to reduce component count, quite a few sequencing components are required. More complex systems generally use a Power Management Integrated Circuit (PMIC) with multiple supplies on a single chip, and many different PMIC variants are made by various manufacturers. Since Raspberry Pi 2 days, I was looking for a suitable PMIC to simplify the Pi design, but invariably (and somewhat counter-intuitively) these were always too expensive compared to my discrete solution, usually because they came with more features than needed.

One device to rule them all

It was way back in May 2015 when I first chatted to Peter Coyle of Exar (Exar were bought by MaxLinear in 2017) about power supply products for Raspberry Pi. We didn’t find a product match then, but in June 2016 Peter, along with Tuomas Hollman and Trevor Latham, visited to pitch the possibility of building a custom power management solution for us.

I was initially sceptical that it could be made cheap enough. However, our discussion indicated that if we could tailor the solution to just what we needed, it could be cost-effective. Over the coming weeks and months, we honed a specification we agreed on from the initial sketches we’d made, and Exar thought they could build it for us at the target price.

The chip we designed would contain all the key supplies required for the Pi on one small device in a cheap QFN package, and it would also perform the required sequencing and voltage monitoring. Moreover, the chip would be flexible to allow adjustment of supply voltages from their default values via I2C; the largest supply would be capable of being adjusted quickly to perform the dynamic core voltage changes needed in order to reduce voltage to the processor when it is idling (to save power), and to boost voltage to the processor when running at maximum speed (1.4 GHz). The supplies on the chip would all be generously specified and could deliver significantly more power than those used on the Raspberry Pi 3. All in all, the chip would contain four switching-mode converters and one low-current linear regulator, this last one being low-noise for the audio circuitry.

The MXL7704 chip

The project was a great success: MaxLinear delivered working samples of first silicon at the end of May 2017 (almost exactly a year after we had kicked off the project), and followed through with production quantities in December 2017 in time for the Raspberry Pi 3B+ production ramp.

The team behind the power supply chip on the Raspberry Pi 3 Model B+ (group of six men, two of whom are holding Raspberry Pi boards)

Front row: Roger with the very first Pi 3B+ prototypes and James with a MXL7704 development board hacked to power a Pi 3. Back row left to right: Will Torgerson, Trevor Latham, Peter Coyle, Tuomas Hollman.

The MXL7704 device has been key to reducing Pi board complexity and therefore overall bill of materials cost. Furthermore, by being able to deliver more power when needed, it has also been essential to increasing the speed of the (newly packaged) BCM2837B0 processor on the 3B+ to 1.4GHz. The result is improvements to both the continuous output current to the CPU (from 3A to 4A) and to the transient performance (i.e. the chip has helped to reduce the ‘transient response’, which is the change in supply voltage due to a sudden current spike that occurs when the processor suddenly demands a large current in a few nanoseconds, as modern CPUs tend to do).

With the MXL7704, the power supply circuitry on the 3B+ is now a lot simpler than the Pi 3B design. This new supply also provides the LPDDR2 memory voltage directly from a switching regulator rather than using linear regulators like the Pi 3, thereby improving energy efficiency. This helps to somewhat offset the extra power that the faster Ethernet, wireless networking, and processor consume. A pleasing side effect of using the new chip is the symmetric board layout of the regulators — it’s easy to see the four switching-mode supplies, given away by four similar-looking blobs (three grey and one brownish), which are the inductors.

Close-up of the power supply chip on the Raspberry Pi 3 Model B+

The Pi 3B+ PMIC MXL7704 — pleasingly symmetric

Kudos

It takes a lot of effort to design a new chip from scratch and get it all the way through to production — we are very grateful to the team at MaxLinear for their hard work, dedication, and enthusiasm. We’re also proud to have created something that will not only power Raspberry Pis, but will also be useful for other product designs: it turns out when you have a low-cost and flexible device, it can be used for many things — something we’re fairly familiar with here at Raspberry Pi! For the curious, the product page (including the data sheet) for the MXL7704 chip is here. Particular thanks go to Peter Coyle, Tuomas Hollman, and Trevor Latham, and also to Jon Cronk, who has been our contact in the US and has had to get up early to attend all our conference calls!

The MXL7704 design team celebrating on Pi Day — it takes a lot of people to design a chip!

I hope you liked reading about some of the effort that has gone into creating the new Pi. It’s nice to finally have a chance to tell people about some of the (increasingly complex) technical work that makes building a $35 computer possible — we’re very pleased with the Raspberry Pi 3B+, and we hope you enjoy using it as much as we’ve enjoyed creating it!

The post More power to your Pi appeared first on Raspberry Pi.

Artefacts in the classroom with Museum in a Box

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/museum-in-a-box/

Museum in a Box bridges the gap between museums and schools by creating a more hands-on approach to conservation education through 3D printing and digital making.

Artefacts in the classroom with Museum in a Box || Raspberry Pi Stories

Learn more: http://rpf.io/ Subscribe to our YouTube channel: http://rpf.io/ytsub Help us reach a wider audience by translating our video content: http://rpf.io/yttranslate Buy a Raspberry Pi from one of our Approved Resellers: http://rpf.io/ytproducts Find out more about the Raspberry Pi Foundation: Raspberry Pi http://rpf.io/ytrpi Code Club UK http://rpf.io/ytccuk Code Club International http://rpf.io/ytcci CoderDojo http://rpf.io/ytcd Check out our free online training courses: http://rpf.io/ytfl Find your local Raspberry Jam event: http://rpf.io/ytjam Work through our free online projects: http://rpf.io/ytprojects Do you have a question about your Raspberry Pi?

Fantastic collections and where to find them

Large, impressive statues are truly a sight to be seen. Take for example the 2.4m Hoa Hakananai’a at the British Museum. Its tall stature looms over you as you read its plaque to learn of the statue’s journey from Easter Island to the UK under the care of Captain Cook in 1774, and you can’t help but wonder at how it made it here in one piece.

Hoa Hakananai’a Captain Cook British Museum
Hoa Hakananai’a Captain Cook British Museum

But unless you live near a big city where museums are plentiful, you’re unlikely to see the likes of Hoa Hakananai’a in person. Instead, you have to content yourself with online photos or videos of world-famous artefacts.

And that only accounts for the objects that are on display: conservators estimate that only approximately 5 to 10% of museums’ overall collections are actually on show across the globe. The rest is boxed up in storage, inaccessible to the public due to risk of damage, or simply due to lack of space.

Museum in a Box

Museum in a Box aims to “put museum collections and expert knowledge into your hand, wherever you are in the world,” through modern maker practices such as 3D printing and digital making. With the help of the ‘Scan the World’ movement, an “ambitious initiative whose mission is to archive objects of cultural significance using 3D scanning technologies”, the Museum in a Box team has been able to print small, handheld replicas of some of the world’s most recognisable statues and sculptures.

Museum in a Box Raspberry Pi

Each 3D print gets NFC tags so it can initiate audio playback from a Raspberry Pi that sits snugly within the laser-cut housing of a ‘brain box’. Thus the print can talk directly to us through the magic of wireless technology, replacing the dense, dry text of a museum plaque with engaging speech.

Museum in a Box Raspberry Pi

The Museum in a Box team headed by CEO George Oates (featured in the video above) makes use of these 3D-printed figures alongside original artefacts, postcards, and more to bridge the gap between large, crowded, distant museums and local schools. Modeled after the museum handling collections that used to be sent to schools, Museum in a Box is a cheaper, more accessible alternative. Moreover, it not only allows for hands-on learning, but also encourages children to get directly involved by hacking its technology! With NFC technology readily available to the public, students can curate their own collections about their local area, record their own messages, and send their own box-sized museums on to schools in other towns or countries. In this way, Museum in a Box enables students to explore, and expand the reach of, their own histories.

Moving forward

With the technology perfected and interest in the project ever-growing, Museum in a Box has a busy year ahead. Supporting the new ‘Unstacked’ learning initiative, the team will soon be delivering ten boxes to the Smithsonian Libraries. The team has curated two collections specifically for this: an exploration into Asia-Pacific America experiences of migration to the USA throughout the 20th century, and a look into the history of science.

Smithsonian Library Museum in a Box Raspberry Pi

The team will also be making a box for the British Museum to support their Iraq Scheme initiative, and another box will be heading to the V&A to support their See Red programme. While primarily installed in the Lansbury Micro Museum, the box will also take to the road to visit the local Spotlight high school.

Museum in a Box at Raspberry Fields

Lastly, by far the most exciting thing the Museum in a Box team will be doing this year — in our opinion at least — is showcasing at Raspberry Fields! This is our brand-new festival of digital making that’s taking place on 30 June and 1 July 2018 here in Cambridge, UK. Find more information about it and get your ticket here.

The post Artefacts in the classroom with Museum in a Box appeared first on Raspberry Pi.

Community profile: Dave Akerman

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/community-profile-dave-akerman/

This column is from The MagPi issue 61. You can download a PDF of the full issue for free, or subscribe to receive the print edition through your letterbox or the digital edition on your tablet. All proceeds from the print and digital editions help the Raspberry Pi Foundation achieve our charitable goals.

The pinned tweet on Dave Akerman’s Twitter account shows a table displaying the various components needed for a high-altitude balloon (HAB) flight. Batteries, leads, a camera and Raspberry Pi, plus an unusually themed payload. The caption reads ‘The Queen, The Duke of York, and my TARDIS”, and sums up Dave’s maker career in a heartbeat.

David Akerman on Twitter

The Queen, The Duke of York, and my TARDIS 🙂 #UKHAS #RaspberryPi

Though writing software for industrial automation pays the bills, the majority of Dave’s time is spent in the world of high-altitude ballooning and the ever-growing community that encompasses it. And, while he makes some money sending business-themed balloons to near space for the likes of Aardman Animations, Confused.com, and the BBC, Dave is best known in the Raspberry Pi community for his use of the small computer in every payload, and his work as a tutor alongside the Foundation’s staff at Skycademy events.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave continues to help others while breaking records and having a good time exploring the atmosphere.

Dave has dedicated many hours and many, many more miles to assist with the Foundation’s Skycademy programme, helping to explore high-altitude ballooning with educators from across the UK. Using a Raspberry Pi and various other pieces of lightweight tech, Dave and Foundation staff member James Robinson explored the incorporation of high-altitude ballooning into education. Through Skycademy, educators were able to learn new skills and take them to the classroom, setting off their own balloons with their students, and recording the results on Raspberry Pis.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave’s most recent flight broke a new record. On 13 August 2017, his HAB payload was able to send back the highest images taken by any amateur flight.

But education isn’t the only reason for Dave’s involvement in the HAB community. As with anyone passionate about a specific hobby, Dave strives to break records. The most recent record-breaking flight took place on 13 August 2017, when Dave’s Raspberry Pi Zero HAB sent home the highest images taken by any amateur high-altitude balloon launch: at 43014 metres. No other HAB balloon has provided images from such an altitude, and the lightweight nature of the Pi Zero definitely helped, as Dave went on to mention on Twitter a few days later.

Dave Akerman The MagPi Raspberry Pi Community Profile

Dave is recognised as being the first person to incorporate a Raspberry Pi into a HAB payload, and continues to break records with the help of the little green board. More recently, he’s been able to lighten the load by using the Raspberry Pi Zero.

When the first Pi made its way to near space, Dave tore the computer apart in order to meet the weight restriction. The Pi in the Sky board was created to add the extra features needed for the flight. Since then, the HAT has experienced a few changes.

Dave Akerman The MagPi Raspberry Pi Community Profile

The Pi in the Sky board, created specifically for HAB flights.

Dave first fell in love with high-altitude ballooning after coming across the hobby in a video shared on a photographic forum. With a lifelong interest in space thanks to watching the Moon landings as a boy, plus a talent for electronics and photography, it seems a natural progression for him. Throw in his coding skills from learning to program on a Teletype and it’s no wonder he was ready and eager to take to the skies, so to speak, and capture the curvature of the Earth. What was so great about using the Raspberry Pi was the instant gratification he got from receiving images in real time as they were taken during the flight. While other devices could control a camera and store captured images for later retrieval, thanks to the Pi Dave was able to transmit the files back down to Earth and check the progress of his balloon while attempting to break records with a flight.

Dave Akerman The MagPi Raspberry Pi Community Profile Morph

One of the many commercial flights Dave has organised featured the classic children’s TV character Morph, a creation of the Aardman Animations studio known for Wallace and Gromit. Morph took to the sky twice in his mission to reach near space, and finally succeeded in 2016.

High-altitude ballooning isn’t the only part of Dave’s life that incorporates a Raspberry Pi. Having “lost count” of how many Pis he has running tasks, Dave has also created radio receivers for APRS (ham radio data), ADS-B (aircraft tracking), and OGN (gliders), along with a time-lapse camera in his garden, and he has a few more Pi for tinkering purposes.

The post Community profile: Dave Akerman appeared first on Raspberry Pi.

If YouTube-Ripping Sites Are Illegal, What About Tools That Do a Similar Job?

Post Syndicated from Andy original https://torrentfreak.com/if-youtube-ripping-sites-are-illegal-what-about-tools-that-do-a-similar-job-180407/

In 2016, the International Federation of the Phonographic Industry published research which claimed that half of 16 to 24-year-olds use stream-ripping tools to copy music from sites like YouTube.

While this might not have surprised those who regularly participate in the activity, IFPI said that volumes had become so vast that stream-ripping had overtaken pirate site music downloads. That was a big statement.

Probably not coincidentally, just two weeks later IFPI, RIAA, and BPI announced legal action against the world’s largest YouTube ripping site, YouTube-MP3.

“YTMP3 rapidly and seamlessly removes the audio tracks contained in videos streamed from YouTube that YTMP3’s users access, converts those audio tracks to an MP3 format, copies and stores them on YTMP3’s servers, and then distributes copies of the MP3 audio files from its servers to its users in the United States, enabling its users to download those MP3 files to their computers, tablets, or smartphones,” the complaint read.

The labels sued YouTube-MP3 for direct infringement, contributory infringement, vicarious infringement, inducing others to infringe, plus circumvention of technological measures on top. The case was big and one that would’ve been intriguing to watch play out in court, but that never happened.

A year later in September 2017, YouTubeMP3 settled out of court. No details were made public but YouTube-MP3 apparently took all the blame and the court was asked to rule in favor of the labels on all counts.

This certainly gave the impression that what YouTube-MP3 did was illegal and a strong message was sent out to other companies thinking of offering a similar service. However, other onlookers clearly saw the labels’ lawsuit as something to be studied and learned from.

One of those was the operator of NotMP3downloader.com, a site that offers Free MP3 Recorder for YouTube, a tool offering similar functionality to YouTube-MP3 while supposedly avoiding the same legal pitfalls.

Part of that involves audio being processed on the user’s machine – not by stream-ripping as such – but by stream-recording. A subtle difference perhaps, but the site’s operator thinks it’s important.

“After examining the claims made by the copyright holders against youtube-mp3.org, we identified that the charges were based on the three main points. [None] of them are applicable to our product,” he told TF this week.

The first point involves YouTube-MP3’s acts of conversion, storage and distribution of content it had previously culled from YouTube. Copies of unlicensed tracks were clearly held on its own servers, a potent direct infringement risk.

“We don’t have any servers to download, convert or store a copyrighted or any other content from YouTube. Therefore, we do not violate any law or prohibition implied in this part,” NotMP3downloader’s operator explains.

Then there’s the act of “stream-ripping” itself. While YouTube-MP3 downloaded digital content from YouTube using its own software, NotMP3downloader claims to do things differently.

“Our software doesn’t download any streaming content directly, but only launches a web browser with the video specified by a user. The capturing happens from a local machine’s sound card and doesn’t deal with any content streamed through a network,” its operator notes.

This part also seems quite important. YouTube-MP3 was accused of unlawfully circumventing technological measures implemented by YouTube to prevent people downloading or copying content. By opening up YouTube’s own website and viewing content in the way the site demands, NotMP3downloader says it does not “violate the website’s integrity nor performs direct download of audio or video files.”

Like the Betamax video recorder before it that enabled recording from analog TV, NotMP3downloader enables a user to record a YouTube stream on their local machine. This, its makers claim, means the software is completely legal and defeats all the claims made by the labels in the YouTube-MP3 lawsuit.

“What YouTube does is broadcasting content through the Internet. Thus, there is nothing wrong if users are allowed to watch such content later as they may want,” the NotMP3downloader team explain.

“It is worth noting that in Sony Corp. of America v. United City Studios, Inc. (464 U.S. 417) the United States Supreme Court held that such practice, also known as time-shifting, was lawful representing fair use under the US Copyright Act and causing no substantial harm to the copyright holder.”

While software that can record video and sounds locally are nothing new, the developments in the YouTube-MP3 case and this response from NotMP3downloader raises interesting questions.

We put some of them to none other than former RIAA Executive Vice President, Neil Turkewitz, who now works as President of Turkewitz Consulting Group.

Turkewitz stressed that he doesn’t speak for the industry as a whole or indeed the RIAA but it’s clear that his passion for protecting creators persists. He told us that in this instance, reliance on the Betamax decision is “misplaced”.

“The content is different, the activity is different, and the function is different,” Turkewitz told TF.

“The Sony decision must be understood in its context — the time shifting of audiovisual programming being broadcast from point to multipoint. The making available of content by a point-to-point interactive service like YouTube isn’t broadcasting — or at a minimum, is not a form of broadcasting akin to that considered by the Supreme Court in Sony.

“More fundamentally, broadcasting (right of communication to the public) is one of only several rights implicated by the service. And of course, issues of liability will be informed by considerations of purpose, effect and perceived harm. A court’s judgment will also be affected by whether it views the ‘innovation’ as an attempt to circumvent the requirements of law. The decision of the Supreme Court in ABC v. Aereo is certainly instructive in that regard.”

And there are other issues too. While YouTube itself is yet to take any legal action to deter users from downloading rather than merely streaming content, its terms of service are quite specific and seem to cover all eventualities.

“[Y]ou agree not to access Content or any reason other than your personal, non-commercial use solely as intended through and permitted by the normal functionality of the Service, and solely for Streaming,” YouTube’s ToS reads.

“‘Streaming’ means a contemporaneous digital transmission of the material by YouTube via the Internet to a user operated Internet enabled device in such a manner that the data is intended for real-time viewing and not intended to be downloaded (either permanently or temporarily), copied, stored, or redistributed by the user.

“You shall not copy, reproduce, distribute, transmit, broadcast, display, sell, license, or otherwise exploit any Content for any other purposes without the prior written consent of YouTube or the respective licensors of the Content.”

In this respect, it seems that a user doing anything but real-time streaming of YouTube content is breaching YouTube’s terms of service. The big question then, of course, is whether providing a tool specifically for that purpose represents an infringement of copyright.

The people behind Free MP3 Recorder believe that the “scope of application depends entirely on the end users’ intentions” which seems like a fair argument at first view. But, as usual, copyright law is incredibly complex and there are plenty of opposing views.

We asked the BPI, which took action against YouTubeMP3, for its take on this type of tool. The official response was “No comment” which doesn’t really clarify the position, at least for now.

Needless to say, the Betamax decision – relevant or not – doesn’t apply in the UK. But that only adds more parameters into the mix – and perhaps more opportunities for lawyers to make money arguing for and against tools like this in the future.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and more. We also have VPN reviews, discounts, offers and coupons.

User Authentication Best Practices Checklist

Post Syndicated from Bozho original https://techblog.bozho.net/user-authentication-best-practices-checklist/

User authentication is the functionality that every web application shared. We should have perfected that a long time ago, having implemented it so many times. And yet there are so many mistakes made all the time.

Part of the reason for that is that the list of things that can go wrong is long. You can store passwords incorrectly, you can have a vulnerably password reset functionality, you can expose your session to a CSRF attack, your session can be hijacked, etc. So I’ll try to compile a list of best practices regarding user authentication. OWASP top 10 is always something you should read, every year. But that might not be enough.

So, let’s start. I’ll try to be concise, but I’ll include as much of the related pitfalls as I can cover – e.g. what could go wrong with the user session after they login:

  • Store passwords with bcrypt/scrypt/PBKDF2. No MD5 or SHA, as they are not good for password storing. Long salt (per user) is mandatory (the aforementioned algorithms have it built in). If you don’t and someone gets hold of your database, they’ll be able to extract the passwords of all your users. And then try these passwords on other websites.
  • Use HTTPS. Period. (Otherwise user credentials can leak through unprotected networks). Force HTTPS if user opens a plain-text version.
  • Mark cookies as secure. Makes cookie theft harder.
  • Use CSRF protection (e.g. CSRF one-time tokens that are verified with each request). Frameworks have such functionality built-in.
  • Disallow framing (X-Frame-Options: DENY). Otherwise your website may be included in another website in a hidden iframe and “abused” through javascript.
  • Have a same-origin policy
  • Logout – let your users logout by deleting all cookies and invalidating the session. This makes usage of shared computers safer (yes, users should ideally use private browsing sessions, but not all of them are that savvy)
  • Session expiry – don’t have forever-lasting sessions. If the user closes your website, their session should expire after a while. “A while” may still be a big number depending on the service provided. For ajax-heavy website you can have regular ajax-polling that keeps the session alive while the page stays open.
  • Remember me – implementing “remember me” (on this machine) functionality is actually hard due to the risks of a stolen persistent cookie. Spring-security uses this approach, which I think should be followed if you wish to implement more persistent logins.
  • Forgotten password flow – the forgotten password flow should rely on sending a one-time (or expiring) link to the user and asking for a new password when it’s opened. 0Auth explain it in this post and Postmark gives some best pracitces. How the link is formed is a separate discussion and there are several approaches. Store a password-reset token in the user profile table and then send it as parameter in the link. Or do not store anything in the database, but send a few params: userId:expiresTimestamp:hmac(userId+expiresTimestamp). That way you have expiring links (rather than one-time links). The HMAC relies on a secret key, so the links can’t be spoofed. It seems there’s no consensus, as the OWASP guide has a bit different approach
  • One-time login links – this is an option used by Slack, which sends one-time login links instead of asking users for passwords. It relies on the fact that your email is well guarded and you have access to it all the time. If your service is not accessed to often, you can have that approach instead of (rather than in addition to) passwords.
  • Limit login attempts – brute-force through a web UI should not be possible; therefore you should block login attempts if they become too many. One approach is to just block them based on IP. The other one is to block them based on account attempted. (Spring example here). Which one is better – I don’t know. Both can actually be combined. Instead of fully blocking the attempts, you may add a captcha after, say, the 5th attempt. But don’t add the captcha for the first attempt – it is bad user experience.
  • Don’t leak information through error messages – you shouldn’t allow attackers to figure out if an email is registered or not. If an email is not found, upon login report just “Incorrect credentials”. On passwords reset, it may be something like “If your email is registered, you should have received a password reset email”. This is often at odds with usability – people don’t often remember the email they used to register, and the ability to check a number of them before getting in might be important. So this rule is not absolute, though it’s desirable, especially for more critical systems.
  • Make sure you use JWT only if it’s really necessary and be careful of the pitfalls.
  • Consider using a 3rd party authentication – OpenID Connect, OAuth by Google/Facebook/Twitter (but be careful with OAuth flaws as well). There’s an associated risk with relying on a 3rd party identity provider, and you still have to manage cookies, logout, etc., but some of the authentication aspects are simplified.
  • For high-risk or sensitive applications use 2-factor authentication. There’s a caveat with Google Authenticator though – if you lose your phone, you lose your accounts (unless there’s a manual process to restore it). That’s why Authy seems like a good solution for storing 2FA keys.

I’m sure I’m missing something. And you see it’s complicated. Sadly we’re still at the point where the most common functionality – authenticating users – is so tricky and cumbersome, that you almost always get at least some of it wrong.

The post User Authentication Best Practices Checklist appeared first on Bozho's tech blog.

Safety first: a Raspberry Pi safety helmet

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/safety-helmet/

Jennifer Fox is back, this time with a Raspberry Pi Zero–controlled impact force monitor that will notify you if your collision is a worth a trip to the doctor.

Make an Impact Force Monitor!

Check out my latest Hacker in Residence project for SparkFun Electronics: the Helmet Guardian! It’s a Pi Zero powered impact force monitor that turns on an LED if your head/body experiences a potentially dangerous impact. Install in your sports helmets, bicycle, or car to keep track of impact and inform you when it’s time to visit the doctor.

Concussion

We’ve all knocked our heads at least once in our lives, maybe due to tripping over a loose paving slab, or to falling off a bike, or to walking into the corner of the overhead cupboard door for the third time this week — will I ever learn?! More often than not, even when we’re seeing stars, we brush off the accident and continue with our day, oblivious to the long-term damage we may be doing.

Force of impact

After some thorough research, Jennifer Fox, founder of FoxBot Industries, concluded that forces of 4 to 6 G sustained for more than a few seconds are dangerous to the human body. With this in mind, she decided to use a Raspberry Pi Zero W and an accelerometer to create helmet with an impact force monitor that notifies its wearer if this level of G-force has been met.

Jennifer Fox Raspberry Pi Impact Force Monitor

Obviously, if you do have a serious fall, you should always seek medical advice. This project is an example of how affordable technology can be used to create medical and citizen science builds, and not a replacement for professional medical services.

Setting up the impact monitor

Jennifer’s monitor requires only a few pieces of tech: a Zero W, an accelerometer and breakout board, a rechargeable USB battery, and an LED, plus the standard wires and resistors for these components.

After installing Raspbian, Jennifer enabled SSH and I2C on the Zero W to make it run headlessly, and then accessed it from a laptop. This allows her to control the Pi without physically connecting to it, and it makes for a wireless finished project.

Jen wired the Pi to the accelerometer breakout board and LED as shown in the schematic below.

Jennifer Fox Raspberry Pi Impact Force Monitor

The LED acts as a signal of significant impacts, turning on when the G-force threshold is reached, and not turning off again until the program is reset.

Jennifer Fox Raspberry Pi Impact Force Monitor

Make your own and more

Jennifer’s full code for the impact monitor is on GitHub, and she’s put together a complete tutorial on SparkFun’s website.

For more tutorials from Jennifer Fox, such as her ‘Bark Back’ IoT Pet Monitor, be sure to follow her on YouTube. And for similar projects, check out Matt’s smart bike light and Amelia Day’s physical therapy soccer ball.

The post Safety first: a Raspberry Pi safety helmet appeared first on Raspberry Pi.

American Public Television Embraces the Cloud — And the Future

Post Syndicated from Andy Klein original https://www.backblaze.com/blog/american-public-television-embraces-the-cloud-and-the-future/

American Public Television website

American Public Television was like many organizations that have been around for a while. They were entrenched using an older technology — in their case, tape storage and distribution — that once met their needs but was limiting their productivity and preventing them from effectively collaborating with their many media partners. APT’s VP of Technology knew that he needed to move into the future and embrace cloud storage to keep APT ahead of the game.
Since 1961, American Public Television (APT) has been a leading distributor of groundbreaking, high-quality, top-rated programming to the nation’s public television stations. Gerry Field is the Vice President of Technology at APT and is responsible for delivering their extensive program catalog to 350+ public television stations nationwide.

In the time since Gerry  joined APT in 2007, the industry has been in digital overdrive. During that time APT has continued to acquire and distribute the best in public television programming to their technically diverse subscribers.

This created two challenges for Gerry. First, new technology and format proliferation were driving dramatic increases in digital storage. Second, many of APT’s subscribers struggled to keep up with the rapidly changing industry. While some subscribers had state-of-the-art satellite systems to receive programming, others had to wait for the post office to drop off programs recorded on tape weeks earlier. With no slowdown on the horizon of innovation in the industry, Gerry knew that his storage and distribution systems would reach a crossroads in no time at all.

American Public Television logo

Living the tape paradigm

The digital media industry is only a few years removed from its film, and later videotape, roots. Tape was the input and the output of the industry for many years. As a consequence, the tools and workflows used by the industry were built and designed to work with tape. Over time, the “file” slowly replaced the tape as the object to be captured, edited, stored and distributed. Trouble was, many of the systems and more importantly workflows were based on processing tape, and these have proven to be hard to change.

At APT, Gerry realized the limits of the tape paradigm and began looking for technologies and solutions that enabled workflows based on file and object based storage and distribution.

Thinking file based storage and distribution

For data (digital media) storage, APT, like everyone else, started by installing onsite storage servers. As the amount of digital data grew, more storage was added. In addition, APT was expanding its distribution footprint by creating or partnering with distribution channels such as CreateTV and APT Worldwide. This dramatically increased the number of programming formats and the amount of data that had to be stored. As a consequence, updating, maintaining, and managing the APT storage systems was becoming a major challenge and a major resource hog.

APT Online

Knowing that his in-house storage system was only going to cost more time and money, Gerry decided it was time to look at cloud storage. But that wasn’t the only reason he looked at the cloud. While most people consider cloud storage as just a place to back up and archive files, Gerry was envisioning how the ubiquity of the cloud could help solve his distribution challenges. The trouble was the price of cloud storage from vendors like Amazon S3 and Microsoft Azure was a non-starter, especially for a non-profit. Then Gerry came across Backblaze. B2 Cloud Storage service met all of his performance requirements, and at $0.005/GB/month for storage and $0.01/GB for downloads it was nearly 75% less than S3 or Azure.

Gerry did the math and found that he could economically incorporate B2 Cloud Storage into his IT portfolio, using it for both program submission and for active storage and archiving of the APT programs. In addition, B2 now gives him the foundation necessary to receive and distribute programming content over the Internet. This is especially useful for organizations that can’t conveniently access satellite distribution systems. Not to mention downloading from the cloud is much faster than sending a tape through the mail.

Adding B2 Cloud Storage to their infrastructure has helped American Public Television address two key challenges. First, they now have “unlimited” storage in the cloud without having to add any hardware. In addition, with B2, they only pay for the storage they use. That means they don’t have to buy storage upfront trying to match the maximum amount of storage they’ll ever need. Second, by using B2 as a distribution source for their programming APT subscribers, especially the smaller and remote ones, can get content faster and more reliably without having to perform costly upgrades to their infrastructure.

The road ahead

As APT gets used to their file based infrastructure and workflow, there are a number of cost saving and income generating ideas they are pondering which are now worth considering. Here are a few:

Program Submissions — New content can be uploaded from anywhere using a web browser, an Internet connection, and a login. For example, a producer in Cambodia can upload their film to B2. From there the film is downloaded to an in-house system where it is processed and transcoded using compute. The finished film is added to the APT catalog and added to B2. Once there, the program is instantly available for subscribers to order and download.

“The affordability and performance of Backblaze B2 is what allowed us to make the B2 cloud part of the APT data storage and distribution strategy into the future.” — Gerry Field

Easier Previews — At any time, work in process or finished programs can be made available for download from the B2 cloud. One place this could be useful is where a subscriber needs to review a program to comply with local policies and practices before airing. In the old system, each “one-off” was a time consuming manual process.

Instant Subscriptions — There are many organizations such as schools and businesses that want to use just one episode of a desired show. With an e-commerce based website, current or even archived programming kept in B2 could be available to download or stream for a minimal charge.

At APT there were multiple technologies needed to make their file-based infrastructure work, but as Gerry notes, having an affordable, trustworthy, cloud storage service like B2 is one of the critical building blocks needed to make everything work together.

The post American Public Television Embraces the Cloud — And the Future appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Tinkernut’s hidden Coke bottle spy cam

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/tinkernuts-spy-cam/

Go undercover and keep an eye on your stuff with this brilliant secret Coke bottle spy cam from Tinkernut!

Secret Coke Bottle SPY CAM! – Weekend Hacker #1803

SPECIAL NOTE*** THE FULL TUTORIAL WILL BE AVAILABLE NEXT WEEK April Fools! What a terrible day. So many pranks. You can’t believe anything you read. People invading your space. The mental and physical anguish of enduring the day. It’s time to fight back! Let’s catch the perps in action by making a device that always watches.

Keeping tabs

A Raspberry Pi Zero W, a small camera, and a rechargeable Lithium Polymer (LiPo) battery constitute the bulk of this project’s tech. A pair of 3D-printed parts, and gelatine-solidified Coke Zero make up the fake fizzy body.

Tinkernut Coke bottle Raspberry Pi Spy Cam

“So let’s make this video as short as possible and just buy a cheap pre-made spy cam off of Amazon. Just kidding,” Tinkernut jokes in the tutorial video for the project, before going through the step-by-step process of using the Raspberry Pi to “DIY this the right way”.

After accessing the Zero W from his laptop via SSH, Tinkernut opted for using the rpi_camera_surveillance_system Python script written by GitHub user RuiSantosdotme to control the spy cam. Luckily, this meant no additional library setup, and basically no lag on the video feed.

What we want to do is create a script that activates the camera and serves it to a web page so that we can access it from any web browser. There are plenty of different ways to do this (Motion, Raspivid, etc), but I found a simple Python script that does everything I need it to do and doesn’t require any extra software or libraries to install. The best thing about it is that the lag time is practically unnoticeable.

With the code in place, every boot-up of the Raspberry Pi automatically launches both the script and a web page of the live video, allowing for constant monitoring of potential sneaks and thieves.

Tinkernut Coke bottle Raspberry Pi Spy Cam

The projects is powered by a 1500mAh LiPo battery and the Adafruit LiPo charger. It also includes a simple on/off switch, which Tinkernut wired to the charger and the Pi’s PP1 and PP6 connector pads.

Tinkernut Coke bottle Raspberry Pi Spy Cam

Tinkernut decided to use a Coke Zero bottle for the build, incorporating 3D-printed parts to house the Pi, and a mix of Coke and gelatine to create a realistic-looking filling for the bottle. However, the setup can be transferred to pretty much any hollow item in your home, say, a cookie jar or a cracker box. So get creative and get spying!

A complete spy cam how-to

If you’d like to make your own secret spy cam, you can find a tutorial for Tinkernut’s build at hackster.io, or follow along with his video below. Also make sure to subscribe his YouTube channel to be updated on all his newest builds — they’re rather splendid.

BUILD: Coke Bottle SPY CAM! – Tinkernut Workbench

Learn how to take a regular Coke Zero bottle, cram a Raspberry Pi and webcam inside of it, and have it still look like a regular Coke Zero bottle. Why would you want to do this? To spy on those irritating April Fooligans!!!

And if you’re interested in more spy-themed digital making projects, check out our complete 007 how-to guide for links to tutorials such as our Sense HAT puzzle box, Parent detector, and Laser tripwire.

The post Tinkernut’s hidden Coke bottle spy cam appeared first on Raspberry Pi.

New – Encryption of Data in Transit for Amazon EFS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-encryption-of-data-in-transit-for-amazon-efs/

Amazon Elastic File System was designed to be the file system of choice for cloud-native applications that require shared access to file-based storage. We launched EFS in mid-2016 and have added several important features since then including on-premises access via Direct Connect and encryption of data at rest. We have also made EFS available in additional AWS Regions, most recently US West (Northern California). As was the case with EFS itself, these enhancements were made in response to customer feedback, and reflect our desire to serve an ever-widening customer base.

Encryption in Transit
Today we are making EFS even more useful with the addition of support for encryption of data in transit. When used in conjunction with the existing support for encryption of data at rest, you now have the ability to protect your stored files using a defense-in-depth security strategy.

In order to make it easy for you to implement encryption in transit, we are also releasing an EFS mount helper. The helper (available in source code and RPM form) takes care of setting up a TLS tunnel to EFS, and also allows you to mount file systems by ID. The two features are independent; you can use the helper to mount file systems by ID even if you don’t make use of encryption in transit. The helper also supplies a recommended set of default options to the actual mount command.

Setting up Encryption
I start by installing the EFS mount helper on my Amazon Linux instance:

$ sudo yum install -y amazon-efs-utils

Next, I visit the EFS Console and capture the file system ID:

Then I specify the ID (and the TLS option) to mount the file system:

$ sudo mount -t efs fs-92758f7b -o tls /mnt/efs

And that’s it! The encryption is transparent and has an almost negligible impact on data transfer speed.

Available Now
You can start using encryption in transit today in all AWS Regions where EFS is available.

The mount helper is available for Amazon Linux. If you are running another distribution of Linux you will need to clone the GitHub repo and build your own RPM, as described in the README.

Jeff;

Amazon Translate Now Generally Available

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/amazon-translate-now-generally-available/


Today we’re excited to make Amazon Translate generally available. Late last year at AWS re:Invent my colleague Tara Walker wrote about a preview of a new AI service, Amazon Translate. Starting today you can access Amazon Translate in US East (N. Virginia), US East (Ohio), US West (Oregon), and EU (Ireland) with a 2 million character monthly free tier for the first 12 months and $15 per million characters after that. There are a number of new features available in GA: automatic source language inference, Amazon CloudWatch support, and up to 5000 characters in a single TranslateText call. Let’s take a quick look at the service in general availability.

Amazon Translate New Features

Since Tara’s post already covered the basics of the service I want to point out some of the new features of the service released today. Let’s start with a code sample:

import boto3
translate = boto3.client("translate")
resp = translate.translate_text(
    Text="🇫🇷Je suis très excité pour Amazon Traduire🇫🇷",
    SourceLanguageCode="auto",
    TargetLanguageCode="en"
)
print(resp['TranslatedText'])

Since I have specified my source language as auto, Amazon Translate will call Amazon Comprehend on my behalf to determine the source language used in this text. If you couldn’t guess it, we’re writing some French and the output is 🇫🇷I'm very excited about Amazon Translate 🇫🇷. You’ll notice that our emojis are preserved in the output text which is definitely a bonus feature for Millennials like me.

The Translate console is a great way to get started and see some sample response.

Translate is extremely easy to use in AWS Lambda functions which allows you to use it with almost any AWS service. There are a number of examples in the Translate documentation showing how to do everything from translate a web page to a Amazon DynamoDB table. Paired with other ML services like Amazon Comprehend and [transcribe] you can build everything from closed captioning to real-time chat translation to a robust text analysis pipeline for call centers transcriptions and other textual data.

New Languages Coming Soon

Today, Amazon Translate allows you to translate text to or from English, to any of the following languages: Arabic, Chinese (Simplified), French, German, Portuguese, and Spanish. We’ve announced support for additional languages coming soon: Japanese (go JAWSUG), Russian, Italian, Chinese (Traditional), Turkish, and Czech.

Amazon Translate can also be used to increase professional translator efficiency, and reduce costs and turnaround times for their clients. We’ve already partnered with a number of Language Service Providers (LSPs) to offer their customers end-to-end translation services at a lower cost by allowing Amazon Translate to produce a high-quality draft translation that’s then edited by the LSP for a guaranteed human quality result.

I’m excited to see what applications our customers are able to build with high quality machine translation just one API call away.

Randall

Here, have some videos!

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/easter-monday-2018/

Today is Easter Monday and as such, the drawbridge is up at Pi Towers. So while we spend time with familytoo much chocolate…family and chocolate, here are some great Pi-themed videos from members of our community. Enjoy!

Eggies live stream!

Bluebird Birdhouse

Raspberry Pi and NoIR camera installed in roof of Bluebird house with IR LEDs. Currently 5 eggs being incubated.

Doctor Who TARDIS doorbell

Raspberry pi Tardis

Raspberry pi Tardis doorbell

Google AIY with Tech-nic-Allie

Ok Google! AIY Voice Kit MagPi

Allie assembles this Google Home kit, that runs on a Raspberry Pi, then uses the Google Home to test her space knowledge with a little trivia game. Stay tuned at the end to see a few printed cases you can use instead of the cardboard.

Buying a Coke with a Raspberry Pi rover

Buy a coke with raspberry pi rover

Mission date : March 26 2018 My raspberry pi project. I use LTE modem to connect internet. python programming. raspberry pi controls pi cam, 2servo motor, 2dc motor. (This video recoded with gopro to upload youtube. Actually I controll this rover by pi cam.

Raspberry Pi security camera

🔴How to Make a Smart Security Camera With Movement Notification – Under 60$

I built my first security camera with motion-control connected to my raspberry pi with MotionEyeOS. What you need: *Raspberry pi 3 (I prefer pi 3) *Any Webcam or raspberry pi cam *Mirco SD card (min 8gb) Useful links : Download the motioneyeOS software here ➜ https://github.com/ccrisan/motioneyeos/releases How to do it: – Download motioneyeOS to your empty SD card (I mounted it via Etcher ) – I always do a sudo apt-upgrade & sudo apt-update on my projects, in the Pi.

Happy Easter!

The post Here, have some videos! appeared first on Raspberry Pi.

A geometric Rust adventure

Post Syndicated from Eevee original https://eev.ee/blog/2018/03/30/a-geometric-rust-adventure/

Hi. Yes. Sorry. I’ve been trying to write this post for ages, but I’ve also been working on a huge writing project, and apparently I have a very limited amount of writing mana at my disposal. I think this is supposed to be a Patreon reward from January. My bad. I hope it’s super great to make up for the wait!

I recently ported some math code from C++ to Rust in an attempt to do a cool thing with Doom. Here is my story.

The problem

I presented it recently as a conundrum (spoilers: I solved it!), but most of those details are unimportant.

The short version is: I have some shapes. I want to find their intersection.

Really, I want more than that: I want to drop them all on a canvas, intersect everything with everything, and pluck out all the resulting polygons. The input is a set of cookie cutters, and I want to press them all down on the same sheet of dough and figure out what all the resulting contiguous pieces are. And I want to know which cookie cutter(s) each piece came from.

But intersection is a good start.

Example of the goal.  Given two squares that overlap at their corners, I want to find the small overlap piece, plus the two L-shaped pieces left over from each square

I’m carefully referring to the input as shapes rather than polygons, because each one could be a completely arbitrary collection of lines. Obviously there’s not much you can do with shapes that aren’t even closed, but at the very least, I need to handle concavity and multiple disconnected polygons that together are considered a single input.

This is a non-trivial problem with a lot of edge cases, and offhand I don’t know how to solve it robustly. I’m not too eager to go figure it out from scratch, so I went hunting for something I could build from.

(Infuriatingly enough, I can just dump all the shapes out in an SVG file and any SVG viewer can immediately solve the problem, but that doesn’t quite help me. Though I have had a few people suggest I just rasterize the whole damn problem, and after all this, I’m starting to think they may have a point.)

Alas, I couldn’t find a Rust library for doing this. I had a hard time finding any library for doing this that wasn’t a massive fully-featured geometry engine. (I could’ve used that, but I wanted to avoid non-Rust dependencies if possible, since distributing software is already enough of a nightmare.)

A Twitter follower directed me towards a paper that described how to do very nearly what I wanted and nothing else: “A simple algorithm for Boolean operations on polygons” by F. Martínez (2013). Being an academic paper, it’s trapped in paywall hell; sorry about that. (And as I understand it, none of the money you’d pay to get the paper would even go to the authors? Is that right? What a horrible and predatory system for discovering and disseminating knowledge.)

The paper isn’t especially long, but it does describe an awful lot of subtle details and is mostly written in terms of its own reference implementation. Rather than write my own implementation based solely on the paper, I decided to try porting the reference implementation from C++ to Rust.

And so I fell down the rabbit hole.

The basic algorithm

Thankfully, the author has published the sample code on his own website, if you want to follow along. (It’s the bottom link; the same author has, confusingly, published two papers on the same topic with similar titles, four years apart.)

If not, let me describe the algorithm and how the code is generally laid out. The algorithm itself is based on a sweep line, where a vertical line passes across the plane and ✨ does stuff ✨ as it encounters various objects. This implementation has no physical line; instead, it keeps track of which segments from the original polygon would be intersecting the sweep line, which is all we really care about.

A vertical line is passing rightwards over a couple intersecting shapes.  The line current intersects two of the shapes' sides, and these two sides are the "sweep list"

The code is all bundled inside a class with only a single public method, run, because… that’s… more object-oriented, I guess. There are several helper methods, and state is stored in some attributes. A rough outline of run is:

  1. Run through all the line segments in both input polygons. For each one, generate two SweepEvents (one for each endpoint) and add them to a std::deque for storage.

    Add pointers to the two SweepEvents to a std::priority_queue, the event queue. This queue uses a custom comparator to order the events from left to right, so the top element is always the leftmost endpoint.

  2. Loop over the event queue (where an “event” means the sweep line passed over the left or right end of a segment). Encountering a left endpoint means the sweep line is newly touching that segment, so add it to a std::set called the sweep list. An important point is that std::set is ordered, and the sweep list uses a comparator that keeps segments in order vertically.

    Encountering a right endpoint means the sweep line is leaving a segment, so that segment is removed from the sweep list.

  3. When a segment is added to the sweep list, it may have up to two neighbors: the segment above it and the segment below it. Call possibleIntersection to check whether it intersects either of those neighbors. (This is nearly sufficient to find all intersections, which is neat.)

  4. If possibleIntersection detects an intersection, it will split each segment into two pieces then and there. The old segment is shortened in-place to become the left part, and a new segment is created for the right part. The new endpoints at the point of intersection are added to the event queue.

  5. Some bookkeeping is done along the way to track which original polygons each segment is inside, and eventually the segments are reconstructed into new polygons.

Hopefully that’s enough to follow along. It took me an inordinately long time to tease this out. The comments aren’t especially helpful.

1
    std::deque<SweepEvent> eventHolder;    // It holds the events generated during the computation of the boolean operation

Syntax and basic semantics

The first step was to get something that rustc could at least parse, which meant translating C++ syntax to Rust syntax.

This was surprisingly straightforward! C++ classes become Rust structs. (There was no inheritance here, thankfully.) All the method declarations go away. Method implementations only need to be indented and wrapped in impl.

I did encounter some unnecessarily obtuse uses of the ternary operator:

1
(prevprev != sl.begin()) ? --prevprev : prevprev = sl.end();

Rust doesn’t have a ternary — you can use a regular if block as an expression — so I expanded these out.

C++ switch blocks become Rust match blocks, but otherwise function basically the same. Rust’s enums are scoped (hallelujah), so I had to explicitly spell out where enum values came from.

The only really annoying part was changing function signatures; C++ types don’t look much at all like Rust types, save for the use of angle brackets. Rust also doesn’t pass by implicit reference, so I needed to sprinkle a few &s around.

I would’ve had a much harder time here if this code had relied on any remotely esoteric C++ functionality, but thankfully it stuck to pretty vanilla features.

Language conventions

This is a geometry problem, so the sample code unsurprisingly has its own home-grown point type. Rather than port that type to Rust, I opted to use the popular euclid crate. Not only is it code I didn’t have to write, but it already does several things that the C++ code was doing by hand inline, like dot products and cross products. And all I had to do was add one line to Cargo.toml to use it! I have no idea how anyone writes C or C++ without a package manager.

The C++ code used getters, i.e. point.x (). I’m not a huge fan of getters, though I do still appreciate the need for them in lowish-level systems languages where you want to future-proof your API and the language wants to keep a clear distinction between attribute access and method calls. But this is a point, which is nothing more than two of the same numeric type glued together; what possible future logic might you add to an accessor? The euclid authors appear to side with me and leave the coordinates as public fields, so I took great joy in removing all the superfluous parentheses.

Polygons are represented with a Polygon class, which has some number of Contours. A contour is a single contiguous loop. Something you’d usually think of as a polygon would only have one, but a shape with a hole would have two: one for the outside, one for the inside. The weird part of this arrangement was that Polygon implemented nearly the entire STL container interface, then waffled between using it and not using it throughout the rest of the code. Rust lets anything in the same module access non-public fields, so I just skipped all that and used polygon.contours directly. Hell, I think I made contours public.

Finally, the SweepEvent type has a pol field that’s declared as an enum PolygonType (either SUBJECT or CLIPPING, to indicate which of the two inputs it is), but then some other code uses the same field as a numeric index into a polygon’s contours. Boy I sure do love static typing where everything’s a goddamn integer. I wanted to extend the algorithm to work on arbitrarily many input polygons anyway, so I scrapped the enum and this became a usize.


Then I got to all the uses of STL. I have only a passing familiarity with the C++ standard library, and this code actually made modest use of it, which caused some fun days-long misunderstandings.

As mentioned, the SweepEvents are stored in a std::deque, which is never read from. It took me a little thinking to realize that the deque was being used as an arena: it’s the canonical home for the structs so pointers to them can be tossed around freely. (It can’t be a std::vector, because that could reallocate and invalidate all the pointers; std::deque is probably a doubly-linked list, and guarantees no reallocation.)

Rust’s standard library does have a doubly-linked list type, but I knew I’d run into ownership hell here later anyway, so I think I replaced it with a Rust Vec to start with. It won’t compile either way, so whatever. We’ll get back to this in a moment.

The list of segments currently intersecting the sweep line is stored in a std::set. That type is explicitly ordered, which I’m very glad I knew already. Rust has two set types, HashSet and BTreeSet; unsurprisingly, the former is unordered and the latter is ordered. Dropping in BTreeSet and fixing some method names got me 90% of the way there.

Which brought me to the other 90%. See, the C++ code also relies on finding nodes adjacent to the node that was just inserted, via STL iterators.

1
2
3
next = prev = se->posSL = it = sl.insert(se).first;
(prev != sl.begin()) ? --prev : prev = sl.end();
++next;

I freely admit I’m bad at C++, but this seems like something that could’ve used… I don’t know, 1 comment. Or variable names more than two letters long. What it actually does is:

  1. Add the current sweep event (se) to the sweep list (sl), which returns a pair whose first element is an iterator pointing at the just-inserted event.

  2. Copies that iterator to several other variables, including prev and next.

  3. If the event was inserted at the beginning of the sweep list, set prev to the sweep list’s end iterator, which in C++ is a legal-but-invalid iterator meaning “the space after the end” or something. This is checked for in later code, to see if there is a previous event to look at. Otherwise, decrement prev, so it’s now pointing at the event immediately before the inserted one.

  4. Increment next normally. If the inserted event is last, then this will bump next to the end iterator anyway.

In other words, I need to get the previous and next elements from a BTreeSet. Rust does have bidirectional iterators, which BTreeSet supports… but BTreeSet::insert only returns a bool telling me whether or not anything was inserted, not the position. I came up with this:

1
2
3
let mut maybe_below = active_segments.range(..segment).last().map(|v| *v);
let mut maybe_above = active_segments.range(segment..).next().map(|v| *v);
active_segments.insert(segment);

The range method returns an iterator over a subset of the tree. The .. syntax makes a range (where the right endpoint is exclusive), so ..segment finds the part of the tree before the new segment, and segment.. finds the part of the tree after it. (The latter would start with the segment itself, except I haven’t inserted it yet, so it’s not actually there.)

Then the standard next() and last() methods on bidirectional iterators find me the element I actually want. But the iterator might be empty, so they both return an Option. Also, iterators tend to return references to their contents, but in this case the contents are already references, and I don’t want a double reference, so the map call dereferences one layer — but only if the Option contains a value. Phew!

This is slightly less efficient than the C++ code, since it has to look up where segment goes three times rather than just one. I might be able to get it down to two with some more clever finagling of the iterator, but microsopic performance considerations were a low priority here.

Finally, the event queue uses a std::priority_queue to keep events in a desired order and efficiently pop the next one off the top.

Except priority queues act like heaps, where the greatest (i.e., last) item is made accessible.

Sorting out sorting

C++ comparison functions return true to indicate that the first argument is less than the second argument. Sweep events occur from left to right. You generally implement sorts so that the first thing comes, erm, first.

But sweep events go in a priority queue, and priority queues surface the last item, not the first. This C++ code handled this minor wrinkle by implementing its comparison backwards.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
struct SweepEventComp : public std::binary_function<SweepEvent, SweepEvent, bool> { // for sorting sweep events
// Compare two sweep events
// Return true means that e1 is placed at the event queue after e2, i.e,, e1 is processed by the algorithm after e2
bool operator() (const SweepEvent* e1, const SweepEvent* e2)
{
    if (e1->point.x () > e2->point.x ()) // Different x-coordinate
        return true;
    if (e2->point.x () > e1->point.x ()) // Different x-coordinate
        return false;
    if (e1->point.y () != e2->point.y ()) // Different points, but same x-coordinate. The event with lower y-coordinate is processed first
        return e1->point.y () > e2->point.y ();
    if (e1->left != e2->left) // Same point, but one is a left endpoint and the other a right endpoint. The right endpoint is processed first
        return e1->left;
    // Same point, both events are left endpoints or both are right endpoints.
    if (signedArea (e1->point, e1->otherEvent->point, e2->otherEvent->point) != 0) // not collinear
        return e1->above (e2->otherEvent->point); // the event associate to the bottom segment is processed first
    return e1->pol > e2->pol;
}
};

Maybe it’s just me, but I had a hell of a time just figuring out what problem this was even trying to solve. I still have to reread it several times whenever I look at it, to make sure I’m getting the right things backwards.

Making this even more ridiculous is that there’s a second implementation of this same sort, with the same name, in another file — and that one’s implemented forwards. And doesn’t use a tiebreaker. I don’t entirely understand how this even compiles, but it does!

I painstakingly translated this forwards to Rust. Unlike the STL, Rust doesn’t take custom comparators for its containers, so I had to implement ordering on the types themselves (which makes sense, anyway). I wrapped everything in the priority queue in a Reverse, which does what it sounds like.

I’m fairly pleased with Rust’s ordering model. Most of the work is done in Ord, a trait with a cmp() method returning an Ordering (one of Less, Equal, and Greater). No magic numbers, no need to implement all six ordering methods! It’s incredible. Ordering even has some handy methods on it, so the usual case of “order by this, then by this” can be written as:

1
2
return self.point().x.cmp(&other.point().x)
    .then(self.point().y.cmp(&other.point().y));

Well. Just kidding! It’s not quite that easy. You see, the points here are composed of floats, and floats have the fun property that not all of them are comparable. Specifically, NaN is not less than, greater than, or equal to anything else, including itself. So IEEE 754 float ordering cannot be expressed with Ord. Unless you want to just make up an answer for NaN, but Rust doesn’t tend to do that.

Rust’s float types thus implement the weaker PartialOrd, whose method returns an Option<Ordering> instead. That makes the above example slightly uglier:

1
2
return self.point().x.partial_cmp(&other.point().x).unwrap()
    .then(self.point().y.partial_cmp(&other.point().y).unwrap())

Also, since I use unwrap() here, this code will panic and take the whole program down if the points are infinite or NaN. Don’t do that.

This caused some minor inconveniences in other places; for example, the general-purpose cmp::min() doesn’t work on floats, because it requires an Ord-erable type. Thankfully there’s a f64::min(), which handles a NaN by returning the other argument.

(Cool story: for the longest time I had this code using f32s. I’m used to translating int to “32 bits”, and apparently that instinct kicked in for floats as well, even floats spelled double.)

The only other sorting adventure was this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
// Due to overlapping edges the resultEvents array can be not wholly sorted
bool sorted = false;
while (!sorted) {
    sorted = true;
    for (unsigned int i = 0; i < resultEvents.size (); ++i) {
        if (i + 1 < resultEvents.size () && sec (resultEvents[i], resultEvents[i+1])) {
            std::swap (resultEvents[i], resultEvents[i+1]);
            sorted = false;
        }
    }
}

(I originally misread this comment as saying “the array cannot be wholly sorted” and had no idea why that would be the case, or why the author would then immediately attempt to bubble sort it.)

I’m still not sure why this uses an ad-hoc sort instead of std::sort. But I’m used to taking for granted that general-purpose sorting implementations are tuned to work well for almost-sorted data, like Python’s. Maybe C++ is untrustworthy here, for some reason. I replaced it with a call to .sort() and all seemed fine.

Phew! We’re getting there. Finally, my code appears to type-check.

But now I see storm clouds gathering on the horizon.

Ownership hell

I have a problem. I somehow run into this problem every single time I use Rust. The solutions are never especially satisfying, and all the hacks I might use if forced to write C++ turn out to be unsound, which is even more annoying because rustc is just sitting there with this smug “I told you so expression” and—

The problem is ownership, which Rust is fundamentally built on. Any given value must have exactly one owner, and Rust must be able to statically convince itself that:

  1. No reference to a value outlives that value.
  2. If a mutable reference to a value exists, no other references to that value exist at the same time.

This is the core of Rust. It guarantees at compile time that you cannot lose pointers to allocated memory, you cannot double-free, you cannot have dangling pointers.

It also completely thwarts a lot of approaches you might be inclined to take if you come from managed languages (where who cares, the GC will take care of it) or C++ (where you just throw pointers everywhere and hope for the best apparently).

For example, pointer loops are impossible. Rust’s understanding of ownership and lifetimes is hierarchical, and it simply cannot express loops. (Rust’s own doubly-linked list type uses raw pointers and unsafe code under the hood, where “unsafe” is an escape hatch for the usual ownership rules. Since I only recently realized that pointers to the inside of a mutable Vec are a bad idea, I figure I should probably not be writing unsafe code myself.)

This throws a few wrenches in the works.

Problem the first: pointer loops

I immediately ran into trouble with the SweepEvent struct itself. A SweepEvent pulls double duty: it represents one endpoint of a segment, but each left endpoint also handles bookkeeping for the segment itself — which means that most of the fields on a right endpoint are unused. Also, and more importantly, each SweepEvent has a pointer to the corresponding SweepEvent at the other end of the same segment. So a pair of SweepEvents point to each other.

Rust frowns upon this. In retrospect, I think I could’ve kept it working, but I also think I’m wrong about that.

My first step was to wrench SweepEvent apart. I moved all of the segment-stuff (which is virtually all of it) into a single SweepSegment type, and then populated the event queue with a SweepEndpoint tuple struct, similar to:

1
2
3
4
5
6
enum SegmentEnd {
    Left,
    Right,
}

struct SweepEndpoint<'a>(&'a SweepSegment, SegmentEnd);

This makes SweepEndpoint essentially a tuple with a name. The 'a is a lifetime and says, more or less, that a SweepEndpoint cannot outlive the SweepSegment it references. Makes sense.

Problem solved! I no longer have mutually referential pointers. But I do still have pointers (well, references), and they have to point to something.

Problem the second: where’s all the data

Which brings me to the problem I always run into with Rust. I have a bucket of things, and I need to refer to some of them multiple times.

I tried half a dozen different approaches here and don’t clearly remember all of them, but I think my core problem went as follows. I translated the C++ class to a Rust struct with some methods hanging off of it. A simplified version might look like this.

1
2
3
4
struct Algorithm {
    arena: LinkedList<SweepSegment>,
    event_queue: BinaryHeap<SweepEndpoint>,
}

Ah, hang on — SweepEndpoint needs to be annotated with a lifetime, so Rust can enforce that those endpoints don’t live longer than the segments they refer to. No problem?

1
2
3
4
struct Algorithm<'a> {
    arena: LinkedList<SweepSegment>,
    event_queue: BinaryHeap<SweepEndpoint<'a>>,
}

Okay! Now for some methods.

1
2
3
4
5
6
7
8
fn run(&mut self) {
    self.arena.push_back(SweepSegment{ data: 5 });
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Left));
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Right));
    for event in &self.event_queue {
        println!("{:?}", event)
    }
}

Aaand… this doesn’t work. Rust “cannot infer an appropriate lifetime for autoref due to conflicting requirements”. The trouble is that self.arena.back() takes a reference to self.arena, and then I put that reference in the event queue. But I promised that everything in the event queue has lifetime 'a, and I don’t actually know how long self lives here; I only know that it can’t outlive 'a, because that would invalidate the references it holds.

A little random guessing let me to change &mut self to &'a mut self — which is fine because the entire impl block this lives in is already parameterized by 'a — and that makes this compile! Hooray! I think that’s because I’m saying self itself has exactly the same lifetime as the references it holds onto, which is true, since it’s referring to itself.

Let’s get a little more ambitious and try having two segments.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
fn run(&'a mut self) {
    self.arena.push_back(SweepSegment{ data: 5 });
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Left));
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Right));
    self.arena.push_back(SweepSegment{ data: 17 });
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Left));
    self.event_queue.push(SweepEndpoint(self.arena.back().unwrap(), SegmentEnd::Right));
    for event in &self.event_queue {
        println!("{:?}", event)
    }
}

Whoops! Rust complains that I’m trying to mutate self.arena while other stuff is referring to it. And, yes, that’s true — I have references to it in the event queue, and Rust is preventing me from potentially deleting everything from the queue when references to it still exist. I’m not actually deleting anything here, of course (though I could be if this were a Vec!), but Rust’s type system can’t encode that (and I dread the thought of a type system that can).

I struggled with this for a while, and rapidly encountered another complete showstopper:

1
2
3
4
5
6
fn run(&'a mut self) {
    self.mutate_something();
    self.mutate_something();
}

fn mutate_something(&'a mut self) {}

Rust objects that I’m trying to borrow self mutably, twice — once for the first call, once for the second.

But why? A borrow is supposed to end automatically once it’s no longer used, right? Maybe if I throw some braces around it for scope… nope, that doesn’t help either.

It’s true that borrows usually end automatically, but here I have explicitly told Rust that mutate_something() should borrow with the lifetime 'a, which is the same as the lifetime in run(). So the first call explicitly borrows self for at least the rest of the method. Removing the lifetime from mutate_something() does fix this error, but if that method tries to add new segments, I’m back to the original problem.

Oh no. The mutation in the C++ code is several calls deep. Porting it directly seems nearly impossible.

The typical solution here — at least, the first thing people suggest to me on Twitter — is to wrap basically everything everywhere in Rc<RefCell<T>>, which gives you something that’s reference-counted (avoiding questions of ownership) and defers borrow checks until runtime (avoiding questions of mutable borrows). But that seems pretty heavy-handed here — not only does RefCell add .borrow() noise anywhere you actually want to interact with the underlying value, but do I really need to refcount these tiny structs that only hold a handful of floats each?

I set out to find a middle ground.

Solution, kind of

I really, really didn’t want to perform serious surgery on this code just to get it to build. I still didn’t know if it worked at all, and now I had to rearrange it without being able to check if I was breaking it further. (This isn’t Rust’s fault; it’s a natural problem with porting between fairly different paradigms.)

So I kind of hacked it into working with minimal changes, producing a grotesque abomination which I’m ashamed to link to. Here’s how!

First, I got rid of the class. It turns out this makes lifetime juggling much easier right off the bat. I’m pretty sure Rust considers everything in a struct to be destroyed simultaneously (though in practice it guarantees it’ll destroy fields in order), which doesn’t leave much wiggle room. Locals within a function, on the other hand, can each have their own distinct lifetimes, which solves the problem of expressing that the borrows won’t outlive the arena.

Speaking of the arena, I solved the mutability problem there by switching to… an arena! The typed-arena crate (a port of a type used within Rust itself, I think) is an allocator — you give it a value, and it gives you back a reference, and the reference is guaranteed to be valid for as long as the arena exists. The method that does this is sneaky and takes &self rather than &mut self, so Rust doesn’t know you’re mutating the arena and won’t complain. (One drawback is that the arena will never free anything you give to it, but that’s not a big problem here.)


My next problem was with mutation. The main loop repeatedly calls possibleIntersection with pairs of segments, which can split either or both segment. Rust definitely doesn’t like that — I’d have to pass in two &muts, both of which are mutable references into the same arena, and I’d have a bunch of immutable references into that arena in the sweep list and elsewhere. This isn’t going to fly.

This is kind of a shame, and is one place where Rust seems a little overzealous. Something like this seems like it ought to be perfectly valid:

1
2
3
4
let mut v = vec![1u32, 2u32];
let a = &mut v[0];
let b = &mut v[1];
// do stuff with a, b

The trouble is, Rust only knows the type signature, which here is something like index_mut(&'a mut self, index: usize) -> &'a T. Nothing about that says that you’re borrowing distinct elements rather than some core part of the type — and, in fact, the above code is only safe because you’re borrowing distinct elements. In the general case, Rust can’t possibly know that. It seems obvious enough from the different indexes, but nothing about the type system even says that different indexes have to return different values. And what if one were borrowed as &mut v[1] and the other were borrowed with v.iter_mut().next().unwrap()?

Anyway, this is exactly where people start to turn to RefCell — if you’re very sure you know better than Rust, then a RefCell will skirt the borrow checker while still enforcing at runtime that you don’t have more than one mutable borrow at a time.

But half the lines in this algorithm examine the endpoints of a segment! I don’t want to wrap the whole thing in a RefCell, or I’ll have to say this everywhere:

1
if segment1.borrow().point.x < segment2.borrow().point.x { ... }

Gross.

But wait — this code only mutates the points themselves in one place. When a segment is split, the original segment becomes the left half, and a new segment is created to be the right half. There’s no compelling need for this; it saves an allocation for the left half, but it’s not critical to the algorithm.

Thus, I settled on a compromise. My segment type now looks like this:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
struct SegmentPacket {
    // a bunch of flags and whatnot used in the algorithm
}
struct SweepSegment {
    left_point: MapPoint,
    right_point: MapPoint,
    faces_outwards: bool,
    index: usize,
    order: usize,
    packet: RefCell<SegmentPacket>,
}

I do still need to call .borrow() or .borrow_mut() to get at the stuff in the “packet”, but that’s far less common, so there’s less noise overall. And I don’t need to wrap it in Rc because it’s part of a type that’s allocated in the arena and passed around only via references.


This still leaves me with the problem of how to actually perform the splits.

I’m not especially happy with what I came up with, I don’t know if I can defend it, and I suspect I could do much better. I changed possibleIntersection so that rather than performing splits, it returns the points at which each segment needs splitting, in the form (usize, Option<MapPoint>, Option<MapPoint>). (The usize is used as a flag for calling code and oughta be an enum, but, isn’t yet.)

Now the top-level function is responsible for all arena management, and all is well.

Except, er. possibleIntersection is called multiple times, and I don’t want to copy-paste a dozen lines of split code after each call. I tried putting just that code in its own function, which had the world’s most godawful signature, and that didn’t work because… uh… hm. I can’t remember why, exactly! Should’ve written that down.

I tried a local closure next, but closures capture their environment by reference, so now I had references to a bunch of locals for as long as the closure existed, which meant I couldn’t mutate those locals. Argh. (This seems a little silly to me, since the closure’s references cannot possibly be used for anything if the closure isn’t being called, but maybe I’m missing something. Or maybe this is just a limitation of lifetimes.)

Increasingly desperate, I tried using a macro. But… macros are hygienic, which means that any new name you use inside a macro is different from any name outside that macro. The macro thus could not see any of my locals. Usually that’s good, but here I explicitly wanted the macro to mess with my locals.

I was just about to give up and go live as a hermit in a cabin in the woods, when I discovered something quite incredible. You can define local macros! If you define a macro inside a function, then it can see any locals defined earlier in that function. Perfect!

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
macro_rules! _split_segment (
    ($seg:expr, $pt:expr) => (
        {
            let pt = $pt;
            let seg = $seg;
            // ... waaay too much code ...
        }
    );
);

loop {
    // ...
    // This is possibleIntersection, renamed because Rust rightfully complains about camelCase
    let cross = handle_intersections(Some(segment), maybe_above);
    if let Some(pt) = cross.1 {
        segment = _split_segment!(segment, pt);
    }
    if let Some(pt) = cross.2 {
        maybe_above = Some(_split_segment!(maybe_above.unwrap(), pt));
    }
    // ...
}

(This doesn’t actually quite match the original algorithm, which has one case where a segment can be split twice. I realized that I could just do the left-most split, and a later iteration would perform the other split. I sure hope that’s right, anyway.)

It’s a bit ugly, and I ran into a whole lot of implicit behavior from the C++ code that I had to fix — for example, the segment is sometimes mutated just before it’s split, purely as a shortcut for mutating the left part of the split. But it finally compiles! And runs! And kinda worked, a bit!

Aftermath

I still had a lot of work to do.

For one, this code was designed for intersecting two shapes, not mass-intersecting a big pile of shapes. The basic algorithm doesn’t care about how many polygons you start with — all it sees is segments — but the code for constructing the return value needed some heavy modification.

The biggest change by far? The original code traced each segment once, expecting the result to be only a single shape. I had to change that to trace each side of each segment once, since the vast bulk of the output consists of shapes which share a side. This violated a few assumptions, which I had to hack around.

I also ran into a couple very bad edge cases, spent ages debugging them, then found out that the original algorithm had a subtle workaround that I’d commented out because it was awkward to port but didn’t seem to do anything. Whoops!

The worst was a precision error, where a vertical line could be split on a point not quite actually on the line, which wreaked all kinds of havoc. I worked around that with some tasteful rounding, which is highly dubious but makes the output more appealing to my squishy human brain. (I might switch to the original workaround, but I really dislike that even simple cases can spit out points at 1500.0000000000003. The whole thing is parameterized over the coordinate type, so maybe I could throw a rational type in there and cross my fingers?)

All that done, I finally, finally, after a couple months of intermittent progress, got what I wanted!

This is Doom 2’s MAP01. The black area to the left of center is where the player starts. Gray areas indicate where the player can walk from there, with lighter shades indicating more distant areas, where “distance” is measured by the minimum number of line crossings. Red areas can’t be reached at all.

(Note: large playable chunks of the map, including the exit room, are red. That’s because those areas are behind doors, and this code doesn’t understand doors yet.)

(Also note: The big crescent in the lower-right is also black because I was lazy and looked for the player’s starting sector by checking the bbox, and that sector’s bbox happens to match.)

The code that generated this had to go out of its way to delete all the unreachable zones around solid walls. I think I could modify the algorithm to do that on the fly pretty easily, which would probably speed it up a bit too. Downside is that the algorithm would then be pretty specifically tied to this problem, and not usable for any other kind of polygon intersection, which I would think could come up elsewhere? The modifications would be pretty minor, though, so maybe I could confine them to a closure or something.

Some final observations

It runs surprisingly slowly. Like, multiple seconds. Unless I add --release, which speeds it up by a factor of… some number with multiple digits. Wahoo. Debug mode has a high price, especially with a lot of calls in play.

The current state of this code is on GitHub. Please don’t look at it. I’m very sorry.

Honestly, most of my anguish came not from Rust, but from the original code relying on lots of fairly subtle behavior without bothering to explain what it was doing or even hint that anything unusual was going on. God, I hate C++.

I don’t know if the Rust community can learn from this. I don’t know if I even learned from this. Let’s all just quietly forget about it.

Now I just need to figure this one out…