Tag Archives: keys

UK Copyright Trolls Cite Hopeless Case to Make People Pay Up

Post Syndicated from Andy original https://torrentfreak.com/uk-copyright-trolls-cite-hopeless-case-to-make-people-pay-up-170916/

Our coverage of Golden Eye International dates back more than five years. Much like similar companies in the copyright troll niche, the outfit monitors BitTorrent swarms, collects IP addresses, and then heads off to court to obtain alleged pirates’ identities.

From there it sends letters threatening legal action, unless recipients pay a ‘fine’ of hundreds of pounds to settle an alleged porn piracy case. While some people pay up, others refuse to do so on the basis they are innocent, the ISP bill payer, or simply to have their day in court. Needless to say, a full-on court battle on the merits is never on the agenda.

Having gone quiet for an extended period of time, it was assumed that Golden Eye had outlived its usefulness as a ‘fine’ collection outfit. Just lately, however, there are signs that the company is having another go at reviving old cases against people who previously refused to pay.

A post on Slyck forums, which runs a support thread for people targeted by trolls, reveals the strategy.

“I dealt with these Monkeys last year. I spent 5 weeks practically arguing with them. They claim they have to prove it based on the balance of probability’s [sic]. I argue that they actually have to prove it was me,” ‘Matt’ wrote in August.

“It wasn’t me, and despite giving them reasonable doubt it wasn’t me. (I’m Gay… why would I be downloading straight porn?) They still persuaded it, trying to dismiss anything that cast any doubt on their claim. The emails finished how I figured they would…. They were going to send court documentation. It never arrived.”

After months of silence, at the end of August this year ‘Matt’ says Golden Eye got in touch again, suggesting that the conclusion of another copyright case might encourage him to cough up. He says that Golden Eye contacted him saying that someone had settled out of court with TCYK, another copyright troll, for £1,000.

“My thoughts…Idiots and doubt it,” ‘Matt’ said. “Honestly, I almost cried I thought I had got rid of these trolls and they are back for round two.”

This wasn’t an isolated case. Another recipient of a Golden Eye threat also revealed getting contacted by the company, also with fresh pressure to pay.

“You may be interested to know that a solicitor, acting on behalf of Robert Kemble in a claim similar to ours but brought by TCYK LLC, entered into an agreement to settle the court case by paying £1,000,” Golden Eye told the individual.

“In view of the agreement reached in the Kemble case, we would invite you to reconsider your position as to whether you would like to reach settlement with us. We would point out, that, despite the terms of settlement in the Kemble case, we remain prepared to stand by our original offer of settlement with you, that is payment of £500.00.”

For someone who had last corresponded with Golden Eye in January, after repeated denials, new contact from the company would be worrying. It certainly affected this person negatively.

“I am now at a loss and don’t know what more I can do. I do not want to settle this, but also I cannot afford a solicitor. Any further advice would be gratefully appreciated as [i’m] now having panic attacks,” the person wrote.

Having cited the Robert Kemble case, one might think that Golden Eye would be good enough to explain the full situation. They didn’t – so let’s help them a little in that respect, so that their targets can make an informed decision.

Robert Kemble was a customer of Sky Broadband. TCYK, in conjunction with UK-based Hatton and Berkeley, sent a letter to Kemble in July 2015 asking him to pay a ‘fine’ for alleged Internet piracy of the Robert Redford movie The Company You Keep, way back in April 2013.

So far, so ordinary – but here’s the big deal.

Unlike the people being re-targeted by Golden Eye this time around, Kemble admitted in writing that infringement had been going on via his account.

In a response, Kemble told TCYK that he was shocked to receive their letter but after speaking to people in his household, had discovered that a child had been downloading films. He didn’t say that the Redford film was among them but he apologized to the companies all the same. Clearly, that wasn’t going to be enough.

In August 2015, TCYK wrote back to Kemble, effectively holding him responsible for other people’s actions while demanding a settlement of £600, to be paid to a third-party company, Ranger Bay Limited.

“The child who is responsible for the infringement should sign the undertakings in our letter to you. Please when replying specify clearly on the undertakings the child’s full name and age,” the company later wrote. Nice.

What took place next was a round of letter tennis between Kemble’s solicitor and those acting for TCYK, with the latter insisting that Kemble had already admitted infringement (or authorizing the same) and demanding around £2000 to settle the case at this later stage.

With no settlement forthcoming, TCYK demanded £5,000 in the small claims court.

“The Defendant has admitted that his internet address has been used to infringe the Claimant’s copyright whereby, through the Defendant’s licencees’ use of the Defendant’s internet address, he acquired the Work and then communicated the Work in a digital form via the internet to the public without the license or consent of the Claimant,” the TCYK claim form reads.

TorrentFreak understands that the court process that followed didn’t center on the merits of the infringement case, but procedural matters over how the case was handled. On this front, Kemble failed in his efforts to have the case – which was heard almost a year ago – decided in his favor.

Now, according to Golden Eye at least, Kemble has settled with TCYK for £1000, which is just £300 more than their final pre-court offer. Hardly sounds like good value for money.

The main point, though, is that this case wouldn’t have gotten anywhere near a court if Kemble hadn’t admitted liability of sorts in the early stages. This is a freak case in all respects and has no bearing on anyone’s individual case, especially those who haven’t admitted liability.

So, for people getting re-hounded by Golden Eye now, remember the Golden Rule. If you’re innocent, by all means tell them, and stick to your guns. But tell them anything else on top of that at your peril, or risk having it used against you.

Source: TF, for the latest info on copyright, file-sharing, torrent sites and ANONYMOUS VPN services.

Self-Driving Cars Should Be Open Source

Post Syndicated from Bozho original https://techblog.bozho.net/self-driving-cars-open-source/

Self-driving cars are (or will be) the pinnacle of consumer product automation – robot vacuum cleaners, smart fridges and TVs are just toys compared to self-driving cars, both in terms of technology and in terms of impact. We aren’t yet at level 5 self-driving cars, but they are around the corner.

But as software engineers we know how fragile software is. And self-driving cars are basically software, so we can see all the risks involved with putting our lives in the hands of anonymous (from our point of view) developers and unknown (to us) processes and quality standards. One may argue that this has been the case for every consumer product ever, but with software it’s different – software is way more complex than anything else.

So I have an outrageous proposal – self-driving cars should be open source. We have to be able to verify and trust the code that’s navigating our helpless bodies around the highways. Not only that, but we have to be able to verify if it is indeed that code that is currently running in our car, and not something else.

In fact, let me extend that – all cars should be open source. Before you say “but that will ruin the competitive advantage of manufacturers and will be deadly for business”, I don’t actually care how they trained their neural networks, or what their datasets are. That’s actually the secret sauce of the self-driving car and in my view it can remain proprietary and closed. What I’d like to see open-sourced is everything else. (Under what license – I’d be fine to even have it copyrighted and so not “real” open source, but that’s a separate discussion).

Why? This story about remote carjacking using the entertainment system of a Jeep is a scary example. Attackers that reverse engineer the car software can remotely control everything in the car. Why did that happen? Well, I guess it’s complicated and we have to watch the DEFCON talk.

And also read the paper, but a paragraph in wikipedia about the CAN bus used in most cars gives us a hint:

CAN is a low-level protocol and does not support any security features intrinsically. There is also no encryption in standard CAN implementations, which leaves these networks open to man-in-the-middle packet interception. In most implementations, applications are expected to deploy their own security mechanisms; e.g., to authenticate incoming commands or the presence of certain devices on the network. Failure to implement adequate security measures may result in various sorts of attacks if the opponent manages to insert messages on the bus. While passwords exist for some safety-critical functions, such as modifying firmware, programming keys, or controlling antilock brake actuators, these systems are not implemented universally and have a limited number of seed/key pair

I don’t know in what world it makes sense to even have a link between the entertainment system and the low-level network that operates the physical controls. As apparent from the talk, the two systems are supposed to be air-gapped, but in reality they aren’t.

Rookie mistakes abounded – an unauthenticated “execute” method, code running as root, unsigned firmware, hard-coded passwords, etc. How do we know that there aren’t tons of those in all cars out there right now, and in the self-driving cars of the future (which will likely use the same legacy technologies as current cars)? Recently I heard a negative comment about the source code of one of the self-driving car “players”, and I’m pretty sure there are many of those rookie mistakes in it.

Why is this even more risky for self-driving cars? I’m not an expert in car programming, but it seems like the attack surface is bigger. I might be completely off target here, but on a typical car you’d have to “just” properly isolate the CAN bus. With self-driving cars, the autonomous system that watches the surroundings and decides what to do next has to be connected to the CAN bus. With Tesla being able to send updates over the wire, the attack surface is even bigger (although that’s actually a good feature – being able to patch all cars immediately once a vulnerability is discovered).

Of course, one approach would be to introduce legislation that regulates car software. It might work, but it would rely on governments to do proper testing, which won’t always be the case.

The alternative is to open-source it and let all the white-hats find your issues, so that you can close them before the car hits the road. Not only that, but consumers like me will feel safer, and geeks would be able to verify whether the car is really running the software it claims to run by verifying the fingerprints.

Richard Stallman might be seen as a fanatic when he advocates against closed source software, but in cases like … cars, his concerns seem less extreme.

“But the Jeep vulnerability was fixed”, you may say. And that might be seen as being the way things are – vulnerabilities appear, they get fixed, life goes on. No person was injured because of the bug, right? Well, not yet. And “gaining control” is the extreme scenario – there are still pretty bad scenarios, like being able to track a car through its GPS, or cause panic by controlling the entertainment system. It might be over wifi, or over GPRS, or even by physically messing with the car by inserting a flash drive. Is open source immune to those issues? No, but it has proven to be more resilient.

One industry where the problem of proprietary software on a product the customer has bought has already surfaced is … tractors. It turns out farmers are hacking their tractors because of multiple issues and the inability of the vendor to resolve them in a timely manner. This is likely to happen to cars soon, when only authorized repair shops are allowed to touch anything on the car. And with unauthorized repair shops, the attack surface becomes even bigger.

In fact, I’d prefer open source not just for cars, but for all consumer products. The source code of a smart fridge or a security camera is trivial; opening it up would rarely mean sacrificing a competitive advantage. But refrigerators get hacked, security cameras are an active part of botnets, and the “internet of shit” is getting ubiquitous. A huge number of these issues are dumb, beginner mistakes. We have the right to know what shit we are running – in our fridges, DVRs and ultimately – cars.

Your fridge may soon be spying on you, and your vacuum cleaner may threaten your pet in a demand for “ransom”. The terrorists of the future may crash planes without being armed, crash vans into crowds without being in the van, and “explode” home equipment without being in the particular home. And that’s not just a hypothetical.

Will open source magically solve the issue? No. But it will definitely make things better and safer, as it has done with operating systems and web servers.

The post Self-Driving Cars Should Be Open Source appeared first on Bozho's tech blog.

Automate Your IT Operations Using AWS Step Functions and Amazon CloudWatch Events

Post Syndicated from Andy Katz original https://aws.amazon.com/blogs/compute/automate-your-it-operations-using-aws-step-functions-and-amazon-cloudwatch-events/


Rob Percival, Associate Solutions Architect

Are you interested in reducing the operational overhead of your AWS Cloud infrastructure? One way to achieve this is to automate the response to operational events for resources in your AWS account.

Amazon CloudWatch Events provides a near real-time stream of system events that describe the changes and notifications for your AWS resources. From this stream, you can create rules to route specific events to AWS Step Functions, AWS Lambda, and other AWS services for further processing and automated actions.

In this post, learn how you can use Step Functions to orchestrate serverless IT automation workflows in response to CloudWatch events sourced from AWS Health, a service that monitors and generates events for your AWS resources. As a real-world example, I show automating the response to a scenario where an IAM user access key has been exposed.

Serverless workflows with Step Functions and Lambda

Step Functions makes it easy to develop and orchestrate components of operational response automation using visual workflows. Building automation workflows from individual Lambda functions that perform discrete tasks lets you develop, test, and modify the components of your workflow quickly and seamlessly. As serverless services, Step Functions and Lambda also provide the benefits of more productive development, reduced operational overhead, and no costs incurred outside of when the workflows are actively executing.

Example workflow

As an example, this post focuses on automating the response to an event generated by AWS Health when an IAM access key has been publicly exposed on GitHub. This is a diagram of the automation workflow:

AWS proactively monitors popular code repository sites for IAM access keys that have been publicly exposed. Upon detection of an exposed IAM access key, AWS Health generates an AWS_RISK_CREDENTIALS_EXPOSED event in the AWS account related to the exposed key. A configured CloudWatch Events rule detects this event and invokes a Step Functions state machine. The state machine then orchestrates the automated workflow that deletes the exposed IAM access key, summarizes the recent API activity for the exposed key, and sends the summary message to an Amazon SNS topic to notify the subscribers―in that order.
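As a rough sketch of how that rule could be wired up programmatically (the rule name, state machine ARN, and IAM role ARN below are placeholders, not values from the CloudFormation template), the event pattern simply matches the AWS Health event shown later in this post:

import json
import boto3

events = boto3.client("events")

# Hypothetical names and ARNs, for illustration only.
RULE_NAME = "ExposedKeyHealthEventRule"
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:ExposedKeyResponse"
EVENTS_ROLE_ARN = "arn:aws:iam::123456789012:role/CloudWatchEventsInvokeStepFunctions"

# Match AWS Health events for exposed IAM access keys.
event_pattern = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        "service": ["RISK"],
        "eventTypeCode": ["AWS_RISK_CREDENTIALS_EXPOSED"],
    },
}

events.put_rule(Name=RULE_NAME, EventPattern=json.dumps(event_pattern), State="ENABLED")

# Route matching events to the Step Functions state machine.
events.put_targets(
    Rule=RULE_NAME,
    Targets=[{"Id": "StepFunctionsTarget", "Arn": STATE_MACHINE_ARN, "RoleArn": EVENTS_ROLE_ARN}],
)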

The corresponding Step Functions state machine diagram of this automation workflow can be seen below:

While this particular example focuses on IT automation workflows in response to the AWS_RISK_CREDENTIALS_EXPOSED event sourced from AWS Health, the pattern can be generalized to integrate with other AWS Health events, with other event-generating AWS services, and can even run on a time-based schedule.

Walkthrough

To follow along, use the code and resources found in the aws-health-tools GitHub repo. The code and resources include an AWS CloudFormation template, in addition to instructions on how to use it.

Launch Stack into N. Virginia with CloudFormation

The Step Functions state machine execution starts with the exposed keys event details in JSON, a sanitized example of which is provided below:

{
    "version": "0",
    "id": "121345678-1234-1234-1234-123456789012",
    "detail-type": "AWS Health Event",
    "source": "aws.health",
    "account": "123456789012",
    "time": "2016-06-05T06:27:57Z",
    "region": "us-east-1",
    "resources": [],
    "detail": {
        "eventArn": "arn:aws:health:us-east-1::event/AWS_RISK_CREDENTIALS_EXPOSED_XXXXXXXXXXXXXXXXX",
        "service": "RISK",
        "eventTypeCode": "AWS_RISK_CREDENTIALS_EXPOSED",
        "eventTypeCategory": "issue",
        "startTime": "Sat, 05 Jun 2016 15:10:09 GMT",
        "eventDescription": [
            {
                "language": "en_US",
                "latestDescription": "A description of the event is provided here"
            }
        ],
        "affectedEntities": [
            {
                "entityValue": "ACCESS_KEY_ID_HERE"
            }
        ]
    }
}

After it’s invoked, the state machine execution proceeds as follows.

Step 1: Delete the exposed IAM access key pair

The first thing you want to do when you determine that an IAM access key has been exposed is to delete the key pair so that it can no longer be used to make API calls. This Step Functions task state deletes the exposed access key pair detailed in the incoming event, and retrieves the IAM user associated with the key to look up API activity for the user in the next step. The user name, access key, and other details about the event are passed to the next step as JSON.
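A minimal sketch of what this task’s Lambda function might look like in Python with boto3 (the handler name, event handling, and output fields here are assumptions for illustration, not the exact code from the aws-health-tools repo):

import boto3

iam = boto3.client("iam")

def handler(event, context):
    # The exposed access key ID arrives in the AWS Health event detail.
    access_key_id = event["detail"]["affectedEntities"][0]["entityValue"]

    # Look up which IAM user owns the key, then delete the key pair.
    user_name = iam.get_access_key_last_used(AccessKeyId=access_key_id)["UserName"]
    iam.delete_access_key(UserName=user_name, AccessKeyId=access_key_id)

    # Pass details to the next state as JSON.
    return {"username": user_name, "deleted_key": access_key_id, "event": event}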

This state contains a powerful error-handling feature offered by Step Functions task states called a catch configuration. Catch configurations allow you to reroute and continue state machine invocation at new states depending on potential errors that occur in your task function. In this case, the catch configuration skips to Step 3. It immediately notifies your security team that errors were raised in the task function of this step (Step 1), when attempting to look up the corresponding IAM user for a key or delete the user’s access key.

Note: Step Functions also offers a retry configuration for when you would rather retry a task function that failed due to error, with the option to specify an increasing time interval between attempts and a maximum number of attempts.
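In Amazon States Language, catch and retry behavior is declared on the task state itself. The fragment below, expressed as a Python dict with placeholder state and function names, shows the general shape of such a configuration; it is an illustration, not the state machine definition from the CloudFormation template:

# Illustrative fragment of a state machine definition (placeholder names).
delete_key_state = {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:123456789012:function:DeleteAccessKeyPair",
    "Next": "SummarizeApiActivity",
    "Retry": [
        {
            "ErrorEquals": ["Lambda.ServiceException"],
            "IntervalSeconds": 2,
            "BackoffRate": 2.0,
            "MaxAttempts": 3,
        }
    ],
    "Catch": [
        # On any unhandled error, skip ahead and notify the security team.
        {"ErrorEquals": ["States.ALL"], "Next": "NotifySecurity"}
    ],
}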

Step 2: Summarize recent API activity for key

After you have deleted the access key pair, you’ll want to have some immediate insight into whether it was used for malicious activity in your account. Another task state, this step uses AWS CloudTrail to look up and summarize the most recent API activity for the IAM user associated with the exposed key. The summary is in the form of counts for each API call made and resource type and name affected. This summary information is then passed to the next step as JSON. This step requires information that you obtained in Step 1. Step Functions ensures the successful completion of Step 1 before moving to Step 2.
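One way to implement that lookup is with CloudTrail’s LookupEvents API. The sketch below is a simplified stand-in for the actual task function, counting recent calls by API name only:

from collections import Counter
import boto3

cloudtrail = boto3.client("cloudtrail")

def summarize_activity(user_name):
    # Count recent API calls made by the IAM user associated with the exposed key.
    counts = Counter()
    paginator = cloudtrail.get_paginator("lookup_events")
    pages = paginator.paginate(
        LookupAttributes=[{"AttributeKey": "Username", "AttributeValue": user_name}]
    )
    for page in pages:
        for record in page["Events"]:
            counts[record["EventName"]] += 1
    return dict(counts)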

Step 3: Notify security

The summary information gathered in the last step can provide immediate insight into any malicious activity on your account made by the exposed key. To determine this and further secure your account if necessary, you must notify your security team with the gathered summary information.

This final task state generates an email message providing in-depth detail about the event using the API activity summary, and publishes the message to an SNS topic subscribed to by the members of your security team.
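A hedged sketch of that final task, assuming a pre-existing SNS topic ARN and a summary dict produced by the previous step:

import json
import boto3

sns = boto3.client("sns")

# Hypothetical topic ARN, for illustration only.
SECURITY_TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:SecurityNotifications"

def notify_security(summary):
    # Publish the API activity summary to the security team's SNS topic.
    sns.publish(
        TopicArn=SECURITY_TOPIC_ARN,
        Subject="Exposed IAM access key deleted",
        Message=json.dumps(summary, indent=2),
    )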

If the catch configuration of the task state in Step 1 was triggered, then the security notification email instead directs your security team to log in to the console and navigate to the Personal Health Dashboard to view more details on the incident.

Lessons learned

When implementing this use case with Step Functions and Lambda, consider the following:

  • One of the most important parts of implementing automation in response to operational events is to ensure that visibility into the response and resolution actions is retained. Step Functions and Lambda enable you to orchestrate granular response and resolution actions, which provides direct visibility into the state of the automation workflow.
  • This basic workflow currently executes these steps serially with a catch configuration for error handling. More sophisticated workflows can leverage the parallel execution, branching logic, and time delay functionality provided by Step Functions.
  • Catch and retry configurations for task states allow for orchestrating reliable workflows while maintaining the granularity of each Lambda function. Without leveraging a catch configuration in Step 1, you would have had to duplicate code from the function in Step 3 to ensure that your security team was notified on failure to delete the access key.
  • Step Functions and Lambda are serverless services, so there is no cost for these services when they are not running. Because this IT automation workflow only runs when an IAM access key is exposed for this account (which is hopefully rare!), the total monthly cost for this workflow is essentially $0.

Conclusion

Automating the response to operational events for resources in your AWS account can free up the valuable time of your engineers. Step Functions and Lambda enable granular IT automation workflows to achieve this result while gaining direct visibility into the orchestration and state of the automation.

For more examples of how to use Step Functions to automate the operations of your AWS resources, or if you’d like to see how Step Functions can be used to build and orchestrate serverless applications, visit Getting Started on the Step Functions website.

ShadowBrokers Releases NSA UNITEDRAKE Manual

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/09/shadowbrokers_r.html

The ShadowBrokers released the manual for UNITEDRAKE, a sophisticated NSA Trojan that targets Windows machines:

Able to compromise Windows PCs running on XP, Windows Server 2003 and 2008, Vista, Windows 7 SP 1 and below, as well as Windows 8 and Windows Server 2012, the attack tool acts as a service to capture information.

UNITEDRAKE, described as a “fully extensible remote collection system designed for Windows targets,” also gives operators the opportunity to take complete control of a device.

The malware’s modules — including FOGGYBOTTOM and GROK — can perform tasks including listening in and monitoring communication, capturing keystrokes and both webcam and microphone usage, the impersonation of users, stealing diagnostics information and self-destructing once tasks are completed.

More news.

UNITEDRAKE was mentioned in several Snowden documents and also in the TAO catalog of implants.

And Kaspersky Labs has found evidence of these tools in the wild, associated with the Equation Group — generally assumed to be the NSA:

The capabilities of several tools in the catalog identified by the codenames UNITEDRAKE, STRAITBAZZARE, VALIDATOR and SLICKERVICAR appear to match the tools Kaspersky found. These codenames don’t appear in the components from the Equation Group, but Kaspersky did find “UR” in EquationDrug, suggesting a possible connection to UNITEDRAKE (United Rake). Kaspersky also found other codenames in the components that aren’t in the NSA catalog but share the same naming conventions; they include SKYHOOKCHOW, STEALTHFIGHTER, DRINKPARSLEY, STRAITACID, LUTEUSOBSTOS, STRAITSHOOTER, and DESERTWINTER.

ShadowBrokers has only released the UNITEDRAKE manual, not the tool itself. Presumably they’re trying to sell that.

timeShift(GrafanaBuzz, 1w) Issue 11

Post Syndicated from Blogs on Grafana Labs Blog original https://grafana.com/blog/2017/09/01/timeshiftgrafanabuzz-1w-issue-11/

September is here and summer is officially drawing to a close, but the Grafana team has stayed busy. We’re prepping for an upcoming Grafana 4.5 release, have some new and updated plugins to share, and would like to thank two contributors for fixing a non-obvious bug. Also – the CFP for GrafanaCon EU is open, and we’d like you to speak!


GrafanaCon EU CFP is Open

Have a big idea to share? Have a shorter talk or a demo you’d like to show off?
We’re looking for 40-minute detailed talks, 20-minute general talks and 10-minute lightning talks. We have a perfect slot for any type of content.

I’d Like to Speak at GrafanaCon

Grafana Labs is Hiring!

Do you believe in open source software? Build the future with us, and ship code.

Check out our open positions

From the Blogosphere

Zabbix, Grafana and Python, a Match Made in Heaven: David’s article, published earlier this year, hits on some great points about open source software and how you don’t have to spend much (or any) money to get valuable monitoring for your infrastructure.

The Business of Democratizing Metrics: Our friends over at Packet stopped by the office recently to sit down and chat with the Grafana Labs co-founders. They discussed how Grafana started, how monitoring has evolved, and democratizing metrics.

Visualizing CloudWatch with Grafana: Yuzo put together an article outlining his first experience adding a CloudWatch data source in Grafana, importing his first dashboard, then comparing the graphs between Grafana and CloudWatch.

Monitoring Linux performance with Grafana: Jim wanted to monitor his CentOS home router to get network traffic and disk usage stats, but wanted to try something different than his previous cacti monitoring. This walkthrough shows how he set things up to collect, store and visualize the data.

Visualizing Jenkins Pipeline Results in Grafana: Piotr provides a walkthrough of his setup and configuration to view Jenkins build results for his continuous delivery environment in Grafana.


Grafana Plugins

This week we’ve added a plugin for the new time series database Sidewinder, and updates to the Carpet Plot graph panel. If you haven’t installed a plugin, it’s easy. For on-premises installations, the Grafana-cli will do the work for you. If you’re using Hosted Grafana, you can install any plugin with one click.

NEW PLUGIN

Sidewinder Data Source – This is a data source plugin for the new Sidewinder database. Sidewinder is an open source, fast time series database designed for real-time analytics. It can be used for a variety of use cases that need storage of metrics data like APM and IoT.

Install Now

UPDATED PLUGIN

Carpet Plot Panel – This plugin received an update, which includes the following features and fixes:

  • New aggregate functions: Min, Max, First, Last
  • Possibility to invert color scheme
  • Possibility to change X axis label format
  • Possibility to hide X and Y axis labels

Update Now


This week’s MVC (Most Valuable Contributor)

This week we want to thank two contributors who worked together to fix a non-obvious bug in the new MySQL data source (a bug with sorting values in the legend).

robinsonjj
Thank you Joe, for tackling this issue and submitting a PR with an initial fix.

pdoan017
pdoan017 took robinsonjj’s contribution and added a new PR to retain the order in which keys are added.

Thank you both for taking the time to troubleshoot and fix the issue. Much appreciated!


Tweet of the Week

We scour Twitter each week to find an interesting/beautiful dashboard and show it off! #monitoringLove

Nice! Combining different panel types on a dashboard can add more context to your data – Looks like a very functional dashboard.


What do you think?

Let us know how we’re doing! Submit a comment on this article below, or post something at our community forum. Help us make these roundups better and better!

Follow us on Twitter, like us on Facebook, and join the Grafana Labs community.

Netflix develops Morse code search option

Post Syndicated from Alex Bate original https://www.raspberrypi.org/blog/netflix-morse-code/

What happens when Netflix gives its staff two days to hack the platform and create innovative (and often unnecessary) variations on the streaming service?

This. This is what happens.

Hack Day Summer 2017 Teleflix

Uploaded by NetflixOpenSource on 2017-08-28.

Netflix Hack Day

Twice a year, the wonderful team at Netflix is given two days to go nuts and create fun, random builds, taking inspiration from Netflix and its content. So far they’ve debuted a downgraded version of the streaming platform played on an original Nintendo Entertainment System (NES), turned hit show Narcos into a video game, and incorporated VR technology into many more builds that, while they’ll never be made public, have no doubt led to some lightbulb moments for the creative teams involved.

DarNES – Netflix Hack Day – Winter 2015

In a world… where devices proliferate… darNES digs back in time to provide Netflix access to the original Nintendo Entertainment System.

Kevin Spacey? More like ‘Kevin Spacebar’, am I right? Aha…ha…haaaa…I’ll get my coat.

Teleflix

The Teleflix build from this summer’s Hack Day is obviously the best one yet, as it uses a Raspberry Pi. By writing code that decodes the dots and dashes from an original 1920s telegraph (provided by AT&T, and lovingly restored by the team using ketchup!) into keystrokes, they’re able to search for their favourite shows via Morse code.

Netflix Morse Code

Morse code, for the unaware, is a method for transmitting letters and numbers via a standardised series of beeps, clicks, or flashes. Stuck in a sticky situation? Three dots followed by three dashes and a further three dots gives you ‘SOS’. Sorted. So long as there’s someone there to see or hear it, who also understands Morse Code.

Morse Code

Morse code was a method of transmitting textual information as a series of on-off tones that could be directly understood by a skilled listener. Mooo-Theme: http://soundcloud.com/mooojvm/mooo-theme

So if you’d like to watch, for example, The Unbreakable Kimmy Schmidt, you simply send: - .... . / ..- -. -... .-. . .- -.- .- -... .-.. . / -.- .. -- -- -.-- / ... -.-. .... -- .. -.. - and you’re set. Easy!
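If you want to generate these yourself, a few lines of Python and a lookup table will do it (a quick illustration, not the decoder the Netflix team wrote):

MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..", "8": "---..",
}

def to_morse(text):
    # One space between letters, " / " between words.
    words = text.upper().split()
    return " / ".join(" ".join(MORSE[ch] for ch in word if ch in MORSE) for word in words)

print(to_morse("The Unbreakable Kimmy Schmidt"))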

To reach Netflix, the team used a Playstation 4. However, if you want to skip a tech step, you could stream Netflix directly to your Raspberry Pi by following this relatively new tutorial. Nobody at Pi Towers has tried it out yet, but if you have we’d be interested to see how you got on in the comments below.

And if you’d like to play around a little more with the Raspberry Pi and Morse code, you can pick up your own Morse code key, or build one using conductive components such as buttons or bananas, and try it out for yourself.

Alex’s Netflix-themed Morse code quiz

Just for fun, here are the titles of some of my favourite shows to watch on Netflix, translated into Morse code. Using the key below, why not take a break and challenge yourself to translate them back into English? Reward yourself +10 imaginary House Points for each correct answer.

Netflix Morse Code

  1. -.. --- -.-. - --- .-. / .-- .... ---
  2. .... .- -. -. .. -... .- .-..
  3. - .... . / --- .-
  4. ... . -. ... . ---..
  5. .--- . ... ... .. -.-. .- / .--- --- -. . ...
  6. --. .. .-.. -- --- .-. . / --. .. .-. .-.. ...
  7. --. .-.. --- .--

The post Netflix develops Morse code search option appeared first on Raspberry Pi.

GnuPG 2.2.0 released

Post Syndicated from corbet original https://lwn.net/Articles/732135/rss

Version 2.2.0 of the GNU Privacy Guard is out; this is the beginning of a
new long-term stable series. Changes in this release are mostly minor, but
it does now install as gpg rather than gpg2, and it will
automatically fetch keys from keyservers by default. “Note: this enables keyserver and Web Key Directory operators to
notice when you intend to encrypt to a mail address without having
the key locally. This new behaviour will eventually make key
discovery much easier and mostly automatic.”

New AWS DevOps Blog Post: How to Help Secure Your Code in a Cross-Region/Cross-Account Deployment Solution on AWS

Post Syndicated from Craig Liebendorfer original https://aws.amazon.com/blogs/security/new-aws-devops-blog-post-how-to-help-secure-your-code-in-a-cross-regioncross-account-deployment-solution/

Security image

You can help to protect your data in a number of ways while it is in transit and at rest, such as by using Secure Sockets Layer (SSL) or client-side encryption. AWS Key Management Service (AWS KMS) is a managed service that makes it easy for you to create, control, rotate, and use your encryption keys. AWS KMS allows you to create custom keys, which you can share with AWS Identity and Access Management users and roles in your AWS account or in an AWS account owned by someone else.
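As a rough illustration of that KMS workflow (not the code from the linked post), creating a customer-managed key and using it for a small encrypt/decrypt round trip with boto3 might look like this:

import boto3

kms = boto3.client("kms")

# Create a customer-managed key that can later be shared via its key policy.
key_id = kms.create_key(Description="Cross-account deployment artifact key")["KeyMetadata"]["KeyId"]

# Encrypt and decrypt a small payload directly with the key.
ciphertext = kms.encrypt(KeyId=key_id, Plaintext=b"build-artifact-checksum")["CiphertextBlob"]
plaintext = kms.decrypt(CiphertextBlob=ciphertext)["Plaintext"]
assert plaintext == b"build-artifact-checksum"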

In a new AWS DevOps Blog post, BK Chaurasiya describes a solution for building a cross-region/cross-account code deployment solution on AWS. BK explains options for helping to protect your source code as it travels between regions and between AWS accounts.

For more information, see the full AWS DevOps Blog post.

– Craig

From Data Lake to Data Warehouse: Enhancing Customer 360 with Amazon Redshift Spectrum

Post Syndicated from Dylan Tong original https://aws.amazon.com/blogs/big-data/from-data-lake-to-data-warehouse-enhancing-customer-360-with-amazon-redshift-spectrum/

Achieving a 360-degree view of your customer has become increasingly challenging as companies embrace omni-channel strategies, engaging customers across websites, mobile, call centers, social media, physical sites, and beyond. The promise of a web where online and physical worlds blend makes understanding your customers more challenging, but also more important. Businesses that are successful in this medium have a significant competitive advantage.

The big data challenge requires the management of data at high velocity and volume. Many customers have identified Amazon S3 as a great data lake solution that removes the complexities of managing a highly durable, fault tolerant data lake infrastructure at scale and economically.

AWS data services substantially lessen the heavy lifting of adopting technologies, allowing you to spend more time on what matters most—gaining a better understanding of customers to elevate your business. In this post, I show how a recent Amazon Redshift innovation, Redshift Spectrum, can enhance a customer 360 initiative.

Customer 360 solution

A successful customer 360 view benefits from using a variety of technologies to deliver different forms of insights. These could range from real-time analysis of streaming data from wearable devices and mobile interactions to historical analysis that requires interactive, on demand queries on billions of transactions. In some cases, insights can only be inferred through AI via deep learning. Finally, the value of your customer data and insights can’t be fully realized until it is operationalized at scale—readily accessible by fleets of applications. Companies are leveraging AWS for the breadth of services that cover these domains, to drive their data strategy.

A number of AWS customers stream data from various sources into an S3 data lake through Amazon Kinesis. They use Kinesis and technologies in the Hadoop ecosystem like Spark running on Amazon EMR to enrich this data. High-value data is loaded into an Amazon Redshift data warehouse, which allows users to analyze and interact with data through a choice of client tools. Redshift Spectrum expands on this analytics platform by enabling Amazon Redshift to blend and analyze data beyond the data warehouse and across a data lake.

The following diagram illustrates the workflow for such a solution.

This solution delivers value by:

  • Reducing complexity and time to value for deeper insights. For instance, an existing data model in Amazon Redshift may provide insights across dimensions such as customer, geography, time, and product on metrics from sales and financial systems. Down the road, you may gain access to streaming data sources like customer-care call logs and website activity that you want to blend in with the sales data on the same dimensions to understand how web and call center experiences may be correlated with sales performance. Redshift Spectrum can join these dimensions in Amazon Redshift with data in S3 to allow you to quickly gain new insights, and avoid the slow and more expensive alternative of fully integrating these sources with your data warehouse.
  • Providing an additional avenue for optimizing costs and performance. In cases like call logs and clickstream data where volumes could be many TBs to PBs, storing the data exclusively in S3 yields significant cost savings. Interactive analysis on massive datasets may now be economically viable in cases where data was previously analyzed periodically through static reports generated by inexpensive batch processes. In some cases, you can improve the user experience while simultaneously lowering costs. Spectrum is powered by a large-scale infrastructure external to your Amazon Redshift cluster, and excels at scanning and aggregating large volumes of data. For instance, your analysts may be performing data discovery on customer interactions across millions of consumers over years of data across various channels. On this large dataset, certain queries could be slow if you didn’t have a large Amazon Redshift cluster. Alternatively, you could use Redshift Spectrum to achieve a better user experience with a smaller cluster.

Proof of concept walkthrough

To make evaluation easier for you, I’ve conducted a Redshift Spectrum proof-of-concept (PoC) for the customer 360 use case. For those who want to replicate the PoC, the instructions, AWS CloudFormation templates, and public data sets are available in the GitHub repository.

The remainder of this post is a journey through the project, observing best practices in action, and learning how you can achieve business value. The walkthrough involves:

  • An analysis of performance data from the PoC environment involving queries that demonstrate blending and analysis of data across Amazon Redshift and S3. Observe that great results are achievable at scale.
  • Guidance by example on query tuning, design, and data preparation to illustrate the optimization process. This includes tuning a query that combines clickstream data in S3 with customer and time dimensions in Amazon Redshift, and aggregates ~1.9 B out of 3.7 B+ records in under 10 seconds with a small cluster!
  • Guidance and measurements to help you decide between two options: accessing and analyzing data exclusively in Amazon Redshift, or using Redshift Spectrum to access data left in S3.

Stream ingestion and enrichment

The focus of this post isn’t stream ingestion and enrichment on Kinesis and EMR, but be mindful of performance best practices on S3 to ensure good streaming and query performance:

  • Use random object keys: The data files provided for this project are prefixed with SHA-256 hashes to prevent hot partitions. This is important to ensure optimal request rates to support PUT requests from the incoming stream, in addition to certain queries from large Amazon Redshift clusters that could send a large number of parallel GET requests.
  • Micro-batch your data stream: S3 isn’t optimized for small random write workloads. Your datasets should be micro-batched into large files. For instance, the “parquet-1” dataset provided batches >7 million records per file. The optimal file size for Redshift Spectrum is usually in the 100 MB to 1 GB range.

If you have an edge case that may pose scalability challenges, AWS would love to hear about it. For further guidance, talk to your solutions architect.

Environment

The project consists of the following environment:

  • Amazon Redshift cluster: 4 X dc1.large
  • Data:
    • Time and customer dimension tables are stored on all Amazon Redshift nodes (ALL distribution style):
      • The data originates from the DWDATE and CUSTOMER tables in the Star Schema Benchmark
      • The customer table contains attributes for 3 million customers.
      • The time data is at the day-level granularity, and spans 7 years, from the start of 1992 to the end of 1998.
    • The clickstream data is stored in an S3 bucket, and serves as a fact table.
      • Various copies of this dataset in CSV and Parquet format have been provided, for reasons to be discussed later.
      • The data is a modified version of the uservisits dataset from AMPLab’s Big Data Benchmark, which was generated by Intel’s Hadoop benchmark tools.
      • Changes were minimal, so that existing test harnesses for this test can be adapted:
        • Increased the 751,754,869-row dataset 5X to 3,758,774,345 rows.
        • Added surrogate keys to support joins with customer and time dimensions. These keys were distributed evenly across the entire dataset to represent user visits from six customers over seven years.
        • Values for the visitDate column were replaced to align with the 7-year timeframe, and the added time surrogate key.

Queries across the data lake and data warehouse 

Imagine a scenario where a business analyst plans to analyze clickstream metrics like ad revenue over time and by customer, market segment and more. The example below is a query that achieves this effect: 

The query part highlighted in red retrieves clickstream data in S3, and joins the data with the time and customer dimension tables in Amazon Redshift through the part highlighted in blue. The query returns the total ad revenue for three customers over the last three months, along with info on their respective market segment.

Unfortunately, this query takes around three minutes to run, and doesn’t enable the interactive experience that you want. However, there’s a number of performance optimizations that you can implement to achieve the desired performance.

Performance analysis

Two key utilities provide visibility into Redshift Spectrum:

  • EXPLAIN
    Provides the query execution plan, which includes info around what processing is pushed down to Redshift Spectrum. Steps in the plan that include the prefix S3 are executed on Redshift Spectrum. For instance, the plan for the previous query has the step “S3 Seq Scan clickstream.uservisits_csv10”, indicating that Redshift Spectrum performs a scan on S3 as part of the query execution.
  • SVL_S3QUERY_SUMMARY
    Statistics for Redshift Spectrum queries are stored in this table. While the execution plan presents cost estimates, this table stores actual statistics for past query runs.

You can get the statistics of your last query by inspecting the SVL_S3QUERY_SUMMARY table with the condition (query = pg_last_query_id()). Inspecting the previous query reveals that the entire dataset of nearly 3.8 billion rows was scanned to retrieve less than 66.3 million rows. Improving scan selectivity in your query could yield substantial performance improvements.
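As a sketch, you can pull those statistics from the same session immediately after running a Spectrum query. The connection details and example query below are placeholders, and any PostgreSQL-protocol driver such as psycopg2 works against Amazon Redshift:

import psycopg2

# Placeholder connection details for the Amazon Redshift cluster.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="analyst", password="example-password",
)
cur = conn.cursor()

# Run a Redshift Spectrum query first (any query against the external table).
cur.execute("SELECT COUNT(*) FROM clickstream.uservisits_csv10 WHERE customer = 1")
cur.fetchall()

# Then inspect what Redshift Spectrum scanned and returned for that query.
cur.execute("""
    SELECT s3_scanned_rows, s3query_returned_rows, elapsed
    FROM svl_s3query_summary
    WHERE query = pg_last_query_id()
""")
print(cur.fetchall())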

Partitioning

Partitioning is a key means to improving scan efficiency. In your environment, the data and tables have already been organized, and configured to support partitions. For more information, see the PoC project setup instructions. The clickstream table was defined as:

CREATE EXTERNAL TABLE clickstream.uservisits_csv10
…
PARTITIONED BY(customer int4, visitYearMonth int4)

The entire 3.8 billion-row dataset is organized as a collection of large files where each file contains data exclusive to a particular customer and month in a year. This allows you to partition your data into logical subsets by customer and year/month. With partitions, the query engine can target a subset of files:

  • Only for specific customers
  • Only data for specific months
  • A combination of specific customers and year/months

You can use partitions in your queries. Instead of joining your customer data on the surrogate customer key (that is, c.c_custkey = uv.custKey), the partition key “customer” should be used instead:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ORDER BY c.c_name, c.c_mktsegment, uv.yearMonthKey  ASC

This query should run approximately twice as fast as the previous query. If you look at the statistics for this query in SVL_S3QUERY_SUMMARY, you see that only half the dataset was scanned. This is expected because your query is on three out of six customers on an evenly distributed dataset. However, the scan is still inefficient, and you can benefit from using your year/month partition key as well:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, SUM(uv.adRevenue)
…
ON c.c_custkey = uv.customer
…
ON uv.visitYearMonth = t.d_yearmonthnum
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

All joins between the tables are now using partitions. Upon reviewing the statistics for this query, you should observe that Redshift Spectrum scans and returns the exact number of rows, 66,270,117. If you run this query a few times, you should see execution time in the range of 8 seconds, which is a 22.5X improvement on your original query!

Predicate pushdown and storage optimizations 

Previously, I mentioned that Redshift Spectrum performs processing through large-scale infrastructure external to your Amazon Redshift cluster. It is optimized for performing large scans and aggregations on S3. In fact, Redshift Spectrum may even out-perform a medium size Amazon Redshift cluster on these types of workloads with the proper optimizations. There are two important variables to consider for optimizing large scans and aggregations:

  • File size and count. As a general rule, use files 100 MB-1 GB in size, as Redshift Spectrum and S3 are optimized for reading this object size. However, the number of files a query operates on is directly correlated with the parallelism achievable by that query. There is an inverse relationship between file size and count: the bigger the files, the fewer files there are for the same dataset. Consequently, there is a trade-off between optimizing for object read performance and the amount of parallelism achievable on a particular query. Large files are best for large scans, as such a query likely operates on a sufficiently large number of files. For queries that are more selective and that operate on fewer files, you may find that smaller files allow for more parallelism.
  • Data format. Redshift Spectrum supports various data formats. Columnar formats like Parquet can sometimes lead to substantial performance benefits by providing compression and more efficient I/O for certain workloads. Generally, format types like Parquet should be used for query workloads involving large scans, and high attribute selectivity. Again, there are trade-offs as formats like Parquet require more compute power to process than plaintext. For queries on smaller subsets of data, the I/O efficiency benefit of Parquet is diminished. At some point, Parquet may perform the same or slower than plaintext. Latency, compression rates, and the trade-off between user experience and cost should drive your decision.

To help illustrate how Redshift Spectrum performs on these large aggregation workloads, run a basic query that aggregates the entire ~3.7 billion record dataset on Redshift Spectrum, and compared that with running the query exclusively on Amazon Redshift:

SELECT uv.custKey, COUNT(uv.custKey)
FROM <your clickstream table> as uv
GROUP BY uv.custKey
ORDER BY uv.custKey ASC

For the Amazon Redshift test case, the clickstream data is loaded and distributed evenly across all nodes (even distribution style), with optimal column compression encodings prescribed by Amazon Redshift’s ANALYZE command.

The Redshift Spectrum test case uses a Parquet data format with each file containing all the data for a particular customer in a month. This results in files mostly in the range of 220-280 MB, and in effect, is the largest file size for this partitioning scheme. If you run tests with the other datasets provided, you see that this data format and size is optimal and out-performs others by ~60X. 

Performance differences will vary depending on the scenario. The important takeaway is to understand the testing strategy and the workload characteristics where Redshift Spectrum is likely to yield performance benefits. 

The following chart compares the query execution time for the two scenarios. The results indicate that you would have to pay for 12 X DC1.Large nodes to get performance comparable to using a small Amazon Redshift cluster that leverages Redshift Spectrum. 

Chart showing simple aggregation on ~3.7 billion records

So you’ve validated that Spectrum excels at performing large aggregations. Could you benefit by pushing more work down to Redshift Spectrum in your original query? It turns out that you can, by making the following modification:

The clickstream data is stored at a day-level granularity for each customer, while your query rolls up the data to the month level per customer. In the earlier query that uses the year/month partition key, you optimized the query so that it only scans and retrieves the data required, but the day-level data is still sent back to your Amazon Redshift cluster for joining and aggregation. The query shown here pushes aggregation work down to Redshift Spectrum, as indicated by the query plan:

In this query, Redshift Spectrum aggregates the clickstream data to the month level before it is returned to the Amazon Redshift cluster and joined with the dimension tables. This query should complete in about 4 seconds, which is roughly twice as fast as only using the partition key. The speed increase is evident upon reviewing the SVL_S3QUERY_SUMMARY table:

  • Bytes scanned is 21.6X less because of the Parquet data format.
  • Only 90 records are returned back to the Amazon Redshift cluster as a result of the push-down, instead of ~66.2 million, leading to substantially less join overhead, and about 530 MB less data sent back to your cluster.
  • No adverse change in average parallelism.

Assessing the value of Amazon Redshift vs. Redshift Spectrum

At this point, you might be asking yourself, why would I ever not use Redshift Spectrum? Well, you still get additional value for your money by loading data into Amazon Redshift, and querying in Amazon Redshift vs. querying S3.

In fact, it turns out that the last version of our query runs even faster when executed exclusively in native Amazon Redshift, as shown in the following chart:

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 3 months of data

As a general rule, queries that aren’t dominated by I/O and that involve multiple joins are better optimized in native Amazon Redshift. For instance, the performance difference between running the partition key query entirely in Amazon Redshift versus with Redshift Spectrum is twice as large as that of the pushdown aggregation query, partly because the former case benefits more from better join performance.

Furthermore, the variability in latency in native Amazon Redshift is lower. For use cases where you have tight performance SLAs on queries, you may want to consider using Amazon Redshift exclusively to support those queries.

On the other hand, when you perform large scans, you could benefit from the best of both worlds: higher performance at lower cost. For instance, imagine that you wanted to enable your business analysts to interactively discover insights across a vast amount of historical data. In the example below, the pushdown aggregation query is modified to analyze seven years of data instead of three months:

SELECT c.c_name, c.c_mktsegment, t.prettyMonthYear, uv.totalRevenue
…
WHERE customer <= 3 and visitYearMonth >= 199201
… 
FROM dwdate WHERE d_yearmonthnum >= 199201) as t
…
ORDER BY c.c_name, c.c_mktsegment, uv.visitYearMonth ASC

This query requires scanning and aggregating nearly 1.9 billion records. As shown in the chart below, Redshift Spectrum substantially speeds up this query. A large Amazon Redshift cluster would have to be provisioned to support this use case. With the aid of Redshift Spectrum, you could use an existing small cluster, keep a single copy of your data in S3, and benefit from economical, durable storage while only paying for what you use via the pay per query pricing model.

Chart comparing Amazon Redshift vs. Redshift Spectrum with pushdown aggregation over 7 years of data

Summary

Redshift Spectrum lowers the time to value for deeper insights on customer data queries spanning the data lake and data warehouse. It can enable interactive analysis on datasets in cases that weren’t economically practical or technically feasible before.

There are cases where you can get the best of both worlds from Redshift Spectrum: higher performance at lower cost. However, there are still latency-sensitive use cases where you may want native Amazon Redshift performance. For more best practice tips, see the 10 Best Practices for Amazon Redshift post.

Please visit the Amazon Redshift Spectrum PoC Environment Github page. If you have questions or suggestions, please comment below.

 


Additional Reading

Learn more about how Amazon Redshift Spectrum extends data warehousing out to exabytes – no loading required.


About the Author

Dylan Tong is an Enterprise Solutions Architect at AWS. He works with customers to help drive their success on the AWS platform through thought leadership and guidance on designing well architected solutions. He has spent most of his career building on his expertise in data management and analytics by working for leaders and innovators in the space.

 

 

AWS CloudHSM Update – Cost Effective Hardware Key Management at Cloud Scale for Sensitive & Regulated Workloads

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-cloudhsm-update-cost-effective-hardware-key-management/

Our customers run an incredible variety of mission-critical workloads on AWS, many of which process and store sensitive data. As detailed in our Overview of Security Processes document, AWS customers have access to an ever-growing set of options for encrypting and protecting this data. For example, Amazon Relational Database Service (RDS) supports encryption of data at rest and in transit, with options tailored for each supported database engine (MySQL, SQL Server, Oracle, MariaDB, PostgreSQL, and Aurora).

Many customers use AWS Key Management Service (KMS) to centralize their key management, with others taking advantage of the hardware-based key management, encryption, and decryption provided by AWS CloudHSM to meet stringent security and compliance requirements for their most sensitive data and regulated workloads (you can read my post, AWS CloudHSM – Secure Key Storage and Cryptographic Operations, to learn more about Hardware Security Modules, also known as HSMs).

Major CloudHSM Update
Today, building on what we have learned from our first-generation product, we are making a major update to CloudHSM, with a set of improvements designed to make the benefits of hardware-based key management available to a much wider audience while reducing the need for specialized operating expertise. Here’s a summary of the improvements:

Pay As You Go – CloudHSM is now offered under a pay-as-you-go model that is simpler and more cost-effective, with no up-front fees.

Fully Managed – CloudHSM is now a scalable managed service; provisioning, patching, high availability, and backups are all built-in and taken care of for you. Scheduled backups extract an encrypted image of your HSM from the hardware (using keys that only the HSM hardware itself knows) that can be restored only to identical HSM hardware owned by AWS. For durability, those backups are stored in Amazon Simple Storage Service (S3), and for an additional layer of security, encrypted again with server-side S3 encryption using an AWS KMS master key.

Open & Compatible  – CloudHSM is open and standards-compliant, with support for multiple APIs, programming languages, and cryptography extensions such as PKCS #11, Java Cryptography Extension (JCE), and Microsoft CryptoNG (CNG). The open nature of CloudHSM gives you more control and simplifies the process of moving keys (in encrypted form) from one CloudHSM to another, and also allows migration to and from other commercially available HSMs.

More Secure – CloudHSM Classic (the original model) supports the generation and use of keys that comply with FIPS 140-2 Level 2. We’re stepping that up a notch today with support for FIPS 140-2 Level 3, with security mechanisms that are designed to detect and respond to physical attempts to access or modify the HSM. Your keys are protected with exclusive, single-tenant access to tamper-resistant HSMs that appear within your Virtual Private Clouds (VPCs). CloudHSM supports quorum authentication for critical administrative and key management functions. This feature allows you to define a list of N possible identities that can access the functions, and then require at least M of them to authorize the action. It also supports multi-factor authentication using tokens that you provide.

AWS-Native – The updated CloudHSM is an integral part of AWS and plays well with other tools and services. You can create and manage a cluster of HSMs using the AWS Management Console, AWS Command Line Interface (CLI), or API calls.

Diving In
You can create CloudHSM clusters that contain 1 to 32 HSMs, each in a separate Availability Zone in a particular AWS Region. Spreading HSMs across AZs gives you high availability (including built-in load balancing); adding more HSMs gives you additional throughput. The HSMs within a cluster are kept in sync: performing a task or operation on one HSM in a cluster automatically updates the others. Each HSM in a cluster has its own Elastic Network Interface (ENI).
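If you would rather script cluster creation than click through the console walkthrough below, a minimal sketch with the AWS SDK for Python (boto3) might look like the following; the subnet IDs and Availability Zone are placeholders, so check the CloudHSM API reference for the full set of options:

import boto3

hsm = boto3.client('cloudhsmv2')

# Create a cluster that spans two subnets in different Availability Zones
# (subnet IDs below are placeholders)
cluster = hsm.create_cluster(
    HsmType='hsm1.medium',
    SubnetIds=['subnet-0aaa1111', 'subnet-0bbb2222'],
)
cluster_id = cluster['Cluster']['ClusterId']

# Add the first HSM to the cluster; add more HSMs for throughput and availability
hsm.create_hsm(ClusterId=cluster_id, AvailabilityZone='us-east-1a')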

All interaction with an HSM takes place via the AWS CloudHSM client. It runs on an EC2 instance and uses certificate-based mutual authentication to create secure (TLS) connections to the HSMs.

At the hardware level, each HSM includes hardware-enforced isolation of crypto operations and key storage. Each customer HSM runs on dedicated processor cores.

Setting Up a Cluster
Let’s set up a cluster using the CloudHSM Console:

I click on Create cluster to get started, select my desired VPC and the subnets within it (I can also create a new VPC and/or subnets if needed):

Then I review my settings and click on Create:

After a few minutes, my cluster exists, but is uninitialized:

Initialization simply means retrieving a certificate signing request (the Cluster CSR):

And then creating a private key and using it to sign the request (these commands were copied from the Initialize Cluster docs and I have omitted the output; note that ID in the file names is a placeholder for the cluster’s ID):

$ openssl genrsa -out CustomerRoot.key 2048
$ openssl req -new -x509 -days 365 -key CustomerRoot.key -out CustomerRoot.crt
$ openssl x509 -req -days 365 -in ID_ClusterCsr.csr   \
                              -CA CustomerRoot.crt    \
                              -CAkey CustomerRoot.key \
                              -CAcreateserial         \
                              -out ID_CustomerHsmCertificate.crt

The next step is to apply the signed certificate to the cluster using the console or the CLI. After this has been done, the cluster can be activated by changing the password for the HSM’s administrative user, otherwise known as the Crypto Officer (CO).
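If you script this step rather than using the console, a rough sketch with the AWS SDK for Python might look like this; the cluster ID is a placeholder, and the certificate file names match the openssl commands above:

import boto3

hsm = boto3.client('cloudhsmv2')

# Upload the signed cluster certificate together with the issuing (trust anchor) certificate
with open('ID_CustomerHsmCertificate.crt') as signed_cert, open('CustomerRoot.crt') as trust_anchor:
    hsm.initialize_cluster(
        ClusterId='cluster-1234567890a',  # placeholder cluster ID
        SignedCert=signed_cert.read(),
        TrustAnchor=trust_anchor.read(),
    )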

Once the cluster has been created, initialized and activated, it can be used to protect data. Applications can use the APIs in AWS CloudHSM SDKs to manage keys, encrypt & decrypt objects, and more. The SDKs provide access to the CloudHSM client (running on the same instance as the application). The client, in turn, connects to the cluster across an encrypted connection.

Available Today
The new HSM is available today in the US East (Northern Virginia), US West (Oregon), US East (Ohio), and EU (Ireland) Regions, with more in the works. Pricing starts at $1.45 per HSM per hour.

Jeff;

We Are Not Having a Productive Debate About Women in Tech

Post Syndicated from Bozho original https://techblog.bozho.net/not-productive-debate-women-tech/

Yes, it’s about the “anti-diversity memo”. But I won’t go into the particular details of the memo, the firing, who’s right and wrong, who’s liberal and who’s conservative. Actually, I don’t need to repeat this post, which states almost exactly what I think about the particular issue. Just in case, and before someone decides to label me as a “sexist white male” who knows nothing, I should clearly state that I acknowledge that biases against women are real, that I strongly support equal opportunity, and that I think there must be more women in technology. I also have to state that I think the author of “the memo” was well-meaning, had some well-argued, research-backed points, and should not be ostracized.

But I want to “rant” about the quality of the debate. On one side we have conservatives who are throwing themselves in defense of the fired googler, insisting that liberals are banning conservative points of view, that it is normal to have so few women in tech and that everything is actually okay, or even that women are inferior. On the other side we have triggered liberals who are ready to shout “discrimination” and “harassment” at anything that resembles an attempt to claim anything different from total and absolute equality, in many cases using a classic “strawman” argument (e.g. “he’s saying women should not work in tech, he’s obviously wrong”).

Everyone seems too eager to take sides and issue a verdict on who’s right and who’s wrong, to blame the other side for all related and unrelated woes and, while doing that, to exhibit a huge amount of bias. If the debate is about that, we’d better shut it down as soon as possible, as it’s not going to lead anywhere – no matter how much conservatives want “a debate”, and no matter how much liberals want to advance equality. Oh, and by the way – this “conservatives” vs “liberals” framing is a false dichotomy. Most people hold a somewhat sensible stance in between. But let’s get to the actual issue:

Women are underrepresented in STEM (Science, technology, engineering, mathematics). That is a fact everyone agrees on and is blatantly obvious when you walk in any software company office.

Why is that the case? The whole debate revolves around biological and social differences, some of which are probably even true – that women value job flexibility more than promotions or higher salaries, that they are more neurotic (on average), that they are less confident, that they are more empathic, and so on. These differences have been studied and documented, and as much as I have my reservations about psychology studies (so much so that even meta-analyses are shown by meta-meta-analyses to be flawed) and social science in general, there seems to be a consensus there (by the way, it’s a shame that Gizmodo removed all the scientific references when they first published “the memo”). But that is not the issue. As has been pointed out, male and female “inherent” traits are equally applicable when working with technology.

Why are we talking about “technology” and not about “mining and construction”, as many will point out? Let’s cut that argument once and for all – mining and construction are blue-collar jobs that have a high chance of being automated in the near future and are in decline. The problem that we’re trying to solve is how to make the dominant profession of the future – information technology – one of equal opportunity. Yes, it’s a bold claim, but software is going to be everywhere and the industry will grow. This is why it’s so important to discuss it, not because we are developers and are somewhat affected by it.

So, there has been extended research on the matter, and the reasons are – surprise – complex and intertwined and there is no simple issue that, once resolved, will unlock the path of women to tech jobs.

What would diversity give us and why should we care? Let’s assume for a moment that we don’t care about equal opportunity and that we are right-leaning, conservative people. Well, imagine you have a growing business and you need to hire developers. What would you prefer – fewer or more people to choose from? Fewer or more diverse skills (technical and social) on the job market? The answer is obvious. The more people there are on the job market – regardless of their gender, race, whatever – the better for businesses.

So I guess we’ve agreed on the two points so far – that women are underrepresented, and that it’s better for everyone if there are more people with technical skills on the job market, which includes more women.

The “final” question is – how?

And this question seems to be nowhere in the discussion. Instead, we are going in circles with irrelevant arguments, trying to show either that we’ve read more scientific papers than others, that we are more liberal than others, or that we are more pro free speech.

Back to “how” – in Bulgaria we have a social meme: “I don’t know what the right way is, but the way you are doing it is NOT the right way”. And much of the underlying sentiment of “the memo” is similar – that Google should stop doing some of the things it is doing about diversity, or do them differently (but it doesn’t tell us how exactly). Hiring biases, internal programs, whatever – they seem to bother the author. But this is just talking about the surface of the problem. These programs are correcting something that remains hidden in “the memo”.

Google, on their diversity page, say that 20% of their tech employees are women. At the same time, in another diversity section, they claim that “18% of CS graduates are women”. So, I guess, job done – they’ve reached the maximum possible diversity. They’ve hired as many women in tech as there are CS graduates. Anything more than that, even if it doesn’t mean they’ll hire worse developers, will leave the rest of the industry with fewer women. So, sure, 50/50 at Google would sound cool, but the industry average would still be bad.

And that’s the actual, underlying reason that we should have already arrived at, so that we could start discussing the “how”: girls do not see STEM as a thing for them. Our biases are projected onto young girls and culminate in a “this is not for girls” mantra. No matter how diverse our hiring policies are, if we don’t address the issue at a much earlier stage, we aren’t getting anywhere.

In schools and even kindergartens we need an inclusive environment where “this is not for girls” is frowned upon. We should not discourage girls from liking math, or make math sound uncool and “hard for girls” (in my biased world I actually know more women mathematicians than men). This comic seems to be on a different topic (gender-specific toys), but it’s actually not about toys – it’s about what is considered (stereo)typical for a girl to do. And most of these biases are unconscious, come from all around us (school, TV, outdoor ads, people on the street, relatives, etc.), and take effort to confront.

To do that, we need policy decisions. We need to lobby education departments and ministries to encourage girls more in the STEM direction (and don’t worry, they’ll be good at it). By the way, guess what – Google’s diversity program is not just about hiring more women; it actually includes education policies, with initiatives like “influencing perception about computer science”, “getting more girls to code”, and scholarships.

Let’s discuss the education policies, the path to getting 40–50% of CS graduates to be female, and before that – more girls in schools with a technical focus, and ultimately – how to get society to stop perceiving technology and science as “not for girls”. Let each girl decide on her own. All the other debates are short-sighted and beside the point. Will biological differences matter then? They probably will – but not enough to justify a high gender imbalance.

I am no expert in education policies and I don’t know what will work and what won’t. There is research on the matter that we should look at, and maybe argue about it. Everything else is wasted keystrokes.

The post We Are Not Having a Productive Debate About Women in Tech appeared first on Bozho's tech blog.

AWS Encryption SDK: How to Decide if Data Key Caching Is Right for Your Application

Post Syndicated from June Blender original https://aws.amazon.com/blogs/security/aws-encryption-sdk-how-to-decide-if-data-key-caching-is-right-for-your-application/

AWS KMS image

Today, the AWS Crypto Tools team introduced a new feature in the AWS Encryption SDK: data key caching. Data key caching lets you reuse the data keys that protect your data, instead of generating a new data key for each encryption operation.

Data key caching can reduce latency, improve throughput, reduce cost, and help you stay within service limits as your application scales. In particular, caching might help if your application is hitting the AWS Key Management Service (KMS) requests-per-second limit and raising the limit does not solve the problem.

However, these benefits come with some security tradeoffs. Encryption best practices generally discourage extensive reuse of data keys.

In this blog post, I explore those tradeoffs and provide information that can help you decide whether data key caching is a good strategy for your application. I also explain how data key caching is implemented in the AWS Encryption SDK and describe the security thresholds that you can set to limit the reuse of data keys. Finally, I provide some practical examples of using the security thresholds to meet cost, performance, and security goals.

Introducing data key caching

The AWS Encryption SDK is a client-side encryption library that makes it easier for you to implement cryptography best practices in your application. It includes secure default behavior for developers who are not encryption experts, while being flexible enough to work for the most experienced users.

In the AWS Encryption SDK, by default, you generate a new data key for each encryption operation. This is the most secure practice. However, in some applications, the overhead of generating a new data key for each operation is not acceptable.

Data key caching saves the plaintext and ciphertext of the data keys you use in a configurable cache. When you need a key to encrypt or decrypt data, you can reuse a data key from the cache instead of creating a new data key. You can create multiple data key caches and configure each one independently. Most importantly, the AWS Encryption SDK provides security thresholds that you can set to determine how much data key reuse you will allow.

To make data key caching easier to implement, the AWS Encryption SDK provides LocalCryptoMaterialsCache, an in-memory, least-recently-used cache with a configurable size. The SDK manages the cache for you, including adding store, search, and match logic to all encryption and decryption operations.

We recommend that you use LocalCryptoMaterialsCache as it is, but you can customize it, or substitute a compatible cache. However, you should never store plaintext data keys on disk.

The AWS Encryption SDK documentation includes sample code in Java and Python for an application that uses data key caching to encrypt data sent to and from Amazon Kinesis Streams.

Balance cost and security

Your decision to use data key caching should balance cost—in time, money, and resources—against security. In every consideration, though, the balance should favor your security requirements. As a rule, use the minimal caching required to achieve your cost and performance goals.

Before implementing data key caching, consider the details of your applications, your security requirements, and the cost and frequency of your encryption operations. In general, your application can benefit from data key caching if each operation is slow or expensive, or if you encrypt and decrypt data frequently. If the cost and speed of your encryption operations are already acceptable or can be improved by other means, do not use a data key cache.

Data key caching can be the right choice for your application if you have high encryption and decryption traffic. For example, if you are hitting your KMS requests-per-second limit, caching can help because you get some of your data keys from the cache instead of calling KMS for every request.

However, you can also create a case in the AWS Support Center to raise the KMS limit for your account. If raising the limit solves the problem, you do not need data key caching.

Configure caching thresholds for cost and security

In the AWS Encryption SDK, you can configure data key caching to allow just enough data key reuse to meet your cost and performance targets while conforming to the security requirements of your application. The SDK enforces the thresholds so that you can use them with any compatible cache.

The data key caching security thresholds apply to each cache entry. The AWS Encryption SDK will not use the data key from a cache entry that exceeds any of the thresholds that you set.

  • Maximum age (required): Set the lifetime of each cached key to be long enough to get cache hits, but short enough to limit exposure of a plaintext data key in memory to a specific time period.

You can use the maximum age threshold like a key rotation policy. Use it to limit the reuse of data keys and minimize exposure of cryptographic materials. You can also use it to evict data keys when the type or source of data that your application is processing changes.

  • Maximum messages encrypted (optional; default is 2^32 messages): Set the number of messages protected by each cached data key to be large enough to get value from reuse, but small enough to limit the number of messages that might potentially be exposed.

The AWS Encryption SDK only caches data keys that use an algorithm suite with a key derivation function. This technique avoids the cryptographic limits on the number of bytes encrypted with a single key. However, the more data that a key encrypts, the more data that is exposed if the data key is compromised.

Limiting the number of messages, rather than the number of bytes, is particularly useful if your application encrypts many messages of a similar size or when potential exposure must be limited to very few messages. This threshold is also useful when you want to reuse a data key for a particular type of message and know in advance how many messages of that type you have. You can also use an encryption context to select particular cached data keys for your encryption requests.

  • Maximum bytes encrypted (optional; default is 2^63 – 1 bytes): Set the bytes protected by each cached data key to be large enough to allow the reuse you need, but small enough to limit the amount of data encrypted under the same key.

Limiting the number of bytes, rather than the number of messages, is preferable when your application encrypts messages of widely varying size or when possibly exposing large amounts of data is much more of a concern than exposing smaller amounts of data.

In addition to these security thresholds, the LocalCryptoMaterialsCache in the AWS Encryption SDK lets you set its capacity, which is the maximum number of entries the cache can hold.

Use the capacity value to tune the performance of your LocalCryptoMaterialsCache. In general, use the smallest value that will achieve the performance improvements that your application requires. You might want to test with a very small cache of 5–10 entries and expand if necessary. You will need a slightly larger cache if you are using the cache for both encryption and decryption requests, or if you are using encryption contexts to select particular cache entries.
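To make this concrete, here is a minimal sketch in Python of wiring a LocalCryptoMaterialsCache and its security thresholds into a caching cryptographic materials manager. The KMS key ARN is a placeholder, and the class and argument names follow the Python SDK at the time of writing, so check the data key caching documentation for your SDK version:

import aws_encryption_sdk

# Master key provider backed by a KMS master key (ARN is a placeholder)
key_provider = aws_encryption_sdk.KMSMasterKeyProvider(
    key_ids=['arn:aws:kms:us-east-1:123456789012:key/example-key-id'])

# In-memory LRU cache that holds at most 10 data keys
cache = aws_encryption_sdk.LocalCryptoMaterialsCache(capacity=10)

# Reuse each cached data key for at most 60 seconds and 10 messages
caching_cmm = aws_encryption_sdk.CachingCryptoMaterialsManager(
    master_key_provider=key_provider,
    cache=cache,
    max_age=60.0,
    max_messages_encrypted=10,
)

# Encrypt with the caching materials manager instead of the key provider directly
ciphertext, header = aws_encryption_sdk.encrypt(
    source=b'example plaintext', materials_manager=caching_cmm)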

Consider these cache configuration examples

After you determine the security and performance requirements of your application, consider the cache security thresholds carefully and adjust them to meet your needs. There are no magic numbers for these thresholds: the ideal settings are specific to each application, its security and performance requirements, and budget. Use the minimal amount of caching necessary to get acceptable performance and cost.

The following examples show ways you can use the LocalCryptoMaterialsCache capacity setting and the security thresholds to help meet your security requirements:

  • Slow master key operations: If your master key processes only 100 transactions per second (TPS) but your application needs to process 1,000 TPS, you can meet your application requirements by allowing a maximum of 10 messages to be protected under each data key.
  • High frequency and volume: If your master key costs $0.01 per operation and you need to process a consistent 1,000 TPS while staying within a budget of $100,000 per month, allow a maximum of 275 messages for each cache entry.
  • Burst traffic: If your application’s processing bursts to 100 TPS for five seconds in each minute but is otherwise zero, and your master key costs $0.01 per operation, setting maximum messages to 3 can achieve significant savings. To prevent data keys from being reused across bursts (55 seconds), set the maximum age of each cached data key to 20 seconds.
  • Expensive master key operations: If your application uses a low-throughput encryption service that costs as much as $1.00 per operation, you might want to minimize the number of operations. To do so, create a cache that is large enough to contain the data keys you need. Then, set the byte and message limits high enough to allow reuse while conforming to your security requirements. For example, if your security requirements do not permit a data key to encrypt more than 10 GB of data, setting bytes processed to 10 GB still significantly minimizes operations and conforms to your security requirements.

Learn more about data key caching

To learn more about data key caching, including how to implement it, how to set the security thresholds, and details about the caching components, see Data Key Caching in the AWS Encryption SDK. Also, see the AWS Encryption SDKs for Java and Python as well as the Javadoc and Python documentation.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions, file an issue in the GitHub repos for the Encryption SDK in Java or Python, or start a new thread on the KMS forum.

– June

New – Amazon Connect and Amazon Lex Integration

Post Syndicated from Randall Hunt original https://aws.amazon.com/blogs/aws/new-amazon-connect-and-amazon-lex-integration/

I’m really excited to share some recent enhancements to two of my favorite services: Amazon Connect and Amazon Lex. Amazon Connect is a self-service, cloud-based contact center service that makes it easy for any business to deliver better customer service at lower cost. Amazon Lex is a service for building conversational interfaces using voice and text. By integrating these two services you can take advantage of Lex‘s automatic speech recognition (ASR) and natural language processing/understanding (NLU) capabilities to create great self-service experiences for your customers. To enable this integration the Lex team added support for 8kHz speech input – more on that later. Why should you care about this? Well, if a bot can solve the majority of your customers’ requests, your customers spend less time waiting on hold and more time using your products.

If you need some more background on Amazon Connect or Lex I strongly recommend Jeff’s previous posts[1][2] on these services – especially if you like LEGOs.


Let’s dive in and learn to use this new integration. We’ll take an application that we built on our Twitch channel and modify it for this blog. At the application’s core, a user calls an Amazon Connect number, which connects them to a Lex bot, which in turn invokes an AWS Lambda function based on an intent from Lex. So what does our little application do?

I want to finally settle the question of what the best code editor is: I like vim, it’s a spectacular editor that does one job exceptionally well – editing code (it’s the best). My colleague Jeff likes emacs, a great operating system editor… if you were born with extra joints in your fingers. My colleague Tara loves Visual Studio and Sublime. Rather than fighting over what the best editor is, I thought we might let you, dear reader, vote. Don’t worry, you can even vote for butterflies.

Interested in voting? Call +1 614-569-4019 and tell us which editor you’re voting for! We don’t store your number or record your voice so feel free to vote more than once for vim. Want to see the votes live? http://best-editor-ever.s3-website-us-east-1.amazonaws.com/.

Now, how do we build this little contraption? We’ll cover each component, but since we’ve talked about Lex and Lambda before, we’ll focus mostly on the Amazon Connect component. I’m going to assume you already have an Amazon Connect instance running.

Amazon Lex

Let’s start with the Lex side of things. We’ll create a bot named VoteEditor with two intents: VoteEditor with a single slot called editor and ConnectToAgent with no slots. We’ll populate our editor slot full of different code editor names (maybe we’ll leave out emacs).

AWS Lambda

Our Lambda function will also be fairly simple. First we’ll create an Amazon DynamoDB table to store our votes. Then we’ll make a helper method to respond to Lex (build_response) – it will just wrap our message in a Lex-friendly response format (a sketch of this helper appears after the handler walkthrough below). Now we just have to figure out our flow logic.


import boto3

# DynamoDB table that stores one item per editor with a 'votes' counter
# (the table name here is illustrative)
ddb = boto3.resource('dynamodb').Table('editor-votes')

def lambda_handler(event, context):
    if 'ConnectToAgent' == event['currentIntent']['name']:
        return build_response("Ok, connecting you to an agent.")
    elif 'VoteEditor' == event['currentIntent']['name']:
        editor = event['currentIntent']['slots']['editor']
        # Atomically create the item with 1 vote or increment the existing count
        resp = ddb.update_item(
            Key={"name": editor.lower()},
            UpdateExpression="SET votes = :incr + if_not_exists(votes, :default)",
            ExpressionAttributeValues={":incr": 1, ":default": 0},
            ReturnValues="ALL_NEW"
        )
        msg = "Awesome, now {} has {} votes!".format(
            resp['Attributes']['name'],
            resp['Attributes']['votes'])
        return build_response(msg)

Let’s make sure we understand the code. So, if we got a vote for an editor and it doesn’t exist yet then we add that editor with 1 vote. Otherwise we increase the number of votes on that editor by 1. If we get a request for an agent, we terminate the flow with a nice message. Easy. Now we just tell our Lex bot to use our Lambda function to fulfill our intents. We can test that everything is working over text in the Lex console before moving on.
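For reference, a minimal sketch of the build_response helper mentioned earlier might look like this; the field names follow the Lex Lambda response format for closing a dialog, and the sketch is illustrative rather than the exact code from the Twitch episode:

def build_response(message):
    # Close the dialog and hand the message back to Lex as plain text
    return {
        'dialogAction': {
            'type': 'Close',
            'fulfillmentState': 'Fulfilled',
            'message': {'contentType': 'PlainText', 'content': message}
        }
    }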

Amazon Connect

Before we can use our Lex bot in a Contact Flow we have to make sure our Amazon Connect instance has access to it. We can do this by hopping over to the Amazon Connect service console, selecting our instance, and navigating to “Contact Flows”. There should be a section called Lex where you can add your bots!

Now that our Amazon Connect instance can invoke our Lex bot we can create a new Contact Flow that contains our Lex bot. We add the bot to our flow through the “Get customer input” widget from the “Interact” category.

Once we’re on the widget we have a “DTMF” tab for taking input from number keys on a phone or the “Amazon Lex” tab for taking voice input and passing it to the Lex service. We’ll use the Lex tab and put in some configuration.

Lots of options, but in short we add the bot we want to use (including the version of the bot), the intents we want to use from our bot, and a short prompt to introduce the bot (and maybe prompt the customer for input).

Our final contact flow looks like this:

A real world example might allow a customer to perform many transactions through a Lex bot. Then on an error or ConnectToAgent intent put the customer into a queue where they could talk to a real person. It could collect and store information about users and populate a rich interface for an agent to use so they could jump right into the conversation with all the context they need.

I want to especially highlight the advantage of 8kHz audio support in Lex. Lex originally only supported speech input that was sampled at a higher rate than the 8 kHz input from the phone. Modern digital communication applications typically use audio signals sampled at a minimum of 16 kHz. This higher-fidelity recording makes it easier to differentiate between sounds like “ess” (/s/) and “eff” (/f/) – or so the audio experts tell me. Phones, however, use a much lower-quality recording. Humans, and their ears, are pretty good at using surrounding words to figure out what a voice is saying from a lower-quality recording (just check the NASA Apollo recordings for proof of this). Most digital phone systems are set up to use 8 kHz sampling by default – it’s a nice tradeoff between bandwidth and fidelity. That’s why your voice sometimes sounds different on the phone. On top of this fundamental sampling-rate issue, you also have to deal with the fact that a lot of phone call data is already lossy (can you hear me now?). There are thousands of different devices from hundreds of different manufacturers, and tons of different software implementations. So… how do you solve this recognition issue?

The Lex team decided that the best way to address this was to expand the set of models they were using for speech recognition to include an 8kHz model. Support for an 8 kHz telephony audio sampling rate provides increased speech recognition accuracy and fidelity for your contact center interactions. This was a great effort by the team that enables a lot of customers to do more with Amazon Connect.

One final note is that Amazon Connect uses the exact same PostContent endpoint that you can use as an external developer, so you don’t have to be an Amazon Connect user to take advantage of this 8kHz feature in Lex.
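If you want to try PostContent yourself outside of Amazon Connect, a rough sketch with the AWS SDK for Python might look like the following. The bot alias, user ID, and audio file are placeholders, and the content-type string shown is the 16 kHz one – check the PostContent documentation for the exact media type to use with 8 kHz telephony audio:

import boto3

lex = boto3.client('lex-runtime')

with open('vote.wav', 'rb') as audio:
    response = lex.post_content(
        botName='VoteEditor',
        botAlias='$LATEST',  # alias name is illustrative
        userId='caller-1234',
        contentType='audio/l16; rate=16000; channels=1',
        accept='text/plain; charset=utf-8',
        inputStream=audio,
    )

print(response['message'])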

I hope you guys enjoyed this post and as always the real details are in the docs and API Reference.

Randall

Pimoroni is 5 now!

Post Syndicated from guru original https://www.raspberrypi.org/blog/pimoroni-is-5-now/

Long read written by Pimoroni’s Paul Beech, best enjoyed over a cup o’ grog.

Every couple of years, I’ve done a “State of the Fleet” update here on the Raspberry Pi blog to tell everyone how the Sheffield Pirates are doing. Half a decade has gone by in a blink, but reading back over the previous posts shows that a lot has happened in that time!

TL;DR We’re an increasingly medium-sized design/manufacturing/e-commerce business with workshops in Sheffield, UK, and Essen, Germany, and we employ almost 40 people. We’re totally lovely. Thanks for supporting us!

 

We’ve come a long way, baby

I’m sitting looking out the window at Sheffield-on-Sea and feeling pretty lucky about how things are going. In the morning, I’ll be flying east for Maker Faire Tokyo with Niko (more on him later), and to say hi to some amazing people in Shenzhen (and to visit Huaqiangbei, of course). This is after I’ve already visited this year’s Maker Faires in New York, San Francisco, and Berlin.

Pimoroni started out small, but we’ve grown like weeds, and we’re steadily sauntering towards becoming a medium-sized business. That’s thanks to fantastic support from the people who buy our stuff and spread the word. In return, we try to be nice, friendly, and human in everything we do, and to make exciting things, ideally with our own hands here in Sheffield.

Pimoroni soldering

Handmade with love

We’ve made it onto a few ‘fastest-growing’ lists, and we’re in the top 500 of the Inc. 5000 Europe list. Adafruit did it first a few years back, and we’ve never gone wrong when we’ve followed in their footsteps.

The slightly weird nature of Pimoroni means we get listed as either a manufacturing or e-commerce business. In reality, we’re about four or five companies in one shell, which is very much against the conventions of “how business is done”. However, having seen what Adafruit, SparkFun, and Seeed do, we’re more than happy to design, manufacture, and sell our stuff in-house, as well as stocking the best stuff from across the maker community.

Pimoroni stocks

Product and process

The whole process of expansion has not been without its growing pains. We’re just under 40 people strong now, and have an outpost in Germany (also hilariously far from the sea for piratical activities). This means we’ve had to change things quickly to improve and automate processes, so that the wheels won’t fall off as things get bigger. Process optimization is incredibly interesting to a geek, especially the making sure that things are done well, that mistakes are easy to spot and to fix, and that nothing is missed.

At the end of 2015, we had a step change in how busy we were, and our post room and support started to suffer. As a consequence, we implemented measures to become more efficient, including small but important things like checking in parcels with a barcode scanner attached to a Raspberry Pi. That Pi has been happily running on the same SD card for a couple of years now without problems 😀

Pimoroni post room

Going postal?

We also hired a full-time support ninja, Matt, to keep the experience of getting stuff from us light and breezy and to ensure that any problems are sorted. He’s had a hugely positive impact already by making the emails and replies you see more friendly. Of course, he’s also started using the laser cutters for tinkering projects. It’d be a shame to work at Pimoroni and not get to use all the wonderful toys, right?

Employing all the people

You can see some of the motley crew we employ here and there on the Pimoroni website. And if you drop by at the Raspberry Pi Birthday Party, Pi Wars, Maker Faires, Deer Shed Festival, or New Scientist Live in September, you’ll be seeing new Pimoroni faces as we start to engage with people more about what we do. On top of that, we’re starting to make proper videos (like Sandy’s soldering guide), as opposed to the 101 episodes of Bilge Tank we recorded in a rather off-the-cuff and haphazard fashion. Although that’s the beauty of Bilge Tank, right?

Pimoroni soldering

Such soldering setup

As Emma, Sandy, Lydia, and Tanya gel as a super creative team, we’re starting to create more formal educational resources, and to make kits that are suitable for a wider audience. Things like our Pi Zero W kits are products of their talents.

Emma is our new Head of Marketing. She’s really ‘The Only Marketing Person Who Would Ever Fit In At Pimoroni’, having been a core part of the Sheffield maker scene since we hung around with one Ben Nuttall, in the dark days before Raspberry Pi was a thing.

Through a series of fortunate coincidences, Niko and his equally talented wife Mena were there when we cut the first Pibow in 2012. They immediately pitched in to help us buy our second laser cutter so we could keep up with demand. They have been supporting Pimoroni with sourcing in East Asia, and now Niko has become a member of the Pirates’ Council and the Head of Engineering as we’re increasing the sophistication and scale of the things we do. The Unicorn HAT HD is one of his masterpieces.

Pimoroni devices

ALL the HATs!

We see ourselves as a wonderful island of misfit toys, and it feels good to have the best toy shop ever, and to support so many lovely people. Business is about more than just profits.

Where do we go to, me hearties?

So what are our plans? At the moment we’re still working absolutely flat-out as demand from wholesalers, retailers, and customers increases. We thought Raspberry Pi was big, but it turns out it’s just getting started. Near the end of 2016, it seemed to reach a whole new level of popularity, and still we continue to meet people to whom we have to explain what a Pi is. It’s a good problem to have.

We need a bigger space, but it’s been hard to find somewhere suitable in Sheffield that won’t mean we’re stuck on an industrial estate miles from civilisation. That would be bad for the crew; we like having world-class burritos on our doorstep.

The good news is, it looks like our search is at an end! Just in time for the arrival of our ‘Super-Turbo-Death-Star’ new production line, which will enable us to make devices in a bigger, better, faster, more ‘Now now now!’ fashion \o/

Pimoroni warehouse

Spacious, but not spacious enough!

We’ve got lots of treasure in the pipeline, but we want to pick up the pace of development even more and create many new HATs, pHATs, and SHIMs, e.g. for environmental sensing and audio applications. Picade will also be getting some love to make it slicker and more hackable.

We’re also starting to flirt with adding more engineering and production capabilities in-house. The plan is to try our hand at anodising, powder-coating, and maybe even injection-moulding if we get the space and find the right machine. Learning how to do things is amazing, and we love having an idea and being able to bring it to life in almost no time at all.

Pimoroni production

This is where the magic happens

Fanks!

There are so many people involved in supporting our success, and some people we love for just existing and doing wonderful things that make us want to do better. The biggest shout-outs go to Liz, Eben, Gordon, James, all the Raspberry Pi crew, and Limor and pt from Adafruit, for being the most supportive guiding lights a young maker company could ever need.

A note from us

It is amazing for us to witness the growth of businesses within the Raspberry Pi ecosystem. Pimoroni is a wonderful example of an organisation that is creating opportunities for makers within its local community, and the company is helping to reinvigorate Sheffield as the heart of making in the UK.

If you’d like to take advantage of the great products built by the Pirates, Monkeys, Robots, and Ninjas of Sheffield, you should do it soon: Pimoroni are giving everyone 20% off their homemade tech until 6 August.

Pimoroni, from all of us here at Pi Towers (both in the UK and USA), have a wonderful birthday, and many a grog on us!

The post Pimoroni is 5 now! appeared first on Raspberry Pi.

NSA Collects MS Windows Error Information

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2017/08/nsa_collects_ms.html

Back in 2013, Der Spiegel reported that the NSA intercepts and collects Windows bug reports:

One example of the sheer creativity with which the TAO spies approach their work can be seen in a hacking method they use that exploits the error-proneness of Microsoft’s Windows. Every user of the operating system is familiar with the annoying window that occasionally pops up on screen when an internal problem is detected, an automatic message that prompts the user to report the bug to the manufacturer and to restart the program. These crash reports offer TAO specialists a welcome opportunity to spy on computers.

When TAO selects a computer somewhere in the world as a target and enters its unique identifiers (an IP address, for example) into the corresponding database, intelligence agents are then automatically notified any time the operating system of that computer crashes and its user receives the prompt to report the problem to Microsoft. An internal presentation suggests it is NSA’s powerful XKeyscore spying tool that is used to fish these crash reports out of the massive sea of Internet traffic.

The automated crash reports are a “neat way” to gain “passive access” to a machine, the presentation continues. Passive access means that, initially, only data the computer sends out into the Internet is captured and saved, but the computer itself is not yet manipulated. Still, even this passive access to error messages provides valuable insights into problems with a targeted person’s computer and, thus, information on security holes that might be exploitable for planting malware or spyware on the unwitting victim’s computer.

Although the method appears to have little importance in practical terms, the NSA’s agents still seem to enjoy it because it allows them to have a bit of a laugh at the expense of the Seattle-based software giant. In one internal graphic, they replaced the text of Microsoft’s original error message with one of their own reading, “This information may be intercepted by a foreign sigint system to gather detailed information and better exploit your machine.” (“Sigint” stands for “signals intelligence.”)

The article talks about the (limited) value of this information with regard to specific target computers, but I have another question: how valuable would this database be for finding new zero-day Windows vulnerabilities to exploit? Microsoft won’t have the incentive to examine and fix problems until they happen broadly among its user base. The NSA has a completely different incentive structure.

I don’t remember this being discussed back in 2013.

EDITED TO ADD (8/6): Slashdot thread.

Amazon Redshift Spectrum Extends Data Warehousing Out to Exabytes—No Loading Required

Post Syndicated from Maor Kleider original https://aws.amazon.com/blogs/big-data/amazon-redshift-spectrum-extends-data-warehousing-out-to-exabytes-no-loading-required/

When we first looked into the possibility of building a cloud-based data warehouse many years ago, we were struck by the fact that our customers were storing ever-increasing amounts of data, and yet only a small fraction of that data ever made it into a data warehouse or Hadoop system for analysis. We saw that this wasn’t just a cloud-specific anomaly. It was also true in the broader industry, where the growth rate of the enterprise storage market segment greatly surpassed that of the data warehousing market segment.

We dubbed this the “dark data” problem. Our customers knew that there was untapped value in the data they collected; why else would they spend money to store it? But the systems available to them to analyze this data were simply too slow, complex, and expensive for them to use on all but a select subset of this data. They were storing it with optimistic hope that, someday, someone would find a solution.

Amazon Redshift became one of the fastest-growing AWS services because it helped solve the dark data problem. It was at least an order of magnitude less expensive and faster than most alternatives available. And Amazon Redshift was fully managed from the start—you didn’t have to worry about capacity, provisioning, patching, monitoring, backups, and a host of other DBA headaches. Many customers, including Vevo, Yelp, Redfin, and Edmunds, migrated to Amazon Redshift to improve query performance, reduce DBA overhead, and lower the cost of analytics.

And our customers’ data continues to grow at a very fast rate. Across the board, gigabytes to petabytes, the average Amazon Redshift customer doubles the data analyzed every year. That’s why we implement features that help customers handle their growing data, for example to double the query throughput or improve the compression ratios from 3x to 4x. That gives our customers some time before they have to consider throwing away data or removing it from their analytic environments. However, there is an increasing number of AWS customers who each generate a petabyte of data every day—that’s an exabyte in only three years. There wasn’t a solution for customers like that. If your data is doubling every year, it’s not long before you have to find new, disruptive approaches that transform the cost, performance, and simplicity curves for managing data.

Let’s look at the options available today. You can use Hadoop-based technologies like Apache Hive with Amazon EMR. This is actually a pretty great solution because it makes it easy and cost-effective to operate directly on data in Amazon S3 without ingestion or transformation. You can spin up clusters as you wish when you need, and size them right for that specific job you’re running. These systems are great at high scale-out processing like scans, filters, and aggregates. On the other hand, they’re not that good at complex query processing. For example, join processing requires data to be shuffled across nodes—for a large amount of data and large numbers of nodes that gets very slow. And joins are intrinsic to any meaningful analytics problem.

You can also use a columnar MPP data warehouse like Amazon Redshift. These systems make it simple to run complex analytic queries with orders of magnitude faster performance for joins and aggregations performed over large datasets. Amazon Redshift, in particular, leverages high-performance local disks, sophisticated query execution, and join-optimized data formats. Because it is just standard SQL, you can keep using your existing ETL and BI tools. But you do have to load data, and you have to provision clusters against the storage and CPU requirements you need.

Both solutions have powerful attributes, but they force you to choose which attributes you want. We see this as a “tyranny of OR.” You can have the throughput of local disks OR the scale of Amazon S3. You can have sophisticated query optimization OR high-scale data processing. You can have fast join performance with optimized formats OR a range of data processing engines that work against common data formats. But you shouldn’t have to choose. At this scale, you really can’t afford to choose. You need “all of the above.”

Redshift Spectrum

We built Redshift Spectrum to end this “tyranny of OR.” With Redshift Spectrum, Amazon Redshift customers can easily query their data in Amazon S3. Like Amazon EMR, you get the benefits of open data formats and inexpensive storage, and you can scale out to thousands of nodes to pull data, filter, project, aggregate, group, and sort. Like Amazon Athena, Redshift Spectrum is serverless and there’s nothing to provision or manage. You just pay for the resources you consume for the duration of your Redshift Spectrum query. Like Amazon Redshift itself, you get the benefits of a sophisticated query optimizer, fast access to data on local disks, and standard SQL. And like nothing else, Redshift Spectrum can execute highly sophisticated queries against an exabyte of data or more—in just minutes.

Redshift Spectrum is a built-in feature of Amazon Redshift, and your existing queries and BI tools will continue to work seamlessly. Under the covers, we manage a fleet of thousands of Redshift Spectrum nodes spread across multiple Availability Zones. These are transparently scaled and allocated to your queries based on the data that you need to process, with no provisioning or commitments. Redshift Spectrum is also highly concurrent—you can access your Amazon S3 data from any number of Amazon Redshift clusters.

The life of a Redshift Spectrum query

It all starts when Redshift Spectrum queries are submitted to the leader node of your Amazon Redshift cluster. The leader node optimizes, compiles, and pushes the query execution to the compute nodes in your Amazon Redshift cluster. Next, the compute nodes obtain the information describing the external tables from your data catalog, dynamically pruning nonrelevant partitions based on the filters and joins in your queries. The compute nodes also examine the data available locally and push down predicates to efficiently scan only the relevant objects in Amazon S3.

The Amazon Redshift compute nodes then generate multiple requests depending on the number of objects that need to be processed, and submit them concurrently to Redshift Spectrum, which pools thousands of Amazon EC2 instances per AWS Region. The Redshift Spectrum worker nodes scan, filter, and aggregate your data from Amazon S3, streaming required data for processing back to your Amazon Redshift cluster. Then, the final join and merge operations are performed locally in your cluster and the results are returned to your client.

Redshift Spectrum’s architecture offers several advantages. First, it elastically scales compute resources separately from the storage layer in Amazon S3. Second, it offers significantly higher concurrency because you can run multiple Amazon Redshift clusters and query the same data in Amazon S3. Third, Redshift Spectrum leverages the Amazon Redshift query optimizer to generate efficient query plans, even for complex queries with multi-table joins and window functions. Fourth, it operates directly on your source data in its native format (Parquet, RCFile, CSV, TSV, Sequence, Avro, RegexSerDe and more to come soon). This means that no data loading or transformation is needed. This also eliminates data duplication and associated costs. Fifth, operating on open data formats gives you the flexibility to leverage other AWS services and execution engines across your various teams to collaborate on the same data in Amazon S3. You get all of this, and because Redshift Spectrum is a feature of Amazon Redshift, you get the same level of end-to-end security, compliance, and certifications as with Amazon Redshift.

Designed for performance and cost-effectiveness

With Amazon Redshift Spectrum, you pay only for the queries you run against the data that you actually scan. We encourage you to leverage file partitioning, columnar data formats, and data compression to significantly minimize the amount of data scanned in Amazon S3. This is important for data warehousing because it dramatically improves query performance and reduces cost. Partitioning your data in Amazon S3 by date, time, or any other custom keys enables Redshift Spectrum to dynamically prune nonrelevant partitions to minimize the amount of data processed. If you store data in a columnar format, such as Parquet, Redshift Spectrum scans only the columns needed by your query, rather than processing entire rows. Similarly, if you compress your data using one of Redshift Spectrum’s supported compression algorithms, less data is scanned.
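As a sketch of what this looks like in practice, the following Python snippet (using psycopg2 to connect to the cluster) registers an external schema and a partitioned Parquet table over data in Amazon S3 and then queries it with ordinary SQL. The connection details, IAM role ARN, bucket, and column names are placeholders:

import psycopg2

conn = psycopg2.connect(host='my-cluster.example.us-east-1.redshift.amazonaws.com',
                        port=5439, dbname='dev', user='awsuser', password='...')
conn.autocommit = True  # external DDL generally cannot run inside a transaction block
cur = conn.cursor()

# External schema backed by the data catalog (role ARN is a placeholder)
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
    FROM DATA CATALOG DATABASE 'spectrumdb'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS
""")

# Partitioned, columnar table over Parquet files in S3; register partitions
# with ALTER TABLE ... ADD PARTITION before querying them
cur.execute("""
    CREATE EXTERNAL TABLE spectrum.sales (
        item_id integer,
        price   double precision)
    PARTITIONED BY (saledate date)
    STORED AS PARQUET
    LOCATION 's3://my-bucket/sales/'
""")

# Query only the columns and partitions you need; Spectrum scans just those objects
cur.execute("""
    SELECT saledate, count(*) AS orders, sum(price) AS revenue
    FROM spectrum.sales
    WHERE saledate >= '2017-01-01'
    GROUP BY saledate
""")
print(cur.fetchall())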

Amazon Redshift and Redshift Spectrum give you the best of both worlds. If you need to run frequent queries on the same data, you can normalize it, store it in Amazon Redshift, and get all of the benefits of a fully featured data warehouse for storing and querying structured data at a flat rate. At the same time, you can keep your additional data in multiple open file formats in Amazon S3, whether it is historical data or the most recent data, and extend your Amazon Redshift queries across your Amazon S3 data lake.

And that is how Amazon Redshift Spectrum scales data warehousing to exabytes—with no loading required. Redshift Spectrum ends the “tyranny of OR,” enabling you to store your data where you want, in the format you want, and have it available for fast processing using standard SQL when you need it, now and in the future.


Additional Reading

10 Best Practices for Amazon Redshift Spectrum
Amazon QuickSight Adds Support for Amazon Redshift Spectrum
Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data

About the Author

Maor Kleider is a Senior Product Manager for Amazon Redshift, a fast, simple and cost-effective data warehouse. Maor is passionate about collaborating with customers and partners, learning about their unique big data use cases and making their experience even better. In his spare time, Maor enjoys traveling and exploring new restaurants with his family.

New: Server-Side Encryption for Amazon Kinesis Streams

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/new-server-side-encryption-for-amazon-kinesis-streams/

In this age of smart homes, big data, IoT devices, mobile phones, social networks, chatbots, and game consoles, streaming data scenarios are everywhere. Amazon Kinesis Streams enables you to build custom applications that can capture, process, analyze, and store terabytes of data per hour from thousands of streaming data sources. Since Amazon Kinesis Streams allows applications to process data concurrently from the same Kinesis stream, you can build parallel processing systems. For example, you can emit processed data to Amazon S3, perform complex analytics with Amazon Redshift, and even build robust, serverless streaming solutions using AWS Lambda.

Kinesis Streams enables several streaming use cases for consumers, and now we are making the service more effective for securing your data in motion by adding server-side encryption (SSE) support for Kinesis Streams. With this new Kinesis Streams feature, you can now enhance the security of your data and/or meet any regulatory and compliance requirements for any of your organization’s data streaming needs.
In fact, Kinesis Streams is now one of the AWS Services in Scope for the Payment Card Industry Data Security Standard (PCI DSS) compliance program. PCI DSS is a proprietary information security standard administered by the PCI Security Standards Council founded by key financial institutions. PCI DSS compliance applies to all entities that store, process, or transmit cardholder data and/or sensitive authentication data which includes service providers. You can request the PCI DSS Attestation of Compliance and Responsibility Summary using AWS Artifact. But the good news about compliance with Kinesis Streams doesn’t stop there. Kinesis Streams is now also FedRAMP compliant in AWS GovCloud. FedRAMP stands for Federal Risk and Authorization Management Program and is a U.S. government-wide program that delivers a standard approach to the security assessment, authorization, and continuous monitoring for cloud products and services. You can learn more about FedRAMP compliance with AWS Services here.

Now are you ready to get into the keys? Get it? Instead of getting into the weeds. Okay, a little corny, but it was the best I could do. Coming back to SSE for Kinesis Streams, let me explain the flow of server-side encryption with Kinesis. Each data record and partition key put into a Kinesis stream using the PutRecord or PutRecords API is encrypted using an AWS Key Management Service (KMS) master key. With that master key, Kinesis Streams uses the 256-bit Advanced Encryption Standard (AES-256 GCM) algorithm to encrypt the incoming data.

In order to enable server-side encryption with Kinesis Streams for new or existing streams, you can use the Kinesis management console or leverage one of the available AWS SDKs.  Additionally, you can audit the history of your stream encryption, validate the encryption status of a certain stream in the Kinesis Streams console, or check that the PutRecord or GetRecord transactions are encrypted using the AWS CloudTrail service.
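As a sketch of the SDK route, enabling and then checking encryption on the stream created in the walkthrough below might look like this in Python; the stream name and key alias match the walkthrough, and you can swap in your own KMS key ID to use a customer-managed master key:

import boto3

kinesis = boto3.client('kinesis')

# Turn on server-side encryption using the Kinesis service default KMS master key
kinesis.start_stream_encryption(
    StreamName='KinesisSSE-stream',
    EncryptionType='KMS',
    KeyId='alias/aws/kinesis',
)

# The update takes a short while; once applied, the stream description reports KMS encryption
desc = kinesis.describe_stream(StreamName='KinesisSSE-stream')
print(desc['StreamDescription'].get('EncryptionType'))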

Walkthrough: Kinesis Streams Server-Side Encryption

Let’s do a quick walkthrough of server-side encryption with Kinesis Streams. First, I’ll go to the Amazon Kinesis console and select the Streams console option.

Once in the Kinesis Streams console, I can add server-side encryption to one of my existing Kinesis streams or opt to create a new Kinesis stream.  For this walkthrough, I’ll opt to quickly create a new Kinesis stream, therefore, I’ll select the Create Kinesis stream button.

I’ll name my stream, KinesisSSE-stream, and allocate one shard for my stream. Remember that the data capacity of your stream is calculated based upon the number of shards specified for the stream. You can use the Estimate the number of shards you’ll need dropdown within the console, or read more about estimating the number of shards in a stream here. To complete the creation of my stream, I now click the Create Kinesis stream button.
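If you would rather estimate shard counts outside the console, a small back-of-the-envelope helper like the one below works, assuming the published per-shard limits at the time of writing (1 MB/s or 1,000 records/s on writes, 2 MB/s on reads):

import math

def estimate_shards(write_mb_per_sec, records_per_sec, read_mb_per_sec):
    # Take the largest requirement across the write-bandwidth, record-rate, and read-bandwidth limits
    return max(
        math.ceil(write_mb_per_sec / 1.0),
        math.ceil(records_per_sec / 1000.0),
        math.ceil(read_mb_per_sec / 2.0),
    )

# Example: 3 MB/s in, 2,500 records/s, 5 MB/s out -> 3 shards
print(estimate_shards(3, 2500, 5))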

 

With my KinesisSSE-stream created, I will select it in the dashboard and choose the Actions dropdown and select the Details option.


On the Details page of the KinesisSSE-stream, there is now a Server-side encryption section.  In this section, I will select the Edit button.

Now I can enable server-side encryption for my stream with an AWS KMS master key by selecting the Enabled radio button. Once selected, I can choose which AWS KMS master key to use for the encryption of data in KinesisSSE-stream. I can either select the KMS master key generated by the Kinesis service, (Default) aws/kinesis, or one of my own KMS master keys that I have previously generated. I’ll select the default master key, and all that is left is for me to click the Save button.


That’s it!  As you can see from my screenshots below, after only about 20 seconds, server-side encryption was added to my Kinesis stream and now any incoming data into my stream will be encrypted.  One thing to note is server-side encryption only encrypts incoming data after encryption has been enabled. Preexisting data that is in a Kinesis stream prior to server-side encryption being enabled will remain unencrypted.

 

Summary

Kinesis Streams with Server-side encryption using AWS KMS keys makes it easy for you to automatically encrypt the streaming data coming into your  stream. You can start, stop, or update server-side encryption for any Kinesis stream using the AWS management console or the AWS SDK. To learn more about Kinesis Server-Side encryption, AWS Key Management Service, or about Kinesis Streams review the Amazon Kinesis getting started guide, the AWS Key Management Service developer guide, or the Amazon Kinesis product page.

 

Enjoy streaming.

Tara