Tag Archives: integrit

[$] Securing the container image supply chain

Post Syndicated from corbet original https://lwn.net/Articles/754443/rss

“Security is hard” is a tautology, especially in the fast-moving world
of container orchestration. We have previously covered various aspects of
Linux container
security through, for example, the Clear Containers implementation
or the broader question of Kubernetes and
security
, but those are mostly concerned with container isolation; they do not address the
question of trusting a container’s contents. What is a container running?
Who built it and when? Even assuming we have good programmers and solid
isolation layers, propagating that good code around a Kubernetes cluster
and making strong assertions on the integrity of that supply chain is far
from trivial. The 2018 KubeCon
+ CloudNativeCon Europe
event featured some projects that could
eventually solve that problem.

Announcing Local Build Support for AWS CodeBuild

Post Syndicated from Karthik Thirugnanasambandam original https://aws.amazon.com/blogs/devops/announcing-local-build-support-for-aws-codebuild/

Today, we’re excited to announce local build support in AWS CodeBuild.

AWS CodeBuild is a fully managed build service. There are no servers to provision and scale, or software to install, configure, and operate. You just specify the location of your source code, choose your build settings, and CodeBuild runs build scripts for compiling, testing, and packaging your code.

In this blog post, I’ll show you how to set up CodeBuild locally to build and test a sample Java application.

By building an application on a local machine you can:

  • Test the integrity and contents of a buildspec file locally.
  • Test and build an application locally before committing.
  • Identify and fix errors quickly from your local development environment.

Prerequisites

In this post, I am using AWS Cloud9 IDE as my development environment.

If you would like to use AWS Cloud9 as your IDE, follow the express setup steps in the AWS Cloud9 User Guide.

The AWS Cloud9 IDE comes with Docker and Git already installed. If you are going to use your laptop or desktop machine as your development environment, install Docker and Git before you start.

Steps to build CodeBuild image locally

Run git clone https://github.com/aws/aws-codebuild-docker-images.git to download this repository to your local machine.

$ git clone https://github.com/aws/aws-codebuild-docker-images.git

Lets build a local CodeBuild image for JDK 8 environment. The Dockerfile for JDK 8 is present in /aws-codebuild-docker-images/ubuntu/java/openjdk-8.

Edit the Dockerfile to remove the last line ENTRYPOINT [“dockerd-entrypoint.sh”] and save the file.

Run cd ubuntu/java/openjdk-8 to change the directory in your local workspace.

Run docker build -t aws/codebuild/java:openjdk-8 . to build the Docker image locally. This command will take few minutes to complete.

$ cd aws-codebuild-docker-images
$ cd ubuntu/java/openjdk-8
$ docker build -t aws/codebuild/java:openjdk-8 .

Steps to setup CodeBuild local agent

Run the following Docker pull command to download the local CodeBuild agent.

$ docker pull amazon/aws-codebuild-local:latest --disable-content-trust=false

Now you have the local agent image on your machine and can run a local build.

Run the following git command to download a sample Java project.

$ git clone https://github.com/karthiksambandam/sample-web-app.git

Steps to use the local agent to build a sample project

Let’s build the sample Java project using the local agent.

Execute the following Docker command to run the local agent and build the sample web app repository you cloned earlier.

$ docker run -it -v /var/run/docker.sock:/var/run/docker.sock -e "IMAGE_NAME=aws/codebuild/java:openjdk-8" -e "ARTIFACTS=/home/ec2-user/environment/artifacts" -e "SOURCE=/home/ec2-user/environment/sample-web-app" amazon/aws-codebuild-local

Note: We need to provide three environment variables namely  IMAGE_NAME, SOURCE and ARTIFACTS.

IMAGE_NAME: The name of your build environment image.

SOURCE: The absolute path to your source code directory.

ARTIFACTS: The absolute path to your artifact output folder.

When you run the sample project, you get a runtime error that says the YAML file does not exist. This is because a buildspec.yml file is not included in the sample web project. AWS CodeBuild requires a buildspec.yml to run a build. For more information about buildspec.yml, see Build Spec Example in the AWS CodeBuild User Guide.

Let’s add a buildspec.yml file with the following content to the sample-web-app folder and then rebuild the project.

version: 0.2

phases:
  build:
    commands:
      - echo Build started on `date`
      - mvn install

artifacts:
  files:
    - target/javawebdemo.war

$ docker run -it -v /var/run/docker.sock:/var/run/docker.sock -e "IMAGE_NAME=aws/codebuild/java:openjdk-8" -e "ARTIFACTS=/home/ec2-user/environment/artifacts" -e "SOURCE=/home/ec2-user/environment/sample-web-app" amazon/aws-codebuild-local

This time your build should be successful. Upon successful execution, look in the /artifacts folder for the final built artifacts.zip file to validate.

Conclusion:

In this blog post, I showed you how to quickly set up the CodeBuild local agent to build projects right from your local desktop machine or laptop. As you see, local builds can improve developer productivity by helping you identify and fix errors quickly.

I hope you found this post useful. Feel free to leave your feedback or suggestions in the comments.

[$] A kernel integrity subsystem update

Post Syndicated from jake original https://lwn.net/Articles/753276/rss

At the 2018 Linux Storage, Filesystem, and Memory-Management Summit, Mimi
Zohar gave a presentation in the
filesystem track on the Linux integrity subsystem. There is a lot
of talk that the integrity subsystem (usually referred to as “IMA”, which
is the integrity
measurement architecture
, though there is more to the subsystem) is
complex and
not documented well, she
said. So she wanted to give an overview of the subsystem and then to
discuss some filesystem-related concerns.

[$] File-level integrity

Post Syndicated from jake original https://lwn.net/Articles/752614/rss

At the 2018 Linux Storage, Filesystem, and Memory Management Summit, Ted
Ts’o introduced an integrity feature akin to dm-verity that targets Android,
at least to start with. It is meant to protect the integrity of files on
the system so that any tampering would be detectable. The
initial use case would be for a certain special type of Android file, but other
systems may find uses for it as well.

TSB Bank Disaster

Post Syndicated from Bruce Schneier original https://www.schneier.com/blog/archives/2018/04/tsb_bank_disast.html

This seems like an absolute disaster:

The very short version is that a UK bank, TSB, which had been merged into and then many years later was spun out of Lloyds Bank, was bought by the Spanish bank Banco Sabadell in 2015. Lloyds had continued to run the TSB systems and was to transfer them over to Sabadell over the weekend. It’s turned out to be an epic failure, and it’s not clear if and when this can be straightened out.

It is bad enough that bank IT problem had been so severe and protracted a major newspaper, The Guardian, created a live blog for it that has now been running for two days.

The more serious issue is the fact that customers still can’t access online accounts and even more disconcerting, are sometimes being allowed into other people’s accounts, says there are massive problems with data integrity. That’s a nightmare to sort out.

Even worse, the fact that this situation has persisted strongly suggests that Lloyds went ahead with the migration without allowing for a rollback.

This seems to be a mistake, and not enemy action.

AWS Achieves Spain’s ENS High Certification Across 29 Services

Post Syndicated from Oliver Bell original https://aws.amazon.com/blogs/security/aws-achieves-spains-ens-high-certification-across-29-services/

AWS has achieved Spain’s Esquema Nacional de Seguridad (ENS) High certification across 29 services. To successfully achieve the ENS High Standard, BDO España conducted an independent audit and attested that AWS meets confidentiality, integrity, and availability standards. This provides the assurance needed by Spanish Public Sector organizations wanting to build secure applications and services on AWS.

The National Security Framework, regulated under Royal Decree 3/2010, was developed through close collaboration between ENAC (Entidad Nacional de Acreditación), the Ministry of Finance and Public Administration and the CCN (National Cryptologic Centre), and other administrative bodies.

The following AWS Services are ENS High accredited across our Dublin and Frankfurt Regions:

  • Amazon API Gateway
  • Amazon DynamoDB
  • Amazon Elastic Container Service
  • Amazon Elastic Block Store
  • Amazon Elastic Compute Cloud
  • Amazon Elastic File System
  • Amazon Elastic MapReduce
  • Amazon ElastiCache
  • Amazon Glacier
  • Amazon Redshift
  • Amazon Relational Database Service
  • Amazon Simple Queue Service
  • Amazon Simple Storage Service
  • Amazon Simple Workflow Service
  • Amazon Virtual Private Cloud
  • Amazon WorkSpaces
  • AWS CloudFormation
  • AWS CloudTrail
  • AWS Config
  • AWS Database Migration Service
  • AWS Direct Connect
  • AWS Directory Service
  • AWS Elastic Beanstalk
  • AWS Key Management Service
  • AWS Lambda
  • AWS Snowball
  • AWS Storage Gateway
  • Elastic Load Balancing
  • VM Import/Export

AWS Key Management Service now offers FIPS 140-2 validated cryptographic modules enabling easier adoption of the service for regulated workloads

Post Syndicated from Sreekumar Pisharody original https://aws.amazon.com/blogs/security/aws-key-management-service-now-offers-fips-140-2-validated-cryptographic-modules-enabling-easier-adoption-of-the-service-for-regulated-workloads/

AWS Key Management Service (KMS) now uses FIPS 140-2 validated hardware security modules (HSM) and supports FIPS 140-2 validated endpoints, which provide independent assurances about the confidentiality and integrity of your keys. Having additional third-party assurances about the keys you manage in AWS KMS can make it easier to use the service for regulated workloads.

The process of gaining FIPS 140-2 validation is rigorous. First, AWS KMS HSMs were tested by an independent lab; those results were further reviewed by the Cryptographic Module Validation Program run by NIST. You can view the FIPS 140-2 certificate of the AWS Key Management Service HSM to get more details.

AWS KMS HSMs are designed so that no one, not even AWS employees, can retrieve your plaintext keys. The service uses the FIPS 140-2 validated HSMs to protect your keys when you request the service to create keys on your behalf or when you import them. Your plaintext keys are never written to disk and are only used in volatile memory of the HSMs while performing your requested cryptographic operation. Furthermore, AWS KMS keys are never transmitted outside the AWS Regions they were created. And HSM firmware updates are controlled by multi-party access that is audited and reviewed by an independent group within AWS.

AWS KMS HSMs are validated at level 2 overall and at level 3 in the following areas:

  • Cryptographic Module Specification
  • Roles, Services, and Authentication
  • Physical Security
  • Design Assurance

You can also make AWS KMS requests to API endpoints that terminate TLS sessions using a FIPS 140-2 validated cryptographic software module. To do so, connect to the unique FIPS 140-2 validated HTTPS endpoints in the AWS KMS requests made from your applications. AWS KMS FIPS 140-2 validated HTTPS endpoints are powered by the OpenSSL FIPS Object Module. FIPS 140-2 validated API endpoints are available in all commercial regions where AWS KMS is available.

Raspberry Pi 3 Model B+ on sale now at $35

Post Syndicated from Eben Upton original https://www.raspberrypi.org/blog/raspberry-pi-3-model-bplus-sale-now-35/

Here’s a long post. We think you’ll find it interesting. If you don’t have time to read it all, we recommend you watch this video, which will fill you in with everything you need, and then head straight to the product page to fill yer boots. (We recommend the video anyway, even if you do have time for a long read. ‘Cos it’s fab.)

A BRAND-NEW PI FOR π DAY

Raspberry Pi 3 Model B+ is now on sale now for $35, featuring: – A 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU – Dual-band 802.11ac wireless LAN and Bluetooth 4.2 – Faster Ethernet (Gigabit Ethernet over USB 2.0) – Power-over-Ethernet support (with separate PoE HAT) – Improved PXE network and USB mass-storage booting – Improved thermal management Alongside a 200MHz increase in peak CPU clock frequency, we have roughly three times the wired and wireless network throughput, and the ability to sustain high performance for much longer periods.

If you’ve been a Raspberry Pi watcher for a while now, you’ll have a bit of a feel for how we update our products. Just over two years ago, we released Raspberry Pi 3 Model B. This was our first 64-bit product, and our first product to feature integrated wireless connectivity. Since then, we’ve sold over nine million Raspberry Pi 3 units (we’ve sold 19 million Raspberry Pis in total), which have been put to work in schools, homes, offices and factories all over the globe.

Those Raspberry Pi watchers will know that we have a history of releasing improved versions of our products a couple of years into their lives. The first example was Raspberry Pi 1 Model B+, which added two additional USB ports, introduced our current form factor, and rolled up a variety of other feedback from the community. Raspberry Pi 2 didn’t get this treatment, of course, as it was superseded after only one year; but it feels like it’s high time that Raspberry Pi 3 received the “plus” treatment.

So, without further ado, Raspberry Pi 3 Model B+ is now on sale for $35 (the same price as the existing Raspberry Pi 3 Model B), featuring:

  • A 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU
  • Dual-band 802.11ac wireless LAN and Bluetooth 4.2
  • Faster Ethernet (Gigabit Ethernet over USB 2.0)
  • Power-over-Ethernet support (with separate PoE HAT)
  • Improved PXE network and USB mass-storage booting
  • Improved thermal management

Alongside a 200MHz increase in peak CPU clock frequency, we have roughly three times the wired and wireless network throughput, and the ability to sustain high performance for much longer periods.

Behold the shiny

Raspberry Pi 3B+ is available to buy today from our network of Approved Resellers.

New features, new chips

Roger Thornton did the design work on this revision of the Raspberry Pi. Here, he and I have a chat about what’s new.

Introducing the Raspberry Pi 3 Model B+

Raspberry Pi 3 Model B+ is now on sale now for $35, featuring: – A 1.4GHz 64-bit quad-core ARM Cortex-A53 CPU – Dual-band 802.11ac wireless LAN and Bluetooth 4.2 – Faster Ethernet (Gigabit Ethernet over USB 2.0) – Power-over-Ethernet support (with separate PoE HAT) – Improved PXE network and USB mass-storage booting – Improved thermal management Alongside a 200MHz increase in peak CPU clock frequency, we have roughly three times the wired and wireless network throughput, and the ability to sustain high performance for much longer periods.

The new product is built around BCM2837B0, an updated version of the 64-bit Broadcom application processor used in Raspberry Pi 3B, which incorporates power integrity optimisations, and a heat spreader (that’s the shiny metal bit you can see in the photos). Together these allow us to reach higher clock frequencies (or to run at lower voltages to reduce power consumption), and to more accurately monitor and control the temperature of the chip.

Dual-band wireless LAN and Bluetooth are provided by the Cypress CYW43455 “combo” chip, connected to a Proant PCB antenna similar to the one used on Raspberry Pi Zero W. Compared to its predecessor, Raspberry Pi 3B+ delivers somewhat better performance in the 2.4GHz band, and far better performance in the 5GHz band, as demonstrated by these iperf results from LibreELEC developer Milhouse.

Tx bandwidth (Mb/s)Rx bandwidth (Mb/s)
Raspberry Pi 3B35.735.6
Raspberry Pi 3B+ (2.4GHz)46.746.3
Raspberry Pi 3B+ (5GHz)102102

The wireless circuitry is encapsulated under a metal shield, rather fetchingly embossed with our logo. This has allowed us to certify the entire board as a radio module under FCC rules, which in turn will significantly reduce the cost of conformance testing Raspberry Pi-based products.

We’ll be teaching metalwork next.

Previous Raspberry Pi devices have used the LAN951x family of chips, which combine a USB hub and 10/100 Ethernet controller. For Raspberry Pi 3B+, Microchip have supported us with an upgraded version, LAN7515, which supports Gigabit Ethernet. While the USB 2.0 connection to the application processor limits the available bandwidth, we still see roughly a threefold increase in throughput compared to Raspberry Pi 3B. Again, here are some typical iperf results.

Tx bandwidth (Mb/s)Rx bandwidth (Mb/s)
Raspberry Pi 3B94.195.5
Raspberry Pi 3B+315315

We use a magjack that supports Power over Ethernet (PoE), and bring the relevant signals to a new 4-pin header. We will shortly launch a PoE HAT which can generate the 5V necessary to power the Raspberry Pi from the 48V PoE supply.

There… are… four… pins!

Coming soon to a Raspberry Pi 3B+ near you

Raspberry Pi 3B was our first product to support PXE Ethernet boot. Testing it in the wild shook out a number of compatibility issues with particular switches and traffic environments. Gordon has rolled up fixes for all known issues into the BCM2837B0 boot ROM, and PXE boot is now enabled by default.

Clocking, voltages and thermals

The improved power integrity of the BCM2837B0 package, and the improved regulation accuracy of our new MaxLinear MxL7704 power management IC, have allowed us to tune our clocking and voltage rules for both better peak performance and longer-duration sustained performance.

Below 70°C, we use the improvements to increase the core frequency to 1.4GHz. Above 70°C, we drop to 1.2GHz, and use the improvements to decrease the core voltage, increasing the period of time before we reach our 80°C thermal throttle; the reduction in power consumption is such that many use cases will never reach the throttle. Like a modern smartphone, we treat the thermal mass of the device as a resource, to be spent carefully with the goal of optimising user experience.

This graph, courtesy of Gareth Halfacree, demonstrates that Raspberry Pi 3B+ runs faster and at a lower temperature for the duration of an eight‑minute quad‑core Sysbench CPU test.

Note that Raspberry Pi 3B+ does consume substantially more power than its predecessor. We strongly encourage you to use a high-quality 2.5A power supply, such as the official Raspberry Pi Universal Power Supply.

FAQs

We’ll keep updating this list over the next couple of days, but here are a few to get you started.

Are you discontinuing earlier Raspberry Pi models?

No. We have a lot of industrial customers who will want to stick with the existing products for the time being. We’ll keep building these models for as long as there’s demand. Raspberry Pi 1B+, Raspberry Pi 2B, and Raspberry Pi 3B will continue to sell for $25, $35, and $35 respectively.

What about Model A+?

Raspberry Pi 1A+ continues to be the $20 entry-level “big” Raspberry Pi for the time being. We are considering the possibility of producing a Raspberry Pi 3A+ in due course.

What about the Compute Module?

CM1, CM3 and CM3L will continue to be available. We may offer versions of CM3 and CM3L with BCM2837B0 in due course, depending on customer demand.

Are you still using VideoCore?

Yes. VideoCore IV 3D is the only publicly-documented 3D graphics core for ARM‑based SoCs, and we want to make Raspberry Pi more open over time, not less.

Credits

A project like this requires a vast amount of focused work from a large team over an extended period. Particular credit is due to Roger Thornton, who designed the board and ran the exhaustive (and exhausting) RF compliance campaign, and to the team at the Sony UK Technology Centre in Pencoed, South Wales. A partial list of others who made major direct contributions to the BCM2837B0 chip program, CYW43455 integration, LAN7515 and MxL7704 developments, and Raspberry Pi 3B+ itself follows:

James Adams, David Armour, Jonathan Bell, Maria Blazquez, Jamie Brogan-Shaw, Mike Buffham, Rob Campling, Cindy Cao, Victor Carmon, KK Chan, Nick Chase, Nigel Cheetham, Scott Clark, Nigel Clift, Dominic Cobley, Peter Coyle, John Cronk, Di Dai, Kurt Dennis, David Doyle, Andrew Edwards, Phil Elwell, John Ferdinand, Doug Freegard, Ian Furlong, Shawn Guo, Philip Harrison, Jason Hicks, Stefan Ho, Andrew Hoare, Gordon Hollingworth, Tuomas Hollman, EikPei Hu, James Hughes, Andy Hulbert, Anand Jain, David John, Prasanna Kerekoppa, Shaik Labeeb, Trevor Latham, Steve Le, David Lee, David Lewsey, Sherman Li, Xizhe Li, Simon Long, Fu Luo Larson, Juan Martinez, Sandhya Menon, Ben Mercer, James Mills, Max Passell, Mark Perry, Eric Phiri, Ashwin Rao, Justin Rees, James Reilly, Matt Rowley, Akshaye Sama, Ian Saturley, Serge Schneider, Manuel Sedlmair, Shawn Shadburn, Veeresh Shivashimper, Graham Smith, Ben Stephens, Mike Stimson, Yuree Tchong, Stuart Thomson, John Wadsworth, Ian Watch, Sarah Williams, Jason Zhu.

If you’re not on this list and think you should be, please let me know, and accept my apologies.

The post Raspberry Pi 3 Model B+ on sale now at $35 appeared first on Raspberry Pi.

When You Have A Blockchain, Everything Looks Like a Nail

Post Syndicated from Bozho original https://techblog.bozho.net/blockchain-everything-looks-like-nail/

Blockchain, AI, big data, NoSQL, microservices, single page applications, cloud, SOA. What do these have in common? They have been or are hyped. At some point they were “the big thing” du jour. Everyone was investigating the possibility of using them, everyone was talking about them, there were meetups, conferences, articles on Hacker news and reddit. There are more examples, of course (which is the javascript framework this month?) but I’ll focus my examples on those above.

Another thing they have in common is that they are useful. All of them have some pretty good applications that are definitely worth the time and investment.

Yet another thing they have in common is that they are far from universally applicable. I’ve argued that monoliths are often still the better approach and that microservices introduce too much complexity for the average project. Big Data is something very few organizations actually have; AI/machine learning can help a wide variety of problems, but it is just a tool in a toolbox, not the solution to all problems. Single page applications are great for, yeah, applications, but most websites are still websites, not feature-rich frontends – you don’t need an SPA for every type of website. NoSQL has solved niche issues, and issues of scale that few companies have had, but nothing beats a good old relational database for the typical project out there. “The cloud” is not always where you want your software to be; and SOA just means everything (ESBs, direct integrations, even microservices, according to some). And the blockchain – it seems to be having limited success beyond cryptocurrencies.

And finally, another trait many of them share is that the hype has settled down. Only yesterday I read an article about the “death of the microservices madness”. I don’t see nearly as many new NoSQL databases as a few years ago, some of the projects that have been popular have faded. SOA and “the cloud” are already “boring”, and we’ve realized we don’t actually have big data if it fits in an Excel spreadsheet. SPAs and AI are still high in popularity, but we are getting a good understanding as a community why and when they are useful.

But it seems that nuanced reality has never stopped us from hyping a particular technology or approach. And maybe that’s okay in order to get a promising, though niche, technology, the spotlight and let it shine in the particular usecases where it fits.

But countless projects have and will suffer from our collective inability to filter through these hypes. I’d bet millions of developer hours have been wasted in trying to use the above technologies where they just didn’t fit. It’s like that scene from Idiocracy where a guy tries to fit a rectangular figure into a circular hole.

And the new one is not “the blockchain”. I won’t repeat my rant, but in summary – it doesn’t solve many of the problems companies are trying to solve with it right now just because it’s cool. Or at least it doesn’t solve them better than existing solutions. Many pilots will be carried out, many hours will be wasted in figuring out why that thing doesn’t work. A few of those projects will be a good fit and will actually bring value.

Do you need to reach multi-party consensus for the data you store? Can all stakeholder support the infrastructure to run their node(s)? Do they have the staff to administer the node(s)? Do you need to execute distributed application code on the data? Won’t it be easier to just deploy RESTful APIs and integrate the parties through that? Do you need to store all the data, or just parts of it, to guarantee data integrity?

“If you have is a hammer, everything looks like a nail” as the famous saying goes. In the software industry we repeatedly find new and cool hammers and then try to hit as many nails as we can. But only few of them are actual nails. The rest remain ugly, hard to support, “who was the idiot that wrote this” and “I wasn’t here when the decisions were made” types of projects.

I don’t have the illusion that we will calm down and skip the next hypes. Especially if adding the hyped word to your company raises your stock price. But if there’s one thing I’d like people to ask themselves when choosing a technology stack, it is “do we really need that to solve our problems?”.

If the answer is really “yes”, then great, go ahead and deploy the multi-organization permissioned blockchain, or fork Ethereum, or whatever. If not, you can still do a project a home that you can safely abandon. And if you need some pilot project to figure out whether the new piece of technology would be beneficial – go ahead and try it. But have a baseline – the fact that it somehow worked doesn’t mean it’s better than old, tested models of doing the same thing.

The post When You Have A Blockchain, Everything Looks Like a Nail appeared first on Bozho's tech blog.

The Top 10 Most Downloaded AWS Security and Compliance Documents in 2017

Post Syndicated from Sara Duffer original https://aws.amazon.com/blogs/security/the-top-10-most-downloaded-aws-security-and-compliance-documents-in-2017/

AWS download logo

The following list includes the ten most downloaded AWS security and compliance documents in 2017. Using this list, you can learn about what other AWS customers found most interesting about security and compliance last year.

  1. AWS Security Best Practices – This guide is intended for customers who are designing the security infrastructure and configuration for applications running on AWS. The guide provides security best practices that will help you define your Information Security Management System (ISMS) and build a set of security policies and processes for your organization so that you can protect your data and assets in the AWS Cloud.
  2. AWS: Overview of Security Processes – This whitepaper describes the physical and operational security processes for the AWS managed network and infrastructure, and helps answer questions such as, “How does AWS help me protect my data?”
  3. Architecting for HIPAA Security and Compliance on AWS – This whitepaper describes how to leverage AWS to develop applications that meet HIPAA and HITECH compliance requirements.
  4. Service Organization Controls (SOC) 3 Report – This publicly available report describes internal AWS security controls, availability, processing integrity, confidentiality, and privacy.
  5. Introduction to AWS Security –This document provides an introduction to AWS’s approach to security, including the controls in the AWS environment, and some of the products and features that AWS makes available to customers to meet your security objectives.
  6. AWS Best Practices for DDoS Resiliency – This whitepaper covers techniques to mitigate distributed denial of service (DDoS) attacks.
  7. AWS: Risk and Compliance – This whitepaper provides information to help customers integrate AWS into their existing control framework, including a basic approach for evaluating AWS controls and a description of AWS certifications, programs, reports, and third-party attestations.
  8. Use AWS WAF to Mitigate OWASP’s Top 10 Web Application Vulnerabilities – AWS WAF is a web application firewall that helps you protect your websites and web applications against various attack vectors at the HTTP protocol level. This whitepaper outlines how you can use AWS WAF to mitigate the application vulnerabilities that are defined in the Open Web Application Security Project (OWASP) Top 10 list of most common categories of application security flaws.
  9. Introduction to Auditing the Use of AWS – This whitepaper provides information, tools, and approaches for auditors to use when auditing the security of the AWS managed network and infrastructure.
  10. AWS Security and Compliance: Quick Reference Guide – By using AWS, you inherit the many security controls that we operate, thus reducing the number of security controls that you need to maintain. Your own compliance and certification programs are strengthened while at the same time lowering your cost to maintain and run your specific security assurance requirements. Learn more in this quick reference guide.

– Sara

AWS Updated Its ISO Certifications and Now Has 67 Services Under ISO Compliance

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/aws-updated-its-iso-certifications-and-now-has-67-services-under-iso-compliance/

ISO logo

AWS has updated its certifications against ISO 9001, ISO 27001, ISO 27017, and ISO 27018 standards, bringing the total to 67 services now under ISO compliance. We added the following 29 services this cycle:

Amazon AuroraAmazon S3 Transfer AccelerationAWS [email protected]
Amazon Cloud DirectoryAmazon SageMakerAWS Managed Services
Amazon CloudWatch LogsAmazon Simple Notification ServiceAWS OpsWorks Stacks
Amazon CognitoAuto ScalingAWS Shield
Amazon ConnectAWS BatchAWS Snowball Edge
Amazon Elastic Container RegistryAWS CodeBuildAWS Snowmobile
Amazon InspectorAWS CodeCommitAWS Step Functions
Amazon Kinesis Data StreamsAWS CodeDeployAWS Systems Manager (formerly Amazon EC2 Systems Manager)
Amazon MacieAWS CodePipelineAWS X-Ray
Amazon QuickSightAWS IoT Core

For the complete list of services under ISO compliance, see AWS Services in Scope by Compliance Program.

AWS maintains certifications through extensive audits of its controls to ensure that information security risks that affect the confidentiality, integrity, and availability of company and customer information are appropriately managed.

You can download copies of the AWS ISO certificates that contain AWS’s in-scope services and Regions, and use these certificates to jump-start your own certification efforts:

AWS does not increase service costs in any AWS Region as a result of updating its certifications.

To learn more about compliance in the AWS Cloud, see AWS Cloud Compliance.

– Chad

Protecting code integrity with PGP

Post Syndicated from jake original https://lwn.net/Articles/741454/rss

Linux Foundation Director of IT infrastructure security, Konstantin Ryabitsev, has put together a lengthy guide to using Git and PGP to protect the integrity of source code. In a Google+ post, he called it “beta quality” and asked for help with corrections and fixes. “PGP incorporates a trust delegation mechanism known as the ‘Web of Trust.’ At its core, this is an attempt to replace the need for centralized Certification Authorities of the HTTPS/TLS world. Instead of various software makers dictating who should be your trusted certifying entity, PGP leaves this responsibility to each user.

Unfortunately, very few people understand how the Web of Trust works, and even fewer bother to keep it going. It remains an important aspect of the OpenPGP specification, but recent versions of GnuPG (2.2 and above) have implemented an alternative mechanism called ‘Trust on First Use’ (TOFU).

You can think of TOFU as ‘the SSH-like approach to trust.’ With SSH, the first time you connect to a remote system, its key fingerprint is recorded and remembered. If the key changes in the future, the SSH client will alert you and refuse to connect, forcing you to make a decision on whether you choose to trust the changed key or not.

Similarly, the first time you import someone’s PGP key, it is assumed to be trusted. If at any point in the future GnuPG comes across another key with the same identity, both the previously imported key and the new key will be marked as invalid and you will need to manually figure out which one to keep.

In this guide, we will be using the TOFU trust model.”

How to Easily Apply Amazon Cloud Directory Schema Changes with In-Place Schema Upgrades

Post Syndicated from Mahendra Chheda original https://aws.amazon.com/blogs/security/how-to-easily-apply-amazon-cloud-directory-schema-changes-with-in-place-schema-upgrades/

Now, Amazon Cloud Directory makes it easier for you to apply schema changes across your directories with in-place schema upgrades. Your directory now remains available while Cloud Directory applies backward-compatible schema changes such as the addition of new fields. Without migrating data between directories or applying code changes to your applications, you can upgrade your schemas. You also can view the history of your schema changes in Cloud Directory by using version identifiers, which help you track and audit schema versions across directories. If you have multiple instances of a directory with the same schema, you can view the version history of schema changes to manage your directory fleet and ensure that all directories are running with the same schema version.

In this blog post, I demonstrate how to perform an in-place schema upgrade and use schema versions in Cloud Directory. I add additional attributes to an existing facet and add a new facet to a schema. I then publish the new schema and apply it to running directories, upgrading the schema in place. I also show how to view the version history of a directory schema, which helps me to ensure my directory fleet is running the same version of the schema and has the correct history of schema changes applied to it.

Note: I share Java code examples in this post. I assume that you are familiar with the AWS SDK and can use Java-based code to build a Cloud Directory code example. You can apply the concepts I cover in this post to other programming languages such as Python and Ruby.

Cloud Directory fundamentals

I will start by covering a few Cloud Directory fundamentals. If you are already familiar with the concepts behind Cloud Directory facets, schemas, and schema lifecycles, you can skip to the next section.

Facets: Groups of attributes. You use facets to define object types. For example, you can define a device schema by adding facets such as computers, phones, and tablets. A computer facet can track attributes such as serial number, make, and model. You can then use the facets to create computer objects, phone objects, and tablet objects in the directory to which the schema applies.

Schemas: Collections of facets. Schemas define which types of objects can be created in a directory (such as users, devices, and organizations) and enforce validation of data for each object class. All data within a directory must conform to the applied schema. As a result, the schema definition is essentially a blueprint to construct a directory with an applied schema.

Schema lifecycle: The four distinct states of a schema: Development, Published, Applied, and Deleted. Schemas in the Published and Applied states have version identifiers and cannot be changed. Schemas in the Applied state are used by directories for validation as applications insert or update data. You can change schemas in the Development state as many times as you need them to. In-place schema upgrades allow you to apply schema changes to an existing Applied schema in a production directory without the need to export and import the data populated in the directory.

How to add attributes to a computer inventory application schema and perform an in-place schema upgrade

To demonstrate how to set up schema versioning and perform an in-place schema upgrade, I will use an example of a computer inventory application that uses Cloud Directory to store relationship data. Let’s say that at my company, AnyCompany, we use this computer inventory application to track all computers we give to our employees for work use. I previously created a ComputerSchema and assigned its version identifier as 1. This schema contains one facet called ComputerInfo that includes attributes for SerialNumber, Make, and Model, as shown in the following schema details.

Schema: ComputerSchema
Version: 1

Facet: ComputerInfo
Attribute: SerialNumber, type: Integer
Attribute: Make, type: String
Attribute: Model, type: String

AnyCompany has offices in Seattle, Portland, and San Francisco. I have deployed the computer inventory application for each of these three locations. As shown in the lower left part of the following diagram, ComputerSchema is in the Published state with a version of 1. The Published schema is applied to SeattleDirectory, PortlandDirectory, and SanFranciscoDirectory for AnyCompany’s three locations. Implementing separate directories for different geographic locations when you don’t have any queries that cross location boundaries is a good data partitioning strategy and gives your application better response times with lower latency.

Diagram of ComputerSchema in Published state and applied to three directories

Legend for the diagrams in this post

The following code example creates the schema in the Development state by using a JSON file, publishes the schema, and then creates directories for the Seattle, Portland, and San Francisco locations. For this example, I assume the schema has been defined in the JSON file. The createSchema API creates a schema Amazon Resource Name (ARN) with the name defined in the variable, SCHEMA_NAME. I can use the putSchemaFromJson API to add specific schema definitions from the JSON file.

// The utility method to get valid Cloud Directory schema JSON
String validJson = getJsonFile("ComputerSchema_version_1.json")

String SCHEMA_NAME = "ComputerSchema";

String developmentSchemaArn = client.createSchema(new CreateSchemaRequest()
        .withName(SCHEMA_NAME))
        .getSchemaArn();

// Put the schema document in the Development schema
PutSchemaFromJsonResult result = client.putSchemaFromJson(new PutSchemaFromJsonRequest()
        .withSchemaArn(developmentSchemaArn)
        .withDocument(validJson));

The following code example takes the schema that is currently in the Development state and publishes the schema, changing its state to Published.

String SCHEMA_VERSION = "1";
String publishedSchemaArn = client.publishSchema(
        new PublishSchemaRequest()
        .withDevelopmentSchemaArn(developmentSchemaArn)
        .withVersion(SCHEMA_VERSION))
        .getPublishedSchemaArn();

// Our Published schema ARN is as follows
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1

The following code example creates a directory named SeattleDirectory and applies the published schema. The createDirectory API call creates a directory by using the published schema provided in the API parameters. Note that Cloud Directory stores a version of the schema in the directory in the Applied state. I will use similar code to create directories for PortlandDirectory and SanFranciscoDirectory.

String DIRECTORY_NAME = "SeattleDirectory"; 

CreateDirectoryResult directory = client.createDirectory(
        new CreateDirectoryRequest()
        .withName(DIRECTORY_NAME)
        .withSchemaArn(publishedSchemaArn));

String directoryArn = directory.getDirectoryArn();
String appliedSchemaArn = directory.getAppliedSchemaArn();

// This code section can be reused to create directories for Portland and San Francisco locations with the appropriate directory names

// Our directory ARN is as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX

// Our applied schema ARN is as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1

Revising a schema

Now let’s say my company, AnyCompany, wants to add more information for computers and to track which employees have been assigned a computer for work use. I modify the schema to add two attributes to the ComputerInfo facet: Description and OSVersion (operating system version). I make Description optional because it is not important for me to track this attribute for the computer objects I create. I make OSVersion mandatory because it is critical for me to track it for all computer objects so that I can make changes such as applying security patches or making upgrades. Because I make OSVersion mandatory, I must provide a default value that Cloud Directory will apply to objects that were created before the schema revision, in order to handle backward compatibility. Note that you can replace the value in any object with a different value.

I also add a new facet to track computer assignment information, shown in the following updated schema as the ComputerAssignment facet. This facet tracks these additional attributes: Name (the name of the person to whom the computer is assigned), EMail (the email address of the assignee), Department, and department CostCenter. Note that Cloud Directory refers to the previously available version identifier as the Major Version. Because I can now add a minor version to a schema, I also denote the changed schema as Minor Version A.

Schema: ComputerSchema
Major Version: 1
Minor Version: A 

Facet: ComputerInfo
Attribute: SerialNumber, type: Integer 
Attribute: Make, type: String
Attribute: Model, type: Integer
Attribute: Description, type: String, required: NOT_REQUIRED
Attribute: OSVersion, type: String, required: REQUIRED_ALWAYS, default: "Windows 7"

Facet: ComputerAssignment
Attribute: Name, type: String
Attribute: EMail, type: String
Attribute: Department, type: String
Attribute: CostCenter, type: Integer

The following diagram shows the changes that were made when I added another facet to the schema and attributes to the existing facet. The highlighted area of the diagram (bottom left) shows that the schema changes were published.

Diagram showing that schema changes were published

The following code example revises the existing Development schema by adding the new attributes to the ComputerInfo facet and by adding the ComputerAssignment facet. I use a new JSON file for the schema revision, and for the purposes of this example, I am assuming the JSON file has the full schema including planned revisions.

// The utility method to get a valid CloudDirectory schema JSON
String schemaJson = getJsonFile("ComputerSchema_version_1_A.json")

// Put the schema document in the Development schema
PutSchemaFromJsonResult result = client.putSchemaFromJson(
        new PutSchemaFromJsonRequest()
        .withSchemaArn(developmentSchemaArn)
        .withDocument(schemaJson));

Upgrading the Published schema

The following code example performs an in-place schema upgrade of the Published schema with schema revisions (it adds new attributes to the existing facet and another facet to the schema). The upgradePublishedSchema API upgrades the Published schema with backward-compatible changes from the Development schema.

// From an earlier code example, I know the publishedSchemaArn has this value: "arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1"

// Upgrade publishedSchemaArn to minorVersion A. The Development schema must be backward compatible with 
// the existing publishedSchemaArn. 

String minorVersion = "A"

UpgradePublishedSchemaResult upgradePublishedSchemaResult = client.upgradePublishedSchema(new UpgradePublishedSchemaRequest()
        .withDevelopmentSchemaArn(developmentSchemaArn)
        .withPublishedSchemaArn(publishedSchemaArn)
        .withMinorVersion(minorVersion));

String upgradedPublishedSchemaArn = upgradePublishedSchemaResult.getUpgradedSchemaArn();

// The Published schema ARN after the upgrade shows a minor version as follows 
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:schema/published/ComputerSchema/1/A

Upgrading the Applied schema

The following diagram shows the in-place schema upgrade for the SeattleDirectory directory. I am performing the schema upgrade so that I can reflect the new schemas in all three directories. As a reminder, I added new attributes to the ComputerInfo facet and also added the ComputerAssignment facet. After the schema and directory upgrade, I can create objects for the ComputerInfo and ComputerAssignment facets in the SeattleDirectory. Any objects that were created with the old facet definition for ComputerInfo will now use the default values for any additional attributes defined in the new schema.

Diagram of the in-place schema upgrade for the SeattleDirectory directory

I use the following code example to perform an in-place upgrade of the SeattleDirectory to a Major Version of 1 and a Minor Version of A. Note that you should change a Major Version identifier in a schema to make backward-incompatible changes such as changing the data type of an existing attribute or dropping a mandatory attribute from your schema. Backward-incompatible changes require directory data migration from a previous version to the new version. You should change a Minor Version identifier in a schema to make backward-compatible upgrades such as adding additional attributes or adding facets, which in turn may contain one or more attributes. The upgradeAppliedSchema API lets me upgrade an existing directory with a different version of a schema.

// This upgrades ComputerSchema version 1 of the Applied schema in SeattleDirectory to Major Version 1 and Minor Version A
// The schema must be backward compatible or the API will fail with IncompatibleSchemaException

UpgradeAppliedSchemaResult upgradeAppliedSchemaResult = client.upgradeAppliedSchema(new UpgradeAppliedSchemaRequest()
        .withDirectoryArn(directoryArn)
        .withPublishedSchemaArn(upgradedPublishedSchemaArn));

String upgradedAppliedSchemaArn = upgradeAppliedSchemaResult.getUpgradedSchemaArn();

// The Applied schema ARN after the in-place schema upgrade will appear as follows
// arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1

// This code section can be reused to upgrade directories for the Portland and San Francisco locations with the appropriate directory ARN

Note: Cloud Directory has excluded returning the Minor Version identifier in the Applied schema ARN for backward compatibility and to enable the application to work across older and newer versions of the directory.

The following diagram shows the changes that are made when I perform an in-place schema upgrade in the two remaining directories, PortlandDirectory and SanFranciscoDirectory. I make these calls sequentially, upgrading PortlandDirectory first and then upgrading SanFranciscoDirectory. I use the same code example that I used earlier to upgrade SeattleDirectory. Now, all my directories are running the most current version of the schema. Also, I made these schema changes without having to migrate data and while maintaining my application’s high availability.

Diagram showing the changes that are made with an in-place schema upgrade in the two remaining directories

Schema revision history

I can now view the schema revision history for any of AnyCompany’s directories by using the listAppliedSchemaArns API. Cloud Directory maintains the five most recent versions of applied schema changes. Similarly, to inspect the current Minor Version that was applied to my schema, I use the getAppliedSchemaVersion API. The listAppliedSchemaArns API returns the schema ARNs based on my schema filter as defined in withSchemaArn.

I use the following code example to query an Applied schema for its version history.

// This returns the five most recent Minor Versions associated with a Major Version
ListAppliedSchemaArnsResult listAppliedSchemaArnsResult = client.listAppliedSchemaArns(new ListAppliedSchemaArnsRequest()
        .withDirectoryArn(directoryArn)
        .withSchemaArn(upgradedAppliedSchemaArn));

// Note: The listAppliedSchemaArns API without the SchemaArn filter returns all the Major Versions in a directory

The listAppliedSchemaArns API returns the two ARNs as shown in the following output.

arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1
arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

The following code example queries an Applied schema for current Minor Version by using the getAppliedSchemaVersion API.

// This returns the current Applied schema's Minor Version ARN 

GetAppliedSchemaVersion getAppliedSchemaVersionResult = client.getAppliedSchemaVersion(new GetAppliedSchemaVersionRequest()
	.withSchemaArn(upgradedAppliedSchemaArn));

The getAppliedSchemaVersion API returns the current Applied schema ARN with a Minor Version, as shown in the following output.

arn:aws:clouddirectory:us-west-2:XXXXXXXXXXXX:directory/XX_DIRECTORY_GUID_XX/schema/ComputerSchema/1/A

If you have a lot of directories, schema revision API calls can help you audit your directory fleet and ensure that all directories are running the same version of a schema. Such auditing can help you ensure high integrity of directories across your fleet.

Summary

You can use in-place schema upgrades to make changes to your directory schema as you evolve your data set to match the needs of your application. An in-place schema upgrade allows you to maintain high availability for your directory and applications while the upgrade takes place. For more information about in-place schema upgrades, see the in-place schema upgrade documentation.

If you have comments about this blog post, submit them in the “Comments” section below. If you have questions about implementing the solution in this post, start a new thread in the Directory Service forum or contact AWS Support.

– Mahendra

 

Implementing Dynamic ETL Pipelines Using AWS Step Functions

Post Syndicated from Tara Van Unen original https://aws.amazon.com/blogs/compute/implementing-dynamic-etl-pipelines-using-aws-step-functions/

This post contributed by:
Wangechi Dole, AWS Solutions Architect
Milan Krasnansky, ING, Digital Solutions Developer, SGK
Rian Mookencherry, Director – Product Innovation, SGK

Data processing and transformation is a common use case you see in our customer case studies and success stories. Often, customers deal with complex data from a variety of sources that needs to be transformed and customized through a series of steps to make it useful to different systems and stakeholders. This can be difficult due to the ever-increasing volume, velocity, and variety of data. Today, data management challenges cannot be solved with traditional databases.

Workflow automation helps you build solutions that are repeatable, scalable, and reliable. You can use AWS Step Functions for this. A great example is how SGK used Step Functions to automate the ETL processes for their client. With Step Functions, SGK has been able to automate changes within the data management system, substantially reducing the time required for data processing.

In this post, SGK shares the details of how they used Step Functions to build a robust data processing system based on highly configurable business transformation rules for ETL processes.

SGK: Building dynamic ETL pipelines

SGK is a subsidiary of Matthews International Corporation, a diversified organization focusing on brand solutions and industrial technologies. SGK’s Global Content Creation Studio network creates compelling content and solutions that connect brands and products to consumers through multiple assets including photography, video, and copywriting.

We were recently contracted to build a sophisticated and scalable data management system for one of our clients. We chose to build the solution on AWS to leverage advanced, managed services that help to improve the speed and agility of development.

The data management system served two main functions:

  1. Ingesting a large amount of complex data to facilitate both reporting and product funding decisions for the client’s global marketing and supply chain organizations.
  2. Processing the data through normalization and applying complex algorithms and data transformations. The system goal was to provide information in the relevant context—such as strategic marketing, supply chain, product planning, etc. —to the end consumer through automated data feeds or updates to existing ETL systems.

We were faced with several challenges:

  • Output data that needed to be refreshed at least twice a day to provide fresh datasets to both local and global markets. That constant data refresh posed several challenges, especially around data management and replication across multiple databases.
  • The complexity of reporting business rules that needed to be updated on a constant basis.
  • Data that could not be processed as contiguous blocks of typical time-series data. The measurement of the data was done across seasons (that is, combination of dates), which often resulted with up to three overlapping seasons at any given time.
  • Input data that came from 10+ different data sources. Each data source ranged from 1–20K rows with as many as 85 columns per input source.

These challenges meant that our small Dev team heavily invested time in frequent configuration changes to the system and data integrity verification to make sure that everything was operating properly. Maintaining this system proved to be a daunting task and that’s when we turned to Step Functions—along with other AWS services—to automate our ETL processes.

Solution overview

Our solution included the following AWS services:

  • AWS Step Functions: Before Step Functions was available, we were using multiple Lambda functions for this use case and running into memory limit issues. With Step Functions, we can execute steps in parallel simultaneously, in a cost-efficient manner, without running into memory limitations.
  • AWS Lambda: The Step Functions state machine uses Lambda functions to implement the Task states. Our Lambda functions are implemented in Java 8.
  • Amazon DynamoDB provides us with an easy and flexible way to manage business rules. We specify our rules as Keys. These are key-value pairs stored in a DynamoDB table.
  • Amazon RDS: Our ETL pipelines consume source data from our RDS MySQL database.
  • Amazon Redshift: We use Amazon Redshift for reporting purposes because it integrates with our BI tools. Currently we are using Tableau for reporting which integrates well with Amazon Redshift.
  • Amazon S3: We store our raw input files and intermediate results in S3 buckets.
  • Amazon CloudWatch Events: Our users expect results at a specific time. We use CloudWatch Events to trigger Step Functions on an automated schedule.

Solution architecture

This solution uses a declarative approach to defining business transformation rules that are applied by the underlying Step Functions state machine as data moves from RDS to Amazon Redshift. An S3 bucket is used to store intermediate results. A CloudWatch Event rule triggers the Step Functions state machine on a schedule. The following diagram illustrates our architecture:

Here are more details for the above diagram:

  1. A rule in CloudWatch Events triggers the state machine execution on an automated schedule.
  2. The state machine invokes the first Lambda function.
  3. The Lambda function deletes all existing records in Amazon Redshift. Depending on the dataset, the Lambda function can create a new table in Amazon Redshift to hold the data.
  4. The same Lambda function then retrieves Keys from a DynamoDB table. Keys represent specific marketing campaigns or seasons and map to specific records in RDS.
  5. The state machine executes the second Lambda function using the Keys from DynamoDB.
  6. The second Lambda function retrieves the referenced dataset from RDS. The records retrieved represent the entire dataset needed for a specific marketing campaign.
  7. The second Lambda function executes in parallel for each Key retrieved from DynamoDB and stores the output in CSV format temporarily in S3.
  8. Finally, the Lambda function uploads the data into Amazon Redshift.

To understand the above data processing workflow, take a closer look at the Step Functions state machine for this example.

We walk you through the state machine in more detail in the following sections.

Walkthrough

To get started, you need to:

  • Create a schedule in CloudWatch Events
  • Specify conditions for RDS data extracts
  • Create Amazon Redshift input files
  • Load data into Amazon Redshift

Step 1: Create a schedule in CloudWatch Events
Create rules in CloudWatch Events to trigger the Step Functions state machine on an automated schedule. The following is an example cron expression to automate your schedule:

In this example, the cron expression invokes the Step Functions state machine at 3:00am and 2:00pm (UTC) every day.

Step 2: Specify conditions for RDS data extracts
We use DynamoDB to store Keys that determine which rows of data to extract from our RDS MySQL database. An example Key is MCS2017, which stands for, Marketing Campaign Spring 2017. Each campaign has a specific start and end date and the corresponding dataset is stored in RDS MySQL. A record in RDS contains about 600 columns, and each Key can represent up to 20K records.

A given day can have multiple campaigns with different start and end dates running simultaneously. In the following example DynamoDB item, three campaigns are specified for the given date.

The state machine example shown above uses Keys 31, 32, and 33 in the first ChoiceState and Keys 21 and 22 in the second ChoiceState. These keys represent marketing campaigns for a given day. For example, on Monday, there are only two campaigns requested. The ChoiceState with Keys 21 and 22 is executed. If three campaigns are requested on Tuesday, for example, then ChoiceState with Keys 31, 32, and 33 is executed. MCS2017 can be represented by Key 21 and Key 33 on Monday and Tuesday, respectively. This approach gives us the flexibility to add or remove campaigns dynamically.

Step 3: Create Amazon Redshift input files
When the state machine begins execution, the first Lambda function is invoked as the resource for FirstState, represented in the Step Functions state machine as follows:

"Comment": ” AWS Amazon States Language.", 
  "StartAt": "FirstState",
 
"States": { 
  "FirstState": {
   
"Type": "Task",
   
"Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Start",
    "Next": "ChoiceState" 
  } 

As described in the solution architecture, the purpose of this Lambda function is to delete existing data in Amazon Redshift and retrieve keys from DynamoDB. In our use case, we found that deleting existing records was more efficient and less time-consuming than finding the delta and updating existing records. On average, an Amazon Redshift table can contain about 36 million cells, which translates to roughly 65K records. The following is the code snippet for the first Lambda function in Java 8:

public class LambdaFunctionHandler implements RequestHandler<Map<String,Object>,Map<String,String>> {
    Map<String,String> keys= new HashMap<>();
    public Map<String, String> handleRequest(Map<String, Object> input, Context context){
       Properties config = getConfig(); 
       // 1. Cleaning Redshift Database
       new RedshiftDataService(config).cleaningTable(); 
       // 2. Reading data from Dynamodb
       List<String> keyList = new DynamoDBDataService(config).getCurrentKeys();
       for(int i = 0; i < keyList.size(); i++) {
           keys.put(”key" + (i+1), keyList.get(i)); 
       }
       keys.put(”key" + T,String.valueOf(keyList.size()));
       // 3. Returning the key values and the key count from the “for” loop
       return (keys);
}

The following JSON represents ChoiceState.

"ChoiceState": {
   "Type" : "Choice",
   "Choices": [ 
   {

      "Variable": "$.keyT",
     "StringEquals": "3",
     "Next": "CurrentThreeKeys" 
   }, 
   {

     "Variable": "$.keyT",
    "StringEquals": "2",
    "Next": "CurrentTwooKeys" 
   } 
 ], 
 "Default": "DefaultState"
}

The variable $.keyT represents the number of keys retrieved from DynamoDB. This variable determines which of the parallel branches should be executed. At the time of publication, Step Functions does not support dynamic parallel state. Therefore, choices under ChoiceState are manually created and assigned hardcoded StringEquals values. These values represent the number of parallel executions for the second Lambda function.

For example, if $.keyT equals 3, the second Lambda function is executed three times in parallel with keys, $key1, $key2 and $key3 retrieved from DynamoDB. Similarly, if $.keyT equals two, the second Lambda function is executed twice in parallel.  The following JSON represents this parallel execution:

"CurrentThreeKeys": { 
  "Type": "Parallel",
  "Next": "NextState",
  "Branches": [ 
  {

     "StartAt": “key31",
    "States": { 
       “key31": {

          "Type": "Task",
        "InputPath": "$.key1",
        "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
        "End": true 
       } 
    } 
  }, 
  {

     "StartAt": “key32",
    "States": { 
     “key32": {

        "Type": "Task",
       "InputPath": "$.key2",
         "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
       "End": true 
      } 
     } 
   }, 
   {

      "StartAt": “key33",
       "States": { 
          “key33": {

                "Type": "Task",
             "InputPath": "$.key3",
             "Resource": "arn:aws:lambda:xx-xxxx-x:XXXXXXXXXXXX:function:Execution",
           "End": true 
       } 
     } 
    } 
  ] 
} 

Step 4: Load data into Amazon Redshift
The second Lambda function in the state machine extracts records from RDS associated with keys retrieved for DynamoDB. It processes the data then loads into an Amazon Redshift table. The following is code snippet for the second Lambda function in Java 8.

public class LambdaFunctionHandler implements RequestHandler<String, String> {
 public static String key = null;

public String handleRequest(String input, Context context) { 
   key=input; 
   //1. Getting basic configurations for the next classes + s3 client Properties
   config = getConfig();

   AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient(); 
   // 2. Export query results from RDS into S3 bucket 
   new RdsDataService(config).exportDataToS3(s3,key); 
   // 3. Import query results from S3 bucket into Redshift 
    new RedshiftDataService(config).importDataFromS3(s3,key); 
   System.out.println(input); 
   return "SUCCESS"; 
 } 
}

After the data is loaded into Amazon Redshift, end users can visualize it using their preferred business intelligence tools.

Lessons learned

  • At the time of publication, the 1.5–GB memory hard limit for Lambda functions was inadequate for processing our complex workload. Step Functions gave us the flexibility to chunk our large datasets and process them in parallel, saving on costs and time.
  • In our previous implementation, we assigned each key a dedicated Lambda function along with CloudWatch rules for schedule automation. This approach proved to be inefficient and quickly became an operational burden. Previously, we processed each key sequentially, with each key adding about five minutes to the overall processing time. For example, processing three keys meant that the total processing time was three times longer. With Step Functions, the entire state machine executes in about five minutes.
  • Using DynamoDB with Step Functions gave us the flexibility to manage keys efficiently. In our previous implementations, keys were hardcoded in Lambda functions, which became difficult to manage due to frequent updates. DynamoDB is a great way to store dynamic data that changes frequently, and it works perfectly with our serverless architectures.

Conclusion

With Step Functions, we were able to fully automate the frequent configuration updates to our dataset resulting in significant cost savings, reduced risk to data errors due to system downtime, and more time for us to focus on new product development rather than support related issues. We hope that you have found the information useful and that it can serve as a jump-start to building your own ETL processes on AWS with managed AWS services.

For more information about how Step Functions makes it easy to coordinate the components of distributed applications and microservices in any workflow, see the use case examples and then build your first state machine in under five minutes in the Step Functions console.

If you have questions or suggestions, please comment below.

GDPR – A Practical Guide For Developers

Post Syndicated from Bozho original https://techblog.bozho.net/gdpr-practical-guide-developers/

You’ve probably heard about GDPR. The new European data protection regulation that applies practically to everyone. Especially if you are working in a big company, it’s most likely that there’s already a process for gettign your systems in compliance with the regulation.

The regulation is basically a law that must be followed in all European countries (but also applies to non-EU companies that have users in the EU). In this particular case, it applies to companies that are not registered in Europe, but are having European customers. So that’s most companies. I will not go into yet another “12 facts about GDPR” or “7 myths about GDPR” posts/whitepapers, as they are often aimed at managers or legal people. Instead, I’ll focus on what GDPR means for developers.

Why am I qualified to do that? A few reasons – I was advisor to the deputy prime minister of a EU country, and because of that I’ve been both exposed and myself wrote some legislation. I’m familiar with the “legalese” and how the regulatory framework operates in general. I’m also a privacy advocate and I’ve been writing about GDPR-related stuff in the past, i.e. “before it was cool” (protecting sensitive data, the right to be forgotten). And finally, I’m currently working on a project that (among other things) aims to help with covering some GDPR aspects.

I’ll try to be a bit more comprehensive this time and cover as many aspects of the regulation that concern developers as I can. And while developers will mostly be concerned about how the systems they are working on have to change, it’s not unlikely that a less informed manager storms in in late spring, realizing GDPR is going to be in force tomorrow, asking “what should we do to get our system/website compliant”.

The rights of the user/client (referred to as “data subject” in the regulation) that I think are relevant for developers are: the right to erasure (the right to be forgotten/deleted from the system), right to restriction of processing (you still keep the data, but mark it as “restricted” and don’t touch it without further consent by the user), the right to data portability (the ability to export one’s data), the right to rectification (the ability to get personal data fixed), the right to be informed (getting human-readable information, rather than long terms and conditions), the right of access (the user should be able to see all the data you have about them), the right to data portability (the user should be able to get a machine-readable dump of their data).

Additionally, the relevant basic principles are: data minimization (one should not collect more data than necessary), integrity and confidentiality (all security measures to protect data that you can think of + measures to guarantee that the data has not been inappropriately modified).

Even further, the regulation requires certain processes to be in place within an organization (of more than 250 employees or if a significant amount of data is processed), and those include keeping a record of all types of processing activities carried out, including transfers to processors (3rd parties), which includes cloud service providers. None of the other requirements of the regulation have an exception depending on the organization size, so “I’m small, GDPR does not concern me” is a myth.

It is important to know what “personal data” is. Basically, it’s every piece of data that can be used to uniquely identify a person or data that is about an already identified person. It’s data that the user has explicitly provided, but also data that you have collected about them from either 3rd parties or based on their activities on the site (what they’ve been looking at, what they’ve purchased, etc.)

Having said that, I’ll list a number of features that will have to be implemented and some hints on how to do that, followed by some do’s and don’t’s.

  • “Forget me” – you should have a method that takes a userId and deletes all personal data about that user (in case they have been collected on the basis of consent, and not due to contract enforcement or legal obligation). It is actually useful for integration tests to have that feature (to cleanup after the test), but it may be hard to implement depending on the data model. In a regular data model, deleting a record may be easy, but some foreign keys may be violated. That means you have two options – either make sure you allow nullable foreign keys (for example an order usually has a reference to the user that made it, but when the user requests his data be deleted, you can set the userId to null), or make sure you delete all related data (e.g. via cascades). This may not be desirable, e.g. if the order is used to track available quantities or for accounting purposes. It’s a bit trickier for event-sourcing data models, or in extreme cases, ones that include some sort of blcokchain/hash chain/tamper-evident data structure. With event sourcing you should be able to remove a past event and re-generate intermediate snapshots. For blockchain-like structures – be careful what you put in there and avoid putting personal data of users. There is an option to use a chameleon hash function, but that’s suboptimal. Overall, you must constantly think of how you can delete the personal data. And “our data model doesn’t allow it” isn’t an excuse.
  • Notify 3rd parties for erasure – deleting things from your system may be one thing, but you are also obligated to inform all third parties that you have pushed that data to. So if you have sent personal data to, say, Salesforce, Hubspot, twitter, or any cloud service provider, you should call an API of theirs that allows for the deletion of personal data. If you are such a provider, obviously, your “forget me” endpoint should be exposed. Calling the 3rd party APIs to remove data is not the full story, though. You also have to make sure the information does not appear in search results. Now, that’s tricky, as Google doesn’t have an API for removal, only a manual process. Fortunately, it’s only about public profile pages that are crawlable by Google (and other search engines, okay…), but you still have to take measures. Ideally, you should make the personal data page return a 404 HTTP status, so that it can be removed.
  • Restrict processing – in your admin panel where there’s a list of users, there should be a button “restrict processing”. The user settings page should also have that button. When clicked (after reading the appropriate information), it should mark the profile as restricted. That means it should no longer be visible to the backoffice staff, or publicly. You can implement that with a simple “restricted” flag in the users table and a few if-clasues here and there.
  • Export data – there should be another button – “export data”. When clicked, the user should receive all the data that you hold about them. What exactly is that data – depends on the particular usecase. Usually it’s at least the data that you delete with the “forget me” functionality, but may include additional data (e.g. the orders the user has made may not be delete, but should be included in the dump). The structure of the dump is not strictly defined, but my recommendation would be to reuse schema.org definitions as much as possible, for either JSON or XML. If the data is simple enough, a CSV/XLS export would also be fine. Sometimes data export can take a long time, so the button can trigger a background process, which would then notify the user via email when his data is ready (twitter, for example, does that already – you can request all your tweets and you get them after a while).
  • Allow users to edit their profile – this seems an obvious rule, but it isn’t always followed. Users must be able to fix all data about them, including data that you have collected from other sources (e.g. using a “login with facebook” you may have fetched their name and address). Rule of thumb – all the fields in your “users” table should be editable via the UI. Technically, rectification can be done via a manual support process, but that’s normally more expensive for a business than just having the form to do it. There is one other scenario, however, when you’ve obtained the data from other sources (i.e. the user hasn’t provided their details to you directly). In that case there should still be a page where they can identify somehow (via email and/or sms confirmation) and get access to the data about them.
  • Consent checkboxes – this is in my opinion the biggest change that the regulation brings. “I accept the terms and conditions” would no longer be sufficient to claim that the user has given their consent for processing their data. So, for each particular processing activity there should be a separate checkbox on the registration (or user profile) screen. You should keep these consent checkboxes in separate columns in the database, and let the users withdraw their consent (by unchecking these checkboxes from their profile page – see the previous point). Ideally, these checkboxes should come directly from the register of processing activities (if you keep one). Note that the checkboxes should not be preselected, as this does not count as “consent”.
  • Re-request consent – if the consent users have given was not clear (e.g. if they simply agreed to terms & conditions), you’d have to re-obtain that consent. So prepare a functionality for mass-emailing your users to ask them to go to their profile page and check all the checkboxes for the personal data processing activities that you have.
  • “See all my data” – this is very similar to the “Export” button, except data should be displayed in the regular UI of the application rather than an XML/JSON format. For example, Google Maps shows you your location history – all the places that you’ve been to. It is a good implementation of the right to access. (Though Google is very far from perfect when privacy is concerned)
  • Age checks – you should ask for the user’s age, and if the user is a child (below 16), you should ask for parent permission. There’s no clear way how to do that, but my suggestion is to introduce a flow, where the child should specify the email of a parent, who can then confirm. Obviosuly, children will just cheat with their birthdate, or provide a fake parent email, but you will most likely have done your job according to the regulation (this is one of the “wishful thinking” aspects of the regulation).

Now some “do’s”, which are mostly about the technical measures needed to protect personal data. They may be more “ops” than “dev”, but often the application also has to be extended to support them. I’ve listed most of what I could think of in a previous post.

  • Encrypt the data in transit. That means that communication between your application layer and your database (or your message queue, or whatever component you have) should be over TLS. The certificates could be self-signed (and possibly pinned), or you could have an internal CA. Different databases have different configurations, just google “X encrypted connections. Some databases need gossiping among the nodes – that should also be configured to use encryption
  • Encrypt the data at rest – this again depends on the database (some offer table-level encryption), but can also be done on machine-level. E.g. using LUKS. The private key can be stored in your infrastructure, or in some cloud service like AWS KMS.
  • Encrypt your backups – kind of obvious
  • Implement pseudonymisation – the most obvious use-case is when you want to use production data for the test/staging servers. You should change the personal data to some “pseudonym”, so that the people cannot be identified. When you push data for machine learning purposes (to third parties or not), you can also do that. Technically, that could mean that your User object can have a “pseudonymize” method which applies hash+salt/bcrypt/PBKDF2 for some of the data that can be used to identify a person
  • Protect data integrity – this is a very broad thing, and could simply mean “have authentication mechanisms for modifying data”. But you can do something more, even as simple as a checksum, or a more complicated solution (like the one I’m working on). It depends on the stakes, on the way data is accessed, on the particular system, etc. The checksum can be in the form of a hash of all the data in a given database record, which should be updated each time the record is updated through the application. It isn’t a strong guarantee, but it is at least something.
  • Have your GDPR register of processing activities in something other than Excel – Article 30 says that you should keep a record of all the types of activities that you use personal data for. That sounds like bureaucracy, but it may be useful – you will be able to link certain aspects of your application with that register (e.g. the consent checkboxes, or your audit trail records). It wouldn’t take much time to implement a simple register, but the business requirements for that should come from whoever is responsible for the GDPR compliance. But you can advise them that having it in Excel won’t make it easy for you as a developer (imagine having to fetch the excel file internally, so that you can parse it and implement a feature). Such a register could be a microservice/small application deployed separately in your infrastructure.
  • Log access to personal data – every read operation on a personal data record should be logged, so that you know who accessed what and for what purpose
  • Register all API consumers – you shouldn’t allow anonymous API access to personal data. I’d say you should request the organization name and contact person for each API user upon registration, and add those to the data processing register. Note: some have treated article 30 as a requirement to keep an audit log. I don’t think it is saying that – instead it requires 250+ companies to keep a register of the types of processing activities (i.e. what you use the data for). There are other articles in the regulation that imply that keeping an audit log is a best practice (for protecting the integrity of the data as well as to make sure it hasn’t been processed without a valid reason)

Finally, some “don’t’s”.

  • Don’t use data for purposes that the user hasn’t agreed with – that’s supposed to be the spirit of the regulation. If you want to expose a new API to a new type of clients, or you want to use the data for some machine learning, or you decide to add ads to your site based on users’ behaviour, or sell your database to a 3rd party – think twice. I would imagine your register of processing activities could have a button to send notification emails to users to ask them for permission when a new processing activity is added (or if you use a 3rd party register, it should probably give you an API). So upon adding a new processing activity (and adding that to your register), mass email all users from whom you’d like consent.
  • Don’t log personal data – getting rid of the personal data from log files (especially if they are shipped to a 3rd party service) can be tedious or even impossible. So log just identifiers if needed. And make sure old logs files are cleaned up, just in case
  • Don’t put fields on the registration/profile form that you don’t need – it’s always tempting to just throw as many fields as the usability person/designer agrees on, but unless you absolutely need the data for delivering your service, you shouldn’t collect it. Names you should probably always collect, but unless you are delivering something, a home address or phone is unnecessary.
  • Don’t assume 3rd parties are compliant – you are responsible if there’s a data breach in one of the 3rd parties (e.g. “processors”) to which you send personal data. So before you send data via an API to another service, make sure they have at least a basic level of data protection. If they don’t, raise a flag with management.
  • Don’t assume having ISO XXX makes you compliant – information security standards and even personal data standards are a good start and they will probably 70% of what the regulation requires, but they are not sufficient – most of the things listed above are not covered in any of those standards

Overall, the purpose of the regulation is to make you take conscious decisions when processing personal data. It imposes best practices in a legal way. If you follow the above advice and design your data model, storage, data flow , API calls with data protection in mind, then you shouldn’t worry about the huge fines that the regulation prescribes – they are for extreme cases, like Equifax for example. Regulators (data protection authorities) will most likely have some checklists into which you’d have to somehow fit, but if you follow best practices, that shouldn’t be an issue.

I think all of the above features can be implemented in a few weeks by a small team. Be suspicious when a big vendor offers you a generic plug-and-play “GDPR compliance” solution. GDPR is not just about the technical aspects listed above – it does have organizational/process implications. But also be suspicious if a consultant claims GDPR is complicated. It’s not – it relies on a few basic principles that are in fact best practices anyway. Just don’t ignore them.

The post GDPR – A Practical Guide For Developers appeared first on Bozho's tech blog.

The 10 Most Viewed Security-Related AWS Knowledge Center Articles and Videos for November 2017

Post Syndicated from Maggie Burke original https://aws.amazon.com/blogs/security/the-10-most-viewed-security-related-aws-knowledge-center-articles-and-videos-for-november-2017/

AWS Knowledge Center image

The AWS Knowledge Center helps answer the questions most frequently asked by AWS Support customers. The following 10 Knowledge Center security articles and videos have been the most viewed this month. It’s likely you’ve wondered about a few of these topics yourself, so here’s a chance to learn the answers!

  1. How do I create an AWS Identity and Access Management (IAM) policy to restrict access for an IAM user, group, or role to a particular Amazon Virtual Private Cloud (VPC)?
    Learn how to apply a custom IAM policy to restrict IAM user, group, or role permissions for creating and managing Amazon EC2 instances in a specified VPC.
  2. How do I use an MFA token to authenticate access to my AWS resources through the AWS CLI?
    One IAM best practice is to protect your account and its resources by using a multi-factor authentication (MFA) device. If you plan use the AWS Command Line Interface (CLI) while using an MFA device, you must create a temporary session token.
  3. Can I restrict an IAM user’s EC2 access to specific resources?
    This article demonstrates how to link multiple AWS accounts through AWS Organizations and isolate IAM user groups in their own accounts.
  4. I didn’t receive a validation email for the SSL certificate I requested through AWS Certificate Manager (ACM)—where is it?
    Can’t find your ACM validation emails? Be sure to check the email address to which you requested that ACM send validation emails.
  5. How do I create an IAM policy that has a source IP restriction but still allows users to switch roles in the AWS Management Console?
    Learn how to write an IAM policy that not only includes a source IP restriction but also lets your users switch roles in the console.
  6. How do I allow users from another account to access resources in my account through IAM?
    If you have the 12-digit account number and permissions to create and edit IAM roles and users for both accounts, you can permit specific IAM users to access resources in your account.
  7. What are the differences between a service control policy (SCP) and an IAM policy?
    Learn how to distinguish an SCP from an IAM policy.
  8. How do I share my customer master keys (CMKs) across multiple AWS accounts?
    To grant another account access to your CMKs, create an IAM policy on the secondary account that grants access to use your CMKs.
  9. How do I set up AWS Trusted Advisor notifications?
    Learn how to receive free weekly email notifications from Trusted Advisor.
  10. How do I use AWS Key Management Service (AWS KMS) encryption context to protect the integrity of encrypted data?
    Encryption context name-value pairs used with AWS KMS encryption and decryption operations provide a method for checking ciphertext authenticity. Learn how to use encryption context to help protect your encrypted data.

The AWS Security Blog will publish an updated version of this list regularly going forward. You also can subscribe to the AWS Knowledge Center Videos playlist on YouTube.

– Maggie