Tag Archives: announcements

Meet the newest AWS Heroes including the first DevTools Heroes!

Post Syndicated from Ross Barich original https://aws.amazon.com/blogs/aws/meet-the-newest-aws-heroes-including-the-first-devtools-heroes/

The AWS Heroes program recognizes individuals from around the world who have extensive AWS knowledge and go above and beyond to share their expertise with others. The program continues to grow, to better recognize the most influential community leaders across a variety of technical disciplines.

Introducing AWS DevTools Heroes
Today we are introducing AWS DevTools Heroes: passionate advocates of the developer experience on AWS and the tools that enable that experience. DevTools Heroes excel at sharing their knowledge and building community through open source contributions, blogging, speaking, community organizing, and social media. Through their feedback, content, and contributions, DevTools Heroes help shape the AWS developer experience and evolve AWS developer tools such as the AWS Cloud Development Kit, the AWS SDKs, and the AWS Code suite of services.

The first cohort of AWS DevTools Heroes includes:

Bhuvaneswari Subramani – Bengaluru, India

DevTools Hero Bhuvaneswari Subramani is Director Engineering Operations at Infor. With two decades of IT experience, specializing in Cloud Computing, DevOps, and Performance Testing, she is one of the community leaders of AWS User Group Bengaluru. She is also an active speaker at AWS community events and industry conferences, and delivers guest lectures on Cloud Computing for staff and students at engineering colleges across India. Her workshops, presentations, and blogs on AWS Developer Tools always stand out.

Jared Short – Washington DC, USA

DevTools Hero Jared Short is an engineer at Stedi, where they are using AWS native tooling and serverless services to build a global network for exchanging B2B transactions in a standard format. Jared’s work includes early contributions to the Serverless Framework. These days, his focus is working with AWS CDK and other toolsets to create intuitive and joyful developer experiences for teams on AWS.

Matt Coulter – Belfast, Northern Ireland

DevTools Hero Matt Coulter is a Technical Architect for Liberty IT, focused on creating the right environment for empowered teams to rapidly deliver business value in a well-architected, sustainable, and serverless-first way. Matt has been creating this environment by building CDK Patterns, an open source collection of serverless architecture patterns built using AWS CDK that reference the AWS Well-Architected Framework. Matt also created CDK Day, which was the first community-driven, global conference focused on everything CDK (AWS CDK, CDK for Terraform, CDK for Kubernetes, and others).

Paul Duvall – Washington DC, USA

DevTools Hero Paul Duvall is co-founder and former CTO of Stelligent. He is principal author of “Continuous Integration: Improving Software Quality and Reducing Risk,” and is also the author of many other publications including “Continuous Compliance on AWS,” “Continuous Encryption on AWS,” and “Continuous Security on AWS.” Paul hosted the DevOps on AWS Radio podcast for over three years and has been an enthusiastic user and advocate of AWS Developer Tools since their respective releases.

Sebastian Korfmann – Hamburg, Germany

DevTools Hero Sebastian Korfmann is an entrepreneurial Software Engineer with a current focus on Cloud Tooling, Infrastructure as Code, and the Cloud Development Kit (CDK) ecosystem in particular. He is a core contributor to the CDK for Terraform project, which enables users to define infrastructure using TypeScript, Python, and Java while leveraging the hundreds of providers and thousands of module definitions provided by Terraform and the Terraform ecosystem. With cdk.dev, Sebastian co-founded a community-driven hub for all things CDK, and he runs a weekly newsletter covering the growing CDK ecosystem.

Steve Gordon – East Sussex, United Kingdom

DevTools Hero Steve Gordon is a Pluralsight author and senior engineer who is passionate about community and all things .NET related, having worked with .NET for over 16 years. Steve has used AWS extensively for five years as a platform for running .NET microservices. He blogs regularly about running .NET on AWS, including deep dives into how the .NET SDK works, building cloud-native services, and how to deploy .NET containers to Amazon ECS. Steve founded .NET South East, a .NET Meetup group based in Brighton.

Thorsten Höger – Stuttgart, Germany

DevTools Hero Thorsten Höger is CEO and cloud consultant at Taimos, where he advises customers on how to use AWS. As a developer, he focuses on improving development processes and automating everything to build efficient deployment pipelines for customers of all sizes. A supporter of open source software, Thorsten maintains or contributes to several projects on GitHub, such as test frameworks for AWS Lambda and Amazon Alexa, and developer tools for AWS. He is also the maintainer of the Jenkins AWS Pipeline plugin and one of the top three non-AWS contributors to AWS CDK.

Meet the rest of the new AWS Heroes
There is more good news! We are thrilled to introduce the remaining new AWS Heroes in this cohort, including the first Heroes from Argentina, Lebanon, and Saudi Arabia:

Ahmed Samir – Riyadh, Saudi Arabia

Community Hero Ahmed Samir is a Cloud Architect and mentor with more than 12 years of experience in IT. He is the leader of three Arabic Meetups in Riyadh: AWS, Amazon SageMaker, and Kubernetes, where he has organized and delivered over 40 Meetup events. Ahmed frequently shares knowledge and evangelizes AWS in Arabic through his social media accounts. He also holds AWS and Kubernetes certifications.

Anas Khattar – Beirut, Lebanon

Community Hero Anas Khattar is co-founder of Digico Solutions. He founded the AWS User Group Lebanon in 2018 and coordinates monthly meetups and workshops on a variety of cloud topics, which helped grow the group to more than 1,000 AWSome members. He also regularly speaks at tech conferences and authors tech blogs on Dev Community, sharing his AWS experiences and best practices. In close collaboration with the regional AWS community leaders and builders, Anas organized AWS Community Day MENA, which started in September 2020 with 12 User Groups from 10 countries, and hosted 27 speakers over 2 days.

Chris Gong – New York, USA

Community Hero Chris Gong is constantly exploring the different ways that cloud services can be applied in game development. Passionate about sharing his knowledge with the world, he routinely creates tutorials and educational videos on his YouTube channel, Flopperam, where the primary focus has been AWS Game Tech and Unreal Engine, specifically the multiplayer and networking aspects of game development. Although Amazon GameLift has been his biggest interest, Chris has plans to cover the usage of other AWS services in game development while exploring how they can be integrated with other game engines besides Unreal Engine.

Damian Olguin – Cordoba, Argentina

Community Hero Damian Olguin is a tech entrepreneur and one of the founders of Teracloud, an AWS APN Partner. As a community leader, he promotes knowledge-sharing experiences in user communities within LATAM. He is co-organizer of AWS User Group Cordoba and co-host of #DeepFridays, a Twitch streaming show that promotes AI/ML technology adoption by playing with DeepRacer, DeepLens, and DeepComposer. Damian is a public speaker who has spoken at AWS Community Day Buenos Aires 2019, AWS re:Invent 2019, and AWS Community Day LATAM 2020.

Denis Bauer – Sydney, Australia

Data Hero Denis Bauer is a Principal Research Scientist at Australia’s government research agency (CSIRO). Her open source products include VariantSpark, a machine learning tool for analysing ultra-high-dimensional data, which was the first AWS Marketplace health product from a public sector organization. Denis is passionate about facilitating the digital transformation of the health and life-science sector by building a strong community of practice through open source technology, keynote presentations, and inclusive interdisciplinary collaborations. For example, the collaboration with her organization’s visionary Scientific Computing and Cloud Platforms experts as well as AWS Data Hero Lynn Langit has enabled the creation of cloud-based bioinformatics solutions used by 10,000 researchers annually.

Denis Dyack – St. Catharines, Canada

Community Hero Denis Dyack, a video game industry veteran of more than 30 years, is the Founder and CEO of Apocalypse Studios. His studio evangelizes a cloud-first approach to game development and partners with AWS Game Tech to move the medium of games forward. In his years of experience speaking at games conferences and within AWS communities, Denis has been an advocate for building on Amazon Lumberyard and, more recently, for moving game development pipelines to AWS.

Emrah Şamdan – Ankara, Turkey

Serverless Hero Emrah Şamdan is the VP of Products at Thundra. To help expand the serverless community globally during the pandemic, he co-organized the quarterly ServerlessDays Virtual. He’s also a local community organizer for AWS Community Day Turkey, ServerlessDays Istanbul, and bi-weekly meetups of Cloud and Serverless Turkey. He’s currently part of the core organizer team of global ServerlessDays and is continuously looking for ways to expand the community. He frequently writes about serverless and cloud-native microservices on Medium and on the Thundra blog.

Franck Pachot – Lausanne, Switzerland

Data Hero Franck Pachot is the Principal Consultant and Database Evangelist at dbi services (an AWS Select Partner) and is passionate about all databases. With over 20 years of experience in development, data modeling, infrastructure, and all DBA tasks, Franck is a recognized database expert across Oracle and AWS. Franck is also an AWS Academy educator for Powercoders, and holds AWS Certified Database Specialty and Oracle Certified Master certifications. Franck contributes to technical communities, educating customers on AWS Databases through his blog, Twitter, and podcast in French. He is active in the Data community, and enjoys talking and meeting other data enthusiasts at conferences.

Hiroko Nishimura – Washington DC, USA

Community Hero Hiroko Nishimura (Hiro) is the founder of AWS Newbies and Cloud Newbies, which help people with non-traditional technical backgrounds begin their explorations into the AWS Cloud. As a “career switcher” herself, she has been community building since 2018 to help others deconstruct Cloud Computing jargon so they, too, can begin a career in the Cloud. Finally putting her degrees in Special Education to good use, Hiro teaches “Introduction to AWS for Non-Engineers” courses at LinkedIn Learning, and introductory coding lessons at egghead.

Juliano Cristian – Santa Catarina, Brazil

Community Hero Juliano Cristian is CEO of Game Business Accelerator Academy and co-founder of Game Developers SC, which participates in the AWS Partner Network (APN). He organizes the AWS Game Tech Lumberyard User Group in Florianópolis, Brazil, holding Meetups, Practical Labs, Game Jams, and Workshops. He also delivers many lectures at universities, speaking with more than 90 educational institutions across Brazil and introducing students to cloud computing and AWS Game Tech services. Whenever he can, Juliano also participates in other AWS User Groups in Brazil and Latin America, working to build an increasingly motivated and productive community.

Jungyoul Yu – Seoul, Korea

Machine Learning Hero Jungyoul Yu works at Danggeun Market as a DevOps Engineer, and is a leader of the AWS DeepRacer Group, part of the AWS Korea User Group. He was one of the AWS DeepRacer League finalists at AWS Summit Seoul and AWS re:Invent 2019. Starting off with zero ML experience, Jungyoul used AWS DeepRacer to learn ML techniques and began sharing his learnings in the AWS DeepRacer Community, with User Groups, at AWS Community Day, and at various meetups. He has also shared many blog posts and sample code, such as a DeepRacer Reward Function Simulator, Rank Notifier, and auto-submit bot.

Juv Chan – Singapore

Machine Learning Hero Juv Chan is an AI automation engineer at UBS. He is the AWS DeepRacer League Singapore Summit 2019 champion and a re:Invent Championship Cup 2019 finalist. He is the lead organizer for the AWS DeepRacer Beginner Challenge global virtual community race in 2020. Juv is also involved in sharing his DeepRacer and machine learning knowledge with the AWS Machine Learning community at both the global and regional scale. Juv is a contributing writer for both Towards Data Science and Towards AI, where he blogs about AWS AI/ML and other cloud-related topics.

Lukonde Mwila – Johannesburg, South Africa

Container Hero Lukonde Mwila is a Senior Software Engineer at Entelect. He has a passion for sharing knowledge through speaking engagements such as meetups and tech conferences, as well as writing technical articles. His talk at DockerCon 2020 on deploying multi-container applications to AWS was one of the top rated and most viewed sessions of the event. He is 3x AWS certified and is an advocate for containerization and serverless technologies. Lukonde enjoys sharing experiences of building out AWS infrastructure on Medium and sharing open source projects on GitHub for the developer community to easily consume, replicate, and improve for their own benefit.

Magdalena Zawada – Rybnik, Poland

Community Hero Magdalena Zawada is Director of Strategy and Expansion at LCloud Ltd. She has been working in IT for 11 years and started her adventure with AWS technology in 2013 as CEO of Hostersi Ltd. Magdalena willingly shares her knowledge and experience with the community, co-organizing AWS UG Warsaw meetings. She also organizes a series of events to support preparation for the AWS Cloud Practitioner certification. Magdalena belongs to numerous industry organizations, including the FinOps Foundation and ISSA (Information Systems Security Association) Poland, and in 2019 she was a recipient of the AWS re:Invent Community Leader “We Power Tech” Diversity Grant in Las Vegas.

Nick Walter – Lincoln, USA

Data Hero Nick Walter has over 15 years of experience with enterprise IT solutions, including expertise and certifications in AWS, VMware, and Oracle. A passionate evangelist for data management solutions on AWS, Nick can often be found blogging, hosting webinars, or presenting at conferences regarding the latest trends in business critical database technologies. Recently, Nick has focused on helping clients find cost effective ways to handle both the technical and licensing challenges of migrating application stacks backed by commercial database engines, such as Oracle or MS SQL Server, into AWS.

Renato Losio – Berlin, Germany

Data Hero Renato Losio is the Principal Cloud Architect at Funambol, a provider of white label cloud services. He has been working with AWS technologies since 2011, and holds 7 AWS certifications (including the Database Specialty). Renato enjoys speaking at international events, including DevOps Pro Europe, DevOpsConf Russia, All Day DevOps, Codemotion, and Percona Live. Passionate about knowledge sharing, Renato is an editor at InfoQ and writes about different cloud-related topics on his blog, cloudiamo.com. Through his various platforms, he has covered different topics across AWS Databases, such as Amazon RDS Proxy, Amazon RDS, and Amazon Aurora.

Tomasz Lakomy – Poznan, Poland

Community Hero Tomasz Lakomy is a Senior Frontend Engineer at OLX Group and an egghead.io instructor. Over the last two years, he’s been diving into the world of AWS and sharing what he’s learned with others. Since passing the AWS Certified Solutions Architect – Associate exam in 2019, he has recorded multiple courses on serverless technologies, including “Build an App with the AWS Cloud Development Kit” and “Learn AWS Lambda from Scratch.” In addition, he’s active on Twitter, his blog (tlakomy.com), and The Practical Dev community, where he posts articles on career advice, testing and – of course – AWS.
If you’d like to learn more about the new Heroes, or connect with a Hero near you, please visit the AWS Hero website.

Ross

Using AWS Lambda extensions to send logs to custom destinations

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/using-aws-lambda-extensions-to-send-logs-to-custom-destinations/

You can now send logs from AWS Lambda functions directly to a destination of your choice using AWS Lambda Extensions. Lambda Extensions are a new way for monitoring, observability, security, and governance tools to easily integrate with AWS Lambda. For more information, see “Introducing AWS Lambda Extensions – In preview”.

To help you troubleshoot failures in Lambda functions, AWS Lambda automatically captures and streams logs to Amazon CloudWatch Logs. This stream contains the logs that your function code and extensions generate, in addition to logs the Lambda service generates as part of the function invocation.

Previously, to send logs to a custom destination, you typically had to configure and operate a CloudWatch Logs subscription on the log group, with a separate Lambda function forwarding the logs to the destination of your choice.

Logging tools, running as Lambda extensions, can now receive log streams directly from within the Lambda execution environment, and send them to any destination. This makes it even easier for you to use your preferred extensions for diagnostics.

Today, you can use extensions to send logs to Coralogix, Datadog, Honeycomb, Lumigo, New Relic, and Sumo Logic.

Overview

To receive logs, extensions subscribe using the new Lambda Logs API.

Lambda Logs API

The Lambda service then streams the logs directly to the extension. The extension can then process, filter, and route them to any preferred destination. Lambda still sends the logs to CloudWatch Logs.

You deploy extensions, including ones that use the Logs API, as Lambda layers, with the AWS Management Console and AWS Command Line Interface (AWS CLI). You can also use infrastructure as code tools such as AWS CloudFormation, the AWS Serverless Application Model (AWS SAM), Serverless Framework, and Terraform.
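
For example, once a logging extension is published as a layer, you can attach it to an existing function with the AWS SDK. The minimal Python sketch below shows the idea; the function name and the layer version ARN are placeholders for your own values.

import boto3

lambda_client = boto3.client("lambda")

# Placeholder values: substitute your own function name and the ARN of the
# logging extension layer you want to attach.
FUNCTION_NAME = "my-function"
EXTENSION_LAYER_ARN = "arn:aws:lambda:us-east-1:123456789012:layer:my-logging-extension:1"

# Updating the configuration replaces the function's layer list, so merge the
# extension with any layers that are already attached.
config = lambda_client.get_function_configuration(FunctionName=FUNCTION_NAME)
layers = [layer["Arn"] for layer in config.get("Layers", [])]
if EXTENSION_LAYER_ARN not in layers:
    layers.append(EXTENSION_LAYER_ARN)

lambda_client.update_function_configuration(FunctionName=FUNCTION_NAME, Layers=layers)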

Logging extensions from AWS Lambda Ready Partners and AWS Partners available at launch

Today, you can use logging extensions with the following tools:

  • The Datadog extension now makes it easier than ever to collect your serverless application logs for visualization, analysis, and archival. Paired with Datadog’s AWS integration, end-to-end distributed tracing, and real-time enhanced AWS Lambda metrics, you can proactively detect and resolve serverless issues at any scale.
  • Lumigo provides monitoring and debugging for modern cloud applications. With the open source extension from Lumigo, you can send Lambda function logs directly to an S3 bucket, unlocking new post processing use cases.
  • New Relic enables you to efficiently monitor, troubleshoot, and optimize your Lambda functions. New Relic’s extension allows you to send your Lambda service platform logs directly to New Relic’s unified observability platform, allowing you to quickly visualize data with minimal latency and cost.
  • Coralogix is a log analytics and cloud security platform that empowers thousands of companies to improve security and accelerate software delivery, allowing you to get deep insights without paying for the noise. Coralogix can now read Lambda function logs and metrics directly, without using CloudWatch or S3, reducing the latency and cost of observability.
  • Honeycomb is a powerful observability tool that helps you debug your entire production app stack. Honeycomb’s extension decreases the overhead, latency, and cost of sending events to the Honeycomb service, while increasing reliability.
  • The Sumo Logic extension enables you to get instant visibility into the health and performance of your mission-critical applications using AWS Lambda. With this extension and Sumo Logic’s continuous intelligence platform, you can now ensure that all your Lambda functions are running as expected, by analyzing function, platform, and extension logs to quickly identify and remediate errors and exceptions.

You can also build and use your own logging extensions to integrate your organization’s tooling.

Showing a logging extension to send logs directly to S3

This demo shows an example of using a simple logging extension to send logs to Amazon Simple Storage Service (S3).

To set up the example, visit the GitHub repo and follow the instructions in the README.md file.

The example extension runs a local HTTP endpoint listening for HTTP POST events. Lambda delivers log batches to this endpoint. The example creates an S3 bucket to store the logs. A Lambda function is configured with an environment variable to specify the S3 bucket name. Lambda streams the logs to the extension. The extension copies the logs to the S3 bucket.
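
The moving parts of such an extension are small. The sketch below is a simplified stand-in for the logs_api_http_extension.py example in the repo (not a copy of it): a local HTTP listener accepts the batches that Lambda POSTs, and each batch is copied to the bucket named in an environment variable. The variable name and object key layout here are illustrative.

import json
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from threading import Thread

import boto3

S3_BUCKET = os.environ["S3_BUCKET_NAME"]  # illustrative variable name
s3 = boto3.client("s3")

class LogBatchHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Lambda delivers each batch as a JSON array of log records.
        length = int(self.headers.get("Content-Length", 0))
        batch = json.loads(self.rfile.read(length))
        key = f"lambda-logs/{time.strftime('%Y/%m/%d')}/{time.time_ns()}.json"
        s3.put_object(Bucket=S3_BUCKET, Key=key, Body=json.dumps(batch).encode("utf-8"))
        self.send_response(200)
        self.end_headers()

def start_listener(port=8080):
    # Runs in a background thread so the extension's main loop can keep
    # polling the Extensions API for INVOKE and SHUTDOWN events.
    server = HTTPServer(("0.0.0.0", port), LogBatchHandler)
    Thread(target=server.serve_forever, daemon=True).start()
    return f"http://sandbox:{port}"  # URI used in the Logs API subscription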

Lambda environment variable specifying S3 bucket

The extension uses the Extensions API to register for INVOKE and SHUTDOWN events. The extension, using the Logs API, then subscribes to receive platform and function logs, but not extension logs.
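
A sketch of that register-and-subscribe flow is shown below. The endpoint paths and payload fields follow the Extensions API and Logs API as documented for the preview (the 2020-08-15 logs schema); treat the buffering values and port as illustrative defaults rather than recommendations.

import json
import os
import urllib.request

RUNTIME_API = os.environ["AWS_LAMBDA_RUNTIME_API"]

def register_extension(name="logs_api_http_extension.py"):
    # Register the extension for the INVOKE and SHUTDOWN lifecycle events.
    req = urllib.request.Request(
        f"http://{RUNTIME_API}/2020-01-01/extension/register",
        data=json.dumps({"events": ["INVOKE", "SHUTDOWN"]}).encode("utf-8"),
        headers={"Lambda-Extension-Name": name},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Lambda-Extension-Identifier"]

def subscribe_to_logs(extension_id, listener_uri):
    # Subscribe to platform and function logs (but not extension logs) and
    # have Lambda deliver them to the local HTTP listener.
    body = {
        "schemaVersion": "2020-08-15",
        "types": ["platform", "function"],
        "buffering": {"timeoutMs": 1000, "maxBytes": 262144, "maxItems": 1000},
        "destination": {"protocol": "HTTP", "URI": listener_uri},
    }
    req = urllib.request.Request(
        f"http://{RUNTIME_API}/2020-08-15/logs",
        data=json.dumps(body).encode("utf-8"),
        headers={"Lambda-Extension-Identifier": extension_id},
        method="PUT",
    )
    urllib.request.urlopen(req).read()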

As the example is an asynchronous system, logs for one invoke may be processed during the next invocation. Logs for the last invoke may be processed during the SHUTDOWN event.

When I test the function from the Lambda console, Lambda sends logs to CloudWatch Logs. The log stream shows logs from the platform, the function, and the extension.

Lambda logs visible in CloudWatch Logs

The logging extension also receives the log stream directly from Lambda, and copies the logs to S3.

Browsing to the S3 bucket, I can see the log files.

S3 bucket containing copied logs

Downloading the file shows the log lines. The log contains the same platform and function logs, but not the extension logs, as specified during the subscription.

[{'time': '2020-11-12T14:55:06.560Z', 'type': 'platform.start', 'record': {'requestId': '49e64413-fd42-47ef-b130-6fd16f30148d', 'version': '$LATEST'}},
{'time': '2020-11-12T14:55:06.774Z', 'type': 'platform.logsSubscription', 'record': {'name': 'logs_api_http_extension.py', 'state': 'Subscribed', 'types': ['platform', 'function']}},
{'time': '2020-11-12T14:55:06.774Z', 'type': 'platform.extension', 'record': {'name': 'logs_api_http_extension.py', 'state': 'Ready', 'events': ['INVOKE', 'SHUTDOWN']}},
{'time': '2020-11-12T14:55:06.776Z', 'type': 'function', 'record': 'Function: Logging something which logging extension will send to S3\n'},
{'time': '2020-11-12T14:55:06.780Z', 'type': 'platform.end', 'record': {'requestId': '49e64413-fd42-47ef-b130-6fd16f30148d'}},
{'time': '2020-11-12T14:55:06.780Z', 'type': 'platform.report', 'record': {'requestId': '49e64413-fd42-47ef-b130-6fd16f30148d', 'metrics': {'durationMs': 4.96, 'billedDurationMs': 100, 'memorySizeMB': 128, 'maxMemoryUsedMB': 87, 'initDurationMs': 792.41}, 'tracing': {'type': 'X-Amzn-Trace-Id', 'value': 'Root=1-5fad4cc9-70259536495de84a2a6282cd;Parent=67286c49275ac0ad;Sampled=1'}}}]

Lambda has sent specific logs directly to the subscribed extension. The extension has then copied them directly to S3.

For more example log extensions, see the GitHub repository.

How do extensions receive logs?

Extensions start a local listener endpoint to receive the logs using one of the following protocols:

  1. TCP – Logs are delivered to a TCP port in newline-delimited JSON (NDJSON) format.
  2. HTTP – Logs are delivered to a local HTTP endpoint (http://sandbox:${PORT}/${PATH}) through PUT or POST, as an array of records in JSON format. The $PATH parameter is optional.

AWS recommends using an HTTP endpoint over TCP because HTTP tracks successful delivery of the log messages to the local endpoint that the extension sets up.

Once the endpoint is running, extensions use the Logs API to subscribe to any of three different log streams:

  • Function logs that are generated by the Lambda function.
  • Lambda service platform logs (such as the START, END, and REPORT logs in CloudWatch Logs).
  • Extension logs that are generated by extension code.

The Lambda service then sends logs only to subscribers inside the execution environment.

Even if an extension subscribes to one or more log streams, Lambda continues to send all logs to CloudWatch.

Performance considerations

Extensions share resources with the function, such as CPU, memory, disk storage, and environment variables. They also share permissions, using the same AWS Identity and Access Management (IAM) role as the function.

Log subscriptions consume memory resources as each subscription opens a new memory buffer to store the logs. This memory usage counts towards memory consumed within the Lambda execution environment.

For more information on resources, security and performance with extensions, see “Introducing AWS Lambda Extensions – In preview”.

What happens if Lambda cannot deliver logs to an extension?

The Lambda service stores logs before sending to CloudWatch Logs and any subscribed extensions. If Lambda cannot deliver logs to the extension, it automatically retries with backoff. If the log subscriber crashes, Lambda restarts the execution environment. The logs extension re-subscribes, and continues to receive logs.

When using an HTTP endpoint, Lambda continues to deliver logs from the last acknowledged delivery. With TCP, the extension may lose logs if an extension or the execution environment fails.

The Lambda service buffers logs in memory before delivery. The buffer size is proportional to the buffering configuration used in the subscription request. If an extension cannot process the incoming logs quickly enough, the buffer fills up. To reduce the likelihood of an out of memory event due to a slow extension, the Lambda service drops records and adds a platform.logsDropped log record to the affected extension to indicate the number of dropped records.

Disabling logging to CloudWatch Logs

Lambda continues to send logs to CloudWatch Logs even if extensions subscribe to the logs stream.

To disable logging to CloudWatch Logs for a particular function, you can amend the Lambda execution role to remove access to CloudWatch Logs.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "logs:CreateLogGroup",
        "logs:CreateLogStream",
        "logs:PutLogEvents"
      ],
      "Resource": [
        "arn:aws:logs:*:*:*"
      ]
    }
  ]
}

Logs are no longer delivered to CloudWatch Logs for functions using this role, but are still streamed to subscribed extensions. You are no longer billed for CloudWatch logging for these functions.

Pricing

Logging extensions, like other extensions, share the same billing model as Lambda functions. When using Lambda functions with extensions, you pay for requests served and the combined compute time used to run your code and all extensions, in 100 ms increments. To learn more about the billing for extensions, visit the Lambda FAQs page.

Conclusion

Lambda extensions enable you to extend the Lambda service to more easily integrate with your favorite tools for monitoring, observability, security, and governance.

Extensions can now subscribe to receive log streams directly from the Lambda service, in addition to CloudWatch Logs. Today, you can install a number of available logging extensions from AWS Lambda Ready Partners and AWS Partners. Extensions make it easier to use your existing tools with your serverless applications.

To try the S3 demo logging extension, follow the instructions in the README.md file in the GitHub repository.

Extensions are now available in preview in all commercial regions other than the China regions.

For more serverless learning resources, visit https://serverlessland.com.

Announcing AWS Glue DataBrew – A Visual Data Preparation Tool That Helps You Clean and Normalize Data Faster

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/announcing-aws-glue-databrew-a-visual-data-preparation-tool-that-helps-you-clean-and-normalize-data-faster/

To be able to run analytics, build reports, or apply machine learning, you need to be sure the data you’re using is clean and in the right format. That’s the data preparation step that requires data analysts and data scientists to write custom code and do many manual activities. First, you need to look at the data, understand which possible values are present, and build some simple visualizations to understand if there are correlations between the columns. Then, you need to check for strange values outside of what you’re expecting, such as weather temperature above 200℉ (93℃) or speed of a truck above 200 mph (322 km/h), or for data that is missing. Many algorithms need values to be rescaled to a specific range, for example between 0 and 1, or normalized around the mean. Text fields need to be set to a standard format, and may require advanced transformations such as stemming.

That’s a lot of work. For this reason, I am happy to announce that AWS Glue DataBrew is available today: a visual data preparation tool that helps you clean and normalize data up to 80% faster, so you can focus more on the business value you can get from your data.

DataBrew provides a visual interface that quickly connects to your data stored in Amazon Simple Storage Service (S3), Amazon Redshift, Amazon Relational Database Service (RDS), any JDBC accessible data store, or data indexed by the AWS Glue Data Catalog. You can then explore the data, look for patterns, and apply transformations. For example, you can apply joins and pivots, merge different data sets, or use functions to manipulate data.

Once your data is ready, you can immediately use it with AWS and third-party services to gain further insights, such as Amazon SageMaker for machine learning, Amazon Redshift and Amazon Athena for analytics, and Amazon QuickSight and Tableau for business intelligence.

How AWS Glue DataBrew Works
To prepare your data with DataBrew, you follow these steps:

  • Connect one or more datasets from S3 or the AWS Glue Data Catalog (S3, Redshift, RDS). You can also upload a local file to S3 from the DataBrew console. CSV, JSON, Parquet, and .XLSX formats are supported.
  • Create a project to visually explore, understand, combine, clean, and normalize data in a dataset. You can merge or join multiple datasets. From the console, you can quickly spot anomalies in your data with value distributions, histograms, box plots, and other visualizations.
  • Generate a rich data profile for your dataset with over 40 statistics by running a job in the profile view.
  • When you select a column, you get recommendations on how to improve data quality.
  • You can clean and normalize data using more than 250 built-in transformations. For example, you can remove or replace null values, or create encodings. Each transformation is automatically added as a step to build a recipe.
  • You can then save, publish, and version recipes, and automate the data preparation tasks by applying recipes on all incoming data. To apply recipes to or generate profiles for large datasets, you can run jobs (a scripted example follows this list).
  • At any point in time, you can visually track and explore how datasets are linked to projects, recipes, and job runs. In this way, you can understand how data flows and what changes were made. This information is called data lineage and can help you find the root cause in case of errors in your output.
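
If you prefer to script those job runs rather than start them from the console, the same operations are exposed through the AWS SDK. The sketch below uses the AWS SDK for Python and assumes a recipe job named comments-recipe-job has already been created and published in DataBrew.

import time

import boto3

databrew = boto3.client("databrew")

# Assumed name of a recipe job created beforehand (for example from the
# DataBrew console after publishing the recipe).
JOB_NAME = "comments-recipe-job"

run_id = databrew.start_job_run(Name=JOB_NAME)["RunId"]

# Poll until the run finishes; output lands in the S3 location configured on the job.
while True:
    state = databrew.describe_job_run(Name=JOB_NAME, RunId=run_id)["State"]
    if state not in ("STARTING", "RUNNING"):
        break
    time.sleep(30)

print(f"Job run {run_id} finished with state {state}")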

Let’s see how this works with a quick demo!

Preparing a Sample Dataset with AWS Glue DataBrew
In the DataBrew console, I select the Projects tab and then Create project. I name the new project Comments. A new recipe is also created and will be automatically updated with the data transformations that I will apply next.

I choose to work on a New dataset and name it Comments.

Here, I select Upload file and in the next dialog I upload a comments.csv file I prepared for this demo. In a production use case, you would probably connect to an existing source on S3 or in the Glue Data Catalog. For this demo, I specify the S3 destination for storing the uploaded file. I leave Encryption disabled.

The comments.csv file is very small, but will help show some common data preparation needs and how to complete them quickly with DataBrew. The format of the file is comma-separated values (CSV). The first line contains the name of the columns. Then, each line contains a text comment and a numerical rating made by a customer (customer_id) about an item (item_id). Each item is part of a category. For each text comment, there is an indication of the overall sentiment (comment_sentiment). Optionally, when giving the comment, customers can enable a flag to ask to be contacted for further support (support_needed).

Here’s the content of the comments.csv file:

customer_id,item_id,category,rating,comment,comment_sentiment,support_needed
234,2345,"Electronics;Computer", 5,"I love this!",Positive,False
321,5432,"Home;Furniture",1,"I can't make this work... Help, please!!!",negative,true
123,3245,"Electronics;Photography",3,"It works. But I'd like to do more",,True
543,2345,"Electronics;Computer",4,"Very nice, it's going well",Positive,False
786,4536,"Home;Kitchen",5,"I really love it!",positive,false
567,5432,"Home;Furniture",1,"I doesn't work :-(",negative,true
897,4536,"Home;Kitchen",3,"It seems OK...",,True
476,3245,"Electronics;Photography",4,"Let me say this is nice!",positive,false

In the Access permissions section, I select an AWS Identity and Access Management (IAM) role that gives DataBrew read permissions to my input S3 bucket. Only roles where DataBrew is the service principal for the trust policy are shown in the DataBrew console. To create one in the IAM console, select DataBrew as the trusted entity.
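
For reference, a role that DataBrew can assume typically has a trust policy like the one below, naming DataBrew as the service principal; the S3 read permissions themselves are granted in a separate policy attached to the role.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "databrew.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}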

If the dataset is big, you can use Sampling to limit the number of rows to use in the project. These rows can be selected at the beginning, at the end, or randomly through the data. You are going to use projects to create recipes, and then jobs to apply recipes to all the data. Depending on your dataset, you may not need access to all the rows to define the data preparation recipe.

Optionally, you can use Tagging to manage, search, or filter resources you create with AWS Glue DataBrew.

The project is now being prepared and in a few minutes I can start exploring my dataset.

In the Grid view, the default when I create a new project, I see the data as it has been imported. For each column, there is a summary of the range of values that have been found. For numerical columns, the statistical distribution is given.

In the Schema view, I can drill down on the schema that has been inferred, and optionally hide some of the columns.

In the Profile view, I can run a data profile job to examine and collect statistical summaries about the data. This is an assessment in terms of structure, content, relationships, and derivation. For a large dataset, this is very useful to understand the data. For this small example the benefits are limited, but I run it nonetheless, sending the output of the profile job to a different folder in the same S3 bucket I use to store the source data.

When the profile job has succeeded, I can see a summary of the rows and columns in my dataset, how many columns and rows are valid, and correlations between columns.

Here, if I select a column, for example rating, I can drill down into specific statistical information and correlations for that column.

Now, let’s do some actual data preparation. In the Grid view, I look at the columns. The category contains two pieces of information, separated by a semicolon. For example, the category of the first row is “Electronics;Computer.” I select the category column, then click on the column actions (the three small dots on the right of the column name), which give me access to many transformations that I can apply to the column. In this case, I select to split the column on a single delimiter. Before applying the changes, I quickly preview them in the console.

I use the semicolon as the delimiter, and now I have two columns, category_1 and category_2. I use the column actions again to rename them to category and subcategory. Now, for the first row, category contains Electronics and subcategory contains Computer. All these changes are added as steps to the project recipe, so that I’ll be able to apply them to similar data.

The rating column contains values between 1 and 5. For many algorithms, I prefer to have these kinds of values normalized. In the column actions, I use min-max normalization to rescale the values between 0 and 1. More advanced techniques are available, such as mean or Z-score normalization. A new rating_normalized column is added.
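
DataBrew applies this as a built-in transformation, but the arithmetic behind min-max normalization is simply (x − min) / (max − min). Purely for intuition, the equivalent in pandas looks like the snippet below; this is illustrative only, not what DataBrew runs.

import pandas as pd

df = pd.read_csv("comments.csv")

# Min-max normalization: map ratings from the 1..5 range onto 0..1.
rating = df["rating"]
df["rating_normalized"] = (rating - rating.min()) / (rating.max() - rating.min())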

I look into the recommendations that DataBrew gives for the comment column. Since it’s text, the suggestion is to use a standard case format, such as lowercase, capital case, or sentence case. I select lowercase.

The comments contain free text written by customers. To simplify further analytics, I use word tokenization on the column to remove stop words (such as “a,” “an,” “the”), expand contractions (so that “don’t” becomes “do not”), and apply stemming. The destination for these changes is a new column, comment_tokenized.
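
Again purely to illustrate what that built-in step does, here is a rough equivalent using NLTK; DataBrew does not run this code, and the contraction list is a toy example.

import pandas as pd
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

nltk.download("punkt")
nltk.download("stopwords")

CONTRACTIONS = {"don't": "do not", "doesn't": "does not", "can't": "can not"}  # toy list
STOP_WORDS = set(stopwords.words("english"))
stemmer = PorterStemmer()

def tokenize(comment: str) -> str:
    comment = comment.lower()
    for short, full in CONTRACTIONS.items():
        comment = comment.replace(short, full)
    tokens = nltk.word_tokenize(comment)
    # Keep alphabetic tokens, drop stop words, and reduce words to their stems.
    return " ".join(stemmer.stem(t) for t in tokens if t.isalpha() and t not in STOP_WORDS)

df = pd.read_csv("comments.csv")
df["comment_tokenized"] = df["comment"].apply(tokenize)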

I still have some special characters in the comment_tokenized column, such as an emoticon :-). In the column actions, I select to clean and remove special characters.

I look into the recommendations for the comment_sentiment column. There are some missing values. I decide to fill the missing values with a neutral sentiment. Now, I still have values written with a different case, so I follow the recommendation to use lowercase for this column.

The comment_sentiment column now contains three different values (positive, negative, or neutral), but many algorithms prefer one-hot encoding, where there is a column for each of the possible values, and these columns contain 1 if that is the original value, or 0 otherwise. I select the Encode icon in the menu bar and then One-hot encode column. I leave the defaults and apply. Three new columns for the three possible values are added.
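
The shape of one-hot encoding is simple: one new 0/1 column per distinct value. The DataBrew step handles this for you; for comparison, a pandas equivalent would be:

import pandas as pd

df = pd.read_csv("comments.csv")
df["comment_sentiment"] = df["comment_sentiment"].str.lower().fillna("neutral")

# One column per sentiment value, containing 1 where that value applies and 0 elsewhere.
one_hot = pd.get_dummies(df["comment_sentiment"], prefix="comment_sentiment")
df = pd.concat([df, one_hot], axis=1)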

The support_needed column is recognized as boolean, and its values are automatically formatted to a standard format. I don’t have to do anything here.

The recipe for my dataset is now ready to be published and can be used in a recurring job processing similar data. I didn’t have a lot of data, but the recipe can be used with much larger datasets.

In the recipe, you can find a list of all the transformations that I just applied. When running a recipe job, output data is available in S3 and ready to be used with analytics and machine learning platforms, or to build reports and visualization with BI tools. The output can be written in a different format than the input, for example using a columnar storage format like Apache Parquet.

Available Now

AWS Glue DataBrew is available today in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), and Asia Pacific (Sydney).

It’s never been easier to prepare your data for analytics, machine learning, or BI. In this way, you can really focus on getting the right insights for your business instead of writing custom code that you then have to maintain and update.

To practice with DataBrew, you can create a new project and select one of the sample datasets that are provided. That’s a great way to understand all the available features and how you can apply them to your data.

Learn more and get started with AWS Glue DataBrew today.

Danilo

Verified, episode 2 – A Conversation with Emma Smith, Director of Global Cyber Security at Vodafone

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/verified-episode-2-conversation-with-emma-smith-director-of-global-cyber-security-at-vodafone/

Over the past 8 months, it’s become more important for us all to stay in contact with peers around the globe. Today, I’m proud to bring you the second episode of our new video series, Verified: Presented by AWS re:Inforce. Even though we couldn’t be together this year at re:Inforce, our annual security conference, we still wanted to share some of the conversations with security leaders that would have taken place at the conference. The series showcases conversations with security leaders around the globe. In episode two, I’m talking to Emma Smith, Vodafone’s Global Cyber Security Director.

Vodafone is a global technology communications company with an optimistic culture. Their focus is connecting people and building the digital future for society. During our conversation, Emma detailed how the core values of the Global Cyber Security team were inspired by the company. “We’ve got a team of people who are ultimately passionate about protecting customers, protecting society, protecting Vodafone, protecting all of our services and our employees.” Emma shared experiences about the evolution of the security organization during her past 5 years with the company.

We were also able to touch on one of Emma’s passions, diversity and inclusion. Emma has worked to implement diversity and drive a policy of inclusion at Vodafone. In June, she was named Diversity Champion in the SC Awards Europe. In her own words: “It makes me realize that my job is to smooth the way for everybody else and to try and remove some of those obstacles or barriers that were put in their way… it means that I’m really passionate about trying to get a very diverse team in security, but also in Vodafone, so that we reflect our customer base, so that we’ve got diversity of thinking, of backgrounds, of experience, and people who genuinely feel comfortable being themselves at work—which is easy to say but really hard to create that culture of safety and belonging.”

Stay tuned for future episodes of Verified: Presented by AWS re:Inforce here on the AWS Security Blog. You can watch episode one, an interview with Jason Chan, Vice President of Information Security at Netflix, on YouTube. If you have an idea or a topic you’d like covered in this series, please drop us a comment below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Steve Schmidt

Steve is Vice President and Chief Information Security Officer for AWS. His duties include leading product design, management, and engineering development efforts focused on bringing the competitive, economic, and security benefits of cloud computing to business and government customers. Prior to AWS, he had an extensive career at the Federal Bureau of Investigation, where he served as a senior executive and section chief. He currently holds 11 patents in the field of cloud security architecture. Follow Steve on Twitter.

New – Export Amazon DynamoDB Table Data to Your Data Lake in Amazon S3, No Code Writing Required

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/new-export-amazon-dynamodb-table-data-to-data-lake-amazon-s3/

Hundreds of thousands of AWS customers have chosen Amazon DynamoDB for mission-critical workloads since its launch in 2012. DynamoDB is a nonrelational managed database that allows you to store a virtually infinite amount of data and retrieve it with single-digit-millisecond performance at any scale. To get the most value out of this data, customers had […]

S3 Intelligent-Tiering Adds Archive Access Tiers

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/s3-intelligent-tiering-adds-archive-access-tiers/

We launched S3 Intelligent-Tiering two years ago, which added the capability to take advantage of S3 without needing to have a deep understanding of your data access patterns. Today we are launching two new optimizations for S3 Intelligent-Tiering that will automatically archive objects that are rarely accessed. These new optimizations will reduce the amount of […]

New – Archive and Replay Events with Amazon EventBridge

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-archive-and-replay-events-with-amazon-eventbridge/

Event-driven architectures use events to share information between the components of one or more applications. Events tell us that “something has happened”, maybe you received an API request, a file has been uploaded to a storage platform, or a database record has been updated. Business events describe something related to your activities, for example that […]

In the Works – New AWS Region in Zurich, Switzerland

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/in-the-works-new-aws-region-in-zurich-switzerland/

Earlier this year, we launched the new AWS Region in Italy and have plans for three more AWS Regions in Indonesia, Japan, and Spain. Coming to Switzerland in 2022 Today, I’m happy to announce that the AWS Europe (Zurich) Region is in the works. It will open in the second half of 2022 with three […]

Register for the Modern Applications Online Event

Post Syndicated from Rachel Richardson original https://aws.amazon.com/blogs/compute/register-for-the-modern-applications-online-event/

Earlier this year we hosted the first serverless-themed virtual event, the Serverless-First Function. We enjoyed the opportunity to virtually connect with our customers so much that we want to do it again. This time, we’re expanding the scope to feature serverless, containers, and front-end development content. The Modern Applications Online Event is scheduled for November 4-5, 2020.

This free, two-day event addresses how to build and operate modern applications at scale across your organization, enabling you to become more agile and respond to change faster. The event covers topics including serverless application development, containers best practices, front-end web development, and more. If you missed the containers or serverless virtual events earlier this year, this is a great opportunity to watch the content and interact directly with expert moderators. The full agenda is listed below.

Register now

Organizational Level Operations

Wednesday, November 4, 2020, 9:00 AM – 1:00 PM PT

Move fast and ship things: Using serverless to increase speed and agility within your organization
In this session, Adrian Cockcroft demonstrates how you can use serverless to build modern applications faster than ever. Cockcroft uses real-life examples and customer stories to debunk common misconceptions about serverless.

Eliminating busywork at the organizational level: Tips for using serverless to its fullest potential 
In this session, David Yanacek discusses key ways to unlock the full benefits of serverless, including building services around APIs and using service-oriented architectures built on serverless systems to remove the roadblocks to continuous innovation.

Faster Mobile and Web App Development with AWS Amplify
In this session, Brice Pellé introduces AWS Amplify, a set of tools and services that enables mobile and front-end web developers to build full stack serverless applications faster on AWS. Learn how to accelerate development with AWS Amplify’s use-case centric open-source libraries and CLI, and its fully managed web hosting service with built-in CI/CD.

Built Serverless-First: How Workgrid Software transformed from a Liberty Mutual project to its own global startup
Connected through a central IT team, Liberty Mutual has embraced serverless since AWS Lambda’s inception in 2014. In this session, Gillian McCann discusses Workgrid’s serverless journey—from internal microservices project within Liberty Mutual to independent business entity, all built serverless-first. Introduction by AWS Principal Serverless SA, Sam Dengler.

Market insights: A conversation with Forrester analyst Jeffrey Hammond & Director of Product for Lambda Ajay Nair
In this session, guest speaker Jeffrey Hammond and Director of Product for AWS Lambda, Ajay Nair, discuss the state of serverless, Lambda-based architectural approaches, Functions-as-a-Service platforms, and more. You’ll learn about the high-level and enduring enterprise patterns and advancements that analysts see driving the market today and determining the market in the future.

AWS Fargate Platform Version 1.4
In this session, we go through a brief introduction of AWS Fargate: what it is, its relation to Amazon EKS and Amazon ECS, and the problems it addresses for customers. We then introduce the concept of Fargate “platform versions” and dive deeper into the new features that platform version 1.4 enables.

Persistent Storage on Containers
Containerizing applications that require data persistence or shared storage is often challenging since containers are ephemeral in nature, are scaled in and out dynamically, and typically clear any saved state when terminated. In this session you will learn about Amazon Elastic File System (EFS), a fully managed, elastic, highly-available, scalable, secure, high-performance, cloud native, shared file system that enables data to be persisted separately from compute for your containerized applications.

Security Best Practices on Amazon ECR
In this session, we will cover best practices for securing your container images using Amazon ECR. Learn how user access controls, image assurance, and image scanning contribute to securing your images.

Application Level Design

Thursday, November 5, 2020, 9:00 AM – 1:00 PM PT

Building a Live Streaming Platform with Amplify Video
In this session, learn how to build a live-streaming platform using Amplify Video and the platform powering it, AWS Elemental Live. Amplify Video is an open-source plugin for the Amplify CLI that makes it easy to incorporate video streaming into your mobile and web applications powered by AWS Amplify.

Building Serverless Web Applications
In this session, follow along as Ben Smith shows you how to build and deploy a completely serverless web application from scratch. The application will span from a mobile friendly front end to complex business logic on the back end.

Automating serverless application development workflows
In this talk, Eric Johnson breaks down how to think about CI/CD when building serverless applications with AWS Lambda and Amazon API Gateway. This session will cover using technologies like AWS SAM to build CI/CD pipelines for serverless application back ends.

Observability for your serverless applications
In this session, Julian Wood walks you through how to add monitoring, logging, and distributed tracing to your serverless applications. Join us to learn how to track platform and business metrics, visualize the performance and operations of your application, and understand which services should be optimized to improve your customer’s experience.

Happy Building with AWS Copilot
The hard part’s done. You and your team have spent weeks poring over pull requests, building microservices and containerizing them. Congrats! But what do you do now? How do you get those services on AWS? Copilot is a new command line tool that makes building, developing, and operating containerized apps on AWS a breeze. In this session, we’ll talk about how Copilot helps you and your team set up modern applications that follow AWS best practices.

CDK for Kubernetes
The CDK for Kubernetes (cdk8s) is a new open-source software development framework for defining Kubernetes applications and resources using familiar programming languages. Applications running on Kubernetes are composed of dozens of resources described in carefully maintained YAML files. As applications evolve and teams grow, these YAML files become harder and harder to manage. It’s also really hard to reuse and create abstractions through config files — copying and pasting from previous projects is not the solution! In this webinar, the creators of cdk8s show you how to define your first cdk8s application, define reusable components called “constructs,” and generally say goodbye (and thank you very much) to writing YAML.

Machine Learning on Amazon EKS
Amazon EKS has quickly emerged as a leading choice for machine learning workloads. In this session, we’ll walk through some of the recent ML-related enhancements the Kubernetes team at AWS has released. We will then dive deep with walkthroughs of how to optimize your machine learning workloads on Amazon EKS, including demos of the latest features we’ve contributed to popular open source projects like Spark and Kubeflow.

Deep Dive on Amazon ECS Capacity Providers
In this talk, we’ll dive into the different ways that ECS Capacity Providers can enable teams to focus more on their core business, and less on the infrastructure behind the scenes. We’ll look at the benefits and discuss scenarios where Capacity Providers can help solve the problems that customers face when using container orchestration. Lastly, we’ll review what features have been released with Capacity Providers, as well as look ahead at what’s to come.

Register now

New – Application Load Balancer Support for End-to-End HTTP/2 and gRPC

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-application-load-balancer-support-for-end-to-end-http-2-and-grpc/

Thanks to its efficiency and support for numerous programming languages, gRPC is a popular choice for microservice integrations and client-server communications. gRPC is a high performance remote procedure call (RPC) framework using HTTP/2 for transport and Protocol Buffers to describe the interface. To make it easier to use gRPC with your applications, Application Load Balancer (ALB) […]

AWS extends its MTCS Level 3 certification scope to cover United States Regions

Post Syndicated from Clara Lim original https://aws.amazon.com/blogs/security/aws-extends-its-mtcs-level-3-certification-scope-to-cover-united-states-regions/

We’re excited to announce the completion of the Multi-Tier Cloud Security (MTCS) Level 3 triennial certification in September 2020. The scope was expanded to cover the United States Amazon Web Services (AWS) Regions, excluding AWS GovCloud (US) Regions, in addition to Singapore and Seoul. AWS was the first cloud service provider (CSP) to attain the MTCS Level 3 certification in Singapore since 2014, and the services in scope have increased to 130—an approximately 27% increase since the last recertification audit in September 2019, and three times the number of services in scope since the last triennial audit in 2017. This provides customers with more services to choose from in the regions.

MTCS was the world’s first cloud security standard to specify a management system for cloud security that covers multiple tiers, and it can be applied by CSPs to meet differing cloud user needs for data sensitivity and business criticality. The certified CSPs will be able to better specify the levels of security that they can offer to their users. CSPs can achieve this through third-party certification and a self-disclosure requirement for CSPs that covers service-oriented information normally captured in service level agreements. The different levels of security help local businesses to pick the right CSP, and use of MTCS is mandated by the Singapore government as a requirement for public sector agencies and regulated organizations.

MTCS has three levels of security, Level 1 being the base and Level 3 the most stringent:

  • Level 1 was designed for non–business critical data and systems with basic security controls, to counter certain risks and threats targeting low-impact information systems (for example, a website that hosts public information).
  • Level 2 addresses the needs of organizations that run their business-critical data and systems in public or third-party cloud systems (for example, confidential business data and email).
  • Level 3 was designed for regulated organizations with specific and more stringent security requirements. Industry-specific regulations can be applied in addition to the baseline controls, in order to supplement and address security risks and threats in high-impact information systems (for example, highly confidential business data, financial records, and medical records).

Benefits of MTCS certification

Singapore customers in regulated industries with the strictest security requirements can securely host applications and systems with highly sensitive information, ranging from confidential business data to financial and medical records with level 3 compliance.

Financial Services Industry (FSI) customers in Korea are able to accelerate cloud adoption without the need to validate 109 out of 141 controls as required in the relevant regulations (the Financial Security Institute’s Guideline on Use of Cloud Computing Services in the Financial Industry, and the Regulation on Supervision on Electronic Financial Transactions (RSEFT)).

With increasing cloud adoption across different industries, MTCS certification has the potential to provide assurance to customers globally, now that the scope is extended beyond Singapore and Korea to the United States AWS Regions. This extension also provides an alternative for Singapore government agencies to use AWS services that haven’t yet launched locally, and it supports resiliency and recovery use cases as well.

You can now download the latest MTCS certificates and the MTCS Self-Disclosure Form in AWS Artifact.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Clara Lim

Clara is the Audit Program Manager for the Asia Pacific Region, leading multiple security certification programs. Clara is passionate about leveraging her decade-long experience to deliver compliance programs that provide assurance and build trust with customers.

AWS achieves FedRAMP P-ATO for 5 services in AWS US East/West and GovCloud (US) Regions

Post Syndicated from Amendaze Thomas original https://aws.amazon.com/blogs/security/aws-achieves-fedramp-p-ato-for-5-services-in-aws-us-east-west-and-govcloud-us-regions/

We’re pleased to announce that five additional AWS services have achieved a Provisional Authority to Operate (P-ATO) from the Federal Risk and Authorization Management Program (FedRAMP) Joint Authorization Board (JAB). These services provide the following capabilities for the federal government and customers with regulated workloads:

  • Enable your organization’s developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs with AWS Batch.
  • Aggregate, organize, and prioritize your security alerts or findings from multiple AWS services using AWS Security Hub.
  • Provision, manage, and deploy public and private Secure Sockets Layer/Transport Layer Security (SSL/TLS) certificates using AWS Certificate Manager.
  • Enable customers to set up and govern a new, secure, multi-account AWS environment using AWS Control Tower.
  • Provide a fully managed Kubernetes service with Amazon Elastic Kubernetes Service.

The following services are now listed on the FedRAMP Marketplace and the AWS Services in Scope by Compliance Program page.

AWS US East/West Regions (FedRAMP Moderate Authorization)

AWS GovCloud (US) Regions (FedRAMP High Authorization)

AWS is continually expanding the scope of our compliance programs to help enable your organization to use our services for sensitive and regulated workloads. Today, AWS offers 90 AWS services authorized in the AWS US East/West Regions under FedRAMP Moderate Authorization, and 76 services authorized in the AWS GovCloud (US) Regions under FedRAMP High Authorization.

To learn what other public sector customers are doing on AWS, see our Government, Education, and Nonprofits Case Studies and Customer Success Stories. Stay tuned for future updates on our Services in Scope by Compliance Program page. If you have feedback about this blog post, let us know in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Amendaze Thomas

Amendaze is the manager of the AWS Government Assessments and Authorization Program (GAAP). He has 15 years of experience providing advisory services to clients in the federal government, and over 13 years of experience supporting CISO teams with risk management framework (RMF) activities.

Introducing Amazon SNS FIFO – First-In-First-Out Pub/Sub Messaging

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-sns-fifo-first-in-first-out-pub-sub-messaging/

When designing a distributed software architecture, it is important to define how services exchange information. For example, the use of asynchronous communication decouples components and simplifies scaling, reducing the impact of changes and making it easier to release new features.

The two most common forms of asynchronous service-to-service communication are message queues and publish/subscribe messaging:

  • With message queues, messages are stored on the queue until they are processed and deleted by a consumer. On AWS, Amazon Simple Queue Service (SQS) provides a fully managed message queuing service with no administrative overhead.
  • With pub/sub messaging, a message published to a topic is delivered to all subscribers to the topic. On AWS, Amazon Simple Notification Service (SNS) is a fully managed pub/sub messaging service that enables message delivery to a large number of subscribers. Each subscriber can also set a filter policy to receive only the messages that it cares about.

You can use topics when you want to fan out messages to multiple applications, and queues when you want to send messages to one application. Using topics and queues together, you can decouple microservices, distributed systems, and serverless applications.

With SQS, you can use FIFO (First-In-First-Out) queues to preserve the order in which messages are sent and received, and to prevent a message from being processed more than once.

Introducing SNS FIFO Topics
Today, we are adding similar capabilities for pub/sub messaging with the introduction of SNS FIFO topics, providing strict message ordering and deduplicated message delivery to one or more subscribers.

FIFO topics manage ordering and deduplication similar to FIFO queues:

Ordering – You configure a message group by including a message group ID when publishing a message to a FIFO topic. For each message group ID, all messages are sent and delivered in order of their arrival. For example, to ensure the delivery of messages related to the same customer in order, you can publish these messages to the topic using the customer’s account number as the message group ID. There is no limit on the number of message groups with FIFO topics and queues. You don’t need to declare the message group ID in advance; any value will work. If you don’t have a logical distinction between messages, you can simply use the same message group ID for all of them and have a single group of ordered messages. The message group ID is passed to any subscribed FIFO queue.

Deduplication – Distributed systems (like SNS) and client applications sometimes generate duplicate messages. You can avoid duplicated message deliveries from the topic in two ways: either by enabling content-based deduplication on the topic, or by adding a deduplication ID to the messages that you publish. With message content-based deduplication, SNS uses a SHA-256 hash to generate the message deduplication ID using the body of the message. After a message with a specific deduplication ID is published successfully, there is a 5-minute interval during which any message with the same deduplication ID is accepted but not delivered. If you subscribe a FIFO queue to a FIFO topic, the deduplication ID is passed to the queue and it is used by SQS to avoid duplicate messages being received.

You can use FIFO topics and queues together to simplify the implementation of applications where the order of operations and events is critical, or when you cannot tolerate duplicates: for example, to process financial operations and inventory updates, or to asynchronously apply commands that you receive from a client device. FIFO queues can use message filtering in FIFO topics to selectively receive only a subset of messages rather than every message published to the topic.
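
To make this concrete, here is a minimal boto3 sketch that publishes an ordered, deduplicated event to a FIFO topic. The topic ARN, account number, and the update_id field are placeholders for illustration only:

import json
import boto3

sns = boto3.client('sns')

# Placeholder ARN; FIFO topic names must end in ".fifo"
TOPIC_ARN = 'arn:aws:sns:us-east-2:123412341234:updates.fifo'

def publish_account_update(account_number, update):
    # Messages that share a message group ID are delivered in the order
    # they were published. The deduplication ID suppresses retried duplicates
    # for 5 minutes (not needed if content-based deduplication is enabled).
    return sns.publish(
        TopicArn=TOPIC_ARN,
        Message=json.dumps(update),
        MessageGroupId=str(account_number),
        MessageDeduplicationId=update['update_id'],
    )

publish_account_update(42, {'update_id': 'a1b2c3', 'field': 'email', 'value': 'me@example.com'})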

How to Use SNS FIFO Topics
A common scenario where FIFO topics can help is when you receive updates that need to be processed in order. For example, I can use a FIFO topic to receive updates from an application where my customers edit their account profiles. Then, I subscribe an SQS FIFO queue to the FIFO topic, and use the queue as a trigger for a Lambda function that applies the account updates to an Amazon DynamoDB table used by my Customer management system, which needs to be kept in sync.

The decoupling introduced by the FIFO topic makes it easier to add new functionality with minimal impact to existing applications. For example, to reward my loyal customers with additional promotions, I add a new Loyalty application that stores information in a relational database managed by Amazon Aurora. To keep the customer information in the Loyalty database in sync with my other applications, I can subscribe a new FIFO queue to the same FIFO topic, and add a new Lambda function that receives customer updates in the same order as they are generated and applies them to the Loyalty database. In this way, I don’t need to change the code or configuration of the other applications to integrate the new Loyalty app.

First, I create two FIFO queues in the SQS console, leaving all options to their defaults:

  • The customer.fifo queue to process updates in my Customer management system.
  • The loyalty.fifo queue to help me collect and store customer updates for the Loyalty application.

In the SNS console, I create the updates.fifo topic. I select FIFO as type, and enable Content-based message deduplication.

Then, I subscribe the customer.fifo and loyalty.fifo queues to the topic.

To be able to receive messages, I add a statement to the access policy of both queues granting the updates.fifo topic permissions to send messages to the queues. For example, for the customer.fifo queue the statement is:

{
  "Effect": "Allow",
  "Principal": {
    "Service": "sns.amazonaws.com"
  },
  "Action": "SQS:SendMessage",
  "Resource": "arn:aws:sqs:us-east-2:123412341234:customer.fifo",
  "Condition": {
    "ArnLike": {
      "aws:SourceArn": "arn:aws:sns:us-east-2:123412341234:updates.fifo"
    }
  }
}
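
The same setup can be scripted instead of using the console. A minimal boto3 sketch, reusing the placeholder account number and Region from the example policy above, might look like this (the subscription for loyalty.fifo would follow the same pattern):

import json
import boto3

sns = boto3.client('sns', region_name='us-east-2')
sqs = boto3.client('sqs', region_name='us-east-2')

topic_arn = 'arn:aws:sns:us-east-2:123412341234:updates.fifo'
queue_url = 'https://sqs.us-east-2.amazonaws.com/123412341234/customer.fifo'
queue_arn = 'arn:aws:sqs:us-east-2:123412341234:customer.fifo'

# Allow the FIFO topic to send messages to the FIFO queue
policy = {
    'Version': '2012-10-17',
    'Statement': [{
        'Effect': 'Allow',
        'Principal': {'Service': 'sns.amazonaws.com'},
        'Action': 'SQS:SendMessage',
        'Resource': queue_arn,
        'Condition': {'ArnLike': {'aws:SourceArn': topic_arn}},
    }],
}
sqs.set_queue_attributes(QueueUrl=queue_url, Attributes={'Policy': json.dumps(policy)})

# Subscribe the FIFO queue to the FIFO topic
sns.subscribe(TopicArn=topic_arn, Protocol='sqs', Endpoint=queue_arn)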

Now, I use the SNS console to publish 4 messages in sequence. For all messages, I use the same message group ID, so they are all in the same message group. The only part that differs is the message body, where I use, in order:

  • Update One
  • Update Two
  • Update Three
  • Update One

In the SQS console, I see that only 3 messages have been delivered to the FIFO queues:

Why is that? When I created the FIFO topic, I enabled content-based deduplication. The 4 messages were sent within the 5-minute deduplication window. The last message was recognized as a duplicate of the first one and was not delivered to the subscribed queues.

Let’s see the actual messages in the queues. I use the AWS Command Line Interface (CLI) to receive the messages from SQS, and the jq command-line JSON processor to format the output and get only the Message in the Body.

Here are the messages in the customer.fifo queue:

$ aws sqs receive-message --queue-url https://sqs.us-east-2.amazonaws.com/123412341234/customer.fifo --max-number-of-messages 10 | jq '.Messages[].Body | fromjson | .Message'

"Update One"
"Update Two"
"Update Three"

And these are the messages in the loyalty.fifo queue:

$ aws sqs receive-message --queue-url https://sqs.us-east-2.amazonaws.com/123412341234/loyalty.fifo --max-number-of-messages 10 | jq '.Messages[].Body | fromjson | .Message'

"Update One"
"Update Two"
"Update Three"

As expected, the 3 messages with unique content have been delivered to both queues in the same order as they were sent.

Available Now
You can use SNS FIFO topics in all commercial regions. You can process up to 300 transactions per second (TPS) per FIFO topic or FIFO queue. With SNS, you pay only for what you use; you can find more information on the pricing page.

To learn more, please see the documentation.

Danilo

Building event-driven architectures with Amazon SNS FIFO

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/building-event-driven-architectures-with-amazon-sns-fifo/

This post is courtesy of Christian Mueller, Principal Solutions Architect.

Developers increasingly adopt event-driven architectures to decouple their distributed applications. Often, these events must be propagated in a strictly ordered manner to all subscribed applications. Using Amazon SNS FIFO topics and Amazon SQS FIFO queues, you can address use cases that require end-to-end message ordering, deduplication, filtering, and encryption.

In this blog post, I introduce a sample event-driven architecture. I walk through an implementation based on Amazon SNS FIFO topics and Amazon SQS FIFO queues.

Common requirements in event-driven-architectures

In event-driven architectures, data consistency is a common business requirement. This is often translated into technical requirements such as zero message loss and strict message ordering. For example, if you update your domain object rapidly, you want to be sure that all events are received by each subscriber in exactly the order they occurred. This way, the current domain object state is what each subscriber received as the latest update event. Similarly, all update events should be received after the initial create event.

Before Amazon SNS FIFO, architects had to design applications to check whether messages were received out of order before processing them.

Comparing SNS and SNS FIFO

Another common challenge is preventing message duplicates when sending events to the messaging service. If an event publisher receives an error, such as a network timeout, the publisher does not know if the messaging service could receive and successfully process the message or not.

The client may retry, as this is the default behavior for some HTTP response codes in AWS SDKs. This can cause duplicate messages.

Before Amazon SNS FIFO, developers had to design receivers to be idempotent. Where processing an event is not naturally idempotent, the receiver has to be made idempotent explicitly. Often, this is done by adding a key-value store like Amazon DynamoDB or Amazon ElastiCache for Redis to the service, so that the receiver can track whether an event has been seen before.

Exactly once processing and message deduplication
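
For illustration, a common way to implement this tracking (not part of the sample application) is a conditional write to DynamoDB keyed on a message ID, so a duplicate delivery is detected and skipped. The table and attribute names here are hypothetical:

import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client('dynamodb')
TABLE_NAME = 'ProcessedMessages'  # hypothetical table with partition key "messageId"

def process_once(message_id, handler):
    try:
        # The condition fails if this message ID has already been recorded
        dynamodb.put_item(
            TableName=TABLE_NAME,
            Item={'messageId': {'S': message_id}},
            ConditionExpression='attribute_not_exists(messageId)',
        )
    except ClientError as e:
        if e.response['Error']['Code'] == 'ConditionalCheckFailedException':
            return  # duplicate delivery, skip processing
        raise
    handler()  # first delivery, process the event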

Exploring the recruiting agency example

This sample application models a recruitment agency with a job listings website. The application is composed of multiple services. I explain 3 of them in more detail.

Sample application architecture

A custom service, the anti-corruption service, receives a change data capture (CDC) event stream of changes from a relational database. This service translates the low-level technical database events into meaningful business events that the domain services can easily consume. These business events are sent to the SNS FIFO “JobEvents.fifo” topic. Here, interested services subscribe to these events and process them asynchronously.

In this domain, the analytics service is interested in all events. It has an SQS FIFO “AnalyticsJobEvents.fifo” queue subscribed to the SNS FIFO “JobEvents.fifo” topic. It uses SQS FIFO as an event source for AWS Lambda, which processes and stores these events in Amazon S3. S3 is an object storage service with high scalability, data availability, durability, security, and performance. This allows you to use services like Amazon EMR, AWS Glue, or Amazon Athena to get insights into your data and extract value.

The inventory service owns an SQS FIFO “InventoryJobEvents.fifo” queue, which is subscribed to the SNS FIFO “JobEvents.fifo” topic. It is only interested in “JobCreated” and “JobDeleted” events, as it only tracks which jobs are currently available and stores this information in a DynamoDB table. Therefore, it uses an SNS filter policy to only receive these events, instead of receiving all events.

This sample application focuses on the SNS FIFO capabilities, so I do not explore other services subscribed to the SNS FIFO topic. This sample follows the SQS best practices and SNS redrive policy recommendations and configures dead-letter queues (DLQ). This is useful in case SNS cannot deliver an event to the subscribed SQS queue. It also helps if the function fails to process an event from the corresponding SQS FIFO queue multiple times. As a requirement in both cases, the attached SQS DLQ must be an SQS FIFO queue.

Deploying the application

The application is deployed using infrastructure as code with the AWS Serverless Application Model (SAM). SAM provides shorthand syntax to express functions, APIs, databases, and event source mappings. It is expanded into AWS CloudFormation syntax during deployment.

To get started, clone the "event-driven-architecture-with-amazon-sns-fifo" repository from here. Alternatively, download the repository as a ZIP file from here and extract it to a directory of your choice.

As a prerequisite, you must have the AWS SAM CLI, Python 3, and pip installed. You must also have the AWS CLI configured properly.

Navigate to the root directory of this project and build the application with SAM. SAM downloads required dependencies and stores them locally. Execute the following commands in your terminal:

git clone https://github.com/aws-samples/event-driven-architecture-with-amazon-sns-fifo.git
cd event-driven-architecture-with-amazon-sns-fifo
sam build

You see the following output:

Deployment output

Now, deploy the application:

sam deploy --guided

Provide arguments for the deployments, such as the stack name and preferred AWS Region:

SAM guided deployment

After a successful deployment, you see the following output:

Successful deployment message

Learning more about the implementation

I explore the three services forming this sample application, and how they use the features of SNS FIFO.

Anti-corruption service

The anti-corruption service owns the SNS FIFO “JobEvents.fifo” topic, where it publishes business events related to job postings. It uses an SNS FIFO topic, as end-to-end ordering per job ID is required. SNS FIFO is configured not to perform content-based deduplication, as I require a unique message deduplication ID for each event for deduplication. The corresponding definition in the SAM template looks like this:

  JobEventsTopic:
    Type: AWS::SNS::Topic
    Properties:
      TopicName: JobEvents.fifo
      FifoTopic: true
      ContentBasedDeduplication: false

For simplicity, the anti-corruption function in the sample application doesn’t consume an external database CDC stream. It uses Amazon CloudWatch Events as an event source to trigger the function every minute.

I provide the SNS FIFO topic Amazon Resource Name (ARN) as an environment variable in the function. This makes this function more portable to deploy in different environments and stages. The function’s AWS Identity and Access Management (IAM) policy grants permissions to publish messages to only this SNS topic:

  AntiCorruptionFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: anti-corruption-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          TOPIC_ARN: !Ref JobEventsTopic
      Policies:
        - SNSPublishMessagePolicy:
            TopicName: !GetAtt JobEventsTopic.TopicName
      Events:
        Trigger:
          Type: Schedule
          Properties:
            Schedule: 'rate(1 minute)'

The anti-corruption function uses features in the SNS publish API, which allows you to define a “MessageDeduplicationId” and a “MessageGroupId”. The “MessageDeduplicationId” is used to filter out duplicate messages that are sent to SNS FIFO within the 5-minute deduplication interval. The “MessageGroupId” is required, as SNS FIFO processes all job events for the same message group in a strictly ordered manner, isolated from other message groups processed through the same topic.

Another important aspect in this implementation is the use of “MessageAttributes”. We define a message attribute with the name “eventType” and values like “JobCreated”, “JobSalaryUpdated”, and “JobDeleted”. This allows subscribers to define SNS filter policies to only receive certain events they are interested in:

import boto3
from datetime import datetime
import json
import os
import random
import uuid

TOPIC_ARN = os.environ['TOPIC_ARN']

sns = boto3.client('sns')

def lambda_handler(event, context):
    jobId = str(random.randrange(0, 1000))

    send_job_created_event(jobId)
    send_job_updated_event(jobId)
    send_job_deleted_event(jobId)
    return

def send_job_created_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(
        TopicArn=TOPIC_ARN,
        Subject=f'Job {jobId} created',
        MessageDeduplicationId=messageId,
        MessageGroupId=f'JOB-{jobId}',
        Message={...},
        MessageAttributes = {
            'eventType': {
                'DataType': 'String',
                'StringValue': 'JobCreated'
            }
        }
    )
    print('sent message and received response: {}'.format(response))
    return

def send_job_updated_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(...)
    print('sent message and received response: {}'.format(response))
    return

def send_job_deleted_event(jobId):
    messageId = str(uuid.uuid4())

    response = sns.publish(...)
    print('sent message and received response: {}'.format(response))
    return

Analytics service

The analytics service owns an SQS FIFO “AnalyticsJobEvents.fifo” queue, which is subscribed to the SNS FIFO “JobEvents.fifo” topic. Following best practices, I define redrive policies for the SQS FIFO queue and the SNS FIFO subscription in the template:

  AnalyticsJobEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: AnalyticsJobEvents.fifo
      FifoQueue: true
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt AnalyticsJobEventsQueueDLQ.Arn
        maxReceiveCount: 3

  AnalyticsJobEventsQueueToJobEventsTopicSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt AnalyticsJobEventsQueue.Arn
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref JobEventsTopic
      RedrivePolicy: !Sub '{"deadLetterTargetArn": "${AnalyticsJobEventsSubscriptionDLQ.Arn}"}'

The analytics function uses SQS FIFO as an event source for Lambda. The S3 bucket name is an environment variable for the function, which increases the code portability across environments and stages. The IAM policy for this function only grants permissions to write objects to this S3 bucket:

  AnalyticsFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: analytics-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          BUCKET_NAME: !Ref AnalyticsBucket
      Policies:
        - S3WritePolicy:
            BucketName: !Ref AnalyticsBucket
      Events:
        Trigger:
          Type: SQS
          Properties:
            Queue: !GetAtt AnalyticsJobEventsQueue.Arn
            BatchSize: 10

View the function implementation at the GitHub repo.
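
The repository contains the tested implementation; purely as a rough sketch, a handler that writes each SQS FIFO record to S3 could look like the following. The object key scheme is an assumption for illustration:

import os
import boto3

s3 = boto3.client('s3')
BUCKET_NAME = os.environ['BUCKET_NAME']

def lambda_handler(event, context):
    # With raw message delivery enabled, each SQS record body is the job event itself
    for record in event['Records']:
        s3.put_object(
            Bucket=BUCKET_NAME,
            Key=f"job-events/{record['messageId']}.json",
            Body=record['body'].encode('utf-8'),
        )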

Inventory service

The inventory service also owns an SQS FIFO “InventoryJobEvents.fifo” queue, which is subscribed to the SNS FIFO “JobEvents.fifo” topic. It uses redrive policies for the SQS FIFO queue and the SNS FIFO subscription as well. This service is only interested in certain events, so it uses an SNS filter policy to specify these events:

  InventoryJobEventsQueue:
    Type: AWS::SQS::Queue
    Properties:
      QueueName: InventoryJobEvents.fifo
      FifoQueue: true
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt InventoryJobEventsQueueDLQ.Arn
        maxReceiveCount: 3

  InventoryJobEventsQueueToJobEventsTopicSubscription:
    Type: AWS::SNS::Subscription
    Properties:
      Endpoint: !GetAtt InventoryJobEventsQueue.Arn
      Protocol: sqs
      RawMessageDelivery: true
      TopicArn: !Ref JobEventsTopic
      FilterPolicy: '{"eventType":["JobCreated", "JobDeleted"]}'
      RedrivePolicy: !Sub '{"deadLetterTargetArn": "${InventoryJobEventsQueueSubscriptionDLQ.Arn}"}'

The inventory function also uses SQS FIFO as an event source for Lambda. The DynamoDB table name is set as an environment variable, so the function can look up the name during initialization. The IAM policy grants read/write permissions for only this table:

  InventoryFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: inventory-service/
      Handler: app.lambda_handler
      Runtime: python3.7
      MemorySize: 256
      Environment:
        Variables:
          TABLE_NAME: !Ref InventoryTable
      Policies:
        - DynamoDBCrudPolicy:
            TableName: !Ref InventoryTable
      Events:
        Trigger:
          Type: SQS
          Properties:
            Queue: !GetAtt InventoryJobEventsQueue.Arn
            BatchSize: 10

View the function implementation at the GitHub repo.
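
Again, the repository has the real code. Conceptually, the handler applies each filtered event to the DynamoDB table, roughly like the sketch below. The shape of the job event JSON and the table key are assumptions for illustration:

import json
import os
import boto3

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])

def lambda_handler(event, context):
    for record in event['Records']:
        # The eventType message attribute set by the publisher is forwarded to SQS
        event_type = record['messageAttributes']['eventType']['stringValue']
        job = json.loads(record['body'])  # assumed shape: {"jobId": "...", ...}
        if event_type == 'JobCreated':
            table.put_item(Item={'jobId': job['jobId']})
        elif event_type == 'JobDeleted':
            table.delete_item(Key={'jobId': job['jobId']})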

Conclusion

Amazon SNS FIFO topics can simplify the design of event-driven architectures and reduce custom code in building such applications.

By using the native integration with Amazon SQS FIFO queues, you can also build architectures that fan out to thousands of subscribers. This pattern helps achieve data consistency, deduplication, filtering, and encryption in near real time, using managed services.

For information on regional availability and service quotas, see SNS endpoints and quotas and SQS endpoints and quotas. For more information on the FIFO functionality, see SNS FIFO and SQS FIFO in their Developer Guides.

New Chair and Trustees of the Raspberry Pi Foundation

Post Syndicated from Philip Colligan original https://www.raspberrypi.org/blog/new-chair-and-trustees-foundation/

I am delighted to share the news that we have appointed a new Chair and Trustees of the Raspberry Pi Foundation. Between them, they bring an enormous range of experience and expertise to what is already a fantastic Board of Trustees, and I am really looking forward to working with them.

New Chair of the Board of Trustees: John Lazar 

John Lazar has been appointed as the new Chair of the Board of Trustees. John is a software engineer and business leader who is focused on combining technology and entrepreneurship to generate lasting positive impact.

Formerly the Chairman and CEO of Metaswitch Networks, John is now an angel investor, startup mentor, non-executive chairman and board director, including serving as the Chair of What3Words. He is a Fellow of the Royal Academy of Engineering and played an active role in developing the programme of study for England’s school Computer Science curriculum. John has also spent many years working on tech-related non-profit initiatives in Africa and co-founded Enza Capital, which invests in early-stage African technology companies that solve pressing problems.

John takes over the Chair from David Cleevely, who has reached the end of his two three-year terms as Trustee and Chair of the Foundation. David has made a huge contribution to the Foundation over that time, and we are delighted that he will continue to be involved in our work as one of the founding members of the Supporters Club.

New Trustees: Amali de Alwis, Charles Leadbeater, Dan Labbad

Alongside John, we are welcoming three new Trustees to the Board of Trustees: 

  • Amali de Alwis is the UK Managing Director of Microsoft for Startups, and is the former CEO of Code First: Girls. She is also a Board member at Ada National College for Digital Skills, sits on the Diversity & Inclusion Board at the Institute of Coding, is an Advisory Board member at the Founders Academy, and was a founding member at Tech Talent Charter.
  • Charles Leadbeater is an independent author, a social entrepreneur, and a leading authority on innovation and creativity. He has advised companies, cities, and governments around the world on innovation strategy and has researched and written extensively on innovation in education. Charles is also a Trustee of the Paul Hamlyn Foundation.
  • Dan Labbad is Chief Executive and Executive Member of the Board of The Crown Estate. He was previously at Lendlease, where he was Chief Executive Officer of Europe from 2009. Dan is also a Director of The Hornery Institute and Ark Schools.

New Member: Suranga Chandratillake 

I am also delighted to announce that we have appointed Suranga Chandratillake as a Member of the Raspberry Pi Foundation. Suranga is a technologist, entrepreneur, and investor.

Suranga Chandratillake

He founded the intelligent search company blinkx and is now a General Partner at Balderton Capital. Suranga is a Fellow of the Royal Academy of Engineering and a World Economic Forum Young Global Leader, and he serves on the UK Government’s Council for Science and Technology.

What is a Board of Trustees anyway? 

As a charity, the Raspberry Pi Foundation is governed by a Board of Trustees that is ultimately responsible for what we do and how we are run. It is the Trustees’ job to make sure that we are focused on our mission, which for us means helping more people learn about computing, computer science, and related subjects. The Trustees also have all the usual responsibilities of company directors, including making sure that we use our resources effectively. As Chief Executive, I am accountable to the Board of Trustees. 

We’ve always been fortunate to attract the most amazing people to serve as Trustees and, as volunteers, they are incredibly generous with their time, sharing their expertise and experience on a wide range of issues. They are an important part of the team. Trustees serve for up to two terms of three years so that we always have fresh views and experience to draw on.

How do you appoint Trustees? 

Appointments to the Board of Trustees follow open recruitment and selection processes that are overseen by the Foundation’s Nominations Committee, supported by independent external advisers. Our aim is to appoint Trustees who bring different backgrounds, perspectives, and lived experience, as well as a range of skills. As with all appointments, we consider diversity at every aspect of the recruitment and selection processes.

Formally, Trustees are elected by the Foundation’s Members at our Annual General Meeting. This year’s AGM took place last week on Zoom. Members are also volunteers, and they play an important role in holding the Board of Trustees to account, helping to shape our strategy, and acting as advocates for our mission.

You can see the full list of Trustees and Members on our website.

The post New Chair and Trustees of the Raspberry Pi Foundation appeared first on Raspberry Pi.

Introducing the first video in our new series, Verified, featuring Netflix’s Jason Chan

Post Syndicated from Stephen Schmidt original https://aws.amazon.com/blogs/security/introducing-first-video-new-series-verified-featuring-netflix-jason-chan/

The year has been a profoundly different one for us all, and like many of you, I’ve been adjusting, both professionally and personally, to this “new normal.” Here at AWS we’ve seen an increase in customers looking for secure solutions to maintain productivity in an increased work-from-home world. We’ve also seen an uptick in requests for training; it’s clear that a sense of community and learning are critically important as workforces physically distance.

For these reasons, I’m happy to announce the launch of Verified: Presented by AWS re:Inforce. I’m hosting this series, but I’ll be joined by leaders in cloud security across a variety of industries. The goal is to have an open conversation about the common issues we face in securing our systems and tools. Topics will include how the pandemic is impacting cloud security, tips for creating an effective security program from the ground up, how to create a culture of security, emerging security trends, and more. Learn more by following me on Twitter (@StephenSchmidt), and get regular updates from @AWSSecurityInfo. Verified is just one of the many ways we will continue sharing best practices with our customers during this time. You can find more by reading the AWS Security Blog, reviewing our documentation, visiting the AWS Security and Compliance webpages, watching re:Invent and re:Inforce playlists, and/or reviewing the Security Pillar of Well Architected.

Our first conversation, above, is with Jason Chan, Vice President of Information Security at Netflix. Jason spoke to us about the security program at Netflix, his approach to hiring security talent, and how Zero Trust enables a remote workforce. Jason also has solid insights to share about how he started and grew the security program at Netflix.

“In the early days, what we were really trying to figure out is how do we build a large-scale consumer video-streaming service in the public cloud, and how do you do that in a secure way? There wasn’t a ton of expertise in that, so when I was building the security team at Netflix, I thought, ‘how do we bring in folks from a variety of backgrounds, generalists … to tackle this problem?’”

He also gave his view on how a growing security team can measure ROI. “I think it’s difficult to have a pure equation around that. So what we try to spend our time doing is really making sure that we, as a team, are aligned on what is the most important—what are the most important assets to protect, what are the most critical risks that we’re trying to prevent—and then make sure that leadership is aligned with that, because, as we all know, there’s not unlimited resources, right? You can’t hire an unlimited number of folks or spend an unlimited amount of money, so you’re always trying to figure out how do you prioritize, and how do you find where is going to be the biggest impact for your value?”

Check out Jason’s full interview above, and stay tuned for further videos in this series. If you have an idea or a topic you’d like covered in this series, please drop us a comment below. Thanks!

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Steve Schmidt

Steve is Vice President and Chief Information Security Officer for AWS. His duties include leading product design, management, and engineering development efforts focused on bringing the competitive, economic, and security benefits of cloud computing to business and government customers. Prior to AWS, he had an extensive career at the Federal Bureau of Investigation, where he served as a senior executive and section chief. He currently holds 11 patents in the field of cloud security architecture. Follow Steve on Twitter.

Building Extensions for AWS Lambda – In preview

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-extensions-for-aws-lambda-in-preview/

AWS Lambda is announcing a preview of Lambda Extensions, a new way to easily integrate Lambda with your favorite monitoring, observability, security, and governance tools. Extensions enable tools to integrate deeply into the Lambda execution environment to control and participate in Lambda’s lifecycle. This simplified experience makes it easier for you to use your preferred tools across your application portfolio today.

In this post I explain how Lambda extensions work, the changes to the Lambda lifecycle, and how to build an extension. To learn how to use extensions with your functions, see the companion blog post “Introducing AWS Lambda extensions”.

Extensions are built using the new Lambda Extensions API, which provides a way for tools to get greater control during function initialization, invocation, and shut down. This API builds on the existing Lambda Runtime API, which enables you to bring custom runtimes to Lambda.

You can use extensions from AWS, AWS Lambda Ready Partners, and open source projects for use-cases such as application performance monitoring, secrets management, configuration management, and vulnerability detection. You can also build your own extensions to integrate your own tooling using the Extensions API.

There are extensions available today for AppDynamics, Check Point, Datadog, Dynatrace, Epsagon, HashiCorp, Lumigo, New Relic, Thundra, Splunk, AWS AppConfig, and Amazon CloudWatch Lambda Insights. For more details on these, see “Introducing AWS Lambda extensions”.

The Lambda execution environment

Lambda functions run in a sandboxed environment called an execution environment. This isolates them from other functions and provides the resources, such as memory, specified in the function configuration.

Lambda automatically manages the lifecycle of compute resources so that you pay for value. Between function invocations, the Lambda service freezes the execution environment. It is thawed if the Lambda service needs the execution environment for subsequent invocations.

Previously, only the runtime process could influence the lifecycle of the execution environment. The runtime communicates with the Lambda service through the Runtime API, an HTTP API endpoint available within the execution environment.

Lambda and Runtime API

The runtime uses the API to request invocation events from Lambda and deliver them to the function code. It then informs the Lambda service when it has completed processing an event. The Lambda service then freezes the execution environment.

The runtime process previously exposed two distinct phases in the lifecycle of the Lambda execution environment: Init and Invoke.

1. Init: During the Init phase, the Lambda service initializes the runtime, and then runs the function initialization code (the code outside the main handler). The Init phase happens either during the first invocation, or in advance if Provisioned Concurrency is enabled.

2. Invoke: During the invoke phase, the runtime requests an invocation event from the Lambda service via the Runtime API, and invokes the function handler. It then returns the function response to the Runtime API.

After the function runs, the Lambda service freezes the execution environment and maintains it for some time in anticipation of another function invocation.

If the Lambda function does not receive any invokes for a period of time, the Lambda service shuts down and removes the environment.

Previous Lambda lifecycle

With the addition of the Extensions API, extensions can now influence, control, and participate in the lifecycle of the execution environment. They can use the Extensions API to influence when the Lambda service freezes the execution environment.

AWS Lambda execution environment with the Extensions API

Extensions are initialized before the runtime and the function. They then continue to run in parallel with the function, get greater control during function invocation, and can run logic during shut down.

Extensions allow integrations with the Lambda service by introducing the following changes to the Lambda lifecycle:

  1. An updated Init phase. There are now three discrete Init tasks: extensions Init, runtime Init, and function Init. This creates an order where extensions and the runtime can perform setup tasks before the function code runs.
  2. Greater control during invocation. During the invoke phase, as before, the runtime requests the invocation event and invokes the function handler. In addition, extensions can now request lifecycle events from the Lambda service. They can run logic in response to these lifecycle events, and respond to the Lambda service when they are done. The Lambda service freezes the execution environment when it hears back from the runtime and all extensions. In this way, extensions can influence the freeze/thaw behavior.
  3. Shutdown phase: we are now exposing the shutdown phase to let extensions stop cleanly when the execution environment shuts down. The Lambda service sends a shut down event, which tells the runtime and extensions that the environment is about to be shut down.

New Lambda lifecycle with extensions

Each Lambda lifecycle phase starts with an event from the Lambda service to the runtime and all registered extensions. The runtime and extensions signal that they have completed by requesting the Next invocation event from the Runtime and Extensions APIs. Lambda freezes the execution environment and all extensions when there are no pending events.

Lambda lifecycle for execution environment, runtime, extensions, and function

For more information on the lifecycle phases and the Extensions API, see the documentation.

How are extensions delivered and run?

You deploy extensions as Lambda layers, which are ZIP archives containing shared libraries or other dependencies.

To add a layer, use the AWS Management Console, AWS Command Line Interface (AWS CLI), or infrastructure as code tools such as AWS CloudFormation, the AWS Serverless Application Model (AWS SAM), and Terraform.

When the Lambda service starts the function execution environment, it extracts the extension files from the Lambda layer into the /opt directory. Lambda then looks for any extensions in the /opt/extensions directory and starts initializing them. Extensions need to be executable as binaries or scripts. As the function code directory is read-only, extensions cannot modify function code.

Extensions can run in either of two modes, internal and external.

  • Internal extensions run as part of the runtime process, in-process with your code. They are not separate processes. Internal extensions allow you to modify the startup of the runtime process using language-specific environment variables and wrapper scripts. You can use language-specific environment variables to add options and tools to the runtime for Java Corretto 8 and 11, Node.js 10 and 12, and .NET Core 3.1. Wrapper scripts allow you to delegate the runtime startup to your script to customize the runtime startup behavior. You can use wrapper scripts with Node.js 10 and 12, Python 3.8, Ruby 2.7, Java 8 and 11, and .NET Core 3.1. For more information, see “Modifying the runtime environment”.
  • External extensions allow you to run separate processes from the runtime but still within the same execution environment as the Lambda function. External extensions can start before the runtime process, and can continue after the runtime shuts down. External extensions work with Node.js 10 and 12, Python 3.7 and 3.8, Ruby 2.5 and 2.7, Java Corretto 8 and 11, .NET Core 3.1, and custom runtimes.

External extensions can be written in a different language from the function. We recommend implementing external extensions using a compiled language as a self-contained binary. This makes the extension compatible with all of the supported runtimes. If you use a non-compiled language, ensure that you include a compatible runtime in the extension.

Extensions run in the same execution environment as the function, so they share resources such as CPU, memory, and disk storage with the function. They also share environment variables and permissions, using the same AWS Identity and Access Management (IAM) role as the function.

For more details on resources, security, and performance with extensions, see the companion blog post “Introducing AWS Lambda extensions”.

For example extensions and wrapper scripts to help you build your own extensions, see the GitHub repository.
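
To make the registration and event loop described above more tangible, here is a minimal external extension sketched in Python (a compiled, self-contained binary is recommended for portability, as noted earlier). It only uses the Extensions API endpoints shown in this post; the file name and the exact response handling are assumptions, so treat it as a starting point rather than a reference implementation:

#!/usr/bin/env python3
import json
import os
import urllib.request

RUNTIME_API = os.environ['AWS_LAMBDA_RUNTIME_API']
BASE_URL = f'http://{RUNTIME_API}/2020-01-01/extension'
EXTENSION_NAME = 'demo-extension.py'  # must match the file name placed in /opt/extensions

def register():
    req = urllib.request.Request(
        f'{BASE_URL}/register',
        data=json.dumps({'events': ['INVOKE', 'SHUTDOWN']}).encode(),
        headers={'Lambda-Extension-Name': EXTENSION_NAME},
        method='POST',
    )
    with urllib.request.urlopen(req) as resp:
        # The extension identifier is returned in a response header
        return resp.headers['Lambda-Extension-Identifier']

def next_event(extension_id):
    req = urllib.request.Request(
        f'{BASE_URL}/event/next',
        headers={'Lambda-Extension-Identifier': extension_id},
    )
    with urllib.request.urlopen(req) as resp:  # blocks until the next event arrives
        return json.load(resp)

def main():
    extension_id = register()
    while True:
        event = next_event(extension_id)
        if event.get('eventType') == 'SHUTDOWN':
            break  # clean up and exit before the environment is torn down
        # INVOKE: do per-invocation work here, for example flushing telemetry

if __name__ == '__main__':
    main()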

Showing extensions in action

The demo shows how external extensions integrate deeply with functions and the Lambda runtime. The demo creates an example Lambda function with a single extension using either the AWS CLI, or AWS SAM.

The example shows how an external extension can start before the runtime, run during the Lambda function invocation, and shut down after the runtime shuts down.

To set up the example, visit the GitHub repo, and follow the instructions in the README.md file.

The example Lambda function uses the custom provided.al2 runtime based on Amazon Linux 2. Using the custom runtime helps illustrate in more detail how the Lambda service, Runtime API, and the function communicate. The extension is delivered using a Lambda layer.

The runtime, function, and extension log their status events to Amazon CloudWatch Logs. The extension initializes as a separate process and waits to receive the function invocation event from the Extensions API. It then sleeps for 5 seconds before calling the API again to register to receive the next event. The extension sleep simulates the processing of a parallel process. This could, for example, collect telemetry data to send to an external observability service.

When the Lambda function is invoked, the extension, runtime and function perform the following steps. I walk through the steps using the log output.

1. The Lambda service adds the configured extension Lambda layer. It then searches the /opt/extensions folder, and finds an extension called extension1.sh. The extension executable launches before the runtime initializes. It registers with the Extensions API to receive INVOKE and SHUTDOWN events using the following API call.

curl -sS -LD "$HEADERS" -XPOST "http://${AWS_LAMBDA_RUNTIME_API}/2020-01-01/extension/register" --header "Lambda-Extension-Name: ${LAMBDA_EXTENSION_NAME}" -d "{ \"events\": [\"INVOKE\", \"SHUTDOWN\"]}" > $TMPFILE

Extension discovery, registration, and start

2. The Lambda custom provided.al2 runtime initializes from the bootstrap file.

Runtime initialization

3. The runtime calls the Runtime API to get the next event using the following API call. The HTTP request is blocked until the event is received.

curl -sS -LD "$HEADERS" -X GET "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/next" > $TMPFILE &

The extension calls the Extensions API and waits for the next event. The HTTP request is again blocked until one is received.

curl -sS -L -XGET "http://${AWS_LAMBDA_RUNTIME_API}/2020-01-01/extension/event/next" --header "Lambda-Extension-Identifier: ${EXTENSION_ID}" > $TMPFILE &

Runtime and extension call APIs to get the next event

4. The Lambda service receives an invocation event. It sends the event payload to the runtime using the Runtime API. It sends an event to the extension informing it about the invocation, using the Extensions API.

Runtime and extension receive event

5. The runtime invokes the function handler. The function receives the event payload.

Runtime invokes handler

6. The function runs the handler code. The Lambda runtime receives back the function response and sends it back to the Runtime API with the following API call.

curl -sS -X POST "http://${AWS_LAMBDA_RUNTIME_API}/2018-06-01/runtime/invocation/$REQUEST_ID/response" -d "$RESPONSE" > $TMPFILE

Runtime receives function response and sends to Runtime API

7. The Lambda runtime then waits for the next invocation event (warm start).

Runtime waits for next event

8. The extension continues processing for 5 seconds, simulating the processing of a companion process. The extension finishes, and uses the Extensions API to register again to wait for the next event.

Extension processing

9. The function invocation report is logged.

Function invocation report

10. When Lambda is about to shut down the execution environment, it sends the Runtime API a shut down event.

Lambda runtime shut down event

11. Lambda then sends a shut down event to the extensions. The extension finishes processing and then shuts down after the runtime.

Lambda extension shut down event

The demo shows the steps the runtime, function, and extensions take during the Lambda lifecycle.

An external extension registers and starts before the runtime. When Lambda receives an invocation event, it sends it to the runtime. It then sends an event to the extension informing it about the invocation. The runtime invokes the function handler, and the extension does its own processing of the event. The extension continues processing after the function invocation completes. When Lambda is about to shut down the execution environment, it sends a shut down event to the runtime. It then sends one to the extension, so it can finish processing.

To see a sequence diagram of this flow, see the Extensions API documentation.

Pricing

Extensions share the same billing model as Lambda functions. When using Lambda functions with extensions, you pay for requests served and the combined compute time used to run your code and all extensions, in 100 ms increments. To learn more about the billing for extensions, visit the Lambda FAQs page.

Conclusion

Lambda extensions enable you to extend Lambda’s execution environment to more easily integrate with your favorite tools for monitoring, observability, security, and governance.

Extensions can run additional code before, during, and after a function invocation. There are extensions available today from AWS Lambda Ready Partners. These cover use-cases such as application performance monitoring, secrets management, configuration management, and vulnerability detection. Extensions make it easier to use your existing tools with your serverless applications. For more information on the available extensions, see the companion post “Introducing Lambda Extensions – In preview”.

You can also build your own extensions to integrate your own tooling using the new Extensions API. For example extensions and wrapper scripts, see the GitHub repository.

Extensions are now available in preview in the following Regions: us-east-1, us-east-2, us-west-1, us-west-2, ca-central-1, eu-west-1, eu-west-2, eu-west-3, eu-central-1, eu-north-1, eu-south-1, sa-east-1, me-south-1, ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-southeast-1, ap-southeast-2, ap-south-1, and ap-east-1.

For more serverless learning resources, visit https://serverlessland.com.

Introducing AWS Lambda Extensions – In preview

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-aws-lambda-extensions-in-preview/

AWS Lambda is announcing a preview of Lambda Extensions, a new way to easily integrate Lambda with your favorite monitoring, observability, security, and governance tools. In this post I explain how Lambda extensions work, how you can begin using them, and the extensions from AWS Lambda Ready Partners that are available today.

Extensions help address a common request from customers to make it easier to integrate their existing tools with Lambda. Previously, customers told us that integrating Lambda with their preferred tools required additional operational and configuration tasks. In addition, tools such as log agents, which are long-running processes, could not easily run on Lambda.

Extensions are a new way for tools to integrate deeply into the Lambda environment. There is no complex installation or configuration, and this simplified experience makes it easier for you to use your preferred tools across your application portfolio today. You can use extensions for use-cases such as:

  • capturing diagnostic information before, during, and after function invocation
  • automatically instrumenting your code without needing code changes
  • fetching configuration settings or secrets before the function invocation
  • detecting and alerting on function activity through hardened security agents, which can run as separate processes from the function

You can use extensions from AWS, AWS Lambda Ready Partners, and open source projects. There are extensions available today for AppDynamics, Check Point, Datadog, Dynatrace, Epsagon, HashiCorp, Lumigo, New Relic, Thundra, Splunk SignalFX, AWS AppConfig, and Amazon CloudWatch Lambda Insights.

You can learn how to build your own extensions, in the companion post “Building Extensions for AWS Lambda – In preview“.

Overview

Lambda Extensions is designed to be the easiest way to plug in the tools you use today without complex installation or configuration management. You deploy extensions as Lambda layers, with the AWS Management Console and AWS Command Line Interface (AWS CLI). You can also use infrastructure as code tools such as AWS CloudFormation, the AWS Serverless Application Model (AWS SAM), Serverless Framework, and Terraform. You can use Stackery to automate the integration of extensions from Epsagon, New Relic, Lumigo, and Thundra.

There are two components to the Lambda Extensions capability: the Extensions API and extensions themselves. Extensions are built using the new Lambda Extensions API which provides a way for tools to get greater control during function initialization, invocation, and shut down. This API builds on the existing Lambda Runtime API, which enables you to bring custom runtimes to Lambda.

AWS Lambda execution environment with the Extensions API

Most customers will use extensions without needing to know about the capabilities of the Extensions API that enables them. You can just consume capabilities of an extension by configuring the options in your Lambda functions. Developers who build extensions use the Extensions API to register for function and execution environment lifecycle events.

Extensions can run in either of two modes – internal and external.

  • Internal extensions run as part of the runtime process, in-process with your code. They allow you to modify the startup of the runtime process using language-specific environment variables and wrapper scripts. Internal extensions enable use cases such as automatically instrumenting code.
  • External extensions allow you to run separate processes from the runtime but still within the same execution environment as the Lambda function. External extensions can start before the runtime process, and can continue after the runtime shuts down. External extensions enable use cases such as fetching secrets before the invocation, or sending telemetry to a custom destination outside of the function invocation. These extensions run as companion processes to Lambda functions.

For more information on the Extensions API and the changes to the Lambda lifecycle, see “Building Extensions for AWS Lambda – In preview”.

AWS Lambda Ready Partners extensions available at launch

Today, you can use extensions with the following AWS and AWS Lambda Ready Partner’s tools, and there are more to come:

  • AppDynamics provides end-to-end transaction tracing for AWS Lambda. With the AppDynamics extension, it is no longer mandatory for developers to include the AppDynamics tracer as a dependency in their function code, making tracing transactions across hybrid architectures even simpler.
  • The Datadog extension brings comprehensive, real-time visibility to your serverless applications. Combined with Datadog’s existing AWS integration, you get metrics, traces, and logs to help you monitor, detect, and resolve issues at any scale. The Datadog extension makes it easier than ever to get telemetry from your serverless workloads.
  • The Dynatrace extension makes it even easier to bring AWS Lambda metrics and traces into the Dynatrace platform for intelligent observability and automatic root cause detection. Get comprehensive, end-to-end observability with the flip of a switch and no code changes.
  • Epsagon helps you monitor, troubleshoot, and lower the cost for your Lambda functions. Epsagon’s extension reduces the overhead of sending traces to the Epsagon service, with minimal performance impact to your function.
  • HashiCorp Vault allows you to secure, store, and tightly control access to your application’s secrets and sensitive data. With the Vault extension, you can now authenticate and securely retrieve dynamic secrets before your Lambda function invokes.
  • Lumigo provides a monitoring and observability platform for serverless and microservices applications. The Lumigo extension enables the new Lumigo Lambda Profiler to see a breakdown of function resources, including CPU, memory, and network metrics. Receive actionable insights to reduce Lambda runtime duration and cost, fix bottlenecks, and increase efficiency.
  • Check Point CloudGuard provides full lifecycle security for serverless applications. The CloudGuard extension enables Function Self Protection data aggregation as an out-of-process extension, providing detection and alerting on application layer attacks.
  • New Relic provides a unified observability experience for your entire software stack. The New Relic extension uses a simpler companion process to report function telemetry data. This also requires fewer AWS permissions to add New Relic to your application.
  • Thundra provides an application debugging, observability, and security platform for serverless, container, and virtual machine (VM) workloads. The Thundra extension adds asynchronous telemetry reporting to the Thundra agents, eliminating the network latency overhead.
  • Splunk offers an enterprise-grade cloud monitoring solution for real-time full-stack visibility at scale. The Splunk extension provides a simplified runtime-independent interface to collect high-resolution observability data with minimal overhead. Monitor, manage, and optimize the performance and cost of your serverless applications with Splunk Observability solutions.
  • AWS AppConfig helps you manage, store, and safely deploy application configurations to your hosts at runtime. The AWS AppConfig extension integrates Lambda with AWS AppConfig, giving functions quick and easy access to external configuration settings. Developers can now dynamically and safely change their Lambda function’s configuration using robust validation features.
  • Amazon CloudWatch Lambda Insights enables you to efficiently monitor, troubleshoot, and optimize Lambda functions. The Lambda Insights extension simplifies the collection, visualization, and investigation of detailed compute performance metrics, errors, and logs. You can more easily isolate and correlate performance problems to optimize your Lambda environments.

You can also build and use your own extensions to integrate your organization’s tooling. For instance, the Cloud Foundations team at Square has built their own extension. They say:

The Cloud Foundations team at Square works to make the cloud accessible and secure. We partnered with the Security Infrastructure team, who builds infrastructure to secure Square’s sensitive data, to enable serverless applications at Square,​ and ​provide mTLS identities to Lambda​.

Since beginning work on Lambda, we have focused on creating a streamlined developer experience. Teams adopting Lambda need to learn a lot about AWS, and we see extensions as a way to abstract away common use cases. For our initial exploration, we wanted to make accessing secrets easy, as with our current tools each Lambda function usually pulls 3-5 secrets.

The extension we built and open sourced fetches secrets on cold starts, before the Lambda function is invoked. Each function includes a configuration file that specifies which secrets to pull. We knew this configuration was key, as Lambda functions should only be doing work they need to do. The secrets are cached in the local /tmp directory, which the function reads when it needs the secret data. This not only makes Lambda functions faster, but also reduces the amount of code needed to access secrets.
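
To make the pattern concrete, here is a minimal sketch, not Square’s actual implementation, of function code reading a secret that an extension cached in /tmp during the cold start. The cache directory, file format, and secret name are hypothetical.

import json

def get_secret(name, cache_dir="/tmp/secrets"):
    # The extension wrote each configured secret to /tmp before the first
    # invocation, so this is a fast local file read instead of a network call.
    with open(f"{cache_dir}/{name}.json") as f:
        return json.load(f)

def handler(event, context):
    db_creds = get_secret("database-credentials")  # hypothetical secret name
    # ... use db_creds to open a connection ...
    return {"ok": True}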

Showing extensions in action with AWS AppConfig

This demo shows an example of using AWS AppConfig with a Lambda function. AWS AppConfig is a capability of AWS Systems Manager to create, manage, and quickly deploy application configurations. It lets you deploy external configuration dynamically, without having to redeploy your applications. Because AWS AppConfig has robust validation features, all configuration changes can be tested safely before rolling them out to your applications.

AWS AppConfig has an available extension which gives Lambda functions access to external configuration settings quickly and easily. The extension runs a separate local process to retrieve and cache configuration data from the AWS AppConfig service. The function code can then fetch configuration data faster using a local call rather than over the network.
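
A function retrieves its configuration from the extension with a plain HTTP call to localhost. The sketch below assumes the extension’s default local port of 2772; confirm the exact URL format against the AWS AppConfig Lambda extension blog post mentioned later in this walkthrough.

import json
import urllib.request

def get_configuration(application, environment, profile):
    # The extension caches the configuration locally, so this call does not
    # leave the execution environment.
    url = (f"http://localhost:2772/applications/{application}"
           f"/environments/{environment}/configurations/{profile}")
    with urllib.request.urlopen(url) as res:
        return json.load(res)  # assumes a JSON configuration profile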

To set up the example, visit the GitHub repo and follow the instructions in the README.md file.

The example creates an AWS AppConfig application, environment, and configuration profile. It stores a loglevel value, initially set to normal.

AWS AppConfig application, environment, and configuration profile

An AWS AppConfig deployment runs to roll out the initial configuration.

AWS AppConfig deployment

The example contains two Lambda functions that include the AWS AppConfig extension. For a list of the layers that have the AppConfig extension, see the blog post “AWS AppConfig Lambda Extension”.

As extensions share the same permissions as Lambda functions, the functions have execution roles that allow access to retrieve the AWS AppConfig configuration.

Lambda function add layer

The functions use the extension to retrieve the loglevel value from AWS AppConfig, returning the value as a response. In a production application, this value could be used within function code to determine what level of information to send to CloudWatch Logs. For example, to troubleshoot an application issue, you can change the loglevel value centrally. Subsequent function invocations for both functions use the updated value.

Both Lambda functions are configured with an environment variable that specifies which AWS AppConfig configuration profile and value to use.

Lambda environment variable specifying AWS AppConfig profile

The functions also return whether the invocation is a cold start.
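
Putting those pieces together, a handler along these lines would produce the responses shown next. The environment variable names and the shape of the configuration document are illustrative; the GitHub example may use different ones.

import json
import os
import urllib.request

COLD_START = True  # module scope: True only for the first invocation in this environment

def handler(event, context):
    global COLD_START
    was_cold, COLD_START = COLD_START, False

    # Illustrative environment variable names pointing at the AppConfig resources.
    app = os.environ["APPCONFIG_APPLICATION"]
    env = os.environ["APPCONFIG_ENVIRONMENT"]
    profile = os.environ["APPCONFIG_PROFILE"]

    # Fetch the cached configuration from the extension's local endpoint.
    url = (f"http://localhost:2772/applications/{app}"
           f"/environments/{env}/configurations/{profile}")
    with urllib.request.urlopen(url) as res:
        config = json.load(res)  # assumes a JSON profile such as {"loglevel": "normal"}

    return {
        "event": event,
        "ColdStart": was_cold,
        "LogLevel": config.get("loglevel", "normal"),
    }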

Running the functions with a test payload returns the loglevel value normal. The first invocation is a cold start.

{
  "event": {
    "hello": "world"
  },
  "ColdStart": true,
  "LogLevel": "normal"
}

Subsequent invocations return the same value with ColdStart set to false.

{
  "event": {
    "hello": "world"
  },
  "ColdStart": false,
  "LogLevel": "normal"
}

Create a new AWS AppConfig hosted configuration profile version that sets the loglevel value to verbose. Run a new AWS AppConfig deployment to update the value. The extension for both functions retrieves the new value. The function configuration itself is not changed.

Running another test invocation for both functions returns the updated value, again without a cold start.

{
  "event": {
    "hello": "world"
  },
  "ColdStart": false,
  "LogLevel": "verbose"
}

AWS AppConfig works seamlessly with Lambda to update a dynamic external configuration setting for multiple Lambda functions, without having to redeploy the functions.

The only function configuration required is to add the layer which contains the AWS AppConfig extension.

Pricing

Extensions share the same billing model as Lambda functions. When using Lambda functions with extensions, you pay for requests served and the combined compute time used to run your code and all extensions, in 100 ms increments. To learn more about the billing for extensions, visit the Lambda FAQs page.

Resources, security, and performance with extensions

Extensions run in the same execution environment as the function code. Therefore, they share resources with the function, such as CPU, memory, disk storage, and environment variables. They also share permissions, using the same AWS Identity and Access Management (IAM) role as the function.

You can configure up to 10 extensions per function, using up to five layers at a time. Multiple extensions can be included in a single layer.

The size of the extensions counts towards the deployment package limit, which cannot exceed the unzipped deployment package size limit of 250 MB.

External extensions are initialized before the runtime starts, so they can increase the delay before the function is invoked. Today, the function invocation response is returned after all extensions have completed, so an extension that takes time to finish can delay the function response, and an extension that performs compute-intensive operations can increase the function duration. Use the new PostRuntimeExtensionsDuration CloudWatch metric to measure the additional time an extension runs after the function invocation. To understand the impact of a specific extension, compare the Duration and MaxMemoryUsed CloudWatch metrics for versions of your function running with and without the extension. Adding more memory to a function also proportionally increases CPU and network throughput.
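
For example, a short boto3 script along these lines can compare Duration with the new post-runtime metric for a given function; the function name is a placeholder.

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

def extension_overhead(function_name, hours=1):
    # Pull recent Duration and PostRuntimeExtensionsDuration datapoints.
    end = datetime.utcnow()
    start = end - timedelta(hours=hours)
    results = {}
    for metric in ("Duration", "PostRuntimeExtensionsDuration"):
        stats = cloudwatch.get_metric_statistics(
            Namespace="AWS/Lambda",
            MetricName=metric,
            Dimensions=[{"Name": "FunctionName", "Value": function_name}],
            StartTime=start,
            EndTime=end,
            Period=3600,
            Statistics=["Average", "Maximum"],
        )
        results[metric] = stats["Datapoints"]
    return results

print(extension_overhead("my-function"))  # placeholder function name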

The function and all extensions must complete within the function’s configured timeout setting, which applies to the entire invoke phase.

Conclusion

Lambda extensions enable you to extend the Lambda service to more easily integrate with your favorite tools for monitoring, observability, security, and governance.

Today, you can install a number of available extensions from AWS Lambda Ready Partners. These cover use cases such as application performance monitoring, secrets management, configuration management, and vulnerability detection. Extensions make it easier to use your existing tools with your serverless applications.

You can also build extensions to integrate your own tooling using the new Extensions API. For more information, see the companion post “Building Extensions for AWS Lambda – In preview“.

Extensions are now available in preview in the following Regions: us-east-1, us-east-2, us-west-1, us-west-2, ca-central-1, eu-west-1, eu-west-2, eu-west-3, eu-central-1, eu-north-1, eu-south-1, sa-east-1, me-south-1, ap-northeast-1, ap-northeast-2, ap-northeast-3, ap-southeast-1, ap-southeast-2, ap-south-1, and ap-east-1.

For more serverless learning resources, visit https://serverlessland.com.

10 additional AWS services authorized at DoD Impact Level 6 for the AWS Secret Region

Post Syndicated from Tyler Harding original https://aws.amazon.com/blogs/security/10-additional-aws-services-authorized-dod-impact-level-6-for-aws-secret-region/

The Defense Information Systems Agency (DISA) has authorized 10 additional AWS services in the AWS Secret Region for production workloads at the Department of Defense (DoD) Impact Level (IL) 6 under the DoD’s Cloud Computing Security Requirements Guide (DoD CC SRG). With this authorization at DoD IL 6, DoD Mission Owners can process classified and mission-critical workloads for National Security Systems in the AWS Secret Region. The AWS Secret Region is available to the Department of Defense through AWS’s GSA IT Multiple Award Schedule.

AWS successfully completed an independent evaluation by members of the Intelligence Community (IC) that confirmed AWS effectively implemented 859 security controls using applicable criteria from NIST SP 800-53 Rev 4, the DoD CC SRG, and the Committee on National Security Systems Instruction No. 1253 at the Moderate Confidentiality, Moderate Integrity, and Moderate Availability impact levels.

The 10 AWS services newly authorized by DISA at IL 6 provide additional choices for DoD Mission Owners to use the capabilities of the AWS Cloud in service areas such as compute and storage, management and developer tools, analytics, and networking. With these 10 newly authorized services (listed below), DoD Mission Owners can now use a total of 36 AWS services and features.

Compute and Storage:

  • Amazon Elastic Container Registry (ECR)
  • Amazon Elastic Container Service (ECS)
  • AWS Lambda
  • AWS Snowball Edge

Management and Developer Tools:

  • AWS Personal Health Dashboard: Monitor, manage, and optimize your AWS environment with a personalized view into the performance and availability of the AWS services underlying your AWS resources.
  • AWS Systems Manager: Automatically collect software inventory, apply OS patches, create system images, configure Windows and Linux operating systems, and seamlessly bridge your existing infrastructure with AWS.
  • AWS CodeDeploy: A fully managed deployment service that automates software deployments to a variety of compute services such as Amazon EC2, AWS Lambda, and on-premises servers.

Analytics:

  • AWS Data Pipeline: Reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.

Networking:

  • AWS PrivateLink: Use secure private connectivity between Amazon Virtual Private Cloud (Amazon VPC), AWS services, and on-premises applications on the AWS network, and eliminate the exposure of data to the public internet.
  • AWS Transit Gateway: Easily connect Amazon VPC, AWS accounts, and on-premises networks to a single gateway.

Figure 1: 10 additional AWS services authorized at DoD Impact Level 6

Newly authorized AWS services and features at DoD Impact Level 6

  1. Amazon Elastic Container Registry (ECR)
  2. Amazon Elastic Container Service (ECS)
  3. AWS CodeDeploy
  4. AWS Data Pipeline
  5. AWS Lambda
  6. AWS Personal Health Dashboard
  7. AWS PrivateLink
  8. AWS Snowball Edge
  9. AWS Systems Manager
  10. AWS Transit Gateway

Existing authorized AWS services and features at DoD Impact Level 6

  1. Amazon CloudWatch
  2. Amazon DynamoDB (DDB)
  3. Amazon Elastic Block Store (EBS)
  4. Amazon Elastic Compute Cloud (EC2)
  5. Amazon Elastic Compute Cloud (EC2) – Auto Scaling
  6. Amazon Elastic Compute Cloud (EC2) – Elastic Load Balancing (ELB) (Classic and Application Load Balancer)
  7. Amazon ElastiCache
  8. Amazon Kinesis Data Streams
  9. Amazon Redshift
  10. Amazon S3 Glacier
  11. Amazon Simple Notification Service (SNS)
  12. Amazon Simple Queue Service (SQS)
  13. Amazon Simple Storage Service (S3)
  14. Amazon Simple Workflow (SWF)
  15. Amazon Virtual Private Cloud (VPC)
  16. AWS CloudFormation
  17. AWS CloudTrail
  18. AWS Config
  19. AWS Database Migration Service (DMS)
  20. AWS Direct Connect (Dx)
  21. AWS Identity and Access Management (IAM)
  22. AWS Key Management Service (KMS)
  23. Amazon Relational Database Service (RDS) (including MariaDB, MySQL, Oracle, Postgres, and SQL Server)
  24. AWS Snowball
  25. AWS Step Functions
  26. AWS Trusted Advisor

To learn more about AWS solutions for DoD, please see our AWS solution offerings. Follow the AWS Security Blog for future updates on our Services in Scope by Compliance Program page. If you have feedback about this post, let us know in the Comments section below.

Want more AWS Security how-to content, news, and feature announcements? Follow us on Twitter.

Author

Tyler Harding

Tyler is the DoD Compliance Program Manager within AWS Security Assurance. He has over 20 years of experience providing information security solutions to federal civilian, DoD, and intelligence agencies.

Amazon SageMaker Continues to Lead the Way in Machine Learning and Announces up to 18% Lower Prices on GPU Instances

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-leads-way-in-machine-learning/

Since 2006, Amazon Web Services (AWS) has been helping millions of customers build and manage their IT workloads. From startups to large enterprises to public sector, organizations of all sizes use our cloud computing services to reach unprecedented levels of security, resiliency, and scalability. Every day, they’re able to experiment, innovate, and deploy to production in less time and at lower cost than ever before. Thus, business opportunities can be explored, seized, and turned into industrial-grade products and services.

As Machine Learning (ML) became a growing priority for our customers, they asked us to build an ML service infused with the same agility and robustness. The result was Amazon SageMaker, a fully managed service launched at AWS re:Invent 2017 that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly.

Today, Amazon SageMaker is helping tens of thousands of customers in all industry segments build, train, and deploy high-quality models in production: financial services (Euler Hermes, Intuit, Slice Labs, Nerdwallet, Root Insurance, Coinbase, NuData Security, Siemens Financial Services), healthcare (GE Healthcare, Cerner, Roche, Celgene, Zocdoc), news and media (Dow Jones, Thomson Reuters, ProQuest, SmartNews, Frame.io, Sportograf), sports (Formula 1, Bundesliga, Olympique de Marseille, NFL, Guinness Six Nations Rugby), retail (Zalando, Zappos, Fabulyst), automotive (Atlas Van Lines, Edmunds, Regit), dating (Tinder), hospitality (Hotels.com, iFood), industry and manufacturing (Veolia, Formosa Plastics), gaming (Voodoo), customer relationship management (Zendesk, Freshworks), energy (Kinect Energy Group, Advanced Microgrid Systems), real estate (Realtor.com), satellite imagery (Digital Globe), human resources (ADP), and many more.

When we asked our customers why they decided to standardize their ML workloads on Amazon SageMaker, the most common answer was: “SageMaker removes the undifferentiated heavy lifting from each step of the ML process.” Zooming in, we identified five areas where SageMaker helps them most.

#1 – Build Secure and Reliable ML Models, Faster
As many ML models are used to serve real-time predictions to business applications and end users, making sure that they stay available and fast is of paramount importance. This is why Amazon SageMaker endpoints have built-in support for load balancing across multiple AWS Availability Zones, as well as built-in Auto Scaling to dynamically adjust the number of provisioned instances according to incoming traffic.

For even more robustness and scalability, Amazon SageMaker relies on production-grade open source model servers such as TensorFlow Serving, the Multi-Model Server, and TorchServe. A collaboration between AWS and Facebook, TorchServe is available as part of the PyTorch project, and makes it easy to deploy trained models at scale without having to write custom code.

In addition to resilient infrastructure and scalable model serving, you can also rely on Amazon SageMaker Model Monitor to catch prediction quality issues that could happen on your endpoints. By saving incoming requests as well as outgoing predictions, and by comparing them to a baseline built from a training set, you can quickly identify and fix problems like missing features or data drift.

Says Aude Giard, Chief Digital Officer at Veolia Water Technologies: “In 8 short weeks, we worked with AWS to develop a prototype that anticipates when to clean or change water filtering membranes in our desalination plants. Using Amazon SageMaker, we built a ML model that learns from previous patterns and predicts the future evolution of fouling indicators. By standardizing our ML workloads on AWS, we were able to reduce costs and prevent downtime while improving the quality of the water produced. These results couldn’t have been realized without the technical experience, trust, and dedication of both teams to achieve one goal: an uninterrupted clean and safe water supply.” You can learn more in this video.

#2 – Build ML Models Your Way
When it comes to building models, Amazon SageMaker gives you plenty of options. You can visit AWS Marketplace, pick an algorithm or a model shared by one of our partners, and deploy it on SageMaker in just a few clicks. Alternatively, you can train a model using one of the built-in algorithms, or your own code written for a popular open source ML framework (TensorFlow, PyTorch, and Apache MXNet), or your own custom code packaged in a Docker container.
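
As an illustration, a minimal sketch with the SageMaker Python SDK (v2) to train your own PyTorch script on a managed GPU instance could look like this; the entry point, S3 path, and instance type are placeholders.

import sagemaker
from sagemaker.pytorch import PyTorch

role = sagemaker.get_execution_role()  # assumes you are running inside SageMaker

estimator = PyTorch(
    entry_point="train.py",            # your own training code
    role=role,
    framework_version="1.6.0",
    py_version="py3",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
)

# SageMaker provisions the instance, runs train.py, and tears everything down.
estimator.fit({"training": "s3://my-bucket/path/to/training-data"})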

You could also rely on Amazon SageMaker Autopilot, a game-changing AutoML capability. Whether you have little or no ML experience, or you’re a seasoned practitioner who needs to explore hundreds of datasets, SageMaker Autopilot takes care of everything for you with a single API call. It automatically analyzes your dataset, figures out the type of problem you’re trying to solve, builds several data processing and training pipelines, trains them, and optimizes them for maximum accuracy. In addition, the data processing and training source code is available in auto-generated notebooks that you can review and run yourself for further experimentation. SageMaker Autopilot also now creates machine learning models up to 40% faster with up to 200% higher accuracy, even with small and imbalanced datasets.

Another popular feature is Automatic Model Tuning. No more manual exploration, no more costly grid search jobs that run for days: using ML optimization, SageMaker quickly converges to high-performance models, saving you time and money, and letting you deploy the best model to production quicker.
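
Building on the estimator sketch above, automatic model tuning looks roughly like this with the SageMaker Python SDK; the objective metric, regular expression, and parameter ranges are illustrative.

from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

tuner = HyperparameterTuner(
    estimator,                                   # the estimator from the previous sketch
    objective_metric_name="validation:accuracy",
    metric_definitions=[{"Name": "validation:accuracy",
                         "Regex": "validation accuracy: ([0-9\\.]+)"}],
    hyperparameter_ranges={
        "learning-rate": ContinuousParameter(1e-5, 1e-2),
        "batch-size": IntegerParameter(32, 256),
    },
    max_jobs=20,          # total training jobs to run
    max_parallel_jobs=4,  # how many to run at once
)

# The tuner's default Bayesian strategy converges faster than a grid search.
tuner.fit({"training": "s3://my-bucket/path/to/training-data"})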

“NerdWallet relies on data science and ML to connect customers with personalized financial products,” says Ryan Kirkman, Senior Engineering Manager. “We chose to standardize our ML workloads on AWS because it allowed us to quickly modernize our data science engineering practices, removing roadblocks and speeding time-to-delivery. With Amazon SageMaker, our data scientists can spend more time on strategic pursuits and focus more energy where our competitive advantage is—our insights into the problems we’re solving for our users.” You can learn more in this case study.

Says Tejas Bhandarkar, Senior Director of Product, Freshworks Platform: “We chose to standardize our ML workloads on AWS because we could easily build, train, and deploy machine learning models optimized for our customers’ use cases. Thanks to Amazon SageMaker, we have built more than 30,000 models for 11,000 customers while reducing training time for these models from 24 hours to under 33 minutes. With SageMaker Model Monitor, we can keep track of data drifts and retrain models to ensure accuracy. Powered by Amazon SageMaker, Freddy AI Skills is constantly evolving with smart actions, deep-data insights, and intent-driven conversations.”

#3 – Reduce Costs
Building and managing your own ML infrastructure can be costly, and Amazon SageMaker is a great alternative. In fact, we found that the total cost of ownership (TCO) of Amazon SageMaker over a 3-year horizon is over 54% lower than other options, and developers can be up to 10 times more productive. This comes from the fact that Amazon SageMaker manages all the training and prediction infrastructure that ML typically requires, allowing teams to focus exclusively on studying and solving the ML problem at hand.

Furthermore, Amazon SageMaker includes many features that help training jobs run as fast and as cost-effectively as possible: optimized versions of the most popular machine learning libraries, a wide range of CPU and GPU instances with up to 100 Gbps networking, and of course Managed Spot Training, which lets you save up to 90% on your training jobs. Last but not least, Amazon SageMaker Debugger automatically identifies complex issues developing in ML training jobs. Unproductive jobs are terminated early, and you can use model information captured during training to pinpoint the root cause.
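
Enabling Managed Spot Training is a matter of a few extra estimator parameters; the sketch below reuses the placeholder values from above, and the checkpoint location lets a job resume if Spot capacity is reclaimed.

from sagemaker.pytorch import PyTorch

spot_estimator = PyTorch(
    entry_point="train.py",
    role=role,                          # the role from the earlier sketch
    framework_version="1.6.0",
    py_version="py3",
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    use_spot_instances=True,            # run the job on Spot capacity
    max_run=3600,                       # cap on actual training time, in seconds
    max_wait=7200,                      # cap on training time plus time spent waiting for Spot
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # placeholder: resume after interruption
)
spot_estimator.fit({"training": "s3://my-bucket/path/to/training-data"})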

Amazon SageMaker also helps you slash your prediction costs. Thanks to Multi-Model Endpoints, you can deploy several models on a single prediction endpoint, avoiding the extra work and cost associated with running many low-traffic endpoints. For models that require some hardware acceleration without the need for a full-fledged GPU, Amazon Elastic Inference lets you save up to 90% on your prediction costs. At the other end of the spectrum, large-scale prediction workloads can rely on AWS Inferentia, a custom chip designed by AWS, for up to 30% higher throughput and up to 45% lower cost per inference compared to GPU instances.

Lyft, one of the largest transportation networks in the United States and Canada, launched its Level 5 autonomous vehicle division in 2017 to develop a self-driving system to help millions of riders. Lyft Level 5 aggregates over 10 terabytes of data each day to train ML models for their fleet of autonomous vehicles. Managing ML workloads on their own was becoming time-consuming and expensive. Says Alex Bain, Lead for ML Systems at Lyft Level 5: “Using Amazon SageMaker distributed training, we reduced our model training time from days to a couple of hours. By running our ML workloads on AWS, we streamlined our development cycles and reduced costs, ultimately accelerating our mission to deliver self-driving capabilities to our customers.”

#4 – Build Secure and Compliant ML Systems
Security is always priority #1 at AWS. It’s particularly important to customers operating in regulated industries such as financial services or healthcare, as they must implement their solutions with the highest level of security and compliance. For this purpose, Amazon SageMaker implements many security features, making it compliant with the following global standards: SOC 1/2/3, PCI, ISO, FedRAMP, DoD CC SRG, IRAP, MTCS, C5, K-ISMS, ENS High, OSPAR, and HITRUST CSF. It’s also HIPAA BAA eligible.

Says Ashok Srivastava, Chief Data Officer, Intuit: “With Amazon SageMaker, we can accelerate our Artificial Intelligence initiatives at scale by building and deploying our algorithms on the platform. We will create novel large-scale machine learning and AI algorithms and deploy them on this platform to solve complex problems that can power prosperity for our customers.”

#5 – Annotate Data and Keep Humans in the Loop
As ML practitioners know, turning data into a dataset requires a lot of time and effort. To help you reduce both, Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to annotate and build highly accurate training datasets at any scale (text, image, video, and 3D point cloud datasets).

Says Magnus Soderberg, Director, Pathology Research, AstraZeneca: “AstraZeneca has been experimenting with machine learning across all stages of research and development, and most recently in pathology to speed up the review of tissue samples. The machine learning models first learn from a large, representative data set. Labeling the data is another time-consuming step, especially in this case, where it can take many thousands of tissue sample images to train an accurate model. AstraZeneca uses Amazon SageMaker Ground Truth, a machine learning-powered, human-in-the-loop data labeling and annotation service to automate some of the most tedious portions of this work, resulting in reduction of time spent cataloging samples by at least 50%.”

Amazon SageMaker is Evaluated
The hundreds of new features added to Amazon SageMaker since launch are testimony to our relentless innovation on behalf of customers. In fact, the service was highlighted in February 2020 as the overall leader in Gartner’s Cloud AI Developer Services Magic Quadrant. Gartner subscribers can click here to learn more about why we have an overall score of 84/100 in their “Solution Scorecard for Amazon SageMaker, July 2020”, the highest rating among our peer group. According to Gartner, we met 87% of required criteria, 73% of preferred, and 85% of optional.

Announcing a Price Reduction on GPU Instances

To thank our customers for their trust and to show our continued commitment to make Amazon SageMaker the best and most cost-effective ML service, I’m extremely happy to announce a significant price reduction on all ml.p2 and ml.p3 GPU instances. It will apply starting October 1st for all SageMaker components and across the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), EU (Frankfurt), EU (London), Canada (Central), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Seoul), Asia Pacific (Tokyo), Asia Pacific (Mumbai), and AWS GovCloud (US-Gov-West).

Instance Name       Price Reduction
ml.p2.xlarge        -11%
ml.p2.8xlarge       -14%
ml.p2.16xlarge      -18%
ml.p3.2xlarge       -11%
ml.p3.8xlarge       -14%
ml.p3.16xlarge      -18%
ml.p3dn.24xlarge    -18%

Getting Started with Amazon SageMaker
As you can see, there are a lot of exciting features in Amazon SageMaker, and I encourage you to try them out! Amazon SageMaker is available worldwide, so chances are you can easily get to work on your own datasets. The service is part of the AWS Free Tier, letting new users work with it for free for hundreds of hours during the first two months.

If you’d like to kick the tires, this tutorial will get you started in minutes. You’ll learn how to use SageMaker Studio to build, train, and deploy a classification model based on the XGBoost algorithm.

Last but not least, I just published a book named “Learn Amazon SageMaker”, a 500-page detailed tour of all SageMaker features, illustrated by more than 60 original Jupyter notebooks. It should help you get up to speed in no time.

As always, we’re looking forward to your feedback. Please share it with your usual AWS support contacts, or on the AWS Forum for SageMaker.

– Julien