Tag Archives: serverless

Why Signeasy chose AWS Serverless to build their SaaS dashboard

Post Syndicated from Venkatramana Ameth Achar original https://aws.amazon.com/blogs/architecture/why-signeasy-chose-aws-serverless-to-build-their-saas-dashboard/

Signeasy is a leading eSignature company that offers an easy-to-use, cross-platform and cloud-based eSignature and document transaction management software as a service (SaaS) solution for businesses. Over 43,000 companies worldwide use Signeasy to digitize and streamline business workflows. In this blog, you will learn why and how Signeasy used AWS Serverless to create a SaaS dashboard for their tenants.

Signeasy’s SaaS tenants asked for an easier way to get insights into tenant usage data on Signeasy’s eSignature platform. To address that, Signeasy built a self-service usage metrics dashboard for their SaaS tenants using AWS Serverless.

Usage reports

What was it like before the self-service dashboard experience? In the past, tenants requested their usage metrics from Signeasy through support channels or emails. The Signeasy support team compiled the report and emailed it back to the tenant to service the request. This was a repetitive manual task: it involved querying a database, then fetching and collating the results into an Excel table to email to the tenant. The turnaround time on these manual reports was eight hours.

The following table illustrates the report format (with example data) that the tenants received through email.


Figure 1. Archived usage reports

The design

Signeasy deliberated on numerous aspects and arrived at the following design considerations:

  • Enhance tenant experience — Provide the reports to tenants on-demand, using a self-service mechanism.
  • Scalable aggregation queries — The reports ran aggregation queries on usage data within a time range on a relational database management system (RDBMS). Signeasy considered moving to a data store that has the scalability to store and run aggregation queries on millions of records.
  • Agility — Signeasy wanted to build the module in a time-bound manner and deliver it to tenants as quickly as possible.
  • Reduce infrastructure management — The load on the reports infrastructure that stores and processes data increases linearly with the number of usage reports requested. This meant an increase in the undifferentiated heavy lifting of infrastructure management tasks such as capacity management and patching.

With the design considerations and constraints called out, Signeasy began to look for a suitable solution. Signeasy decided to build their usage reports on a serverless architecture. They chose AWS Serverless because it offers scalable compute and database services, application integration capabilities, automatic scaling, and a pay-for-use billing model. This reduces infrastructure management tasks such as capacity provisioning and patching. Refer to the following diagram to see how Signeasy augmented their existing SaaS with self-service usage reports.

Architecture of self-service usage reports


Figure 2. Architecture diagram depicting the data flow of the self-service usage reports

  1. Signeasy’s tenant users log in to the Signeasy portal to authenticate their tenant identity.
  2. The Signeasy portal uses a combination of tenant ID and user ID in JSON Web Tokens (JWT) to distinguish one tenant user from another when storing and processing documents.
  3. The documents are stored in Amazon Simple Storage Service (Amazon S3).
  4. The users’ actions are stored in the transactional database on Amazon Relational Database Service (Amazon RDS).
  5. The user actions are also written as messages into a message queue on Amazon Simple Queue Service (Amazon SQS). Signeasy used the queue to loosely couple their existing microservices on Amazon Elastic Kubernetes Service (Amazon EKS) with the new serverless part of the stack.
  6. This allows Signeasy to asynchronously process the messages in Amazon SQS with minimal changes to the existing microservices on EKS.
  7. The messages are processed by a report writer service (a Python script) on AWS Lambda and written to the reports database on Amazon Timestream (a hedged sketch of such a report writer follows this list). The reports database on Timestream stores metadata attributes such as the user ID, the signature document ID, the action taken (signature document sent, signature request received, document signed, or signature request cancelled or declined), and the timestamp of the data point. To view usage reports, tenant administrators navigate to the Reports section of the Signeasy portal and select Usage Reports.
  8. The usage reports request from the (tenant) Web Client on the browser is an API call to Amazon API Gateway.
  9. API Gateway works as a front door for the backend reports service running on a separate Lambda function.
  10. The reports service on Lambda uses the user ID from the login details to query the Amazon Timestream database, generate the report, and send it back to the web client through API Gateway. The report is immediately available for the administrator to view, which is a huge improvement over the eight-hour wait before this self-service feature was available to Signeasy’s SaaS tenants.
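The post doesn’t include the report writer code, but the following is a minimal sketch of how such a Lambda function could consume the SQS messages and write records to Timestream. The database, table, and attribute names are assumptions for illustration, not Signeasy’s actual schema.

# Hypothetical report-writer Lambda: consumes usage events from SQS and writes them to
# Timestream. Database, table, dimension, and measure names are illustrative only.
import json
import time
import boto3

timestream = boto3.client("timestream-write")

DATABASE = "usage_reports"
TABLE = "tenant_usage_events"

def handler(event, context):
    records = []
    for message in event["Records"]:                      # SQS batch delivered to Lambda
        body = json.loads(message["body"])
        records.append({
            "Dimensions": [
                {"Name": "tenant_id", "Value": body["tenant_id"]},
                {"Name": "user_id", "Value": body["user_id"]},
                {"Name": "document_id", "Value": body["document_id"]},
            ],
            "MeasureName": body["action"],                # e.g. document_signed, request_sent (assumed values)
            "MeasureValue": "1",
            "MeasureValueType": "BIGINT",
            "Time": str(int(time.time() * 1000)),         # current time in milliseconds
        })
    if records:
        timestream.write_records(DatabaseName=DATABASE, TableName=TABLE, Records=records)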

Following is a mock-up of the Usage Reports dashboard:


Figure 3. A mock-up of the Usage Reports page of the Signeasy portal

So, how did AWS Serverless help Signeasy?

Amazon SQS persists messages for up to 14 days and enables retries for messages processed in Lambda. Lambda is an event-driven serverless compute service that manages deployment and runs code, with logging and monitoring through Amazon CloudWatch. The integration of API Gateway with Lambda helped Signeasy easily deploy and manage the backend processing logic for the reports service. As usage of the reports grew, Timestream continued to scale without the need to re-architect the application. Signeasy continued to use SQL to query data within the reports database on Timestream in a cost-optimized manner.
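As an illustration of the kind of SQL aggregation the reports service can run, here is a hedged sketch that queries Timestream through boto3. The database, table, and dimension names mirror the hypothetical schema in the earlier sketch.

# Hypothetical reports-service query: aggregates a tenant's usage by action over a time
# range using Timestream's SQL support. All names are illustrative; pagination is omitted.
import boto3

query_client = boto3.client("timestream-query")

def usage_report(tenant_id, days=30):
    sql = f"""
        SELECT measure_name AS action, COUNT(*) AS total
        FROM "usage_reports"."tenant_usage_events"
        WHERE tenant_id = '{tenant_id}'
          AND time > ago({days}d)
        GROUP BY measure_name
    """
    response = query_client.query(QueryString=sql)
    report = []
    for row in response["Rows"]:
        action, total = (datum["ScalarValue"] for datum in row["Data"])
        report.append({"action": action, "count": int(total)})
    return report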

Signeasy used AWS Serverless for its functionality without the undifferentiated heavy lifting of infrastructure management tasks such as capacity provisioning and patching. Signeasy’s support team is now more focused on higher-level organizational needs such as customer engagements, quarterly business reviews, and signature and payment related issues instead of managing infrastructure.

Conclusion

  • Going from eight hours to on-demand self-service (0 hours) response time for usage reports is a huge improvement in their SaaS tenant experience.
  • The AWS Serverless services scale out and in to meet customer needs. Signeasy pays only for what they use, and they don’t run compute infrastructure 24/7 in anticipation of requests throughout the day.
  • Signeasy’s support and customer success teams have repurposed their time toward higher-value customer engagements instead of capacity or patch management.
  • Development time for the Usage Reports dashboard was two weeks.

Further reading

How Hudl built a cost-optimized AWS Glue pipeline with Apache Hudi datasets

Post Syndicated from Indira Balakrishnan original https://aws.amazon.com/blogs/big-data/how-hudl-built-a-cost-optimized-aws-glue-pipeline-with-apache-hudi-datasets/

This is a guest blog post co-written with Addison Higley and Ramzi Yassine from Hudl.

Hudl Agile Sports Technologies, Inc. is a company based in Lincoln, Nebraska, that provides tools for coaches and athletes to review game footage and improve individual and team play. Its initial product line served college and professional American football teams. Today, the company provides video services to youth, amateur, and professional teams in American football as well as other sports, including soccer, basketball, volleyball, and lacrosse. It now serves 170,000 teams in 50 different sports around the world. Hudl’s overall goal is to capture and bring value to every moment in sports.

Hudl’s mission is to make every moment in sports count. Hudl does this by expanding access to more moments through video and data and putting those moments in context. Our goal is to increase access by different people and increase context with more data points for every customer we serve. Using data to generate analytics, Hudl is able to turn data into actionable insights, telling powerful stories with video and data.

To best serve our customers and provide the most powerful insights possible, we need to be able to compare large sets of data between different sources. For example, enriching our MongoDB and Amazon DocumentDB (with MongoDB compatibility) data with our application logging data leads to new insights. This requires resilient data pipelines.

In this post, we discuss how Hudl has iterated on one such data pipeline using AWS Glue to improve performance and scalability. We talk about the initial architecture of this pipeline, and some of the limitations associated with this approach. We also discuss how we iterated on that design using Apache Hudi to dramatically improve performance.

Problem statement

A data pipeline that ensures high-quality MongoDB and Amazon DocumentDB statistics data is available in our central data lake is a requirement for Hudl to be able to deliver sports analytics. It’s important to maintain the integrity of the transactional data in MongoDB and Amazon DocumentDB, with the data lake capturing changes in near-real time and applying upserts to its records. Because Hudl statistics are backed by MongoDB and Amazon DocumentDB databases, in addition to a broad range of other data sources, it’s important that relevant MongoDB and Amazon DocumentDB data is available in a central data lake where we can run analytics queries to compare statistics data between sources.

Initial design

The following diagram demonstrates the architecture of our initial design.

Initial Ingestion Pipeline Design

Let’s discuss the key AWS services of this architecture:

  • AWS Database Migration Service (AWS DMS) allowed our team to move quickly in delivering this pipeline. AWS DMS gives our team a full snapshot of the data, and also offers ongoing change data capture (CDC). By combining these two datasets, we can ensure our pipeline delivers the latest data.
  • Amazon Simple Storage Service (Amazon S3) is the backbone of Hudl’s data lake because of its durability, scalability, and industry-leading performance.
  • AWS Glue allows us to run our Spark workloads in a serverless fashion, with minimal setup. We chose AWS Glue for its ease of use and speed of development. Additionally, features such as AWS Glue bookmarking simplified our file management logic.
  • Amazon Redshift offers petabyte-scale data warehousing. Amazon Redshift provides consistently fast performance, and easy integrations with our S3 data lake.

The data processing flow includes the following steps:

  1. Amazon DocumentDB holds the Hudl statistics data.
  2. AWS DMS gives us a full export of statistics data from Amazon DocumentDB, and ongoing changes in the same data.
  3. In the S3 Raw Zone, the data is stored in JSON format.
  4. An AWS Glue job merges the initial load of statistics data with the changed statistics data to give a snapshot of statistics data in JSON format for reference, eliminating duplicates.
  5. In the S3 Cleansed Zone, the JSON data is normalized and converted to Parquet format.
  6. AWS Glue uses a COPY command to insert Parquet data into Amazon Redshift consumption base tables.
  7. Amazon Redshift stores the final table for consumption.

The following is a sample code snippet from the AWS Glue job in the initial data pipeline:

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.functions import coalesce
from pyspark.sql.session import SparkSession

spark = SparkSession.builder.getOrCreate()
spark_context = spark.sparkContext
gc = GlueContext(spark_context)

full_df = read_full_data()  # Load entire dataset from S3 Cleansed Zone

cdc_df = read_cdc_data()  # Read new CDC data, which represents the delta in the source MongoDB/DocumentDB

# Calculate the final snapshot by joining the existing data with the delta
joined_df = full_df.join(cdc_df, '_id', 'full_outer')

# Drop deleted records and prefer the CDC version of each document when present
result = joined_df.filter((joined_df.Op != 'D') | (joined_df.Op.isNull())) \
    .select(coalesce(cdc_df._doc, full_df._doc).alias('_doc'))

gc.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(result, gc, "result"),
    connection_type="s3",
    connection_options={"path": output_path},
    format="parquet",
    transformation_ctx="ctx4",
)

Challenges

Although this initial solution met our need for data quality, we felt there was room for improvement:

  • The pipeline was slow – The pipeline ran slowly (over 2 hours) because for each batch, the whole dataset was compared. Every record had to be compared, flattened, and converted to Parquet, even when only a few records were changed from the previous daily run.
  • The pipeline was expensive – As the data size grew daily, the job duration also grew significantly (especially in step 4). To mitigate the impact, we needed to allocate more AWS Glue DPUs (Data Processing Units) to scale the job, which led to higher cost.
  • The pipeline limited our ability to scale – Hudl’s data has a long history of rapid growth with increasing customers and sporting events. Given this trend, our pipeline needed to process only the changed datasets, as efficiently as possible, to deliver predictable performance.

New design

The following diagram illustrates our updated pipeline architecture.

Although the overall architecture looks roughly the same, the internal logic in AWS Glue was significantly changed, along with the addition of Apache Hudi datasets.

In step 4, AWS Glue now interacts with Apache Hudi datasets in the S3 Cleansed Zone to upsert or delete changed records as identified by AWS DMS CDC. The AWS Glue to Apache Hudi connector helps convert JSON data to Parquet format and upserts it into the Apache Hudi dataset. Retaining the full documents in our Apache Hudi dataset allows us to easily make schema changes to our final Amazon Redshift tables without needing to re-export data from our source systems.

The following is a sample code snippet from the new AWS Glue pipeline:

from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.sql.session import SparkSession

spark = SparkSession.builder.getOrCreate()
spark_context = spark.sparkContext
gc = GlueContext(spark_context)

upsert_conf = {
    'className': 'org.apache.hudi',
    'hoodie.datasource.hive_sync.use_jdbc': 'false',
    'hoodie.datasource.write.precombine.field': 'write_ts',
    'hoodie.datasource.write.recordkey.field': '_id',
    'hoodie.table.name': 'glue_table',
    'hoodie.consistency.check.enabled': 'true',
    'hoodie.datasource.hive_sync.database': 'glue_database',
    'hoodie.datasource.hive_sync.table': 'glue_table',
    'hoodie.datasource.hive_sync.enable': 'true',
    'hoodie.datasource.hive_sync.support_timestamp': 'true',
    'hoodie.datasource.hive_sync.sync_as_datasource': 'false',
    'path': 's3://bucket/prefix/',
    'hoodie.compact.inline': 'false',
    'hoodie.datasource.hive_sync.partition_extractor_class': 'org.apache.hudi.hive.NonPartitionedExtractor',
    'hoodie.datasource.write.keygenerator.class': 'org.apache.hudi.keygen.NonpartitionedKeyGenerator',
    'hoodie.upsert.shuffle.parallelism': 200,
    'hoodie.datasource.write.operation': 'upsert',
    'hoodie.cleaner.policy': 'KEEP_LATEST_COMMITS',
    'hoodie.cleaner.commits.retained': 10,
}

# cdc_upserts_df (the new and changed records) is prepared earlier in the job
gc.write_dynamic_frame.from_options(
    frame=DynamicFrame.fromDF(cdc_upserts_df, gc, "cdc_upserts_df"),
    connection_type="marketplace.spark",
    connection_options=upsert_conf,
)

Results

With this new approach using Apache Hudi datasets with AWS Glue deployed after May 2022, the pipeline runtime was predictable and less expensive than the initial approach. Because we only handled new or modified records by eliminating the full outer join over the entire dataset, we saw an 80–90% reduction in runtime for this pipeline, thereby reducing costs by 80–90% compared to the initial approach. The following diagram illustrates our processing time before and after implementing the new pipeline.

Conclusion

With Apache Hudi’s open-source data management framework, we simplified incremental data processing in our AWS Glue data pipeline to manage data changes at the record level in our S3 data lake with CDC from Amazon DocumentDB.

We hope that this post will inspire your organization to build AWS Glue pipelines with Apache Hudi datasets that reduce cost and bring performance improvements using serverless technologies to achieve your business goals.


About the authors

Addison Higley is a Senior Data Engineer at Hudl. He manages over 20 data pipelines to help ensure data is available for analytics so Hudl can deliver insights to customers.

Ramzi Yassine is a Lead Data Engineer at Hudl. He leads the architecture and implementation of Hudl’s data pipelines and data applications, and ensures that our data empowers internal and external analytics.

Swagat Kulkarni is a Senior Solutions Architect at AWS and an AI/ML enthusiast. He is passionate about solving real-world problems for customers with cloud-native services and machine learning. Swagat has over 15 years of experience delivering several digital transformation initiatives for customers across multiple domains, including retail, travel and hospitality, and healthcare. Outside of work, Swagat enjoys travel, reading, and meditating.

Indira Balakrishnan is a Principal Solutions Architect in the AWS Analytics Specialist SA Team. She is passionate about helping customers build cloud-based analytics solutions to solve their business problems using data-driven decisions. Outside of work, she volunteers at her kids’ activities and spends time with her family.

How SOCAR built a streaming data pipeline to process IoT data for real-time analytics and control

Post Syndicated from DoYeun Kim original https://aws.amazon.com/blogs/big-data/how-socar-built-a-streaming-data-pipeline-to-process-iot-data-for-real-time-analytics-and-control/

SOCAR is the leading Korean mobility company with strong competitiveness in car-sharing. SOCAR has become a comprehensive mobility platform in collaboration with Nine2One, an e-bike sharing service, and Modu Company, an online parking platform. Backed by advanced technology and data, SOCAR solves mobility-related social problems, such as parking difficulties and traffic congestion, and changes the car ownership-oriented mobility habits in Korea.

SOCAR is building a new fleet management system to manage the many actions and processes that must occur in order for fleet vehicles to run on time, within budget, and at maximum efficiency. To achieve this, SOCAR is looking to build a highly scalable data platform using AWS services to collect, process, store, and analyze internet of things (IoT) streaming data from various vehicle devices and historical operational data.

This in-car device data, combined with operational data such as car details and reservation details, will provide a foundation for analytics use cases. For example, SOCAR will be able to notify customers if they have forgotten to turn their headlights off or to schedule a service if a battery is running low. Unfortunately, the previous architecture didn’t enable the enrichment of IoT data with operational data and couldn’t support streaming analytics use cases.

AWS Data Lab offers accelerated, joint-engineering engagements between customers and AWS technical resources to create tangible deliverables that accelerate data and analytics modernization initiatives. The Build Lab is a 2–5-day intensive build with a technical customer team.

In this post, we share how SOCAR engaged the Data Lab program to assist them in building a prototype solution to overcome these challenges, and to build the basis for accelerating their data project.

Use case 1: Streaming data analytics and real-time control

SOCAR wanted to utilize IoT data for a new business initiative. A fleet management system, where data comes from IoT devices in the vehicles, is a key input to drive business decisions and derive insights. This data is captured by AWS IoT and sent to Amazon Managed Streaming for Apache Kafka (Amazon MSK). By joining the IoT data to other operational datasets, including reservations, car information, device information, and others, the solution can support a number of functions across SOCAR’s business.

An example of real-time monitoring is when a customer turns off the car engine and closes the car door, but the headlights are still on. By using IoT data related to the car light, door, and engine, a notification is sent to the customer to inform them that the car headlights should be turned off.

Although this real-time control is important, they also want to collect historical data—both raw and curated data—in Amazon Simple Storage Service (Amazon S3) to support historical analytics and visualizations by using Amazon QuickSight.

Use case 2: Detect table schema change

The first challenge SOCAR faced was existing batch ingestion pipelines that were prone to breaking when schema changes occurred in the source systems. Additionally, these pipelines didn’t deliver data in a way that was easy for business analysts to consume. In order to meet the future data volumes and business requirements, they needed a pattern for the automated monitoring of batch pipelines with notification of schema changes and the ability to continue processing.

The second challenge was related to the complexity of the JSON files being ingested. The existing batch pipelines weren’t flattening the five-level nested structure, which made it difficult for business users and analysts to gain business insights without any effort on their end.

Overview of solution

In this solution, we followed the serverless data architecture to establish a data platform for SOCAR. This serverless architecture allowed SOCAR to run data pipelines continuously and scale automatically with no setup cost and without managing servers.

AWS Glue is used for both the streaming and batch data pipelines. Amazon Kinesis Data Analytics is used to deliver streaming data with subsecond latencies. In terms of storage, data is stored in Amazon S3 for historical data analysis, auditing, and backup. However, when frequent reading of the latest snapshot data is required by multiple users and applications concurrently, the data is stored and read from Amazon DynamoDB tables. DynamoDB is a key-value and document database that can support tables of virtually any size with horizontal scaling.

Let’s discuss the components of the solution in detail before walking through the steps of the entire data flow.

Component 1: Processing IoT streaming data with business data

The first data pipeline (see the following diagram) processes IoT streaming data with business data from an Amazon Aurora MySQL-Compatible Edition database.

Whenever a transaction occurs in either of the two tables in the Aurora MySQL database, the change is captured and loaded into the corresponding MSK topic via AWS Database Migration Service (AWS DMS) tasks. One topic conveys the car information table, and the other topic is for the device information table. This data is loaded into a single DynamoDB table that contains all the attributes (or columns) that exist in the two tables in the Aurora MySQL database, along with a primary key. This single DynamoDB table contains the latest snapshot data from the two DB tables, and is important because it contains the latest information of all the cars and devices for the lookup against the streaming IoT data. If the lookup were done on the database directly with the streaming data, it would impact the production database performance.

When the snapshot is available in DynamoDB, an AWS Glue streaming job runs continuously to collect the IoT data and join it with the latest snapshot data in the DynamoDB table to produce the up-to-date output, which is written into another DynamoDB table.

The up-to-date data in DynamoDB is used for the real-time monitoring and control that SOCAR’s Data Analytics team performs for safety maintenance and fleet management. This data is ultimately consumed by a number of apps to perform various business activities, including route optimization, real-time monitoring of oil consumption and temperature, identification of driving patterns, tire wear and defect detection, and real-time car crash notifications.
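The post doesn’t show the streaming job code. The following is a minimal sketch of how such an AWS Glue streaming job could join each micro-batch of IoT records with the DynamoDB snapshot; the table names, join key, and Kafka connection options are assumptions for illustration.

# Hypothetical Glue streaming job: reads IoT records from an MSK topic, joins each
# micro-batch with the latest car/device snapshot in DynamoDB, and writes the enriched
# output to another DynamoDB table. All names and options are illustrative.
from awsglue.context import GlueContext
from awsglue.dynamicframe import DynamicFrame
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

iot_stream = glue_context.create_data_frame.from_options(
    connection_type="kafka",
    connection_options={
        "connectionName": "kafka-connection",   # the AWS Glue connection created earlier
        "topicName": "iot-telemetry",
        "startingOffsets": "latest",
        "classification": "json",
    },
    transformation_ctx="iot_stream",
)

def process_batch(batch_df, batch_id):
    if batch_df.count() == 0:
        return
    # Read the latest snapshot of car and device attributes maintained in DynamoDB
    snapshot_df = glue_context.create_dynamic_frame.from_options(
        connection_type="dynamodb",
        connection_options={"dynamodb.input.tableName": "car_device_snapshot"},
    ).toDF()
    enriched_df = batch_df.join(snapshot_df, on="device_id", how="left")
    # Write the up-to-date, enriched records to the target table used for monitoring
    glue_context.write_dynamic_frame.from_options(
        frame=DynamicFrame.fromDF(enriched_df, glue_context, "enriched"),
        connection_type="dynamodb",
        connection_options={"dynamodb.output.tableName": "enriched_iot_status"},
    )

glue_context.forEachBatch(
    frame=iot_stream,
    batch_function=process_batch,
    options={"windowSize": "10 seconds",
             "checkpointLocation": "s3://bucket/prefix/checkpoint/"},
)

Looking up the snapshot table inside the batch function keeps the join off the production Aurora database, which is the reason the DynamoDB copy is maintained in the first place.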

Component 2: Processing IoT data and visualizing the data in dashboards

The second data pipeline (see the following diagram) batch processes the IoT data and visualizes it in QuickSight dashboards.

There are two data sources. The first is the Aurora MySQL database. The two database tables are exported into Amazon S3 from the Aurora MySQL cluster and registered in the AWS Glue Data Catalog as tables. The second data source is Amazon MSK, which receives streaming data from AWS IoT Core. This requires you to create a secure AWS Glue connection for an Apache Kafka data stream. SOCAR’s MSK cluster requires SASL_SSL as a security protocol (for more information, refer to Authentication and authorization for Apache Kafka APIs). To create an MSK connection in AWS Glue and set up connectivity, we use the following CLI command:

aws glue create-connection --connection-input \
'{"Name":"kafka-connection","Description":"kafka connection example",
"ConnectionType":"KAFKA",
"ConnectionProperties":{
"KAFKA_BOOTSTRAP_SERVERS":"<server-ip-addresses>",
"KAFKA_SSL_ENABLED":"true",
"KAFKA_SECURITY_PROTOCOL":"SASL_SSL",
"KAFKA_SKIP_CUSTOM_CERT_VALIDATION":"false",
"KAFKA_SASL_MECHANISM":"SCRAM-SHA-512",
"KAFKA_SASL_SCRAM_USERNAME":"<username>",
"KAFKA_SASL_SCRAM_PASSWORD":"<password>"
},
"PhysicalConnectionRequirements":
{"SubnetId":"subnet-xxx","SecurityGroupIdList":["sg-xxx"],"AvailabilityZone":"us-east-1a"}}'

Component 3: Real-time control

The third data pipeline processes the streaming IoT data from Amazon MSK with millisecond latency to produce the output in DynamoDB, and sends a notification in real time if any records are identified as outliers based on business rules.

AWS IoT Core provides integrations with Amazon MSK to set up real-time streaming data pipelines. To do so, complete the following steps:

  1. On the AWS IoT Core console, choose Act in the navigation pane.
  2. Choose Rules, and create a new rule.
  3. For Actions, choose Add action and choose Kafka.
  4. Choose the VPC destination if required.
  5. Specify the Kafka topic.
  6. Specify the TLS bootstrap servers of your Amazon MSK cluster.

You can view the bootstrap server URLs in the client information of your MSK cluster details. The AWS IoT rule was created with the Kafka topic as an action to provide data from AWS IoT Core to Kafka topics.

SOCAR used Amazon Kinesis Data Analytics Studio to analyze streaming data in real time and build stream-processing applications using standard SQL and Python. We created one table from the Kafka topic using the following code:

CREATE TABLE table_name (
  column_name1 VARCHAR,
  column_name2 VARCHAR(100),
  column_name3 VARCHAR,
  column_name4 AS TO_TIMESTAMP(`time_column`, 'EEE MMM dd HH:mm:ss z yyyy'),
  WATERMARK FOR column_name4 AS column_name4 - INTERVAL '5' SECOND
)
PARTITIONED BY (column_name5)
WITH (
  'connector' = 'kafka',
  'topic' = 'topic_name',
  'properties.bootstrap.servers' = '<bootstrap servers shown in the MSK client info dialog>',
  'format' = 'json',
  'properties.group.id' = 'testGroup1',
  'scan.startup.mode' = 'earliest-offset'
);

Then we applied a query with business logic to identify the particular set of records that need to be alerted on. When this data is loaded back into another Kafka topic, AWS Lambda functions are triggered to perform the downstream action: either loading the data into a DynamoDB table or sending an email notification.
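The Lambda code isn’t shown in the post; below is a hedged sketch of a handler behind an MSK event source mapping on the outlier topic. The environment variable and topic names are assumptions, and a sibling function on the output topic would write to the DynamoDB table instead of publishing to SNS.

# Hypothetical Lambda handler for an MSK event source mapping on the "Outlier" topic.
# It decodes each Kafka record and publishes an email notification through SNS.
import base64
import json
import os
import boto3

sns = boto3.client("sns")
TOPIC_ARN = os.environ["ALERT_TOPIC_ARN"]   # assumed SNS topic with email subscriptions

def handler(event, context):
    # MSK events group records under "topic-partition" keys
    for records in event["records"].values():
        for record in records:
            payload = json.loads(base64.b64decode(record["value"]))
            sns.publish(
                TopicArn=TOPIC_ARN,
                Subject="Vehicle alert",
                Message=json.dumps(payload, indent=2),
            )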

Component 4: Flattening the nested structure JSON and monitoring schema changes

The final data pipeline (see the following diagram) processes complex, semi-structured, and nested JSON files.

This step uses an AWS Glue DynamicFrame to flatten the nested structure and then land the output in Amazon S3. After the data is loaded, it’s scanned by an AWS Glue crawler to update the Data Catalog table and detect any changes in the schema.
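One way to implement this step is with the AWS Glue Relationalize transform, which flattens nested fields and pivots array columns into separate frames. The following is a minimal sketch under that assumption; the S3 paths are placeholders.

# Hypothetical Glue batch job: flattens deeply nested JSON from the raw S3 location using
# Relationalize, then writes the root (flattened) frame back to S3 in Parquet so the
# crawler can update the Data Catalog and detect schema changes. Paths are illustrative.
from awsglue.context import GlueContext
from awsglue.transforms import Relationalize
from pyspark.context import SparkContext

glue_context = GlueContext(SparkContext.getOrCreate())

raw = glue_context.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://bucket/raw/iot/"]},
    format="json",
)

# Relationalize returns a collection: the flattened root frame plus one frame per
# pivoted-out array column.
flattened = Relationalize.apply(
    frame=raw,
    staging_path="s3://bucket/temp/relationalize/",
    name="root",
    transformation_ctx="relationalize",
)

glue_context.write_dynamic_frame.from_options(
    frame=flattened.select("root"),
    connection_type="s3",
    connection_options={"path": "s3://bucket/cleansed/iot/"},
    format="parquet",
)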

Data flow: Putting it all together

The following diagram illustrates our complete data flow with each component.

Let’s walk through the steps of each pipeline.

The first data pipeline (in red) processes the IoT streaming data with the Aurora MySQL business data:

  1. AWS DMS is used for ongoing replication to continuously apply source changes to the target with minimal latency. The source includes two tables in the Aurora MySQL database (carinfo and deviceinfo), and each is linked to its own MSK topic via an AWS DMS task.
  2. Amazon MSK triggers a Lambda function, so whenever a topic receives data, a Lambda function runs to load the data into the DynamoDB table.
  3. There is a single DynamoDB table with the columns that exist in the carinfo and deviceinfo tables of the Aurora MySQL database. This table consists of all the data from the two tables and stores the latest data by performing an upsert operation.
  4. An AWS Glue job continuously receives the IoT data and joins it with data in the DynamoDB table to produce the output into another DynamoDB target table.
  5. This target table contains the final data, which includes all the device and car status information from the IoT devices as well as metadata from the Aurora MySQL table.

The second data pipeline (in green) batch processes IoT data to use in dashboards and for visualization:

  1. The car and reservation data (in two DB tables) is exported via a SQL command from the Aurora MySQL database with the output data available in an S3 bucket. The folders that contain data are registered as an S3 location for the AWS Glue crawler and become available via the AWS Glue Data Catalog.
  2. The MSK input topic continuously receives data from AWS IoT. Each car has a number of IoT devices, and each device captures data and sends it to an MSK input topic. The Amazon MSK S3 sink connector is configured to export data from Kafka topics to Amazon S3 in JSON format. In addition, the S3 connector exports data by guaranteeing exactly-once delivery semantics to consumers of the S3 objects it produces.
  3. The AWS Glue job runs in a daily batch to load the historical IoT data into Amazon S3 and into two tables (refer to step 1) to produce the output data in an Enriched folder in Amazon S3.
  4. Amazon Athena is used to query data from Amazon S3 and make it available as a dataset in QuickSight for visualizing historical data.

The third data pipeline (in blue) processes streaming IoT data from Amazon MSK with millisecond latency to produce the output in DynamoDB and send a notification:

  1. An Amazon Kinesis Data Analytics Studio notebook powered by Apache Zeppelin and Apache Flink is used to build and deploy its output as a Kinesis Data Analytics application. This application loads data from Amazon MSK in real time, and users can apply business logic to select particular events coming from the IoT real-time data, for example, the car engine is off and the doors are closed, but the headlights are still on. The particular event that users want to capture can be sent to another MSK topic (Outlier) via the Kinesis Data Analytics application.
  2. Amazon MSK triggers a Lambda function, so whenever a topic receives data, a Lambda function runs to send an email notification to users that are subscribed to an Amazon Simple Notification Service (Amazon SNS) topic. An email is published using an SNS notification.
  3. The Kinesis Data Analytics application loads data from AWS IoT, applies business logic, and then loads it into another MSK topic (output). Amazon MSK triggers a Lambda function when data is received, which loads data into a DynamoDB Append table.
  4. Amazon Kinesis Data Analytics Studio is used to run SQL commands for ad hoc interactive analysis on streaming data.

The final data pipeline (in yellow) processes complex, semi-structured, and nested JSON files, and sends a notification when a schema evolves.

  1. An AWS Glue job runs and reads the JSON data from Amazon S3 (as a source), applies logic to flatten the nested schema using a DynamicFrame, and pivots out array columns from the flattened frame.
  2. The output is stored in Amazon S3 and is automatically registered to the AWS Glue Data Catalog table.
  3. Whenever a new attribute or change appears in the JSON input data at any level of the nested structure, it is captured in Amazon EventBridge as an event from the AWS Glue Data Catalog, and an email notification is published using Amazon SNS (a hedged sketch of wiring such a rule follows this list).
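The post doesn’t include the rule definition. As a hedged sketch, the following wires up such a notification with boto3, assuming the Data Catalog’s “Glue Data Catalog Table State Change” events and hypothetical rule, database, and topic names; the SNS topic also needs a resource policy that allows EventBridge to publish to it (omitted here).

# Hypothetical schema-change notification: an EventBridge rule on Glue Data Catalog table
# changes with an SNS topic as the target. All names are illustrative.
import json
import boto3

events = boto3.client("events")
sns = boto3.client("sns")

topic_arn = sns.create_topic(Name="schema-change-alerts")["TopicArn"]

events.put_rule(
    Name="glue-table-schema-change",
    EventPattern=json.dumps({
        "source": ["aws.glue"],
        "detail-type": ["Glue Data Catalog Table State Change"],
        "detail": {"databaseName": ["iot_flattened_db"]},   # assumed database name
    }),
    State="ENABLED",
)

events.put_targets(
    Rule="glue-table-schema-change",
    Targets=[{"Id": "notify-data-team", "Arn": topic_arn}],
)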

Conclusion

As a result of the four-day Build Lab, the SOCAR team left with a working prototype that is custom fit to their needs, gaining a clear path to production. The Data Lab allowed the SOCAR team to build a new streaming data pipeline, enrich IoT data with operational data, and enhance the existing data pipeline to process complex nested JSON data. This establishes a baseline architecture to support the new fleet management system beyond the car-sharing business.


About the Authors

DoYeun Kim is the Head of Data Engineering at SOCAR. He is a passionate software engineering professional with 19+ years of experience. He leads a team of 10+ engineers who are responsible for the data platform, data warehouse, and MLOps engineering, as well as building in-house data products.

SangSu Park is a Lead Data Architect in SOCAR’s cloud DB team. His passion is to keep learning, embrace challenges, and strive for mutual growth through communication. He loves to travel in search of new cities and places.

YoungMin Park is a Lead Architect in SOCAR’s cloud infrastructure team. His philosophy in life is, whatever it may be, to challenge, fail, learn, and share such experiences to build a better tomorrow for the world. He enjoys building expertise in various fields and basketball.

Younggu Yun is a Senior Data Lab Architect at AWS. He works with customers around the APAC region to help them achieve business goals and solve technical problems by providing prescriptive architectural guidance, sharing best practices, and building innovative solutions together. In his free time, he and his son are obsessed with building creative models from Lego blocks.

Vicky Falconer leads the AWS Data Lab program across APAC, offering accelerated joint engineering engagements between teams of customer builders and AWS technical resources to create tangible deliverables that accelerate data analytics modernization and machine learning initiatives.

Introducing the AWS Lambda Telemetry API

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/introducing-the-aws-lambda-telemetry-api/

This blog post is written by Anton Aleksandrov, Principal Solution Architect and Shridhar Pandey, Senior Product Manager

Today AWS is announcing the AWS Lambda Telemetry API. This provides an easier way to receive enhanced function telemetry directly from the Lambda service and send it to custom destinations. Developers and operators can now more easily monitor and observe their Lambda functions using Lambda extensions from their preferred observability tool providers.

Extensions can use the Lambda Logs API to collect logs generated by the Lambda service and code running in their Lambda function. While the Logs API provides extensions with access to logs, it does not provide a way to collect additional telemetry, such as traces and metrics, which the Lambda service generates during initialization and invocation of your Lambda function.

Previously, observability tools retrieved traces from AWS X-Ray using the AWS X-Ray API or built their own custom tracing libraries to generate traces during Lambda function invocation. Tools required customers to modify AWS Identity and Access Management (IAM) policies to grant access to the traces from X-Ray. This caused additional complexity for tools to collect traces and metrics from multiple sources and introduced latency in seeing Lambda function traces in observability tool dashboards.

The Lambda Telemetry API is a new API that enhances the existing Lambda Logs API functionality. With the new Telemetry API, observability tools can receive function and extension logs, and also events, traces, and metrics directly from within the Lambda service. You do not need to install additional tracing libraries. This reduces latency and simplifies access permissions, as the extension does not require additional access to X-Ray.

Today you can use Telemetry API-enabled extensions to send telemetry data to Coralogix, Datadog, Dynatrace, Lumigo, New Relic, Sedai, Site24x7, Serverless.com, Sumo Logic, Sysdig, Thundra, or your own custom destinations.

Overview

To receive logs, extensions subscribe using the new Lambda Telemetry API.

Lambda Telemetry API

The Lambda service then streams the telemetry events directly to the extension. The events include platform events, trace spans, function and extension logs, and additional Lambda platform metrics. The extension can then process, filter, and route them to any preferred destination.

You can add an extension from the tooling provider of your choice to your Lambda function. You can deploy extensions, including ones that use the Telemetry API, as Lambda layers, with the AWS Management Console and AWS Command Line Interface (AWS CLI). You can also use infrastructure as code tools such as AWS CloudFormation, the AWS Serverless Application Model (AWS SAM), Serverless Framework, and Terraform.

Lambda Extensions from the AWS Partner Network (APN) available at launch

Today, you can use Lambda extensions that use Telemetry API from the following Lambda partners:

  • The Coralogix AWS Lambda Telemetry Exporter extension now offers improved monitoring and alerting for Lambda functions by further streamlining collection and correlation of logs, metrics, and traces.
  • The Datadog extension further simplifies how you visualize the impact of cold starts, and monitor and alert on latency, duration, and payload size of your Lambda functions by collecting logs, traces, and real-time metrics from your function in a simple and cost-effective way.
  • Dynatrace now provides a simplified observability configuration for AWS Lambda through a seamless integration. The new solution delivers low-latency telemetry, enables monitoring at scale, and helps reduce monitoring costs for your serverless workloads.
  • The Lumigo lambda-log-shipper extension simplifies aggregating and forwarding Lambda logs to third-party tools. It now also makes it easy for you to detect Lambda function timeouts.
  • The New Relic extension now provides a unified observability view for your Lambda functions with insights that help you better understand and optimize the performance of your functions.
  • Sedai now uses the Telemetry API to help you improve the performance and availability of your Lambda functions by gathering insights about your function and providing recommendations for manual and autonomous remediation in a cost-effective manner.
  • The Site24x7 extension now offers new metrics, which enable you to get deeper insights into the different phases of the Lambda function lifecycle, such as initialization and invocation.
  • Serverless.com now uses the Telemetry API to provide real-time performance details for your Lambda function through the Dev Mode feature of their new Serverless Console V.2 offering, which simplifies debugging in the AWS Cloud.
  • Sumo Logic now makes it easier, faster, and more cost-effective for you to get your mission-critical Lambda function telemetry sent directly to Sumo Logic so you could quickly analyze and remediate errors and exceptions.
  • The Sysdig Monitor extension generates and collects real-time metrics directly from the Lambda platform. The simplified instrumentation offers lower latency, reduced MTTR (mean time to resolution) for critical issues, and cost benefits while monitoring your serverless applications.
  • The Thundra extension enables you to export logs, metrics, and events for Lambda execution environment lifecycle events emitted by the Telemetry API to a destination of your choice such as an S3 bucket, a database, or a monitoring backend.

Seeing example Telemetry API extensions in action

This demo shows an example of using a telemetry extension to receive telemetry data, batch it, and send it to a desired destination.

To set up the example, visit the GitHub repo for the extension implemented in the language of your choice and follow the instructions in the README.md file.

To configure the batching behavior, which controls when the extension sends the data, set the Lambda environment variable DISPATCH_MIN_BATCH_SIZE. When the batch size reaches this threshold, the extension POSTs the batch of telemetry events to the destination specified in the DISPATCH_POST_URI environment variable.

You can configure an example DISPATCH_POST_URI for the extension to deliver the telemetry data using https://webhook.site/.

Lambda environment variables

Telemetry events for one invoke may be received and processed during the next invocation. Events for the last invoke may be processed during the SHUTDOWN event.

Test and invoke the function from the Lambda console, or AWS CLI. You can see that the webhook receives the telemetry data.

Webhook receiving telemetry data

You can also view the function and extension logs in CloudWatch Logs. The example extension includes verbose logging to understand the extension lifecycle.

CloudWatch Logs showing extension verbose logging

Sample Telemetry API events

When the extension receives telemetry data, each event contains a JSON dictionary with additional information, such as related metrics or trace spans. The following example shows the events for a function initialization. You can see that the function initializes with on-demand concurrency, the runtime version is Node.js 14, the initialization is successful, and the initialization duration is 123 milliseconds.

{
  "time": "2022-08-02T12:01:23.521Z",
  "type": "platform.initStart",
  "record": {
    "initializationType": "on-demand",
    "phase":"init",
    "runtimeVersion": "nodejs-14.v3",
    "runtimeVersionArn": "arn"
  }
}

{
  "time": "2022-08-02T12:01:23.521Z",
  "type": "platform.initRuntimeDone",
  "record": {
    "initializationType": "on-demand",
    "status": "success"
  }
}

{
  "time": "2022-08-02T12:01:23.521Z",
  "type": "platform.initReport",
  "record": {
    "initializationType": "on-demand",
    "phase":"init",
    "metrics": {
      "durationMs": 123.0,
    }
  }
}

Function invocation events include the associated requestId and tracing information connecting this invocation with the X-Ray tracing context, and platform spans showing response latency and response duration as well as invocation metrics such as duration in milliseconds.

{
    "time": "2022-08-02T12:01:23.521Z",
    "type": "platform.start",
    "record": {
      "requestId": "e6b761a9-c52d-415d-b040-7ba94b9452f3",
      "version": "$LATEST",
      "tracing": {
        "spanId": "54565fb41ac79632",
        "type": "X-Amzn-Trace-Id",
        "value": "Root=1-62e900b2-710d76f009d6e7785905449a;Parent=0efbd19962d95b05;Sampled=1"
      }
    }
  }
  
  {
    "time": "2022-08-02T12:01:23.521Z",
    "type": "platform.runtimeDone",
    "record": {
      "requestId": "e6b761a9-c52d-415d-b040-7ba94b9452f3",
      "status": "success",
      "tracing": {
        "spanId": "54565fb41ac79632",
        "type": "X-Amzn-Trace-Id",
        "value": "Root=1-62e900b2-710d76f009d6e7785905449a;Parent=0efbd19962d95b05;Sampled=1"
      },
      "spans": [
        {
          "name": "responseLatency", 
          "start": "2022-08-02T12:01:23.521Z",
          "durationMs": 23.02
        },
        {
          "name": "responseDuration", 
          "start": "2022-08-02T12:01:23.521Z",
          "durationMs": 20
        }
      ],
      "metrics": {
        "durationMs": 200.0,
        "producedBytes": 15
      }
    }
  }
  
  {
    "time": "2022-08-02T12:01:23.521Z",
    "type": "platform.report",
    "record": {
      "requestId": "e6b761a9-c52d-415d-b040-7ba94b9452f3",
      "metrics": {
        "durationMs": 220.0,
        "billedDurationMs": 300,
        "memorySizeMB": 128,
        "maxMemoryUsedMB": 90,
        "initDurationMs": 200.0
      },
      "tracing": {
        "spanId": "54565fb41ac79632",
        "type": "X-Amzn-Trace-Id",
        "value": "Root=1-62e900b2-710d76f009d6e7785905449a;Parent=0efbd19962d95b05;Sampled=1"
      }
    }
  }

Building a Telemetry API extension

Lambda extensions run as independent processes in the execution environment and continue to run after the function invocation is fully processed. Because extensions run as separate processes, you can write them in a language different from the function code. We recommend implementing extensions using a compiled language as a self-contained binary. This makes the extension compatible with all the supported runtimes.

Extensions that use the Telemetry API have the following lifecycle.

Telemetry API lifecycle

  1. The extension registers itself using the Lambda Extensions API and subscribes to receive INVOKE and SHUTDOWN events. With the Telemetry API, the registration response body contains additional information, such as the function name, function version, and account ID.
  2. The extension starts a telemetry listener. This is a local HTTP or TCP endpoint. We recommend using HTTP rather than TCP.
  3. The extension uses the Telemetry API to subscribe to the desired telemetry event streams.
  4. The Lambda service POSTs telemetry stream data to your telemetry listener. We recommend batching the telemetry data as it arrives at the listener. You can perform any custom processing on this data and send it on to an S3 bucket, another custom destination, or an external observability service. A minimal sketch of this lifecycle follows.
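As an illustration of this lifecycle, here is a minimal sketch of an internal extension written in Python. The port, buffering values, and extension name are assumptions; the endpoint paths and payload shapes follow the Extensions API and Telemetry API documentation at the time of writing, so verify them against the current docs before using this as a starting point.

# Minimal sketch of a Telemetry API extension (assumed names and values are marked below).
import json
import os
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

RUNTIME_API = os.environ["AWS_LAMBDA_RUNTIME_API"]
EXTENSION_NAME = "telemetry_sketch"   # assumption: must match the extension executable name
LISTENER_PORT = 4243                  # assumption: arbitrary local port for the listener

def register_extension():
    # Step 1: register with the Extensions API and subscribe to INVOKE and SHUTDOWN events
    req = urllib.request.Request(
        f"http://{RUNTIME_API}/2020-01-01/extension/register",
        data=json.dumps({"events": ["INVOKE", "SHUTDOWN"]}).encode(),
        headers={"Lambda-Extension-Name": EXTENSION_NAME},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Lambda-Extension-Identifier"]

class TelemetryListener(BaseHTTPRequestHandler):
    # Step 2: local HTTP endpoint that the Lambda service POSTs telemetry batches to
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        for event in json.loads(body):
            print(f"[telemetry] {event['type']}")   # replace with batching/forwarding logic
        self.send_response(200)
        self.end_headers()

def subscribe_to_telemetry(extension_id):
    # Step 3: subscribe to platform and function telemetry streams
    subscription = {
        "schemaVersion": "2022-07-01",
        "types": ["platform", "function"],
        "buffering": {"maxItems": 500, "maxBytes": 262144, "timeoutMs": 1000},
        "destination": {"protocol": "HTTP",
                        "URI": f"http://sandbox.localdomain:{LISTENER_PORT}"},
    }
    req = urllib.request.Request(
        f"http://{RUNTIME_API}/2022-07-01/telemetry",
        data=json.dumps(subscription).encode(),
        headers={"Lambda-Extension-Identifier": extension_id},
        method="PUT",
    )
    urllib.request.urlopen(req).read()

def main():
    extension_id = register_extension()
    server = HTTPServer(("0.0.0.0", LISTENER_PORT), TelemetryListener)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    subscribe_to_telemetry(extension_id)
    # Step 4: poll for lifecycle events so the extension stays alive until SHUTDOWN
    while True:
        req = urllib.request.Request(
            f"http://{RUNTIME_API}/2020-01-01/extension/event/next",
            headers={"Lambda-Extension-Identifier": extension_id},
        )
        event = json.loads(urllib.request.urlopen(req).read())
        if event.get("eventType") == "SHUTDOWN":
            break

if __name__ == "__main__":
    main()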

See the Telemetry API documentation and sample extensions for additional details.

The Lambda Telemetry API supersedes the Lambda Logs API. While the Logs API remains fully functional, AWS recommends using the Telemetry API. New functionality is only available with the Telemetry API. Extensions can only subscribe to either the Logs API or the Telemetry API. After subscribing to one of them, any attempt to subscribe to the other returns an error.

Mapping Telemetry API schema to OpenTelemetry spans

The Lambda Telemetry API schema is semantically compatible with OpenTelemetry (OTEL). You can use events received from the Telemetry API to build and report OTEL spans. Three Telemetry API lifecycle events represent a single function invocation: platform.start, platform.runtimeDone, and platform.report. You should represent this as a single OTEL span. You can add additional details to your spans using information available in platform.runtimeDone events under the event.spans property.

Mapping of Telemetry API events to OTEL spans is described in the Telemetry API documentation.

Metrics and pricing

The Telemetry API introduces new per-invoke metrics to help you understand the impact of extensions on your function’s performance. The metrics are available within the platform.runtimeDone event.

  • platform.runtime measures the time taken by the Lambda Runtime to run your function handler code.
  • producedBytes measures the number of bytes returned during the invoke phase.

There are also two new trace spans available within the platform.runtimeDone event:

  • responseLatencyMs measures the time taken by the Runtime to send a response.
  • responseDurationMs measures the time taken by the Runtime to finish sending the response from when it starts streaming it.

Extensions using Telemetry API, like other extensions, share the same billing model as Lambda functions. When using Lambda functions with extensions, you pay for requests served, and the combined compute time used to run your code and all extensions, in 1-ms increments. To learn more about the billing for extensions, visit the Lambda pricing page.

Useful links

Conclusion

The Lambda Telemetry API allows you to receive enhanced telemetry data more easily using your preferred monitoring and observability tools. The Telemetry API enhances the functionality of the Logs API to receive logs, metrics, and traces directly from the Lambda service. Developers and operators can send telemetry to destinations without custom libraries, with reduced latency, and simplified permissions.

To see how the Telemetry API works, try the demos in the GitHub repository.

Build your own extensions using the Telemetry API today, or use extensions provided by the Lambda observability partners.

For more serverless learning resources, visit Serverless Land.

Enriching operational events with AWS Serverless

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/enriching-operational-events-with-aws-serverless/

This post was written by Ben Moses, Senior Solutions Architect, Enterprise.

AWS Serverless is a fit for many IT automation and operations use cases, especially for reacting to events. Infrastructure events are a useful way to understand the health of the infrastructure that supports your applications and customers, and this blog examines how serverless can help enrich these operational events.

The scenario used in this post shows how an infrastructure event can be intercepted in real time, enriched with additional information from your AWS environment and workloads, and sent to a downstream consumer with that added, valuable information.

This example focuses on Amazon EC2 state change events. The concept applies to any type of event, for example those emitted by other AWS services to Amazon CloudWatch Events. These events could also include events produced by AWS Config, and some of AWS CloudTrail’s events, including CloudTrail Insights.

The purpose is to add more valuable information and context to events in real-time. Operators and downstream consumers can then identify emerging patterns in near real-time.

How does this happen today?

It is common for existing solutions to store infrastructure events in whatever format the source system generates, or in a standardized open or proprietary format. Operations staff and systems then analyze these logs to understand patterns and to support root cause analysis. This data must often be enriched by other sources to give it context and meaning. This is done either in a scheduled batch operation by using CSV data from other systems, or by integrating with other enterprise tooling.

The state of your cloud infrastructure changes frequently due to the elasticity and disposability of resources. This can cause an issue with your data quality when using the scheduled batch method. When you come to enrich an infrastructure event, the state may have changed by the time your scheduled batch runs. This leads to gaps or inaccuracies in data, which makes it harder for operators to spot trends and anomalies.

A serverless approach

This example uses serverless services and concepts from event driven architecture (EDA). With this architecture, you only pay when events happen and are enriched. There’s no need for any third-party tooling, and your events are enriched in near real-time.

The EC2 “State Change Event” is enriched by obtaining the instance’s name tag, if it has one. The end-to-end journey looks like this:

Overview

  1. An EC2 instance’s state changes (for example, shutdown or restart).
  2. An Amazon EventBridge rule that matches the event pattern triggers a target action to run an AWS Step Functions state machine.
  3. The state machine transforms inputs, makes a native AWS API SDK call to the EC2 service to find a name tag, and emits a newly enriched event back to EventBridge.
  4. An EventBridge rule matching the enriched event triggers an action to send an email via Amazon SNS to simulate a downstream consumer.

EventBridge is a serverless event bus that can be used with event driven architectures on AWS. An EventBridge rule is defined with a pattern, and if an event matches that pattern, then the rule’s target action is triggered. In this example, the rule is:

{
  "detail-type": ["EC2 Instance State-change Notification"],
  "source": ["aws.ec2"]
}

An EC2 state change event looks like this:

{
  "version": "0",
  "id": "672123fe-53aa-3b22-3b37-1fae26df2aff",
  "detail-type": "EC2 Instance State-change Notification",
  "source": "aws.ec2",
  "account": "1234567890",
  "time": "2022-08-17T18:25:01Z",
  "region": "eu-west-1",
  "resources": [
    "arn:aws:ec2:eu-west-1:1234567890:instance/i-1234567890"
  ],
  "detail": {
    "instance-id": "i-0123456789",
    "state": "running"
  }
}

See the detail-type and source fields in the event. These match the rule, and the entire event payload is passed on to the next component of the architecture: the Step Functions state machine.

Step Functions uses JSONPath to select, transform, and move data through the states within a state machine. This flexibility means that, in this example, no compute resources such as AWS Lambda are required. This can mean less custom code, lower cost, and less complexity.

Step Functions Workflow Studio lets you design workflows visually. These are the key actions that take place when the state machine runs using the EC2 state change event:

Step Functions state machine

1. Remove problem characters from input

Pass states allow us to transform inputs and outputs. In this architecture, a Pass state is used to remove any problem characters from the incoming event that are known to cause issues in future steps, such as API calls to services.

In this example, the parameters for the API call used in Step 2 require the EC2 instance ID. This information is in the detail of the original event, but the API action can’t use anything with a hyphen in it.

To solve this, use a JSONPath Parameter to effectively rewrite this information without the hyphen. This creates a new field named instanceid, which is assigned the value from the original event’s detail.

{
  "instanceid.$": "$.detail.instance-id"
}

2. Get instance name from Tag

The “EC2: DescribeInstances” task in Step Functions is an example of a native SDK integration with an AWS service. This action expects a single parameter to the API, an array of EC2 instance IDs.

{
  "InstanceIds.$": "States.Array($.detail.refined.instanceid)"
}

The States.Array() intrinsic function is used to wrap the instance ID from the re-written field created in step 1. This single-member array is then passed to the EC2 Describe Instances API.

When a response is received from the EC2 Describe Instances API call, it is passed to a Result Selector. The purpose of this is to extract the value of a “Name” tag, if one was returned from the EC2 Describe Instances API.

Step Functions supports the use of JSONPath filter expressions.

{
  "instancename.$": "$..Reservations[0].Instances[0].Tags[?(@.Key==Name)].Value",
  "instanceid.$": "$.Reservations[0].Instances[0].InstanceId"
}

To understand the advanced JSONPath filter expression used in this example, read this blog post.

If an error occurs with the API call, or the filter expression is unable to find a “Name” tag on the EC2 instance, then Step Functions allows you to handle these errors within the workflow.

3. Convert instance name to a string

The output from the previous state is an array, but an EC2 instance can only have one unique “Name” tag. A Pass state is used again, with a parameter as seen in Step 1. This parameter expression takes the first element from the array and stores it in a new field named instancename.

{
  "instancename.$": "$.detail.refined.instancename[0]",
  "instanceid.$": "$.detail.refined.instanceid"
}

As with previous steps, the instanceid is re-written as part of the output, and both of these values are appended to the state’s output.

4. Get default name from Parameter Store

If the filter expression in the result selector in step 2 fails for any reason, then Step Functions error handling moves here.

Failures can happen for a variety of reasons, and with Step Functions, you can branch out error handling for each different error type. In this example, all errors are handled in the same way, regardless of whether the cause is a missing “Name” tag or a permissions issue. In this architecture, a default placeholder value is used in place of the name of the instance. In your context, a different approach may be more suitable.

The default placeholder name is stored as a static value in AWS Systems Manager Parameter Store. The native Systems Manager: GetParameter action within Step Functions can retrieve this value directly. An advantage of this approach is that the parameter can be updated externally without having to make any changes to the Step Functions state machine itself.
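
As a minimal sketch (the parameter name, state names, and result path are assumptions, not necessarily the values used in the deployed template), the native integration can be expressed in the state machine definition like this:

"Get default name from Parameter Store": {
  "Type": "Task",
  "Resource": "arn:aws:states:::aws-sdk:ssm:getParameter",
  "Parameters": {
    "Name": "/event-enrichment/default-instance-name"
  },
  "ResultPath": "$.defaultName",
  "Next": "Add ID back to refined"
}

The GetParameter response is placed at $.defaultName, leaving the rest of the event untouched for the next step.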

5. Add ID back to refined

A Pass state is used to format the response from the Parameter Store API, and a parameter expression then appends the default instance name to the output.

Whether the workflow execution followed the intended execution path, or encountered an error, there is now an enriched event payload with an instance name.
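
Assuming the result path from the previous sketch ($.defaultName) and that the field created in step 1 is still present on the input, the Pass state’s Parameters block might look like the following; treat the paths as illustrative rather than the exact ones used in the sample application:

{
  "instancename.$": "$.defaultName.Parameter.Value",
  "instanceid.$": "$.detail.refined.instanceid"
}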

6. Emit enriched event

The EventBridge: PutEvents native SDK action within Step Functions is used to construct and emit the enriched event.

{
  "Entries": [
    {
      "Detail": {
        "Message.$": "$"
      },
      "DetailType": "EnrichedEC2Event",
      "EventBusName": "serverless-event-enrichment-ApplicationEventBus",
      "Source": "custom.enriched.ec2"
    }
  ]
}

The DetailType and Source of the enriched event are custom values, specified in the last step of the state machine. As you consider schemas for your events within your organization, note that the AWS prefix is reserved for AWS service events.

The enriched event payload looks like this:

{
  "version": "0",
  "id": "a80e378b-e9a7-8007-1f18-b947e6d72c4b",
  "detail-type": "EnrichedEC2Event",
  "source": "custom.enriched.ec2",
  "account": "123456789",
  "time": "2022-08-17T18:25:03Z",
  "region": "eu-west-1",
  "resources": [
    "arn:aws:states:eu-west-1:123456789:stateMachine:EventEnrichmentStateMachine-2T5jFlCPOha1",
    "arn:aws:states:eu-west-1:123456789:execution:EventEnrichmentStateMachine-2T5jFlCPOha1:672123fe-53aa-3b22-3b37-1fae26df2aff_90821b68-ba92-2374-5015-8804c8da5769"
  ],
  "detail": {
    "Message": {
      "version": "0",
      "id": "672123fe-53aa-3b22-3b37-1fae26df2aff",
      "detail-type": "EC2 Instance State-change Notification",
      "source": "aws.ec2",
      "account": "123456789",
      "time": "2022-08-17T18:25:01Z",
      "region": "eu-west-1",
      "resources": [
        "arn:aws:ec2:eu-west-1:123456789:instance/i-123456789"
      ],
      "detail": {
        "instance-id": "i-123456789",
        "state": "running",
        "refined": {
          "instancename": "ec2-enrichment-demo-instance",
          "instanceid": "i-123456789"
        }
      }
    }
  }
}

Consuming enriched events

When enriching event data in real-time, the events are only valuable if they’re consumed. To use these enriched events, a consuming service must create and own a new EventBridge rule on the custom application bus. In this architecture, an appropriate rule pattern is:

{
  "detail-type": ["EnrichedEC2Event"],
  "source": ["custom.enriched.ec2"]
}

The target of the rule depends on the use case. For operational events, service management applications or log aggregation services may make the most sense. In this example, the rule has an SNS topic as the target. When SNS receives a message, it is sent to an operator via email. With EventBridge, future consumers can add their own rules to match the enriched events, and add their specific target actions to suit their use case.
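
For reference, a minimal CloudFormation-style sketch of such a rule might look like the following; the logical resource names are assumptions, and the SNS topic would also need a resource policy that allows events.amazonaws.com to publish to it (omitted here):

"EnrichedEventRule": {
  "Type": "AWS::Events::Rule",
  "Properties": {
    "EventBusName": "serverless-event-enrichment-ApplicationEventBus",
    "EventPattern": {
      "detail-type": ["EnrichedEC2Event"],
      "source": ["custom.enriched.ec2"]
    },
    "Targets": [
      {
        "Arn": { "Ref": "OperatorNotificationTopic" },
        "Id": "OperatorNotificationTopic"
      }
    ]
  }
}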

Conclusion

This post shows how you can create rules in EventBridge to react to operational events from AWS services. These events are routed to Step Functions, which runs a workflow consisting of steps to enrich the event, handle errors, and emit the enriched event. The example shows how to consume the enriched events, resulting in an operator receiving an email.

This example is available on GitHub as an AWS Serverless Application Model (AWS SAM) template. It contains instructions to deploy, test, and then remove all of the resources when you’ve finished.

For more serverless learning resources, visit Serverless Land.

Server-side rendering micro-frontends – the architecture

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/server-side-rendering-micro-frontends-the-architecture/

This post is written by Luca Mezzalira, Principal Specialist Solutions Architect, Serverless.

Microservices are a common pattern for building distributed systems. As frontend developers have modified their approaches to build architectures at scale, many are building micro-frontends.

This blog series explores how to implement micro-frontends using a server-side rendering (SSR) approach with AWS services. This first article covers the architecture characteristics and building blocks for designing a successful micro-frontends architecture in the AWS Cloud.

What are micro-frontends?

Micro-frontends are the technical representation of a business subdomain. They allow independent teams to work in parallel, reducing external dependencies and increasing delivery throughput. They embody several microservices characteristics such as governance decentralization, design for failure, and evolutionary design.

The main difference between micro-frontends and components is related to the domain ownership present inside a micro-frontend. With components, the domain knowledge is usually delegated to its container, which knows how to use the component’s property based on the context. Owning the domain inside a micro-frontend enables the independence that you expect in a distributed system. This doesn’t mean that micro-frontends cannot communicate or share resources, but the mindset is different compared with components.

If you are using microservices today, you may benefit from micro-frontends for scaling your frontend applications. Before micro-frontends, scaling was based primarily on developers’ expertise. Micro-frontends allow you to modernize frontend applications iteratively like you would with microservices. Every user downloads only the code needed to accomplish a specific task, improving the performance and user experience of a web application.

Architecture characteristics

This blog series builds a product details page of an example ecommerce website using micro-frontends with serverless infrastructure.

Page layout

The page is composed of:

  • A template that includes a header. A template could include more shared parts, but this example uses only a header.
  • A notifications micro-frontend that is client-side rendered. The notifications system must react to user interactions, so it cannot be server-side rendered with the rest of the page.
  • A product details micro-frontend.
  • A reviews micro-frontend.

Every micro-frontend is independent and can be developed by different teams working on the same project. This can reduce external dependencies and potential bugs across the entire application.

The key system characteristics of this project are:

  1. Server-side rendering: The system must be designed with a server-side rendering approach. This provides fast rendering of the page inside modern browsers and reduces the need for client-side JavaScript for rendering the page.
  2. Framework agnostic: The solution must work with a broad variety of JavaScript libraries available and not be bound or optimized to a specific framework.
  3. Use optimization best practices: Optimization is a key feature for server-side rendering applications. Many industries rely on these characteristics to increase sales. This example encapsulates core web vitals metrics, progressive hydration, and different levels of caching to speed up the response times of the webpages.
  4. Team independence: Every micro-frontend must be developed with minimum external dependencies. Constant coordination across teams can be a sign of design-time coupling that invalidates the purpose behind a distributed system.
  5. Serverless infrastructure for frontend developers: The serverless paradigm helps developers focus on the business logic instead of infrastructure, using a “pay for value” model, which helps to reduce costs. You can cache micro-frontend responses and reduce the traffic on the origin and the need to scale every part of the system in the same way.

High-level architecture design

This is the high-level design to incorporate these architectural characteristics:

Architectural overview

  1. The application entry point is a content delivery network (CDN) that is used for caching, performance, and security reasons.
  2. The server-side rendering approach requires a place to store all the static files to hydrate the JavaScript code in the browser and for styling components.
  3. Page requests require a UI composer that retrieves the micro-frontends and stitches them together to provide the page consumed by a browser. It streams the final HTML page to the browser to enhance the largest contentful paint (LCP) metric from the core web vitals.
  4. Decoupling micro-frontends from the UI composer relies on two mechanisms: a micro-frontends discovery that acts like service discovery in a microservice architecture, and an HTML template per page that describes where to inject the micro-frontends inside a page. The templates can live in the same repository where the other static files are present.
  5. The notification micro-frontend reacts to user interactions, providing a notification when a user adds a product in the cart.
  6. The product details micro-frontend has highly cacheable data that doesn’t require many changes over time.
  7. The reviews micro-frontend must retrieve user reviews of a specific product.

The key element for avoiding design-time coupling in this architecture is the micro-frontends discovery. The main advantages of this approach are to provide discoverability to simplify multi-environment strategies, and to reduce the blast radius by using blue/green deployments or canary releases. This topic will be covered in depth in an upcoming post.

From high-level design into implementation

The framework-agnostic approach helps to enable control over system evolution. It achieves this by using HTML-over-the-wire, where every micro-frontend renders an HTML fragment and returns it to the UI composer.

When the UI composer gathers the HTML fragments, it composes the final page to render using transclusion. Every page is represented by a specific template hosted in static files. The UI composer retrieves the template and then retrieves placeholder references in the template that can be replaced with the micro-frontend fragments.

This is the architecture used:

Architecture diagram

  1. Amazon CloudFront provides a unique entry point to the application. The distribution has two origins: the first for static files and the second for the UI composer.
  2. Static files are hosted in an Amazon S3 bucket. They are consumed by the browser and the UI composer for HTML templates.
  3. The UI composer runs on a container cluster in AWS Fargate. Using a containerized solution allows you to use streaming capabilities and multithreaded rendering if needed.
  4. AWS Systems Manager Parameter Store is used as a basic micro-frontends discovery system. This service is a key-value store used by the UI composer for retrieving the micro-frontend endpoints to consume (an illustrative sketch follows this list).
  5. The notifications micro-frontend stores the optimized JavaScript bundle in the S3 bucket. This renders on the client since it must react to user interactions.
  6. The reviews micro-frontend is composed of an AWS Lambda function, with the user reviews stored in Amazon DynamoDB. It’s rendered fully server-side and it outputs an HTML fragment.
  7. The product details micro-frontend is a low-code micro-frontend using AWS Step Functions. The Express Workflow can be invoked synchronously and contains the logic for rendering the HTML fragment and a caching layer. This increases performance due to the native integration with over 200 AWS services.
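
To make the discovery idea in item 4 more concrete, here is a purely illustrative sketch of what a discovery entry in Parameter Store could contain; the keys, URLs, and exact format are assumptions, and the actual implementation is covered in the next post of this series:

{
  "product-details": "https://product-details.example.internal/fragment",
  "reviews": "https://reviews.example.internal/fragment",
  "notifications": "https://static.example-cdn.com/notifications/bundle.js"
}

The UI composer reads this key-value data at request time, so endpoints can change (for example, during a blue/green deployment) without redeploying the composer.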

Using this approach, every team developing a micro-frontend is free to build and evolve its business domain independently. The main touchpoints with other teams are the initial integrations and the communication mechanism between micro-frontends on the same page. Once these are in place, every team reduces external dependencies and can embrace the evolutionary nature of micro-frontends.

Conclusion

This first post starts the journey into micro-frontends, a distributed architecture for frontend applications. The next post will explore the UI composer and micro-frontends discovery implementations.

If you are interested in learning more about micro-frontends, see the micro-frontends decisions framework, a mental model created for the initial complexity of approaching micro-frontends design. When used as a north star, the decisions framework simplifies the development of micro-frontends applications.

In the AWS reference architectures section, you can find a complete diagram similar to the application described in this blog series with additional details.

For more serverless learning resources, visit Serverless Land.

California State University Chancellor’s Office reduces cost and improves efficiency using Amazon QuickSight for streamlined HR reporting in higher education

Post Syndicated from Madi Hsieh original https://aws.amazon.com/blogs/big-data/california-state-university-chancellors-office-reduces-cost-and-improves-efficiency-using-amazon-quicksight-for-streamlined-hr-reporting-in-higher-education/

The California State University Chancellor’s Office (CSUCO) sits at the center of America’s most significant and diverse 4-year universities. The California State University (CSU) serves approximately 477,000 students and employs more than 55,000 staff and faculty members across 23 universities and 7 off-campus centers. The CSU provides students with opportunities to develop intellectually and personally, and to contribute back to the communities throughout California. For this large organization, managing a wide system of campuses while maintaining the decentralized autonomy of each is crucial. In 2019, they needed a highly secure tool to streamline the process of pulling HR data. The CSU had been using a legacy central data warehouse based on data from their financial system, but it lacked the robustness to keep up with modern technology. This wasn’t going to work for their HR reporting needs.

Looking for a tool to match the cloud-based infrastructure of their other operations, the Business Intelligence and Data Operations (BI/DO) team within the Chancellor’s Office chose Amazon QuickSight, a fast, easy-to-use, cloud-powered business analytics service that makes it easy for all employees within an organization to build visualizations, perform ad hoc analysis, and quickly get business insights from their data, any time, on any device. The team uses QuickSight to organize HR information across the CSU, implementing a centralized security system.

“It’s easy to use, very straightforward, and relatively intuitive. When you couple the experience of using QuickSight, with a huge cost difference to [the BI platform we had been using], to me, it’s a simple choice,”

– Andy Sydnor, Director Business Intelligence and Data Operations at the CSUCO.

With QuickSight, the team has the capability to harness security measures and deliver data insights efficiently across their campuses.

In this post, we share how the CSUCO uses QuickSight to reduce cost and improve efficiency in their HR reporting.

Delivering BI insights across the CSU’s 23 universities

The CSUCO serves the university system’s faculty, students, and staff by overseeing operations in several areas, including finance, HR, student information, and space and facilities. Since migrating to QuickSight in 2019, the team has built dashboards to support these operations. Dashboards include COVID-related leaves of absence, historical financial reports, and employee training data, along with a large selection of dashboards to track employee data at an individual campus level or from a system-wide perspective.

The team created a process for reading security roles from the ERP system and then translating them using QuickSight groups for internal HR reporting. QuickSight allowed them to match security measures with the benefits of low maintenance and familiarity to their end-users.

With QuickSight, the CSUCO is able to run a decentralized security process where campus security teams can provision access directly and users can get to their data faster. Before transitioning to QuickSight, the BI/DO team spent hours trying to get to specific individual-level data, but with QuickSight, the retrieval time was shortened to just minutes. For the first time, Sydnor and his team were able to pinpoint a specific employee’s work history without having to take additional actions to find the exact data they needed.

Cost savings compared to other BI tools

Sydnor shares that, for a public organization, one of the most attractive qualities of QuickSight is the immense cost savings. The BI/DO team at the Chancellor’s Office estimates that they’re saving roughly 40% on costs since switching from their previous BI platform, which is a huge benefit for a public organization of this scale. Their previous BI tool was costing them extensive amounts of money on licensing for features they didn’t require; the CSUCO felt they weren’t getting the best use of their investment.

The functionality of QuickSight to meet their reporting needs at an affordable price point is what makes QuickSight the CSUCO’s preferred BI reporting tool. Sydnor likes that with QuickSight, “we don’t have to go out and buy a subscription or a license for somebody, we can just provision access. It’s much easier to distribute the product.” QuickSight allows the CSUCO to focus their budget in other areas rather than having to pay for charges by infrequent users.

Simple and intuitive interface

Getting started in QuickSight was a no-brainer for Sydnor and his team. For a public organization, the procurement process can be cumbersome, delaying the time to put data into action. As an existing AWS customer, the CSUCO could seamlessly integrate QuickSight into their package of AWS services. With other BI tools, they ran into roadblocks setting up the system; this wasn’t an issue with QuickSight, because it’s a fully managed service that doesn’t require deploying any servers.

The following screenshot shows an example of the CSUCO security audit dashboard.


Sydnor tells us, “Our previous BI tool had a huge library of visualization, but we don’t need 95% of those. Our presentations look great with the breadth of visuals QuickSight provides. Most people just want the data and ultimately, need a robust vehicle to get data out of a database and onto a table or visualization.”

Converting from their original BI tool to QuickSight was painless for his team. Sydnor tells us that he has “yet to see something we can’t do with QuickSight.” One of Sydnor’s employees who was a user of the previous tool learned QuickSight in just 30 minutes. Now, they conduct QuickSight demos all the time.

Looking to the future: Expanding BI integration and adopting Amazon QuickSight Q

With QuickSight, the Chancellor’s Office aims to roll out more HR dashboards across its campuses and extend the tool for faculty use in the classroom. In the upcoming year, two campuses are joining CSUCO in building their own HR reporting dashboards through QuickSight. The organization is also making plans to use QuickSight to report on student data and implement external-facing dashboards. Some of the data points they’re excited to explore are insights into at-risk students and classroom scheduling on campus.

Thinking ahead, CSUCO is considering Amazon QuickSight Q, a machine learning-powered natural language capability that gives anyone in an organization the ability to ask business questions in natural language and receive accurate answers with relevant visualizations. Sydnor says, “How cool would that be if professors could go in and ask simple, straightforward questions like, ‘How many of my department’s students are taking full course loads this semester?’ It has a lot of potential.”

Summary

The CSUCO is excited to be a champion of QuickSight in the CSU, and is looking for ways to increase its implementation across the organization in the future.

To learn more, visit the website for the California State University Chancellor’s Office. For more on QuickSight, visit the Amazon QuickSight product page, or browse other Big Data Blog posts featuring QuickSight.


About the authors

Madi Hsieh, AWS 2022 Summer Intern, UCLA.

Tina Kelleher, Program Manager at AWS.

Build the next generation, cross-account, event-driven data pipeline orchestration product

Post Syndicated from Maria Guerra original https://aws.amazon.com/blogs/big-data/build-the-next-generation-cross-account-event-driven-data-pipeline-orchestration-product/

This is a guest post by Mehdi Bendriss, Mohamad Shaker, and Arvid Reiche from Scout24.

At Scout24 SE, we love data pipelines, with over 700 pipelines running daily in production, spread across over 100 AWS accounts. As we democratize data and our data platform tooling, each team can create, maintain, and run their own data pipelines in their own AWS account. This freedom and flexibility is required to build scalable organizations. However, it’s full of pitfalls. With no rules in place, chaos is inevitable.

We took a long road to get here. We’ve been developing our own custom data platform since 2015, building most of the tools ourselves. Since 2016, we have used our self-developed, now legacy, data pipeline orchestration tool.

The motivation to invest a year of work into a new solution was driven by two factors:

  • Lack of transparency on data lineage, especially dependency and availability of data
  • Little room to implement governance

As a technical platform, our target user base for our tooling includes data engineers, data analysts, data scientists, and software engineers. We share the vision that anyone with relevant business context and minimal technical skills can create, deploy, and maintain a data pipeline.

In this context, in 2015 we created the predecessor of our new tool, which allows users to describe their pipeline in a YAML file as a list of steps. It worked well for a while, but we faced many problems along the way, notably:

  • Our product didn’t support triggering pipelines based on the status of other pipelines, only on the presence of _SUCCESS files in Amazon Simple Storage Service (Amazon S3), which we checked with periodic polling. In complex organizations, data jobs often have strong dependencies on other work streams.
  • Given the previous point, most pipelines could only be scheduled based on a rough estimate of when their parent pipelines might finish. This led to cascading failures when the parents failed or didn’t finish on time.
  • When a pipeline fails, gets fixed, and is manually redeployed, all its dependent pipelines must be rerun manually. This means that the data producer bears the responsibility of notifying every single team downstream.

Having data and tooling democratized without the ability to provide insights into which jobs, data, and dependencies exist diminishes synergies within the company, leading to silos and problems in resource allocation. It became clear that we needed a successor for this product that would give more flexibility to the end-user, less computing costs, and no infrastructure management overhead.

In this post, we describe, through a hypothetical case study, the constraints under which the new solution should perform, the end-user experience, and the detailed architecture of the solution.

Case study

Our case study looks at the following teams:

  • The core-data-availability team has a data pipeline named listings that runs every day at 3:00 AM on the AWS account Account A, and produces on Amazon S3 an aggregate of the listings events published on the platform on the previous day.
  • The search team has a data pipeline named searches that runs every day at 5:00 AM on the AWS account Account B, and exports to Amazon S3 the list of search events that happened on the previous day.
  • The rent-journey team wants to measure a metric referred to as X; they create a pipeline named pipeline-X that runs daily on the AWS account Account C, and relies on the data of both previous pipelines. pipeline-X should only run daily, and only after both the listings and searches pipelines succeed.

User experience

We provide users with a CLI tool that we call DataMario (relating to its predecessor DataWario), and which allows users to do the following:

  • Set up their AWS account with the necessary infrastructure needed to run our solution
  • Bootstrap and manage their data pipeline projects (creating, deploying, deleting, and so on)

When creating a new project with the CLI, we generate (and require) a pipeline.yaml file for every project. This file describes the pipeline steps, how they should be triggered, alerting, the type of instances and clusters the pipeline will run on, and more.

In addition to the pipeline.yaml file, we allow advanced users with very niche and custom needs to create their pipeline definition entirely using a TypeScript API we provide them, which allows them to use the whole collection of constructs in the AWS Cloud Development Kit (AWS CDK) library.

For the sake of simplicity, we focus on the triggering of pipelines and the alerting in this post, along with the definition of pipelines through pipeline.yaml.

The listings and searches pipelines are triggered as per a scheduling rule, which the team defines in the pipeline.yaml file as follows:

trigger: 
    schedule: 
        hour: 3

pipeline-x is triggered depending on the success of both the listings and searches pipelines. The team defines this dependency relationship in the project’s pipeline.yaml file as follows:

trigger: 
    executions: 
        allOf: 
            - name: listings 
              account: Account_A_ID 
              status: 
                  - SUCCESS 
            - name: searches 
              account: Account_B_ID 
              status: 
                  - SUCCESS

The executions block can define a complex set of relationships by combining the allOf and anyOf blocks, along with a logical operator operator: OR / AND, which allows mixing the allOf and anyOf blocks. We focus on the most basic use case in this post.

Accounts setup

To support alerting, logging, and dependencies management, our solution has components that must be pre-deployed in two types of accounts:

  • A central AWS account – This is managed by the Data Platform team and contains the following:
    • A central data pipeline Amazon EventBridge bus receiving all the run status changes of AWS Step Functions workflows running in user accounts
    • An AWS Lambda function logging the Step Functions workflow run changes in an Amazon DynamoDB table to verify if any downstream pipelines should be triggered based on the current event and previous run status changes log
    • A Slack alerting service to send alerts to the Slack channels specified by users
    • A trigger management service that broadcasts triggering events to the downstream buses in the user accounts
  • All AWS user accounts using the service – These accounts contain the following:
    • A data pipeline EventBridge bus that receives Step Functions workflow run status changes forwarded from the central EventBridge bus
    • An S3 bucket to store data pipeline artifacts, along with their logs
    • Resources needed to run Amazon EMR clusters, like security groups, AWS Identity and Access Management (IAM) roles, and more

With the provided CLI, users can set up their account by running the following code:

$ dpc setup-user-account

Solution overview

The following diagram illustrates the architecture of the cross-account, event-driven pipeline orchestration product.

In this post, we refer to the different colored and numbered squares to reference a component in the architecture diagram. For example, the green square with label 3 refers to the EventBridge bus default component.

Deployment flow

This section is illustrated with the orange squares in the architecture diagram.

A user can create a project consisting of a data pipeline or more using our CLI tool as follows:

$ dpc create-project -n 'project-name'

The created project contains several components that allow the user to create and deploy data pipelines, which are defined in .yaml files (as explained earlier in the User experience section).

The workflow of deploying a data pipeline such as listings in Account A is as follows:

  • Deploy listings by running the command dpc deploy in the root folder of the project. An AWS CDK stack with all required resources is automatically generated.
  • The previous stack is deployed as an AWS CloudFormation template.
  • The stack uses custom resources to perform some actions, such as storing information needed for alerting and pipeline dependency management.
  • Two Lambda functions are triggered, one to store the mapping pipeline-X/slack-channels used for alerting in a DynamoDB table, and another one to store the mapping between the deployed pipeline and its triggers (other pipelines that should result in triggering the current one).
  • To decouple alerting and dependency management services from the other components of the solution, we use Amazon API Gateway for two components:
    • The Slack API.
    • The dependency management API.
  • All calls to both APIs are traced in Amazon CloudWatch log groups and handled by two Lambda functions:
    • The Slack channel publisher Lambda function, used to store the mapping pipeline_name/slack_channels in a DynamoDB table.
    • The dependencies publisher Lambda function, used to store the pipeline dependencies (the mapping pipeline_name/parents) in a DynamoDB table (an illustrative item shape follows this list).
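
As an illustration of the stored mapping, one possible item shape in the dependencies table is shown below; the attribute names are assumptions and not necessarily the ones used by DataMario:

{
  "pipeline_name": "pipeline-X",
  "account": "Account_C_ID",
  "operator": "AND",
  "parents": [
    { "name": "listings", "account": "Account_A_ID", "statuses": ["SUCCESS"] },
    { "name": "searches", "account": "Account_B_ID", "statuses": ["SUCCESS"] }
  ]
}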

Pipeline trigger flow

This is an event-driven mechanism that ensures that data pipelines are triggered as requested by the user, either following a schedule or a list of fulfilled upstream conditions, such as a group of pipelines succeeding or failing.

This flow relies heavily on EventBridge buses and rules, specifically two types of rules:

  • Scheduling rules.
  • Step Functions event-based rules, with a payload matching the set of statuses of all the parents of a given pipeline. The rules indicate for which set of statuses all the parents of pipeline-X should be triggered.

Scheduling

This section is illustrated with the black squares in the architecture diagram.

The listings pipeline running on Account A is set to run every day at 3:00 AM. The deployment of this pipeline creates an EventBridge rule and a Step Functions workflow for running the pipeline:

  • The EventBridge rule is of type schedule and is created on the default bus (this is the EventBridge bus responsible for listening to native AWS events; the distinction is important to avoid confusion when introducing the other buses). This rule has two main components (a sketch of such a rule follows this list):
    • A cron-like notation to describe the frequency at which it runs: 0 3 * * ? *.
    • The target, which is the Step Functions workflow describing the workflow of the listings pipeline.
  • The listings Step Functions workflow starts running immediately when the rule is triggered. (The same happens for the searches pipeline.)
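
For illustration, a CloudFormation-style sketch of such a scheduled rule could look like the following; the logical names, ARNs, region placeholder, and role are assumptions (a role that allows EventBridge to start the state machine is required):

"ListingsScheduleRule": {
  "Type": "AWS::Events::Rule",
  "Properties": {
    "ScheduleExpression": "cron(0 3 * * ? *)",
    "State": "ENABLED",
    "Targets": [
      {
        "Arn": "arn:aws:states:<region>:Account_A_ID:stateMachine:listings",
        "Id": "listings-state-machine",
        "RoleArn": "arn:aws:iam::Account_A_ID:role/listings-trigger-role"
      }
    ]
  }
}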

Each user account has a default EventBridge bus, which listens to the default AWS events (such as the run of any Lambda function) and scheduled rules.

Dependency management

This section is illustrated with the green squares in the architecture diagram. The current flow starts after the Step Functions workflow (black square 2) starts, as explained in the previous section.

As a reminder, pipeline-X is triggered when both the listings and searches pipelines are successful. We focus on the listings pipeline for this post, but the same applies to the searches pipeline.

The overall idea is to notify all downstream pipelines that depend on the listings pipeline, in every AWS account, of its status change, routing the notification through the central orchestration account.

It’s then logical that the following flow gets triggered multiple times per pipeline (Step Functions workflow) run as its status changes from RUNNING to SUCCEEDED, FAILED, TIMED_OUT, or ABORTED, because downstream pipelines could be listening for any of those status change events. The steps are as follows:

  • The event of the Step Functions workflow starting is listened to by the default bus of Account A.
  • The rule export-events-to-central-bus, which specifically listens to Step Functions workflow run status change events, is then triggered (a sketch of its event pattern follows this list).
  • The rule forwards the event to the central bus on the central account.
  • The event is then caught by the rule trigger-events-manager.
  • This rule triggers a Lambda function.
  • The function gets the list of all children pipelines that depend on the current run status of listings.
  • The current run is inserted in the run log Amazon Relational Database Service (Amazon RDS) table, following the schema sfn-listings, time (timestamp), status (SUCCEEDED, FAILED, and so on). You can query the run log RDS table to evaluate the running preconditions of all children pipelines and get all those that qualify for triggering.
  • A triggering event is broadcast in the central bus for each of those eligible children.
  • Those events get broadcast to all accounts through the export rules—including Account C, which is of interest in our case.
  • The default EventBridge bus on Account C receives the broadcasted event.
  • The EventBridge rule gets triggered if the event content matches the expected payload of the rule (notably that both pipelines have a SUCCEEDED status).
  • If the payload is valid, the rule triggers the Step Functions workflow pipeline-X and triggers the workflow to provision resources (which we discuss later in this post).
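
The export rule mentioned earlier matches the native Step Functions execution status change events emitted on the account’s default bus. A minimal sketch of its event pattern is shown below; whether to also filter on specific statuses or state machine ARNs is an implementation choice:

{
  "source": ["aws.states"],
  "detail-type": ["Step Functions Execution Status Change"]
}

The rule’s target is the central data pipeline bus in the orchestration account, which in turn needs a resource policy that allows the user accounts to put events on it.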

Alerting

This section is illustrated with the gray squares in the architecture diagram.

Many teams handle alerting differently across the organization, such as Slack alerting messages, email alerts, and OpsGenie alerts.

We decided to allow users to choose their preferred methods of alerting, giving them the flexibility to choose what kind of alerts to receive:

  • At the step level – Tracking the entire run of the pipeline
  • At the pipeline level – When it fails, or when it finishes with a SUCCESS or FAILED status

During the deployment of the pipeline, a new Amazon Simple Notification Service (Amazon SNS) topic gets created with the subscriptions matching the targets specified by the user (URL for OpsGenie, Lambda for Slack or email).

The following code is an example of what it looks like in the user’s pipeline.yaml:

notifications:
    type: FULL_EXECUTION
    targets:
        - channel: SLACK
          addresses:
               - data-pipeline-alerts
        - channel: EMAIL
          addresses:
               - [email protected]

The alerting flow includes the following steps:

  1. As the pipeline (Step Functions workflow) starts (black square 2 in the diagram), the run gets logged into CloudWatch Logs in a log group corresponding to the name of the pipeline (for example, listings).
  2. Depending on the user’s preference, all the run steps and events may be logged through a subscription filter whose target is the execution-tracker-lambda Lambda function. The function is called any time a new event is published to CloudWatch Logs.
  3. This Lambda function parses and formats the message, then publishes it to the SNS topic.
  4. For the email and OpsGenie flows, the flow stops here. For posting the alert message on Slack, the Slack API caller Lambda function gets called with the formatted event payload.
  5. The function then publishes the message to the /messages endpoint of the Slack API Gateway.
  6. The Lambda function behind this endpoint runs, and posts the message in the corresponding Slack channel and under the right Slack thread (if applicable).
  7. The function retrieves the secret Slack REST API key from AWS Secrets Manager.
  8. It retrieves the Slack channels in which the alert should be posted.
  9. It retrieves the root message of the run, if any, so that subsequent messages get posted under the current run thread on Slack.
  10. It posts the message on Slack.
  11. If this is the first message for this run, it stores the mapping with the DB schema execution/slack_message_id to initiate a thread for future messages related to the same run.

Resource provisioning

This section is illustrated with the light blue squares in the architecture diagram.

To run a data pipeline, we need to provision an EMR cluster, which in turn requires some information like Hive metastore credentials, as shown in the workflow. The workflow steps are as follows:

  • Trigger the Step Functions workflow listings on schedule.
  • Run the listings workflow.
  • Provision an EMR cluster.
  • Use a custom resource to decrypt the Hive metastore password to be used in Spark jobs relying on central Hive tables or views.

End-user experience

After all preconditions are fulfilled (both the listings and searches pipelines succeeded), the pipeline-X workflow runs as shown in the following diagram.

As shown in the diagram, the pipeline description (as a sequence of steps) defined by the user in the pipeline.yaml is represented by the orange block.

The steps before and after this orange section are automatically generated by our product, so users don’t have to take care of provisioning and freeing compute resources. In short, the CLI tool we provide our users synthesizes the user’s pipeline definition in the pipeline.yaml and generates the corresponding DAG.

Additional considerations and next steps

We tried to stay consistent and stick to one programming language for the creation of this product. We chose TypeScript, which played well with AWS CDK, the infrastructure as code (IaC) framework that we used to build the infrastructure of the product.

Similarly, we chose TypeScript for building the business logic of our Lambda functions, and of the CLI tool (using Oclif) we provide for our users.

As demonstrated in this post, EventBridge is a powerful service for event-driven architectures, and it plays a central and important role in our products. As for its limitations, we found that pairing Lambda and EventBridge could fulfill all our current needs and granted a high level of customization that allowed us to be creative in the features we wanted to serve our users.

Needless to say, we plan to keep developing the product, and have a multitude of ideas, notably:

  • Extend the list of core resources on which workloads run (currently only Amazon EMR) by adding other compute services, such as Amazon Elastic Compute Cloud (Amazon EC2)
  • Use the Constructs Hub to allow users in the organization to develop custom steps to be used in all data pipelines (we currently only offer Spark and shell steps, which suffice in most cases)
  • Use the stored metadata regarding pipeline dependencies for data lineage, to have an overview of the overall health of the data pipelines in the organization, and more

Conclusion

This architecture and product brought many benefits. It allows us to:

  • Have a more robust and clear dependency management of data pipelines at Scout24.
  • Save on compute costs by no longer scheduling pipelines based on rough estimates of when their predecessors usually finish. By shifting to an event-driven paradigm, no pipeline gets started unless all its prerequisites are fulfilled.
  • Track our pipelines granularly and in real time on a step level.
  • Provide more flexible and alternative business logic by exposing multiple event types that downstream pipelines can listen to. For example, a fallback downstream pipeline might be run in case of a parent pipeline failure.
  • Reduce the cross-team communication overhead in case of failures or stopped runs by increasing the transparency of the whole pipelines’ dependency landscape.
  • Avoid manually restarting pipelines after an upstream pipeline is fixed.
  • Have an overview of all jobs that run.
  • Support the creation of a performance culture characterized by accountability.

We have big plans for this product. We will use DataMario to implement granular data lineage, observability, and governance. It’s a key piece of infrastructure in our strategy to scale data engineering and analytics at Scout24.

We will make DataMario open source towards the end of 2022. This is in line with our strategy to promote our approach to a solution on a self-built, scalable data platform. And with our next steps, we hope to extend this list of benefits and ease the pain in other companies solving similar challenges.

Thank you for reading.


About the authors

Mehdi Bendriss is a Senior Data / Data Platform Engineer, MSc in Computer Science and over 9 years of experience in software, ML, and data and data platform engineering, designing and building large-scale data and data platform products.

Mohamad Shaker is a Senior Data / Data Platform Engineer, with over 9 years of experience in software and data engineering, designing and building large-scale data and data platform products that enable users to access, explore, and utilize their data to build great data products.

Arvid Reiche is a Data Platform Leader, with over 9 years of experience in data, building a data platform that scales and serves the needs of the users.

Marco Salazar is a Solutions Architect working with Digital Native customers in the DACH region with over 5 years of experience building and delivering end-to-end, high-impact, cloud native solutions on AWS for Enterprise and Sports customers across EMEA. He currently focuses on enabling customers to define technology strategies on AWS for the short- and long-term that allow them achieve their desired business objectives, specializing on Data and Analytics engagements. In his free time, Marco enjoys building side-projects involving mobile/web apps, microcontrollers & IoT, and most recently wearable technologies.

Retain more for less with tiered storage for Amazon MSK

Post Syndicated from Masudur Rahaman Sayem original https://aws.amazon.com/blogs/big-data/retain-more-for-less-with-tiered-storage-for-amazon-msk/

Organizations are adopting Apache Kafka and Amazon Managed Streaming for Apache Kafka (Amazon MSK) to capture and analyze data in real time. Amazon MSK allows you to build and run production applications on Apache Kafka without needing Kafka infrastructure management expertise or having to deal with the complex overheads associated with running Apache Kafka on your own. With increasing maturity, customers seek to build sophisticated use cases that combine aspects of real-time and batch processing. For instance, you may want to train machine learning (ML) models based on historic data and then use these models to do real-time inferencing. Or you may want to be able to recompute previous results when the application logic changes, for example, when a new KPI is added to a streaming analytics application or when a bug is fixed that caused incorrect output. These use cases often require storing data for several weeks, months, or even years.

Apache Kafka is well positioned to support these kinds of use cases. Data is retained in the Kafka cluster as long as required by configuring the retention policy. In this way, the most recent data can be processed in real time for low-latency use cases, while historic data remains accessible in the cluster and can be processed in a batch fashion.

However, retaining data in a Kafka cluster can become expensive because storage and compute are tightly coupled in a cluster. To scale storage, you need to add more brokers. But adding more brokers with the sole purpose of increasing the storage squanders the rest of the compute resources like CPU and memory. Also, a large cluster with more nodes adds operational complexity with a longer time to recover and rebalance when a broker fails. To avoid that operational complexity and higher cost, you can move your data to Amazon Simple Storage Service (Amazon S3) for long-term access and with cost-effective storage classes in Amazon S3 you can optimize your overall storage cost. This solves cost challenges, but now you have to build and maintain that part of the architecture for data movement to a different data store. You also need to build different data processing logic using different APIs for consuming data (Kafka API for streaming, Amazon S3 API for historic reads).

Today, we’re announcing Amazon MSK tiered storage, which brings a virtually unlimited and low-cost storage tier for Amazon MSK, making it simpler and cost-effective for developers to build streaming data applications. Since the launch of Amazon MSK in 2019, we have enabled capabilities such as vertical scaling and automatic scaling of broker storage so you can operate your Kafka workloads in a cost-effective way. Earlier this year, we launched provisioned throughput which enables seamlessly scaling I/O without having to provision additional brokers. Tiered storage makes it even more cost-effective for you to run Kafka workloads. You can now store data in Apache Kafka without worrying about limits. You can effectively balance your performance and costs by using the performance-optimized primary storage for real-time data and the new low-cost tier for the historical data. With a few clicks, you can move streaming data into a lower-cost tier to store data and only pay for what you use.

Tiered storage frees you from making hard trade-offs between supporting the data retention needs of your application teams and the operational complexity that comes with it. This enables you to use the same code to process both real-time and historical data to minimize redundant workflows and simplify architectures. With Amazon MSK tiered storage, you can implement a Kappa architecture – a streaming-first software architecture deployment pattern – to use the same data processing pipeline for correctness and completeness of data over a much longer time horizon for business analysis.

How Amazon MSK tiered storage works

Let’s look at how tiered storage works for Amazon MSK. Apache Kafka stores data in files called log segments. As each segment completes, based on the segment size configured at cluster or topic level, it’s copied to the low-cost storage tier. Data is held in performance-optimized storage for a specified retention time, or up to a specified size, and then deleted. There is a separate time and size limit setting for the low-cost storage, which must be longer than the performance-optimized storage tier. If clients request data from segments stored in the low-cost tier, the broker reads the data from it and serves the data in the same way as if it were being served from the performance-optimized storage. The APIs and existing clients work with minimal changes. When your application starts reading data from the low-cost tier, you can expect an increase in read latency for the first few bytes. As you start reading the remaining data sequentially from the low-cost tier, you can expect latencies that are similar to the primary storage tier. With tiered storage, you pay for the amount of data you store and the amount of data you retrieve.

For a pricing example, let’s consider a workload where your ingestion rate is 15 MB/s, with a replication factor of 3, and you want to retain data in your Kafka cluster for 7 days. For such a workload, it requires 6x m5.large brokers, with 32.4 TB EBS storage, which costs $4,755. But if you use tiered storage for the same workload with local retention of 4 hours and overall data retention of 7 days, it requires 3x m5.large brokers, with 0.8 TB EBS storage and 9 TB of tiered storage, which costs $1,584. If you want to read all the historic data at once, it costs $13 ($0.0015 per GB retrieval cost). In this example with tiered storage, you save around 66% of your overall cost.

Get started using Amazon MSK tiered storage

To enable tiered storage on your existing cluster, upgrade your MSK cluster to Kafka version 2.8.2.tiered and then choose Tiered storage and EBS storage as your cluster storage mode on the Amazon MSK console.

After tiered storage is enabled on the cluster level, run the following command to enable tiered storage on an existing topic. In this example, you’re enabling tiered storage on a topic called msk-ts-topic with 7 days’ retention (local.retention.ms=604800000) for a local high-performance storage tier, setting 180 days’ retention (retention.ms=15550000000) to retain the data in the low-cost storage tier, and updating the log segment size to 48 MB:

bin/kafka-configs.sh --bootstrap-server $bsrv --alter --entity-type topics --entity-name msk-ts-topic --add-config 'remote.storage.enable=true, local.retention.ms=604800000, retention.ms=15550000000, segment.bytes=50331648'

Availability and pricing

Amazon MSK tiered storage is available in all AWS Regions where Amazon MSK is available, excluding the AWS China and AWS GovCloud (US) Regions. This low-cost storage tier scales to virtually unlimited storage and requires no upfront provisioning. You pay only for the volume of data retained and retrieved in the low-cost tier.

For more information about this feature and its pricing, see the Amazon MSK developer guide and Amazon MSK pricing page. For finding the right sizing for your cluster, see the best practices page.

Summary

With Amazon MSK tiered storage you don’t need to provision storage for the low-cost tier or manage the infrastructure. Tiered storage enables you to scale to virtually unlimited storage. You can access data in the low-cost tier using the same clients you currently use to read data from the high-performance primary storage tier. Apache Kafka’s consumer API, streams API, and connectors consume data from both tiers without changes. You can modify the retention limits on the low-cost storage tier in the same way that you modify the retention limits on the high-performance storage.

Enable tiered storage on your MSK clusters today to retain data longer at a lower cost.


About the Author

Masudur Rahaman Sayem is a Streaming Architect at AWS. He works with AWS customers globally to design and build data streaming architecture to solve real-world business problems. He is passionate about distributed systems. He also likes to read, especially classic comic books.

Implementing a UML state machine using AWS Step Functions

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/implementing-a-uml-state-machine-using-aws-step-functions/

This post is written by  Michael Havey, Senior Specialist Solutions Architect, AWS

This post shows how to model a Unified Modeling Language (UML) state machine as an AWS Step Functions workflow. A UML state machine models the behavior of an object, naming each of its possible resting states and specifying how it moves from one state to another. A Step Functions state machine implements the object behavior. This post shows how the UML diagram guides the state machine implementation and traces back to it.

State machines are often used to model real-time systems and devices. This post uses a stock order as an example. What drives a business object from one state to another is its interaction with applications through services. When the object enters a new state, it typically responds by calling the application through a service. It is typically an event arising from that application that transitions the business object to the next state. The UML model declares each state and the service interactions. Step Functions, which is designed for service orchestration, implements the behavior.

Overview

This is the approach discussed in this post:

  1. A developer implements a Step Functions state machine guided by a UML state machine designed in a third-party UML modeling tool. The implementation explicitly traces back to the UML model by logging execution of state machine activities.
  2. To invoke the target application, the Step Functions state machine invokes an AWS Lambda handler function. This invocation implements a UML state machine activity.
  3. The handler function, in turn, invokes the application. The implementation of the call is application-specific.
  4. If a callback from the application is expected, the application sends an event to a Lambda event dispatcher function. The implementation of this message exchange is application-specific.
  5. If a callback is expected, the Lambda event dispatcher function calls back the Step Functions state machine with the application event. This enables the Step Functions state machine to implement a UML state transition to the next state.

Traceability is the best way to link the Step Functions implementation to the UML model. This is because it ensures that the implementation is doing what the model intends.

An alternative is to generate Step Functions code based on the UML model using a standard XML format known as XML Metadata Interchange (XMI). A code generator tool can introspect the XMI to generate code from it. While technically feasible, UML state machines are highly expressive with many patterns and idioms. A generator often can’t produce code as lean and readable as that of a developer.

Walkthrough

This example shows a UML state machine in MagicDraw, a UML design tool. This diagram is the basis for the Step Functions implementation. This Git repository includes the XMI file for the UML diagram and the code to set up the Step Functions implementation.

The walkthrough has the following steps:

  1. Deploy Step Functions and AWS Lambda resources.
  2. Run the Step Functions state machine. Check the execution results to see how they trace back to the UML state machine.
  3. Clean up AWS resources.

Provision resources

To run this example, you need an AWS account with permission to use Step Functions and Lambda. On your machine, install the AWS Command Line Interface (CLI) and the AWS Serverless Application Model (AWS SAM) CLI.

Complete the following steps on your machine:

  1. Clone the Git repository.
  2. In a command shell, navigate to the sam folder of the clone.
  3. Run sam build to build the application.
  4. Run sam deploy --guided to deploy the application to your AWS account.
  5. In the output, find names of Step Functions state machines and Lambda functions created.

The application creates several state machines, but in this post we consider the simplest: Test Buy Sell. The example models the behavior of a buy/sell stock order, which is based on an example from the Step Functions documentation: https://docs.aws.amazon.com/step-functions/latest/dg/sample-lambda-orchestration.html.

Explore UML model for Test BuySell

Begin with the following UML model (also available in the GitHub repository).

In the model:

  1. The black dot on the upper left is the initial state. It has an arrow (a transition) to CheckingStockPrice (a state).
  2. CheckingStockPrice has an activity, called checkStockPrice, of type do. When that state is visited, the activity is automatically run. When the activity finishes, the machine transitions automatically (a completion transition) to the next state.
  3. That state, GeneratingBuySellRecommendation, has its own do activity generateBuySellRecommendation. Completion of that activity moves to the next state.
  4. The next state is Approving, whose activity routeForApproval is of type entry. That activity is run when the state is entered. It waits for an event to move the machine forward. There are three transitions from Approving. Each has a trigger, indicating the type of event expected, called approvalComplete. Each has a guard that distinguishes the outcome of the approval.
  5. If the guard is sell, transition to the state SellingStock.
  6. If it’s buy, transition to the state BuyingStock.
  7. If it’s reject, transition to the terminate state (denoted by an X) and run a transition activity called logReject.
  8. BuyingStock and SellingStock each have a do activity – buyStock and sellStock – and transition on completion to the state ReportingResult. That state has do activity reportResult.
  9. Transition to the final state (the black dot enclosed in a circle).

Explore Step Functions implementation

Find the Step Functions implementation in the AWS Console. Under the list of State Machines, select the function with a name starting with BlogBuySell. Choose Edit to view the design of the machine. From there, open it in Workflow Studio to show the state machine workflow visualization:

The Step Function state machine implements all the activities from the UML state machine. There are Lambda tasks to implement the major state do activities: Check Stock Price, Generate Buy/Sell Recommendation, Buy Stock, Sell Stock, Report Result. There is also a Lambda function for the transition activity: Log Reject. Each Lambda function traces back to the UML state machine and uses the following format to log trace records:

{
 "sourceState": S,
 "activityType": stateEntry|stateExit|stateDo|transition,
 "activityName": N,
 "trigger": T, // if transition activity
 "guard": G // if transition activity and has a guard
}

The control flow in the Step Functions state machine intuitively matches the UML state machine. The UML model has mostly completion transitions, so the Step Functions state machine largely flows from one Lambda task to another. The exception is the Approving state, where the machine waits for an event and then transitions in one of three directions from the choice state Buy or Sell. For this, I use the Step Functions callback capability. Route For Approval is a Lambda task with the Wait For Callback option enabled. The Lambda task has three responsibilities:

  • Executes the UML state entry activity routeForApproval by calling the application.
  • Logs a tracing record that it has executed that activity.
  • Passes the task token provided by the Step Functions state machine to the application.

When the application has an approval decision, it sends an event through messaging. A separate Lambda event dispatcher function receives the message and, using the Step Functions API, calls back the Step Functions state machine with key details from the message: task token, trigger, guard.
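
The repository deploys such a dispatcher for you. As an illustration only, the following is a minimal Python sketch of a dispatcher that performs the callback; the handler shape is an assumption and this is not the repository's actual function, although the taskToken, trigger, guard, and appData fields match the event format used later in this walkthrough.

import json
import boto3

sfn = boto3.client("stepfunctions")

def handler(event, context):
    # Assumed incoming message shape:
    # {"taskToken": "...", "trigger": "approvalComplete", "guard": "buy|sell|reject", "appData": {...}}
    message = event

    # Resume the waiting Route For Approval task. The output becomes the task result,
    # which the downstream choice state inspects to pick the buy, sell, or reject path.
    sfn.send_task_success(
        taskToken=message["taskToken"],
        output=json.dumps({
            "trigger": message["trigger"],
            "guard": message.get("guard"),
            "appData": message.get("appData", {}),
        }),
    )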

Finally, notice the fail step after Log Reject. This implements the terminate state in the UML model.

Execute the Step Functions state machine

Execute the state machine by choosing Start Execution for the BlogBuySell state machine in the Step Functions console. Use this input:

{"appData": "Insert your JSON here"}

The console shows a graph view of the progress of the state machine. It should pause at the Route For Approval task.
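
If you prefer to start the execution programmatically rather than from the console, a minimal boto3 sketch looks like the following; the state machine ARN is a placeholder that you copy from the Step Functions console or the SAM deployment output.

import json
import boto3

sfn = boto3.client("stepfunctions")

# Replace with the ARN of the BlogBuySell state machine from your deployment.
state_machine_arn = "arn:aws:states:REGION:ACCOUNT_ID:stateMachine:BlogBuySell-example"

execution = sfn.start_execution(
    stateMachineArn=state_machine_arn,
    input=json.dumps({"appData": "Insert your JSON here"}),
)
print(execution["executionArn"])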

Confirm traceability

Check the event view to confirm the tracing back to the UML model. The Task Scheduled event for Check Stock Price shows:

      "sourceState": "CheckingStockPrice",
      "activityType": "stateDo",
      "activityName": "checkStockPrice",

The Task Scheduled event for Generate buy/sell Recommendation shows:

      "sourceState": "GeneratingBuySellRecommendation",
      "activityType": "stateDo",
      "activityName": "generateBuySellRecommendation",

The Task Scheduled event for Route For Approval shows output resembling the following. Your taskToken will be different.

      "sourceState": "Approving",
      "activityType": "stateEntry",
      "activityName": "routeForApproval",
   "taskToken": "AAAAK . . . 99es="

Approve for buy

The state machine is waiting at Route For Approval. Simulate an application event to continue it forward. First, copy the task token value from above, excluding the quotes.

In a separate browser tab, open the Lambda console and find the function whose name contains BlogDummyUMLEventDispatcher. In the Test tab, create a new event:

{
    "taskToken": "<paste the task token here>",
    "trigger": "approvalComplete",
    "guard": "buy",
    "appData": {"x": "y"}
}
 

Choose Test to call the Lambda function with this input, which calls back the state machine.

Confirm execution of approval

In the Step Functions console, confirm that the flow has completed and taken the Buy Stock path.

More examples and patterns

The AWS SAM application deploys two additional examples, which show important patterns:

  • Hierarchical or composite states
  • Parallel or orthogonal states
  • Cancellation events
  • Internal transitions
  • Transition to history
  • Using an event loop for complex flow

You can find a discussion of these examples in the Git repo.

Comparing UML and Step Functions state machines

Step Functions runs tasks in sequence, with the ability to conditionally branch, loop, or parallelize tasks. These tasks aren't quite the same as states in a UML model. In this approach, tasks map to UML states or transition activities.

A UML state machine spends most of its time waiting in its current state for the next event to happen. A Standard workflow in Step Functions can wait too: it can run for up to one year, and a task can pause until an external trigger calls it back. I used that capability to trigger the next transition by calling back the Step Functions state machine.

Cleaning up

To avoid incurring future charges, navigate to the directory where you deployed the application and run sam delete to undeploy it.

Conclusion

This post shows code recipes for implementing UML state machines using Step Functions. If your organization already uses modeling tools, this discussion helps you understand the Step Functions implementation path. If you are a Step Functions designer, this discussion shows UML’s expressive power as the model for your implementation.

Learn more about Step Functions implementations on the Sample projects for Step Functions page.

Introducing Amazon Neptune Serverless – A Fully Managed Graph Database that Adjusts Capacity for Your Workloads

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-neptune-serverless-a-fully-managed-graph-database-that-adjusts-capacity-for-your-workloads/

Amazon Neptune is a fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. With Neptune, you can use open and popular graph query languages to execute powerful queries that are easy to write and perform well on connected data. You can use Neptune for graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.

Neptune has always been fully managed and handles time-consuming tasks such as provisioning, patching, backup, recovery, failure detection and repair. However, managing database capacity for optimal cost and performance requires you to monitor and reconfigure capacity as workload characteristics change. Also, many applications have variable or unpredictable workloads where the volume and complexity of database queries can change significantly. For example, a knowledge graph application for social media may see a sudden spike in queries due to a surge in popularity.

Introducing Amazon Neptune Serverless
Today, we’re making that easier with the launch of Amazon Neptune Serverless. Neptune Serverless scales automatically as your queries and your workloads change, adjusting capacity in fine-grained increments to provide just the right amount of database resources that your application needs. In this way, you pay only for the capacity you use. You can use Neptune Serverless for development, test, and production workloads and optimize your database costs compared to provisioning for peak capacity.

With Neptune Serverless you can quickly and cost-effectively deploy graphs for your modern applications. You can start with a small graph, and as your workload grows, Neptune Serverless will automatically and seamlessly scale your graph databases to provide the performance you need. You no longer need to manage database capacity and you can now run graph applications without the risk of higher costs from over-provisioning or insufficient capacity from under-provisioning.

With Neptune Serverless, you can continue to use the same query languages (Apache TinkerPop Gremlin, openCypher, and RDF/SPARQL) and features (such as snapshots, streams, high availability, and database cloning) already available in Neptune.

Let’s see how this works in practice.

Creating an Amazon Neptune Serverless Database
In the Neptune console, I choose Databases in the navigation pane and then Create database. For Engine type, I select Serverless and enter my-database as the DB cluster identifier.

Console screenshot.

I can now configure the range of capacity, expressed in Neptune capacity units (NCUs), that Neptune Serverless can use based on my workload. Then I choose a template that configures some of the next options for me. I choose the Production template, which by default creates a read replica in a different Availability Zone. The Development and Testing template would optimize my costs by not creating a read replica and by using DB instances that provide burstable capacity.

Console screenshot.

For Connectivity, I use my default VPC and its default security group.

Console screenshot.

Finally, I choose Create database. After a few minutes, the database is ready to use. In the list of databases, I choose the DB identifier to get the Writer and Reader endpoints that I am going to use later to access the database.

Using Amazon Neptune Serverless
There is no difference in the way you use Neptune Serverless compared to a provisioned Neptune database. I can use any of the query languages supported by Neptune. For this walkthrough, I choose to use openCypher, a declarative query language for property graphs originally developed by Neo4j that was open-sourced in 2015 and contributed to the openCypher project.

To connect to the database, I start an Amazon Linux Amazon Elastic Compute Cloud (Amazon EC2) instance in the same AWS Region and associate the default security group and a second security group that gives me SSH access.

With a property graph I can represent connected data. In this case, I want to create a simple graph that shows how some AWS services are part of a service category and implement common enterprise integration patterns.

I use curl to access the Writer openCypher HTTPS endpoint and create a few nodes that represent patterns, services, and service categories. The following commands are split into multiple lines in order to improve readability.

curl https://<my-writer-endpoint>:8182/openCypher \
-d "query=CREATE (mq:Pattern {name: 'Message Queue'}),
(pubSub:Pattern {name: 'Pub/Sub'}),
(eventBus:Pattern {name: 'Event Bus'}),
(workflow:Pattern {name: 'WorkFlow'}),
(applicationIntegration:ServiceCategory {name: 'Application Integration'}),
(sqs:Service {name: 'Amazon SQS'}), (sns:Service {name: 'Amazon SNS'}),
(eventBridge:Service {name: 'Amazon EventBridge'}), (stepFunctions:Service {name: 'AWS StepFunctions'}),
(sqs)-[:IMPLEMENT]->(mq), (sns)-[:IMPLEMENT]->(pubSub),
(eventBridge)-[:IMPLEMENT]->(eventBus),
(stepFunctions)-[:IMPLEMENT]->(workflow),
(applicationIntegration)-[:CONTAIN]->(sqs),
(applicationIntegration)-[:CONTAIN]->(sns),
(applicationIntegration)-[:CONTAIN]->(eventBridge),
(applicationIntegration)-[:CONTAIN]->(stepFunctions);"

This is a visual representation of the nodes and their relationships for the graph created by the previous command. The type (such as Service or Pattern) and properties (such as name) are shown inside each node. The arrows represent the relationships (such as CONTAIN or IMPLEMENT) between the nodes.

Visualization of graph data.

Now, I query the database to get some insights. To query the database, I can use either a Writer or a Reader endpoint. First, I want to know the name of the service implementing the “Message Queue” pattern. Note how the syntax of openCypher resembles that of SQL with MATCH instead of SELECT.

curl https://<my-endpoint>:8182/openCypher \
-d "query=MATCH (s:Service)-[:IMPLEMENT]->(p:Pattern {name: 'Message Queue'}) RETURN s.name;"
{
  "results" : [ {
    "s.name" : "Amazon SQS"
  } ]
}

I use the following query to see how many services are in the “Application Integration” category. This time, I use the WHERE clause to filter results.

curl https://<my-endpoint>:8182/openCypher \
-d "query=MATCH (c:ServiceCategory)-[:CONTAIN]->(s:Service) WHERE c.name='Application Integration' RETURN count(s);"
{
  "results" : [ {
    "count(s)" : 4
  } ]
}

There are many options now that I have this graph database up and running. I can add more data (services, categories, patterns) and more relationships between the nodes. I can focus on my application and let Neptune Serverless manage capacity and infrastructure for me.
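
To run the same queries from application code instead of curl, here is a minimal Python sketch using the requests library against the openCypher HTTPS endpoint. The endpoint value is a placeholder, and if IAM authentication is enabled on your cluster you would also need to sign the requests.

import requests

# Replace with your Neptune Serverless reader or writer endpoint.
ENDPOINT = "https://<my-endpoint>:8182/openCypher"

query = (
    "MATCH (s:Service)-[:IMPLEMENT]->(p:Pattern {name: 'Message Queue'}) "
    "RETURN s.name;"
)

# The openCypher HTTP endpoint accepts the query as a form field named 'query'.
response = requests.post(ENDPOINT, data={"query": query})
response.raise_for_status()

for row in response.json()["results"]:
    print(row["s.name"])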

Availability and Pricing
Amazon Neptune Serverless is available today in the following AWS Regions: US East (Ohio, N. Virginia), US West (N. California, Oregon), Asia Pacific (Tokyo), and Europe (Ireland, London).

With Neptune Serverless, you only pay for what you use. The database capacity is adjusted to provide the right amount of resources you need in terms of Neptune capacity units (NCUs). Each NCU is a combination of approximately 2 gibibytes (GiB) of memory with corresponding CPU and networking. The use of NCUs is billed per second. For more information, see the Neptune pricing page.

Having a serverless graph database opens many new possibilities. To learn more, see the Neptune Serverless documentation. Let us know what you build with this new capability!

Simplify the way you work with highly connected data using Neptune Serverless.

Danilo

Serverless and Application Integration sessions at AWS re:Invent 2022

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/serverless-and-application-integration-sessions-at-aws-reinvent-2022/

This post is written by Josh Kahn, Tech Leader, AWS Serverless.

AWS re:Invent 2022 is only a few weeks away, featuring an exciting slate of sessions on Serverless and Application Integration. This post highlights many of those sessions, grouped by theme to help you quickly find the ones most interesting to you.

AWS re:Invent 2022

As in past years, the conference offers a variety of session formats:

  • Breakout sessions: lecture-style presentations delivered by AWS experts, builders, and customers.
  • Builder’s sessions: smaller sessions led by AWS experts during which you will build a project on your own laptop.
  • Chalk talks: interactive sessions led by experts on a variety of topics. Share your own experiences and feedback.
  • Workshops: hands-on learning sessions designed to help you learn about new technologies. Bring your own laptop.

For detailed descriptions and schedule, visit the AWS re:Invent Session Catalog. If you are attending re:Invent, we would love to connect at our AWS Village and Serverlesspresso booths in the Expo or the Modern Applications Zone at the Venetian. You can also reach out to your AWS account team.

Don’t have a ticket yet? Join us in Las Vegas from November 28-December 2, 2022 by registering for re:Invent 2022.

Leadership session (SVS210)

Join Holly Mesrobian, Vice President of Serverless Compute at AWS, to learn how serverless technology empowers organizations to go to market faster while lowering cost across a wide range of applications. Learn about the innovations happening at all layers of the stack, across both serverless functions and serverless containers. Explore newly released innovations that enable more secure, reliable, and performant applications.

Getting started

Are you new to Serverless or taking your first steps? Hear from AWS experts and customers on best practices and strategies for building serverless workloads. Get hands-on with services by building the next great “to do” app or customer experience for a theme park:

We also offer a series of Builder’s Sessions where you can build the same serverless project using three different infrastructure as code frameworks (attend one or more). These sessions are an opportunity to test drive another IaC framework or understand how your framework of choice can be used with serverless:

Event-driven architectures

Event-driven architectures (EDA) are a popular approach to building modern applications. EDA uses events (a change in state) to communicate between decoupled services. This architectural approach lends itself well to a wide variety of use cases, from ecommerce to order fulfillment, with individual components able to scale (and fail) independently.

Whether you are getting started with EDA, want to get hands-on, or dive into complex architectures, there is a session for you:

Building serverless architectures

Explore the range of tools available to build serverless architectures and cross-cutting concerns, such as security and observability. These sessions cover the brass tacks of building with serverless, going beyond “hello world” to help builders understand how to implement a serverless strategy:

Orchestration

AWS offers several options to orchestrate complex workflows. Whether you need to tightly control data processing workflows or user sign-ups, you can take advantage of these orchestration engines to simplify and modernize your workflows and become more agile.

Integration patterns

Explore the variety of enterprise integration patterns available using AWS, including Amazon SNS, Amazon SQS, Amazon MQ, and more. These sessions explore the wide variety of patterns available using managed services:

Advanced topics

If you are already familiar with serverless, advanced sessions provide an opportunity to go deeper, including under the hood of the AWS Lambda service. Learn advanced design patterns, best practices, and how to build performant, reliable workloads:

Building serverless applications with Java

New this year, there are several sessions dedicated to building serverless applications with the Java runtime. These sessions dive deep into best practices for building performant Java-based applications:

Other talks

Serverless has become such a popular topic that you will also find related sessions in other tracks. This list is not exhaustive, but includes talks that you may want to explore:

If you are unable to join us in person, Breakout Sessions will be available on our YouTube channel after the event. Contact your AWS account team if you are interested in learning more about any of these sessions or how to bring our experts to you.

We look forward to seeing you at re:Invent 2022! For more serverless learning resources, visit Serverless Land.

Propagating valid mTLS client certificate identity to downstream services using Amazon API Gateway

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/propagating-valid-mtls-client-certificate-identity-to-downstream-services-using-amazon-api-gateway/

This blog is written by Omkar Deshmane, Senior SA, and Anton Aleksandrov, Principal SA, Serverless.

This blog shows how to use Amazon API Gateway with a custom authorizer to process incoming requests, validate the mTLS client certificate, extract the client certificate subject, and propagate it to the downstream application in a base64 encoded HTTP header.

This pattern allows you to terminate mTLS at the edge so downstream applications do not need to perform client certificate validation. With this approach, developers can focus on application logic and offload mTLS certificate management and validation to a dedicated service, such as API Gateway.

Overview

Authentication is one of the core security aspects that you must address when building a cloud application. Successful authentication proves you are who you are claiming to be. There are various common authentication patterns, such as cookie-based authentication, token-based authentication, or the topic of this blog – a certificate-based authentication.

Transport Layer Security (TLS) certificates are at the core of a secure and safe internet. TLS certificates secure the connection between the client and server by encrypting data, ensuring private communication. When using the TLS protocol, the server must prove its identity to the client using a certificate signed by a certificate authority trusted by the client.

Mutual TLS (mTLS) introduces an additional layer of security, in which both the client and server must prove their identities to each other. Developers commonly use mTLS for application-to-application authentication, using digital certificates to represent both client and server apps. We highly recommend decoupling the mTLS implementation from the application business logic so that you do not have to update the application when changing the mTLS configuration. It is a common pattern to implement mTLS authentication and termination in a network appliance at the edge, such as Amazon API Gateway.

In this solution, we show a pattern of using API Gateway with an authorizer implemented with AWS Lambda to validate the mTLS client certificate, extract the client certificate subject, and propagate it to the downstream application in a base64 encoded HTTP header.

While this blog describes how to implement this pattern for identities extracted from mTLS client certificates, you can generalize it and apply it to propagating information obtained via any other means of authentication.

mTLS Sample Application

This blog includes a sample application implemented using the AWS Serverless Application Model (AWS SAM). It creates a demo environment containing resources like API Gateway, a Lambda authorizer, and an Amazon EC2 instance, which simulates the backend application.

The EC2 instance is used for the backend application to mimic common customer scenarios. You can use any other type of compute, such as Lambda functions or containerized applications with Amazon Elastic Container Service (Amazon ECS) or Amazon Elastic Kubernetes Service (Amazon EKS), as a backend application layer as well.

The following diagram shows the solution architecture:

Example architecture diagram

  1. Store the client certificate in a trust store in an Amazon S3 bucket.
  2. The client makes a request to the API Gateway endpoint, supplying the client certificate to establish the mTLS session.
  3. API Gateway retrieves the trust store from the S3 bucket. It validates the client certificate, matches the trusted authorities, and terminates the mTLS connection.
  4. API Gateway invokes the Lambda authorizer, providing the request context and the client certificate information.
  5. The Lambda authorizer extracts the client certificate subject. It performs any necessary custom validation, and returns the extracted subject to API Gateway as a part of the authorization context.
  6. API Gateway injects the subject extracted in the previous step into the integration request HTTP header and sends the request to a downstream endpoint.
  7. The backend application receives the request, extracts the injected subject, and uses it with custom business logic.

Prerequisites and deployment

Some resources created as part of this sample architecture deployment have associated costs, both when running and idle. This includes resources like Amazon Virtual Private Cloud (Amazon VPC), VPC NAT Gateway, and EC2 instances. We recommend deleting the deployed stack after exploring the solution to avoid unexpected costs. See the Cleaning Up section for details.

Refer to the project code repository for instructions to deploy the solution using AWS SAM. The deployment provisions multiple resources, taking several minutes to complete.

Following the successful deployment, refer to the RestApiEndpoint variable in the Output section to locate the API Gateway endpoint. Note this value for testing later.

AWS CloudFormation output

Key areas in the sample code

There are two key areas in the sample project code.

In src/authorizer/index.js, the Lambda authorizer code extracts the subject from the client certificate. It returns the value as part of the context object to API Gateway. This allows API Gateway to use this value in the subsequent integration request.

const crypto = require('crypto');

exports.handler = async (event) => {
    console.log ('> handler', JSON.stringify(event, null, 4));

    const clientCertPem = event.requestContext.identity.clientCert.clientCertPem;
    const clientCert = new crypto.X509Certificate(clientCertPem);
    const clientCertSub = clientCert.subject.replaceAll('\n', ',');

    const response = {
        principalId: clientCertSub,
        context: { clientCertSub },
        policyDocument: {
            Version: '2012-10-17',
            Statement: [{
                Action: 'execute-api:Invoke',
                Effect: 'allow',
                Resource: event.methodArn
            }]
        }
    };

    console.log('Authorizer Response', JSON.stringify(response, null, 4));
    return response;
};

In template.yaml, API Gateway injects the client certificate subject extracted by the Lambda authorizer into the integration request as the X-Client-Cert-Sub HTTP header. X-Client-Cert-Sub is a custom header name; you can choose any other custom header name instead.

SayHelloGetMethod:
    Type: AWS::ApiGateway::Method
    Properties:
      AuthorizationType: CUSTOM
      AuthorizerId: !Ref CustomAuthorizer
      HttpMethod: GET
      ResourceId: !Ref SayHelloResource
      RestApiId: !Ref RestApi
      Integration:
        Type: HTTP_PROXY
        ConnectionType: VPC_LINK
        ConnectionId: !Ref VpcLink
        IntegrationHttpMethod: GET
        Uri: !Sub 'http://${NetworkLoadBalancer.DNSName}:3000/'
        RequestParameters:
          'integration.request.header.X-Client-Cert-Sub': 'context.authorizer.clientCertSub' 

Testing the example

You create a client key and certificate during the deployment, which are stored in the /certificates directory. Use the curl command to make a request to the REST API endpoint using these files.

curl --cert certificates/client.pem --key certificates/client.key \
<use the RestApiEndpoint found in CloudFormation output>
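
If you prefer to test from Python instead of curl, the following is a minimal sketch using the requests library; the endpoint is a placeholder for the RestApiEndpoint output value, and the certificate paths assume you run it from the project root.

import requests

# RestApiEndpoint value from the CloudFormation/SAM output.
API_ENDPOINT = "https://<RestApiEndpoint>"

# Supply the client certificate and key generated during deployment
# to establish the mTLS session with API Gateway.
response = requests.get(
    API_ENDPOINT,
    cert=("certificates/client.pem", "certificates/client.key"),
)
print(response.status_code)
print(response.text)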

Example flow diagram

The client request to API Gateway uses mTLS with a client certificate supplied for mutual TLS authentication. API Gateway uses the Lambda authorizer to extract the certificate subject and injects it into the integration request.

The HTTP server runs on the EC2 instance, simulating the backend application. It accepts the incoming request and echoes it back, supplying request headers as part of the response body. The HTTP response received from the backend application contains a simple message and a copy of the request headers sent from API Gateway to the backend.

One header is x-client-cert-sub. Verify that its value matches the Common Name that you provided when generating the client certificate.
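
The sample deploys its own echo server, but to make the pattern concrete, here is a minimal, hypothetical Python sketch of a backend handler that reads the injected subject header; it is illustrative only and not the code running on the EC2 instance.

from http.server import BaseHTTPRequestHandler, HTTPServer

class CatalogHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # API Gateway injects the validated certificate subject into this header,
        # so the backend can trust it without re-validating the client certificate.
        subject = self.headers.get("x-client-cert-sub", "unknown")

        body = f"Hello, {subject}\n".encode()
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    # Port 3000 matches the integration URI used in the sample template.
    HTTPServer(("0.0.0.0", 3000), CatalogHandler).serve_forever()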

Response example

API Gateway validated the mTLS client certificate, used the Lambda authorizer to extract the subject common name from the certificate, and forwarded it to the downstream application.

Cleaning up

Use the sam delete command in the api-gateway-certificate-propagation directory to delete the sample application infrastructure:

sam delete

You can also refer to the project code repository for the clean-up instructions.

Conclusion

This blog shows how to use the API Gateway with a Lambda authorizer for mTLS client certificate validation, custom field extraction, and downstream propagation to backend systems. This pattern allows you to terminate mTLS at the edge so that downstream applications are not responsible for client certificate validation.

For additional documentation, refer to Using API Gateway with Lambda Authorizer. Download the sample code from the project code repository. For more serverless learning resources, visit Serverless Land.

Code versioning using AWS Glue Studio and GitHub

Post Syndicated from Leonardo Gomez original https://aws.amazon.com/blogs/big-data/code-versioning-using-aws-glue-studio-and-github/

AWS Glue now offers integration with Git, an open-source version control system widely used across the developer community. Thanks to this integration, you can incorporate your existing DevOps practices on AWS Glue jobs. AWS Glue is a serverless data integration service that helps you create jobs based on Apache Spark or Python to perform extract, transform, and load (ETL) tasks on datasets of almost any size.

Git integration in AWS Glue works for all AWS Glue job types, both visual and code-based. It offers built-in integration with both GitHub and AWS CodeCommit, and makes it easier to use automation tools like Jenkins and AWS CodeDeploy to deploy AWS Glue jobs. AWS Glue Studio’s visual editor now also supports parameterizing data sources and targets for transparent deployments between environments.

Overview of solution

To demonstrate how to integrate AWS Glue Studio with a code hosting platform for version control and collaboration, we use the Toronto parking tickets dataset, specifically the data about parking tickets issued in the city of Toronto in 2019. The goal is to create a job to filter parking tickets based on a specific category and push the code to a GitHub repo for version control. After the job is uploaded on the repository, we make some changes to the code and pull the changes back to the AWS Glue job.

Prerequisites

For this walkthrough, you should have the following prerequisites:

If the AWS account you use to follow this post uses AWS Lake Formation to manage permissions on the AWS Glue Data Catalog, make sure that you log in as a user with access to create databases and tables. For more information, refer to Implicit Lake Formation permissions.

Launch your CloudFormation stack

To create your resources for this use case, complete the following steps:

  1. Launch your CloudFormation stack in us-east-1:
  2. Under Parameters, for paramBucketName, enter a name for your S3 bucket (include your account number).
  3. Select I acknowledge that AWS CloudFormation might create IAM resources with custom names.
  4. Choose Create stack.
  5. Wait until the creation of the stack is complete, as shown on the AWS CloudFormation console.

Launching this stack creates AWS resources. You need the following resources from the Outputs tab for the next steps:

  • CFNGlueRole – The IAM role to run AWS Glue jobs
  • S3Bucket – The name of the S3 bucket to store solution-related files
  • CFNDatabaseBlog – The AWS Glue database to store the table related to this post
  • CFNTableTickets – The AWS Glue table to use as part of the sample job

Configure the GitHub repository

We use GitHub as the source control system for this post. In order to use it, you need a GitHub account. After the account is created, you need to create the following components:

  • GitHub repository – Create a repository and name it glue-ver-blog. For instructions, refer to Create a repo.
  • Branch – Create a branch and name it develop. For instructions, refer to Managing branches.
  • Personal access token – For instructions, refer to Creating a personal access token. Make sure to keep the personal access token handy because you use it in later steps.

Create an AWS Glue Studio job

Now that the infrastructure is set up, let’s author an AWS Glue job in our account. Complete the following steps:

  1. On the AWS Glue console, choose Jobs in the navigation pane.
  2. Select Visual job with blank canvas and choose Create.
  3. Enter a name for the job using the title editor. For example, aws-glue-git-demo-job.
  4. On the Visual tab, choose Source and then choose AWS Glue Data Catalog.

  5. For Database, choose torontoparking and for Table, choose tickets.
  6. Choose Transform and then Filter.
  7. Add a filter by infraction_description and set the value to PARK ON PRIVATE PROPERTY.
  8. Choose Target and then choose Amazon S3.
  9. For Format, choose Parquet.
  10. For S3 Target Location, enter s3://glue-version-blog-YOUR ACCOUNT NUMBER/output/.
  11. For Data Catalog update options, select Do not update the Data Catalog.
  12. Go to the Script tab to verify that a script has been generated.
  13. Go to the Job Details tab to make sure that the role GlueBlogRole is selected and leave everything else with the default values.

    Because the catalog table names in the production and development environments may be different, AWS Glue Studio now allows you to parameterize visual jobs. To do so, perform the following steps:
  14. On the Job details tab, scroll to the Job parameters section under Advanced properties.
  15. Create the --source.database.name parameter and set the value to torontoparking.
  16. Create the --source.table.name parameter and set the value to tickets.
  17. Go to the Visual tab and choose the AWS Glue Data Catalog node. Notice that under each of the database and table selection options is a new expandable section called Use runtime parameters.
  18. The runtime parameters are auto-populated with the parameters previously created. Choosing Apply applies the default values for these parameters.
  19. Go to the Script tab to review the script. AWS Glue Studio code generation automatically picks up the parameters to resolve and then makes the appropriate references in the script so that the parameters can be used (see the sketch after this list).
    Now the job is ready to be pushed into the develop branch of our version control system.
  20. On the Version Control tab, for Version control system, choose GitHub.
  21. For Personal access token, enter your GitHub token.
  22. For Repository owner, enter the owner of your GitHub account.
  23. In the Repository configuration section, for Repository, choose glue-ver-blog.
  24. For Branch, choose develop.
  25. For Folder, leave it blank.
  26. Choose Save to save the job.
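
To make the runtime parameters from the previous steps concrete, here is a minimal, hypothetical Python sketch of how a script can read dotted job parameters from its arguments. It is for illustration only and is not the code that AWS Glue Studio generates.

import sys

def get_job_parameter(name: str, default: str = "") -> str:
    # Return the value following a --<name> argument, for example --source.database.name.
    flag = f"--{name}"
    args = sys.argv
    for i, arg in enumerate(args):
        if arg == flag and i + 1 < len(args):
            return args[i + 1]
    return default

database_name = get_job_parameter("source.database.name", "torontoparking")
table_name = get_job_parameter("source.table.name", "tickets")
print(f"Reading from {database_name}.{table_name}")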

Push to the repository

Now the job can be pushed to the remote repository.

  1. On the Actions menu, choose Push to repository.
  2. Choose Confirm to confirm the operation.

    After the operation succeeds, the page reloads to reflect the latest information from the version control system. A notification shows the latest available commit and links you to the commit on GitHub.
  3. Choose the commit link to go to the repository on GitHub.

You have successfully created your first commit to GitHub from AWS Glue Studio!

Pull from the repository

Now that we have committed the AWS Glue job to GitHub, it’s time to see how we can pull changes using AWS Glue Studio. For this demo, we make a small modification in our example job using the GitHub UI and then pull the changes using AWS Glue Studio.

  1. On GitHub, choose the develop branch.
  2. Choose the aws-glue-git-demo-job folder.
  3. Choose the aws-glue-git-demo-job.json file.
  4. Choose the edit icon.
  5. Set the MaxRetries parameter to 1.
  6. Choose Commit changes.
  7. Return to the AWS Glue console and on the Actions menu, choose Pull from repository.
  8. Choose Confirm.

Notice that the commit ID has changed.

On the Job details tab, you can see that the value for Number of retries is 1.

Clean up

To avoid incurring future charges, and to clean up unused roles and policies, delete the resources you created: the datasets, CloudFormation stack, S3 bucket, AWS Glue job, AWS Glue database, and AWS Glue table.

Conclusion

This post showed how to integrate AWS Glue with GitHub, but this is only the beginning. Now you can use the most popular functionality offered by Git with your AWS Glue jobs.

To learn more and get started using the AWS Glue Studio Git integration, refer to Configuring Git integration in AWS Glue.


About the authors

Leonardo Gómez is a Senior Analytics Specialist Solutions Architect at AWS. Based in Toronto, Canada, he has over a decade of experience in data management, helping customers around the globe address their business and technical needs.

Daiyan Alamgir is a Principal Frontend Engineer on AWS Glue based in New York. He leads the AWS Glue UI team and is focused on building interactive web-based applications for data analysts and engineers to address their data integration use cases.

Simplifying serverless permissions with AWS SAM Connectors

Post Syndicated from Eric Johnson original https://aws.amazon.com/blogs/compute/simplifying-serverless-permissions-with-aws-sam-connectors/

This post is written by Kurt Tometich, Senior Solutions Architect, AWS.

Developers have been using the AWS Serverless Application Model (AWS SAM) to streamline the development of serverless applications with AWS since late 2018. Besides making it easier to create, build, test, and deploy serverless applications, AWS SAM now further simplifies permission management between serverless components with AWS SAM Connectors.

Connectors allow the builder to focus on the relationships between components without expert knowledge of AWS Identity and Access Management (IAM) or direct creation of custom policies. AWS SAM connectors support AWS Step Functions, Amazon DynamoDB, AWS Lambda, Amazon SQS, Amazon SNS, Amazon API Gateway, Amazon EventBridge, and Amazon S3, with more resources planned in the future.

AWS SAM policy templates are an existing feature that helps builders deploy serverless applications with minimally scoped IAM policies. Because there are a finite number of templates, they’re a good fit when a template exists for the services you’re using. Connectors are best for those getting started and who want to focus on modeling the flow of data and events within their applications. Connectors will take the desired relationship model and create the permissions for the relationship to exist and function as intended.

In this blog post, I show you how to speed up serverless development while maintaining secure best practices using AWS SAM connector. Defining a connector in an AWS SAM template requires a source, destination, and a permission (for example, read or write). From this definition, IAM policies with minimal privileges are automatically created by the connector.

Usage

Within an AWS SAM template:

  1. Create serverless resource definitions.
  2. Define a connector.
  3. Add a source and destination ID of the resources to connect.
  4. Define the permissions (read, write) of the connection.

This example creates a Lambda function that requires write access to an Amazon DynamoDB table to keep track of orders created from a website.

AWS Lambda function needing write access to an Amazon DynamoDB table

The AWS SAM connector for the resources looks like the following:

LambdaDynamoDbWriteConnector:
  Type: AWS::Serverless::Connector
  Properties:
    Source:
      Id: CreateOrder
    Destination:
      Id: Orders
    Permissions:
      - Write

“LambdaDynamoDbWriteConnector” is the name of the connector, while the “Type” designates it as an AWS SAM connector. “Properties” contains the source and destination logical ID for our serverless resources found within our template. Finally, the “Permissions” property defines a read or write relationship between the components.

This basic example shows how easy it is to define permissions between components. No specific role or policy names are required, and this syntax is consistent across many other serverless components, enforcing standardization.

Example

AWS SAM connectors save you time as your applications grow and connections between serverless components become more complex. Manual creation and management of permissions become error prone and difficult at scale. To highlight the breadth of support, we’ll use an AWS Step Functions state machine to operate with several other serverless components. AWS Step Functions is a serverless orchestration workflow service that integrates natively with other AWS services.

Solution overview

Architectural overview

This solution implements an image catalog moderation pipeline. Amazon Rekognition checks for inappropriate content, and detects objects and text in an image. The pipeline processes valid images and stores their metadata in an Amazon DynamoDB table; for invalid images, it emails a notification instead.
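
The state machine calls Amazon Rekognition through its SDK integration, but as an illustration of the kind of moderation check the pipeline performs, the following is a minimal boto3 sketch; the bucket and object names are placeholders, and this is not the workflow's actual task definition.

import boto3

rekognition = boto3.client("rekognition")

def is_image_appropriate(bucket: str, key: str) -> bool:
    # Detect moderation labels (for example, explicit or violent content).
    response = rekognition.detect_moderation_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MinConfidence=80,
    )
    # An empty label list means Rekognition found no inappropriate content.
    return len(response["ModerationLabels"]) == 0

print(is_image_appropriate("my-ingestion-bucket", "photo.jpeg"))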

Prerequisites

  1. Git installed
  2. AWS SAM CLI version 1.58.0 or greater installed

Deploying the solution

  1. Clone the repository and navigate to the solution directory:
    git clone https://github.com/aws-samples/step-functions-workflows-collection
    cd step-functions-workflows-collection/moderated-image-catalog
  2. Open the template.yaml file located at step-functions-workflows-collection/moderated-image-catalog and replace the “ImageCatalogStateMachine:” section with the following snippet. Ensure to preserve YAML formatting.
    ImageCatalogStateMachine:
        Type: AWS::Serverless::StateMachine
        Properties:
          Name: moderated-image-catalog-workflow
          DefinitionUri: statemachine/statemachine.asl.json
          DefinitionSubstitutions:
            CatalogTable: !Ref CatalogTable
            ModeratorSNSTopic: !Ref ModeratorSNSTopic
          Policies:
            - RekognitionDetectOnlyPolicy: {}
  3. Within the same template.yaml file, add the following after the ModeratorSNSTopic section and before the Outputs section:
    # Serverless connector permissions
    StepFunctionS3ReadConnector:
      Type: AWS::Serverless::Connector
      Properties:
        Source:
          Id: ImageCatalogStateMachine
        Destination:
          Id: IngestionBucket
        Permissions:
          - Read
    
    StepFunctionDynamoWriteConnector:
      Type: AWS::Serverless::Connector
      Properties:
        Source:
          Id: ImageCatalogStateMachine
        Destination:
          Id: CatalogTable
        Permissions:
          - Write
    
    StepFunctionSNSWriteConnector:
      Type: AWS::Serverless::Connector
      Properties:
        Source:
          Id: ImageCatalogStateMachine
        Destination:
          Id: ModeratorSNSTopic
        Permissions:
          - Write

    You have removed the existing inline policies for the state machine and replaced them with AWS SAM connector definitions, except for the Amazon Rekognition policy. At the time of publishing this blog, connectors do not support Amazon Rekognition. Take some time to review the syntax of each connector.

  4. Deploy the application using the following command:
    sam deploy --guided

    Provide a stack name, Region, and moderators’ email address. You can accept defaults for the remaining prompts.

Verifying permissions

Once the deployment has completed, you can verify the correct role and policies.

  1. Navigate to the Step Functions service page within the AWS Management Console and ensure you have the correct Region selected.
  2. Select State machines from the left menu and then the moderated-image-catalog-workflow state machine.
  3. Select the “IAM role ARN” link, which will take you to the IAM role and policies created.

You should see a list of policies that correspond to the AWS SAM connectors in the template.yaml file with the actions and resources.

Permissions list in console

You didn't need to supply the specific policy actions: you use Read or Write as the permission, and the service handles the rest. This results in improved readability, standardization, and productivity, while retaining security best practices.

Testing

  1. Upload a test image to the Amazon S3 bucket created during the deployment step. To find the name of the bucket, navigate to the AWS CloudFormation console. Select the CloudFormation stack via the name entered as part of “sam deploy --guided.” Select the Outputs tab and note the IngestionBucket name.
  2. After uploading the image, navigate to the AWS Step Functions console and select the “moderated-image-catalog-workflow” workflow.
  3. Select Start Execution and input an event:
    {
        "bucket": "<S3-bucket-name>",
        "key": "<image-name>.jpeg"
    }
  4. Select Start Execution and observe the execution of the workflow.
  5. Depending on the image selected, the workflow will either add it to the image catalog or send a content moderation email to the email address provided. Find out more about content considered inappropriate by Amazon Rekognition.

Cleanup

To delete any images added to the Amazon S3 bucket, and the resources created by this template, use the following commands from the same project directory.

aws s3 rm s3://<bucket_name_here> --recursive
sam delete

Conclusion

This blog post shows how AWS SAM connectors simplify connecting serverless components. View the Developer Guide to find out more about AWS SAM connectors. For further sample serverless workflows like the one used in this blog, see Serverless Land.

Announcing server-side encryption with Amazon Simple Queue Service -managed encryption keys (SSE-SQS) by default

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/announcing-server-side-encryption-with-amazon-simple-queue-service-managed-encryption-keys-sse-sqs-by-default/

This post is written by Sofiya Muzychko (Sr Product Manager), Nipun Chagari (Principal Solutions Architect), and Hardik Vasa (Senior Solutions Architect).

Amazon Simple Queue Service (SQS) now provides server-side encryption (SSE) using SQS-owned encryption (SSE-SQS) by default. This feature further simplifies the security posture to encrypt the message body in SQS queues.

SQS is a fully managed message queuing service that enables you to decouple and scale microservices, distributed systems, and serverless applications. Customers are increasingly decoupling their monolithic applications into microservices and moving sensitive workloads to SQS, such as financial and healthcare applications whose compliance regulations mandate data encryption.

SQS already supports server-side encryption with customer-provided encryption keys using the AWS Key Management Service (SSE-KMS) or using SQS-owned encryption keys (SSE-SQS). Both encryption options greatly reduce the operational burden and complexity involved in protecting data. Additionally, with the SSE-SQS encryption type, you do not need to create, manage, or pay for SQS-managed encryption keys.

Using the default encryption

With this feature, all newly created queues using HTTPS (TLS) and Signature Version 4 endpoints are encrypted using SQS-owned encryption (SSE-SQS) by default, enhancing the protection of your data against unauthorized access. Any new queue created using a non-TLS endpoint does not enable SSE-SQS encryption by default. We therefore encourage you to create SQS queues using HTTPS endpoints as a security best practice.

The SSE-SQS default encryption is available for both standard and FIFO queues. You do not need to make any code or application changes to encrypt new queues, and existing queues are not affected. However, you can change the encryption option for existing queues at any time using the SQS console, AWS Command Line Interface, or API.

Create queue

The preceding image shows the SQS queue creation console wizard with configuration options for encryption. As you can see, server-side encryption is enabled by default, with the SSE-SQS encryption key type selected.

Creating a new SQS queue with SSE-SQS encryption using AWS CloudFormation

Default SSE-SQS encryption is also supported in AWS CloudFormation. To learn more, see this documentation page.

Here is a sample CloudFormation template to create an SQS standard queue with SQS-owned server-side encryption (SSE-SQS) explicitly enabled.

AWSTemplateFormatVersion: "2010-09-09"
Description: SSE-SQS Cloudformation template
Resources:
  SQSEncryptionQueue:
    Type: AWS::SQS::Queue
    Properties: 
      MaximumMessageSize: 262144
      MessageRetentionPeriod: 86400
      QueueName: SSESQSQueue
      SqsManagedSseEnabled: true
      KmsDataKeyReusePeriodSeconds: 900
      VisibilityTimeout: 30

Note that if the SqsManagedSseEnabled: true property is not specified, SSE-SQS is enabled by default.
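
The default also applies when you create queues with an AWS SDK over an HTTPS endpoint. As an illustration, here is a minimal boto3 sketch that creates a queue and reads back its encryption attribute; the queue name is a placeholder.

import boto3

sqs = boto3.client("sqs")

# New queues created over HTTPS endpoints are encrypted with SSE-SQS by default,
# so no encryption attributes need to be supplied here.
queue_url = sqs.create_queue(QueueName="my-sse-sqs-demo-queue")["QueueUrl"]

attributes = sqs.get_queue_attributes(
    QueueUrl=queue_url,
    AttributeNames=["SqsManagedSseEnabled"],
)["Attributes"]
print(attributes)  # Expect {'SqsManagedSseEnabled': 'true'}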

Configuring SSE-SQS encryption for existing queues via the AWS Management Console

To configure SSE-SQS encryption for an existing queue using the SQS console:

  1. Navigate to the SQS console at https://console.aws.amazon.com/sqs/.
  2. In the navigation pane, choose Queues.
  3. Select a queue, and then choose Edit.
  4. Under the Encryption dialog box, for Server-side encryption, choose Enabled.
  5. Select Amazon SQS key (SSE-SQS).
  6. Choose Save.

Edit standard queue

To configure SSE-SQS encryption for an existing queue using the AWS CLI

To enable SSE-SQS for an existing queue with no encryption, use the following AWS CLI command:

aws sqs set-queue-attributes --queue-url <queueURL> --attributes SqsManagedSseEnabled=true

Replace <queueURL> with the URL of your SQS queue.

To disable SSE-SQS for an existing queue using the AWS CLI, run:

aws sqs set-queue-attributes --queue-url <queueURL> --attributes SqsManagedSseEnabled=false

Testing the queue with the SSE-SQS encryption enabled

To test sending a message to the SQS queue with SSE-SQS enabled, run:

aws sqs send-message --queue-url <queueURL> --message-body test-message

Replace <queueURL> with the URL of your SQS queue. You see the following response, which means the message was successfully sent to the queue:

{
    "MD5OfMessageBody": "beaa0032306f083e847cbf86a09ba9b2",
    "MessageId": "6e53de76-7865-4c45-a640-f058c24a619b"
}

Default SSE-SQS encryption key rotation

You can choose how often the keys will be rotated by configuring the KmsDataKeyReusePeriodSeconds queue attribute. The value must be an integer between 60 (1 minute) and 86,400 (24 hours). The default is 300 (5 minutes).

To update the KMS data key reuse period for an existing SQS queue, run:

aws sqs set-queue-attributes --queue-url <queueURL> --attributes KmsDataKeyReusePeriodSeconds=900

This configures the queue to rotate its data key every 900 seconds (15 minutes).

Default SSE-SQS and encrypted messages

Encrypting a message makes its contents unavailable to unauthorized or anonymous users. Anonymous requests are requests made to a queue that is open to a public network without any authentication. Note that if you are using anonymous SendMessage and ReceiveMessage requests against newly created queues, those requests are now rejected because SSE-SQS is enabled by default.

Making anonymous requests to SQS queues does not follow SQS security best practices. We strongly recommend updating your policy to make signed requests to SQS queues using the AWS SDK or AWS CLI, and to continue using SSE-SQS, which is enabled by default.
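
For example, instead of an anonymous HTTP request, a signed request through the AWS SDK is as simple as the following boto3 sketch; the queue URL is a placeholder.

import boto3

sqs = boto3.client("sqs")

# boto3 signs the request with your AWS credentials (Signature Version 4),
# so it works against queues that have SSE-SQS enabled by default.
response = sqs.send_message(
    QueueUrl="https://sqs.<region>.amazonaws.com/<account-id>/EncryptionQueue",
    MessageBody="Hello",
)
print(response["MessageId"])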

To see the SQS service response for anonymous messages when SSE-SQS encryption is enabled, change the queue policy of an existing queue to grant all users (anonymous users) SendMessage permission. For example, for a queue named EncryptionQueue:

{
  "Version": "2012-10-17",
  "Id": "Queue1_Policy_UUID",
  "Statement": [
    {
      "Sid": "Queue1_SendMessage",
      "Effect": "Allow",
      "Principal": "*",
      "Action": "sqs:SendMessage",
      "Resource": "<queueARN>"
    }
  ]
}

You can then make an anonymous request against the queue:

curl <queueURL> -d 'Action=SendMessage&MessageBody=Hello'

You get an error message similar to the following:

<?xml version="1.0"?>
<ErrorResponse
	xmlns="http://queue.amazonaws.com/doc/2012-11-05/">
	<Error>
		<Type>Sender</Type>
		<Code>AccessDenied</Code>
		<Message>Access to the resource The specified queue does not exist or you do not have access to it. is denied.</Message>
		<Detail/>
	</Error>
	<RequestId> RequestID </RequestId>
</ErrorResponse>

However, if for any reason you want to continue using anonymous requests to newly created queues in the future, you must create or update the queue with SSE-SQS encryption disabled:

SqsManagedSseEnabled=false

You can also disable SSE-SQS using the Amazon SQS console.

Encrypting SQS queues with your own encryption keys

You can always change the SSE-SQS default and use your own keys. To encrypt SQS queues with your own encryption keys using AWS Key Management Service (SSE-KMS), override the default SSE-SQS encryption with SSE-KMS during the queue creation process or afterwards.

You can update the SQS queue Server-side encryption key type using the Amazon SQS console, AWS Command Line Interface, or API.

Benefits of SQS owned encryption (SSE-SQS)

There are a number of significant benefits to encrypting your data with SQS owned encryption (SSE-SQS):

  • SSE-SQS lets you transmit data more securely and improve your security posture, as commonly required for compliance and regulations, with no additional overhead because you do not need to create and manage encryption keys.
  • Encryption at rest using the default SSE-SQS is provided at no additional charge.
  • The encryption and decryption of your data are handled transparently and continue to deliver the same performance you expect.
  • Data is encrypted using the 256-bit Advanced Encryption Standard (AES-256 GCM algorithm), so that only authorized roles and services can access data.

In addition, customers can enable CloudWatch alarms to alert on activities such as authorization failures, AWS Identity and Access Management (IAM) policy changes, or tampering with CloudTrail logs to help detect and stay on top of security incidents in the customer application (to learn more, see the Amazon CloudWatch User Guide).

Conclusion

SQS now provides server-side encryption (SSE) using SQS-owned encryption (SSE-SQS) by default. This enhancement makes it easier to create SQS queues, while greatly reducing the operational burden and complexity involved in protecting data.

Encryption at rest using the default SSE-SQS is provided at no additional charge and is supported for both Standard and FIFO SQS queues using HTTPS endpoints. The default SSE-SQS encryption is available now.

To learn more about Amazon Simple Queue Service (SQS), see Getting Started with Amazon SQS and Amazon Simple Queue Service Developer Guide.

For more serverless learning resources, visit Serverless Land.

ICYMI: Serverless Q3 2022

Post Syndicated from David Boyne original https://aws.amazon.com/blogs/compute/serverless-icymi-q3-2022/

Welcome to the 19th edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, Twitch live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

AWS Lambda

AWS has now introduced tiered pricing for Lambda. With tiered pricing, customers who run large workloads on Lambda can automatically save on their monthly costs. Tiered pricing is based on compute duration measured in GB-seconds; see the AWS Lambda pricing page for the tier breakdown.

With tiered pricing, you can save on the compute duration portion of your monthly Lambda bills. This allows you to architect, build, and run large-scale applications on Lambda and take advantage of these tiered prices automatically. To learn more about Lambda cost optimizations, watch the new serverless office hours video.

Developers are using the AWS SAM CLI to simplify serverless development, making it easier to build, test, package, and deploy their serverless applications. JavaScript and TypeScript developers can now simplify their Lambda development further using esbuild in the AWS SAM CLI.

Code example of esbuild with SAM

Together with esbuild and SAM Accelerate, you can rapidly iterate on your code changes in the AWS Cloud. You can approximate the same levels of productivity as when testing locally, while testing against a realistic application environment in the cloud. esbuild helps simplify Lambda development with support for tree shaking, minification, source maps, and loaders. To learn more about this feature, read the documentation.

Lambda announced support for Attribute-Based Access Control (ABAC). ABAC is designed to simplify permission management using access permissions based on tags, which can be attached to IAM resources such as IAM users and roles. ABAC support for Lambda functions allows you to scale your permissions as your organization innovates. It gives granular access to developers without requiring a policy update when a user or project is added, removed, or updated. To learn more about ABAC, read about ABAC for Lambda.

AWS Lambda Powertools is an open-source library that helps customers discover and incorporate serverless best practices more easily. Powertools for TypeScript is now generally available and currently focused on three observability features: distributed tracing (Tracer), structured logging (Logger), and asynchronous business and application metrics (Metrics). With more than 10 million downloads, Powertools is helping builders around the world; it is also available in the Python and Java programming languages.

To learn more:

AWS Step Functions

Amazon States Language (ASL) provides a set of functions known as intrinsics that perform basic data transformations. Customers have asked for additional intrinsics to perform more data transformation tasks, such as formatting JSON strings, creating arrays, generating UUIDs, and encoding data. Step Functions has now added 14 new intrinsic functions, which can be grouped into six categories.

Intrinsic functions allow you to reduce the use of other services to perform basic data manipulations in your workflow. Read the release blog for use-cases and more details.
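
For illustration, a Pass state combining a few intrinsics could look like the following sketch, written here as a TypeScript object literal rather than raw JSON (the state and field names are hypothetical):

// Hedged sketch of an ASL Pass state using intrinsic functions.
const buildOrderRecord = {
  Type: "Pass",
  Parameters: {
    // States.UUID() generates a v4 UUID (one of the new intrinsics)
    "orderId.$": "States.UUID()",
    // States.MathAdd adds two numeric values
    "totalItems.$": "States.MathAdd($.cartItems, $.giftItems)",
    // States.Format builds a string from a template and arguments
    "summary.$": "States.Format('Order for {}', $.customerName)",
  },
  End: true,
};

console.log(JSON.stringify(buildOrderRecord, null, 2));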

Step Functions expanded its AWS SDK integrations with support for Amazon Pinpoint API 2.0, AWS Billing Conductor, Amazon GameSparks, and 195 more AWS API actions. This brings the total to 223 AWS services and more than 10,000 API actions.

Amazon EventBridge

EventBridge released support for bidirectional event integrations with Salesforce, allowing customers to consume Salesforce events directly into their AWS accounts. Customers can also utilize API Destinations to send EventBridge events back to Salesforce, completing the bidirectional event integrations between Salesforce and AWS.

EventBridge also released the ability to start receiving events from GitHub, Stripe, and Twilio using quick starts. Customers can subscribe to events from these SaaS applications and receive them directly onto their EventBridge event bus for further processing. With Quick Starts, you can use AWS CloudFormation templates to create HTTP endpoints for your event bus that are configured with security best practices.
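
For reference, publishing your own custom events onto an event bus alongside these SaaS events takes only a few lines with the AWS SDK for JavaScript v3; the bus name and event detail below are placeholders:

import { EventBridgeClient, PutEventsCommand } from "@aws-sdk/client-eventbridge";

const eventBridge = new EventBridgeClient({});

// Publish a custom application event onto a (placeholder) event bus.
await eventBridge.send(
  new PutEventsCommand({
    Entries: [
      {
        EventBusName: "my-application-bus", // hypothetical bus name
        Source: "com.example.orders",
        DetailType: "OrderCreated",
        Detail: JSON.stringify({ orderId: "1234", total: 42 }),
      },
    ],
  })
);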

To learn more:

Amazon DynamoDB

DynamoDB now supports bulk imports from Amazon S3 into new DynamoDB tables. You can use bulk imports to help you migrate data from other systems, load test your applications, facilitate data sharing between tables and accounts, or simplify your disaster recovery and business continuity plans. Bulk imports support CSV, DynamoDB JSON, and Amazon Ion as input formats. You can get started with DynamoDB import via API calls or the AWS Management Console. To learn more, read the documentation or follow this guide.
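
A sketch of starting an import with the AWS SDK for JavaScript v3 could look like the following; the bucket, prefix, and table definition are placeholders:

import { DynamoDBClient, ImportTableCommand } from "@aws-sdk/client-dynamodb";

const dynamodb = new DynamoDBClient({});

// Start a bulk import from S3 into a brand-new table (names are placeholders).
await dynamodb.send(
  new ImportTableCommand({
    S3BucketSource: { S3Bucket: "my-export-bucket", S3KeyPrefix: "orders/" },
    InputFormat: "DYNAMODB_JSON",
    TableCreationParameters: {
      TableName: "orders-imported",
      AttributeDefinitions: [{ AttributeName: "pk", AttributeType: "S" }],
      KeySchema: [{ AttributeName: "pk", KeyType: "HASH" }],
      BillingMode: "PAY_PER_REQUEST",
    },
  })
);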

DynamoDB now supports up to 100 actions per transaction. With Amazon DynamoDB transactions, you can group multiple actions together and submit them as a single all-or-nothing operation. The maximum number of actions in a single transaction has now increased from 25 to 100. The previous limit of 25 actions per transaction would sometimes require writing additional code to break transactions into multiple parts. Now with 100 actions per transaction, builders will encounter this limit much less frequently. To learn more about best practices for transactions, read the documentation.
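
As an example, a single all-or-nothing write that touches two hypothetical tables with the DocumentClient could look like this sketch:

import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, TransactWriteCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Group multiple writes into one all-or-nothing transaction
// (table, key, and attribute names are placeholders).
await ddb.send(
  new TransactWriteCommand({
    TransactItems: [
      {
        Put: {
          TableName: "orders",
          Item: { pk: "order#1234", status: "PLACED" },
        },
      },
      {
        Update: {
          TableName: "customers",
          Key: { pk: "customer#42" },
          UpdateExpression: "ADD orderCount :one",
          ExpressionAttributeValues: { ":one": 1 },
        },
      },
    ],
  })
);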

Amazon SNS

SNS has introduced the public preview of message data protection to help customers discover and protect sensitive data in motion without writing custom code. With message data protection for SNS, you can scan messages in real time for PII/PHI data and receive audit reports containing scan results. You can also prevent applications from receiving sensitive data by blocking inbound messages to an SNS topic or outbound messages to an SNS subscription. These scans can detect data such as people’s names, addresses, social security numbers, credit card numbers, and prescription drug codes.

To learn more:

EDA Day – London 2022

The Serverless DA team hosted the world’s first event-driven architecture (EDA) day in London on September 1. It brought together prominent figures in the event-driven architecture community, customer speakers, and AWS product leadership from EventBridge and Step Functions.

EDA day covered 13 sessions, 3 workshops, and a Q&A panel. Gregor Hohpe delivered the keynote, and speakers included Sheen Brisals and Sarah Hamilton from Lego, Toli Apostolidis from Cinch, and David Boyne and Marcia Villalba from Serverless DA, with AWS product team leadership joining for the panel. Customers could also interact with EDA experts at the Serverlesspresso bar and the Ask the Experts whiteboard.

Gregor Hohpe talking at EDA Day London 2022

Picture of the crowd at EDA day 2022 in London

Serverless snippets collection added to Serverless Land

Serverless Land is a website that is maintained by the Serverless Developer Advocate team to help you build with workshops, patterns, blogs, and videos. The team has extended Serverless Land and introduced the new AWS Serverless snippets collection. Builders can use serverless snippets to find and integrate tools, code examples, and Amazon CloudWatch Logs Insights queries to help with their development workflow.

Serverless Blog Posts

July

Jul 13 – Optimizing Node.js dependencies in AWS Lambda

Jul 15 – Simplifying serverless best practices with AWS Lambda Powertools for TypeScript

Jul 15 – Creating a serverless Apache Kafka publisher using AWS Lambda 

Jul 18 – Understanding AWS Lambda scaling and throughput

Jul 19 – Introducing Amazon CodeWhisperer in the AWS Lambda console (In preview)

Jul 19 – Scaling AWS Lambda permissions with Attribute-Based Access Control (ABAC)

Jul 25 – Migrating mainframe JCL jobs to serverless using AWS Step Functions

Jul 28 – Using AWS Lambda to run external transactions on Db2 for IBM i

August

Aug 1 – Using certificate-based authentication for iOS applications with Amazon SNS

Aug 4 – Introducing tiered pricing for AWS Lambda

Aug 5 – Securely retrieving secrets with AWS Lambda

Aug 8 – Estimating cost for Amazon SQS message processing using AWS Lambda

Aug 9 – Building AWS Lambda governance and guardrails

Aug 11 – Introducing the new AWS Serverless Snippets Collection

Aug 12 – Introducing bidirectional event integrations with Salesforce and Amazon EventBridge

Aug 17 – Using custom consumer group ID support for AWS Lambda event sources for MSK and self-managed Kafka

Aug 24 – Speeding up incremental changes with AWS SAM Accelerate and nested stacks

Aug 29 – Deploying AWS Lambda functions using AWS Controllers for Kubernetes (ACK)

Aug 30 – Building cost-effective AWS Step Functions workflows

September

Sep 05 – Introducing new intrinsic functions for AWS Step Functions

Sep 08 – Introducing message data protection for Amazon SNS

Sep 14 – Lifting and shifting a web application to AWS Serverless: Part 1

Sep 14 – Lifting and shifting a web application to AWS Serverless: Part 2

Videos

Serverless Office Hours – Tues 10AM PT

Weekly live virtual office hours. In each session, we talk about a specific topic or technology related to serverless, and then open the floor to help you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

YouTube: youtube.com/serverlessland
Twitch: twitch.tv/aws

July

Jul 5 – AWS SAM Accelerate GA + more!

Jul 12 – Infrastructure as actual code

Jul 19 – The AWS Step Functions Workflows Collection

Jul 26 – AWS Lambda Attribute-Based Access Control (ABAC)

August

Aug 2 – AWS Lambda Powertools for TypeScript/Node.js

Aug 9 – AWS CloudFormation Hooks

Aug 16 – Java on Lambda best-practices

Aug 30 – Alex de Brie: DynamoDB Misconceptions

September

Sep 06 – AWS Lambda Cost Optimization

Sep 13 – Amazon EventBridge Salesforce integration

Sep 20 – .NET on AWS Lambda best practices

FooBar Serverless YouTube channel

Marcia Villalba frequently publishes new videos on her popular serverless YouTube channel. You can view all of Marcia’s videos at https://www.youtube.com/c/FooBar_codes.

July

Jul 7 – Amazon Cognito – Add authentication and authorization to your web apps

Jul 14 – Add Amazon Cognito to an existing application – NodeJS-Express and React

Jul 21 – Introduction to Amazon CloudFront – Add CDN to your applications

Jul 28 – Add Amazon S3 storage and use a CDN in an existing application

August

Aug 04 – Testing serverless application locally – Demo with Node.js, Express, and React

Aug 11 – Building Amazon CloudWatch dashboards with AWS CDK

Aug 19 – Let’s code – Lift and Shift migration to Serverless of Node.js, Express, React and Mongo app

Aug 25 – Let’s code – Lift and Shift migration to Serverless, migrating Authentication and Authorization

Aug 29 – Deploying AWS Lambda functions using AWS Controllers for Kubernetes (ACK)

September

Sep 1 – Run Artillery in a Lambda function | Load test your serverless applications

Sep 8 – Let’s code – Lift and Shift migration to Serverless, migrating Storage with Amazon S3 and CloudFront

Sep 15 – What are Event-Driven Architectures? Why we care?

Sep 22 – Queues – Point to Point Messaging – Exploring Event-Driven Patterns

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

D1: our quest to simplify databases

Post Syndicated from Nevi Shah original https://blog.cloudflare.com/whats-new-with-d1/


When we announced D1 in May of this year, we knew it would be the start of something new – our first SQL database with Cloudflare Workers. Prior to D1, we announced storage options like KV (key-value store), Durable Objects (single-location, strongly consistent data storage) and R2 (blob storage). But one question always remained: “How can I store and query relational data with an easy API and without latency concerns?”

The long-awaited “Cloudflare Database” was the true missing piece for building your application entirely on Cloudflare’s global network, going from a blank canvas in VSCode to a full-stack application in seconds. Compatible with the popular SQLite API, D1 empowers developers to build out their databases without getting bogged down by complexity or having to manage every underlying layer.
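
To give a feel for the developer experience, a minimal Worker querying a hypothetical users table through a D1 binding named DB looks roughly like the following sketch:

export interface Env {
  DB: D1Database; // D1 binding configured in wrangler.toml
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // Parameterized query against the hypothetical `users` table
    const { results } = await env.DB.prepare(
      "SELECT id, name FROM users WHERE active = ?1"
    )
      .bind(1)
      .all();

    return Response.json(results);
  },
};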

Since our launch announcement in May and private beta in June, we’ve made great strides in building out our vision of a serverless database. With D1 still in private beta but an open beta on the horizon, we’re excited to show and tell our journey of building D1 and what’s to come.

The D1 Experience

We knew from Cloudflare Workers feedback that using Wrangler as the mechanism to create and deploy applications is loved and preferred by many. That’s why, when Wrangler 2.0 was announced this past May alongside D1, we took advantage of the new and improved CLI for every part of the experience, from data creation to every update and iteration. Let’s take a quick look at how to get set up in a few easy steps.

Create your database

With the latest version of Wrangler installed, you can create an initialized, empty database with a single command:

npx wrangler d1 create my_database_name

This gets your database up and running! Now it’s time to add your data.

Bootstrap it

It wouldn’t be the “Cloudflare way” if you had to sit through an agonizingly long process to get set up. So we made it easy and painless to bring your existing data from an old database and bootstrap your new D1 database. You can run:

wrangler d1 execute my_database_name --file ./filename.sql

and pass through an existing SQLite .sql file of your choice. Your database is now ready for action.

Develop & Test Locally

With all the improvements we’ve made to Wrangler since version 2 launched a few months ago, we’re pleased to report that D1 has full remote & local wrangler dev support:


When running wrangler dev --local --persist, an SQLite file will be created inside .wrangler/state. You can then use a local GUI program for managing it, like SQLiteFlow (https://www.sqliteflow.com/) or Beekeeper (https://www.beekeeperstudio.io/).

Or you can simply use SQLite directly with the SQLite command line by running sqlite3 .wrangler/state/d1/DB.sqlite3:


Automatic backups & one-click restore

No matter how much you test your changes, sometimes things don’t always go according to plan. But with Wrangler you can create a backup of your data, view your list of backups or restore your database from an existing backup. In fact, during the beta, we’re taking backups of your data every hour automatically and storing them in R2, so you will have the option to rollback if needed.


And the best part – if you want to use a production snapshot for local development or to reproduce a bug, simply copy it into the .wrangler/state directory and wrangler dev --local --persist will pick it up!

Let’s download a D1 backup to our local disk. It’s SQLite compatible.


Now let’s run our D1 worker locally, from the backup.


Create and Manage from the dashboard

However, we realize that CLIs are not everyone’s jam. In fact, we believe databases should be accessible to every kind of developer – even those without much database experience! D1 is available right from the Cloudflare dashboard, giving you near-total command parity with Wrangler in just a few clicks. Bootstrapping your database, creating tables, updating your database, viewing tables, and triggering backups are all accessible right at your fingertips.


Changes made in the UI are instantly available to your Worker — no deploy required!

We’ve told you about some of the improvements we’ve landed since we first announced D1, but as always, we also wanted to give you a small taste (with some technical details) of what’s ahead. One really important functionality of a database is transactions — something D1 wouldn’t be complete without.

Sneak peek: how we’re bringing JavaScript transactions to D1

With D1, we strive to present a dramatically simplified interface to creating and querying relational data, which for the most part is a good thing. But simplification occasionally introduces drawbacks, where a use-case is no longer easily supported without introducing some new concepts. D1 transactions are one example.

Transactions are a unique challenge

You don’t need to specify where a Cloudflare Worker or a D1 database runs—they simply run everywhere they need to. For Workers, that is as close as possible to the users hitting your site right this second. For D1 today, we don’t try to run a copy in every location worldwide, but instead dynamically manage the number and location of read-only replicas based on how many queries your database is getting, and from where. However, queries that make changes to a database (which we generally call “writes” for short) all have to travel back to the single primary D1 instance to do their work, to ensure consistency.

But what if you need to do a series of updates at once? While you can send multiple SQL queries with .batch() (which does in fact use database transactions under the hood), it’s likely that, at some point, you’ll want to interleave database queries & JS code in a single unit of work.

This is exactly what database transactions were invented for, but if you try running BEGIN TRANSACTION in D1 you’ll get an error. Let’s talk about why that is.

Why native transactions don’t work
The problem arises from SQL statements and JavaScript code running in dramatically different places—your SQL executes inside your D1 database (primary for writes, nearest replica for reads), but your Worker is running near the user, which might be on the other side of the world. And because D1 is built on SQLite, only one write transaction can be open at once. This means that, if we permitted BEGIN TRANSACTION, any one Worker request, anywhere in the world, could effectively block your whole database! That is quite a dangerous thing to allow:

  • A Worker could start a transaction then crash due to a software bug, without calling ROLLBACK. The primary would be blocked, waiting for more commands from a Worker that would never come (until, probably, some timeout).
  • Even without bugs or crashes, transactions that require multiple round-trips between JavaScript and SQL could end up blocking your whole system for multiple seconds, dramatically limiting how high an application built with Workers & D1 could scale.

But allowing a developer to define transactions that mix both SQL and JavaScript makes building applications with Workers & D1 so much more flexible and powerful. We need a new solution (or, in our case, a new version of an old solution).

A way forward: stored procedures
Stored procedures are snippets of code that are uploaded to the database, to be executed directly next to the data. Which, at first blush, sounds exactly like what we want.

However, in practice, stored procedures in traditional databases are notoriously frustrating to work with, as anyone who’s developed a system making heavy use of them will tell you:

  • They’re often written in a different language to the rest of your application. They’re usually written in (a specific dialect of) SQL or an embedded language like Tcl/Perl/Python. And while it’s technically possible to write them in JavaScript (using an embedded V8 engine), they run in such a different environment to your application code it still requires significant context-switching to maintain them.
  • Having both application code and in-database code affects every part of the development lifecycle, from authoring, testing, deployment, rollbacks and debugging. But because stored procedures are usually introduced to solve a specific problem, not as a general purpose application layer, they’re often managed completely manually. You can end up with them being written once, added to the database, then never changed for fear of breaking something.

With D1, we can do better.

The point of a stored procedure was to execute directly next to the data—uploading the code and executing it inside the database was simply a means to that end. But we’re already using Workers, a global JavaScript execution platform; can we use them to solve this problem?

It turns out, absolutely! But here we have a few options of exactly how to make it work, and we’re working with our private beta users to find the right API. In this section, I’d like to share with you our current leading proposal, and invite you all to give us your feedback.

When you connect a Worker project to a D1 database, you add a section like the following to your wrangler.toml:

[[ d1_databases ]]
# What binding name to use (e.g. env.DB):
binding = "DB"
# The name of the DB (used for wrangler d1 commands):
database_name = "my-d1-database"
# The D1's ID for deployment:
database_id = "48a4224e-...3b09"
# Which D1 to use for `wrangler dev`:
# (can be the same as the previous line)
preview_database_id = "48a4224e-...3b09"

# NEW: adding "procedures", pointing to a new JS file:
procedures = "./src/db/procedures.js"

That D1 Procedures file would contain the following (note the new db.transaction() API, which is only available within a file like this):

export default class Procedures {
  constructor(db, env, ctx) {
    this.db = db
  }

  // any methods you define here are available on env.DB.Procedures
  // inside your Worker
  async Checkout(cartId: number) {
    // Inside a Procedure, we have a new db.transaction() API
    const result = await this.db.transaction(async (txn) => {
      
      // Transaction has begun: we know the user can't add anything to
      // their cart while these actions are in progress.
      const [cart, user] = Helpers.loadCartAndUser(cartId)

      // We can update the DB first, knowing that if any of the later steps
      // fail, all these changes will be undone.
      await this.db
        .prepare(`UPDATE cart SET status = ?1 WHERE cart_id = ?2`)
        .bind('purchased', cartId)
        .run()
      const newBalance = user.balance - cart.total_cost
      await this.db
        .prepare(`UPDATE user SET balance = ?1 WHERE user_id = ?2`)
        // Note: the DB may have a CHECK to guarantee 'user.balance' can not
        // be negative. In that case, this statement may fail, an exception
        // will be thrown, and the transaction will be rolled back.
        .bind(newBalance, cart.user_id)
        .run()

      // Once all the DB changes have been applied, attempt the payment:
      const { ok, details } = await PaymentAPI.processPayment(
        user.payment_method_id,
        cart.total_cost
      )
      if (!ok) {
        // If we throw an Exception, the transaction will be rolled back
        // and result.error will be populated:
        // throw new PaymentFailedError(details)
        
        // Alternatively, we can do both of those steps explicitly
        await txn.rollback()
        // The transaction is rolled back, our DB is now as it was when we
        // started. We can either move on and try something new, or just exit.
        return { error: new PaymentFailedError(details) }
      }

      // This is implicitly called when the .transaction() block finishes,
      // but you can explicitly call it too (potentially committing multiple
      // times in a single db.transaction() block).
      await txn.commit()

      // Anything we return here will be returned by the 
      // db.transaction() block
      return {
        amount_charged: cart.total_cost,
        remaining_balance: newBalance,
      }
    })

    if (result.error) {
      // Our db.transaction block returned an error or threw an exception.
    }

    // We're still in the Procedure, but the Transaction is complete and
    // the DB is available for other writes. We can either do more work
    // here (start another transaction?) or return a response to our Worker.
    return result
  }
}

And in your Worker, your DB binding now has a “Procedures” property with your function names available:

const { error, amount_charged, remaining_balance } =
  await env.DB.Procedures.Checkout(params.cartId)

if (error) {
  // Something went wrong, `error` has details
} else {
  // Display `amount_charged` and `remaining_balance` to the user.
}

Multiple Procedures can be triggered at one time, but only one db.transaction() function can be active at once: any other write queries or other transaction blocks will be queued, but all read queries will continue to hit local replicas and run as normal. This API gives you the ability to ensure consistency when it’s essential but with the minimal impact on total overall performance worldwide.

Request for feedback

As with all our products, feedback from our users drives the roadmap and development. While the D1 API is in beta testing today, we’re still seeking feedback on the specifics. However, we’re pleased that it solves both the problems with transactions that are specific to D1 and the problems with stored procedures described earlier:

  • Code is executing as close as possible to the database, removing network latency while a transaction is open.
  • Any exceptions or cancellations of a transaction cause an instant rollback—there is no way to accidentally leave one open and block the whole D1 instance.
  • The code is in the same language as the rest of your Worker code, in the exact same dialect (e.g. same TypeScript config as it’s part of the same build).
  • It’s deployed seamlessly as part of your Worker. If two Workers bind to the same D1 instance but define different procedures, they’ll only see their own code. If you want to share code between projects or databases, extract a library as you would with any other shared code.
  • In local development and test, the procedure works just like it does in production, but without the network call, allowing seamless testing and debugging as if it was a local function.
  • Because procedures and the Worker that define them are treated as a single unit, rolling back to an earlier version never causes a skew between the code in the database and the code in the Worker.

The D1 ecosystem: contributions from the community

We’ve told you about what we’ve been up to and what’s ahead, but one of the unique things about this project is all the contributions from our users. One of our favorite parts of private betas is not only getting feedback and feature requests, but also seeing what ideas and projects come to fruition. While sometimes this means personal projects, with D1, we’re seeing some incredible contributions to the D1 ecosystem. Needless to say, the work on D1 hasn’t just been coming from within the D1 team, but also from the wider community and other developers at Cloudflare. Users have been showing off their D1 additions within our Discord private beta channel and giving others the opportunity to use them as well. We wanted to take a moment to highlight them.

workers-qb

Dealing with raw SQL syntax is powerful (and using the D1 .bind() API, safe against SQL injections) but it can be a little clumsy. On the other hand, most existing query builders assume direct access to the underlying DB, and so aren’t suitable to use with D1. So Cloudflare developer Gabriel Massadas designed a small, zero-dependency query builder called workers-qb:

import { D1QB } from 'workers-qb'
const qb = new D1QB(env.DB)

const fetched = await qb.fetchOne({
    tableName: "employees",
    fields: "count(*) as count",
    where: {
      conditions: "active = ?1",
      params: [true]
    },
})

Check out the project homepage for more information: https://workers-qb.massadas.com/.

D1 console

While you can interact with D1 through both Wrangler and the dashboard, Cloudflare Community champion Isaac McFadyen created the very first D1 console, where you can quickly execute a series of queries right from your terminal. With the D1 console, you don’t need to spend time writing the various Wrangler commands we’ve created – just execute your queries.

This includes all the bells and whistles you would expect from a modern database console, including multiline input, command history, validation for things D1 may not yet support, and the ability to save your Cloudflare credentials for later use.

Check out the full project on GitHub or NPM for more information.

Miniflare test Integration

The Miniflare project, which powers Wrangler’s local development experience, also provides fully-fledged test environments for popular JavaScript test runners, Jest and Vitest. With this comes the concept of Isolated Storage, allowing each test to run independently, so that changes made in one don’t affect the others. Brendan Coll, creator of Miniflare, guided the D1 test implementation to give the same benefits:

import Worker from '../src/index.ts'
const { DB } = getMiniflareBindings();

beforeAll(async () => {
  // Your D1 starts completely empty, so first you must create tables
  // or restore from a schema.sql file.
  await DB.exec(`CREATE TABLE entries (id INTEGER PRIMARY KEY, value TEXT)`);
});

// Each describe block & each test gets its own view of the data.
describe('with an empty DB', () => {
  it('should report 0 entries', async () => {
    await Worker.fetch(...)
  })
  it('should allow new entries', async () => {
    await Worker.fetch(...)
  })
})

// Use beforeAll & beforeEach inside describe blocks to set up particular DB states for a set of tests
describe('with two entries in the DB', () => {
  beforeEach(async () => {
    await DB.prepare(`INSERT INTO entries (value) VALUES (?), (?)`)
            .bind('aaa', 'bbb')
            .run()
  })
  // Now, all tests will run with a DB with those two values
  it('should report 2 entries', async () => {
    await Worker.fetch(...)
  })
  it('should not allow duplicate entries', async () => {
    await Worker.fetch(...)
  })
})

All the databases for tests are run in-memory, so these are lightning fast. And fast, reliable testing is a big part of building maintainable real-world apps, so we’re thrilled to extend that to D1.

Want access to the private beta?

Feeling inspired?

We love to see what our beta users build or want to build, especially when our products are at an early stage. As we march toward an open beta, we’ll be looking specifically for your feedback. We are slowly letting more folks into the beta, but if you haven’t received your “golden ticket” with access yet, sign up here! Once you’ve been invited in, you’ll receive an official welcome email.

As always, happy building!

Integrating Amazon MemoryDB for Redis with Java-based AWS Lambda

Post Syndicated from Benjamin Smith original https://aws.amazon.com/blogs/compute/integrating-amazon-memorydb-for-redis-with-java-based-aws-lambda/

This post is written by Mansi Y Doshi, Consultant and Aditya Goteti, Sr. Lead Consultant.

Enterprises are modernizing and migrating their applications to the AWS Cloud to improve scalability, reduce cost, innovate, and reduce time to market for new features. Legacy applications are often built with an RDBMS as the only backend solution.

Modernizing legacy Java applications with microservices requires breaking down a single monolithic application into multiple independent services. Each microservice does a specific job and requires its own database to persist data, but one database does not fit all use cases. Modern applications require purpose-built databases catering to their specific needs and data models.

This post discusses some of the common use cases for one such data store, Amazon MemoryDB for Redis, which is built to provide durability and faster reads and writes.

Use cases

Modern tech stacks often begin with a backend that interacts with a durable database like MongoDB, Amazon Aurora, or Amazon DynamoDB for their data persistence needs.

But, as traffic volume increases, it often makes sense to introduce a caching layer like ElastiCache. The cache is populated by service logic each time a database read happens, so that subsequent reads of the same data become faster. While ElastiCache is effective, you must manage and pay for two separate data sources for the same data. You must also write custom logic to handle the cache reads/writes in addition to the existing read/write logic used for the durable database.

While traditional databases like MySQL, Postgres and DynamoDB provide data durability at the cost of speed, transient data stores like ElastiCache trade durability for faster reads/writes (usually within microseconds). ElastiCache provides writes and strongly consistent reads on the primary node of each shard and eventually consistent reads from read replicas. There is a possibility that the latest data written to the primary node is lost during a failover, which makes ElastiCache fast but not durable.

MemoryDB addresses both these issues. It provides strong consistency on the primary node and eventually consistent reads on replica nodes. The consistency model of MemoryDB is similar to that of ElastiCache for Redis. However, in MemoryDB, data is not lost across failovers, allowing clients to read their writes from primaries regardless of node failures. Only data that is successfully persisted in the Multi-AZ transaction log is visible. Replica nodes are still eventually consistent. Because of its distributed transaction model, MemoryDB can provide both durability and microsecond response time.

MemoryDB is ideal for services that are read-heavy and sensitive to latency, such as configuration, search, authentication, and leaderboard services. These must operate at microsecond read latency and still be able to persist the data for high availability and durability. Services like leaderboards, which hold millions of records, often break the data down into smaller chunks or batches and process them in parallel. This needs a data store that can perform calculations on the fly and also store results temporarily. Redis can process millions of operations per second, store temporary calculations for fast retrieval, and run other operations (like aggregations). Because Redis is single-threaded from the command execution point of view, it also helps avoid dirty reads and writes.

Another use case is a configuration service, where users store, change, and retrieve their configuration data. In large distributed systems, there are often hundreds of independent services interacting with each other using well-defined REST APIs. These services depend on the configuration data to perform specific actions. The configuration service must serve the required information at a low latency to avoid being a bottleneck for the other dependent services.

MemoryDB can read at microsecond latencies durably. It also persists data across multiple Availability Zones, using multi-Availability Zone transaction logs to enable fast failover, database recovery, and node restarts. You can use it as a primary database without the need to maintain a separate cache to lower data access latency, which also removes the cost of running an additional caching service.

These use cases are a good fit for using MemoryDB. Next, you see how to access, store, and retrieve data in MemoryDB from your Java-based AWS Lambda function.

Overview

This blog shows how to build an Amazon MemoryDB cluster and integrate it with AWS Lambda. Amazon API Gateway and Lambda can be paired together to create a client-facing application, which can be easier to maintain, highly scalable, and secure. Both are fully managed services with no need to provision or manage servers. They can be cost effective when compared to running the application on servers for workloads with long idle periods. Using Lambda authorizers you can also write custom code to control access to your API.

Walkthrough

The following steps show how to provision an Amazon MemoryDB cluster along with an Amazon VPC, subnets, and security groups, and how to integrate it with a Lambda function using the Jedis Java client for Redis. Here, the Lambda function is configured to connect to the same VPC where MemoryDB is provisioned. The steps include provisioning through an AWS SAM template.

Prerequisites

  1. Create an AWS account if you do not already have one, and log in.
  2. Configure your account and set up permissions to access MemoryDB.
  3. Java 8 or above.
  4. Install Maven.
  5. A Java client for Redis (Jedis).
  6. Install the AWS SAM CLI if you do not already have it.

Creating the MemoryDB cluster

Refer to the serverless pattern for a quick setup and customize as required. The AWS SAM template creates VPC, subnets, security groups, the MemoryDB cluster, API Gateway, and Lambda.

To access the MemoryDB cluster from the Lambda function, the security group of the Lambda function is added to the security group of the cluster. The MemoryDB cluster is always launched in a VPC. If the subnet is not specified, the cluster is launched into your default Amazon VPC.

You can also use your existing VPC and subnets and customize the template accordingly. If you are creating a new VPC, you can change the CIDR block and other configuration values as needed. Make sure that DNS hostnames and DNS support are enabled for the VPC. Use the optional parameters section to customize your templates. Parameters enable you to input custom values to your template each time you create or update a stack.

Recommendations

As your workload requirements change, you might want to increase the performance of your cluster or reduce costs by scaling the cluster in or out. To improve performance, you can scale your cluster horizontally by increasing the number of read replicas or shards, for read and write throughput respectively.

To reduce cost when instances are over-provisioned, you can scale in vertically by reducing the node size of your cluster; to overcome CPU bottlenecks or memory pressure, you can scale up by increasing the node size. Both vertical and horizontal scaling are applied with no downtime, and cluster restarts are not required. You can customize the following parameters in the memoryDBCluster resource as required.

NodeType: db.t4g.small
NumReplicasPerShard: 2
NumShards: 2

In MemoryDB, all writes are carried out on the primary node of a shard and reads are performed on the standby nodes. Identifying the right number of read replicas, the type of nodes, and the number of shards in a cluster is crucial to get optimal performance and avoid additional cost from over-provisioning resources. It’s recommended to always start with the minimal number of required resources and scale out as needed.

Replicas improve read scalability, and it is recommended to have at least two read replicas per shard; depending on the size of the payload and how read-heavy the workload is, you may need more. Adding more read replicas than required does not give any performance improvement and attracts additional cost. The following benchmarking was performed using the redis-benchmark tool, with only GET requests to simulate a read-heavy workload.

The metrics on both clusters are almost the same for 10 million requests with a 1 KB payload per request. After increasing the payload size to 5 KB and the number of GET requests to 20 million, the cluster with two primaries and two replicas could not process the load, whereas the second cluster processed it successfully. To achieve the right sizing, load testing is recommended in a staging/pre-production environment with a load similar to production.

Creating a Lambda function and allowing access to the MemoryDB cluster

In the lambda-redis/HelloWorldFunction/pom.xml file, add the following dependency. This adds the Jedis Java client used to connect to the MemoryDB cluster:

<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>4.2.0</version>
</dependency>

The simplest way to connect the Lambda function to the MemoryDB cluster is by configuring it within the same VPC where the MemoryDB cluster was launched.

To create a Lambda function, add the following code in the template.yaml file in the Resources section:

HelloWorldFunction:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: HelloWorldFunction
      Handler: helloworld.App::handleRequest
      Runtime: java8
      MemorySize: 512
      Timeout: 900 #seconds
      Events:
        HelloWorld:
          Type: Api
          Properties:
            Path: /hello
            Method: get
      VpcConfig:
        SecurityGroupIds:
          - !GetAtt lambdaSG.GroupId
        SubnetIds:
          - !GetAtt privateSubnetA.SubnetId
          - !GetAtt privateSubnetB.SubnetId
      Environment:
        Variables:
          ClusterAddress: !GetAtt memoryDBCluster.ClusterEndpoint.Address

Java code to access MemoryDB

  1. In your Java class, connect to Redis using the Jedis client:
    HostAndPort hostAndPort = new HostAndPort(System.getenv("ClusterAddress"), 6379);
    JedisCluster jedisCluster = new JedisCluster(Collections.singleton(hostAndPort), 5000, 5000, 2, null, null, new GenericObjectPoolConfig(), true);
  2. You can now perform set and get operations on Redis as follows:
    jedisCluster.set("test", "value");
    jedisCluster.get("test");

JedisCluster maintains its own pool of connections and takes care of connection teardown. But you can also customize the configuration for closing idle connections using the GenericObjectPoolConfig object.

Clean Up

To delete the entire stack, run the command “sam delete”.

Conclusion

In this post, you learn how to provision a MemoryDB cluster and access it using Lambda. MemoryDB is suitable for applications requiring microsecond reads and single-digit millisecond writes along with durable storage. Accessing MemoryDB through Lambda and API Gateway reduces the need to provision and maintain servers.

For more serverless learning resources, visit Serverless Land.

Let’s Architect! Architecting with custom chips and accelerators

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-custom-chips-and-accelerators/

It’s hard to imagine a world without computer chips. They are at the heart of the devices that we use to work and play every day. Currently, Amazon Web Services (AWS) is offering customers the next generation of computer chips, with lower cost, higher performance, and a reduced carbon footprint.

This edition of Let’s Architect! focuses on custom computer chips, accelerators, and technologies developed by AWS, such as the AWS Nitro System, custom-designed Arm-based AWS Graviton processors that support data-intensive workloads, and the AWS Trainium and AWS Inferentia chips optimized for machine learning training and inference, respectively.

In this post, we discuss these new AWS technologies, their main characteristics, and how to take advantage of them in your architecture.

Deliver high performance ML inference with AWS Inferentia

As deep learning models become increasingly large and complex, the cost of training these models increases, as does the inference time for serving them.

With AWS Inferentia, machine learning practitioners can deploy complex neural-network models that are built and trained on popular frameworks, such as Tensorflow, PyTorch, and MXNet on AWS Inferentia-based Amazon EC2 Inf1 instances.

This video introduces you to the main concepts of AWS Inferentia, a service designed to reduce both cost and latency for inference. To speed up inference, AWS Inferentia shares a model across multiple chips, places the pieces inside the on-chip cache, and then streams the data through a pipeline for low-latency predictions.

The presenters walk through the structure of the chip and software considerations, and share anecdotes from the Amazon Alexa team, which uses AWS Inferentia to serve predictions. If you want to learn more about high throughput coupled with low latency, explore Achieve 12x higher throughput and lowest latency for PyTorch Natural Language Processing applications out-of-the-box on AWS Inferentia on the AWS Machine Learning Blog.

AWS Inferentia shares a model across different chips to speed up inference

AWS Lambda Functions Powered by AWS Graviton2 Processor – Run Your Functions on Arm and Get Up to 34% Better Price Performance

AWS Lambda is a serverless, event-driven compute service that enables code to run from virtually any type of application or backend service, without provisioning or managing servers. Lambda uses a high-availability compute infrastructure and performs all of the administration of the compute resources, including server- and operating-system maintenance, capacity-provisioning, and automatic scaling and logging.

AWS Graviton processors are designed to deliver the best price performance for cloud workloads. AWS Graviton3 processors are the latest in the AWS Graviton processor family and provide up to 25% higher compute performance, two times higher floating-point performance, and two times faster cryptographic workload performance compared with AWS Graviton2 processors. You can migrate AWS Lambda functions to Graviton2 in minutes, plus get as much as 19% improved performance at approximately 20% lower cost (compared with x86).

Comparison between x86 and Arm/Graviton2 results for the AWS Lambda function computing prime numbers

Powering next-gen Amazon EC2: Deep dive on the Nitro System

The AWS Nitro System is a collection of building-block technologies that includes AWS-built hardware offload and security components. It is powering the next generation of Amazon EC2 instances, with a broadening selection of compute, storage, memory, and networking options.

In this session, dive deep into the Nitro System: review its design and architecture, explore new innovations to the Nitro platform, and understand how it allows for faster innovation and increased security while reducing costs.

Traditionally, hypervisors protect the physical hardware and BIOS; virtualize the CPU, storage, and networking; and provide a rich set of management capabilities. With the AWS Nitro System, AWS breaks those functions apart and offloads them to dedicated hardware and software.

AWS Nitro System separates functions and offloads them to dedicated hardware and software, in place of a traditional hypervisor

How Amazon migrated a large ecommerce platform to AWS Graviton

In this re:Invent 2021 session, we learn about the benefits Amazon’s ecommerce Datapath platform has realized with AWS Graviton.

With a range of 25%-40% performance gains across 53,000 Amazon EC2 instances worldwide for Prime Day 2021, the Datapath team is lowering their internal costs with AWS Graviton’s improved price performance. Explore the software updates that were required to achieve this and the testing approach used to optimize and validate the deployments. Finally, learn about the Datapath team’s migration approach that was used for their production deployment.

AWS Graviton2: core components

See you next time!

Thanks for exploring custom computer chips, accelerators, and technologies developed by AWS. Join us in a couple of weeks when we talk more about architectures and the daily challenges faced while working with distributed systems.

Other posts in this series

Looking for more architecture content?

AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!