Tag Archives: Events

How to set up an Amazon QuickSight dashboard for Amazon Pinpoint and Amazon SES engagement events

Post Syndicated from satyaso original https://aws.amazon.com/blogs/messaging-and-targeting/how-to-set-up-amazon-quicksight-dashboard-for-amazon-pinpoint-and-amazon-ses-events/

In this post, we will walk through using Amazon Pinpoint and Amazon QuickSight to create customizable messaging campaign reports. Amazon Pinpoint is a flexible and scalable outbound and inbound marketing communications service that allows customers to connect with users over channels like email, SMS, push, or voice. Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud. This solution allows event and user data from Amazon Pinpoint to flow into Amazon QuickSight. Once in QuickSight, customers can build their own reports that show campaign performance on a more granular level.

Engagement Event Dashboard

Customers want to view the results of their messaging campaigns in ever-increasing levels of granularity and ensure their users see value from the email, SMS, or push notifications they receive. Customers also want to analyze how different user segments respond to different messages, and how to optimize subsequent user communication. Previously, customers could only view this data in Amazon Pinpoint analytics, which offers robust reporting on events, funnels, and campaigns. However, it does not allow analysis across these different parameters or the building of custom reports, for example, showing campaign revenue across different user segments, or showing what events were generated after a user viewed a campaign in a funnel analysis. Customers would need to extract this data themselves and do the analysis in Excel.

Prerequisites

  • The Digital User Engagement Events Database solution must be set up first.
  • Customers should be prepared to pay for Amazon QuickSight, which has its own costs that are not covered by Amazon Pinpoint pricing.

Solution Overview

This solution uses the Athena tables created by the Digital User Engagement Events Database solution. The AWS CloudFormation template provided in this post automatically sets up the different architecture components to capture detailed notifications about Amazon Pinpoint engagement events and expose them in Amazon Athena in the form of Athena views. You still need to manually configure Amazon QuickSight dashboards to link to these newly generated Athena views. Follow the steps below in order.

Use case(s)

The event dashboard solution supports the following use cases:

  • Deep dives into engagement insights (e.g., SMS events, email events, campaign events, journey events).
  • The ability to view engagement events at the individual user level.
  • Data and process mining that turns raw event data into useful marketing insights.
  • User engagement benchmarking and end-user event funneling.
  • Computing campaign conversions (post-campaign user analysis to show campaign effectiveness).
  • Building funnels that show user progression.

Getting started with solution deployment

Prerequisite tasks to be completed before deploying the logging solution

Step 1 – Create an AWS account and a Pinpoint project, and implement the Event Database solution.
As part of this step, customers need to implement the DUE Events Database solution, because the current solution (the DUE event dashboard) is an extension of it. The basic assumption here is that the customer has already configured an Amazon Pinpoint project or Amazon SES within the required AWS Region before implementing this step.

The steps required to implement an event dashboard solution are as follows.

a. Follow the steps in the Event Database solution to implement the complete stack. Before installing the complete stack, copy and save the Athena events database name as shown in the diagram (in my case it is due_eventdb). The database name is required as an input parameter for the current Event Dashboard solution.

b. Once the solution is deployed, navigate to the Outputs tab of the CloudFormation stack, then copy and save the following information, which will be required as input parameters in Step 2 of the current Event Dashboard solution.

Step 2 – Deploy the CloudFormation template for the Event Dashboard solution
This step generates a number of new Amazon Athena views that will serve as a data source for Amazon QuickSight. Continue with the following actions.

  • Download the CloudFormation template (“Event-dashboard.yaml”) from AWS samples.
  • Navigate to the CloudFormation page in the AWS console, click “Create stack” in the upper right, and select the option “With new resources (standard)”.
  • Leave “Prerequisite – Prepare template” set to “Template is ready” and, for the “Specify template” option, select “Upload a template file”. On the same page, click “Choose file”, browse to the “Event-dashboard.yaml” file, and select it. Once the file is uploaded, click “Next” and deploy the stack.

  • Enter the following information under the section “Specify stack details” (a CLI sketch of the same deployment follows this list):
    • EventAthenaDatabaseName – As mentioned in Step 1-a.
    • S3DataLogBucket – As mentioned in Step 1-b.
    • This solution will create five additional Athena views:
      • All_email_events
      • All_SMS_events
      • All_custom_events (Custom events can be Mobile app/WebApp/Push Events)
      • All_campaign_events
      • All_journey_events
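
If you prefer the command line, the same stack can be deployed with the AWS CLI. This is a minimal sketch, assuming the downloaded template and the parameter values gathered in Step 1 (the stack name and bucket value are examples; add --capabilities CAPABILITY_IAM if the template creates IAM resources):

# Deploy the Event Dashboard stack with its two input parameters.
aws cloudformation create-stack \
  --stack-name pinpoint-event-dashboard \
  --template-body file://Event-dashboard.yaml \
  --parameters ParameterKey=EventAthenaDatabaseName,ParameterValue=due_eventdb \
               ParameterKey=S3DataLogBucket,ParameterValue=my-due-event-log-bucket

# Wait until the stack, and therefore the Athena views, are ready.
aws cloudformation wait stack-create-complete --stack-name pinpoint-event-dashboard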

Step 3 – Create the Amazon QuickSight engagement dashboard
This step walks you through the process of creating an Amazon QuickSight dashboard for Amazon Pinpoint engagement events using the Athena views you created in Step 2.

  1. To set up Amazon QuickSight for the first time, please follow this link (this step is not needed if you have already set up Amazon QuickSight). Make sure you are an Amazon QuickSight administrator.
  2. Search for Amazon QuickSight in the AWS console.
  3. Create a new analysis and then select “New dataset”.
  4. Select Athena as the data source.
  5. Next, select which analyses you need for the respective events. This solution provides the option to create five different sets of analyses, as mentioned in Step 2: a. All email events, b. All SMS events, c. All custom events (mobile/web app, web push, etc.), d. All campaign events, e. All journey events. Dashboards can be created from a QuickSight analysis, and the same dashboard can be shared among the organization’s stakeholders. The following are the steps to create analyses and dashboards for the different types of events.
  6. Email Events –
    • For all email events, name the analysis “All-emails-events” (this can be any customer-preferred nomenclature), select the primary Athena workgroup, and then create the data source.
    • Once you create the data source, QuickSight lists all the views and tables available under the specified database (in our case, due_eventdb). Select the All_email_events view as the data source (a quick CLI check of these views is sketched after this list).
    • Select the event data location for analysis. There are two options available: a. Import to SPICE for quicker analysis, or b. Directly query your data. Select the preferred option and then click “Visualize the data”.
    • Import to SPICE for quicker analysis – SPICE is the Amazon QuickSight Super-fast, Parallel, In-memory Calculation Engine. It’s engineered to rapidly perform advanced calculations and serve data. In Enterprise edition, data stored in SPICE is encrypted at rest. (1 GB of storage is available for free; customers pay for extra storage. Please refer to the cost section of this document.)
    • Directly query your data – With this option, QuickSight queries Athena (or the source database; in the current case it is Athena) directly and does not store any data.
    • Now that you have selected a data source, you will be taken to a blank QuickSight canvas (blank analysis page) as shown in the following image. Drag and drop the visualization type you need onto the auto-graph pane. Note that Amazon QuickSight is a business intelligence platform, so customers are free to choose the desired visualization types to observe the individual engagement events.
    • As part of this blog, we show how to create some simple analysis graphs to visualize the engagement events.
    • As an initial step, select the tabular visualization as shown in the image.
    • Select all the event dimensions that you want to include as table columns. An Amazon QuickSight table can be extended to show many columns; how much data to visualize depends entirely on the marketers’ business requirements.
    • Further filtering on the table can be done using QuickSight filters, applied to specific granular values. For example, to filter on the destination email ID: 1. Select the filter from the left-hand menu. 2. Add the destination field as the filtering criterion. 3. Tick the destination value you are filtering on, or search for the destination email ID. 4. All the results in the table are then filtered per the filter criterion.
    • Next, add another visual from the top left corner (“Add -> Add Visual”), then select the donut chart from the visual types pane. Donut charts are well suited for displaying aggregations.
    • Then select “event_type” as the group to visualize the aggregated events. This helps marketers and business users figure out how many email events occurred and what the aggregated success ratio, click ratio, complaint ratio, bounce ratio, etc. were for the emails or campaigns sent to end users.
    • To create a QuickSight dashboard from the QuickSight analysis, click the Share menu option at the top right corner, then select “Publish dashboard” and provide the required dashboard name while publishing. The same dashboard can be shared with multiple audiences in the organization.
    • The following is the final version of the dashboard. As mentioned above, QuickSight dashboards can be shared with other stakeholders, and the complete dashboard can be exported as an Excel sheet.
  7. SMS Events –
    • As shown above, SMS events can be analyzed using QuickSight, and dashboards can be created from the analysis. Repeat all of the sub-steps listed in step 6. The following is a sample SMS dashboard.
  8. Custom Events –
    • After you integrate your application (app) with Amazon Pinpoint, Amazon Pinpoint can stream event data about user activity, different types of custom events, and message deliveries for the app (e.g., _session.start, Product_page_view, _session.stop). Repeat all of the sub-steps listed in step 6 to create custom event dashboards.
  9. Campaign events
    • As shown before, campaign events can also be included in the same dashboard, or you can create a new dashboard only for campaign events.
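
Before building each analysis, you can sanity-check any of the Athena views created in Step 2 directly from the AWS CLI. This is a hedged sketch; the database name comes from Step 1-a, and the results bucket is a placeholder you must replace with your own:

# Count email events per event_type in the All_email_events view.
aws athena start-query-execution \
  --work-group primary \
  --query-execution-context Database=due_eventdb \
  --result-configuration OutputLocation=s3://my-athena-results-bucket/ \
  --query-string "SELECT event_type, COUNT(*) AS events FROM All_email_events GROUP BY event_type"

# Fetch the results once the query finishes, using the returned QueryExecutionId.
aws athena get-query-results --query-execution-id <QueryExecutionId>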

Cost for Event dashboard solution
You are responsible for the cost of the AWS services used while running this solution. As of the date of publication, the cost for running this solution with default settings in the US West (Oregon) Region is approximately $65 a month. The cost estimate includes the cost of AWS Lambda, Amazon Athena, and Amazon QuickSight. The estimate assumes querying 1 TB of data per month, two authors managing Amazon QuickSight every month, four Amazon QuickSight readers viewing the events dashboard an unlimited number of times per month, and 50 GB of QuickSight SPICE capacity per month. Prices are subject to change. For full details, see the pricing webpage for each AWS service you will be using in this solution.

Clean up

When you’re done with this exercise, complete the following steps to delete your resources and stop incurring costs:

  1. On the CloudFormation console, select your stack and choose Delete. This cleans up all the resources created by the stack.
  2. Delete the Amazon QuickSight dashboards and datasets that you have created.

Conclusion

In this blog post, I have demonstrated how marketers, business users, and business analysts can use Amazon QuickSight dashboards to evaluate and act on user engagement data from Amazon SES and Pinpoint event streams. Customers can also use this solution to understand how Amazon Pinpoint campaigns lead to business conversions, in addition to analyzing multi-channel communication metrics at the individual user level.

Next steps

The personas for this blog are both the tech team and the marketing analyst team, as it involves a code deployment to create very simple Athena views, as well as the steps to create an Amazon QuickSight dashboard to analyze Amazon SES and Amazon Pinpoint engagement events at the individual user level. Customers may then create their own Amazon QuickSight dashboards to illustrate conversion ratios and propensity trends in real time by integrating campaign events with app-level events such as purchase conversions, order placements, and so on.

Extending the solution

You can download the AWS CloudFormation templates and code for this solution from our public GitHub repository and modify them to fit your needs.


About the Author


Satyasovan Tripathy works at Amazon Web Services as a Senior Specialist Solution Architect. He is based in Bengaluru, India, and specializes in the AWS Digital User Engagement product portfolio. He likes reading and travelling outside of work.

New – Site-to-Site Connectivity with AWS Direct Connect SiteLink

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-site-to-site-connectivity-with-aws-direct-connect-sitelink/

We are launching AWS Direct Connect SiteLink, a new capability of AWS Direct Connect that lets you create connections between your on-premises networks through the AWS global network backbone.

Until today, when you needed direct connectivity between your data centers or branch offices, you had to rely on the public internet or expensive and hard-to-deploy fixed networks. These are geographically constrained and can be tied to long-term contracts. This rigidity becomes a pain point as you expand your business globally. In turn, you’re required to create custom workarounds to interconnect networks from different providers, which increases your operating costs.

Starting today, you may connect your sites through Direct Connect locations, without sending your traffic through an AWS Region. We have 108 Direct Connect locations available in 32 countries as I write this post, located across Africa, the Americas, Asia-Pacific, Europe, and the Middle East. Traffic flows from one Direct Connect location to another following the shortest possible path. You no longer need to connect through the closest AWS Region, nor manage and configure an AWS Transit Gateway, for site-to-site network connectivity.

You can take advantage of Direct Connect’s reliability and global footprint to build a network that grows with your business, with no long-term contracts, flexible pay-as-you-go pricing, and a wide range of port speeds, from 50 Mbps to 100 Gbps. SiteLink also integrates with other AWS services, letting you reach your VPCs, other AWS services, and your on-premises networks from your Direct Connect connections.

When talking about network topology, a small diagram is always more descriptive than long phrases.

The following diagram shows the way that you use Direct Connect today. Direct Connect is currently optimized to let you reach your AWS resources running in any Region as quickly as possible. Sending data from one Direct Connect location to another is not possible.

Once you connect your locations (NY1, AM3, Paris, and TY2 in the diagram) to a Direct Connect gateway, those connections can reach any AWS Region (except the two AWS China Regions). No peering between Regions is necessary, because Direct Connect gateways are global resources.

Site-to-site connectivity without SiteLink

The following diagram shows how you connect multiple sites using SiteLink. The data flows between Direct Connect locations without going through an AWS Region.

Site-to-site connectivity with SiteLink

How to Get Started?
Configuring these connections is very similar to what you do today. The first step is to connect your network to Direct Connect locations. After that, SiteLink can be enabled or disabled in minutes.

Using the AWS Management Console, I navigate to the Direct Connect section, and I select Create virtual interface. Under the Additional Settings section, I make sure the SiteLink switch is turned on. I repeat this on another virtual interface, once per site that I want to connect.

SiteLink - enable sitelink for VIF
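
The same setting is exposed through the Direct Connect API, so SiteLink can also be toggled from the AWS CLI. A hedged sketch, with a placeholder virtual interface ID:

# Enable SiteLink on an existing virtual interface.
aws directconnect update-virtual-interface-attributes \
  --virtual-interface-id dxvif-ffabcd12 \
  --enable-site-link

# Verify the change.
aws directconnect describe-virtual-interfaces --virtual-interface-id dxvif-ffabcd12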

I have access to similar monitoring dashboards and metrics published to CloudWatch. I select my virtual interface, and then navigate to the Monitoring tab (hopefully your VIF will have more data available than mine, which was created just for this post).

SiteLink VIF Monitoring

Availability and Pricing
You can connect your on-premises networks or branch offices to any of our Direct Connect locations available today, except in China.

Pricing is pay-as-you-go, with no commitment or recurring fees. In addition to existing Direct Connect charges, your monthly bill will include a price-per-hour for SiteLink virtual interfaces, as well as the cost of SiteLink data transfer. Check the pricing page to get the details.

Go ahead and start connecting your on-premises locations together with Direct Connect SiteLink!

— seb

New DynamoDB Table Class – Save Up To 60% in Your DynamoDB Costs

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/new-dynamodb-table-class-save-up-to-60-in-your-dynamodb-costs/

Today we are announcing Amazon DynamoDB Standard-Infrequent Access (DynamoDB Standard-IA), a new table class for DynamoDB that reduces storage costs by 60 percent compared to existing DynamoDB Standard tables and delivers the same performance, durability, and scaling.

Nowadays, many customers are moving their infrequently accessed data between DynamoDB and Amazon Simple Storage Service (Amazon S3). This means that customers are developing a process to migrate the data and build complex applications that must support two different APIs—one for DynamoDB and another for Amazon S3.

DynamoDB Standard-IA table class is designed for customers who want a cost-optimized solution for storing infrequently accessed data in DynamoDB without changing any application code. Using this new table class, you get the single-digit millisecond read and write performance from DynamoDB and use all of the same APIs.

When you use DynamoDB Standard-IA table class, you will save up to 60 percent in storage costs as compared to using the DynamoDB Standard table class. However, DynamoDB reads and writes for this new table class are priced higher than the Standard tables. Therefore, it is important to understand your use cases before applying this new table class to your tables.

DynamoDB Standard-IA is a great solution if you must store terabytes of data for several years where the data must be highly available, but it is not frequently accessed. An example is social media applications where end users rarely access their old posts. However, these posts remain stored, because if someone scrolls on a profile to see an old photo from 2009, they should be able to retrieve it as fast as if it was a newer post.

E-commerce sites are another good use case. These sites might have a lot of products that are not frequently accessed, but administrators of the site still want to have them available in their store just in case someone wants to buy them. Furthermore, this is a good solution for storing a customer’s previous orders. DynamoDB Standard-IA table offers the ability to retain historical orders at a lower cost.

Get started using DynamoDB Standard-IA
Get started using DynamoDB Standard-IA by evaluating the best class for your existing tables.

Go to the table page and select Update the table class in the Actions dropdown. Then, choose the new table class and save the changes. You can switch an existing table between the Standard-IA and Standard table classes twice every 30 days with no impact on performance or availability. All of the features of DynamoDB are available when using a table in the Standard-IA table class.

Moreover, you can also create a new table with the DynamoDB Standard-IA table class.

Update table class
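
The table class can also be set from the AWS CLI. A minimal sketch, with hypothetical table names:

# Switch an existing table to the Standard-IA table class.
aws dynamodb update-table \
  --table-name Orders \
  --table-class STANDARD_INFREQUENT_ACCESS

# Or create a new table directly in the Standard-IA table class.
aws dynamodb create-table \
  --table-name OrderArchive \
  --attribute-definitions AttributeName=OrderId,AttributeType=S \
  --key-schema AttributeName=OrderId,KeyType=HASH \
  --billing-mode PAY_PER_REQUEST \
  --table-class STANDARD_INFREQUENT_ACCESS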

Availability and Pricing
DynamoDB Standard-IA is available in all of the AWS Regions, except the China Regions and AWS GovCloud.

For example, DynamoDB Standard-IA storage pricing in US East (N. Virginia) is now $0.10 per GB (60 percent less than DynamoDB Standard), while reads and writes are 25 percent higher.

For more information about this feature and its pricing, see the DynamoDB Standard-IA Feature page and the DynamoDB pricing page.

Marcia

New – Amazon DevOps Guru for RDS to Detect, Diagnose, and Resolve Amazon Aurora-Related Issues using ML

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/new-amazon-devops-guru-for-rds-to-detect-diagnose-and-resolve-amazon-aurora-related-issues-using-ml/

Today we are announcing Amazon DevOps Guru for RDS, a new capability for Amazon DevOps Guru. It allows developers to easily detect, diagnose, and resolve performance and operational issues in Amazon Aurora.

Hundreds of thousands of customers nowadays are using Amazon Aurora because it is highly available, scalable, and durable. But as applications grow in size and complexity, it becomes more challenging for these customers to detect and resolve operational and performance issues quickly.

During last year’s re:Invent, we announced DevOps Guru, a service that uses machine learning (ML) to automatically detect and alert customers of application issues, including database problems. Today we are announcing DevOps Guru for RDS to help developers using Amazon Aurora databases detect, diagnose, and resolve database performance issues quickly and at scale. Now developers will have enough information to determine the exact cause of a database performance issue. This launch will save developers and engineers many hours of work trying to uncover and remediate performance-related database issues.

DevOps Guru for RDS uses ML to automatically identify and analyze a wide range of performance-related database issues, such as over-utilization of host resources, database bottlenecks, or misbehavior of SQL queries. It also recommends solutions to remediate the issues it finds. To use this capability, you don’t need to be a database or ML expert.

When an issue is detected, DevOps Guru for RDS displays the finding in the DevOps Guru console and sends notifications using Amazon EventBridge or Amazon Simple Notification Service (SNS). This allows developers to automatically manage and take real-time action on the issues.

How DevOps Guru for RDS Works
DevOps Guru for RDS uses anomaly detection on the database load (DB load) performance metric to detect issues. DB load is measured in units of average active sessions (AAS). DB load measures the level of activity in your database, making it a great metric for understanding the health of your database. If the DB load is high, this can result in performance issues. This metric can be compared to the number of virtual CPUs (vCPUs): if the DB load is higher than that number, issues can arise.
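
When Performance Insights is enabled, DB load is also published to Amazon CloudWatch, so you can inspect the metric yourself. A hedged sketch, with a placeholder instance identifier and time window:

# Average DB load (in AAS) over one hour, in 5-minute buckets.
aws cloudwatch get-metric-statistics \
  --namespace AWS/RDS \
  --metric-name DBLoad \
  --dimensions Name=DBInstanceIdentifier,Value=my-aurora-instance \
  --start-time 2021-12-01T10:00:00Z \
  --end-time 2021-12-01T11:00:00Z \
  --period 300 \
  --statistics Average

# Sustained values above the instance vCPU count indicate the kind of
# condition that DevOps Guru for RDS flags.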

The most useful dimensions for this metric are the wait events and the top SQL. A wait event describes the condition that a currently running SQL statement is waiting on; the most common reasons a statement waits are the CPU, a read or write, or a locked resource. The top SQL dimension shows which queries are contributing the most to DB load.

The following image is an example of a finding that DevOps Guru for RDS reported. The graph shows that most of the average active sessions were waiting for access to a table or for CPU.

Example of anomaly detection
If you continue scrolling on the DevOps Guru for RDS analysis page, you can discover the cause for the problem and some recommendations to fix it. In this particular example, two problems were detected: high-load wait events and CPU capacity exceeded.

DevOps Guru for RDS looks more in-depth into these problems. First, it looks at the high-load wait events, where there were 27 AAS for the IO and CPU wait types, which is 99 percent of the total DB load.

Second, it tells us that the number of running tasks exceeded six processes. This database only has two vCPUs, so the recommended number of running processes is a maximum of four (2x vCPUs). DevOps Guru for RDS also makes recommendations to fix these issues.

Recommendations

In another anomaly, the graph shows that there was a high load of wait events, and one SQL query was found to require further investigation. You can even see the exact SQL query if you click on the SQL digest IDs. The insight’s analysis and recommendation section is full of information on how to investigate further and fix the issue. You can get a lot of detailed information by clicking on the wait event, for example, on the wait event wait/io/table/sql/handler or in the View troubleshooting doc link.

Analysis and recommendations

Get started with DevOps Guru for RDS
To get started with this new capability of DevOps Guru, make sure that Performance Insights is enabled for your Amazon Aurora DB instances. DevOps Guru for RDS supports Amazon Aurora with MySQL and PostgreSQL compatibility. For instructions on how to enable Performance Insights, see Enabling and disabling Performance Insights.
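
Performance Insights can also be enabled from the AWS CLI. A minimal sketch, assuming an existing Aurora DB instance (the identifier is a placeholder):

# Turn on Performance Insights for an Aurora DB instance.
aws rds modify-db-instance \
  --db-instance-identifier my-aurora-instance \
  --enable-performance-insights \
  --apply-immediately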

The next step is to enable DevOps Guru to start monitoring your AWS resources. You can specify the resources you want to be covered by DevOps Guru.

If you are already using DevOps Guru, whenever there is a new insight for an Amazon Aurora database resource, you will see it in the console.

To see the detailed database analysis, navigate to the Insight page and select the new View analysis button under the DB load aggregated metric. That button will take you to the detailed analysis by DevOps Guru for RDS.

View analysis

Pricing and Availability
DevOps Guru for RDS is offered to customers at no additional charge, as part of the existing price that DevOps Guru charges customers for RDS resources.

DevOps Guru for RDS is available in all Regions where DevOps Guru is available: US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm).

Learn more about DevOps Guru for RDS and check out the talk at AWS re:Invent “Automatically detect and resolve performance issues with Amazon DevOps Guru for RDS” (Session Id 15877).

Marcia

Enhanced Amazon S3 Integration for Amazon FSx for Lustre

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/enhanced-amazon-s3-integration-for-amazon-fsx-for-lustre/

Today, we are announcing two additional capabilities of Amazon FSx for Lustre. First, a full bi-directional synchronization of your file systems with Amazon Simple Storage Service (Amazon S3), including deleted files and objects. Second, the ability to synchronize your file systems with multiple S3 buckets or prefixes.

Lustre is a large-scale, distributed, parallel file system powering the workloads of most of the largest supercomputers. It is popular among AWS customers for high-performance computing workloads, such as meteorology, life sciences, and engineering simulations. It is also used in media and entertainment, as well as the financial services industry.

I had my first hands-on experience with Lustre file systems when I was working for Sun Microsystems. I was a pre-sales engineer and worked on some deals to sell multimillion-dollar compute and storage infrastructure to financial services companies. Back then, having access to a Lustre file system was a luxury. It required expensive compute, storage, and network hardware. We had to wait weeks for delivery. Furthermore, it required days to install and configure a cluster.

Fast forward to 2021: I can create a petabyte-scale Lustre cluster and attach the file system to compute resources running in the AWS cloud, on demand, and only pay for what I use. There is no need to know about Storage Area Networks (SAN), Fibre Channel (FC) fabric, or other underlying technologies.

Modern applications use different storage options for different workloads. It is common to use S3 object storage for data transformation, preparation, or import/export tasks. Other workloads may require POSIX file systems to access the data. FSx for Lustre lets you synchronize objects stored on S3 with the Lustre file system to meet these requirements.

When you link your S3 bucket to your file system, FSx for Lustre transparently presents S3 objects as files and lets you write results back to S3.

Full Bi-Directional Synchronization with Multiple S3 Buckets
If your workloads require fast, POSIX-compliant file system access to your S3 buckets, then you can use FSx for Lustre to link your S3 buckets to a file system and keep data synchronized between the file system and S3 in both directions. However, until today, there were a few limitations. First, you had to manually configure a task to export data back from FSx for Lustre to S3. Second, deleted files on S3 were not automatically deleted from the file system. And third, an FSx for Lustre file system could be synchronized with only one S3 bucket. We are addressing these three challenges with this launch.

Starting today, when you configure an automatic export policy for your data repository association, files on your FSx for Lustre file system are automatically exported to your data repository on S3. Next, deleted objects on S3 are now deleted from the FSx for Lustre file system. The opposite is also available: deleting files on FSx for Lustre triggers the deletion of corresponding objects on S3. Finally, you may now synchronize your FSx for Lustre file system with multiple S3 buckets. Each bucket has a different path at the root of your Lustre file system. For example, your S3 bucket logs may be mapped to /fsx/logs and your other financial_data bucket may be mapped to /fsx/finance.

These new capabilities are useful when you must concurrently process data in S3 buckets using both a file-based and an object-based workflow, as well as share results in near real time between these workflows. For example, an application that accesses file data can do so by using an FSx for Lustre file system linked to your S3 bucket, while another application running on Amazon EMR may process the same files from S3.

Moreover, you may link multiple S3 buckets or prefixes to a single FSx for Lustre file system, thereby enabling a unified view across multiple datasets. Now you can create a single FSx for Lustre file system and easily link multiple S3 data repositories (S3 buckets or prefixes). This is convenient when you use multiple S3 buckets or prefixes to organize and manage access to your data lake, access files from a public S3 bucket (such as these hundreds of public datasets) and write job outputs to a different S3 bucket, or when you want to use a larger FSx for Lustre file system linked to multiple S3 datasets to achieve greater scale-out performance.

How It Works
Let’s create an FSx for Lustre file system and attach it to an Amazon Elastic Compute Cloud (Amazon EC2) instance. I make sure that the file system and instance are in the same VPC subnet to minimize data transfer costs. The file system security group must authorize access from the instance.

I open the AWS Management Console, navigate to FSx, and select Create file system. Then, I select Amazon FSx for Lustre. I am not going through all of the options to create a file system here; you can refer to the documentation to learn how to create a file system. I make sure that Import data from and export data to S3 is selected.

Lustre - enable S3 synchronization

It takes a few minutes to create the file system. Once the status is ✅ Available, I navigate to the Data repository tab, and then select Create data repository association.

I choose a Data Repository path (my source S3 bucket) and a file system path (where in the file system that bucket will be imported).

FsX Lustre Data repository

Then, I choose the Import policy and Export policy. I may synchronize the creation of files/objects, their updates, and their deletion. I select Create.

FsX Lustre Data repository import policies

When I use automatic import, I also make sure to provide an S3 bucket in the same AWS Region as the FSx for Lustre cluster. FSx for Lustre supports linking to an S3 bucket in a different AWS Region for automatic export and all other capabilities.
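
The console steps above can also be scripted. This is a hedged CLI sketch of creating a data repository association with automatic import and export of new, changed, and deleted objects; the file system ID and bucket name are placeholders:

# Link s3://my-logs-bucket to /fsx/logs with bi-directional synchronization.
aws fsx create-data-repository-association \
  --file-system-id fs-0123456789abcdef0 \
  --file-system-path /fsx/logs \
  --data-repository-path s3://my-logs-bucket \
  --s3 '{"AutoImportPolicy":{"Events":["NEW","CHANGED","DELETED"]},"AutoExportPolicy":{"Events":["NEW","CHANGED","DELETED"]}}'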

Using the console, I see the list of Data repository associations. I wait for the import task status to become ✅ Succeeded. If I link the file system to an S3 bucket with a large number of objects, then I may choose to skip Importing metadata from repository while creating the data repository association, and then load metadata from selected prefixes in my S3 buckets that are required for my workload using an Import task.

FsX for lustre - meta data repository tasks

I create an EC2 instance in the same VPC subnet. Furthermore, I make sure that the FSx for Lustre cluster security group authorizes ingress traffic from the EC2 instance. I use SSH to connect to the instance, and then type the following commands (commands are prefixed with the $ sign that is part of my shell prompt).

# check kernel version, minimum version 4.14.104-95.84 is required 
$ uname -r
4.14.252-195.483.amzn2.aarch64

# install lustre client 
$ sudo amazon-linux-extras install -y lustre2.10
Installing lustre-client
...
Installed:
  lustre-client.aarch64 0:2.10.8-5.amzn2                                                                                                                        

Complete!

# create a mount point 
$ sudo mkdir /fsx

# mount the file system 
$ sudo mount -t lustre -o noatime,flock fs-00...9d.fsx.us-east-1.amazonaws.com@tcp:/ny345bmv /fsx

# verify mount succeeded
$ mount 
...
172.0.0.0@tcp:/ny345bmv on /fsx type lustre (rw,noatime,flock,lazystatfs)

Then, I verify that the file system contains the S3 objects, and I create a new file using the touch command.

Fsx Lustre - check file system

I switch to the AWS Console, under S3 and then my bucket name, and I verify that the file has been synchronized.

Fsx Lustre - check s3

Using the console, I delete the file from S3. And, unsurprisingly, after a few seconds, the file is also deleted from the FSx file system.

Fsx Lustre - check file systems - deleted

Pricing and Availability
These new capabilities are available at no additional cost on Amazon FSx for Lustre file systems. Automatic export and multiple repositories are only available on Persistent 2 file systems in US East (N. Virginia), US East (Ohio), US West (Oregon), Canada (Central), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland). Automatic import with support for deleted and moved objects in S3 is available on file systems created after July 23, 2020 in all regions where FSx for Lustre is available.

You can configure your file system to automatically import S3 updates by using the AWS Management Console, the AWS Command Line Interface (CLI), and AWS SDKs.

Learn more about using S3 data repositories with Amazon FSx for Lustre file systems.

One More Thing
One more thing while you are reading. Today, we also launched the next generation of FSx for Lustre file systems. FSx for Lustre next-gen file systems are built on AWS Graviton processors. They are designed to provide you with up to 5x higher throughput per terabyte (up to 1 GB/s per terabyte) and reduce your cost of throughput by up to 60% as compared to previous generation file systems. Give it a try today!

— seb

PS: my colleague Michael recorded a demo video to show you the enhanced S3 integration for FSx for Lustre in action. Check it out today.

Preview – AWS Backup Adds Support for Amazon S3

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/preview-aws-backup-adds-support-for-amazon-s3/

Starting today, you can preview AWS Backup for Amazon Simple Storage Service (Amazon S3).

AWS Backup is a fully managed, policy-based service that lets you centralize and automate the backup and restore of your applications spanning 12 AWS services: Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Block Store (EBS) volumes, Amazon Relational Database Service (RDS) databases (including Amazon Aurora clusters), Amazon DynamoDB tables, Amazon Neptune databases, Amazon DocumentDB (with MongoDB compatibility) databases, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Lustre file systems, Amazon FSx for Windows File Server file systems, AWS Storage Gateway volumes, and now Amazon S3 (in preview).

Modern workloads and systems leverage different storage options for different functionalities. In the 21st century, it is normal to build applications relying on non-relational and relational databases, shared file storage, and object storage, just to name a few. When operating and managing these applications, you told us that you wanted centralized protection and provable compliance for application data stored in S3 alongside other AWS services for storage, compute, and databases.

I can see three benefits of integrating Amazon Simple Storage Service (Amazon S3) with your data protection policies in AWS Backup.

First, it lets you centrally manage your application backups: AWS Backup provides an automated solution to centrally configure backup policies, thereby helping you simplify backup lifecycle management. This also makes it easy to ensure that your application data across AWS services (including S3) is centrally backed up.

Second, it lets you easily restore your data: AWS Backup provides a single-click-restore experience for your S3 data. This lets you perform point-in-time restores of your S3 buckets and objects to a new or existing S3 bucket.

Finally, it improves backup compliance: AWS Backup provides built-in dashboards that let you track backup and restore operations for S3.

AWS Backup for S3 (Preview) lets you create continuous point-in-time backups along with periodic backups of S3 buckets, including object data, object tags, access control lists (ACLs), and user-defined metadata. The first backup is a full snapshot, while subsequent backups are incremental. If there is a data disruption event, then you choose a backup from the backup vault, and restore an S3 bucket (or individual S3 objects) to a new or existing S3 bucket. AWS Backup is integrated with AWS Organizations, which lets you use a single policy across AWS accounts (within your organization) to automate backup creation and backup access management.

Furthermore, you can turn on AWS Backup Vault Lock to enable delete protection of the data that you protect with AWS Backup, thereby improving protection of your immutable backups from accidental deletion or malicious re-encryption.

How to Get Started
AWS Backup works with versioned S3 buckets. Before you get started, turn on S3 Versioning on the buckets you want to back up.
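
A minimal sketch of enabling versioning from the AWS CLI (the bucket name is a placeholder):

# Turn on S3 Versioning, a prerequisite for AWS Backup for S3.
aws s3api put-bucket-versioning \
  --bucket my-application-bucket \
  --versioning-configuration Status=Enabled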

I must enable S3 in AWS Backup Settings when I use this feature for the first time. Using the AWS Management Console, I navigate to AWS Backup, then select Settings and Configure resources. I enable S3, and select Confirm. This is a one-time operation.

AWS Backup - optin S3

For this demo, I already have an existing backup plan, and I want to add an S3 bucket to this plan. If you want to create a new backup plan, then you can refer to AWS Backup‘s technical documentation.

To start including my S3 objects in my backup plan, I open the AWS Management Console, navigate to Backup plans, and select Assign resources.

AWS Backup Add Resources

I give a name to my Resource assignment. I select Include specific resource types, then I select S3 as the Resource type and one or several S3 Bucket names. When I am done, I select Assign resources.

Alternatively, I may use tags or resource IDs to assign S3 resources.

If you have thousands of S3 buckets, I recommend using tags to assign the S3 buckets to a backup plan. AWS Backup matches the tags in S3 buckets to the ones assigned to the backup plan, and it centrally backs up the S3 resources along with other AWS services that your application uses.

The other options are not different from what you know already.

AWS Backup - backup plan for S3

The Bucket names list in the previous screenshot only shows the S3 buckets in the same Region.

Alternatively, I may also create on-demand backups. I navigate to the Protected resources section, and select Create on-demand backup.

I select S3 as the Resource type, and then select the Bucket name. As usual, I choose a Backup Window, a Retention period, a Backup vault, and an IAM role. Then, I select Create on-demand backup.

AWS Backup - on-demand backup for S3

After a while, depending on the size of my bucket, the backup is ✅ Completed.

AWS Backup for S3 - Backup completed
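
On-demand backups can also be started programmatically. A hedged sketch, with placeholder bucket, account, vault, and role values:

# Start an on-demand backup of an S3 bucket into an existing backup vault.
aws backup start-backup-job \
  --backup-vault-name Default \
  --resource-arn arn:aws:s3:::my-application-bucket \
  --iam-role-arn arn:aws:iam::123456789012:role/service-role/AWSBackupDefaultServiceRole

# Check progress with the returned BackupJobId.
aws backup describe-backup-job --backup-job-id <BackupJobId>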

All of the backups are encrypted and stored securely in a backup vault that I selected in the backup plan.

A backup vault (or backup storage vault) is an encrypted logical construct in my AWS account that stores and organizes my backups (recovery points). I may create new backup vaults in every AWS Region where AWS Backup is available. I may enable AWS Backup Vault Lock (delete-protection capability) on the backup vault to avoid accidental deletions and prevent malicious actors from re-encrypting my data. AWS Backup stores my continuous backups and periodic snapshots in the backup vault of my preference, and it lets me browse and restore as per my requirements.

How to Restore Objects
Let’s try to restore this backup.

The restore operation is very flexible. I may restore entire S3 buckets or individual S3 objects. I may restore the backups to the source S3 bucket, or to another existing bucket. Furthermore, I may create a new S3 bucket during restore. The S3 buckets must have Versioning enabled. Also, I may change the encryption key during restore.

I navigate to Backup vaults to restore the S3 bucket I just backed up. In the Backups section, I select the Recovery point ID that I want to restore, and I select Restore from the Actions menu.

AWS Backup for S3 - restore

Before starting the restore, I may select a few options:

  • The Restore time: I may restore my continuous backup to a point-in-time in the last 35 days, while I can restore my periodic backups to their original state.
  • The Restore type: I may choose to restore the entire bucket or a subset of objects within it.
  • The Restore destination: I may choose to restore on the same bucket, on another one, or create a new bucket during restore.
  • The Restored object encryption: this lets me select the key I want to use to encrypt the restored objects in the bucket.

I select Restore backup to start the restore.

AWS Backup for S3 - restore options

I can monitor the progress in the Jobs section, under the Restore jobs tab.

AWS Backup S3 - restore Jobs

When the status turns to ✅ Completed, my objects are ready to use!

Generally, the most comprehensive data-protection strategies include regular testing and validation of your restore procedures before you need them. Testing your restores also helps to prepare and maintain recovery runbooks. In turn, that ensures operational readiness during a disaster recovery exercise, or an actual data loss scenario.

Availability and Pricing
The preview is available in the US West (Oregon) Region only.

During the preview, there are no charges for creating and storing backups. You will pay the AWS charges for underlying resources, such as S3 storage, API usage, and versioning.

Send us an email at [email protected] including your AWS account ID to register for the preview.

Go ahead and apply to the preview program today.

— seb

Amazon S3 Glacier is the Best Place to Archive Your Data – Introducing the S3 Glacier Instant Retrieval Storage Class

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/amazon-s3-glacier-is-the-best-place-to-archive-your-data-introducing-the-s3-glacier-instant-retrieval-storage-class/

Today we are announcing the Amazon S3 Glacier Instant Retrieval storage class. This new archive storage class delivers the lowest cost storage for long-lived data that is rarely accessed and requires millisecond retrieval.

We are also excited to announce that S3 Intelligent-Tiering now automatically optimizes storage costs for rarely accessed data that needs immediate retrieval with the new Archive Instant Access tier, which is ideal for data with unknown or changing access patterns. For existing customers, this will provide immediate savings of 68 percent for data that hasn’t been accessed for more than 90 days, with no action needed. The Frequent, Infrequent, and now Archive Instant Access tiers are designed for the same millisecond access time and high-throughput performance.

In addition, we are announcing the new name for the existing Amazon S3 Glacier storage class and several price reductions.

Amazon S3 Glacier Instant Retrieval
The Amazon S3 Glacier storage classes are extremely low-cost and built for data archiving. They are secure and durable, and they are designed to provide the lowest cost for data that does not require immediate access, with retrieval options from minutes to hours.

Many customers need to store rarely accessed data for several years. However, the data must be highly available and immediately accessible. Today, these customers use the S3 Standard-Infrequent Access (S3 Standard-IA) storage class, which offers low-cost storage and allows customers to retrieve their data instantly.

S3 Glacier Instant Retrieval is a new storage class that delivers the fastest access to archive storage, with the same low latency and high-throughput performance as the S3 Standard and S3 Standard-IA storage classes. You can save up to 68 percent on storage costs as compared with using the S3 Standard-IA storage class when you use the S3 Glacier Instant Retrieval storage class and pay a low price to retrieve data. For example, in the US East (N. Virginia) Region, S3 Glacier Instant Retrieval storage pricing is $0.004 per GB-month and data retrieval is $0.03 per GB. Learn more about pricing for your Region.

Media archives, medical images, or user-generated content are just a few examples of ideal use cases for S3 Glacier Instant Retrieval. Once created, this content is rarely accessed, but when it is needed it must be available in milliseconds.

To get started using the new storage class from the Amazon S3 console, upload an object as you would normally, and select the S3 Glacier Instant Retrieval storage class.

Upload object with the new storage class

This feature is available programmatically from AWS SDKs, AWS Command Line Interface (CLI), and AWS CloudFormation.

In my opinion, the easiest way to store data in S3 Glacier Instant Retrieval is to use the S3 PutObject API from the CLI. When using this API, set the storage class to GLACIER_IR.

aws s3api put-object --bucket <bucket-name> --key <object-key> --body <name-file> --storage-class GLACIER_IR

When the object is uploaded to Amazon S3, verify the storage class in the list of objects or on the object details page.

Storage classes
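
You can verify the storage class from the CLI as well; this minimal sketch reuses the placeholders from the upload command above:

# The response includes "StorageClass": "GLACIER_IR" for objects
# stored in S3 Glacier Instant Retrieval.
aws s3api head-object --bucket <bucket-name> --key <object-key>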

For data that already exists in Amazon S3, you can use S3 Lifecycle to transition data from the S3 Standard and S3 Standard-IA storage classes into S3 Glacier Instant Retrieval.
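
A hedged sketch of such a lifecycle rule using the CLI; the bucket name and the 90-day threshold are example choices:

# Transition objects to S3 Glacier Instant Retrieval 90 days after creation.
aws s3api put-bucket-lifecycle-configuration \
  --bucket my-application-bucket \
  --lifecycle-configuration '{
    "Rules": [{
      "ID": "archive-to-glacier-ir",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "Transitions": [{"Days": 90, "StorageClass": "GLACIER_IR"}]
    }]
  }'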

New Archive Instant Access Tier in S3 Intelligent-Tiering
S3 Intelligent-Tiering is a storage class that automatically moves objects between access tiers to optimize costs. This is the recommended storage class for data with unpredictable or changing access patterns, such as in data lakes, analytics, or user-generated content.

Until today, there were two low latency access tiers optimized for frequent and infrequent access, and two optional archive access tiers designed for asynchronous access optimized for rare access at a low cost.

Beginning today, the Archive Instant Access tier is added as a new access tier in the S3 Intelligent-Tiering storage class. You will start seeing automatic cost savings on your S3 Intelligent-Tiering storage for rarely accessed objects.

The Archive Instant Access tier joins the group of low latency access tiers. This new tier is optimized for data that is not accessed for months at a time but, when it is needed, is available within milliseconds.

S3 Intelligent-Tiering automatically stores objects in three access tiers that deliver the same performance as the S3 Standard storage class:

  • Frequent Access tier
  • Infrequent Access tier
  • Archive Instant Access (new)

For a small monitoring and automation charge, S3 Intelligent-Tiering monitors access patterns and moves objects between the different access tiers. Objects that have not been accessed for 30 consecutive days are moved from the Frequent Access tier to the Infrequent Access tier for savings of 40 percent. When an object hasn’t been accessed for 90 consecutive days, S3 Intelligent-Tiering will move the object from the Infrequent Access tier to the Archive Instant Access tier, with a savings of 68 percent. If the data is accessed later, it is automatically moved back to the Frequent Access tier. No tiering charges apply when objects are moved between access tiers within the S3 Intelligent-Tiering storage class.

S3 Intelligent-Tiering access tiers

To get started with this new access tier, select Intelligent-Tiering as the storage class for an object when uploading an object using the S3 console. After 90 days of inactivity (30 days in Frequent Access tier and 60 days in Infrequent Access tier), S3 Intelligent-Tiering will automatically move the object to the Archive Instant Access tier. The introduction of the new Archive Instant Access tier has no impact on performance when you retrieve objects.

New name for the Amazon S3 Glacier storage class – S3 Glacier Flexible Retrieval
The existing Amazon S3 Glacier storage class is now named S3 Glacier Flexible Retrieval. This storage class now has free bulk retrievals in 5 to 12 hours, and the storage price has been reduced by 10 percent in all Regions, effective December 1, 2021. S3 Glacier Flexible Retrieval is now even more cost-effective, and the free bulk retrievals make it ideal for retrieving large data volumes.

These are the Amazon S3 archive storage classes:

  • S3 Glacier Instant Retrieval: The newest storage class is optimized for long-lived data that is rarely accessed (typically once per quarter). However, when data is needed, it is available within milliseconds. For example, medical images and news media assets are perfect for this storage class.
  • S3 Glacier Flexible Retrieval: This newly renamed storage class is optimized for archiving data that can be retrieved in minutes or with free bulk retrievals in 5 to 12 hours. This storage class is ideal for backups and disaster recovery use cases, where you have large amounts of long-term, rarely accessed data, and you don’t want to worry about retrieval costs when you need the data.
  • S3 Glacier Deep Archive: This storage class is the lowest-cost storage in the cloud and is optimized for archiving data that can be restored in at least 12 hours. It’s great for storing your compliance archives or for digital media preservation.

Amazon S3 has reduced storage prices!
We are excited to announce that Amazon S3 has reduced storage prices by up to 31 percent in the S3 Standard-IA and S3 One Zone-IA storage classes across 9 AWS Regions: US West (N. California), Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Osaka), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), and South America (São Paulo). These price reductions are effective December 1, 2021.

Learn more about price reduction details.

Available Now
The new storage class, S3 Glacier Instant Retrieval, and the new Archive Instant Access tier in S3 Intelligent-Tiering are available today (November 30, 2021) in all AWS Regions.

The price cut for S3 Glacier and free bulk retrievals in all AWS Regions, and the price reductions for the S3 Standard-IA and S3 One Zone-IA storage classes in nine Regions, will be effective on December 1, 2021.

Learn more about the storage classes changes and all the storage classes.

Marcia

Machine Learning-Powered Amazon Connect, Now With Call Summarization

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/machine-learning-powered-amazon-connect-now-with-call-summarization/

At AWS our mission is to make machine learning (ML) accessible to data scientists, developers, and business users. To help businesses easily leverage the power of ML, we create purpose-built solutions that embed ML and deep learning technologies directly into a business process to address real customer needs, rather than leaving companies to sort it out on their own.

One place where we have seen ML have an impact is within the contact center—the place you receive and respond to customer inquiries and issues. Because of the growing role of customer experience (CX) and the increase in contactless commerce via phone or email, contact centers are essential to maintaining the human connections that businesses depend on. However, analog or outdated methods make it difficult to address every customer need in an effective way that delivers timely resolutions, creates great experiences, and fosters customer loyalty.

Embedding AWS ML technologies into a cloud contact center solution helps decrease the friction of calls, chats, and other engagements. It also makes it possible to automate outdated processes.

Amazon Connect is an easy-to-use, cloud-based, ML-powered contact center service that helps companies of any size deliver superior customer service at a lower cost.

Let me take three examples with Voice ID, Wisdom, and Contact Lens.

Amazon Connect Voice ID
ML capabilities might help streamline the customer authentication experience. Instead of asking customers to repeat their email address and their mother’s maiden name several times, ML-powered voice identification can establish a digital voiceprint associated with each customer’s unique voice, and then recognize it at the beginning of each subsequent call. Voice identification provides a confidence score that may be used to automate authentication workflows.

Amazon Connect Wisdom
ML might also help search vast documentation and knowledge bases to find the most relevant answers to the questions raised by the customer, helping resolve customer issues faster and better.

Contact Lens for Amazon Connect
ML technologies also shine at analyzing the tone and content of a conversation, capturing customer sentiment in the moment, and learning from it. ML can help transcribe calls, track customer sentiment, detect common issues and customer trends, or even pinpoint discrepancies.

At just about the same time last year, I announced the addition of real-time capabilities for Contact Lens. This lets supervisors identify when to assist an agent on live calls so that they can provide guidance via chat or have the agent transfer the call. Last September, we added support for eight new languages, ending up with a total of 21 languages for post-call analytics and 12 languages for both post-call and real-time analytics.

Contact Lens Adds Call Summarization
But we didn’t stop there. Today, I am pleased to announce the addition of a new capability that helps you improve customer experience and agent and supervisor productivity by automatically summarizing the important aspects of each customer call.

You told us that keeping notes of customer conversations is time consuming, especially for agents that must take notes during the call and import them manually into your CRM tool afterward. In the end, this means more time for us, the customers, waiting in a queue for an agent to become available. Likewise, using automatically generated call transcripts doesn’t save time for supervisors: it is time consuming for them to read full call transcripts to understand what happened during customer conversations.

How it Works
Starting today, Contact Lens provides a summary of the key moments in a conversation. It is enabled by default, and there is no additional configuration step. You may toggle the Show transcript summary button to show or hide the summary when you don’t need it.

Contact Lens - Show Transcript Summary - Toggle button

Once a call is analyzed, the summary is available on the contact detail page.

Contact Lens identifies and summarizes the sections corresponding to Issue (e.g., lost package), Outcome (e.g., customer refund), and Action item (e.g., send a follow-up mail confirming the refund was processed). A manager can quickly see where there’s an action to send a customer a follow-up email and take action to ensure it happens.

Contact Lens Call Summary Example

The call summary is also available in JSON format. Contact Lens uploads these files to the S3 bucket of your choice. Having access to the JSON file lets you import the summaries programmatically into your CRM or other tools.

... redacted for brevity ...

"IssuesDetected": [
{
   "CharacterOffsets": {
      "BeginOffsetChar": 31,
      "EndOffsetChar": 73
   },
   "Text": "I would like to cancel my subscription"
}
]
...
"ActionItemsDetected": [
 {
   "CharacterOffsets": {
      "BeginOffsetChar": 32,
      "EndOffsetChar": 116
   },
   "Text": "I will send you an email with details"
 }
 ]
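As a quick sketch of retrieving one of these files with the AWS CLI (the bucket name and key below are hypothetical; the actual location is whatever you configured for Contact Lens output in your Amazon Connect instance):

aws s3 cp s3://my-connect-analytics-bucket/Analysis/Voice/<contact-id>_analysis.json .

From there, a script or integration job can parse sections such as the IssuesDetected and ActionItemsDetected arrays shown above and push them into your CRM.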

Availability and Pricing
Call summarization by Contact Lens is available in all AWS Regions where Contact Lens is available today. We support post-call analytics in the US West (Oregon), US East (N. Virginia), Canada (Central), Europe (London), Europe (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Tokyo), and Asia Pacific (Sydney) regions. We support real-time analytics in the US West (Oregon), US East (N. Virginia), Canada (Central), Europe (London), Europe (Frankfurt), Asia Pacific (Seoul), Asia Pacific (Tokyo), and Asia Pacific (Sydney) regions.

Call summary comes at no additional cost on top of the usual charges for Contact Lens. This is why we chose to enable it by default. Contact Lens is charged at $0.015 per minute of voice conversation analyzed. Most of our Contact Lens customers analyze millions of conversation minutes per month. The price drops to $0.0125 per minute when you analyze more than 5 million minutes per month.

If you do not have Contact Lens enabled in your contact center, go ahead and start using it today.

— seb

New – AWS Outposts Servers in Two Form Factors

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-outposts-servers-in-two-form-factors/

AWS Outposts gives you on-premises compute and storage that is monitored and managed by AWS, and controlled by the same, familiar AWS APIs. You may already know about the AWS Outposts rack, which occupies a full 42U rack.

Last year I told you that we were working on new sizes of Outposts suitable for locations such as branch offices, factories, retail stores, health clinics, hospitals, and cell sites that are space-constrained and need access to low-latency compute capacity. Today we are launching three AWS Outposts servers, all powered by the AWS Nitro System and with your choice of x86 or Arm/Graviton2 processors. Here’s an overview:

Name (Catalog ID)       EC2 Instance Capacity   Processor / Architecture   vCPUs   Memory    Local NVMe SSD Storage
Outposts 1U (STBKRBE)   c6gd.16xlarge           Graviton2 / Arm            64      128 GiB   3.8 TB (2 x 1.9 TB)
Outposts 2U (LMXAD41)   c6id.16xlarge           Intel Ice Lake / x86       64      128 GiB   3.8 TB (2 x 1.9 TB)
Outposts 2U (KOSKFSF)   c6id.32xlarge           Intel Ice Lake / x86       128     256 GiB   7.6 TB (4 x 1.9 TB)

You can create VPC subnets on each Outpost, and you can launch Amazon Elastic Compute Cloud (Amazon EC2) instances from EBS-backed AMIs in the parent region. The c6gd.16xlarge model supports six instance sizes, as follows:

Instance Name vCPUs Memory Local Storage
c6gd.large 2 4 GiB 118 GB
c6gd.xlarge 4 8 GiB 237 GB
c6gd.2xlarge 8 16 GiB 474 GB
c6gd.4xlarge 16 32 GiB 950 GB
c6gd.8xlarge 32 64 GiB 1.9 TB
c6gd.16xlarge 64 128 GiB 3.8 TB

The c6id.16xlarge model supports all but the largest of the following instance sizes, and the c6id.32xlarge supports all of them:

Instance Name vCPUs Memory Local Storage
c6id.large 2 4 GiB 118 GB
c6id.xlarge 4 8 GiB 237 GB
c6id.2xlarge 8 16 GiB 474 GB
c6id.4xlarge 16 32 GiB 950 GB
c6id.8xlarge 32 64 GiB 1.9 TB
c6id.16xlarge 64 128 GiB 3.8 TB
c6id.32xlarge 128 256 GiB 7.6 TB

Within each of your Outposts servers, you can launch any desired mix of instance sizes as long as you remain within the overall processing and storage available. You can create Amazon Elastic Container Service (Amazon ECS) clusters (Amazon Elastic Kubernetes Service (EKS) support is coming soon), and the code you run on-premises can make use of the entire lineup of services in the AWS Cloud.

Each Outposts server connects to the cloud via the public Internet or across a private AWS Direct Connect line. Additionally, each Outposts server supports a Local Network Interface (LNI) that provides a Layer 2 presence on your local network for AWS service endpoints.

Outposts servers incorporate many powerful Nitro features, including high speed networking and enhanced security. The security model is locked down and prevents administrative access, protecting against tampering and human error. Additionally, data at rest is protected by a NIST-compliant physical security key.

While I was writing this post, I stopped in to say hello to the design and development team, and met with my colleague Bianca Nagy to learn more about the Outposts server:

Ordering Outposts Servers
Let’s walk through the process of ordering an Outposts server from the AWS Management Console. I visit the AWS Outposts Console, make sure that I am in the desired AWS Region, and click Place order to get started:

I click Servers, and then choose the desired configuration. I pick the c6gd.16xlarge, and click Next to proceed:

Then I create a new Outpost:

And a new Site:

Then I review my payment options and select my shipping address:

On the next page I review all of my options, click Place order, and await delivery:

In general, we expect to be able to deliver Outposts servers in two to six weeks, starting in the first quarter of 2022. After you receive yours, you or a member of your IT team can mount it in a 19″ rack or position it on a flat surface, cable it to power and networking, and power the device on. You then use a set of temporary AWS credentials to confirm the identity of the device, and to verify that the device is able to use DHCP to obtain an IP address. Once the device has established connectivity to the designated AWS parent region, we will finalize the provisioning of EC2 instance capacity and make it available to you.

After that, you are ready to launch instances and to deploy your on-premises applications.

We will monitor hardware performance and will contact you if your device is in need of maintenance. We will ship a replacement device for arrival within 2 business days. You can migrate your workloads to a redundant device, and use tracking information & notifications to track delivery status. When the replacement arrives, you install it and then destroy the physical security key in the old one before shipping it back to AWS.

Outposts API Update
We are also enhancing the Outposts API as part of this launch. Here are some of the new functions:

ListCatalogItems – Get a list of items in the Outposts catalog, with optional filtering by EC2 family or supported storage options.

GetCatalogItem – Get full information about a single item in the Outposts catalog.

GetSiteAddress – Get the physical address of a site where an Outposts rack or server is installed.

You can use the information returned by GetCatalogItem to place an order that contains the desired quantity of one or more catalog items.
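As a rough sketch of what these calls look like from the AWS CLI (the filter option is shown as I understand it; verify the exact flag names against the Outposts CLI reference):

# list only server-class catalog items
aws outposts list-catalog-items --item-class-filter SERVER

# look up a single item by its catalog ID, e.g., the 1U Graviton2 server
aws outposts get-catalog-item --catalog-item-id STBKRBE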

Things to Know
Here are a couple of important things to know about Outposts servers:

Availability – Outposts servers are available for order to most locations where Outposts racks are available (currently 23 regions and 49 countries), with more to follow in 2022.

Ordering at Scale – I showed you the console-based ordering process above, and also gave you a glimpse at the Outposts API. If you need hundreds or thousands of devices, get in touch and we will give you a template that you can fill in and then upload.

re:Invent 2021 Outposts Server Selfie Challenge
If you attend AWS re:Invent, be sure to visit the AWS Hybrid kiosk in the AWS Booth (#1719) to see the new Outposts Servers up close and personal. While you are there, take a fun & creative selfie, tag it with #AWSOutposts & #AWSPromotion, and share it on Twitter. I will post my three favorites at the end of the show!

Jeff;

Amazon Kinesis Data Streams On-Demand – Stream Data at Scale Without Managing Capacity

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/amazon-kinesis-data-streams-on-demand-stream-data-at-scale-without-managing-capacity/

Today we are launching Amazon Kinesis Data Streams On-demand, a new capacity mode. This capacity mode eliminates capacity provisioning and management for streaming workloads.

Kinesis Data Streams is a fully managed, serverless service for real-time processing of streamed data at a massive scale. Kinesis Data Streams can take any amount of data, from any number of sources, and scale up and down as needed. Creating a new data stream has been easy since we announced Kinesis Data Streams in November 2013. To get started, you only need to specify the number of shards to provision for your stream.

Shards are the way to define capacity in Kinesis Data Streams. Each shard can ingest 1 MB/s and 1,000 records/second, and egress up to 2 MB/s. You can add or remove shards using the Kinesis Data Streams APIs to adjust the stream capacity to the throughput needs of your workloads. This lets you make sure that producer and consumer applications don’t experience any throttling.

As customers adopt data streaming broadly, workloads with data traffic that can increase by millions of events in a few minutes are becoming more common. For these volatile traffic patterns, customers carefully plan capacity, monitor throughput, and in some cases develop processes that automatically change the Kinesis Data Streams stream capacity.

Kinesis Data Streams On-Demand Mode
That is why today we are announcing Kinesis Data Streams On-demand. This new capacity mode eliminates the need for provisioning and managing the capacity for streaming data. With Kinesis Data Streams On-demand, capacity scales automatically in response to varying data traffic. Customers are charged per gigabyte of data written, read, and stored in the stream, in a pay-per-throughput fashion.

Data streams in the on-demand mode have the same high durability, high availability, low latency, security, and deep AWS integrations that Kinesis Data Streams already provides. Moreover, there are no new APIs to write or read data. All existing Kinesis Data Streams integrations work in the on-demand mode.

Kinesis Data Streams uses the partition key to distribute data across shards. That is why, when using Kinesis Data Streams On-demand, you still must specify a partition key for each record to write data into a data stream, just as you do today in provisioned mode. In Kinesis Data Streams On-demand, the data stream automatically adapts to handle uneven data distribution patterns. However, you must be careful that no single partition key exceeds a shard’s limits. If one does, writes for that key will be throttled, and you can retry those requests.
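To illustrate (the stream name and payload are made up for this example), writing a record is unchanged in on-demand mode; you still supply a partition key with every PutRecord call:

# --data expects a base64-encoded payload by default in AWS CLI v2
aws kinesis put-record \
    --stream-name my-on-demand-stream \
    --partition-key user-123 \
    --data "eyJldmVudCI6ImNsaWNrIn0="

Records with the same partition key land on the same shard, which is why a single very hot key can still hit the per-shard limits described above.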

When a new data stream is created using Kinesis Data Streams On-demand, it gets created with the default capacity of 4 MB/s and 4,000 records per second for writes. Kinesis Data Streams On-demand can automatically scale up to 200 MB/s and 200,000 records per second for writes.

Kinesis Data Streams On-demand accommodates up to double its previous peak write throughput observed in the last 30 days. As your data stream’s write throughput hits a new peak, Kinesis Data Streams automatically scales the stream’s capacity.

For example, if your data stream has a write throughput that varies between 10 MB/s and 40 MB/s, Kinesis Data Streams will make sure that you can easily burst to double the peak—80 MB/s. And, if later on that same data stream reaches a new peak of 50 MB/s, then Kinesis Data Streams will make sure that there is enough capacity to ingest 100 MB/s. However, write throttling can occur if your traffic grows more than double the previous peak in less than 15 minutes.

When to Use Kinesis Data Streams On-demand
On-demand mode is great for customers that have an unknown or variable workload, or who simply don’t want to deal with capacity management. On-demand mode works best for workloads that have even partition key distribution. For example, you run a mobile game that has variable traffic through the week or day, as customers play mostly on nights or weekends. Or, you run a streaming platform that hosts live shows, and you see a sudden increase in demand depending on the guests you have.

In addition, you can switch between on-demand and provisioned mode twice a day. For example, say you run an e-commerce site with predictable traffic. But, starting next month, there will be many marketing campaigns launched globally, and you don’t know the impact that those will have on the site traffic. Switch your data stream to on-demand mode, and you can enjoy automated capacity planning and management for your data streams.

Get Started with Kinesis Data Streams On-demand
Create a new data stream with Kinesis Data Streams On-demand from the AWS console, AWS SDKs, AWS Command Line Interface (CLI), and AWS CloudFormation.

To create one from the console, visit the Kinesis console and Create data stream. When selecting the capacity mode, select On-demand.

Creating a data stream

At the end of the page, all of the settings for the new data stream are presented. These settings can be changed after the data stream has been created.

Data stream settings
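If you prefer the CLI over the console, here is a minimal sketch (the stream name and account ID are placeholders) that creates an on-demand stream and later switches its capacity mode with the new StreamModeDetails parameter:

# create a stream in on-demand mode
aws kinesis create-stream \
    --stream-name my-on-demand-stream \
    --stream-mode-details StreamMode=ON_DEMAND

# switch an existing stream between modes (up to twice per day)
aws kinesis update-stream-mode \
    --stream-arn arn:aws:kinesis:us-east-1:111122223333:stream/my-on-demand-stream \
    --stream-mode-details StreamMode=PROVISIONED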

Let’s See This in Action!
For this demo, I want to show you how the new Kinesis Data Streams capability works. This situation is best described by looking at the following Amazon CloudWatch graphs. The green line represents the bytes ingested successfully into the stream, and the red line shows the percentage of traffic that is throttled.

First, we will start with a stream provisioned with five shards. For the first three minutes, we are sending a load of 4 MB/s. You can see that the stream can handle the load.

At the time stamp 21:19, we increase the load to 12 MB/s. Now the stream cannot handle the load, and the throttles start (the red line climbs to 60 percent of requests being throttled).

Increase the load on a provisioned stream

At the time stamp 21:23, we change the stream capacity from provisioned to on-demand. You can do that on the fly without affecting the stream. Notice that it takes a very short time for the stream to handle the load when converting from one capacity mode to the other.

In a few minutes (time stamp 21:24) the throttles start to drop as the stream starts scaling up. The stream capacity doubles to 10 shards first (time stamp 21:26), and the stream keeps scaling up until each shard has a load of less than 0.5 MB/s. In this way, if the stream suddenly receives double the amount of load, then it has the capacity ready to handle it.

Change to on-demand mode

At the time stamp 21:26, the load in the stream is increased to 18 MB/s. You can see the green line climbing to 350,000 records – there are no throttles, and the stream ends this demo with 40 open shards. This means that if suddenly the stream receives a load of 40 MB/s, then it could handle it with no problem.

Increase the load

Available Now!
Amazon Kinesis Data Streams On-demand is available globally in all commercial Regions.

You can learn more about the capacity modes in the Amazon Kinesis Data Streams Developer Guide.

Marcia

New – Amazon EBS Snapshots Archive

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-amazon-ebs-snapshots-archive/

I am pleased to announce the availability of Amazon EBS Snapshots Archive, a new storage tier for the long-term retention of Amazon Elastic Block Store (EBS) snapshots of your EBS volumes.

In a nutshell, EBS is an easy-to-use high-performance block storage service for your Amazon Elastic Compute Cloud (Amazon EC2) instances. An EBS volume mounted to your EC2 instances lets you boot an operating system and store data for your most performance-demanding workloads. You may use EBS snapshots to create point-in-time copies of your volume data. The first snapshot of a volume contains all of the data written into that volume. Subsequent snapshots are incremental. Snapshots are stored on Amazon Simple Storage Service (Amazon S3), and they may be shared between AWS accounts and AWS Regions.

The ability to take frequent snapshots and easily restore volumes makes EBS snapshots an obvious choice for your data management strategy, alongside other backup options. The incremental nature of snapshots makes them cost-effective for daily and weekly backups that need immediate restores. However, you were telling us that business compliance and regulatory needs have meant that you needed to retain EBS snapshots for longer periods of time (months or years). For example, snapshots taken at the end of a project, or snapshots for test and development preserved for future project releases. The vast majority of these snapshots are taken and never read. For these snapshots, you are looking to lower your storage costs. Today, to benefit from lower storage costs, you may have written complex scripts involving temporary EC2 instances to restore snapshots, mount the corresponding volumes, and transfer the data to lower-cost storage tiers, such as Amazon Glacier.

EBS Snapshots Archive provides a low-cost storage tier to archive full, point-in-time copies of EBS Snapshots that you must retain for 90 days or more for regulatory and compliance reasons, or for future project releases. Now, you can easily archive and manage EBS Snapshots, thereby eliminating the need for custom scripts and third-party tools to manage these snapshots. This lets you move your rarely accessed snapshots to EBS Snapshots Archive to achieve up to 75% lower storage costs, and avoid licensing costs for third-party tools. Furthermore, you can retrieve an archived snapshot within 24-72 hours, and, once restored, use the snapshot to recover an EBS volume.

As per usual, let me show you how it works.

How to Get Started
I have a snapshot available in the US East (N. Virginia) Region, and I want to archive this snapshot for compliance reasons. I open the AWS Management Console, navigate to EC2, then to Snapshots. I select the snapshot I want to archive, and select the Actions menu. I select the Archive snapshot menu option.

EBS Snapshot Archive - create snapshot

I carefully read the confirmation message :-), and I select Archive snapshot.

EBS Snapshot Archive - create snapshot - confirmation

I may monitor the progress of the archive operation with the new Storage Tier tab at the bottom of the screen. After some time, depending on the size of the snapshot, the Tiering status becomes ✅ Archival completed.

EBS Snapshot Archive - create snapshot - archival completed

Archived snapshots stay visible in the console. The new Storage tier column indicates the tier used for storage (Standard or Archive).

How do I Restore a Volume?
Restoring a volume from EBS Snapshots Archive is a two-step process. First, I retrieve the snapshot from EBS Snapshots Archive to its original snapshot ID, using the RestoreSnapshotTier API call or the management console. It takes between 24 and 72 hours to retrieve the snapshot from the archive, depending on the snapshot size. Once retrieved, the snapshot appears as a regular snapshot on my account. At this stage, I hydrate the retrieved snapshot into an EBS volume using the default snapshot restore or Fast Snapshot Restore (FSR) for expedited restores, just like usual.

A CloudWatch event is generated when the snapshot is restored. You may listen to this event to avoid polling the status with the API.

A CreateVolume API call on an archived snapshot will fail. You must restore a snapshot from archive before you use it to create a volume.

Using the AWS Management Console, I select the snapshot that I want to restore, I select the Actions menu, and then I select the Restore snapshot from archive menu option.

EBS Snapshot Archive - create snapshot - restore archive

I have the choice to restore the snapshot permanently, or just temporarily. At the end of the temporary duration, the standard tier snapshot is deleted, and only the archive is preserved.

EBS Snapshot Archive - create snapshot - restore archive - confirmation

After a while, depending on the snapshot size, the archive is restored to standard storage and may be used to recreate a volume, just like usual. I may monitor the progress of the retrieval and the lifetime for temporarily restored archives in the new Storage tier tab in the bottom half of the screen. Temporary restored snapshots may be kept for up to 180 days.
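Here is a minimal CLI sketch of the same archive and restore workflow (the snapshot ID is hypothetical):

# move a snapshot to the archive tier
aws ec2 modify-snapshot-tier \
    --snapshot-id snap-0123456789abcdef0 \
    --storage-tier archive

# later, retrieve it temporarily for 10 days...
aws ec2 restore-snapshot-tier \
    --snapshot-id snap-0123456789abcdef0 \
    --temporary-restore-days 10

# ...or retrieve it permanently
aws ec2 restore-snapshot-tier \
    --snapshot-id snap-0123456789abcdef0 \
    --permanent-restore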

Pricing and Availability
EBS Snapshots Archive is available for you today in 17 AWS Regions. At the time of launch, it is not available in the two Regions in China, Asia Pacific (Seoul), Asia Pacific (Osaka), Canada (Central), and South America (São Paulo).

As per usual, you pay as-you-go, with no minimum or fixed fees. There are two metrics that influence EBS Snapshots Archive billing: data storage and data retrieval. We charge you $0.0125 per GB-month of stored data and $0.03 per GB retrieved. You are charged for a 90-day period at minimum. This means that if you delete a snapshot archive or permanently restore it less than 90 days after creation, then we charge for the full 90-day period. The EBS pricing page has the details.
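As a worked example, archiving a 100 GB snapshot for the 90-day minimum costs 100 GB x $0.0125 x 3 months = $3.75 for storage, plus 100 GB x $0.03 = $3.00 if you retrieve it once. Compare that with standard-tier snapshot storage at $0.05 per GB-month in many Regions (100 GB x $0.05 x 3 = $15.00), and you can see where the up to 75% storage saving mentioned above comes from.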

Go ahead and start to configure your long-term storage for EBS snapshots today.

— seb

New – Recycle Bin for EBS Snapshots

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-recycle-bin-for-ebs-snapshots/

It is easy to create EBS Snapshots, and just as easy to either delete them manually or to use the Data Lifecycle Manager to delete them automatically in accordance with your organization’s retention model. Sometimes, as it turns out, it is a bit too easy to delete snapshots, and a well-intended cleanup effort or a wayward script can go a bit overboard!

New Recycle Bin
In order to give you more control over the deletion process, we are launching a Recycle Bin for EBS Snapshots. As you will see in a moment, you can now set up rules to retain deleted snapshots so that you can recover them after an accidental deletion. You can think of this as a two-level model, where individual AWS users are responsible for the initial deletion, and then a designated “Recycle Bin Administrator” (as specified by an IAM role) manages retention and recovery.

Rules can apply to all snapshots, or to snapshots that include a specified set of tag/value pairs. Each rule specifies a retention period (between one day and one year), after which the snapshot is permanently deleted.

Let’s Recycle!
I open the Recycle Bin Console, select the region of interest, and click Create retention rule to begin:

I call my first rule KeepAll, and set it to retain all deleted EBS snapshots for 4 days:

I add a tag (User) to the rule, and click Create retention rule:

Because Apply to all resources is checked, this is a general rule that applies when there are no applicable rules that specify one or more tags.

Then I create a second rule (KeepDev) that retains snapshots tagged with a Mode of Dev for just one day:

If two different tag-based rules match the same resource, then the one with the longer retention period applies.

Here are my retention rules:

Here are my EBS snapshots. As you can see, the first three are tagged with a Mode of Dev:

In an effort to save several cents per month, I impulsively delete them all:

And they are gone:

Later in the day, a member of my developer team messages me in a panic and lets me know that they desperately need the latest snapshot of the development server’s code. I open the Recycle Bin and I locate the snapshot (DevServer_2021_10_6):

I select the snapshot and click Recover:

Then I confirm my intent:

And the snapshot is available once again:

As has always been the case, Fast Snapshot Restore is disabled when a snapshot is deleted. With this launch, it will remain disabled when a snapshot is restored.

All of this functionality (creating rules, listing resources in the Recycle Bin, and restoring them) is also available from the CLI and via the Recycle Bin APIs.
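As a sketch of the CLI workflow (the description and snapshot ID are made up; double-check the parameter spellings against the Recycle Bin CLI reference), you could recreate the KeepAll rule and recover a deleted snapshot like this:

# retain all deleted EBS snapshots for 4 days
aws rbin create-rule \
    --resource-type EBS_SNAPSHOT \
    --retention-period RetentionPeriodValue=4,RetentionPeriodUnit=DAYS \
    --description "KeepAll"

# see which snapshots are currently in the Recycle Bin
aws ec2 list-snapshots-in-recycle-bin

# recover one of them by ID
aws ec2 restore-snapshot-from-recycle-bin --snapshot-id snap-0123456789abcdef0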

Things to Know
Here are a couple of things to know about the new Recycle Bin:

IAM Support – As I mentioned earlier, you can use AWS Identity and Access Management (IAM) to grant access to this feature, and should consider creating an empowered user known as the Recycle Bin Administrator.

Rule Changes – You can make changes to your retention rules at any time, but be aware that the rules are evaluated (and the retention period is set) when you delete a snapshot. Changing a rule after an item has been deleted will not alter the retention period for the item.

Pricing – Resources that are in the Recycle Bin are charged the usual price, but be aware that creating rules with long retention periods could increase your AWS bill. On a related note, be sure that keeping deleted snapshots around does not violate your organization’s data retention policies. There is no charge for deleting or recovering a resource.

In the Bin – Resources in the Recycle Bin are immutable. If a resource is recovered, all of its existing metadata (tags and so forth) is also recovered intact.

Recycling – We will do our best to recycle all of the zeroes and all of the ones when a resource in your Recycle Bin reaches the end of its retention period!

Jeff;

Announcing Pull Through Cache Repositories for Amazon Elastic Container Registry

Post Syndicated from Steve Roberts original https://aws.amazon.com/blogs/aws/announcing-pull-through-cache-repositories-for-amazon-elastic-container-registry/

Organizations, development teams, and individual developers who have chosen to use containers to host their applications may prefer, or perhaps are required, to source all images from Amazon Elastic Container Registry to take advantage of its high availability and security. To satisfy those requirements, customers have needed to take on the burden of manually pulling images from public registries into their private Amazon Elastic Container Registry repositories, and then keeping them in sync. This adds operational complexity and maintenance costs, thereby impacting developer productivity. Additionally, some registries may have limitations or restrictions on how frequently images can be downloaded. When those limits are reached, they begin to impact developers and the release velocity of their business, due to build errors when image pulls are throttled or even rejected.

Today, we have announced pull through cache repository support in Amazon Elastic Container Registry, for publicly accessible registries that do not require authentication. Pull through cache repositories offer developers the improved performance, security, and availability of Amazon Elastic Container Registry for container images that they source from public registries. Images in pull through cache repositories are automatically kept in sync with the upstream public registries, thereby eliminating the manual work of pulling images and periodically updating.

Pull through cache repositories provide the benefits of the built-in security capabilities in Amazon Elastic Container Registry, such as AWS PrivateLink enabling you to keep all of the network traffic private, image scanning to detect vulnerabilities, encryption with AWS Key Management Service (KMS) keys, cross-region replication, and lifecycle policies. When enabled, cross-region replication is designed to automatically distribute updated images to additional Regions. All you need to do is update the pull URL so that the image is downloaded from the relevant Region.

When consuming images from pull through cache repositories, download throttling is no longer a problem, either for developers or for the build and deployment infrastructure that supports their applications. While Amazon Elastic Container Registry is designed to automatically keep the cache repository in sync, you can also manually sync a repository at any time. And, if you wish, the automatic sync can be turned off.

Getting Started with Amazon Elastic Container Registry Pull Through Cache Repositories
Setting up pull through cache repositories is a simple process. For the following example, I’m using Amazon Elastic Container Registry Public in the South America (São Paulo) Region as my upstream registry.

First, I must modify my private registry’s settings to add a rule that references the upstream, publicly accessible registry (multiple rules can be set if I need additional upstream registries). In the Amazon Elastic Container Registry console, I begin by selecting Private registry, and then select Edit in the Pull through cache panel to change settings. This takes me to the Pull through cache configuration page, where I select Add rule.

On the Create pull through cache rule page, I choose the upstream registry, which is ECR Public in this example. I also must set a namespace that I’ll use when referring to images in my pull commands. For this example, I’ll accept the suggested namespace, ecr-public.

Configuring ECR Public as the upstream registry

Selecting Save takes me back to the Pull through cache configuration page where my newly configured rule is listed. Now, I’m ready to utilize the cache repository when pulling images.

Newly configured rule for an upstream registry
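If you prefer the AWS CLI, an equivalent rule can be created in a single call (the prefix and Region match the console example above; public.ecr.aws is the upstream URL for ECR Public):

aws ecr create-pull-through-cache-rule \
    --ecr-repository-prefix ecr-public \
    --upstream-registry-url public.ecr.aws \
    --region sa-east-1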

To reference an image, I must specify the namespace that I chose in the pull URL, using the URL format <accountId>.dkr.ecr.<region>.amazonaws.com/<namespace>/<sourcerepo>:<tag>. When images are pulled, the cache repository associated with the namespace is checked for the image. In my case, the cache repository doesn’t exist yet, but I don’t have to create it myself. The image is fetched from the upstream repository in the public registry associated with the namespace, and then stored in a new cache repository that is created for me automatically.

In the command prompt session below, I first authenticate with my registry, and then pull an Amazon Linux 2 image from Amazon Elastic Container Registry Public into the cache:

C:\> aws ecr get-login-password --region sa-east-1 | docker login --username AWS --password-stdin 111122223333.dkr.ecr.sa-east-1.amazonaws.com
Login Succeeded
C:\> docker pull 111122223333.dkr.ecr.sa-east-1.amazonaws.com/ecr-public/amazonlinux/amazonlinux:latest
latest: Pulling from ecr-public/amazonlinux/amazonlinux
e11e8d46e102: Pull complete
Digest: sha256:916dbbb288948b54c94b5b9f0769085aa601d4468d099e90d8a7da5cfa551b50
Status: Downloaded newer image for 111122223333.dkr.ecr.sa-east-1.amazonaws.com/ecr-public/amazonlinux/amazonlinux:latest
111122223333.dkr.ecr.sa-east-1.amazonaws.com/ecr-public/amazonlinux/amazonlinux:latest

In my Amazon Elastic Container Registry console, a check of the Repositories page shows that a new private repository has been created containing the image I pulled, together with an indication that a pull through cache is active.

Pulled image in the cache repository

Working with images and the pull through cache repository is just as straightforward in Dockerfiles. All I need to do is reference the image using the namespace in the pull URL, as sketched below. If the image is not in the cache repository, then it will be pulled and stored there for me. Cached images are checked once every 24 hours to verify whether the cached image is the latest version, with the timer based on the last pull time of the cached image.
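A minimal Dockerfile sketch (the account ID, Region, and installed package are placeholders for illustration):

# base image is resolved through the pull through cache namespace
FROM 111122223333.dkr.ecr.sa-east-1.amazonaws.com/ecr-public/amazonlinux/amazonlinux:latest
RUN yum install -y httpd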

Start using Pull through Cache Repositories Today
Pull through cache repositories for Amazon Elastic Container Registry are available for you to take advantage of today in all commercial AWS Regions. There is no charge for using pull through cache repositories; only standard Amazon Elastic Container Registry pricing for storage and data transfer applies. You can find more details on pricing at the Amazon Elastic Container Registry pricing page. Learn more about pull through cache repositories in the Amazon Elastic Container Registry User Guide, and get started today.

— Steve

Preview – AWS IoT RoboRunner for Building Robot Fleet Management Applications

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/preview-aws-iot-roborunner-for-building-robot-fleet-management-applications/

In 2018, we launched AWS RoboMaker, a cloud-based simulation service that enables robotics developers to run, scale, and automate simulation without managing any infrastructure. As we worked with robot developers and operators, we have repeatedly heard that they face challenges in operating different robot types in their automation efforts, including automated guided vehicles (AGVs), autonomous mobile robots (AMRs), and robotic manipulators.

Many customers choose different types of robots, often from different vendors, in a single facility. Robot operators want access to the unified data required to build applications that work across a fleet of robots. However, when a new robot is added to an autonomous operation, complex and time-consuming software integration work is required to connect the robot control software to work management systems.

Today, we are launching a public preview of AWS IoT RoboRunner, a new robotics service that makes it easier for enterprises to build and deploy applications that help fleets of robots work seamlessly together. AWS IoT RoboRunner lets you connect your robots and work management systems, thereby enabling you to orchestrate work across your operation through a single system view.

This new service builds on the same technology used in Amazon fulfillment centers, and now we are excited to make it available to all developers to build advanced robotics applications for their businesses.

AWS IoT RoboRunner in Action
You can create a single facility (e.g., site name and location) in the AWS Management Console to get started with AWS IoT RoboRunner. Behind the scenes, AWS IoT RoboRunner automatically creates centralized repositories for storing facility, robot, destination, and task data. Then, the robots working on this site are set up as a “Fleet”, and each individual robot is set up in AWS IoT RoboRunner as a “Robot” within a fleet.

You can download the Fleet Gateway Library to develop integration code for connecting your robots and WMS systems with AWS IoT RoboRunner to send and receive data from individual robot fleets. You can also develop your first robotics management application using the Task Manager Library, deploying the Task Manager code as an AWS Lambda function and the Fleet Gateway code on-premises as an AWS IoT Greengrass component.

To enable a single-system view of the robots, status of the systems, and progress of tasks on the same interface, AWS IoT RoboRunner provides APIs that let you build a user application. AWS IoT RoboRunner provides sample applications for allocating tasks to robot fleets so that you can get started quickly. You can customize the task allocation code with business requirements that align to your use case.

Learn more by reading Getting started with AWS IoT RoboRunner in the AWS IoT RoboRunner Developer Guide. Watch a quick introductory video about AWS IoT RoboRunner for more information.

Try Public Preview Now
AWS IoT RoboRunner is now available in public preview, and you can start using it today in the US East (N. Virginia) and Europe (Frankfurt) Regions. There is no additional cost to use this feature during the preview period.

You can send feedback to [email protected], the AWS forum for AWS IoT, or through your usual AWS Support contacts.

Channy

Heard in the halls of Web Summit 2021

Post Syndicated from João Tomé original https://blog.cloudflare.com/web-summit-2021-internet/

Opening night of Web Summit 2021, at the Altice Arena in Lisbon, Portugal. Photo by Sam Barnes/Web Summit

Global in-person events were back in a big way at the start of November (1-4) in Lisbon, Portugal, with Web Summit 2021 gathering more than 42,000 attendees from 128 countries. I was there to discover Internet trends and meet interesting people. What I saw was the contagious excitement of people from all corners of the world coming together for what seemed like a type of normality in a time when the Internet “is almost as important as having water”, according to Sonia Jorge from the World Wide Web Foundation.

Here’s some of what I heard in the halls.

With a lot happening on a screen, the lockdowns throughout the pandemic showed us a glimpse of what the metaverse could be, just without VR or AR headsets. Think about the way many were able to use virtual tools to work all day, learn, collaborate, order food, supplies, and communicate with friends and family — all from their homes.

While many had this experience, many others were unable to, with some talks at the event focusing on the digital divide and how “Internet access is a basic human right”, according to the grandson of Nelson Mandela — we interviewed him, and you can watch the conversation below.

The future already has some paths laid out, and many were discussed at the event.

The pandemic helped to accelerate most of them, especially by bringing more people (in some countries) to the digital world.

The CPO of Meta, Chris Cox, shared how the company previously known as Facebook has some ideas about the future of augmented reality, and how they want to see those ideas play out in the next five to 10 years. “We want to get the conversation going,” he said.

Also present at the event was Jon Vlassopulos, Global Head of Music, Roblox. He explained how virtual concerts on the video game platform could be the future of music performances, and even bring free tickets to fans of famous music stars like Adele. Stars like Zara Larsson, KSI and Ava Max have already performed on Roblox and “they’re making big money from selling digital merchandise”.

On the other hand, Paddy Cosgrave, CEO of Web Summit, says that there’s something magical about in-person big events that can’t be replicated in full online events. However, the real and virtual world can complement each other — it was announced that CES 2022 will use a combination of Web Summit online and offline software.

Web3 was another big part of the discussion, sometimes in clear sight, other times embedded in the many conversations about blockchain, NFTs and cryptocurrencies, and as a vision for a decentralized web (we’re actually working on that).

Speakers also focused on data privacy and security, ethics in AI and data protection. Ownership to the user and sovereignty were topics discussed and emphasized by Sir Tim Berners-Lee on the last day of the event.

The workplace was also a popular topic, as well as the changes it underwent in the past couple of years. We heard about the importance of diversity in the workplace, as well as the future of work — is it going to be flexible, hybrid, full remote or something in between? Speakers also mentioned The Great Resignation and the reset of people’s and organizations’ mindsets.

Using AI to hire and motivate people was also in the air, as well as big topics like the digitalization of healthcare, mental health, behaviour changes in humans (young and adult) who are more and more on the Internet and even the decentralization of financial services.

And here are some examples of the different speakers at the event we talked to:

Vice-Admiral Gouveia e Melo: Vaccination, misinformation and leadership

Portuguese Navy officer and coordinator of the Task Force for the Portugal COVID-19 vaccination plan

Portugal has achieved an 86% vaccination rate on the vice-admiral’s watch. He brought a sense of mission to a task that involved organization, focus and the use of both digital and communication tools.

The country started the vaccination process late but is now one of the countries with a higher vaccination rate in the world. We talked with the vice-admiral about how the Internet helped, but also how it created problems related to disinformation and misinformation, and we asked about the dangers of controlling speech online. Finally, we asked for bits of leadership advice.

Sonia Jorge: The need for Internet — affordable, fast and for everyone

Executive Director World Wide Web Foundation (Alliance for Affordable Internet)

“The Internet is now an essential public good that everybody needs at this time just like we need to drink water or to have electricity and shelter. We should do more to bring everyone into the digital society.”

In some countries around the world, Internet access is very limited. In some places, people have to go to a particular plaza to get access to the Internet; five years ago, John Graham-Cumming saw something similar in Cuba. Sonia Jorge knows that very well. She is trying to bring affordable Internet to everyone, and that challenge is more difficult than it appears.

She explains that the world is far behind in the UN’s goals for Internet access — today only about half of the earth’s population has any Internet access at all. But many of those who have access to the World Wide Web have limited possibilities to be online: “some have access once a month, for example.” So the digital divide is real, and it “should worry everyone”.

The pandemic caused health and economic difficulties that didn’t help the mission of bringing good, fast and reliable Internet to everyone. Nevertheless, Sonia — who is Portuguese and moved to the US to study when she was 17 — saw that many African countries like Nigeria began to realize that the Internet is really important for knowledge and also for the possibilities it opens in terms of cultural, financial and societal growth.

Sonia also highlights that there is a big disparity in the world between men and women in terms of Internet access.

David Kiron: The future of work and how AI (and philosophy) can help

Editorial director of MIT Sloan Management Review

Technology will play a significant role in the future of work. In a way, that “future” is already here, but it isn’t evenly distributed — and researchers are just beginning to study it. David Kiron goes on to explain the challenge some people face in being “really seen by their leadership when you’re not in the office.”

The former senior researcher at Harvard Business School tells us how companies started valuing employees even more through the pandemic. There’s also an opportunity for different ways of work interaction through digital tools — “Zoom calls aren’t it.” He’s also worried that the pandemic caused a great reset that is driving many out of the workforce entirely: “There’s a trend of working moms opting out,” for example.

About the metaverse and a universe of universes: “If tech leaders spent more time reading philosophy they might have a better sense of where the world is going (…) more and more leaders of companies are taking on the philosopher’s role.”

And how can AI help? “Once you get AI going in a company we saw in our new study that there’s a big bump in morale, collaboration, learning and people’s sense on what they should be doing”. AI can also help better identify talent and match candidates to skills that are already represented in a company, but he also highlights that “humans play a role in all the stages of the hiring and working process.”

David Kiron explains that “if you’re not asking the right questions to your AI teams, you’re going to be behind other companies that are asking better questions”. He adds that AI can help with performance, but it also helps “redefine what performance means in your organization by finding other metrics to look at.”

Ana Maiques: neuroscience & women in tech

Co-founder and CEO of neuroscience-based medical device company Neuroelectrics

We talked to Ana about the future of the Internet. She thinks that moving forward there will be more fluid interfaces — not limited to computers and smartphones; we will have different devices that go beyond VR headsets, and that will lead to new types of interactions. In the neuroscience field, she has big hopes for the technology that Neuroelectrics, her company, is developing in Barcelona, Spain. They work with devices that use non-invasive transcranial electrical stimulation to treat the brain in diseases like epilepsy, depression, and Alzheimer’s.

Neuroelectrics is also developing a process called digital copy (for better personalized treatments) that could be useful in the future if someone develops one of these problems. But she says humankind is still very far from the dangers of something like a mind-reading device or the possibility of reading and downloading thoughts and dreams: “it’s fun to think of science fiction possibilities, but we need to act now on things and problems that are affecting us today.”

She also talks about the difficulties of being a woman in the tech business and raising money. “But little by little I see more women and that’s why it’s important to get out there and explain to women that they can do it.”

Siyabulela Mandela: The Internet is a human right

Director for Africa Journalists for Human Rights

The grandson of Nelson Mandela is on a mission to help journalists in Africa to be free to publish human rights stories. He explains how the Internet is critical for this mission and “a human rights issue”. Not only does the Internet give communities access to trustworthy information, but it also helps them become aware of their rights, gives access to financial tools and allows them to grow in our era.

He also highlights how the Internet can be misused, for example when it becomes a vehicle for misinformation, or when governments shut down Internet access to control communities — in Sudan the Internet has been cut off since October 25, 2021 (you can track that information on Cloudflare Radar).

Carlos Moedas: The light (and innovation) in Lisbon

Newly elected Mayor of Lisbon; previous European Commissioner for Research, Science and Innovation

Why is Lisbon attracting so many tech companies and talent? Carlos Moedas welcomes Cloudflare to his city — we’re growing fast in the city, and we have more than 80 job openings in the country. He also talks about why Portugal’s capital is so special and should be considered by company leaders who want to grow innovative companies. Paddy Cosgrave, from the Web Summit, told us something similar four weeks ago.

The ambition? “Make Lisbon the capital of innovation of the world” or, at least, of Europe. The new mayor also has a project called Unicorn Factory to achieve just that.

Sudarsan Reddy: Why is Cloudflare Tunnel relevant?

Cloudflare engineer from the Tunnel Team

Also, at the event was our very own engineer Sudarsan Reddy (based in Lisbon). We asked him some questions about Cloudflare Tunnel, our tunneling software that lets you quickly secure and encrypt application traffic to any type of infrastructure, so you can hide your server IP addresses, block direct attacks, and get back to delivering great applications.

Sudarsan focuses on what Tunnel is, why it is relevant, how it works and examples of situations where it can make a difference.

Yusuf Sherwani: Addiction treated online

Co-founder & CEO, Quit Genius

Yusuf graduated as a doctor from Imperial College School of Medicine, in London, but joined two passions, healthcare and technology, when he co-founded Quit Genius. He explains how in just 18 months the pandemic accelerated the adoption of digital health by 10 years, and there’s no going back. “The Internet enables people to unlock improvements to their lives, and digital healthcare went from being convenient to a necessity”.

We dig into the benefits of digital healthcare, but also the scrutiny that is needed in technology, now that it is more powerful than ever and cemented in people’s lives. Yusuf also gives examples of how his digital clinic is helping people in treating tobacco, vaping, alcohol, and opioid addictions.

Yusuf has co-authored 12 peer-reviewed studies on behavioural health and substance addictions. He was featured on the Forbes 30 Under 30 List of 2018 and in Fast Company’s 100 Most Creative People in Business.

David Shrier: From sharing economy to blockchain

American futurist and Professor of Practice, AI & Innovation with Imperial College Business School in London

David sums up how the pandemic has affected people’s relationship with technology: “Everyone is tired of Zoom calls, but the convenience opened people’s minds”.

We also talk about the digital divide, about human-centered ways of working with AI, and we also address the potential in VR and AR and how nobody saw the sharing economy coming 20 years ago and, now, “it’s incredible to see how people embraced blockchain and the digitalization of financial services”.

Dame Til Wykes: The mental health discussion went viral

Professor of Clinical Psychology and Rehabilitation at King’s College London, Director of the NIHR Clinical Research Network: Mental Health

As someone with experience in the psychology field for more than 50 years, Dame Til Wykes still had to learn new ways of engaging with patients throughout the pandemic — and even learn which buttons to push on a computer to make Zoom calls. COVID-19 and the hardships of the pandemic made people more aware and ready to talk about their mental health issues, like anxiety or depression. But the pandemic wasn’t the same for everyone and Dame Til Wykes is worried about some of the effects, “most of them remain to be seen”.

Remote consultations were a big help, but she reminds us that in her field it is important to see the whole person and not just the face — for example, “if someone is tapping a foot nervously while giving us a smile, that tells us something that we cannot see in a Zoom call”. She also mentions that the adoption of meditation apps, which brought a form of help to some, was another positive trend in this difficult period, as well as the reset button the pandemic brought to some people’s lives.

Announcing the 2021 Metasploit Community CTF

Post Syndicated from Caitlin Condon original https://blog.rapid7.com/2021/11/16/announcing-the-2021-metasploit-community-ctf/

Announcing the 2021 Metasploit Community CTF

It’s time for another Metasploit community CTF! Last year’s beginner-friendly CTF attracted a wider range of audiences and skill levels than in previous years, so we’re replicating our previous game architecture. Players will attack a single Linux target, we’ve spread prizes out across 15 teams, and the Metasploit Framework teams have devised a variety of challenges that aim to help infosec newcomers build practical skills. While there are some challenges that are intended to be a bit more difficult, we’ve designed the majority of them for beginner audiences. Hint: We’ve arranged challenges by port number according to difficulty this year. The higher the port number, the harder the challenge. If you want to start out with easier challenges, start by targeting services that run on lower port numbers.

As always, teams are encouraged! There is no cap on the number of players who can join a team. Read on for full competition details, and join the #metasploit-ctf channel on Slack to start building your team.

Big thanks to TryHackMe and CTFd for powering this year’s game!

TL;DR overview

There are 1,000 team spots available; both individuals and teams of multiple members are allowed. There is no limit on the number of players who can be on a team. Please note: Those playing as a team only need to register ONE team upon signup. Help us make the competition accessible to as many players as possible by organizing with your fellow team players ahead of time and creating only a single team for all of you. If others want to join your team later, that’s no problem. See the FAQ at the end of this post for details.

Important dates (all times in US Central Standard Time):

  • Initial team registration opens for the first 750 teams on Monday, November 22, 2021 at 2:00 PM CST (UTC-6).
  • CTF game play begins on Friday, December 3, 2021 at 11:00 AM CST (UTC-6). When the CTF officially begins, we will open registration for an additional 250 teams.
  • The CTF ends on Monday, December 6, 2021, at 11:00 AM CST (UTC-6).

Our goal in putting on CTFs is to enable relationship building and knowledge sharing across the security community. To further emphasize community partnership and camaraderie over purely monetary gain, we’ve increased the value of CTF prizes in the past few years but spread those prizes out over many teams.

  • The 15 teams with the highest point totals when game play ends will receive Amazon Gift Cards (only one per team!).
  • We’re partnering with TryHackMe to offer other prizes to the 3 teams that achieve the highest point total the fastest! See the prize section at the end of this post for details.

You can see results and statistics from the last Metasploit CTF here.

Questions?

To report technical issues during the competition or to discuss play with your teammates and community members, join us in the #metasploit-ctf channel on Slack. The Metasploit team will be occasionally available on Slack in case of technical issues, but please be advised that Rapid7 staff members will not respond to DMs with requests for hints or help with flags or MD5 hash submission.

A few notes on the technicalities of game play:

  • You must use a valid email for registration. You’ll need to verify your email upon signup; email is also how we communicate with winners.
  • We’ve run this CTF for several years now, and we’ve yet to encounter an actual technical issue with a flag (other than the occasional bit of latency, which we try to avoid by being thoughtful about challenge development). If your MD5 hash submission isn’t being accepted, it is because the hash is incorrect. Keep trying! There is no penalty for wrong answers.
  • The scoreboard is not a target. Nothing except the official CTF target is a target. Please don’t attack anything except the target box.
  • When game play starts, provisioning is first come, first served. It may take a few minutes. Be patient! If you’ve been waiting for more than half an hour for your network to be provisioned, you can reach out to us on Slack.
  • Please, no spoilers in Slack channels or other public places. Everyone learns at their own pace, so don’t ruin the game for others. We may kick you out of Slack if you post flag spoilers. Harassment of other players and community members won’t be tolerated.
  • Metasploit Slack messages archive automatically after a certain threshold (this is just how our implementation of Slack works). If you’re worried about continuous access to your conversations, you may want to hold them outside of Metasploit’s Slack channel.
  • Higher port numbers signify more advanced challenges.

2021 Metasploit Capture the Flag: Official rules

No purchase is necessary to participate. Only the first 1,000 registrants (teams or individuals) will be able to participate. For further information, see the full Contest Terms here.

To enter

Starting Monday, November 22, 2021 at 2:00 PM CST (UTC-6), the first 750 teams can register here. On Friday, December 3, 2021, at 11:00 AM CST (UTC-6) the CTF will begin. An additional 250 teams can register here when game play begins. Please note: Only ONE team needs to be created for all players. Teammates can and should share their team credentials (and/or their links to invite new players). Please ensure you enter your email address correctly when registering: You will need to verify your email upon registration, and we will use email to communicate with winners about prizes.

Play starts Friday, December 3, 2021 at 11:00 AM CST (UTC-6). When play starts, players should use the instructions on the Control Panel to connect to the Kali Linux jump box. From there, players can attack the vulnerable target environment to find flags. All flags are PNG images.

When a flag is found, players should submit the MD5 hash to the Challenges section of the scoreboard. If the MD5 hash is correct, points will be awarded. There is no penalty for wrong answers.
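For example, on the Kali Linux jump box you can compute the hash of a recovered flag with standard tools (the filename is just an example):

md5sum flag.png
# prints something like: 5f4dcc3b5aa765d61d8327deb882cf99  flag.png

Submit the 32-character hash exactly as printed.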

The competition will open on Friday, December 3, 2021 at 11:00 AM CST (UTC-6) and close on Monday, December 6, 2021, at 11:00 AM CST (UTC-6). The Contestants with the fifteen (15) highest point totals at the end of the contest will each receive one (1) $100 USD Amazon gift card. In addition, the three (3) Contestants with the highest point total at the end of the Contest will also receive the prizes listed below and will be announced in an official blog post following the Contest. In the event of a tie, the Contestant who reached that score first will be the winner.

You may participate as an individual or as a team. However, only ONE cash prize can be awarded for each winning team; therefore, if you are participating as a team, please be aware that we cannot offer cash prizes to each team member. (Any further method used to determine who among your teammates takes home the CTF spoils is up to you. We hear thumb wars and structured rock/paper/scissors competitions are effective.)

Prizes

Only the prizes listed below will be awarded as part of the competition. Prizes are not transferable or redeemable for cash. Rapid7 reserves the right to make equivalent substitutions as necessary, due to circumstances not under its control. Please allow several weeks for delivery of any prize.

To reiterate, only ONE cash prize can be awarded for each winning team; therefore, if you are participating as a team, please be aware that we will not offer cash prizes to each team member. How you divide spoils among your team is up to you!

  • 1st: one (1) $100 Amazon Gift Card, plus a two-month THM Premium voucher ($20) and a Throwback voucher ($60) per team member* (ARV: 180 USD)
  • 2nd: one (1) $100 Amazon Gift Card, plus a two-month THM Premium voucher ($20) and a Swag voucher ($20) per team member* (ARV: 140 USD)
  • 3rd: one (1) $100 Amazon Gift Card, plus a one-month THM Premium voucher ($10) per team member* (ARV: 110 USD)
  • 4th to 15th: one (1) $100 Amazon Gift Card (ARV: 100 USD)

*TryHackMe vouchers will be provided for up to 15 members per team

Contestants with the highest point total at the end of the Contest will also be announced in an official blog post following the Contest.

FAQ

What’s the difference between an account and a team? Anyone can create an account, and accounts are unlimited. Teams, on the other hand, are limited to 1,000. To actually play in the CTF, you need to belong to a team — either by yourself or with your teammates.

How do teams work? When CTF registration opens on November 22, you’ll see a page that guides you through creating an account, verifying your email, and finally, asking you to either create a new team OR join an existing team. If you already have a team you know would like to play together, designate ONE team captain to create your team and a team password. Team captains (or whoever created the team) can then share the team password with all team members. Note that a team password is different from an account password.

There is also a new feature this year that gives team captains the option to share an invite link instead of (or in addition to) a password, so a new user can just follow that link to be automatically added to a team (once they have a valid account).

Please note: Team captains and team members should ONLY share team invitation links or team passwords with people they trust.

What if I want to join a team later, or if someone else wants to join my team later? That’s OK! To join someone else’s team, ask them for their team name and password, OR their team invite link. You can then create an account if you don’t already have one (again, accounts are unlimited; it’s only teams that are limited) and input their team name and credentials. If you’d like someone to join your team, you can simply share your team name and password with them.

I already created or joined a team, but I want to join a different one. What do I do? If you are the team captain (you’re the one who created the team), you can disband the team by clicking the trash can icon on your team settings page. This will help out the community by freeing up a team registration slot for someone else. If you’re not the team captain, you can also leave a team by clicking the arrow icon on your team settings page. Both options are available at any point before the contest starts on December 3. Please note that teams cannot be adjusted once play has started.

Is there a maximum number of players allowed on a team? Nope! Feel free to team up with as many friends and strangers as you like; just remember that only one Amazon Gift Card can be awarded to each winning team, and TryHackMe vouchers are limited to a maximum of 15 per winning team. How you divide prizes if you win is totally up to you.

How do I connect to my CTF environment? Starting Friday, December 3, 2021, at 11:00 AM CST (UTC-6), you can log in here and follow the directions on your Control Panel to access the CTF environment.

Do I need to use Metasploit to solve the CTF challenges? No. Using Metasploit is an option for some challenges, but the CTF was not engineered to be Metasploit-specific.

I am not receiving points when I submit my flag. What’s wrong? You are not submitting the correct MD5 hash. This means you still have some work to do to solve the challenge correctly. Keep trying! There is no penalty for wrong answers.

Can you give me a hint about $FLAG? No, sorry. That would spoil the fun!

I’m having technical difficulties or I think I’ve found a bug! Can I DM someone for help? In general, Rapid7 staff will not respond to DMs requesting help with flag discovery, exploitation, or anything else related to the workings of the game. If you think you have discovered a bug in the CTF environment that is affecting your ability to play, you can reach out to a designated admin in the #metasploit-ctf channel on Slack, but we strongly recommend you check the pinned Slack messages to see if your question has already been addressed. If we think the behavior you’re experiencing is unexpected, we’ll respond and take a look, but in general, you should expect to proceed without input or attention from us.

My target or jump box reverted! What happened? Either you or one of your teammates clicked the “Revert” button from the control panel. Your boxes will not revert on their own, and Rapid7 staff will not revert boxes for you unless specifically requested.

Grab your friends, grab your parents, meet new folks, and good luck!


Create a serverless event-driven workflow to ingest and process Microsoft data with AWS Glue and Amazon EventBridge

Post Syndicated from Venkata Sistla original https://aws.amazon.com/blogs/big-data/create-a-serverless-event-driven-workflow-to-ingest-and-process-microsoft-data-with-aws-glue-and-amazon-eventbridge/

Microsoft SharePoint is a document management system for storing files, organizing documents, and sharing and editing documents in collaboration with others. Your organization may want to ingest SharePoint data into your data lake, combine the SharePoint data with other data that’s available in the data lake, and use it for reporting and analytics purposes. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months.

Organizations often manage their data on SharePoint in the form of files and lists, and you can use this data for easier discovery, better auditing, and compliance. SharePoint as a data source is not a typical relational database and the data is mostly semi-structured, which is why it’s often difficult to join the SharePoint data with other relational data sources. This post shows how to ingest and process SharePoint lists and files with AWS Glue and Amazon EventBridge, which enables you to join other data that is available in your data lake. We use SharePoint REST APIs with a standard open data protocol (OData) syntax. OData advocates a standard way of implementing REST APIs that allows for SQL-like querying capabilities. OData helps you focus on your business logic while building RESTful APIs without having to worry about the various approaches to define request and response headers, query options, and so on.
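To make the OData syntax concrete, here is a minimal sketch in Python using the requests library; the site URL, list name, column names, and token handling are illustrative placeholders rather than values used by this solution:

    import requests

    # All values below are illustrative placeholders.
    SITE_URL = "https://example.sharepoint.com/sites/winequality"
    ACCESS_TOKEN = "<oauth2-bearer-token>"

    # OData query options give SQL-like semantics: $select picks columns,
    # $filter restricts rows, $top limits the result size.
    response = requests.get(
        f"{SITE_URL}/_api/web/lists/getbytitle('WineFeedback')/items",
        params={"$select": "Title,Quality", "$filter": "Quality ge 7", "$top": "100"},
        headers={
            "Authorization": f"Bearer {ACCESS_TOKEN}",
            "Accept": "application/json;odata=verbose",
        },
        timeout=30,
    )
    response.raise_for_status()
    for item in response.json()["d"]["results"]:
        print(item["Title"], item["Quality"])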

AWS Glue event-driven workflows

Unlike a traditional relational database, SharePoint data may or may not change frequently, and it’s difficult to predict the frequency at which your SharePoint server generates new data, which makes it difficult to plan and schedule data processing pipelines efficiently. Running data processing frequently can be expensive, whereas scheduling pipelines to run infrequently can deliver cold data. Similarly, triggering pipelines from an external process can increase complexity, cost, and job startup time.

AWS Glue supports event-driven workflows, a capability that lets developers start AWS Glue workflows based on events delivered by EventBridge. The main reason to choose EventBridge in this architecture is because it allows you to process events, update the target tables, and make information available to consume in near-real time. Because frequency of data change in SharePoint is unpredictable, using EventBridge to capture events as they arrive enables you to run the data processing pipeline only when new data is available.

To get started, you simply create a new AWS Glue trigger of type EVENT and place it as the first trigger in your workflow. You can optionally specify a batching condition. Without event batching, the AWS Glue workflow is triggered every time an EventBridge rule matches, which may result in multiple concurrent workflows running. AWS Glue protects you by setting default limits that restrict the number of concurrent runs of a workflow. You can increase these limits by opening a support case. Event batching allows you to configure the number of events to buffer or the maximum elapsed time before firing the trigger. When the batching condition is met, a workflow run is started. For example, you can trigger your workflow when 100 files are uploaded in Amazon Simple Storage Service (Amazon S3) or 5 minutes after the first upload. We recommend configuring event batching to avoid too many concurrent workflows and to optimize resource usage and cost.
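As a rough boto3 illustration of such a trigger (the trigger, workflow, and job names are hypothetical, not those created by this solution):

    import boto3

    glue = boto3.client("glue")

    glue.create_trigger(
        Name="sharepoint-s3-event-trigger",      # illustrative name
        WorkflowName="sharepoint-etl-workflow",  # illustrative name
        Type="EVENT",
        # Start a run after 100 matching events, or 300 seconds after the
        # first event arrives, whichever happens first.
        EventBatchingCondition={"BatchSize": 100, "BatchWindow": 300},
        # The action the trigger fires: the first job in the workflow.
        Actions=[{"JobName": "convert-to-parquet-job"}],
    )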

To illustrate this solution better, consider the following use case for a wine manufacturing and distribution company that operates across multiple countries. They currently host all their transactional system’s data on a data lake in Amazon S3. They also use SharePoint lists to capture feedback and comments on wine quality and composition from their suppliers and other stakeholders. The supply chain team wants to join their transactional data with the wine quality comments in SharePoint data to improve their wine quality and manage their production issues better. They want to capture those comments from the SharePoint server within an hour and publish that data to a wine quality dashboard in Amazon QuickSight. With an event-driven approach to ingest and process their SharePoint data, the supply chain team can consume the data in less than an hour.

Overview of solution

In this post, we walk through a solution to set up an AWS Glue job to ingest SharePoint lists and files into an S3 bucket and an AWS Glue workflow that listens to S3 PutObject data events captured by AWS CloudTrail. This workflow is configured with an event-based trigger to run when an AWS Glue ingest job adds new files into the S3 bucket. The following diagram illustrates the architecture.

To make it simple to deploy, we captured the entire solution in an AWS CloudFormation template that enables you to automatically ingest SharePoint data into Amazon S3. SharePoint uses ClientID and TenantID credentials for authentication and uses OAuth2 for authorization.

The template helps you perform the following steps:

  1. Create an AWS Glue Python shell job to make the REST API call to the SharePoint server and ingest files or lists into Amazon S3.
  2. Create an AWS Glue workflow with a starting trigger of EVENT type.
  3. Configure CloudTrail to log data events, such as PutObject API calls to CloudTrail.
  4. Create a rule in EventBridge to forward the PutObject API events to AWS Glue when they’re emitted by CloudTrail.
  5. Add an AWS Glue event-driven workflow as a target to the EventBridge rule. The workflow gets triggered when the SharePoint ingest AWS Glue job adds new files to the S3 bucket.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Configure SharePoint server authentication details

Before launching the CloudFormation stack, you need to set up your SharePoint server authentication details, namely, TenantID, Tenant, ClientID, ClientSecret, and the SharePoint URL in AWS Systems Manager Parameter Store of the account you’re deploying in. This makes sure that no authentication details are stored in the code and they’re fetched in real time from Parameter Store when the solution is running.

To create your AWS Systems Manager parameters, complete the following steps:

  1. On the Systems Manager console, under Application Management in the navigation pane, choose Parameter Store.
  2. Choose Create parameter.
  3. For Name, enter the parameter name /DataLake/GlueIngest/SharePoint/tenant.
  4. Leave the type as String.
  5. Enter your SharePoint tenant detail into the value field.
  6. Choose Create parameter.
  7. Repeat these steps to create the following parameters (a scripted alternative is sketched after this list):
    1. /DataLake/GlueIngest/SharePoint/tenant
    2. /DataLake/GlueIngest/SharePoint/tenant_id
    3. /DataLake/GlueIngest/SharePoint/client_id/list
    4. /DataLake/GlueIngest/SharePoint/client_secret/list
    5. /DataLake/GlueIngest/SharePoint/client_id/file
    6. /DataLake/GlueIngest/SharePoint/client_secret/file
    7. /DataLake/GlueIngest/SharePoint/url/list
    8. /DataLake/GlueIngest/SharePoint/url/file
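If you prefer to script this setup, a minimal boto3 sketch along the following lines should work; every value is a placeholder, and although the console steps above use the String type, SecureString is advisable for the client secrets in practice.

    import boto3

    ssm = boto3.client("ssm")

    # Placeholder values; substitute your real SharePoint details.
    parameters = {
        "/DataLake/GlueIngest/SharePoint/tenant": "<tenant>",
        "/DataLake/GlueIngest/SharePoint/tenant_id": "<tenant-id>",
        "/DataLake/GlueIngest/SharePoint/client_id/list": "<client-id>",
        "/DataLake/GlueIngest/SharePoint/client_secret/list": "<client-secret>",
        "/DataLake/GlueIngest/SharePoint/client_id/file": "<client-id>",
        "/DataLake/GlueIngest/SharePoint/client_secret/file": "<client-secret>",
        "/DataLake/GlueIngest/SharePoint/url/list": "<sharepoint-list-url>",
        "/DataLake/GlueIngest/SharePoint/url/file": "<sharepoint-file-url>",
    }

    for name, value in parameters.items():
        # Type="String" mirrors the console steps; consider SecureString for secrets.
        ssm.put_parameter(Name=name, Value=value, Type="String", Overwrite=True)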

Deploy the solution with AWS CloudFormation

For a quick start of this solution, you can deploy the provided CloudFormation stack. This creates all the required resources in your account.

The CloudFormation template generates the following resources:

  • S3 bucket – Stores data, CloudTrail logs, job scripts, and any temporary files generated during the AWS Glue extract, transform, and load (ETL) job run.
  • CloudTrail trail with S3 data events enabled – Enables EventBridge to receive PutObject API call data in a specific bucket.
  • AWS Glue Job – A Python shell job that fetches the data from the SharePoint server.
  • AWS Glue workflow – A data processing pipeline composed of a crawler, jobs, and triggers. This workflow converts uploaded data files into Apache Parquet format.
  • AWS Glue database – The AWS Glue Data Catalog database that holds the tables created in this walkthrough.
  • AWS Glue table – The Data Catalog table representing the Parquet files being converted by the workflow.
  • AWS Lambda function – The AWS Lambda function is used as an AWS CloudFormation custom resource to copy job scripts from an AWS Glue-managed GitHub repository and an AWS Big Data blog S3 bucket to your S3 bucket.
  • IAM roles and policies – We use the following AWS Identity and Access Management (IAM) roles:
    • LambdaExecutionRole – Runs the Lambda function that has permission to upload the job scripts to the S3 bucket.
    • GlueServiceRole – Runs the AWS Glue job that has permission to download the script, read data from the source, and write data to the destination after conversion.
    • EventBridgeGlueExecutionRole – Has permissions to invoke the NotifyEvent API for an AWS Glue workflow.
    • IngestGlueRole – Runs the AWS Glue job that has permission to ingest data into the S3 bucket.

To launch the CloudFormation stack, complete the following steps:

  1. Sign in to the AWS CloudFormation console.
  2. Choose Launch Stack.
  3. Choose Next.
  4. For pS3BucketName, enter the unique name of your new S3 bucket.
  5. Leave pWorkflowName and pDatabaseName as the default.
  6. For pDatasetName, enter the SharePoint list name or file name you want to ingest.
  7. Choose Next.
  8. On the next page, choose Next.
  9. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
  10. Choose Create.

It takes a few minutes for the stack creation to complete; you can follow the progress on the Events tab.

You can run the ingest AWS Glue job either on a schedule or on demand. When the job finishes successfully and ingests data into the raw prefix of the S3 bucket, the AWS Glue workflow runs and transforms the ingested raw CSV files into Parquet files and loads them into the transformed prefix.
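For an on-demand run outside the console, a boto3 call along these lines should work; confirm the job name in your account first (the walkthrough below uses ingest-glue-job-SharePoint-file):

    import boto3

    glue = boto3.client("glue")

    # Start the ingest job on demand and print the run ID for tracking.
    run = glue.start_job_run(JobName="ingest-glue-job-SharePoint-file")
    print("Started run:", run["JobRunId"])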

Review the EventBridge rule

The CloudFormation template created an EventBridge rule to forward S3 PutObject API events to AWS Glue. Let’s review the configuration of the EventBridge rule:

  1. On the EventBridge console, under Events, choose Rules.
  2. Choose the rule s3_file_upload_trigger_rule-<CloudFormation-stack-name>.
  3. Review the information in the Event pattern section.


The event pattern shows that this rule is triggered when an S3 object is uploaded to s3://<bucket_name>/data/SharePoint/tablename_raw/. CloudTrail captures the PutObject API calls made and relays them as events to EventBridge.
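As a rough reconstruction of that rule in boto3 (the bucket name and rule name are placeholders, and the exact pattern your stack creates may differ in detail):

    import json

    import boto3

    events = boto3.client("events")

    # Match CloudTrail-delivered PutObject calls against the raw prefix.
    event_pattern = {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutObject"],
            "requestParameters": {
                "bucketName": ["<bucket_name>"],  # placeholder
                "key": [{"prefix": "data/SharePoint/tablename_raw/"}],
            },
        },
    }

    events.put_rule(
        Name="s3_file_upload_trigger_rule-demo",  # placeholder suffix
        EventPattern=json.dumps(event_pattern),
    )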

  4. In the Targets section, you can verify that this EventBridge rule is configured with an AWS Glue workflow as a target.


Run the ingest AWS Glue job and verify the AWS Glue workflow is triggered successfully

To test the workflow, we run the ingest-glue-job-SharePoint-file job using the following steps:

  1. On the AWS Glue console, select the ingest-glue-job-SharePoint-file job.
  2. On the Action menu, choose Run job.
  3. Choose the History tab and wait until the job succeeds.

You can now see the CSV files in the raw prefix of your S3 bucket.


Now the workflow should be triggered.

  1. On the AWS Glue console, validate that your workflow is in the RUNNING state.
  2. Choose the workflow to view the run details.
  3. On the History tab of the workflow, choose the current or most recent workflow run.
  4. Choose View run details.

When the workflow run status changes to Completed, let’s check the converted files in your S3 bucket.

  1. Switch to the Amazon S3 console, and navigate to your bucket.

You can see the Parquet files under s3://<bucket_name>/data/SharePoint/tablename_transformed/.


Congratulations! Your workflow ran successfully based on S3 events triggered by uploading files to your bucket. You can verify everything works as expected by running a query against the generated table using Amazon Athena.

Sample wine dataset

Let’s analyze a sample red wine dataset. The following screenshot shows a SharePoint list that contains various readings that relate to the characteristics of the wine and an associated wine category. This is populated by various wine tasters from multiple countries.


The following screenshot shows a supplier dataset from the data lake with wine categories ordered per supplier.


We process the red wine dataset using this solution and use Athena to query the red wine data and supplier data where wine quality is greater than or equal to 7.


We can visualize the processed dataset using QuickSight.

Clean up

To avoid incurring unnecessary charges, you can use the AWS CloudFormation console to delete the stack that you deployed. This removes all the resources you created when deploying the solution.

Conclusion

Event-driven architectures provide access to near-real-time information and help you make business decisions on fresh data. In this post, we demonstrated how to ingest and process SharePoint data using AWS serverless services like AWS Glue and EventBridge. We saw how to configure a rule in EventBridge to forward events to AWS Glue. You can use this pattern for your analytical use cases, such as joining SharePoint data with other data in your lake to generate insights, or auditing SharePoint data and compliance requirements.


About the Author

Venkata Sistla is a Big Data & Analytics Consultant on the AWS Professional Services team. He specializes in building data processing capabilities and helping customers remove constraints that prevent them from leveraging their data to develop business insights.

Getting started with testing serverless applications

Post Syndicated from Talia Nassi original https://aws.amazon.com/blogs/compute/getting-started-with-testing-serverless-applications/

Testing is an essential step in the software development lifecycle. Through the different types of tests, you validate user experience, performance, and detect bugs in your code. Features should not be considered done until all of the corresponding tests are written.

The distributed nature of serverless architectures separates your application logic from other concerns like state management, request routing, workflow orchestration, and queue polling.

In this post, I cover the three main types of testing developers do when building applications. I also go through what changes and what stays the same when building serverless applications with AWS Lambda, in addition to the challenges of testing serverless applications.

The challenges of testing serverless applications

First, to test your code fully using managed services, you need to emulate the cloud environment on your local machine. However, this is usually not practical.

Secondly, using many managed services for event-driven architecture means you must also account for external resources like queues, database tables, and event buses. This means you write more integration tests than unit tests, altering the standard testing pyramid. Building more integration tests can impact the maintenance of your tests or slow your testing speed.

Lastly, with synchronous workloads, such as a traditional web service, you make a request and assert on the response. The test doesn’t need to do anything special because the thread is blocked until the response returns.

However, in the case of event-driven architectures, state changes are driven by events flowing from one resource to another. Your tests must detect side effects in downstream components and these might not be immediate. This means that the tests must tolerate asynchronous behaviors, which can make for more complicated and slower-running tests.

Unit testing

Unit tests validate individual units of your code, independent from any other components. Unit tests check the smallest unit of functionality and should only have one reason to fail – the unit is not correctly implemented.

Unit tests generally cover the smallest units of functionality although the size of each unit can vary. For example, a number of functions may provide a coherent piece of behavior and you may want to test them as a single unit. In this case, your unit test might call an entry-point function that invokes several others to do its job. You test these functions together as a single unit.
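As a minimal illustration (the functions are hypothetical, not from any particular application), the sketch below tests an entry-point function together with the helpers it composes:

    # Hypothetical helpers composed by a single entry point, tested as one unit.
    def normalize(value: str) -> str:
        return value.strip().lower()

    def is_valid(value: str) -> bool:
        return len(value) > 0

    def process(raw: str) -> str:
        # Entry point that composes the helpers; tested as a single unit.
        cleaned = normalize(raw)
        if not is_valid(cleaned):
            raise ValueError("empty input")
        return cleaned

    def test_process_normalizes_and_validates():
        # One reason to fail: process() is implemented incorrectly.
        assert process("  HELLO ") == "hello"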


Integration testing

One good practice to test how services interact with each other is to write integration tests that mock the behavior of your services in the cloud.

The point of integration tests is to make sure that two components of your application work together properly. Integration tests are important in serverless applications because they rely heavily on integrations of different services. Unless you are testing in production, the most efficient way to run automated integration tests is to emulate your services in the cloud.

This can be done with tools like moto. Moto mocks calls to AWS automatically without requiring any other dependencies. Another useful tool is localstack. Localstack allows you to mock certain AWS service APIs on your local machine that you can use for testing the integration of two or more services.
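For instance, a minimal moto-based sketch might look like the following; the bucket name and payload are placeholders, and the decorator name reflects the moto releases current at the time of writing:

    import boto3
    from moto import mock_s3  # decorator name in moto releases of this era

    @mock_s3
    def test_report_lands_in_bucket():
        # Every boto3 S3 call inside this test hits moto's in-memory mock.
        s3 = boto3.client("s3", region_name="us-east-1")
        s3.create_bucket(Bucket="example-bucket")

        # Stand-in for the code under test.
        s3.put_object(Bucket="example-bucket", Key="report.csv", Body=b"a,b\n1,2\n")

        body = s3.get_object(Bucket="example-bucket", Key="report.csv")["Body"].read()
        assert body == b"a,b\n1,2\n"

Because moto intercepts calls at the client level, the same test runs identically in a CI pipeline with no AWS credentials configured.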

You can also configure test events and manually test directly from the Lambda console. Remember that when you test a Lambda function, you are not only testing the business logic. You must also mock its payload and call a function invoke. There are over 200 event sources that can trigger Lambda functions. Each service has its own unique event format, and contains data about the resource or request that invoked the function. Find the full list of test events in the AWS documentation.
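Outside the console, you can exercise the same idea locally by passing a hand-built payload straight to your handler. A minimal sketch, with a trimmed-down S3 event and a hypothetical handler (real payloads carry many more fields):

    # A trimmed-down S3 put event; see the AWS documentation for full samples.
    sample_event = {
        "Records": [
            {
                "eventSource": "aws:s3",
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": "example-bucket"},
                    "object": {"key": "uploads/report.csv"},
                },
            }
        ]
    }

    def handler(event, context):
        # Hypothetical business logic: return the keys of the uploaded objects.
        return [record["s3"]["object"]["key"] for record in event["Records"]]

    assert handler(sample_event, None) == ["uploads/report.csv"]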

To configure a test event for AWS Lambda:

  1. Navigate to the Lambda console and choose the function. From the Test dropdown, choose Configure Test Event.
  2. Choose Create a new Test Event and select the template for the service you want to act as the trigger for your Lambda function. In this example, you choose Amazon DynamoDB Update.
  3. Save the test event and choose Test in the Code source section. Each user can create up to 10 test events per function. Those test events are private to you. Lambda runs the function on your behalf. The function handler receives and then processes the sample event.
  4. The Execution result shows the execution status as succeeded.

End-to-end testing

When testing your serverless applications end-to-end, it’s important to understand your user and data flows. The most important business-critical flows of your software are what should be tested end-to-end in your UI.

From a business perspective, these should be the most valuable user and data flows that occur in your product. Another resource to utilize is data from your customers. From your analytics platform, find the actions that users are doing the most in production.

End-to-end tests should be running in your build pipeline and act as blockers if one of them fails. They should also be updated as new features are added to your product.

The testing pyramid

The standard testing pyramid indicates that systems should have more unit tests than any other type of test, then a medium number of integration tests, and the least number of end-to-end tests.

However, when testing serverless applications, this standard shifts toward a hexagonal structure, because a serverless application is mostly made up of two or more AWS services talking to each other. You can mock out those integrations with tools such as moto or localstack.

Add automated tests to your CI/CD pipeline

As serverless applications scale, having automated tests is essential in getting fast feedback on the current state of your product. It is not scalable to test everything manually, so investing in an automation tool to run your tests is essential.

All of the tests in your build pipeline, including unit, integration, and end-to-end tests should be blocking in your CI/CD pipeline. This means if one of them fails, it should block the promotion of that code into production. And remember – there’s no such thing as a flaky test. Either the test does what it’s supposed to do, or it doesn’t.

Narrowly scope your tests

Testing asynchronous processes can be tricky. Not only must you monitor different parts of your system, you also need to know when to stop waiting and end the test. When there are multiple asynchronous steps, the delays add up to a longer-running test. It’s also more difficult to estimate how long to wait before ending. There are two approaches to mitigate these issues.

Firstly, write separate, more narrowly-scoped tests over each asynchronous step. This limits the possible causes of asynchronous test failure you need to investigate. Also, with fewer asynchronous steps, these tests will run quicker and it will be easier to estimate how long to wait before timing out.

Secondly, verify as much of your system as possible using synchronous tests. Then, you only need asynchronous tests to verify residual concerns that aren’t already covered. Synchronous tests are also easier to diagnose when they fail, so you want to catch as many issues with them as possible before running your asynchronous tests.
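To make the polling idea concrete, here is a minimal sketch of a narrowly scoped asynchronous check with an explicit timeout; all names are illustrative:

    import time

    def wait_for(condition, timeout=30.0, interval=1.0):
        # Poll a zero-argument callable until it returns True or time runs out.
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if condition():
                return True
            time.sleep(interval)
        return False

    def test_message_reaches_queue():
        # queue_has_message would check the single downstream resource this
        # test is scoped to (hypothetical; e.g., an SQS receive call).
        queue_has_message = lambda: True  # stub so the sketch is runnable
        assert wait_for(queue_has_message, timeout=60), "no message within 60s"

Keeping the timeout on each narrowly scoped step, rather than on one long end-to-end wait, makes failures faster to localize.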

Conclusion

In this blog post, you learn the three types of testing – unit testing, integration testing, and end-to-end testing. Then you learn how to configure test events with Lambda. I then cover the shift from the standard testing pyramid to the hexagonal testing pyramid for serverless, and why more integration tests are necessary. Then you learn a few best practices to keep in mind for getting started with testing your serverless applications.

For more information on serverless, head to Serverless Land.

Welcome to AWS Storage Day 2021

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/welcome-to-aws-storage-day-2021/

Welcome to the third annual AWS Storage Day 2021! During Storage Day 2020 and the first-ever Storage Day 2019, we made many impactful announcements for our customers, and this year will be no different. The one-day, free AWS Storage Day 2021 virtual event will be hosted on the AWS channel on Twitch. You’ll hear from experts about announcements, leadership insights, and educational content related to AWS Storage services.

The first part of the day is the leadership track. Wayne Duso, VP of Storage, Edge, and Data Governance, will be presenting a live keynote. He’ll share information about what’s new in AWS Cloud Storage and how these services can help businesses increase agility and accelerate innovation. The keynote will be followed by live interviews with the AWS Storage leadership team, including Mai-Lan Tomsen Bukovec, VP of AWS Block and Object Storage.

The second part of the day is a technical track in which you’ll learn more about Amazon Simple Storage Service (Amazon S3), Amazon Elastic Block Store (EBS), Amazon Elastic File System (Amazon EFS), AWS Backup, Cloud Data Migration, AWS Transfer Family and Amazon FSx.

To register for the event, visit the AWS Storage Day 2021 event page.

Now as Jeff Barr likes to say, let’s get into the announcements.

Amazon FSx for NetApp ONTAP
Today, we are pleased to announce Amazon FSx for NetApp ONTAP, a new storage service that allows you to launch and run fully managed NetApp ONTAP file systems in the cloud. Amazon FSx for NetApp ONTAP joins Amazon FSx for Lustre and Amazon FSx for Windows File Server as the newest file system offered by Amazon FSx.

Amazon FSx for NetApp ONTAP provides the full ONTAP experience with capabilities and APIs that make it easy to run applications that rely on NetApp or network-attached storage (NAS) appliances on AWS without changing your application code or how you manage your data. To learn more, read New – Amazon FSx for NetApp ONTAP.

Amazon S3
Amazon S3 Multi-Region Access Points is a new S3 feature that allows you to define global endpoints that span buckets in multiple AWS Regions. Using this feature, you can now build multi-region applications without adding complexity to your applications, with the same system architecture as if you were using a single AWS Region.

S3 Multi-Region Access Points is built on top of AWS Global Accelerator and routes S3 requests over the global AWS network. S3 Multi-Region Access Points dynamically routes your requests to the lowest latency copy of your data, so the upload and download performance can increase by 60 percent. It’s a great solution for applications that rely on reading files from S3 and also for applications like autonomous vehicles that need to write a lot of data to S3. To learn more about this new launch, read How to Accelerate Performance and Availability of Multi-Region Applications with Amazon S3 Multi-Region Access Points.


There’s also great news about the Amazon S3 Intelligent-Tiering storage class! The conditions of usage have been updated. There is no longer a minimum storage duration for all objects stored in S3 Intelligent-Tiering, and monitoring and automation charges for objects smaller than 128 KB have been removed. Smaller objects (128 KB or less) are not eligible for auto-tiering when stored in S3 Intelligent-Tiering. Now that there is no monitoring and automation charge for small objects and no minimum storage duration, you can use the S3 Intelligent-Tiering storage class by default for all your workloads with unknown or changing access patterns. To learn more about this announcement, read Amazon S3 Intelligent-Tiering – Improved Cost Optimizations for Short-Lived and Small Objects.
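Opting in is as simple as writing objects with that storage class. A minimal boto3 sketch (bucket, key, and body are placeholders):

    import boto3

    s3 = boto3.client("s3")

    # Store an object directly in the Intelligent-Tiering storage class.
    s3.put_object(
        Bucket="example-bucket",
        Key="logs/2021-09-02.json",
        Body=b"{}",
        StorageClass="INTELLIGENT_TIERING",
    )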

Amazon EFS
Amazon EFS Intelligent-Tiering is a new capability that makes it easier to optimize costs for shared file storage when access patterns change. When you enable Amazon EFS Intelligent-Tiering, it will store the files in the appropriate storage class at the right time. For example, if you have a file that is not used for a period of time, EFS Intelligent-Tiering will move the file to the Infrequent Access (IA) storage class. If the file is accessed again, Intelligent-Tiering will automatically move it back to the Standard storage class.

To get started with Intelligent-Tiering, enable lifecycle management in a new or existing file system and choose a lifecycle policy to automatically transition files between different storage classes. Amazon EFS Intelligent-Tiering is perfect for workloads with changing or unknown access patterns, such as machine learning inference and training, analytics, content management and media assets. To learn more about this launch, read Amazon EFS Intelligent-Tiering Optimizes Costs for Workloads with Changing Access Patterns.
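As a rough sketch of that lifecycle configuration with boto3 (the file system ID and the 30-day threshold are placeholders; pick the policy that fits your workload):

    import boto3

    efs = boto3.client("efs")

    efs.put_lifecycle_configuration(
        FileSystemId="fs-0123456789abcdef0",  # placeholder ID
        LifecyclePolicies=[
            # Move files to Infrequent Access after 30 days without access.
            {"TransitionToIA": "AFTER_30_DAYS"},
            # Move a file back to Standard the first time it is accessed again.
            {"TransitionToPrimaryStorageClass": "AFTER_1_ACCESS"},
        ],
    )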

AWS Backup
AWS Backup Audit Manager allows you to simplify data governance and compliance management of your backups across supported AWS services. It provides customizable controls and parameters, like backup frequency or retention period. You can also audit your backups to see if they satisfy your organizational and regulatory requirements. If one of your monitored backups drifts from your predefined parameters, AWS Backup Audit Manager will let you know so you can take corrective action. This new feature also enables you to generate reports to share with auditors and regulators. To learn more, read How to Monitor, Evaluate, and Demonstrate Backup Compliance with AWS Backup Audit Manager.

Amazon EBS
Amazon EBS direct APIs now support creating 64 TB EBS Snapshots directly from any block storage data, including on-premises. This limit was increased from 16 TB to 64 TB, allowing customers to create larger snapshots and recover them to Amazon EBS io2 Block Express Volumes. To learn more, read the Amazon EBS direct API documentation.

AWS Transfer Family
AWS Transfer Family Managed Workflows is a new feature that allows you to reduce the manual tasks of preprocessing your data. Managed Workflows does a lot of the heavy lifting for you, like setting up the infrastructure to run your code upon file arrival, continuously monitoring for errors, and verifying that all the changes to the data are logged. Managed Workflows helps you handle error scenarios so that failsafe modes trigger when needed.

AWS Transfer Family Managed Workflows allows you to configure all the necessary tasks at once so that tasks can automatically run in the background. Managed Workflows is available today in the AWS Transfer Family Management Console. To learn more, read Transfer Family FAQ.

Join us online for more!
Don’t forget to register and join us for the AWS Storage Day 2021 virtual event. The event will be live at 8:30 AM Pacific Time (11:30 AM Eastern Time) on September 2. The event will immediately re-stream for the Asia-Pacific audience with live Q&A moderators on Friday, September 3, at 8:30 AM Singapore Time. All sessions will be available on demand next week.

We look forward to seeing you there!

Marcia

Summer at the Raspberry Pi Store with The Centre for Computing History

Post Syndicated from Ashley Whittaker original https://www.raspberrypi.org/blog/raspberry-pi-store-summer-2021/

A whole lot of super free hands-on activities are happening at the Raspberry Pi Store this summer.

We have teamed up with the Centre for Computing History to create an interactive learning space that’s accessible to all ages and abilities. Best of all, everything is free. It’s all happening in a big new space we’ve borrowed a few doors down from the Raspberry Pi Store in the Grand Arcade in Cambridge, UK.

What is Raspberry Pi doing?

Everyone aged seven to 107 can get hands-on and creative with our free beginner-friendly workshops. You can make games with Scratch on Raspberry Pi, learn simple electronics for beginners, or get hands-on with the Raspberry Pi camera and Python programming.

Learners of all ages can have a go

If you don’t know anything about coding, don’t worry: there are friendly people on hand to help you learn.

The workshops take place every Monday, Wednesday and Friday until 3 September. Pre-booking is highly advisable. If the one you want is fully booked, it’s well worth dropping by if you’re in the neighbourhood, because spaces often become available at the last minute. And if you book and find you can no longer come along, please do make sure you cancel, because there will be lots of people who would love to take your space!

Book your place at one of our workshops.

Not sure what you’re doing? We can help!

What is the Centre for Computing History doing?

Come and celebrate thirty years of the World Wide Web and see how things have changed over the last three decades.

This interactive exhibition celebrates the years since Tim Berners-Lee changed the world forever by publishing the very first website at CERN in 1991. You can trace the footsteps of the early web, and have a go on some original hardware.

So much retro hardware to get your hands on

Here are some of the things you can do:

  • Browse the very first website from 1991
  • Search the web with Archie, the first search engine
  • Enjoy the very first web comic
  • Order a pizza on the first transactional website
  • See the first webcam site
  • See a recreation of the trailblazing Trojan Room Coffee Cam

But I don’t live near the Raspberry Pi Store!

While we would love to have a Raspberry Pi store in every town in every country all over the world (cackles maniacally), we are sticking with just the one in our hometown for now. But we make lots of cool stuff you can access online to relieve the FOMO.

The Raspberry Pi Foundation’s livestreamed Digital Making at Home videos are all still available for young people to watch and learn along with. You can chat, code together, hear from cool people, and see amazing digital making projects from kids who love making with technology.

Taking a scroll through our FutureLearn courses

There are also more than thirty Raspberry Pi courses available for free on FutureLearn. There’s something for every type of user and level of learner, from coders looking to move from Scratch to Python programming, to people looking to start up their own CoderDojo. Plus tons of materials for teachers sharing practical resources for the classroom.

Raspberry Pi books

If you like to tinker away in your own time, there are loads of books for all abilities available from the Raspberry Pi Press online store. The Official Raspberry Pi Beginner’s Guide comes in five languages. Game designers can Code the Classics. And fashion-forward makers can create Wearable Tech Projects.

More books than the library in Beauty and the Beast

The post Summer at the Raspberry Pi Store with The Centre for Computing History appeared first on Raspberry Pi.