Tag Archives: launch

Introducing Amazon Managed Workflows for Apache Airflow (MWAA)

2020-11-24 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-managed-workflows-for-apache-airflow-mwaa/

As the volume and complexity of your data processing pipelines increase, you can simplify the overall process by decomposing it into a series of smaller tasks and coordinate the execution of these tasks as part of a workflow. To do so, many developers and data engineers use Apache Airflow, a platform created by the community to programmatically author, schedule, and monitor workflows. With Airflow you can manage workflows as scripts, monitor them via the user interface (UI), and extend their functionality through a set of powerful plugins. However, manually installing, maintaining, and scaling Airflow, and at the same time handling security, authentication, and authorization for its users takes much of the time you’d rather use to focus on solving actual business problems.

For these reasons, I am happy to announce the availability of Amazon Managed Workflows for Apache Airflow (MWAA), a fully managed service that makes it easy to run open-source versions of Apache Airflow on AWS, and to build workflows to execute your extract-transform-load (ETL) jobs and data pipelines.

Airflow workflows retrieve input from sources like Amazon Simple Storage Service (S3) using Amazon Athena queries, perform transformations on Amazon EMR clusters, and can use the resulting data to train machine learning models on Amazon SageMaker. Workflows in Airflow are authored as Directed Acyclic Graphs (DAGs) using the Python programming language.

A key benefit of Airflow is its open extensibility through plugins which allows you to create tasks that interact with AWS or on-premise resources required for your workflows including AWS Batch, Amazon CloudWatch, Amazon DynamoDB, AWS DataSync, Amazon ECS and AWS Fargate, Amazon Elastic Kubernetes Service (EKS), Amazon Kinesis Firehose, AWS Glue, AWS Lambda, Amazon Redshift, Amazon Simple Queue Service (SQS), and Amazon Simple Notification Service (SNS).

To improve observability, Airflow metrics can be published as CloudWatch Metrics, and logs can be sent to CloudWatch Logs. Amazon MWAA provides automatic minor version upgrades and patches by default, with an option to designate a maintenance window in which these upgrades are performed.

You can use Amazon MWAA with these three steps:

Create an environment – Each environment contains your Airflow cluster, including your scheduler, workers, and web server.
Upload your DAGs and plugins to S3 – Amazon MWAA loads the code into Airflow automatically.
Run your DAGs in Airflow – Run your DAGs from the Airflow UI or command line interface (CLI) and monitor your environment with CloudWatch.

Let’s see how this works in practice!

How to Create an Airflow Environment Using Amazon MWAA
In the Amazon MWAA console, I click on Create environment. I give the environment a name and select the Airflow version to use.

Then, I select the S3 bucket and the folder to load my DAG code. The bucket name must start with airflow-.

Optionally, I can specify a plugins file and a requirements file:

The plugins file is a ZIP file containing the plugins used by my DAGs.
The requirements file describes the Python dependencies to run my DAGs.

For plugins and requirements, I can select the S3 object version to use. In case the plugins or the requirements I use create a non-recoverable error in my environment, Amazon MWAA will automatically roll back to the previous working version.

I click Next to configure the advanced settings, starting with networking. Each environment runs in a Amazon Virtual Private Cloud using private subnets in two availability zones. Web server access to the Airflow UI is always protected by a secure login using AWS Identity and Access Management (IAM). However, you can choose to have web server access on a public network so that you can login over the Internet, or on a private network in your VPC. For simplicity, I select a Public network. I let Amazon MWAA create a new security group with the correct inbound and outbound rules. Optionally, I can add one or more existing security groups to fine-tune control of inbound and outbound traffic for your environment.

Now, I configure my environment class. Each environment includes a scheduler, a web server, and a worker. Workers automatically scale up and down according to my workload. We provide you a suggestion on which class to use based on the number of DAGs, but you can monitor the load on your environment and modify its class at any time.

Encryption is always enabled for data at rest, and while I can select a customized key managed by AWS Key Management Service (KMS) I will instead keep the default key that AWS owns and manages on my behalf.

For monitoring, I publish environment performance to CloudWatch Metrics. This is enabled by default, but I can disable CloudWatch Metrics after launch. For the logs, I can specify the log level and which Airflow components should send their logs to CloudWatch Logs. I leave the default to send only the task logs and use log level INFO.

I can modify the default settings for Airflow configuration options, such as default_task_retries or worker_concurrency. For now, I am not changing these values.

Finally, but most importantly, I configure the permissions that will be used by my environment to access my DAGs, write logs, and run DAGs accessing other AWS resources. I select Create a new role and click on Create environment. After a few minutes, the new Airflow environment is ready to be used.

Using the Airflow UI
In the Amazon MWAA console, I look for the new environment I just created and click on Open Airflow UI. A new browser window is created and I am authenticated with a secure login via AWS IAM.

There, I look for a DAG that I put on S3 in the movie_list_dag.py file. The DAG is downloading the MovieLens dataset, processing the files on S3 using Amazon Athena, and loading the result to a Redshift cluster, creating the table if missing.

Here’s the full source code of the DAG:

from airflow import DAG
from airflow.operators.python_operator import PythonOperator
from airflow.operators import HttpSensor, S3KeySensor
from airflow.contrib.operators.aws_athena_operator import AWSAthenaOperator
from airflow.utils.dates import days_ago
from datetime import datetime, timedelta
from io import StringIO
from io import BytesIO
from time import sleep
import csv
import requests
import json
import boto3
import zipfile
import io
s3_bucket_name = 'my-bucket'
s3_key='files/'
redshift_cluster='redshift-cluster-1'
redshift_db='dev'
redshift_dbuser='awsuser'
redshift_table_name='movie_demo'
test_http='https://grouplens.org/datasets/movielens/latest/'
download_http='http://files.grouplens.org/datasets/movielens/ml-latest-small.zip'
athena_db='demo_athena_db'
athena_results='athena-results/'
create_athena_movie_table_query="""
CREATE EXTERNAL TABLE IF NOT EXISTS Demo_Athena_DB.ML_Latest_Small_Movies (
  `movieId` int,
  `title` string,
  `genres` string 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3://pinwheeldemo1-pinwheeldagsbucketfeed0594-1bks69fq0utz/files/ml-latest-small/movies.csv/ml-latest-small/'
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'skip.header.line.count'='1'
); 
"""
create_athena_ratings_table_query="""
CREATE EXTERNAL TABLE IF NOT EXISTS Demo_Athena_DB.ML_Latest_Small_Ratings (
  `userId` int,
  `movieId` int,
  `rating` int,
  `timestamp` bigint 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3://pinwheeldemo1-pinwheeldagsbucketfeed0594-1bks69fq0utz/files/ml-latest-small/ratings.csv/ml-latest-small/'
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'skip.header.line.count'='1'
); 
"""
create_athena_tags_table_query="""
CREATE EXTERNAL TABLE IF NOT EXISTS Demo_Athena_DB.ML_Latest_Small_Tags (
  `userId` int,
  `movieId` int,
  `tag` int,
  `timestamp` bigint 
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES (
  'serialization.format' = ',',
  'field.delim' = ','
) LOCATION 's3://pinwheeldemo1-pinwheeldagsbucketfeed0594-1bks69fq0utz/files/ml-latest-small/tags.csv/ml-latest-small/'
TBLPROPERTIES (
  'has_encrypted_data'='false',
  'skip.header.line.count'='1'
); 
"""
join_tables_athena_query="""
SELECT REPLACE ( m.title , '"' , '' ) as title, r.rating
FROM demo_athena_db.ML_Latest_Small_Movies m
INNER JOIN (SELECT rating, movieId FROM demo_athena_db.ML_Latest_Small_Ratings WHERE rating > 4) r on m.movieId = r.movieId
"""
def download_zip():
    s3c = boto3.client('s3')
    indata = requests.get(download_http)
    n=0
    with zipfile.ZipFile(io.BytesIO(indata.content)) as z:       
        zList=z.namelist()
        print(zList)
        for i in zList: 
            print(i) 
            zfiledata = BytesIO(z.read(i))
            n += 1
            s3c.put_object(Bucket=s3_bucket_name, Key=s3_key+i+'/'+i, Body=zfiledata)
def clean_up_csv_fn(**kwargs):    
    ti = kwargs['task_instance']
    queryId = ti.xcom_pull(key='return_value', task_ids='join_athena_tables' )
    print(queryId)
    athenaKey=athena_results+"join_athena_tables/"+queryId+".csv"
    print(athenaKey)
    cleanKey=athena_results+"join_athena_tables/"+queryId+"_clean.csv"
    s3c = boto3.client('s3')
    obj = s3c.get_object(Bucket=s3_bucket_name, Key=athenaKey)
    infileStr=obj['Body'].read().decode('utf-8')
    outfileStr=infileStr.replace('"e"', '') 
    outfile = StringIO(outfileStr)
    s3c.put_object(Bucket=s3_bucket_name, Key=cleanKey, Body=outfile.getvalue())
def s3_to_redshift(**kwargs):    
    ti = kwargs['task_instance']
    queryId = ti.xcom_pull(key='return_value', task_ids='join_athena_tables' )
    print(queryId)
    athenaKey='s3://'+s3_bucket_name+"/"+athena_results+"join_athena_tables/"+queryId+"_clean.csv"
    print(athenaKey)
    sqlQuery="copy "+redshift_table_name+" from '"+athenaKey+"' iam_role 'arn:aws:iam::163919838948:role/myRedshiftRole' CSV IGNOREHEADER 1;"
    print(sqlQuery)
    rsd = boto3.client('redshift-data')
    resp = rsd.execute_statement(
        ClusterIdentifier=redshift_cluster,
        Database=redshift_db,
        DbUser=redshift_dbuser,
        Sql=sqlQuery
    )
    print(resp)
    return "OK"
def create_redshift_table():
    rsd = boto3.client('redshift-data')
    resp = rsd.execute_statement(
        ClusterIdentifier=redshift_cluster,
        Database=redshift_db,
        DbUser=redshift_dbuser,
        Sql="CREATE TABLE IF NOT EXISTS "+redshift_table_name+" (title	character varying, rating	int);"
    )
    print(resp)
    return "OK"
DEFAULT_ARGS = {
    'owner': 'airflow',
    'depends_on_past': False,
    'email': ['[email protected]'],
    'email_on_failure': False,
    'email_on_retry': False 
}
with DAG(
    dag_id='movie-list-dag',
    default_args=DEFAULT_ARGS,
    dagrun_timeout=timedelta(hours=2),
    start_date=days_ago(2),
    schedule_interval='*/10 * * * *',
    tags=['athena','redshift'],
) as dag:
    check_s3_for_key = S3KeySensor(
        task_id='check_s3_for_key',
        bucket_key=s3_key,
        wildcard_match=True,
        bucket_name=s3_bucket_name,
        s3_conn_id='aws_default',
        timeout=20,
        poke_interval=5,
        dag=dag
    )
    files_to_s3 = PythonOperator(
        task_id="files_to_s3",
        python_callable=download_zip
    )
    create_athena_movie_table = AWSAthenaOperator(task_id="create_athena_movie_table",query=create_athena_movie_table_query, database=athena_db, output_location='s3://'+s3_bucket_name+"/"+athena_results+'create_athena_movie_table')
    create_athena_ratings_table = AWSAthenaOperator(task_id="create_athena_ratings_table",query=create_athena_ratings_table_query, database=athena_db, output_location='s3://'+s3_bucket_name+"/"+athena_results+'create_athena_ratings_table')
    create_athena_tags_table = AWSAthenaOperator(task_id="create_athena_tags_table",query=create_athena_tags_table_query, database=athena_db, output_location='s3://'+s3_bucket_name+"/"+athena_results+'create_athena_tags_table')
    join_athena_tables = AWSAthenaOperator(task_id="join_athena_tables",query=join_tables_athena_query, database=athena_db, output_location='s3://'+s3_bucket_name+"/"+athena_results+'join_athena_tables')
    create_redshift_table_if_not_exists = PythonOperator(
        task_id="create_redshift_table_if_not_exists",
        python_callable=create_redshift_table
    )
    clean_up_csv = PythonOperator(
        task_id="clean_up_csv",
        python_callable=clean_up_csv_fn,
        provide_context=True     
    )
    transfer_to_redshift = PythonOperator(
        task_id="transfer_to_redshift",
        python_callable=s3_to_redshift,
        provide_context=True     
    )
    check_s3_for_key >> files_to_s3 >> create_athena_movie_table >> join_athena_tables >> clean_up_csv >> transfer_to_redshift
    files_to_s3 >> create_athena_ratings_table >> join_athena_tables
    files_to_s3 >> create_athena_tags_table >> join_athena_tables
    files_to_s3 >> create_redshift_table_if_not_exists >> transfer_to_redshift

In the code, different tasks are created using operators like PythonOperator, for generic Python code, or AWSAthenaOperator, to use the integration with Amazon Athena. To see how those tasks are connected in the workflow, you can see the latest few lines, that I repeat here (without indentation) for simplicity:

check_s3_for_key >> files_to_s3 >> create_athena_movie_table >> join_athena_tables >> clean_up_csv >> transfer_to_redshift
files_to_s3 >> create_athena_ratings_table >> join_athena_tables
files_to_s3 >> create_athena_tags_table >> join_athena_tables
files_to_s3 >> create_redshift_table_if_not_exists >> transfer_to_redshift

The Airflow code is overloading the right shift >> operator in Python to create a dependency, meaning that the task on the left should be executed first, and the output passed to the task on the right. Looking at the code, this is quite easy to read. Each of the four lines above is adding dependencies, and they are all evaluated together to execute the tasks in the right order.

In the Airflow console, I can see a graph view of the DAG to have a clear representation of how tasks are executed:

Available Now
Amazon Managed Workflows for Apache Airflow (MWAA) is available today in US East (Northern Virginia), US West (Oregon), US East (Ohio), Asia Pacific (Singapore), Asia Pacific (Toyko), Asia Pacific (Sydney), Europe (Ireland), Europe (Frankfurt), and Europe (Stockholm). You can launch a new Amazon MWAA environment from the console, AWS Command Line Interface (CLI), or AWS SDKs. Then, you can develop workflows in Python using Airflow’s ecosystem of integrations.

With Amazon MWAA, you pay based on the environment class and the workers you use. For more information, see the pricing page.

Upstream compatibility is a core tenet of Amazon MWAA. Our code changes to the AirFlow platform are released back to open source.

With Amazon MWAA you can spend more time building workflows for your engineering and data science tasks, and less time managing and scaling the infrastructure of your Airflow platform.

Learn more about Amazon MWAA and get started today!

— Danilo

New – Code Signing, a Trust and Integrity Control for AWS Lambda

2020-11-24 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-code-signing-a-trust-and-integrity-control-for-aws-lambda/

Code signing is an industry standard technique used to confirm that the code is unaltered and from a trusted publisher. Code running inside AWS Lambda functions is executed on highly hardened systems and runs in a secure manner. However, function code is susceptible to alteration as it moves through deployment pipelines that run outside AWS.

Today, we are launching Code Signing for AWS Lambda. It is a trust and integrity control that helps administrators enforce that only signed code packages from trusted publishers run in their Lambda functions and that the code has not been altered since signing.

Code Signing for Lambda provides a first-class mechanism to enforce that only trusted code is deployed in Lambda. This frees up organizations from the burden of building gatekeeper components in their deployment pipelines. Code Signing for AWS Lambda leverages AWS Signer, a fully managed code signing service from AWS. Administrators create Signing Profile, a resource in AWS Signer that is used for creating signatures and grant developers access to the signing profile using AWS Identity and Access Management (IAM). Within Lambda, administrators specify the allowed signing profiles using a new resource called Code Signing Configuration (CSC). CSC enables organizations to implement a separation of duties between administrators and developers. Administrators can use CSC to set code signing policies on the functions, and developers can deploy code to the functions.

How to Create a Signing Profile
You can use AWS Signer console to create a new Signing profile. A signing profile can represent a group of trusted publishers and is analogous to the use of a digital signing certificate.

By clicking Create Signing Profile, you can create a Signing Profile that can be used to create signed code packages.

You can assign Signature validity period for the signatures generated by a Signing Profile between 1 day and 135 months.

How to create a Code Signing Configuration (CSC)
You can configure your functions to use Code Signing through the AWS Lambda console, Command-line Interface (CLI), or APIs by creating and attaching a new resource called Code Signing Configuration to the function. You can find Code signing configurations under Additional resources menu.

You can click Create configuration to define signing profiles that are allowed to sign code artifacts for this configuration, and set signature validation policy. To add an allowed signing profile, you can either select from the dropdown, which shows all signing profiles in your AWS account, or add a signing profile from a different account by specifying the version ARN.

Also, you can set the signature validation policy to either ‘Warn’ or ‘Enforce’. With ‘Warn’, Lambda logs a Cloudwatch metric if there is a signature check failure but accepts the deployment. With ‘Enforce’, Lambda rejects the deployment if there is a signature check failure. Signature check fails if the signature signing profile does not match one of the allowed signing profiles in the CSC, the signature is expired, or the signature is revoked. If the code package is tampered or altered since signing, the deployment is always rejected, irrespective of the signature validation policy.

You can use new Lambda API CreateCodeSigningConfig to create a CSC too. You can see the JSON request syntax below.

{
     "CodeSigningConfigId": string,
     "CodeSigningConfigArn": string,
     "Description": string,
     "AllowedPublishers": {
           "SigningProfileVersionArns": [string]
      },
     "CodeSigningPolicies": {
     "UntrustedArtifactOnDeployment": string,   // WARN OR ENFORCE
    },
     "LastModified”: string
}

Let’s Enable Code Signing for Your Lambda Functions
To enable Code Signing feature for your Lambda functions, you can select a function and click Edit in Code signing configuration section.

Select one of the available CSCs and click the Save button.

Once your function is configured to use code signing, you need to upload signed .zip file or Amazon S3 URL of a signed .zip made by a signing job in AWS Signer.

How to Create a Signed Code Package
Choose one of the allowed signing profiles and specify the S3 location of the code package ZIP file to be signed. Also, specify a destination path where the signed code package should be uploaded.

A signing job is an asynchronous process that generates a signature for your code package and puts the signed code package in the specified destination path.

Once signing job is succeeded, you can find signed ZIP packages in your assigned S3 bucket.

Back to Lambda console, you can now publish the signed code package to the Lambda function. Lambda will perform signature checks to verify that the code has not been altered since signing and that the code is signed by one of the allowed signing profile.

You can also enable code signing for a function using CreateFunction or PutFunctionCodeSigningConfig APIs by attaching a CSC to the function.

Developers can also use SAM CLI to sign code packages. They do this by specifying the signing profiles at package or deploy stage. SAM CLI automatically starts the signing workflow before deploying the code to Lambda.

Code Signing is also supported by Infrastructure as code tools like AWS CloudFormation and Terraform. Terraform also allows developers to sign code, in addition to declaring and creating code signing resources.

Now Available
Code Signing for AWS Lambda is available in all commercial regions except AWS China Regions, AWS GovCloud (US) Regions, and Asia Pacific (Osaka) Region. There is no additional charge for using code signing, and customers pay the standard price for Lambda functions.

To learn more about Code Signing for AWS Lambda and AWS Signer, please visit the Lambda developer guide and send us feedback either in the forum for AWS Lambda or through your usual AWS support contacts.

— Channy;

New – Multi-Factor Authentication with WebAuthn for AWS SSO

2020-11-24 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/multi-factor-authentication-with-webauthn-for-aws-sso/

Starting today, you can add WebAuthn as a new multi-factor authentication (MFA) to AWS Single Sign-On, in addition to currently supported one-time password (OTP) and Radius authenticators. By adding support for WebAuthn, a W3C specification developed in coordination with FIDO Alliance, you can now authenticate with a wide variety of interoperable authenticators provisioned by your system administrator or built into your laptops or smartphones. For example, you can now tap a hardware security key, touch a fingerprint sensor on your Mac, or use facial recognition on your mobile device or PC to authenticate into the AWS Management Console or AWS Command Line Interface (CLI).

With this addition, you can now self-register multiple MFA authenticators. Doing so allows you to authenticate on AWS with another device in case you lose or misplace your primary authenticator device. We make it easy for you to name your devices for long-term manageability.

WebAuthn two-factor authentication is available for identities stored in the AWS Single Sign-On internal identity store and those stored in Microsoft Active Directory, whether it is managed by AWS or not.

What are WebAuthn and FIDO2?

Before exploring how to configure two-factor authentication using your FIDO2-enabled devices, and to discover the user experience for web-based and CLI authentications, let’s recap how FIDO2, WebAuthn and other specifications fit together.

FIDO2 is made of two core specifications: Web Authentication (WebAuthn) and Client To Authenticator Protocol (CTAP).

Web Authentication (WebAuthn) is a W3C standard that provides strong authentication based upon public key cryptography. Unlike traditional code generator tokens or apps using TOTP protocol, it does not require sharing a secret between the server and the client. Instead, it relies on a public key pair and digital signature of unique challenges. The private key never leaves a secured device, the FIDO-enabled authenticator. When you try to authenticate to a website, this secured device interacts with your browser using the CTAP protocol.

WebAuthn is strong: Authentication is ideally backed by a secure element, which can safely store private keys and perform the cryptographic operations. It is scoped: A key pair is only useful for a specific origin, like browser cookies. A key pair registered at console.amazonaws.com cannot be used at console.not-the-real-amazon.com, mitigating the threat of phishing. Finally, it is attested: Authenticators can provide a certificate that helps servers verify that the public key did in fact come from an authenticator they trust, and not a fraudulent source.

To start to use FIDO2 authentication, you therefore need three elements: a website that supports WebAuthn, a browser that supports WebAuthn and CTAP protocols, and a FIDO authenticator. Starting today, the SSO Management Console and CLI now support WebAuthn. All modern web browsers are compatible (Chrome, Edge, Firefox, and Safari). FIDO authenticators are either devices you can use from one device or another (roaming authenticators), such as a YubiKey, or built-in hardware supported by Android, iOS, iPadOS, Windows, Chrome OS, and macOS (platform authenticators).

How Does FIDO2 Work?
When I first register my FIDO-enabled authenticator on AWS SSO, the authenticator creates a new set of public key credentials that can be used to sign a challenge generated by AWS SSO Console (the relaying party). The public part of these new credentials, along with the signed challenge, are stored by AWS SSO.

When I want to use WebAuthn as second factor authentication, the AWS SSO console sends a challenge to my authenticator. This challenge can then be signed with the previously generated public key credentials and sent back to the console. This way, AWS SSO console can verify that I have the required credentials.

How Do I Enable MFA With a Secure Device in the AWS SSO Console?
You, the system administrator, can enable MFA for your AWS SSO workforce when the user profiles are stored in AWS SSO itself, or stored in your Active Directory, either self-managed or a AWS Directory Service for Microsoft Active Directory.

To let my workforce register their FIDO or U2F authenticator in self-service mode, I first navigate to Settings, click Configure under Multi-Factor Authentication. On the following screen, I make four changes. First, under Users should be prompted for MFA, I select Every time they sign in. Second, under Users can authenticate with these MFA types, I check Security Keys and built-in authenticators. Third, under If a user does not yet have a registered MFA device, I check Require them to register an MFA device at sign in. Finally, under Who can manage MFA devices, I check Users can add and manage their own MFA devices. I click on Save Changes to save and return.

That’s it. Now your workforce is prompted to register their MFA device the next time they authenticate.

What Is the User Experience?
As an AWS console user, I authenticate on the AWS SSO portal page URL that I received from my System Administrator. I sign in using my user name and password, as usual. On the next screen, I am prompted to register my authenticator. I check Security Key as device type. To use a biometric factor such as fingerprints or face recognition, I would click Built-in authenticator.

The browser asks me to generate a key pair and to send my public key. I can do that just by touching a button on my device, or providing the registered biometric, e.g. TouchID or FaceID.The browser does confirm and shows me a last screen where I have the possibility to give a friendly name to my device, so I can remember which one is which. Then I click Save and Done.From now on, every time I sign in, I am prompted to touch my security device or use biometric authentication on my smartphone or laptop. What happens behind the scene is the server sending a challenge to my browser. The browser sends the challenge to the security device. The security device uses my private key to sign the challenge and to return it to the server for verification. When the server validates the signature with my public key, I am granted access to the AWS Management Console.

At any time, I can register additional devices and manage my registered devices. On the AWS SSO portal page, I click MFA devices on the top-right part of the screen.

I can see and manage the devices registered for my account, if any. I click Register device to register a new device.

How to Configure SSO for the AWS CLI?
Once my devices are configured, I can configure SSO on the AWS Command Line Interface (CLI).

I first configure CLI SSO with aws configure sso and I enter the SSO domain URL that I received from my system administrator. The CLI opens a browser where I can authenticate with my user name, password, and my second-factor authentication configured previously. The web console gives me a code that I enter back into the CLI prompt.

When I have access to multiple AWS Accounts, the CLI lists them and I choose the one I want to use. This is a one-time configuration.

Once this is done, I can use the aws CLI as usual, the SSO authentication happens automatically behind the scene. You are asked to re-authenticate from time to time, depending on the configuration set by your system administrator.

Available today
Just like AWS Single Sign-On, FIDO2 second-factor authentication is provided to you at no additional cost, and is available in all AWS Regions where AWS SSO is available.

As usual, we welcome your feedback. The team told me they are working on other features to offer you additional authentication options in the near future.

You can start to use FIDO2 as second factor authentication for AWS Single Sign-On today. Configure it now.

— seb

AWS Network Firewall – New Managed Firewall Service in VPC

2020-11-18 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-network-firewall-new-managed-firewall-service-in-vpc/

Our customers want to have a high availability, scalable firewall service to protect their virtual networks in the cloud. Security is the number one priority of AWS, which has provided various firewall capabilities on AWS that address specific security needs, like Security Groups to protect Amazon Elastic Compute Cloud (EC2) instances, Network ACLs to protect Amazon Virtual Private Cloud (VPC) subnets, AWS Web Application Firewall (WAF) to protect web applications running on Amazon CloudFront, Application Load Balancer (ALB) or Amazon API Gateway, and AWS Shield to protect against Distributed Denial of Service (DDoS) attacks.

We heard customers want an easier way to scale network security across all the resources in their workload, regardless of which AWS services they used. They also want customized protections to secure their unique workloads, or to comply with government mandates or commercial regulations. These customers need the ability to do things like URL filtering on outbound flows, pattern matching on packet data beyond IP/Port/Protocol and the ability to alert on specific vulnerabilities for protocols beyond HTTP/S.

Today, I am happy to announce AWS Network Firewall, a high availability, managed network firewall service for your virtual private cloud (VPC). It enables you to easily deploy and manage stateful inspection, intrusion prevention and detection, and web filtering to protect your virtual networks on AWS. Network Firewall automatically scales with your traffic, ensuring high availability with no additional customer investment in security infrastructure.

With AWS Network Firewall, you can implement customized rules to prevent your VPCs from accessing unauthorized domains, to block thousands of known-bad IP addresses, or identify malicious activity using signature-based detection. AWS Network Firewall makes firewall activity visible in real-time via CloudWatch metrics and offers increased visibility of network traffic by sending logs to S3, CloudWatch and Kinesis Firehose. Network Firewall is integrated with AWS Firewall Manager, giving customers who use AWS Organizations a single place to enable and monitor firewall activity across all your VPCs and AWS accounts. Network Firewall is interoperable with your existing security ecosystem, including AWS partners such as CrowdStrike, Palo Alto Networks, and Splunk. You can also import existing rules from community maintained Suricata rulesets.

Concepts of Network Firewall
AWS Network Firewall runs stateless and stateful traffic inspection rules engines. The engines use rules and other settings that you configure inside a firewall policy.

You use a firewall on a per-Availability Zone basis in your VPC. For each Availability Zone, you choose a subnet to host the firewall endpoint that filters your traffic. The firewall endpoint in an Availability Zone can protect all of the subnets inside the zone except for the one where it’s located.

You can manage AWS Network Firewall with the following central components.

Firewall – A firewall connects the VPC that you want to protect to the protection behavior that’s defined in a firewall policy. For each Availability Zone where you want protection, you provide Network Firewall with a public subnet that’s dedicated to the firewall endpoint. To use the firewall, you update the VPC route tables to send incoming and outgoing traffic through the firewall endpoints.
Firewall policy – A firewall policy defines the behavior of the firewall in a collection of stateless and stateful rule groups and other settings. You can associate each firewall with only one firewall policy, but you can use a firewall policy for more than one firewall.
Rule group – A rule group is a collection of stateless or stateful rules that define how to inspect and handle network traffic. Rules configuration includes 5-tuple and domain name filtering. You can also provide stateful rules using Suricata open source rule specification.

AWS Network Firewall – Getting Started
You can start AWS Network Firewall in AWS Management Console, AWS Command Line Interface (CLI), and AWS SDKs for creating and managing firewalls. In the navigation pane in VPC console, expand AWS Network Firewall and then choose Create firewall in Firewalls menu.

To create a new firewall, enter the name that you want to use to identify this firewall and select your VPC from the dropdown. For each availability zone (AZ) where you want to use AWS Network Firewall, create a public subnet to for the firewall endpoint. This subnet must have at least one IP address available and a non-zero capacity. Keep these firewall subnets reserved for use by Network Firewall.

For Associated firewall policy, select Create and associate an empty firewall policy and choose Create firewall.

Your new firewall is listed in the Firewalls page. The firewall has an empty firewall policy. In the next step, you’ll specify the firewall behavior in the policy. Select your newly created the firewall policy in Firewall policies menu.

You can create or add new stateless or stateful rule groups – zero or more collections of firewall rules, with priority settings that define their processing order within the policy, and stateless default action defines how Network Firewall handles a packet that doesn’t match any of the stateless rule groups.

For stateless default action, the firewall policy allows you to specify different default settings for full packets and for packet fragments. The action options are the same as for the stateless rules that you use in the firewall policy’s stateless rule groups.

You are required to specify one of the following options:

Allow – Discontinue all inspection of the packet and permit it to go to its intended destination.
Drop – Discontinue all inspection of the packet and block it from going to its intended destination.
Forward to stateful rule groups – Discontinue stateless inspection of the packet and forward it to the stateful rule engine for inspection.

Additionally, you can optionally specify a named custom action to apply. For this action, Network Firewall sends an CloudWatch metric dimension named CustomAction with a value specified by you. After you define a named custom action, you can use it by name in the same context where you have define it. You can reuse a custom action setting among the rules in a rule group and you can reuse a custom action setting between the two default stateless custom action settings for a firewall policy.

After you’ve defined your firewall policy, you can insert the firewall into your VPC traffic flow by updating the VPC route tables to include the firewall.

How to set up Rule Groups
You can create new stateless or stateful rule groups in Network Firewall rule groups menu, and choose Create rule group. If you select Stateful rule group, you can select one of three options: 1) 5-tuple format, specifying source IP, source port, destination IP, destination port, and protocol, and specify the action to take for matching traffic, 2) Domain list, specifying a list of domain names and the action to take for traffic that tries to access one of the domains, and 3) Suricata compatible IPS rules, providing advanced firewall rules using Suricata rule syntax.

Network Firewall supports the standard stateless “5 tuple” rule specification for network traffic inspection with priority number that indicates the processing order of the stateless rule within the rule group.

Similarly, a stateful 5 tuple rule has the following match settings. These specify what the Network Firewall stateful rules engine looks for in a packet. A packet must satisfy all match settings to be a match.

A rule group with domain names has the following match settings – Domain name, a list of strings specifying the domain names that you want to match, and Traffic direction, a direction of traffic flow to inspect. The following JSON shows an example rule definition for a domain name rule group.

{
  "RulesSource": {
    "RulesSourceList": {
      "TargetType": "FQDN_SNI","HTTP_HOST",
      "Targets": [
        "test.example.com",
        "test2.example.com"
      ],
      "GeneratedRulesType": "DENYLIST"
    }
  } 
}

A stateful rule group with Suricata compatible IPS rules has all settings defined within the Suricata compatible specification. For example, as following is to detect SSH protocol anomalies. For information about Suricata, see the Suricata website.

alert tcp any any -> any 22 (msg:"ALERT TCP port 22 but not SSH"; app-layer-protocol:!ssh; sid:2271009; rev:1;)

You can monitor Network Firewall using CloudWatch, which collects raw data and processes it into readable, near real-time metrics, and AWS CloudTrail, a service that provides a record of API calls to AWS Network Firewall by a user, role, or an AWS service. CloudTrail captures all API calls for Network Firewall as events. To learn more about logging and monitoring, see the documentation.

Network Firewall Partners
At this launch, Network Firewall integrates with a collection of AWS partners. They provided us with lots of helpful feedback. Here are some of the blog posts that they wrote in order to share their experiences (I am updating this article with links as they are published).

IBM Product – IBM Security expands support for AWS native services with new AWS Network Firewall offering
IBM Services – AWS Cloud Native Firewall brings modern Enterprise Firewall Features to Cloud Native and eases the burden of BYO-FW
Sumo Logic – Full VPC traffic visibility with AWS Network Firewall and Sumo Logic

Available Now
AWS Network Firewall is now available in US East (N. Virginia), US West (Oregon), and Europe (Ireland) Regions. Take a look at the product page, price, and the documentation to learn more. Give this a try, and please send us feedback either through your usual AWS Support contacts or the AWS forum for Amazon VPC.

Learn all the details about AWS Network Firewall and get started with the new feature today.

— Channy;

Lightsail Containers: An Easy Way to Run your Containers in the Cloud

2020-11-13 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/lightsail-containers-an-easy-way-to-run-your-containers-in-the-cloud/

When I am delivering an introduction to the AWS Cloud for developers, I usually spend a bit of time to mention and to demonstrate Amazon Lightsail. It is by far the easiest way to get started on AWS. It allows you to get your application running on your own virtual server in a matter of minutes. Today, we are adding the possibility to deploy your container-based workloads on Amazon Lightsail. You can now deploy your container images to the cloud with the same simplicity and the same bundled pricing Amazon Lightsail provides for your virtual servers.

Amazon Lightsail is an easy-to-use cloud service that offers you everything needed to deploy an application or website, for a cost effective and easy to understand monthly plan. It is ideal to deploy simple workloads, websites, or to get started with AWS. The typical Lightsail customers range from developers to small businesses or startups who are looking to get quickly started in the cloud and AWS. At any time, you can later adopt the broad AWS Services when you are getting more familiar with the AWS cloud.

Under the hood, Lightsail is powered by Amazon Elastic Compute Cloud (EC2), Amazon Relational Database Service (RDS), Application Load Balancer, and other AWS services. It offers the level of security, reliability, and scalability you are expecting from AWS.

When deploying to Lightsail, you can choose between six operating systems (4 Linux distributions, FreeBSD, or Windows), seven applications (such as WordPress, Drupal, Joomla, Plesk…), and seven stacks (such as Node.js, Lamp, GitLab, Django…). But what about Docker containers?

Starting today, Amazon Lightsail offers an simple way for developers to deploy their containers to the cloud. All you need to provide is a Docker image for your containers and we automatically containerize it for you. Amazon Lightsail gives you an HTTPS endpoint that is ready to serve your application running in the cloud container. It automatically sets up a load balanced TLS endpoint, and take care of the TLS certificate. It replaces unresponsive containers for you automatically, it assigns a DNS name to your endpoint, it maintains the old version till the new version is healthy and ready to go live, and more.

Let’s see how it works by deploying a simple Python web app as a container. I assume you have the AWS Command Line Interface (CLI) and Docker installed on your laptop. Python is not required, it will be installed in the container only.

I first create a Python REST API, using the Flask simple application framework. Any programming language and any framework that can run inside a container works too. I just choose Python and Flask because they are simple and elegant.

You can safely copy /paste the following commands:

mkdir helloworld-python
cd helloworld-python
# create a simple Flask application in helloworld.py
echo "

from flask import Flask, request
from flask_restful import Resource, Api

app = Flask(__name__)
api = Api(app)

class Greeting (Resource):
   def get(self):
      return { "message" : "Hello Flask API World!" }
api.add_resource(Greeting, '/') # Route_1

if __name__ == '__main__':
   app.run('0.0.0.0','8080')

"  > helloworld.py

Then I create a Dockerfile that contains the steps and information required to build the container image:

# create a Dockerfile
echo '
FROM python:3
ADD helloworld.py /
RUN pip install flask
RUN pip install flask_restful
EXPOSE 8080
CMD [ "python", "./helloworld.py"]
 '  > Dockerfile

Now I can build my container:

docker build -t lightsail-hello-world .

The build command outputs many lines while it builds the container, it eventually terminates with the following message (actual ID differs):

Successfully built 7848e055edff
Successfully tagged lightsail-hello-world:latest

I test the container by launching it on my laptop:

docker run -it --rm -p 8080:8080 lightsail-hello-world

and connect a browser to localhost:8080

When I am satisfied with my app, I push the container to Docker Hub.

docker tag lightsail-hello-world sebsto/lightsail-hello-world
docker login
docker push sebsto/lightsail-hello-world

Now that I have a container ready on Docker Hub, let’s create a Lightsail Container Service.

I point my browser to the Amazon Lightsail console. I can see container services already deployed and I can manage them. To create a new service, I click Create container service:

On the next screen, I select the size of the container I want to use, in terms of vCPU and memory available to my application. I also select the number of container instances I want to run in parallel for high availability or scalability reasons. I can change the number of container instances or their power (vCPU and RAM) at any time, without interrupting the service. Both these parameters impact the price AWS charges you per month. The price is indicated and dynamically adjusted on the screen, as shown on the following video.

Slightly lower on the screen, I choose to skip the deployment for now. I give a name for the service (“hello-world“). I click Create container service.

Once the service is created, I click Create your first deployment to create a deployment. A deployment is a combination of a specific container image and version to be deployed on the service I just created.

I chose a name for my image and give the address of the image on Docker Hub, using the format user/<my container name>:tag. This is also where I have the possibility to enter environment variables, port mapping, or a launch command.

My container is offering a network service on port TCP 8080, so I add that port to the deployment configuration. The Open Ports configuration specifies which ports and protocols are open to other systems in my container’s network. Other containers or virtual machines can only connect to my container when the port is explicitly configured in the console or EXPOSE‘d in my Dockerfile. None of these ports are exposed to the public internet.

But in this example, I also want Lightsail to route the traffic from the public internet to this container. So, I add this container as an endpoint of the hello-world service I just created. The endpoint is automatically configured for TLS, there is no certificate to install or manage.

I can add up to 10 containers for one single deployment. When ready, I click Save and deploy.

After a while, my deployment is active and I can test the endpoint.

The endpoint DNS address is available on the top-right side of the console. If I must, I can configure my own DNS domain name.

I open another tab in my browser and point it at the https endpoint URL:

When I must deploy a new version, I use the console again to modify the deployment. I spare you the details of modifying the application code, build, and push a new version of the container. Let’s say I have my second container image version available under the name sebsto/lightsail-hello-world:v2. Back to Amazon Lightsail console, I click Deployments, then Modify your Deployments. I enter the full name, including the tag, of the new version of the container image and click Save and Deploy.

After a while, the new version is deployed and automatically activated.

I open a new tab in my browser and I point it to the endpoint URI available on the top-right corner of Amazon Lightsail console. I observe the JSON version is different. It now has a version attribute with a value of 2.

When something goes wrong during my deployment, Amazon Lightsail automatically keeps the last deployment active, to avoid any service interruption. I can also manually activate a previous deployment version to reverse any undesired changes.

I just deployed my first container image from Docker Hub. I can also manage my services and deploy local container images from my laptop using the AWS Command Line Interface (CLI). To push container images to my Amazon Lightsail container service directly from my laptop, I must install the LightSail Controler Plugin. (TL;DR curl, cp and chmod are your friends here, I also maintain a DockerFile to use the CLI inside a container.)

To create, list, or delete a container service, I type:

aws lightsail create-container-service --service-name myservice --power nano --scale 1

aws lightsail get-container-services
{
   "containerServices": [{
      "containerServiceName": "myservice",
      "arn": "arn:aws:lightsail:us-west-2:012345678901:ContainerService/1b50c121-eac7-4ee2-9078-425b0665b3d7",
      "createdAt": "2020-07-31T09:36:48.226999998Z",
      "location": {
         "availabilityZone": "all",
         "regionName": "us-west-2"
      },
      "resourceType": "ContainerService",
      "power": "nano",
      "powerId": "",
      "state": "READY",
      "scale": 1,
      "privateDomainName": "",
      "isDisabled": false,
      "roleArn": ""
   }]
}

aws lightsail delete-container-service --service myservice

I can also use the CLI to deploy container images directly from my laptop. Be sure lightsailctl is installed.

# Build the new version of my image (v3)
docker build -t sebsto/lightsail-hello-world:v3 .

# Push the new image.
aws lightsail push-container-image --service-name hello-world --label hello-world --image sebsto/lightsail-hello-world:v3

After a while, I see the output:

Image "sebsto/lightsail-hello-world:v3" registered.
Refer to this image as ":hello-world.hello-world.1" in deployments.

I create a lc.json file to hold the details of the deployment configuration. it is aligned to the options I see on the console. I report the name given by the previous command on the image property:

{
"serviceName": "hello-world",
"containers": {
     "hello-world": {
      "image": ":hello-world.hello-world.1",
      "ports": {
           "8080": "HTTP"
      }
     }
},
"publicEndpoint": {
     "containerName": "hello-world",
     "containerPort": 8080
}
}

Finally, I create a new service version with:
aws lightsail create-container-service-deployment --cli-input-json file://lc.json

I can query the deployment status with
aws lightsail get-container-services

...
"nextDeployment": {
   "version": 4,
   "state": "ACTIVATING",
   "containers": {
      "hello-world": {
    "image": ":hello-world.hello-world.1",
      "command": [],
      "environment": {},
      "ports": {
         "8080": "HTTP"
      }
     }
},
...

After a while, the status becomes ACTIVE, and I can test my endpoint.

curl https://hello-world.nxxxxxxxxxxx.lightsail.ec2.aws.dev/
{"message": "Hello Flask API World!", "version": 3}

If you plan to later deploy your container to Amazon ECS or Amazon Elastic Kubernetes Service, no changes are required. You can pull the container image from your repository, just like you do with Amazon Lightsail.

You can deploy your containers on Lightsail in all AWS Regions where Amazon Lightsail is available. As of today, this is US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Canada (Central), Europe (Frankfurt), Europe (Ireland), Europe (London), and Europe (Paris).

As usual when using Amazon Lightsail, pricing is easy to understand and predictable. Amazon Lightsail Containers have a fixed price per month per container, depending on the size of the container (the vCPU/memory combination you use). You are charged on the prorated hours you keep the service running. The price per month is the maximum price you will be charged for running your service 24h/7. The prices are identical in all AWS Regions. They are ranging from $7 / month for a Nano container (512MB memory and 0.25 vCPU) to $160 / month for a X-Large container (8GB memory and 4 vCPU cores). This price not only includes the container itself, but also the load balancer, the DNS, and a generous data transfer tier. The details and prices for other AWS Regions are on the Lightsail pricing page.

I can’t wait to discover what solutions you will build and deploy on Amazon Lightsail Containers!

— seb

Meet the newest AWS Heroes including the first DevTools Heroes!

2020-11-12 Ross Barich

Post Syndicated from Ross Barich original https://aws.amazon.com/blogs/aws/meet-the-newest-aws-heroes-including-the-first-devtools-heroes/

The AWS Heroes program recognizes individuals from around the world who have extensive AWS knowledge and go above and beyond to share their expertise with others. The program continues to grow, to better recognize the most influential community leaders across a variety of technical disciplines.

Introducing AWS DevTools Heroes
Today we are introducing AWS DevTools Heroes: passionate advocates of the developer experience on AWS and the tools that enable that experience. DevTools Heroes excel at sharing their knowledge and building community through open source contributions, blogging, speaking, community organizing, and social media. Through their feedback, content, and contributions DevTools Heroes help shape the AWS developer experience and evolve the AWS DevTools, such as the AWS Cloud Development Kit, the AWS SDKs, and AWS Code suite of services.

The first cohort of AWS DevTools Heroes include:

Bhuvaneswari Subramani – Bengaluru, India

DevTools Hero Bhuvaneswari Subramani is Director Engineering Operations at Infor. With two decades of IT experience, specializing in Cloud Computing, DevOps, and Performance Testing, she is one of the community leaders of AWS User Group Bengaluru. She is also an active speaker at AWS community events and industry conferences, and delivers guest lectures on Cloud Computing for staff and students at engineering colleges across India. Her workshops, presentations, and blogs on AWS Developer Tools always stand out.

Jared Short – Washington DC, USA

DevTools Hero Jared Short is an engineer at Stedi, where they are using AWS native tooling and serverless services to build a global network for exchanging B2B transactions in a standard format. Jared’s work includes early contributions to the Serverless Framework. These days, his focus is working with AWS CDK and other toolsets to create intuitive and joyful developer experiences for teams on AWS.

Matt Coulter – Belfast, Northern Ireland

DevTools Hero Matt Coulter is a Technical Architect for Liberty IT, focused on creating the right environment for empowered teams to rapidly deliver business value in a well-architected, sustainable, and serverless-first way. Matt has been creating this environment by building CDK Patterns, an open source collection of serverless architecture patterns built using AWS CDK that reference the AWS Well Architected Framework. Matt also created CDK Day, which was the first community driven, global conference focused on everything CDK (AWS CDK, CDK for Terraform, CDK for Kubernetes, and others).

Paul Duvall – Washington DC, USA

DevTools Hero Paul Duvall is co-founder and former CTO of Stelligent. He is principal author of “Continuous Integration: Improving Software Quality and Reducing Risk,” and is also the author of many other publications including “Continuous Compliance on AWS,” “Continuous Encryption on AWS,” and “Continuous Security on AWS.” Paul hosted the DevOps on AWS Radio podcast for over three years and has been an enthusiastic user and advocate of AWS Developer Tools since their respective releases.

Sebastian Korfmann – Hamburg, Germany

DevTools Hero Sebastian Korfmann is an entrepreneurial Software Engineer with a current focus on Cloud Tooling, Infrastructure as Code, and the Cloud Development Kit (CDK) ecosystem in particular. He is a core contributor to the CDK for Terraform project, which enables users to define infrastructure using TypeScript, Python, and Java while leveraging the hundreds of providers and thousands of module definitions provided by Terraform and the Terraform ecosystem. With cdk.dev, Sebastian co-founded a community-driven hub for all things CDK, and he runs a weekly newsletter covering the growing CDK ecosystem.

Steve Gordon – East Sussex, United Kingdom

DevTools Hero Steve Gordon is a Pluralsight author and senior engineer who is passionate about community and all things .NET related, having worked with .NET for over 16 years. Steve has used AWS extensively for five years as a platform for running .NET microservices. He blogs regularly about running .NET on AWS, including deep dives into how the .NET SDK works, building cloud-native services, and how to deploy .NET containers to Amazon ECS. Steve founded .NET South East, a .NET Meetup group based in Brighton.

Thorsten Höger – Stuttgart, Germany

DevTools Hero Thorsten Höger is CEO and cloud consultant at Taimos, where he is advising customers on how to use AWS. Being a developer, he focuses on improving development processes and automating everything to build efficient deployment pipelines for customers of all sizes. As a supporter of open-source software, Thorsten is maintaining or contributing to several projects on GitHub, like test frameworks for AWS Lambda, Amazon Alexa, or developer tools for AWS. He is also the maintainer of the Jenkins AWS Pipeline plugin and one of the top three non-AWS contributors to AWS CDK.

Meet the rest of the new AWS Heroes
There is more good news! We are thrilled to introduce the remaining new AWS Heroes in this cohort, including the first Heroes from Argentina, Lebanon, and Saudi Arabia:

Ahmed Samir – Riyadh, Saudi Arabia

Community Hero Ahmed Samir is a Cloud Architect and mentor with more than 12 years of experience in IT. He is the leader of three Arabic Meetups in Riyadh: AWS, Amazon SageMaker, and Kubernetes, where he has organized and delivered over 40 Meetup events. Ahmed frequently shares knowledge and evangelizes AWS in Arabic through his social media accounts. He also holds AWS and Kubernetes certifications.

Anas Khattar – Beirut, Lebanon

Community Hero Anas Khattar is co-founder of Digico Solutions. He founded the AWS User Group Lebanon in 2018 and coordinates monthly meetups and workshops on a variety of cloud topics, which helped grow the group to more than 1,000 AWSome members. He also regularly speaks at tech conferences and authors tech blogs on Dev Community, sharing his AWS experiences and best practices. In close collaboration with the regional AWS community leaders and builders, Anas organized AWS Community Day MENA, which started in September 2020 with 12 User Groups from 10 countries, and hosted 27 speakers over 2 days.

Chris Gong – New York, USA

Community Hero Chris Gong is constantly exploring the different ways that cloud services can be applied in game development. Passionate about sharing his knowledge with the world, he routinely creates tutorials and educational videos on his YouTube channel, Flopperam, where the primary focus has been AWS Game Tech and Unreal Engine, specifically the multiplayer and networking aspects of game development. Although Amazon GameLift has been his biggest interest, Chris has plans to cover the usage of other AWS services in game development while exploring how they can be integrated with other game engines besides Unreal Engine.

Damian Olguin – Cordoba, Argentina

Community Hero Damian Olguin is a tech entrepreneur and one of the founders of Teracloud, an AWS APN Partner. As a community leader, he promotes knowledge-sharing experiences in user communities within LATAM. He is co-organizer of AWS User Group Cordoba and co-host of #DeepFridays, a Twitch streaming show that promotes AI/ML technology adoption by playing with DeepRacer, DeepLens, and DeepComposer. Damian is a public speaker who has spoken at AWS Community Day Buenos Aires 2019, AWS re:Invent 2019, and AWS Community Day LATAM 2020.

Denis Bauer – Sydney, Australia

Data Hero Denis Bauer is a Principal Research Scientist at Australia’s government research agency (CSIRO). Her open source products include VariantSpark, the Machine Learning tool for analysing ultra-high dimensional data, which was the first AWS Marketplace health product from a public sector organization. Denis is passionate about facilitating the digital transformation of the health and life-science sector by building a strong community of practice through open source technology, keynote presentations, and inclusive interdisciplinary collaborations. For example, the collaboration with her organization’s visionary Scientific Computing and Cloud Platforms experts as well as AWS Data Hero Lynn Langit has enabled the creation of cloud-based bioinformatics solutions used by 10,000 researchers annually.

Denis Dyack – St. Catharines, Canada

Community Hero Denis Dyack, a video game industry veteran of more than 30 years, is the Founder and CEO of Apocalypse Studios. His studio evangelizes using a cloud-first approach in game development and partners with AWS Game Tech to move the medium of the games industry forward. In his years of experience speaking at games conferences and within AWS communities, Denis has been an advocate for building on Amazon Lumberyard and more recently in moving over game development pipelines to AWS.

Emrah Şamdan – Ankara, Turkey

Serverless Hero Emrah Şamdan is the VP of Products at Thundra. In order to expand the serverless community globally in a pandemic, he co-organized the quarterly held ServerlessDays Virtual. He’s also a local community organizer for AWS Community Day Turkey, ServerlessDays Istanbul, and bi-weekly meetups at Cloud and Serverless Turkey. He’s currently part of the core organizer team of global ServerlessDays and is continuously looking for ways to expand the community. He frequently writes about serverless and cloud-native microservices on Medium and on the Thundra blog.

Franck Pachot – Lausanne, Switzerland

Data Hero Franck Pachot is the Principal Consultant and Database Evangelist at dbi services (an AWS Select Partner) and is passionate about all databases. With over 20 years of experience in development, data modeling, infrastructure, and all DBA tasks, Franck is a recognized database expert across Oracle and AWS. Franck is also an AWS Academy educator for Powercoders, and holds AWS Certified Database Specialty and Oracle Certified Master certifications. Franck contributes to technical communities, educating customers on AWS Databases through his blog, Twitter, and podcast in French. He is active in the Data community, and enjoys talking and meeting other data enthusiasts at conferences.

Hiroko Nishimura – Washington DC, USA

Community Hero Hiroko Nishimura (Hiro) is the founder of AWS Newbies and Cloud Newbies, which help people with non-traditional technical backgrounds begin their explorations into the AWS Cloud. As a “career switcher” herself, she has been community building since 2018 to help others deconstruct Cloud Computing jargon so they, too, can begin a career in the Cloud. Finally putting her degrees in Special Education to good use, Hiro teaches “Introduction to AWS for Non-Engineers” courses at LinkedIn Learning, and introductory coding lessons at egghead.

Juliano Cristian – Santa Catarina, Brazil

Community Hero Juliano Cristian is CEO of Game Business Accelerator Academy and co-founder of Game Developers SC which participates in the AWS APN program. He organizes the AWS Game Tech Lumberyard User Group in Florianópolis, Brazil, holding Meetups, Practical Labs, Game Jams, and Workshops. He also conducts many lectures at universities, speaking with more than 90 educational institutions across Brazil, introducing students to cloud computing and AWS Game Tech services. Whenever he can, Juliano also participates in other AWS User Groups in Brazil and Latin America, working to build an increasingly motivated and productive community.

Jungyoul Yu – Seoul, Korea

Machine Learning Hero Jungyoul Yu works at Danggeun Market as a DevOps Engineer, and is a leader of the AWS DeepRacer Group, part of the AWS Korea User Group. He was one of the AWS DeepRacer League finalists in AWS Summit Seoul and AWS re:Invent 2019. Starting off with zero ML experience, Jungyoul used AWS DeepRacer to learn ML techniques and began sharing his learnings both in the AWS DeepRacer Community, with User Groups, at AWS Community Day, and at various meetups. He has also shared many blog posts and sample code such as DeepRacer Reward Function Simulator, Rank Notifier, and Auto submit bot.

Juv Chan – Singapore

Machine Learning Hero Juv Chan is an AI automation engineer at UBS. He is the AWS DeepRacer League Singapore Summit 2019 champion and a re:Invent Championship Cup 2019 finalist. He is the lead organizer for the AWS DeepRacer Beginner Challenge global virtual community race in 2020. Juv is also involved in sharing his DeepRacer and Machine Learning knowledge with the AWS Machine Learning community at both global and regional scale. Juv is a contributing writer for both Towards Data Science and Towards AI platforms, where he blogs about AWS AI/ML and cloud relevant topics.

Lukonde Mwila – Johannesburg, South Africa

Container Hero Lukonde Mwila is a Senior Software Engineer at Entelect. He has a passion for sharing knowledge through speaking engagements such as meetups and tech conferences, as well as writing technical articles. His talk at DockerCon 2020 on deploying multi-container applications to AWS was one of the top rated and most viewed sessions of the event. He is 3x AWS certified and is an advocate for containerization and serverless technologies. Lukonde enjoys sharing experiences of building out AWS infrastructure on Medium and sharing open source projects on GitHub for the developer community to easily consume, replicate, and improve for their own benefit.

Magdalena Zawada – Rybnik, Poland

Community Hero Magdalena Zawada is Director of Strategy and Expansion at LCloud Ltd. She has been working in IT for 11 years and started her adventure with AWS technology in 2013 as CEO of Hostersi Ltd. Magdalena willingly shares her knowledge and experience with the community, co-organizing AWS UG Warsaw meetings. She also organizes a series of events to support preparation for obtaining AWS Cloud Practitioner Certifications. Magdalena belongs to numerous industry organizations, including FinOps Foundation and ISSA Information Systems Security Association Poland, and in 2019 she was a participant of the AWS re:Invent Community Leader “We Power Tech” Diversity Grant in Las Vegas.

Nick Walter – Lincoln, USA

Data Hero Nick Walter has over 15 years of experience with enterprise IT solutions, including expertise and certifications in AWS, VMware, and Oracle. A passionate evangelist for data management solutions on AWS, Nick can often be found blogging, hosting webinars, or presenting at conferences regarding the latest trends in business critical database technologies. Recently, Nick has focused on helping clients find cost effective ways to handle both the technical and licensing challenges of migrating application stacks backed by commercial database engines, such as Oracle or MS SQL Server, into AWS.

Renato Losio – Berlin, Germany

Data Hero Renato Losio is the Principal Cloud Architect at Funambol, a provider of white label cloud services. He has been working with AWS technologies since 2011, and holds 7 AWS certifications (including the Database Specialty). Renato enjoys speaking at international events, including DevOps Pro Europe, DevOpsConf Russia, All Day DevOps, Codemotion, and Percona Live. Passionate about knowledge sharing, Renato is an editor at InfoQ and writes about different cloud-related topics on his blog, cloudiamo.com. Through his various platforms, he has covered different topics across AWS Databases, such as Amazon RDS Proxy, Amazon RDS, and Amazon Aurora.

Tomasz Lakomy – Poznan, Poland

Community Hero Tomasz Lakomy is a Senior Frontend Engineer at OLX Group, and an egghead.io instructor. Over the last two years, he’s been diving into the world of AWS and sharing what he’s learned with others. After passing the AWS Certified Solutions Architect: Associate exam in 2019 he has recorded multiple courses on serverless technologies, including “Build an App with the AWS Cloud Development Kit” and “Learn AWS Lambda from Scratch.” In addition, he’s active on his Twitter, blog (tlakomy.com), as well as The Practical Dev community, where he posts articles on career advice, testing and – of course – AWS.

If you’d like to learn more about the new Heroes, or connect with a Hero near you, please visit the AWS Hero website.

— Ross;

Announcing AWS Glue DataBrew – A Visual Data Preparation Tool That Helps You Clean and Normalize Data Faster

2020-11-11 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/announcing-aws-glue-databrew-a-visual-data-preparation-tool-that-helps-you-clean-and-normalize-data-faster/

To be able to run analytics, build reports, or apply machine learning, you need to be sure the data you’re using is clean and in the right format. That’s the data preparation step that requires data analysts and data scientists to write custom code and do many manual activities. First, you need to look at the data, understand which possible values are present, and build some simple visualizations to understand if there are correlations between the columns. Then, you need to check for strange values outside of what you’re expecting, such as weather temperature above 200℉ (93℃) or speed of a truck above 200 mph (322 km/h), or for data that is missing. Many algorithms need values to be rescaled to a specific range, for example between 0 and 1, or normalized around the mean. Text fields need to be set to a standard format, and may require advanced transformations such as stemming.

That’s a lot of work. For this reason, I am happy to announce that today AWS Glue DataBrew is available, a visual data preparation tool that helps you clean and normalize data up to 80% faster so you can focus more on the business value you can get.

DataBrew provides a visual interface that quickly connects to your data stored in Amazon Simple Storage Service (S3), Amazon Redshift, Amazon Relational Database Service (RDS), any JDBC accessible data store, or data indexed by the AWS Glue Data Catalog. You can then explore the data, look for patterns, and apply transformations. For example, you can apply joins and pivots, merge different data sets, or use functions to manipulate data.

Once your data is ready, you can immediately use it with AWS and third-party services to gain further insights, such as Amazon SageMaker for machine learning, Amazon Redshift and Amazon Athena for analytics, and Amazon QuickSight and Tableau for business intelligence.

How AWS Glue DataBrew Works
To prepare your data with DataBrew, you follow these steps:

Connect one or more datasets from S3 or the Glue data catalog (S3, Redshift, RDS). You can also upload a local file to S3 from the DataBrew console. CSV, JSON, Parquet, and .XLSX formats are supported.
Create a project to visually explore, understand, combine, clean, and normalize data in a dataset. You can merge or join multiple datasets. From the console, you can quickly spot anomalies in your data with value distributions, histograms, box plots, and other visualizations.
Generate a rich data profile for your dataset with over 40 statistics by running a job in the profile view.
When you select a column, you get recommendations on how to improve data quality.
You can clean and normalize data using more than 250 built-in transformations. For example, you can remove or replace null values, or create encodings. Each transformation is automatically added as a step to build a recipe.
You can then save, publish, and version recipes, and automate the data preparation tasks by applying recipes on all incoming data. To apply recipes to or generate profiles for large datasets, you can run jobs.
At any point in time, you can visually track and explore how datasets are linked to projects, recipes, and job runs. In this way, you can understand how data flows and what are the changes. This information is called data lineage and can help you find the root cause in case of errors in your output.

Let’s see how this works with a quick demo!

Preparing a Sample Dataset with AWS Glue DataBrew
In the DataBrew console, I select the Projects tab and then Create project. I name the new project Comments. A new recipe is also created and will be automatically updated with the data transformations that I will apply next.

I choose to work on a New dataset and name it Comments.

Here, I select Upload file and in the next dialog I upload a comments.csv file I prepared for this demo. In a production use case, here you will probably connect an existing source on S3 or in the Glue Data Catalog. For this demo, I specify the S3 destination for storing the uploaded file. I leave Encryption disabled.

The comments.csv file is very small, but will help show some common data preparation needs and how to complete them quickly with DataBrew. The format of the file is comma-separated values (CSV). The first line contains the name of the columns. Then, each line contains a text comment and a numerical rating made by a customer (customer_id) about an item (item_id). Each item is part of a category. For each text comment, there is an indication of the overall sentiment (comment_sentiment). Optionally, when giving the comment, customers can enable a flag to ask to be contacted for further support (support_needed).

Here’s the content of the comments.csv file:

customer_id,item_id,category,rating,comment,comment_sentiment,support_needed
234,2345,"Electronics;Computer", 5,"I love this!",Positive,False
321,5432,"Home;Furniture",1,"I can't make this work... Help, please!!!",negative,true
123,3245,"Electronics;Photography",3,"It works. But I'd like to do more",,True
543,2345,"Electronics;Computer",4,"Very nice, it's going well",Positive,False
786,4536,"Home;Kitchen",5,"I really love it!",positive,false
567,5432,"Home;Furniture",1,"I doesn't work :-(",negative,true
897,4536,"Home;Kitchen",3,"It seems OK...",,True
476,3245,"Electronics;Photography",4,"Let me say this is nice!",positive,false

In the Access permissions, I select a AWS Identity and Access Management (IAM) role which provides DataBrew read permissions to my input S3 bucket. Only roles where DataBrew is the service principal for the trust policy are shown in the DataBrew console. To create one in the IAM console, select DataBrew as trusted entity.

If the dataset is big, you can use Sampling to limit the number of rows to use in the project. These rows can be selected at the beginning, at the end, or randomly through the data. You are going to use projects to create recipes, and then jobs to apply recipes to all the data. Depending on your dataset, you may not need access to all the rows to define the data preparation recipe.

Optionally, you can use Tagging to manage, search, or filter resources you create with AWS Glue DataBrew.

The project is now being prepared and in a few minutes I can start exploring my dataset.

In the Grid view, the default when I create a new project, I see the data as it has been imported. For each column, there is a summary of the range of values that have been found. For numerical columns, the statistical distribution is given.

In the Schema view, I can drill down on the schema that has been inferred, and optionally hide some of the columns.

In the Profile view, I can run a data profile job to examine and collect statistical summaries about the data. This is an assessment in terms of structure, content, relationships, and derivation. For a large dataset, this is very useful to understand the data. For this small example the benefits are limited, but I run it nonetheless, sending the output of the profile job to a different folder in the same S3 bucket I use to store the source data.

When the profile job has succeeded, I can see a summary of the rows and columns in my dataset, how many columns and rows are valid, and correlations between columns.

Here, if I select a column, for example rating, I can drill down into specific statistical information and correlations for that column.

Now, let’s do some actual data preparation. In the Grid view, I look at the columns. The category contains two pieces of information, separated by a semicolon. For example, the category of the first row is “Electronics;Computers.” I select the category column, then click on the column actions (the three small dots on the right of the column name) and there I have access to many transformations that I can apply to the column. In this case, I select to split the column on a single delimiter. Before applying the changes, I quickly preview them in the console.

I use the semicolon as delimiter, and now I have two columns, category_1 and category_2. I use the column actions again to rename them to category and subcategory. Now, for the first row, category contains Electronics and subcategory Computers. All these changes are added as steps to the project recipe, so that I’ll be able to apply them to similar data.

The rating column contains values between 1 and 5. For many algorithms, I prefer to have these kind of values normalized. In the column actions, I use min-max normalization to rescale the values between 0 and 1. More advanced techniques are available, such as mean or Z-score normalization. A new rating_normalized column is added.

I look into the recommendations that DataBrew gives for the comment column. Since it’s text, the suggestion is to use a standard case format, such as lowercase, capital case, or sentence case. I select lowercase.

The comments contain free text written by customers. To simplify further analytics, I use word tokenization on the column to remove stop words (such as “a,” “an,” “the”), expand contractions (so that “don’t” becomes “do not”), and apply stemming. The destination for these changes is a new column, comment_tokenized.

I still have some special characters in the comment_tokenized column, such as an emoticon :-). In the column actions, I select to clean and remove special characters.

I look into the recommendations for the comment_sentiment column. There are some missing values. I decide to fill the missing values with a neutral sentiment. Now, I still have values written with a different case, so I follow the recommendation to use lowercase for this column.

The comment_sentiment column now contains three different values (positive, negative, or neutral), but many algorithms prefer to have one-hot encoding, where there is a column for each of the possible values, and these columns contain 1, if that is the original value, or 0 otherwise. I select the Encode icon in the menu bar and then One-hot encode column. I leave the defaults and apply. Three new columns for the three possible values are added.

The support_needed column is recognized as boolean, and its values are automatically formatted to a standard format. I don’t have to do anything here.

The recipe for my dataset is now ready to be published and can be used in a recurring job processing similar data. I didn’t have a lot of data, but the recipe can be used with much larger datasets.

In the recipe, you can find a list of all the transformations that I just applied. When running a recipe job, output data is available in S3 and ready to be used with analytics and machine learning platforms, or to build reports and visualization with BI tools. The output can be written in a different format than the input, for example using a columnar storage format like Apache Parquet.

Available Now

AWS Glue DataBrew is available today in US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (Frankfurt), Asia Pacific (Tokyo), Asia Pacific (Sydney).

It’s never been easier to prepare you data for analytics, machine learning, or for BI. In this way, you can really focus on getting the right insights for your business instead of writing custom code that you then have to maintain and update.

To practice with DataBrew, you can create a new project and select one of the sample datasets that are provided. That’s a great way to understand all the available features and how you can apply them to your data.

Learn more and get started with AWS Glue DataBrew today.

— Danilo

Introducing AWS Gateway Load Balancer – Easy Deployment, Scalability, and High Availability for Partner Appliances

2020-11-10 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/introducing-aws-gateway-load-balancer-easy-deployment-scalability-and-high-availability-for-partner-appliances/

Last year, we launched Virtual Private Cloud (VPC) Ingress Routing to allow routing of all incoming and outgoing traffic to/from an Internet Gateway (IGW) or Virtual Private Gateway (VGW) to the Elastic Network Interface of a specific Amazon Elastic Compute Cloud (EC2) instance. With VPC Ingress Routing, you can now configure your VPC to send all […]

Welcome to AWS Storage Day 2020

2020-11-09 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/welcome-to-aws-storage-day-2020/

Our first-ever Storage Day in November 2019 (Welcome to AWS Storage Day) was a big success. We were able to take a multitude of significant announcements related to AWS Storage services and summarize them in a single post, with longer and more detailed posts as needed. Today, we are doing it again, so welcome to […]

New – Archive and Replay Events with Amazon EventBridge

2020-11-06 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-archive-and-replay-events-with-amazon-eventbridge/

Event-driven architectures use events to share information between the components of one or more applications. Events tell us that “something has happened”, maybe you received an API request, a file has been uploaded to a storage platform, or a database record has been updated. Business events describe something related to your activities, for example that […]

In the Works – AWS Region in Hyderabad, India

2020-11-06 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/in-the-works-aws-region-in-hyderabad-india/

We opened the AWS Regions in South Africa and Italy earlier this year and are currently working on regions in Indonesia, Japan, Spain, and Switzerland. Second AWS Region in India We launched the Asia Pacific (Mumbai) Region in June 2016, giving enterprises, public sector organizations, startups, and SMBs access to state-of-the-art public cloud infrastructure. In […]

Amazon MQ Update – New RabbitMQ Message Broker Service

2020-11-04 Channy Yun

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/amazon-mq-update-new-rabbitmq-message-broker-service/

In 2017, we launched Amazon MQ – a managed message broker service for Apache ActiveMQ, a popular open-source message broker that is fast and feature-rich. It offers queues and topics, durable and non-durable subscriptions, push-based and poll-based messaging, and filtering. With Amazon MQ, we have enhanced lots of new features by customer feedback to improve […]

New – GPU-Equipped EC2 P4 Instances for Machine Learning & HPC

2020-11-02 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-gpu-equipped-ec2-p4-instances-for-machine-learning-hpc/

The Amazon EC2 team has been providing our customers with GPU-equipped instances for nearly a decade. The first-generation Cluster GPU instances were launched in late 2010, followed by the G2 (2013), P2 (2016), P3 (2017), G3 (2017), P3dn (2018), and G4 (2019) instances. Each successive generation incorporates increasingly-capable GPUs, along with enough CPU power, memory, […]

In the Works – New AWS Region in Zurich, Switzerland

2020-11-02 Alex Casalboni

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/in-the-works-new-aws-region-in-zurich-switzerland/

Earlier this year, we launched the new AWS Region in Italy and have plans for three more AWS Regions in Indonesia, Japan, and Spain. Coming to Switzerland in 2022 Today, I’m happy to announce that the AWS Europe (Zurich) Region is in the works. It will open in the second half of 2022 with three […]

New – Application Load Balancer Support for End-to-End HTTP/2 and gRPC

2020-10-29 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-application-load-balancer-support-for-end-to-end-http-2-and-grpc/

Thanks to its efficiency and support for numerous programming languages, gRPC is a popular choice for microservice integrations and client-server communications. gRPC is a high performance remote procedure call (RPC) framework using HTTP/2 for transport and Protocol Buffers to describe the interface. To make it easier to use gRPC with your applications, Application Load Balancer (ALB) […]

AWS Nitro Enclaves – Isolated EC2 Environments to Process Confidential Data

2020-10-29 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-nitro-enclaves-isolated-ec2-environments-to-process-confidential-data/

When I first told you about the AWS Nitro System, I said: The Nitro system is a rich collection of building blocks that can be assembled in many different ways, giving us the flexibility to design and rapidly deliver EC2 instance types with an ever-broadening selection of compute, storage, memory, and networking options. To date, […]

Introducing Amazon SNS FIFO – First-In-First-Out Pub/Sub Messaging

2020-10-23 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-sns-fifo-first-in-first-out-pub-sub-messaging/

When designing a distributed software architecture, it is important to define how services exchange information. For example, the use of asynchronous communication decouples components and simplifies scaling, reducing the impact of changes and making it easier to release new features.

The two most common forms of asynchronous service-to-service communication are message queues and publish/subscribe messaging:

With message queues, messages are stored on the queue until they are processed and deleted by a consumer. On AWS, Amazon Simple Queue Service (SQS) provides a fully managed message queuing service with no administrative overhead.
With pub/sub messaging, a message published to a topic is delivered to all subscribers to the topic. On AWS, Amazon Simple Notification Service (SNS) is a fully managed pub/sub messaging service that enables message delivery to a large number of subscribers. Each subscriber can also set a filter policy to receive only the messages that it cares about.

You can use topics when you want to fan out messages to multiple applications, and queues when you want to send messages to one application. Using topics and queues together, you can decouple microservices, distributed systems, and serverless applications.

With SQS, you can use FIFO (First-In-First-Out) queues to preserve the order in which messages are sent and received, and to avoid that a message is processed more than once.

Introducing SNS FIFO Topics
Today, we are adding similar capabilities for pub/sub messaging with the introduction of SNS FIFO topics, providing strict message ordering and deduplicated message delivery to one or more subscribers.

FIFO topics manage ordering and deduplication similar to FIFO queues:

Ordering – You configure a message group by including a message group ID when publishing a message to a FIFO topic. For each message group ID, all messages are sent and delivered in order of their arrival. For example, to ensure the delivery of messages related to the same customer in order, you can publish these messages to the topic using the customer’s account number as the message group ID. There is no limit in the number of message groups with FIFO topics and queues. You don’t need to declare in advance the message group ID, any value will work. If you don’t have a logical distinction between messages, you can simply use the same message group ID for all and have a single group of ordered messages. The message group ID is passed to any subscribed FIFO queue.

Deduplication – Distributed systems (like SNS) and client applications sometimes generate duplicate messages. You can avoid duplicated message deliveries from the topic in two ways: either by enabling content-based deduplication on the topic, or by adding a deduplication ID to the messages that you publish. With message content-based deduplication, SNS uses a SHA-256 hash to generate the message deduplication ID using the body of the message. After a message with a specific deduplication ID is published successfully, there is a 5-minute interval during which any message with the same deduplication ID is accepted but not delivered. If you subscribe a FIFO queue to a FIFO topic, the deduplication ID is passed to the queue and it is used by SQS to avoid duplicate messages being received.

You can use FIFO topics and queues together to simplify the implementation of applications where the order of operations and events is critical, or when you cannot tolerate duplicates. For example, to process financial operations and inventory updates, or to asynchronously apply commands that you receive from a client device. FIFO queues can use message filtering in FIFO topics to selectively receive only a subset of messages rather than every message published to the topic.

How to Use SNS FIFO Topics
A common scenario where FIFO topics can help is when you receive updates that need to be processed in order. For example, I can use a FIFO topic to receive updates from an application where my customers edit their account profiles. Then, I subscribe an SQS FIFO queue to the FIFO topic, and use the queue as trigger for a Lambda function that applies the account updates to an Amazon DynamoDB table used by my Customer management system that needs to be kept in sync.

The decoupling introduced by the FIFO topic makes it easier to add new functionality with minimal impact to existing applications. For example, to reward my loyal customers with additional promotions, I add a new Loyalty application that is storing information in a relational database managed by Amazon Aurora. To keep the customer’s information stored in the Loyalty database in sync with my other applications, I can subscribe a new FIFO queue to the same FIFO topic, and add a new Lambda function that receives customer updates in the same order as they are generated, and applies them to the Loyalty database. In this way, I don’t need to change code and configuration of other applications to integrate the new Loyalty app.

First, I create two FIFO queues in the SQS console, leaving all options to their defaults:

The customer.fifo queue to process updates in my Customer management system.
The loyalty.fifo queue to help me collect and store customer updates for the Loyalty application.

In the SNS console, I create the updates.fifo topic. I select FIFO as type, and enable Content-based message deduplication.

Then, I subscribe the customer.fifo and loyalty.fifo queues to the topic.

To be able to receive messages, I add a statement to the access policy of both queues granting the updates.fifo topic permissions to send messages to the queues. For example, for the customer.fifo queue the statement is:

{
  "Effect": "Allow",
  "Principal": {
    "Service": "sns.amazonaws.com"
  },
  "Action": "SQS:SendMessage",
  "Resource": "arn:aws:sqs:us-east-2:123412341234:customer.fifo",
  "Condition": {
    "ArnLike": {
      "aws:SourceArn": "arn:aws:sns:us-east-2:123412341234:updates.fifo"
    }
  }
}

Now, I use the SNS console to publish 4 messages in sequence. For all messages, I use the same message group ID. In this way, they are all in the same message group. The only part that is different is the message body, where I use in order:

Update One
Update Two
Update Three
Update One

In the SQS console, I see that only 3 messages have been delivered to the FIFO queues:

Why is that? When I created the FIFO topics, I enabled content-based deduplication. The 4 messages were sent within the 5-minute deduplication window. The last message has been recognized as a duplicate of the first one and has not been delivered to the subscribed queues.

Let’s see the actual messages in the queues. I use the AWS Command Line Interface (CLI) to receive the messages from SQS, and the jq command-line JSON processor to format the output and get only the Message in the Body.

Here are the messages in the customer.fifo queue:

$ aws sqs receive-message --queue-url https://sqs.us-east-2.amazonaws.com/123412341234/customer.fifo --max-number-of-messages 10 | jq '.Messages[].Body | fromjson | .Message'

"Update One"
"Update Two"
"Update Three"

And these are the messages in the loyalty.fifo queue:

$ aws sqs receive-message --queue-url https://sqs.us-east-2.amazonaws.com/123412341234/loyalty.fifo --max-number-of-messages 10 | jq '.Messages[].Body | fromjson | .Message'

"Update One"
"Update Two"
"Update Three"

As expected, the 3 messages with unique content have been delivered to both queues in the same order as they were sent.

Available Now
You can use SNS FIFO topics in all commercial regions. You can process up to 300 transactions per second (TPS) per FIFO topic or FIFO queue. With SNS, you pay only for what you use, you can find more information in the pricing page.

To learn more, please see the documentation.

— Danilo

Amazon Prime Day 2020 – Powered by AWS

2020-10-22 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-prime-day-2020-powered-by-aws/

Tipped off by a colleague in Denmark, I bought the LEGO Star Wars Stormtrooper Helmet, which turned out to be a Prime Day best-seller!

As I like to do every year, I would like to share a few of the many ways that AWS helped to make Prime Day a reality for our customers. Back in 2016 I wrote How AWS Powered Amazon’s Biggest Day Ever to describe how we plan for Prime Day and that post is still informative and relevant.

This time around I would like to focus on four ways that AWS helped to support Prime Day: Amazon Live and IVS, Infrastructure Event Management, Storage, and Content Delivery.

Amazon Live and IVS on Prime Day
Throughout Prime Day 2020, Amazon customers were able to shop from livestreams through Amazon Live. Shoppers were also able to use live chat to interact with influencers and hosts in real time. They were able to ask questions, share their experiences, and get a better feel for products of interest to them.

Amazon Live helped customers learn more about products and take advantage of top deals by counting down to Deal Reveals and sharing live product demonstrations. Anitta, Russell Wilson, and Ciara curated Prime Day deals as did author Elizabeth Gilbert. In addition, influencers including @SheaWhitney, @ShopDandy, and @TheDealGuy shared their top product picks with customers. In total, there were over 1,200 live streams and tens of thousands of chat messages on Amazon Live during Prime Day.

To deliver these enhanced shopping experiences for customers and for creators, low latency video is essential. It enables Amazon Live to synchronize the products featured in the live video with the products displayed in the carousel at the bottom of the video player. Low latency also allows the livestream hosts to answer customer questions in real-time. And, of course, on Prime Day in particular, all of this needed to happen at scale.

In order to do this, the Amazon Live team made use of the newly launched Amazon Interactive Video Service (IVS). As Martin explains in his recent post (Amazon Interactive Video Service – Add Live Video to Your Apps and Websites), this is a managed live streaming service that supports the creation of interactive, low-latency video experiences. It uses the same technology that powers Twitch, and allows you to deliver live content with very low latency, often three seconds or less (20 to 30 seconds is more common).

Infrastructure Event Management
AWS Infrastructure Event Management (IEM) helps our customers to plan and run large-scale business-critical events.This program is included in the Enterprise Support plan and is available to Business Support customers for an additional fee. IEM includes an assessment of operational readiness, identification and mitigation of risks, and the confidence to run an event with AWS experts standing by and ready to help.

This year, the TAMs (Technical Account Managers) that support the IEM program created a Control Room that was 100% virtual. A combination of Slack channels and Amazon Chime bridges empowered AWS service teams, AWS support, IT support, and Amazon Customer Reliability Engineering (thousands of people in all) to communicate and collaborate in real time.

Storage for Prime Day
Amazon DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of the 66-hour Prime Day, these sources made 16.4 trillion calls to the DynamoDB API, peaking at 80.1 million requests per second.

On the block storage side, Amazon Elastic Block Store (EBS) added 241 petabytes of storage in preparation for Prime Day; the resulting fleet handled 6.2 trillion requests per day and transferred 563 petabytes per day.

Content Delivery for Prime Day
Amazon CloudFront played an important role as always, serving up web and streamed content to a world-wide audience. CloudFront handled over 280 million HTTP requests per minute, a total of 450 billion requests across all of the Amazon.com sites.

— Jeff;

Public Preview – AWS Distro for OpenTelemetry

2020-10-21 Jeff Barr

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/public-preview-aws-distro-open-telemetry/

It took me a while to figure out what observability was all about. A year or two I asked around and my colleagues told me that I needed to follow Charity Majors and to read her blog (done, and done). Just this week, Charity tweeted:

Kislay’s tweet led to his blog post, Observing is not Debugging, which I found very helpful. As Charity noted, Kislay tells us that Observability is a study of the system in motion.

Today’s large-scale distributed applications and systems are effectively always in motion. Whether serving web requests, processing streams of data or handling events, something is always happening. At world-scale, looking at individual requests or events is not always feasible. Instead, it is necessary to take a statistical approach and to watch how well a system is working, instead of simply waiting for a total failure.

New AWS Distro for OpenTelemetry
Today we are launching a preview of AWS Distro for OpenTelemetry. We are part of the Cloud Native Computing Foundation (CNCF)’s OpenTelemetry community, working to define an open standard for the collection of distributed traces and metrics. AWS Distro for OpenTelemetry is a secure and supported distribution of the APIs, libraries, agents, and collectors defined in the OpenTelemetry Specification.

One of the coolest features of the toolkit is auto instrumentation. Starting with Java and in the works for other languages and environments (.NET and JavaScript are next), the auto-instrumentation agent identifies the frameworks and languages used by your application and automatically instruments them to collect and forward metrics and traces.

Here’s how all of the pieces fit together:

The AWS Observability Collector runs within your environment. It can be launched as a sidecar or daemonset for EKS, a sidecar for ECS, or an agent on EC2. You configure the metrics and traces that you want to collect, and also which AWS services to forward them to. You can set up a central account for monitoring complex multi-account applications, and you can also control the sampling rate (what percentage of the raw data is forwarded and ultimately stored).

Partners in Action
You can make use of AWS and partner tools and applications to observe, analyze, and act on what you see. We’re working with Cisco AppDynamics, Datadog, New Relic, Splunk, and other partners and will have more information to share during the preview.

Things to Know
The preview of the AWS Distro for OpenTelemetry is available now and you can start using it today. In addition to the .NET and JavaScript support that I mentioned earlier, we plan to support Python, Ruby, Go, C++, Erlang, and Rust as well.

This is an open source project and welcome your pull requests! We will be tracking the upstream repository and plan to release a fresh version of the toolkit quarterly.

— Jeff;

PS – Be sure to sign up for our upcoming webinar, Observability at AWS and AWS Distro for OpenTelemetry Deep Dive.

New – Amazon RDS on Graviton2 Processors

2020-10-16 Sébastien Stormacq

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-amazon-rds-on-graviton2-processors/

I recently wrote a post to announce the availability of M6g, R6g and C6g families of instances on Amazon Elastic Compute Cloud (EC2). These instances offer better cost-performance ratio than their x86 counterparts. They are based on AWS-designed AWS Graviton2 processors, utilizing 64-bit Arm Neoverse N1 cores.

Starting today, you can also benefit from better cost-performance for your Amazon Relational Database Service (RDS) databases, compared to the previous M5 and R5 generation of database instance types, with the availability of AWS Graviton2 processors for RDS. You can choose between M6g and R6g instance families and three database engines (MySQL 8.0.17 and higher, MariaDB 10.4.13 and higher, and PostgreSQL 12.3 and higher).

M6g instances are ideal for general purpose workloads. R6g instances offer 50% more memory than their M6g counterparts and are ideal for memory intensive workloads, such as Big Data analytics.

Graviton2 instances provide up to 35% performance improvement and up to 52% price-performance improvement for RDS open source databases, based on internal testing of workloads with varying characteristics of compute and memory requirements.

Graviton2 instances family includes several new performance optimizations such as larger L1 and L2 caches per core, higher Amazon Elastic Block Store (EBS) throughput than comparable x86 instances, fully encrypted RAM, and many others as detailed on this page. You can benefit from these optimizations with minimal effort, by provisioning or migrating your RDS instances today.

RDS instances are available in multiple configurations, starting with 2 vCPUs, with 8 GiB memory for M6g, and 16 GiB memory for R6g with up to 10 Gbps of network bandwidth, giving you new entry-level general purpose and memory optimized instances. The table below shows the list of instance sizes available for you:

Instance Size	vCPU	Memory (GiB)		Dedicated EBS Bandwidth (Mbps)	Network Bandwidth (Gbps)
		M6g	R6g
large	2	8	16	Up to 4750	Up to 10
xlarge	4	16	32	Up to 4750	Up to 10
2xlarge	8	32	64	Up to 4750	Up to 10
4xlarge	16	64	128	4750	Up to 10
8xlarge	32	128	256	9000	12
12xlarge	48	192	384	13500	20
16xlarge	64	256	512	19000	25

Let’s Start Your First Graviton2 Based Instance
To start a new RDS instance, I use the AWS Management Console or the AWS Command Line Interface (CLI), just like usual, and select one of the db.m6g or db.r6ginstance types (this page in the documentation has all the details).

Using the CLI, it would be:

aws rds create-db-instance
 --region us-west-2 \
 --db-instance-identifier $DB_INSTANCE_NAME \
 --db-instance-class db.m6g.large \
 --engine postgres \
 --engine-version 12.3 \
 --allocated-storage 20 \
 --master-username $MASTER_USER \
 --master-user-password $MASTER_PASSWORD

The CLI confirms with:

{
    "DBInstance": {
        "DBInstanceIdentifier": "newsblog",
        "DBInstanceClass": "db.m6g.large",
        "Engine": "postgres",
        "DBInstanceStatus": "creating",
...
}

Migrating to Graviton2 instances is easy, in the AWS Management Console, I select my database and I click Modify.

The I select the new DB instance class:

Or, using the CLI, I can use the modify-db-instance API call.

There is a short service interruption happening when you switch instance type. By default, the modification will happen during your next maintenance window, unless you enable the ApplyImmediately option.

You can provision new or migrate to Graviton2 Amazon Relational Database Service (RDS) instances in all regions where EC2 M6g and R6g are available : US East (N. Virginia), US East (Ohio), US West (Oregon), Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Ireland), and Europe (Frankfurt) AWS Regions.

As usual, let us know your feedback on the AWS Forum or through your usual AWS contact.

— seb

Bhuvaneswari Subramani – Bengaluru, India

Jared Short – Washington DC, USA

Matt Coulter – Belfast, Northern Ireland

Paul Duvall – Washington DC, USA

Sebastian Korfmann – Hamburg, Germany

Steve Gordon – East Sussex, United Kingdom

Thorsten Höger – Stuttgart, Germany

Ahmed Samir – Riyadh, Saudi Arabia

Anas Khattar – Beirut, Lebanon

Chris Gong – New York, USA

Damian Olguin – Cordoba, Argentina

Denis Bauer – Sydney, Australia

Denis Dyack – St. Catharines, Canada

Emrah Şamdan – Ankara, Turkey

Franck Pachot – Lausanne, Switzerland

Hiroko Nishimura – Washington DC, USA

Juliano Cristian – Santa Catarina, Brazil

Jungyoul Yu – Seoul, Korea

Juv Chan – Singapore

Lukonde Mwila – Johannesburg, South Africa

Magdalena Zawada – Rybnik, Poland

Nick Walter – Lincoln, USA

Renato Losio – Berlin, Germany

Tomasz Lakomy – Poznan, Poland

The collective thoughts of the interwebz