All posts by Channy Yun

New – Trusted Language Extensions for PostgreSQL on Amazon Aurora and Amazon RDS

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-trusted-language-extensions-for-postgresql-on-amazon-aurora-and-amazon-rds/

PostgreSQL has become the preferred open-source relational database for many enterprises and start-ups with its extensible design for developers. One of the reasons developers use PostgreSQL is it allows them to add database functionality by building extensions with their preferred programming languages.

You can already install and use PostgreSQL extensions in Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL. We support more than 85 PostgreSQL extensions in Amazon Aurora and Amazon RDS, such as the pgAudit extension for logging your database activity. While many workloads use these extensions, we heard from customers asking for the flexibility to build and run extensions of their choosing on their PostgreSQL database instances.

Today, we are announcing the general availability of Trusted Language Extensions for PostgreSQL (pg_tle), a new open-source development kit for building PostgreSQL extensions. With Trusted Language Extensions for PostgreSQL, developers can build high-performance extensions that run safely on PostgreSQL.

Trusted Language Extensions for PostgreSQL provides database administrators control over who can install extensions and a permissions model for running them, letting application developers deliver new functionality as soon as they determine an extension meets their needs.

To start building with Trusted Language Extensions, you can use trusted languages such as JavaScript, Perl, and PL/pgSQL. These trusted languages have safety attributes, including restricting direct access to the file system and preventing unwanted privilege escalations. You can easily install extensions written in a trusted language on Amazon Aurora PostgreSQL-Compatible Edition 14.5 and Amazon RDS for PostgreSQL 14.5 or a newer version.

Trusted Language Extensions for PostgreSQL is an open-source project licensed under Apache License 2.0 on GitHub. You can comment or suggest items on the Trusted Language Extensions for PostgreSQL roadmap and help us support this project across multiple programming languages, and more. Doing this as a community will help us make it easier for developers to use the best parts of PostgreSQL to build extensions.

Let’s explore how we can use Trusted Language Extensions for PostgreSQL to build a new PostgreSQL extension for Amazon Aurora and Amazon RDS.

Setting up Trusted Language Extensions for PostgreSQL
To use pg_tle with Amazon Aurora or Amazon RDS for PostgreSQL, you need to set up a parameter group that loads pg_tle in the PostgreSQL shared_preload_libraries setting. Choose Parameter groups in the left navigation pane in the Amazon RDS console and Create parameter group to make a new parameter group.

Select postgres14 in Parameter group family for Amazon RDS for PostgreSQL (or aurora-postgresql14 for an Amazon Aurora PostgreSQL-Compatible cluster), enter pgtle as the Group name, and choose Create.

Choose the pgtle parameter group you created and then Edit in the Parameter group actions dropdown menu. Search for shared_preload_libraries in the search box and choose Edit parameters. Add your preferred values, including pg_tle, and choose Save changes.

You can also do the same job in the AWS Command Line Interface (AWS CLI).

$ aws rds create-db-parameter-group \
  --region us-east-1 \
  --db-parameter-group-name pgtle \
  --db-parameter-group-family aurora-postgresql14 \
  --description "pgtle group"

$ aws rds modify-db-parameter-group \
  --region us-east-1 \
  --db-parameter-group-name pgtle \
  --parameters "ParameterName=shared_preload_libraries,ParameterValue=pg_tle,ApplyMethod=pending-reboot"
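If you prefer an SDK over the CLI, a minimal boto3 sketch of the same two calls might look like the following (the group name, family, and Region simply mirror the commands above):

import boto3

rds = boto3.client('rds', region_name='us-east-1')

# Create the parameter group, mirroring the create-db-parameter-group command above.
rds.create_db_parameter_group(
    DBParameterGroupName='pgtle',
    DBParameterGroupFamily='aurora-postgresql14',
    Description='pgtle group'
)

# Load pg_tle through shared_preload_libraries; the change applies at the next reboot.
rds.modify_db_parameter_group(
    DBParameterGroupName='pgtle',
    Parameters=[{
        'ParameterName': 'shared_preload_libraries',
        'ParameterValue': 'pg_tle',
        'ApplyMethod': 'pending-reboot'
    }]
)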

Now, you can add the pgtle parameter group to your Amazon Aurora or Amazon RDS for PostgreSQL database. If you have a database instance called testing-pgtle, you can add the pgtle parameter group to the database instance using the command below. Please note that this will cause an active instance to reboot.

$ aws rds modify-db-instance \
  --region us-east-1 \
  --db-instance-identifier testing-pgtle \
  --db-parameter-group-name pgtle \
  --apply-immediately

Verify that the pg_tle library is available on your Amazon Aurora or Amazon RDS for PostgreSQL instance. Run the following command on your PostgreSQL instance:

SHOW shared_preload_libraries;

pg_tle should appear in the output.

Now, create the pg_tle extension in your current database by running the following command:

 CREATE EXTENSION pg_tle;

You can now create and install Trusted Language Extensions for PostgreSQL in your current database. If you create a new extension, you should grant the pgtle_admin role to your primary user (e.g., postgres) with the following command:

GRANT pgtle_admin TO postgres;

Let’s now see how to create our first pg_tle extension!

Building a Trusted Language Extension for PostgreSQL
For this example, we are going to build a pg_tle extension to validate that a user is not setting a password that’s found in a common password dictionary. Many teams have rules around the complexity of passwords, particularly for database users. PostgreSQL allows developers to help enforce password complexity using the check_password_hook.

In this example, you will build a password check hook using PL/pgSQL. In the hook, you can check to see if the user-supplied password is in a dictionary of 10 of the most common password values:

SELECT pgtle.install_extension (
  'my_password_check_rules',
  '1.0',
  'Do not let users use the 10 most commonly used passwords',
$_pgtle_$
  CREATE SCHEMA password_check;
  REVOKE ALL ON SCHEMA password_check FROM PUBLIC;
  GRANT USAGE ON SCHEMA password_check TO PUBLIC;

  CREATE TABLE password_check.bad_passwords (plaintext) AS
  VALUES
    ('123456'),
    ('password'),
    ('12345678'),
    ('qwerty'),
    ('123456789'),
    ('12345'),
    ('1234'),
    ('111111'),
    ('1234567'),
    ('dragon');
  CREATE UNIQUE INDEX ON password_check.bad_passwords (plaintext);

  CREATE FUNCTION password_check.passcheck_hook(username text, password text, password_type pgtle.password_types, valid_until timestamptz, valid_null boolean)
  RETURNS void AS $$
    DECLARE
      invalid bool := false;
    BEGIN
      IF password_type = 'PASSWORD_TYPE_MD5' THEN
        SELECT EXISTS(
          SELECT 1
          FROM password_check.bad_passwords bp
          WHERE ('md5' || md5(bp.plaintext || username)) = password
        ) INTO invalid;
        IF invalid THEN
          RAISE EXCEPTION 'password must not be found on a common password dictionary';
        END IF;
      ELSIF password_type = 'PASSWORD_TYPE_PLAINTEXT' THEN
        SELECT EXISTS(
          SELECT 1
          FROM password_check.bad_passwords bp
          WHERE bp.plaintext = password
        ) INTO invalid;
        IF invalid THEN
          RAISE EXCEPTION 'password must not be found on a common password dictionary';
        END IF;
      END IF;
    END
  $$ LANGUAGE plpgsql SECURITY DEFINER;

  GRANT EXECUTE ON FUNCTION password_check.passcheck_hook TO PUBLIC;

  SELECT pgtle.register_feature('password_check.passcheck_hook', 'passcheck');
$_pgtle_$
);

You need to enable the hook through the pgtle.enable_password_check configuration parameter. On Amazon Aurora and Amazon RDS for PostgreSQL, you can do so with the following command:

$ aws rds modify-db-parameter-group \
    --region us-east-1 \
    --db-parameter-group-name pgtle \
    --parameters "ParameterName=pgtle.enable_password_check,ParameterValue=on,ApplyMethod=immediate"

It may take several minutes for these changes to propagate. You can check that the value is set using the SHOW command:

SHOW pgtle.enable_password_check;

If the value is on, you will see the following output:

 pgtle.enable_password_check
-----------------------------
 on

Now you can create this extension in your current database and try setting your password to one of the dictionary passwords and observe how the hook rejects it:

CREATE EXTENSION my_password_check_rules;

CREATE ROLE test_role PASSWORD '123456';
ERROR:  password must not be found on a common password dictionary

CREATE ROLE test_role;
SET SESSION AUTHORIZATION test_role;
SET password_encryption TO 'md5';
\password
-- set to "password"
ERROR:  password must not be found on a common password dictionary

To disable the hook, set the value of pgtle.enable_password_check to off:

$ aws rds modify-db-parameter-group \
    --region us-east-1 \
    --db-parameter-group-name pgtle \
    --parameters "ParameterName=pgtle.enable_password_check,ParameterValue=off,ApplyMethod=immediate"

You can uninstall this pg_tle extension from your database and prevent anyone else from running CREATE EXTENSION on my_password_check_rules with the following command:

DROP EXTENSION my_password_check_rules;
SELECT pgtle.uninstall_extension('my_password_check_rules');
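If you want to check which pg_tle extensions remain installed in the database, here is a quick sketch with psycopg2; the connection details are placeholders, and pgtle.available_extensions() is the listing function described in the pg_tle documentation:

import psycopg2

# Placeholders: point this at your Aurora or RDS for PostgreSQL endpoint.
conn = psycopg2.connect(
    host='testing-pgtle.xxxxxxxx.us-east-1.rds.amazonaws.com',
    dbname='postgres', user='postgres', password='your-password'
)

with conn, conn.cursor() as cur:
    # List the Trusted Language Extensions installed in this database.
    cur.execute('SELECT * FROM pgtle.available_extensions();')
    for row in cur.fetchall():
        print(row)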

You can find more sample extensions and give them a try. To build and test your Trusted Language Extensions in your local PostgreSQL database, you can build from our source code after cloning the repository.

Join Our Community!
The Trusted Language Extensions for PostgreSQL community is open to everyone. Give it a try, and give us feedback on what you would like to see in future releases. We welcome any contributions, such as new features, example extensions, additional documentation, or any bug reports in GitHub.

To learn more about using Trusted Language Extensions for PostgreSQL in the AWS Cloud, see the Amazon Aurora PostgreSQL-Compatible Edition and Amazon RDS for PostgreSQL documentation.

Give it a try, and please send feedback to AWS re:Post for PostgreSQL or through your usual AWS support contacts.

Channy

Preview: Use Amazon SageMaker to Build, Train, and Deploy ML Models Using Geospatial Data

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/preview-use-amazon-sagemaker-to-build-train-and-deploy-ml-models-using-geospatial-data/

You use map apps every day to find your favorite restaurant or travel the fastest route using geospatial data. There are two types of geospatial data: vector data that uses two-dimensional geometries such as a building location (points), roads (lines), or land boundary (polygons), and raster data such as satellite and aerial images.

Last year, we introduced Amazon Location Service, which makes it easy for developers to add location functionality to their applications. With Amazon Location Service, you can visualize a map, search points of interest, optimize delivery routes, track assets, and use geofencing to detect entry and exit events in your defined geographical boundary.

However, if you want to make predictions from geospatial data using machine learning (ML), there are lots of challenges. When I studied geographic information systems (GIS) in graduate school, I was limited to a small data set that covered only a narrow area and had to contend with limited storage and only the computing power of my laptop at the time.

These challenges include: 1) acquiring and accessing high-quality geospatial datasets is complex, as it requires working with multiple data sources and vendors; 2) preparing massive geospatial data for training and inference can be time-consuming and expensive; and 3) specialized tools are needed to visualize geospatial data and integrate it with ML operations infrastructure.

Today I’m excited to announce the preview release of Amazon SageMaker’s new geospatial capabilities that make it easy to build, train, and deploy ML models using geospatial data. This collection of features offers pre-trained deep neural network (DNN) models and geospatial operators that make it easy to access and prepare large geospatial datasets. All generated predictions can be visualized and explored on the map.

Also, you can use the new geospatial image to transform and visualize data inside geospatial notebooks using open-source libraries such as NumPy, GDAL, GeoPandas, and Rasterio, as well as SageMaker-specific libraries.

With a few clicks in the SageMaker Studio console, a fully integrated development environment (IDE) for ML, you can run an Earth Observation job, such as land cover segmentation, or launch notebooks. You can bring in various geospatial data, for example, your own Planet Labs satellite data from Amazon S3, US Geological Survey LANDSAT and Sentinel-2 images from Open Data on AWS, or data from Amazon Location Service. You can also bring your own data, such as location data generated from GPS devices, connected vehicles, or Internet of Things (IoT) sensors, retail store foot traffic, geo-marketing, and census data.

The Amazon SageMaker geospatial capabilities support use cases across any industry. For example, insurance companies can use satellite images to analyze the damage impact from natural disasters on local economies, and agriculture companies can track the health of crops, predict harvest yield, and forecast regional demand for agricultural produce. Retailers can combine location and map data with competitive intelligence to optimize new store locations worldwide. These are just a few of the example use cases. You can turn your own ideas into reality!

Introducing Amazon SageMaker Geospatial Capabilities
During the preview, you can use SageMaker Studio in the US West (Oregon) Region. Make sure to select Jupyter Lab 3 as the default version when you create a new user in Studio. To learn more about setting up SageMaker Studio, see Onboard to Amazon SageMaker Domain Using Quick setup in the AWS documentation.

Now you can find the Geospatial section by navigating to the homepage and scrolling down in SageMaker Studio’s new Launcher tab.

Here is an overview of three key Amazon SageMaker geospatial capabilities:

  • Earth Observation jobs – Acquire, transform, and visualize satellite imagery data to make predictions and get useful insights.
  • Vector Enrichment jobs – Enrich your data with operations, such as converting geographical coordinates to readable addresses from CSV files.
  • Map Visualization – Visualize satellite images or map data uploaded from a CSV, JSON, or GeoJSON file.

Let’s dive deep into each component!

Get Started with an Earth Observation Job
To get started with Earth Observation jobs, select Create Earth Observation job on the front page.

You can select one of the geospatial operations or ML models based on your use case.

  • Spectral Index – Obtain a combination of spectral bands that indicate the abundance of features of interest.
  • Cloud Masking – Identify cloud and cloud-free pixels to get clear and accurate satellite imagery.
  • Land Cover Segmentation – Identify land cover types such as vegetation and water in satellite imagery.

SageMaker provides a combination of geospatial functionalities that includes built-in operations for data transformations along with pretrained ML models. You can use these models to understand the impact of environmental changes and human activities over time, identify cloud and cloud-free pixels, and perform semantic segmentation.

Define a Job name, choose a model to be used, and click the bottom-right Next button to move to the second configuration step.

Next, you can define an area of interest (AOI), the satellite image data set you want to use, and filters for your job. The left screen shows the Area of Interest map to visualize for your Earth Observation Job selection, and the right screen contains satellite images and filter options for your AOI.

You can choose the satellite image collection, either USGS LANDSAT or Sentinel-2 images, the date span for your Earth Observation job, and filters on properties of your images in the filter section.

I uploaded a GeoJSON file to define my AOI as the Mount Halla area on Jeju Island, South Korea. I select all job properties and options and choose Create.

Once the Earth Observation job is successfully created, a flashbar appears, and I can view my job details by choosing the View job details button.

Once the job is finished, I can Visualize job output.

This image is the rendered job output, which detects land usage from the input satellite images. You can switch between the input images, the output images, and the AOI using the data layers in the left pane.

It shows automatic mapping results of land cover for natural resource management. For example, the yellow area is the sea, green is cloud, dark orange is forest, and orange is land.

You can also run the same job in a SageMaker notebook by using the geospatial image with the geospatial SDK.

From the File menu, choose New and then Notebook. Select the Image dropdown menu in the Setup notebook environment and choose Geospatial 1.0. Leave the other settings at their default values.

Let’s look at Python sample code! First, set up SageMaker geospatial libraries.

import boto3
import botocore
import sagemaker
import sagemaker_geospatial_map

region = boto3.Session().region_name
session = botocore.session.get_session()
execution_role = sagemaker.get_execution_role()

sg_client = session.create_client(
    service_name='sagemaker-geospatial',
    region_name=region
)

Start an Earth Observation Job to identify the land cover types in the area of Jeju island.

# Perform land cover segmentation on images returned from the sentinel dataset.
eoj_input_config = {
    "RasterDataCollectionQuery": {
        "RasterDataCollectionArn": <ArnDataCollection,
        "AreaOfInterest": {
            "AreaOfInterestGeometry": {
                "PolygonGeometry": {
                    "Coordinates": [
                        [[126.647226, 33.47014], [126.406116, 33.47014], [126.406116, 33.307529], [126.647226, 33.307529], [126.647226, 33.47014]]
                    ]
                }
            }
        },
        "TimeRangeFilter": {
            "StartTime": "2022-11-01T00:00:00Z",
            "EndTime": "2022-11-22T23:59:59Z"
        },
        "PropertyFilters": {
            "Properties": [
                {
                    "Property": {
                        "EoCloudCover": {
                            "LowerBound": 0,
                            "UpperBound": 20
                        }
                    }
                }
            ],
            "LogicalOperator": "AND"
        }
    }
}
eoj_config = {"LandCoverSegmentationConfig": {}}

response = sg_client.start_earth_observation_job(
    Name="jeju-island-landcover",
    InputConfig=eoj_input_config,
    JobConfig=eoj_config,
    ExecutionRoleArn=execution_role
)
# Monitor the EOJ status
sg_client.get_earth_observation_job(Arn = response['Arn'])

After your EOJ is created, the Arn is returned to you. You use the Arn to identify a job and perform further operations. After finishing the job, visualize Earth Observation inputs and outputs in the visualization tool.
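Before visualizing the output, you may want to wait for the EOJ to reach a terminal state. Here is a minimal polling sketch; the exact status strings are assumptions based on the service documentation:

import time

eoj_arn = response['Arn']
while True:
    eoj = sg_client.get_earth_observation_job(Arn=eoj_arn)
    print('EOJ status:', eoj['Status'])
    # COMPLETED/FAILED/STOPPED are assumed terminal states; check the API reference.
    if eoj['Status'] in ('COMPLETED', 'FAILED', 'STOPPED'):
        break
    time.sleep(60)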

# Create an instance of the map to add the EOJ input/output layers
map = sagemaker_geospatial_map.create_map({
    'is_raster': True
})
map.set_sagemaker_geospatial_client(sg_client)
# render the map
map.render()

# Visualize the input; this can be done even before the EOJ has completed.
time_range_filter={
    "start_date": "2022-11-01T00:00:00Z",
    "end_date": "2022-11-22T23:59:59Z"
}
arn_to_visualize = response['Arn']
config = {
    'label': 'Jeju island'
}
input_layer=map.visualize_eoj_input(Arn=arn_to_visualize, config=config , time_range_filter=time_range_filter)

# Visualize output, EOJ needs to be in completed status
time_range_filter={
    "start_date": "2022-11-01T00:00:00Z",
    "snd_date": "2022-11-22T23:59:59Z"
}

config = {
   'preset': 'singleBand',
   'band_name': 'mask'
}
output_layer = map.visualize_eoj_output(Arn=arn_to_visualize, config=config, time_range_filter=time_range_filter)

You can also execute the StartEarthObservationJob API using the AWS Command Line Interface (AWS CLI).

When you create an Earth Observation Job in notebooks, you can use additional geospatial functionalities. Here is a list of some of the other geospatial operations that are supported by Amazon SageMaker:

  • Band Stacking – Combine multiple spectral properties to create a single image.
  • Cloud Removal – Remove pixels containing parts of a cloud from satellite imagery.
  • Geomosaic – Combine multiple images for greater fidelity.
  • Resampling – Scale images to different resolutions.
  • Temporal Statistics – Calculate statistics through time for multiple GeoTIFFs in the same area.
  • Zonal Statistics – Calculate statistics on user-defined regions.

To learn more, see Amazon SageMaker geospatial notebook SDK and Amazon SageMaker geospatial capability Service APIs in the AWS documentation and geospatial sample codes in the GitHub repository.

Perform a Vector Enrichment Job and Map Visualization
A Vector Enrichment Job (VEJ) performs operations on your vector data, such as reverse geocoding or map matching.

  • Reverse Geocoding – Convert map coordinates to human-readable addresses powered by Amazon Location Service.
  • Map Matching – Match GPS coordinates to road segments.

While you need to use an Amazon SageMaker Studio notebook to execute a VEJ, you can view all the jobs you create.

With the StartVectorEnrichmentJob API, you can create a VEJ for either of the two supported job types.

{
  "Name": "vej-reverse",
  "InputConfig": {
    "DocumentType": "csv",
    "DataSourceConfig": {
      "S3Data": {
        "S3Uri": "s3://channy-geospatial/sample/vej.csv"
      }
    }
  },
  "JobConfig": {
    "MapMatchingConfig": {
      "YAttributeName": "string", // Latitude
      "XAttributeName": "string", // Longitude
      "TimestampAttributeName": "string",
      "IdAttributeName": "string"
    }
  },
  "ExecutionRoleArn": "string"
}
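As a complementary sketch, here is how a reverse geocoding VEJ might be started with the boto3 client from the earlier notebook snippet; the S3 URI, column names, and ReverseGeocodingConfig keys are assumptions based on the service API reference:

# Start a reverse geocoding job that turns Longitude/Latitude columns into addresses.
vej = sg_client.start_vector_enrichment_job(
    Name='vej-reverse',
    ExecutionRoleArn=execution_role,
    InputConfig={
        'DocumentType': 'CSV',
        'DataSourceConfig': {'S3Data': {'S3Uri': 's3://channy-geospatial/sample/vej.csv'}}
    },
    JobConfig={
        'ReverseGeocodingConfig': {
            'XAttributeName': 'Longitude',
            'YAttributeName': 'Latitude'
        }
    }
)
# Check the job status by ARN.
sg_client.get_vector_enrichment_job(Arn=vej['Arn'])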

You can visualize the output of a VEJ in the notebook or use the Map Visualization feature after you export the VEJ output to your S3 bucket. With the map visualization feature, you can easily show your geospatial data on the map.

This sample visualization includes Seattle City Council districts and public-school locations in GeoJSON format. Choose Add data to upload data files or select an S3 bucket.

{
  "type": "FeatureCollection",
  "crs": { "type": "name", "properties": { 
            "name":   "urn:ogc:def:crs:OGC:1.3:CRS84" } },
                                                                                
  "features": [
            { "type": "Feature", "id": 1, "properties": { "PROPERTY_L": "Jane Addams", "Status": "MS" }, "geometry": { "type": "Point", "coordinates": [ -122.293009024934037, 47.709944862769468 ] } },
            { "type": "Feature", "id": 2, "properties": { "PROPERTY_L": "Rainier View", "Status": "ELEM" }, "geometry": { "type": "Point", "coordinates": [ -122.263172064204767, 47.498863322205558 ] } },
            { "type": "Feature", "id": 3, "properties": { "PROPERTY_L": "Emerson", "Status": "ELEM" }, "geometry": { "type": "Point", "coordinates": [ -122.258636146463658, 47.514820466363943 ] } }
            ]
}

That’s all! For more information about each component, see the Amazon SageMaker geospatial Developer Guide.

Join the Preview
The preview release of Amazon SageMaker geospatial capabilities is now available in the US West (Oregon) Region.

We want to hear more feedback during the preview. Give it a try, and please send feedback to AWS re:Post for Amazon SageMaker or through your usual AWS support contacts.

Channy

Introducing Amazon Omics – A Purpose-Built Service to Store, Query, and Analyze Genomic and Biological Data at Scale

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/introducing-amazon-omics-a-purpose-built-service-to-store-query-and-analyze-genomic-and-biological-data-at-scale/

You might learn in high school biology class that the human genome is composed of over three billion letters of code using adenine (A), guanine (G), cytosine (C), and thymine (T) paired in the deoxyribonucleic acid (DNA). The human genome acts as the biological blueprint of every human cell. And that’s only the foundation for what makes us human.

Healthcare and life sciences organizations collect myriad types of biological data to improve patient care and drive scientific research. These organizations map an individual’s genetic predisposition to disease, identify new drug targets based on protein structure and function, profile tumors based on what genes are expressed in a specific cell, or investigate how gut bacteria can influence human health. Collectively, these studies are often known as “omics”.

AWS has helped healthcare and life sciences organizations accelerate the translation of this data into actionable insights for over a decade. Industry leaders such as Ancestry, AstraZeneca, Illumina, DNAnexus, Genomics England, and GRAIL leverage AWS to accelerate time to discovery while concurrently reducing costs and enhancing security.

The scale at which these customers, and others, operate continues to increase rapidly. When omics data across thousands or hundreds of thousands (or more!) of individuals are compared and analyzed, new insights for predicting disease and the efficacy of different drug treatments are possible.

However, this scale, which can be many petabytes of data, can add complexity. When I studied medical informatics in my Ph.D. course, I experienced this complexity in data access, processing, and tooling. You need a way to store omics data that is cost-efficient and easy to access. You need to scale compute across millions of biological samples while preserving accuracy and reliability. You also need specialized tools to analyze genetic patterns across populations and train machine learning (ML) models to predict diseases.

Today I’m excited to announce the general availability of Amazon Omics, a purpose-built service to help bioinformaticians, researchers, and scientists store, query, and analyze genomic, transcriptomic, and other omics data and then generate insights from that data to improve health and advance scientific discoveries.

With just a few clicks in the Omics console, you can import and normalize petabytes of data into formats optimized for analysis. Amazon Omics provides scalable workflows and integrated tools for preparing and analyzing omics data and automatically provisions and scales the underlying cloud infrastructure. So, you can focus on advancing science and translate discoveries into diagnostics and therapies.

Amazon Omics has three primary components:

  • Omics-optimized object storage that helps customers store and share their data efficiently and at low cost.
  • Managed compute for bioinformatics workflows that allows customers to run the exact analysis they specify, without worrying about provisioning underlying infrastructure.
  • Optimized data stores for population-scale variant analysis.

Now let’s learn more about each component of Amazon Omics. Generally, you follow these steps: create a data store and import data files, such as raw genome sequencing data; set up a basic bioinformatics workflow; and analyze the results using existing AWS analytics and ML services.

The Getting Started page in the Omics console contains tutorial examples using Amazon SageMaker notebooks with the Python SDK. I will demonstrate Amazon Omics features through an example using a human genome reference.

Omics Data Storage
The Omics data storage helps you store and share petabytes of omics data efficiently. You can create data stores and import sample data in the Omics console and also do the same job in the AWS Command Line Interface (AWS CLI).

Let’s make a reference store and import a reference genome. This example uses Genome Reference Consortium Human Reference 38 (hg38), which is open access and available from the following Amazon S3 bucket: s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta.

As prerequisites, you need to create an Amazon S3 bucket in your preferred Region and have the necessary IAM permissions to access S3 buckets. In the Omics console, you can easily create and select an IAM role during the Omics storage setup.

Use the following AWS CLI commands to create your reference store, copy the genome data to your S3 bucket, and import the data into your reference store.

// Create your reference store
$ aws omics create-reference-store --name "Reference Store"

// Import your reference data into your data store
$ aws s3 cp s3://broad-references/hg38/v0/Homo_sapiens_assembly38.fasta s3://channy-omics
$ aws omics start-reference-import-job --sources sourceFile=s3://channy-omics/Homo_sapiens_assembly38.fasta,name=hg38 --reference-store-id 123456789 --role-arn arn:aws:iam::01234567890:role/OmicsImportRole

You can see the result in your console too.

Now you can create a sequence store. A sequence store is similar to an S3 bucket. Each object in a sequence store is known as a “read set”. A read set is an abstraction of a set of genomics file types:

  • FASTQ – A text-based file format that stores information about a base (sequence letter) from a sequencer and the corresponding quality information.
  • BAM – The compressed binary version of raw reads and their mapping to a reference genome.
  • CRAM – Similar to BAM, but uses the reference genome information to aid in compression.

Amazon Omics allows you to specify domain-specific metadata for the read sets you import. This metadata is searchable and is defined when you start a read set import job.

As an example, we will use the 1000 Genomes Project, a highly detailed catalogue of more than 80 million human genetic variants for more than 400 billion data points from over 2,500 individuals. Let’s make a sequence store and then import genome sequence files into it.

// Create your sequence store 
$ aws omics create-sequence-store --name "MySequenceStore"

// Import your sequence data into your sequence store
$ aws s3 cp s3://1000genomes/phase3/data/HG00146/sequence_read/SRR233106_1.filt.fastq.gz s3://channy-omics
$ aws s3 cp s3://1000genomes/phase3/data/HG00146/sequence_read/SRR233106_2.filt.fastq.gz s3://channy-omics

$ aws omics start-read-set-import-job --cli-input-json '
{
    "sourceFiles":
    {
        "source1": "s3://channy-omics/SRR233106_1.filt.fastq.gz",
        "source2": "s3://channy-omics/SRR233106_2.filt.fastq.gz"

    },
    "sourceFileType": "FASTQ",
    "subjectId": "mySubject2",
    "sampleId": "mySample2",
    "referenceArn": "arn:aws:omics:us-east-1:123456789012:referenceStore/123467890",
    "name": "HG00100"
}'

You can see the result in your console again.
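After the import completes, you can also list read sets and their metadata programmatically. Here is a minimal boto3 sketch, where the sequence store ID is a placeholder:

import boto3

omics = boto3.client('omics', region_name='us-east-1')

# List the read sets in the sequence store and print basic metadata.
read_sets = omics.list_read_sets(sequenceStoreId='1234567890')
for read_set in read_sets['readSets']:
    print(read_set['id'], read_set['name'], read_set['status'])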

Analytics Transformations
You can store variant data, which refers to a mutation (a difference between what the sequencer read at a position and the known reference), and annotation data, which is known information about a location or variant in a genome, such as whether it may cause disease.

A variant store supports both variant call format files (VCF) where there is a called variant and gVCF inputs with records covering every position in a genome. An annotation store supports either a generic feature format (GFF3), tab-separated values (TSV), or VCF file. An annotation store can be mapped to the same coordinate system as variant stores during an import.
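Here is a minimal boto3 sketch of that flow; the store name, reference ARN, role ARN, and VCF location are placeholders:

import boto3

omics = boto3.client('omics', region_name='us-east-1')

# Create a variant store mapped to the reference genome imported earlier.
omics.create_variant_store(
    name='myvariantstore',
    reference={'referenceArn': 'arn:aws:omics:us-east-1:123456789012:referenceStore/1234567890/reference/1234567890'}
)

# Import a VCF file from Amazon S3 into the variant store.
omics.start_variant_import_job(
    destinationName='myvariantstore',
    roleArn='arn:aws:iam::123456789012:role/OmicsImportRole',
    items=[{'source': 's3://channy-omics/variants/sample.vcf.gz'}]
)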

Once you’ve imported your data, you can run queries like the following, which searches for Single Nucleotide Variants (SNVs), the most common type of genetic variation among people, on human chromosome 1.

SELECT
    sampleid,
    contigname,
    start,
    referenceallele,
    alternatealleles
FROM "myvariantstore"."myvariantstore"
WHERE
    contigname = 'chr1'
    and cardinality(alternatealleles) = 1
    and length(alternatealleles[1]) = 1
    and length(referenceallele) = 1
LIMIT 10

You can see the output of this query:

#	sampleid	contigname	start	referenceallele	alternatealleles
1	NA20858	chr1	10096	T	[A]
2	NA19347	chr1	10096	T	[A]
3	NA19735	chr1	10096	T	[A]
4	NA20827	chr1	10102	T	[A]
5	HG04132	chr1	10102	T	[A]
6	HG01961	chr1	10102	T	[A]
7	HG02314	chr1	10102	T	[A]
8	HG02837	chr1	10102	T	[A]
9	HG01111	chr1	10102	T	[A]
10	NA19205	chr1	10108	A	[T] 

You can view, manage, and query those data by integrating with existing analytics engines such as Amazon Athena. These query results can be used to train ML models in Amazon SageMaker.
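For example, assuming the variant store has been shared through AWS Lake Formation and you have query permissions, a minimal sketch with the boto3 Athena client runs the SNV query above and waits for the results; the database name and output location are placeholders:

import time
import boto3

athena = boto3.client('athena', region_name='us-east-1')

query = """
SELECT sampleid, contigname, start, referenceallele, alternatealleles
FROM "myvariantstore"."myvariantstore"
WHERE contigname = 'chr1'
  AND cardinality(alternatealleles) = 1
  AND length(alternatealleles[1]) = 1
  AND length(referenceallele) = 1
LIMIT 10
"""

# Start the query and poll until it finishes, then fetch the results.
execution = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={'Database': 'myvariantstore'},
    ResultConfiguration={'OutputLocation': 's3://channy-omics/athena-results/'}
)
query_id = execution['QueryExecutionId']
while athena.get_query_execution(QueryExecutionId=query_id)['QueryExecution']['Status']['State'] in ('QUEUED', 'RUNNING'):
    time.sleep(2)
results = athena.get_query_results(QueryExecutionId=query_id)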

Bioinformatics Workflows
Amazon Omics allows you to perform bioinformatics workflow analysis on AWS, such as variant calling or gene expression analysis. These compute workloads are defined using workflow languages like Workflow Description Language (WDL) and Nextflow, domain-specific languages that specify multiple compute tasks and their input and output dependencies.

You can define and execute a workflow using a few simple CLI commands. As an example, create a main.wdl file with the following WDL code to define a simple workflow with one task that creates a copy of a file.

version 1.0
workflow Test {
	input {
		File input_file
	}
	call FileCopy {
		input:
			input_file = input_file,
	}
	output {
		File output_file = FileCopy.output_file
	}
}
task FileCopy {
	input {
		File input_file
	}
	command {
		echo "copying ~{input_file}" >&2
		cat ~{input_file} > output
	}
	output {
		File output_file = "output"
	}
}

Then zip up your workflow and create your workflow with Amazon Omics using the AWS CLI:

$ zip my-wdl-workflow.zip main.wdl
$ aws omics create-workflow \
    --name MyWDLWorkflow \
    --description "My WDL Workflow" \
    --definition-zip file://my-wdl-workflow.zip \
    --parameter-template '{"input_file": "input test file to copy"}'

To run the workflow we just created, you can use the following command:

aws omics start-run \
  --workflow-id // id of the workflow we just created  \
  --role-arn // arn of the IAM role to run the workflow with  \
  --parameters '{"input_file": "s3://bucket/path/to/file"}' \
  --output-uri s3://bucket/path/to/results

Once the workflow completes, you could use these results in s3://bucket/path/to/results for downstream analyses in the Omics variant store.

You can execute a run, a single invocation of a workflow with a task and defined compute specifications. An individual run acts on your defined input data and produces an output. Runs can also have priorities associated with them, which allow specific runs to take execution precedence over other submitted and concurrent runs. For example, you can specify that a high-priority run will execute before one with a lower priority.

You can optionally use a run group, a group of runs for which you can set the maximum vCPU count and maximum run duration to help limit the compute resources used per run. This can help you partition users who may need to run different workflows on different data. It can also be used as a budget control or resource fairness mechanism by isolating users to specific run groups.
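Here is a hedged boto3 sketch of creating a run group and starting a prioritized run inside it; the workflow ID, role ARN, and limits are placeholders:

import boto3

omics = boto3.client('omics', region_name='us-east-1')

# Create a run group that caps the compute its runs can consume.
run_group = omics.create_run_group(
    name='test-run-group',
    maxCpus=64,
    maxDuration=600
)

# Start a run in the group; higher-priority runs take scheduling precedence.
omics.start_run(
    workflowId='1234567',   # placeholder: the ID returned by create-workflow
    roleArn='arn:aws:iam::123456789012:role/OmicsWorkflowRole',
    runGroupId=run_group['id'],
    priority=100,
    parameters={'input_file': 's3://bucket/path/to/file'},
    outputUri='s3://bucket/path/to/results'
)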

As you saw, Amazon Omics is a managed service that, with a couple of clicks, simple commands, and APIs, helps you analyze large-scale omics data, such as human genome samples, so you can derive meaningful insights from this data in hours rather than weeks. We also provide tutorial notebooks that you can use in Amazon SageMaker to help you get started.

In terms of data security, Amazon Omics helps ensure that your data remains secure and patient privacy is protected with customer-managed encryption keys and HIPAA eligibility.

Customer and Partner Voices
Customers and partners in the healthcare and life science industry have shared how they are using Amazon Omics to accelerate scientific insights.

Children’s Hospital of Philadelphia (CHOP) is the oldest hospital in the United States dedicated exclusively to pediatrics and strives to advance healthcare for children with the integration of excellent patient care and innovative research. AWS has worked with the CHOP Research Institute for many years as they’ve led the way in utilizing data and technology to solve challenging problems in child health.

“At Children’s Hospital of Philadelphia, we know that getting a comprehensive view of our patients is crucial to delivering the best possible care, based on the most innovative research. Combining multiple clinical modalities is foundational to achieving this. With Amazon Omics, we can expand our understanding of our patients’ health, all the way down to their DNA.” – Jeff Pennington, Associate Vice President & Chief Research Informatics Officer, Children’s Hospital of Philadelphia

G42 Healthcare enables AI-powered healthcare that uses data and emerging technologies to personalize preventative care.

“Amazon Omics allows G42 to accelerate a competitive and deployable end-to-end service with globally leading data governance. We’re able to leverage the extensive omics data management and bioinformatics solutions hosted globally on AWS, at our customers’ fingertips. Our collaboration with AWS is much more than data – it’s about value.” – Ashish Koshi, CEO, G42 Healthcare

C2i Genomics brings together researchers, physicians and patients to utilize ultra-sensitive whole-genome cancer detection to personalize medicine, reduce cancer treatment costs, and accelerate drug development.

“In C2i Genomics, we empower our data scientists by providing them cloud-based computational solutions to run high-scale, customizable genomic pipelines, allowing them to focus on method development and clinical performance, while the company’s engineering teams are responsible for the operations, security and privacy aspects of the workloads. Amazon Omics allows researchers to use tools and languages from their own domain, and considerably reduces the engineering maintenance effort while taking care of cost and resource allocation considerations, which in turn reduce time-to-market and NRE costs of new features and algorithmic improvements.” – Ury Alon, VP Engineering, C2i Genomics

We are excited to work hand in hand with our AWS partners to build scalable, multi-modal solutions that enable the conversion of raw sequencing data into insights.

Lifebit builds enterprise data platforms for organizations with complex and sensitive biomedical datasets, empowering customers across the life sciences sector to transform how they use sensitive biomedical data.

“At Lifebit, we’re on a mission to connect the world’s biomedical data to obtain novel therapeutic insights. Our customers work with vast cohorts of linked genomic, multi-omics and clinical data – and these data volumes are expanding rapidly. With Amazon Omics they will have access to optimised analytics and storage for this large-scale data, allowing us to provide even more scalable bioinformatics solutions. Our customers will benefit from significantly lower cost per gigabase of data, essentially achieving hot storage performance at cold storage prices, removing cost as a barrier to generating insights from their population-scale biomedical data.” – Thorben Seeger, Chief Business Development Officer, Lifebit

To hear more customers and partner voices, see Amazon Omics Customers page.

Now Available
Amazon Omics is now available in the US East (N. Virginia), US West (Oregon), Europe (Ireland), Europe (London), Europe (Frankfurt), and Asia Pacific (Singapore) Regions.

To learn more, see the Amazon Omics page, Amazon Omics User Guide, Genomics on AWS, and Healthcare & Life Sciences on AWS. Give it a try, and please contact the AWS genomics team or send feedback through your usual AWS support contacts.

Channy

New – Amazon EC2 Hpc6id Instances Optimized for High Performance Computing

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ec2-hpc6id-instances-optimized-for-high-performance-computing/

We have given you the flexibility and ability to run the largest and most complex high performance computing (HPC) workloads with Amazon Elastic Compute Cloud (Amazon EC2) instances that feature enhanced networking, such as C5n, C6gn, R5n, and M5n, and our recently launched HPC instances, Hpc6a.

We heard feedback from customers asking us to deliver more options to support their most intensive workloads with higher per-vCPU compute performance as well as larger memory and local disk storage to reduce job completion time for data-intensive workloads like Finite Element Analysis (FEA) and seismic processing.

Announcing Amazon EC2 Hpc6id Instance for HPC Workloads
Today, we announce the general availability of Amazon EC2 Hpc6id instances, a new instance type that is purpose-built for tightly coupled HPC workloads. Amazon EC2 Hpc6id instances are powered by 3rd Gen Intel Xeon Scalable processors (Ice Lake) that run at frequencies up to 3.5 GHz and offer 1,024 GiB of memory, 15.2 TB of local SSD storage, and 200 Gbps of Elastic Fabric Adapter (EFA) network bandwidth, which is 4x higher than R6i instances.

Amazon EC2 Hpc6id instances have the best per-vCPU HPC performance when compared to similar x86-based EC2 instances for data-intensive HPC workloads.

Here are the detailed specs:

Instance Name      CPUs    RAM         EFA Network Bandwidth    Attached Storage
hpc6id.32xlarge    64      1024 GiB    Up to 200 Gbps           15.2 TB local SSD disk

Amazon EC2 Hpc6id Instances Use Cases
Customers running license-bound scenarios can lower infrastructure and HPC software licensing costs with Hpc6id. Other customers with HPC codes that are optimized for Intel-specific features, such as Math Kernel Library or AVX-512, can migrate their largest HPC workloads to Hpc6id and scale up their workloads on AWS by taking advantage of 200 Gbps EFA bandwidth.

Other customers using HPC software codes that are optimized for per-CPU performance are also able to consolidate their workloads on fewer nodes and complete jobs faster with Hpc6id. Faster job completion time helps customers to reduce both infrastructure and software licensing costs. Customers can use Hpc6id instances to quickly carry out complex calculations across a range of cluster sizes—up to tens of thousands of cores.

Customers also can use Hpc6id instances with AWS ParallelCluster to provision Hpc6id instances alongside other instance types, giving customers the flexibility to run different workload types within the same HPC cluster. Hpc6id instances benefit from the AWS Nitro System, a rich collection of building blocks that offloads many of the traditional virtualization functions to dedicated hardware and software to deliver high performance, high availability, and high security while also reducing virtualization overhead.

Now Available
Amazon EC2 Hpc6id instances are available for purchase as On-Demand or Reserved Instances or with Savings Plans. Hpc6id instances are available in the US East (Ohio) and AWS GovCloud (US-West) Regions. To optimize Amazon EC2 Hpc6id instance networking for tightly coupled workloads, use cluster placement groups within a single Availability Zone.
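As a hedged sketch, here is how you might create a cluster placement group and launch Hpc6id instances into it with boto3; the AMI, key pair, subnet, and instance counts are placeholders:

import boto3

ec2 = boto3.client('ec2', region_name='us-east-2')

# Cluster placement groups pack instances closely within one Availability Zone.
ec2.create_placement_group(GroupName='hpc6id-cluster', Strategy='cluster')

# Launch two hpc6id.32xlarge instances into the placement group.
ec2.run_instances(
    ImageId='ami-0123456789abcdef0',       # placeholder AMI
    InstanceType='hpc6id.32xlarge',
    MinCount=2,
    MaxCount=2,
    KeyName='my-key-pair',                 # placeholder key pair
    SubnetId='subnet-0123456789abcdef0',   # placeholder subnet
    Placement={'GroupName': 'hpc6id-cluster'}
)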

To learn more, visit our Hpc6 instance page and get in touch with our HPC team, AWS re:Post for EC2, or your usual AWS Support contacts.

Channy

Preview: Amazon Security Lake – A Purpose-Built Customer-Owned Data Lake Service

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/preview-amazon-security-lake-a-purpose-built-customer-owned-data-lake-service/

To identify potential security threats and vulnerabilities, customers should enable logging across their various resources and centralize these logs for easy access and use within analytics tools. Some of these data sources include logs from on-premises infrastructure, firewalls, and endpoint security solutions, and when utilizing the cloud, services such as Amazon Route 53, AWS CloudTrail, and Amazon Virtual Private Cloud (Amazon VPC).

The Amazon Simple Storage Service (Amazon S3) and AWS Lake Formation simplify the creation and management of a data lake on AWS. But, some customers’ security teams still struggle to define and implement security domain–specific aspects, such as data normalization, which requires them to analyze each log source’s structure and fields, define schemas and mappings, and pull in data enrichment such as threat intelligence.

Today we are announcing the preview release of Amazon Security Lake, a purpose-built service that automatically centralizes an organization’s security data from cloud and on-premises sources into a purpose-built data lake stored in your account. Amazon Security Lake automates the central management of security data, normalizing data from integrated AWS and third-party services, managing the lifecycle of data with customizable retention, and automating storage tiering.

Here are the key features of Amazon Security Lake:

  • Variety of supported log and event sources – During the preview, Amazon Security Lake automatically collects logs for AWS CloudTrail, Amazon VPC, Amazon Route 53, Amazon S3, and AWS Lambda, as well as security findings via AWS Security Hub for AWS Config, AWS Firewall Manager, Amazon GuardDuty, AWS Health Dashboard, AWS IAM Access Analyzer, Amazon Inspector, Amazon Macie, and AWS Systems Manager Patch Manager. Additionally, over 50 sources of third-party security findings can be sent to Amazon Security Lake. Security partners such as Cisco Security, CrowdStrike, Palo Alto Networks, and more also send data directly to Amazon Security Lake in a standard schema called the Open Cybersecurity Schema Framework (OCSF).
  • Data transformation and normalization – Security Lake automatically partitions and converts incoming log data to a storage and query-efficient Apache Parquet and OCSF format, making the data broadly and immediately usable for security analytics without the need for post-processing. Security Lake supports integrations with analytics partners such as IBM, Splunk, Sumo Logic, and more to address a variety of security use cases such as threat detection, investigation, and incident response.
  • Customizable data access levels – You can configure the level of access for subscribers consuming data stored in Security Lake, such as notifying subscribers of all new objects for specific data sources or letting them query the stored data directly. You can also specify a rollup Region where Security Lake data is available and multiple AWS accounts across your AWS Organizations. This can help you comply with data residency compliance requirements.

By reducing the operational overhead of security data management, you can make it easier to gather more security signals from across your organization and analyze that data to improve the protection of your data, applications, and workloads.

Configure Your Security Lake for Data Collection
To get started with Amazon Security Lake, choose Get started in the AWS console. You can enable log and event sources for all Regions and all accounts.

You can select log and event sources, such as CloudTrail logs, VPC flow logs, and Route 53 resolver logs, to send to your data lake, as well as the specific AWS accounts in your organization. The selected Regions contribute their data to your data lake using Amazon S3-managed encryption, where Amazon S3 creates and manages all encryption keys.

Next, you can select rollup and contributing Regions. All aggregated data from contributing Regions reside in the rollup Region. You can create multiple rollup Regions, which can help you comply with data residency compliance requirements. Optionally, you can define the Amazon S3 storage classes and the retention period you want the data to transition from the standard Amazon S3 storage classes used in Security Lake.

After the initial configuration, choose Sources in the left pane of the console, where you can add or remove log sources in your Regions or accounts.

You can also collect data from custom sources, such as Bind DNS logs, endpoint telemetry logs, on-premises NetFlow logs, and so on. Before adding a custom source, you need to create an AWS IAM role to grant permissions for AWS Glue.

To create a custom data source, choose Create custom source in the left menu of Custom sources.

It requires you to enter the IAM role Amazon Resource Name (ARN) used to write data to Security Lake and invoke AWS Glue on your behalf. Then, you can provide details about your custom source.

For efficient data processing and querying, objects from your custom sources should be partitioned by AWS Region, AWS account, year, month, day, and hour with a Parquet-formatted object.
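As an illustration of that layout (not the OCSF schema itself), the following sketch writes custom-source records as Parquet partitioned by Region, account, and time, assuming pandas, PyArrow, and s3fs are installed and the bucket name is a placeholder:

import pandas as pd

# Illustrative records only; real custom sources should follow the OCSF schema.
df = pd.DataFrame([
    {'region': 'us-east-1', 'accountId': '123456789012',
     'eventDay': '20221129', 'eventHour': '10', 'message': 'example DNS event'},
])

# Partitioning by Region, account, day, and hour keeps objects query-efficient.
df.to_parquet(
    's3://my-custom-source-bucket/ext/bind-dns/',
    partition_cols=['region', 'accountId', 'eventDay', 'eventHour']
)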

Consume Your Data from Security Lake
Now you can create a subscriber, a service that consumes logs and events from Security Lake. To add or see your subscribers, choose Subscribers in the left pane of the console.

The Security Lake supports two types of subscriber data access methods:

  • Data access (Amazon S3) – Subscribers are notified of new objects for a source as the data is written to your Security Lake S3 bucket. You can choose to notify subscribers of new objects with an Amazon Simple Queue Service (Amazon SQS) queue or through messaging to an HTTPS endpoint provided by the subscriber. This type is useful to ingest selected data in your analytics application—good for use cases that require frequent access to data.
  • Query access (Lake Formation) – Subscribers can consume data by directly querying AWS Lake Formation tables in your S3 bucket through services like Amazon Athena. This type is useful to provide on-demand query access to data without the need to pre-ingest anything, and for use cases that require infrequent access or involve large-volume sources that are too expensive to ingest upfront or retain in analytics tools.

When you add a subscriber, you can choose Amazon S3 to create data access for the subscriber. If you select the default method of notification, you can receive the following object notification message in either an HTTPS endpoint or Amazon SQS.

{
  "source": "aws.s3",
  "time": "2021-11-12T00:00:00Z",
  "region": "ca-central-1",
  "resources": [
    "arn:aws:s3:::example-bucket"
  ],
  "detail": {
    "bucket": {
      "name": "example-bucket"
    },
    "object": {
      "key": "example-key",
      "size": 5,
      "etag": "b57f9512698f4b09e608f4f2a65852e5"
    },
    "request-id": "N4N7GDK58NMKJ12R",
    "requester": "123456789012"
  }
}
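For the Amazon SQS option, a subscriber might poll the queue and fetch each new object along these lines; the queue URL is a placeholder, and the message shape is assumed to follow the notification above:

import json
import boto3

sqs = boto3.client('sqs', region_name='ca-central-1')
s3 = boto3.client('s3', region_name='ca-central-1')

queue_url = 'https://sqs.ca-central-1.amazonaws.com/123456789012/security-lake-subscriber'  # placeholder

# Poll for new-object notifications and download each object for ingestion.
response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20)
for message in response.get('Messages', []):
    detail = json.loads(message['Body'])['detail']
    obj = s3.get_object(Bucket=detail['bucket']['name'], Key=detail['object']['key'])
    # ... hand obj['Body'] to your analytics pipeline here ...
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=message['ReceiptHandle'])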

Subscribers with query access can directly query data that is stored in Security Lake by using services like Amazon Athena and other services that can read from AWS Lake Formation. The following are sample queries of CloudTrail data.

SELECT 
      time, 
      api.service.name, 
      api.operation, 
      api.response.error, 
      api.response.message, 
      src_endpoint.ip 
    FROM ${athena_db}.${athena_table}
    WHERE eventHour BETWEEN '${query_start_time}' and '${query_end_time}' 
      AND api.response.error in (
        'Client.UnauthorizedOperation',
        'Client.InvalidPermission.NotFound',
        'Client.OperationNotPermitted',
        'AccessDenied')
    ORDER BY time desc
    LIMIT 25

Subscribers only have access to source data in the AWS Region that you’ve selected when you create the subscriber. To give a subscriber access to data from multiple Regions, you can set the Region where you create your subscriber as a rollup Region.

Third-Party Integrations
For supported third-party integrations, there are a number of sources as well as subscribing services integrated with Amazon Security Lake.

Amazon Security Lake supports third-party sources providing OCSF security data, including Barracuda Networks, Cisco, Cribl, CrowdStrike, CyberArk, Lacework, Laminar, Netscout, Netskope, Okta, Orca, Palo Alto Networks, Ping Identity, SecurityScorecard, Tanium, The Falco Project, Trend Micro, Vectra AI, VMware, Wiz, and Zscaler.

You can also use third-party security, automation, and analytics tools supporting Security Lake, including Datadog, IBM, Rapid7, Securonix, SentinelOne, Splunk, Sumo Logic, and Trellix. There are also service partners such as Accenture, Atos, Deloitte, DXC, Kyndryl, PWC, Rackspace, and Wipro that can work with you and Amazon Security Lake.

Join the Preview
The preview release of Amazon Security Lake is now available in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), and Europe (Ireland) Regions.

To learn more, see the Amazon Security Lake page and Amazon Security Lake User Guide. We want to hear more feedback during the preview. Please send feedback in AWS re:Post and through your usual AWS support contacts.

Channy

New – Amazon Redshift Integration with Apache Spark

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-redshift-integration-with-apache-spark/

Apache Spark is an open-source, distributed processing system commonly used for big data workloads. Spark application developers working in Amazon EMR, Amazon SageMaker, and AWS Glue often use third-party Apache Spark connectors that allow them to read and write the data with Amazon Redshift. These third-party connectors are not regularly maintained, supported, or tested with various versions of Spark for production.

Today we are announcing the general availability of Amazon Redshift integration for Apache Spark, which makes it easy to build and run Spark applications on Amazon Redshift and Redshift Serverless, enabling customers to open up the data warehouse for a broader set of AWS analytics and machine learning (ML) solutions.

With Amazon Redshift integration for Apache Spark, you can get started in seconds and effortlessly build Apache Spark applications in a variety of languages, such as Java, Scala, and Python.

Your applications can read from and write to your Amazon Redshift data warehouse without compromising on application performance or transactional consistency of the data, and they benefit from performance improvements with pushdown optimizations.

Amazon Redshift integration for Apache Spark builds on an existing open source connector project and enhances it for performance and security, helping customers gain up to 10x faster application performance. We thank the original contributors on the project who collaborated with us to make this happen. As we make further enhancements we will continue to contribute back into the open source project.

Getting Started with Spark Connector for Amazon Redshift
To get started, you can go to AWS analytics and ML services, use data frame or Spark SQL code in a Spark job or Notebook to connect to the Amazon Redshift data warehouse, and start running queries in seconds.

In this launch, Amazon EMR 6.9, EMR Serverless, and AWS Glue 4.0 come with the pre-packaged connector and JDBC driver, and you can just start writing code. EMR 6.9 provides a sample notebook, and EMR Serverless provides a sample Spark Job too.

First, you should set up AWS Identity and Access Management (AWS IAM) authentication between Redshift and Spark, between Amazon Simple Storage Service (Amazon S3) and Spark, and between Redshift and Amazon S3. The following diagram describes the authentication between Amazon S3, Redshift, the Spark driver, and Spark executors.

For more information, see Identity and access management in Amazon Redshift in the AWS documentation.

Amazon EMR
If you already have an Amazon Redshift data warehouse and the data available, you can create the database user and provide the right level of grants to the database user. To use this with Amazon EMR, you need to upgrade to the latest version of the Amazon EMR 6.9 that has the packaged spark-redshift connector. Select the emr-6.9.0 release when you create an EMR cluster on Amazon EC2.

You can use EMR Serverless to create your Spark application using the emr-6.9.0 release to run your workload.

EMR Studio also provides an example Jupyter Notebook configured to connect to an Amazon Redshift Serverless endpoint leveraging sample data that you can use to get started quickly.

Here is a Scala example to build your applications with both Spark DataFrames and Spark SQL. Use IAM-based credentials for connecting to Redshift and an IAM role for unloading and loading data from S3.

// Create the JDBC connection URL and define the Redshift context
val jdbcURL = "jdbc:redshift:iam://<RedshiftEndpoint>:<Port>/<Database>?DbUser=<RsUser>"
val rsOptions = Map(
  "url" -> jdbcURL,
  "tempdir" -> tempS3Dir,
  "aws_iam_role" -> roleARN
)
// Reference the sales table from Redshift 
val sales_df = spark
  .read 
  .format("io.github.spark_redshift_community.spark.redshift") 
  .options(rsOptions) 
  .option("dbtable", "sales") 
  .load() 
sales_df.createOrReplaceTempView("sales") 
// Reference the date table from Redshift using Data Frame
val date_df = spark
  .read
  .format("io.github.spark_redshift_community.spark.redshift")
  .options(rsOptions)
  .option("dbtable", "date")
  .load()
// Join sales with date and total the quantity sold on 2008-01-05
sales_df.join(date_df, sales_df("dateid") === date_df("dateid"))
  .where(col("caldate") === "2008-01-05")
  .groupBy().sum("qtysold")
  .select(col("sum(qtysold)"))
  .show()

If Amazon Redshift and Amazon EMR are in different VPCs, you have to configure VPC peering or enable cross-VPC access. Assuming both Amazon Redshift and Amazon EMR are in the same virtual private cloud (VPC), you can create a Spark job or Notebook and connect to the Amazon Redshift data warehouse and write Spark code to use the Amazon Redshift connector.

To learn more, see Use Spark on Amazon Redshift with a connector in the AWS documentation.

AWS Glue
When you use AWS Glue 4.0, the spark-redshift connector is available both as a source and target. In Glue Studio, you can use a visual ETL job to read or write to a Redshift data warehouse simply by selecting a Redshift connection to use within a built-in Redshift source or target node.

The Redshift connection contains Redshift connection details along with the credentials needed to access Redshift with the proper permissions.
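
If you want to script this setup, you can register a JDBC-style connection for Redshift with the AWS CLI, roughly as sketched below. The connection name, endpoint, and credentials are placeholders, and in production you would typically reference an AWS Secrets Manager secret rather than a plaintext password.

$ aws glue create-connection \
    --connection-input '{
        "Name": "redshift-demo-connection",
        "ConnectionType": "JDBC",
        "ConnectionProperties": {
            "JDBC_CONNECTION_URL": "jdbc:redshift://<RedshiftEndpoint>:5439/dev",
            "USERNAME": "awsuser",
            "PASSWORD": "<Password>"
        }
    }'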

To get started, choose Jobs in the left menu of the Glue Studio console. Using either of the Visual modes, you can easily add and edit a source or target node and define a range of transformations on the data without writing any code.

Choose Create, and you can easily add and edit source, transform, and target nodes in the job diagram. For this integration, choose Amazon Redshift as both the Source and the Target.

Once completed, the Glue job can be executed on the Glue Apache Spark engine, which automatically uses the latest spark-redshift connector.
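
You can also trigger and monitor the job from the AWS CLI; the job name below is a placeholder for whatever you named your Glue job, and the run ID comes back from the first command.

$ aws glue start-job-run --job-name "redshift-spark-etl-job"
$ aws glue get-job-run --job-name "redshift-spark-etl-job" --run-id "<JobRunId>"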

The following Python script shows an example job that reads from and writes to Redshift with a DynamicFrame using the spark-redshift connector.

import sys
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

## @params: [JOB_NAME]
args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

print("================ DynamicFrame Read ===============")
url = "jdbc:redshift://<RedshiftEndpoint>:<Port>/dev"
read_options = {
    "url": url,
    "dbtable": dbtable,
    "redshiftTmpDir": redshiftTmpDir,
    "tempdir": redshiftTmpDir,
    "aws_iam_role": aws_iam_role,
    "autopushdown": "true",
    "include_column_list": "false"
}

redshift_read = glueContext.create_dynamic_frame.from_options(
    connection_type="redshift",
    connection_options=read_options
) 

print("================ DynamicFrame Write ===============")

write_options = {
    "url": url,
    "dbtable": dbtable,
    "user": "awsuser",        # replace with your Redshift database user
    "password": "Password1",  # replace with your own credentials, or store them in AWS Secrets Manager
    "redshiftTmpDir": redshiftTmpDir,
    "tempdir": redshiftTmpDir,
    "aws_iam_role": aws_iam_role,
    "autopushdown": "true",
    "DbUser": "awsuser"
}

print("================ dyf write result: check redshift table ===============")
redshift_write = glueContext.write_dynamic_frame.from_options(
    frame=redshift_read,
    connection_type="redshift",
    connection_options=write_options
)

When you set up your job details, you must choose the Glue 4.0 – Supports Spark 3.3, Python 3 version for this integration.

To learn more, see Creating ETL jobs with AWS Glue Studio and Using connectors and connections with AWS Glue Studio in the AWS documentation.

Gaining the Best Performance
In the Amazon Redshift integration for Apache Spark, the Spark connector automatically applies predicate and query pushdown to optimize performance. You can also gain a performance improvement by using the default Parquet format, which the connector uses when unloading data with this integration.

As the following sample code shows, the Spark connector turns supported functions into a SQL query and runs that query in Amazon Redshift.

import sqlContext.implicits._

val sample = sqlContext.read
  .format("io.github.spark_redshift_community.spark.redshift")
  .option("url", jdbcURL)
  .option("tempdir", tempS3Dir)
  .option("unload_s3_format", "PARQUET")
  .option("dbtable", "event")
  .load()

// Create temporary views for data frames created earlier so they can be accessed via Spark SQL
sales_df.createOrReplaceTempView("sales")
date_df.createOrReplaceTempView("date")
// Show the total sales on a given date using the Spark SQL API
spark.sql(
  """SELECT sum(qtysold)
    | FROM sales, date
    | WHERE sales.dateid = date.dateid
    | AND caldate = '2008-01-05'""".stripMargin).show()

Amazon Redshift integration for Apache Spark adds pushdown capabilities for operations such as sort, aggregate, limit, join, and scalar functions so that only the relevant data is moved from the Redshift data warehouse to the consuming Spark application, thereby improving performance.

Available Now
The Amazon Redshift integration for Apache Spark is now available in all Regions that support Amazon EMR 6.9, AWS Glue 4.0, and Amazon Redshift. You can start using the feature directly from EMR 6.9 and AWS Glue 4.0 with the new Spark 3.3.0 version.

Give it a try, and please send us feedback either in the AWS re:Post for Amazon Redshift or through your usual AWS support contacts.

Channy

Preview: Amazon OpenSearch Serverless – Run Search and Analytics Workloads without Managing Clusters

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/preview-amazon-opensearch-serverless-run-search-and-analytics-workloads-without-managing-clusters/

Most AWS analytics services have compelling serverless offerings that make it even easier for customers to analyze vast amounts of data without having to configure, scale, or manage the underlying infrastructure.

Along with other serverless analytics, such as Amazon QuickSight for business intelligence and AWS Glue for data integration, we have introduced Amazon EMR Serverless, Amazon MSK Serverless, and Amazon Redshift Serverless this year.

Today, we announce the preview release of a new serverless option for Amazon OpenSearch Service that makes it easy for customers to run large-scale search and analytics workloads without managing clusters. It automatically provisions and scales the underlying resources to deliver fast data ingestion and query responses for even the most demanding and unpredictable workloads, eliminating the need to configure and optimize clusters.

With Amazon OpenSearch Serverless, you do not need to account for factors that are hard to know in advance, such as the frequency and complexity of queries or the volume of data expected to be analyzed. Instead of managing infrastructure, you can focus on using OpenSearch for exploring and deriving insights from your data. You can also get started using familiar APIs to load and query data and use OpenSearch Dashboards for interactive data analysis and visualization.

Configure Your OpenSearch Serverless Collection
To get started with Amazon OpenSearch Serverless, you create a Collection via the AWS Management Console, AWS Command Line Interface (AWS CLI), or AWS API.

Before the launch of OpenSearch Serverless, you created a managed cluster, specifying instance types, counts, and storage options, and then managed the lifecycle and shard strategy for indices within that cluster. With OpenSearch Serverless, you create a Collection, which manages a group of indices that work together to support a specific workload. You no longer need to specify the hardware or manage the indices directly.

To create an OpenSearch Serverless collection and secure data, set up Encryption policies to assign AWS KMS keys to one or more collections and attach Network policies to collections to control the access from specified VPCs and public IP addresses.

To create an encryption policy, choose Encryption policies in the left navigation pane and Create encryption policy. Encryption at rest secures the indices within your collection. For each collection, AWS KMS generates a unique, symmetric encryption key. Encryption policies are the optimal way to manage AWS KMS keys across multiple collections. You can define the target collection name or a prefix that automatically applies the encryption settings from this policy to the collection.
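
As a CLI alternative, you can create an encryption policy with a command like the following sketch. The policy name and the books* collection prefix are illustrative, and this example uses an AWS owned key; if you want to use your own AWS KMS key instead, supply its ARN in the policy as described in the OpenSearch Serverless documentation.

$ aws opensearchserverless create-security-policy \
    --name "books-encryption-policy" \
    --type encryption \
    --policy '{"Rules":[{"ResourceType":"collection","Resource":["collection/books*"]}],"AWSOwnedKey":true}'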

In order for users to access a collection, choose Network policies in the left navigation pane and Create network policy. Network policies determine whether your collection is accessible over the internet from public networks or whether it must be accessed through OpenSearch Serverless–managed VPC endpoints.

You can define multiple rules for each collection, choosing either Public or VPC (the recommended option) as the Access type. If you select the Public option, you can access the collection from OpenSearch Dashboards.

Also, you can configure access for OpenSearch Dashboards and the OpenSearch endpoint. For the Resource type, enable both Access to OpenSearch endpoints and Access to OpenSearch Dashboards. In both input boxes, select the Collection Name property and your collection name or prefix.
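
A CLI sketch of a public network policy that covers both the OpenSearch endpoint and OpenSearch Dashboards might look like this; the policy name and the books* prefix are again placeholders for your own values.

$ aws opensearchserverless create-security-policy \
    --name "books-network-policy" \
    --type network \
    --policy '[{"Rules":[{"ResourceType":"collection","Resource":["collection/books*"]},{"ResourceType":"dashboard","Resource":["collection/books*"]}],"AllowFromPublic":true}]'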

Finally, to create an OpenSearch Serverless collection, choose Create collection in the home page or choose Collections in the left navigation pane and choose Create collection.

Input your collection name, description, and collection type, either Time series or Search, depending on your data type.

  • Time series – The log analytics segment that focuses on analyzing large volumes of semistructured, machine-generated data in real time for operational, security, user behavior, and business insights.
  • Search – Full-text search that powers applications in your internal networks (content management systems, legal documents) and internet-facing applications such as e-commerce website search and content search.

When you choose Create, a collection typically takes less than a minute to initialize.
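
The equivalent AWS CLI call is a single command; the collection name, type, and description below are placeholders for your own values.

$ aws opensearchserverless create-collection \
    --name "books" \
    --type SEARCH \
    --description "Collection for the books search example"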

Upload and Search Data in Your Collection
Before uploading and searching data in your collection, configure the IAM policy to access the actual data within a collection. Choose Data access policies in the left navigation pane and Create data access policy.

You can apply multiple policies simultaneously to the same resource. Each policy contains a set of rules. Each rule has a resource (collection or index), permissions for the resource, and a list of principals (IAM users, role ARNs, or SAML identities).

Here is a sample policy that provides a single user the minimum permissions required to create an index in your collection, index some data, and search for it. Replace the principal ARN with the ARN of the account that you’ll use to sign in to OpenSearch Dashboards.

[
  {
    "Rules": [
      {
        "ResourceType": "index",
        "Resource": [
          "index/books/*"
        ],
        "Permission": [
          "aoss:CreateIndex",
          "aoss:ReadDocument",
          "aoss:UpdateIndex",
          "aoss:DeleteIndex",
          "aoss:WriteDocument"
        ]
      }
    ],
    "Principal": [
      "arn:aws:iam::123456789012:user/admin"
    ]
  }
]
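
If you save the policy document above as a local file, you can attach it from the AWS CLI with a command along these lines; the policy name is illustrative.

$ aws opensearchserverless create-access-policy \
    --name "books-data-access-policy" \
    --type data \
    --policy file://data-access-policy.json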

Now, you can upload data to an OpenSearch Serverless collection using Postman or curl. You can also use Dev Tools within the OpenSearch Dashboards console. Choose OpenSearch Dashboards on the detail page of your collection.

Sign in to OpenSearch Dashboards using the AWS access and secret keys for the principal that you specified in your data access policy. Within OpenSearch Dashboards, open the left navigation menu and choose Dev Tools.

To create a single index called books-index, run PUT books-index, and index your first single document into books-index.

You can also query search data in Dev Tools.

GET books-index/_search
{
  "query": {
    "simple_query_string": {
      "query": "Jeff",
      "fields": ["author"]
    }
  }
}

In the case of time-series data, you can ingest data with all of the streaming ingestion options, such as native OpenSearch streaming APIs, Amazon Kinesis Data Firehose, AWS Glue, and a wide range of open-source streaming ingestion pipelines like Logstash, FluentBit, Fluentd, and Data Prepper.

In addition, you can snapshot your data from a managed cluster on OpenSearch Service and restore it to your collection, making it easy to migrate your workloads. Once your data is in your collection, you can then query it using your favorite OpenSearch client and interactively analyze and visualize your data using OpenSearch Dashboards.

Things to Know
Here are a couple of things to keep in mind about additional features and considerations when you choose Amazon OpenSearch Serverless:

  • SAML Authentications – You can use your existing identity provider to offer single sign-on (SSO) for the OpenSearch Dashboards endpoints of OpenSearch Serverless. SAML authentication lets you use third-party identity providers to sign in to OpenSearch Dashboards to index and search data. OpenSearch Serverless supports providers that use the SAML 2.0 standard, such as Okta, Keycloak, Active Directory Federation Services, and Auth0.
  • Private VPC Endpoints – You can use AWS PrivateLink to create a private connection between your VPC and OpenSearch Serverless. You can access your collections as if they were in your VPC without the use of an internet gateway, NAT device, VPN connection, or AWS Direct Connect connection. To create an interface endpoint, choose VPC endpoints in the left navigation pane of OpenSearch Service.
  • Managed Clusters – You may prefer to use Amazon OpenSearch Service managed clusters in scenarios where you need tight control over cluster configuration or specific customizations. For example, your workloads may need custom plugins that run best on accelerated computing instances and need more control over configuration, such as data sharding strategy. You can choose either provisioned instances or serverless according to the requirements of your workload.

Join the Preview
The preview release of Amazon OpenSearch Serverless is now available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), and Asia Pacific (Tokyo) Regions. With OpenSearch Serverless, there are no upfront costs, and you pay only for the data that is ingested and the queries you run. For pricing details, see the OpenSearch Service pricing page. To learn more, visit the Amazon OpenSearch Service User Guide.

We want to hear more feedback during the preview. Please send feedback to AWS re:Post for Amazon OpenSearch Service or through your usual AWS support contacts.

Channy

New – AWS Marketplace for Containers Now Supports Direct Deployment to Amazon EKS Clusters

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-aws-marketplace-for-containers-now-supports-direct-deployment-to-amazon-eks-clusters/

Today we are announcing the extension of the Amazon Elastic Kubernetes Service (EKS) add-ons deployment experience to include software from AWS Marketplace for Containers. Amazon EKS add-ons allow you to consistently ensure that your EKS clusters are secure and stable and reduce the amount of work that you need to do in order to install, configure, and update Kubernetes software.

This new launch makes it easier for you to find third-party Kubernetes operation software from the Amazon EKS console and deploy it to your EKS clusters using the same commands used to deploy EKS add-ons.

Amazon EKS customers can now find and deploy third-party operational software to their EKS clusters through the EKS console or using command-line interface (CLI), eksctl, AWS APIs, or infrastructure as code tools such as AWS CloudFormation and Terraform. All software in AWS Marketplace is continually scanned for common vulnerabilities and exposures (CVEs), providing you confidence when deploying software onto your EKS clusters.

In this launch, you can find commercial software from popular independent software vendors (ISVs), such as Kubecost, Teleport, Tetrate, Upbound, Factorhouse, and Dynatrace.

Deploying AWS Marketplace for Containers to Your EKS Clusters
To get started, go to your EKS cluster in the Amazon EKS console, and in the Add-ons tab, select Get more add-ons to find new third-party EKS add-ons for your existing EKS clusters.

You can see a list of Amazon EKS add-ons provided by AWS and a list of products from independent software vendors provided by AWS Marketplace add-ons. You can use the search bar and filter by categories, vendors, and pricing models. Check your favorite add-ons and select Next.

In the next step, configure selected add-ons, such as the version and some optional settings for each add-on. In step 3, you can review and add your third-party add-ons in your EKS cluster.

If you do not have a subscription to Kubecost, you will be presented with a button to redirect you to the AWS Marketplace website to complete the subscription.

Subscribe to the software in AWS Marketplace. You will need to accept the end user license agreement (EULA), select the version of the software you would like to deploy, and finally configure the software if required.

You can also deploy Kubecost using the AWS Command Line Interface (AWS CLI). Using the create-addon API, you can install Kubernetes software from AWS Marketplace. If you try to deploy software from AWS Marketplace without first subscribing to it, the API will return an error that directs you to subscribe to the software.

$ aws eks create-addon --cluster-name channy-eks --addon-name kubecost_kubecost
{
    "addon": {
        "addonName": "kubecost_kubecost",
        "clusterName": "channy-eks",
        "status": "CREATING",
        "addonVersion": "v1.97.0-eksbuild.1",
        "health": {
            "issues": []
        }
    }
}

As I noted, after subscribing to the software, you can finish the add-on settings for the selected software. To learn more, see the Amazon EKS add-ons documentation or the Amazon EKS API reference.
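
After the add-on is created, you can verify its status from the AWS CLI; the cluster and add-on names below match the earlier example, so substitute your own.

$ aws eks describe-addon \
    --cluster-name channy-eks \
    --addon-name kubecost_kubecost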

AWS Marketplace seller EKS Add-ons Available at Launch
Here is a list of AWS Marketplace software sellers that support Amazon EKS add-ons today.

All software in AWS Marketplace is continually scanned for common vulnerabilities and exposures (CVEs) and is validated by AWS to work with EKS. After deployment, customers will receive notifications when new versions of the software are available to upgrade and ensure they are running the latest patches at all times. Try them out today!

To learn more details about creating container products on AWS Marketplace, visit Getting started as a seller and Container-based products in the AWS documentation. If you have any further questions please email [email protected] or contact your usual AWS partner contact.

Available Now
The AWS Marketplace support for Amazon EKS add-ons is available now in all commercial Regions that support AWS Marketplace and Amazon EKS. You can start using the feature today with the launch partner products listed above.

Give it a try, and please send us feedback either in the AWS re:Post for Amazon EKS, AWS Marketplace, or through your usual AWS support contacts.

Channy

New – A Fully Managed Schema Conversion in AWS Database Migration Service

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-a-fully-managed-schema-conversion-in-aws-database-migration-service/

Since we launched AWS Database Migration Service (AWS DMS) in 2016, customers have securely migrated more than 800,000 databases to AWS with minimal downtime. AWS DMS supports migration between 20+ database and analytics engines, such as Oracle to Amazon Aurora MySQL, MySQL to Amazon Relational Database Service (Amazon RDS) for MySQL, Microsoft SQL Server to Amazon Aurora PostgreSQL, MongoDB to Amazon DocumentDB, Oracle to Amazon Redshift, and to and from Amazon Simple Storage Service (Amazon S3).

Specifically, the AWS Schema Conversion Tool (AWS SCT) makes heterogeneous database and data warehouse migrations predictable and can automatically convert the source schema and a majority of the database code objects, including views, stored procedures, and functions, to a format compatible with the target engine. For example, it supports the conversion of Oracle PL/SQL and SQL Server T-SQL code to equivalent code in the Amazon Aurora MySQL dialect of SQL or the equivalent PL/pgSQL code in PostgreSQL. You can download the AWS SCT for your platform, including Windows or Linux (Fedora and Ubuntu).

Today we announce fully managed AWS DMS Schema Conversion, which streamlines database migrations by making schema assessment and conversion available inside AWS DMS. With DMS Schema Conversion, you can now plan, assess, convert and migrate under one central DMS service. You can access features of DMS Schema Conversion in the AWS Management Console without downloading and executing AWS SCT.

AWS DMS Schema Conversion automatically converts your source database schemas, and a majority of the database code objects to a format compatible with the target database. This includes tables, views, stored procedures, functions, data types, synonyms, and so on, similar to AWS SCT. Any objects that cannot be automatically converted are clearly marked as action items with prescriptive instructions on how to migrate to AWS manually.

In this launch, DMS Schema Conversion supports the following databases as sources for migration projects:

  • Microsoft SQL Server version 2008 R2 and higher
  • Oracle version 10.2 and later, 11g and up to 12.2, 18c, and 19c

DMS Schema Conversion supports the following databases as targets for migration projects:

  • Amazon RDS for MySQL version 8.x
  • Amazon RDS for PostgreSQL version 14.x

Setting Up AWS DMS Schema Conversion
To get started with DMS Schema Conversion, if it is your first time using AWS DMS, complete the setup tasks: create a virtual private cloud (VPC) using the Amazon VPC service and set up your source and target databases. To learn more, see Prerequisites for AWS Database Migration Service in the AWS documentation.

In the AWS DMS console, you can see new menus to set up Instance profiles, add Data providers, and create Migration projects.

Before you create your migration project, set up an instance profile by choosing Instance profiles in the left pane. An instance profile specifies network and security settings for your DMS Schema Conversion instances. You can create multiple instance profiles and select an instance profile to use for each migration project.

Choose Create instance profile and specify your default VPC or a new VPC, Amazon Simple Storage Service (Amazon S3) bucket to store your schema conversion metadata, and additional settings such as AWS Key Management Service (AWS KMS) keys.

You can create the simplest network configuration with a single VPC configuration. If your source or target data providers are in different VPCs, you can create your instance profile in one of the VPCs, and then link these two VPCs by using VPC peering.

Next, you can add data providers that store the data store type and location information about your source and target databases by choosing Data providers in the left pane. For each database, you can create a single data provider and use it in multiple migration projects.

Your data provider can be a fully managed Amazon RDS instance or a self-managed engine running either on-premises or on an Amazon Elastic Compute Cloud (Amazon EC2) instance.

Choose Create data provider to create a new data provider. You can set the type of the database location manually, such as database engine, domain name or IP address, port number, database name, and so on, for your data provider. Here, I have selected an RDS database instance.

After you create a data provider, make sure that you add database connection credentials in AWS Secrets Manager. DMS Schema Conversion uses this information to connect to a database.
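
For example, you can store the connection credentials with a command such as the following; the secret name, user name, and password are placeholders that you should replace with your own values.

$ aws secretsmanager create-secret \
    --name "dms-sc-source-database-credentials" \
    --secret-string '{"username":"<DbUser>","password":"<DbPassword>"}'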

Converting your database schema with AWS DMS Schema Conversion
Now, you can create a migration project for DMS Schema Conversion by choosing Migration projects in the left pane. A migration project describes your source and target data providers, your instance profile, and migration rules. You can also create multiple migration projects for different source and target data providers.

Choose Create migration project and select your instance profile and source and target data providers for DMS Schema Conversion.

After creating your migration project, you can use the project to create assessment reports and convert your database schema. Choose your migration project from the list, then choose the Schema conversion tab and choose Launch schema conversion.

Migration projects in DMS Schema Conversion are always serverless. This means that AWS DMS automatically provisions the cloud resources for your migration projects, so you don’t need to manage schema conversion instances.

Of course, the first launch of DMS Schema Conversion requires starting a schema conversion instance, which can take up to 10–15 minutes. This process also reads the metadata from the source and target databases. After a successful first launch, you can access DMS Schema Conversion faster.

An important part of DMS Schema Conversion is that it generates a database migration assessment report that summarizes all of the schema conversion tasks. It also details the action items for schema that cannot be converted to the DB engine of your target database instance. You can view the report in the AWS DMS console or export it as a comma-separated value (.csv) file.

To create your assessment report, choose the source database schema or schema items that you want to assess. After you select the checkboxes, choose Assess in the Actions menu in the source database pane. This report will be archived with .csv files in your S3 bucket. To change the S3 bucket, edit the schema conversion settings in your instance profile.

Then, you can apply the converted code to your target database or save it as a SQL script. To apply the converted code, choose Convert in the Source data provider pane and then Apply changes in the Target data provider pane.

Once the schema has been converted successfully, you can move on to the database migration phase using AWS DMS. To learn more, see Getting started with AWS Database Migration Service in the AWS documentation.

Now Available
AWS DMS Schema Conversion is now available in the US East (Ohio), US East (N. Virginia), US West (Oregon), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Tokyo), Europe (Frankfurt), Europe (Ireland), and Europe (Stockholm) Regions, and you can start using it today.

To learn more, see the AWS DMS Schema Conversion User Guide, give it a try, and please send feedback to AWS re:Post for AWS DMS or through your usual AWS support contacts.

Channy

AWS Application Migration Service Major Updates – New Migration Servers Grouping, Updated Launch, and Post-Launch Template

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-application-migration-service-major-updates-new-migration-servers-grouping-updated-launch-and-post-launch-template/

Last year, we introduced the general availability of AWS Application Migration Service that simplifies and expedites your migration to AWS by automatically converting your source servers from physical, virtual, or cloud infrastructure to run natively on AWS. Since the GA launch, we have made improvements, adding features such as agentless replication, MAP 2.0 auto-tagging and support for optional post-launch modernization actions.

Today we announce three major updates of Application Migration Service to support your migration projects of any size:

  • New Migration Servers Grouping – You can group migration servers into “applications,” a group of servers that function together as a single application, and manage the migration stage in “waves,” a plan of migrations including grouping servers and applications.
  • Updated Launch Template – You can modify the general settings and default launch template, and this template is then used to generate the Amazon Elastic Compute Cloud (Amazon EC2) instance launch template of subsequently installed source servers.
  • Updated Post-Launch Template – You can configure custom modernization actions for the post-launch template. You can associate any AWS Systems Manager documents and their parameters with a post-launch custom action.

Let’s dive deep into each launch!

New Migration Servers Grouping – Applications and Waves
Customers have clusters of servers that comprise an application, with dependencies between them. The servers within an application share the same configurations, such as network, security policies, etc. Customers want to migrate complete applications and services, as well as set up and configure the application environment.

We introduce the new concept of “application,” representing a group of servers, and you can manage the migration of an application.

The new application feature groups source servers that belong to the same application for integrated migration jobs. It includes configuring the environment before migrating the application’s servers, creating the appropriate security groups, and performing bulk actions on all of the application’s servers.

You can track and monitor the status of application migration and data replication within the migration lifecycle from source servers.

Also, customers with large migrations plan their migration, grouping servers and applications in waves. These are logical groups that describe the migration plan over time. Waves may include multiple servers and applications that do not necessarily have dependencies between them.

We introduce the new concept of “wave,” assisting customers in building their migration plan, as well as executing and monitoring it.

Application Migration Service supports actions on waves, such as launching all servers in a testing environment or performing cutover of a wave. Application Migration Service also provides reporting and monitoring information at the wave level so that customers will be able to manage their migration projects.

Updated Launch Template – Launch Settings and Default EC2 Launch Template
The launch template allows you to control the way Application Migration Service launches instances in the AWS Cloud. You can change the settings for existing and newly added servers individually. Previously, we only supported the AWS Migration Acceleration Program (MAP) option to add tags to launched migration instances.

We added two new options to modify the global launch template, and this template is then used to generate the EC2 launch templates of subsequently installed source servers. Customers start with a global Application Migration Service launch template, which can be used for predefined launch templates. They would then potentially only have to perform modifications to a smaller subset of source servers, as opposed to all of them.

Here are default settings that will be used when launching target servers:

  • Activate instance type right-sizing – The service will determine the best match instance type. The default instance type defined in the EC2 template will be ignored.
  • Start instance upon launch – The service will launch instances automatically. If this option is not selected, the launched instance will need to be manually started after launch.
  • Copy private IP – This enables you to copy the private IP of your source server to the target.
  • Transfer server tags – Transfer the tags from the source server to the launched instances.
  • Operating system licensing – Specify whether to continue to use the Bring Your Own License model (BYOL) of the source server or use an AWS provided license.

Also, you can configure the default settings that will be applied to the EC2 launch template of every target server, such as default target subnet, additional security groups, default instance type, Amazon Elastic Block Store (Amazon EBS) volume type, IOPS, and throughput to associate with all instances launched by this service.

Updated Post-Launch Template – Custom Actions
Post-launch settings allow you to control and automate actions performed after the server has been launched in AWS. They include four built-in actions: installing the AWS Systems Manager agent, installing the AWS Elastic Disaster Recovery agent and configuring replication, CentOS conversion, and SUSE subscription conversion.

We added a new option to configure custom actions in the post-launch template. You can associate any AWS Systems Manager document and its action parameters with a custom action. The configuration also includes the order in which the actions will be executed and the source server operating systems for which the custom action applies.

Choose Add custom action to make a new post-launch custom action. For example, the AWS-CopySnapshot, one of Systems Manager Automation’s runbooks, copies a point-in-time snapshot of an EBS volume. You can copy the snapshot within the same AWS Region or from one Region to another.

In the Action parameters, you can assign SnapshotId and SourceRegion to run the AWS Systems Manager CopySnapshot runbook.
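
Outside of the post-launch template, you can invoke the same runbook directly with the AWS CLI, which is a convenient way to test the parameters you plan to use; the snapshot ID and source Region below are placeholders.

$ aws ssm start-automation-execution \
    --document-name "AWS-CopySnapshot" \
    --parameters "SnapshotId=snap-0123456789abcdef0,SourceRegion=us-east-1"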

You can create your own Systems Manager document to define the actions that Systems Manager performs on your managed instances. Systems Manager offers more than 100 preconfigured documents that you can use by specifying parameters as the post-launch actions. To learn more, see AWS Systems Manager Automation runbook reference in the AWS documentation.

Now Available
The new migration servers grouping, updates on the launch, and post-launch template are available now, and you can start using them today in all Regions where AWS Application Migration Service is supported.

To learn more, see the Application Migration Service User Guide, give it a try, and please send feedback to AWS re:Post for Application Migration Service or through your usual AWS support contacts.

Channy

New – Fully Managed Blue/Green Deployments in Amazon Aurora and Amazon RDS

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-fully-managed-blue-green-deployments-in-amazon-aurora-and-amazon-rds/

When updating databases, using a blue/green deployment technique is an appealing option for users to minimize risk and downtime. This method of making database updates requires two database environments—your current production environment, or blue environment, and a staging environment, or green environment. You must then keep these two environments in sync with each other so you may safely test and upgrade your changes to production.

Amazon Aurora and Amazon Relational Database Service (Amazon RDS) customers can use database cloning and promotable read replicas to help self-manage a blue/green deployment. However, self-managing a blue/green deployment can be costly and complex to build and manage. As a result, customers sometimes delay implementing database updates, choosing availability over the benefits that they would gain from updating their databases.

Today, we are announcing the general availability of Amazon RDS Blue/Green Deployments, a new feature for Amazon Aurora with MySQL compatibility, Amazon RDS for MySQL, and Amazon RDS for MariaDB that enables you to make database updates safer, simpler, and faster.

With just a few steps, you can use Blue/Green Deployments to create a separate, synchronized, fully managed staging environment that mirrors the production environment. The staging environment clones your production environment’s primary database and in-Region read replicas. Blue/Green Deployments keep these two environments in sync using logical replication.

In as fast as a minute, you can promote the staging environment to be the new production environment with no data loss. During switchover, Blue/Green Deployments blocks writes on blue and green environments so that the green catches up with the blue, ensuring no data loss. Then, Blue/Green Deployments redirects production traffic to the newly promoted staging environment, all without any code changes to your application.

With Blue/Green Deployments, you can make changes, such as major and minor version upgrades, schema modifications, and operating system or maintenance updates, to the staging environment without impacting the production workload.

Getting Started with Blue/Green Deployments for MySQL Clusters
You can start updating your databases with just a few clicks in the AWS Management Console. To get started, simply select the database that needs to be updated in the console and click Create Blue/Green Deployment under the Actions dropdown menu.

You can set a Blue/Green Deployment identifier and the attributes of your database to be modified, such as the engine version, DB cluster parameter group, and DB parameter group for green databases. To use a Blue/Green Deployment in your Aurora MySQL DB cluster, you should turn on binary logging, changing the value for the binlog_format parameter from OFF to MIXED in the DB cluster parameter group.
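
You can make this parameter change from the AWS CLI as well; the parameter group name below is a placeholder for your own DB cluster parameter group. Because binlog_format is a static parameter, the change takes effect after a reboot.

$ aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name "my-aurora-mysql-params" \
    --parameters "ParameterName=binlog_format,ParameterValue=MIXED,ApplyMethod=pending-reboot"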

When you choose Create Blue/Green Deployment, it creates a new staging environment and runs automated tasks to prepare the database for production. Note, you will be charged the cost of the green database, including read replicas and DB instances in Multi-AZ deployments, and any other features such as Amazon RDS Performance Insights that you may have enabled on green.

You can also do the same job in the AWS Command Line Interface (AWS CLI). To perform an engine version upgrade, simply add a targetEngineVersion parameter and specify the engine version you’d like to upgrade to. This parameter works with both minor and major version upgrades, and it accepts short versions like 5.7 for Amazon Aurora MySQL-Compatible.

$ aws rds create-blue-green-deployment \
--blue-green-deployment-name my-bg-deployment \
--source arn:aws:rds:us-west-2:1234567890:db:my-aurora-mysql \
--target-engine-version 5.7 \
--region us-west-2

After creation is complete, you now have a staging environment that is ready for test and validation before promoting it to be the new production environment.

When testing and qualification of changes are complete, you can choose Switch over in the Actions dropdown menu to promote the staging environment marked as Green to be the new production system.

Now you are nearly ready to switch over your green databases to production. Check the settings of your green databases to verify that they are ready for the switchover. You may also set a timeout setting to determine the maximum time limit for your switchover. If Blue/Green Deployments’ switchover guardrails detect that it would take longer than the specified duration, then the switchover is canceled, and no changes are made to the environments. We recommend that you identify times of low or moderate production traffic to initiate a switchover.
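
You can also trigger the switchover from the AWS CLI; the deployment identifier below is a placeholder that you can look up with describe-blue-green-deployments, and the timeout is specified in seconds.

$ aws rds describe-blue-green-deployments
$ aws rds switchover-blue-green-deployment \
    --blue-green-deployment-identifier "bgd-1234567890abcdef" \
    --switchover-timeout 300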

After switchover, Blue/Green Deployments does not delete your old production environment. You may access it for additional validations and performance/regression testing, if needed. Please note that it is your responsibility to delete the old production environment when you no longer need it. Standard billing charges apply on old production instances until you delete them.

Now Available
Amazon RDS Blue/Green Deployments is available today on Amazon Aurora with MySQL Compatibility 5.6 or higher, Amazon RDS for MySQL major version 5.6 or higher, and Amazon RDS for MariaDB 10.2 and higher in all AWS commercial Regions, excluding China, and AWS GovCloud Regions.

To learn more, read the Amazon Aurora MySQL Developer Guide or the Amazon RDS for MySQL User Guide. Give it a try, and please send feedback to AWS re:Post for Amazon RDS or through your usual AWS support contacts.

Channy

New – Amazon ECS Service Connect Enabling Easy Communication Between Microservices

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-amazon-ecs-service-connect-enabling-easy-communication-between-microservices/

Microservices architectures are a well-known software development approach to make applications composed of small independent services that communicate over well-defined application programming interfaces (APIs). Customers faced challenges when they started breaking down their monolith applications into microservices, as it required specialized networking knowledge to communicate internally with other microservices.

Amazon Elastic Container Service (Amazon ECS) customers have several options for service-to-service communication, but each one comes with challenges and complications: 1) Elastic Load Balancing (ELB) requires careful infrastructure planning for high availability and incurs additional infrastructure cost. 2) Amazon ECS Service Discovery often requires developers to write custom application code for collecting traffic metrics and for making network calls resilient. 3) Service mesh solutions such as AWS App Mesh offer advanced traffic monitoring and routing between services but run outside of Amazon ECS.

Today, we are announcing the general availability of Amazon ECS Service Connect, a new capability that simplifies building and operating resilient distributed applications. ECS Service Connect provides an easy network setup and seamless service communication deployed across multiple ECS clusters and virtual private clouds (VPCs). You can add a layer of resilience to your ECS service communication and get traffic insights with no changes to your application code.

With ECS Service Connect, you can refer and connect to your services by logical names using a namespace provided by AWS Cloud Map and automatically distribute traffic between ECS tasks without deploying and configuring load balancers. You can set some safe defaults for traffic resilience, such as health checking, automatic retries for 503 errors, and connection draining, for each of your ECS services. Additionally, the Amazon ECS console provides easy-to-use dashboards with real-time network traffic metrics for operational convenience and simplified debugging.

Getting Started with Amazon ECS Service Connect
To get started with ECS Service Connect, you can specify a namespace as part of creating an ECS cluster or create one in AWS Cloud Map. A namespace represents a way to structure your services and can span multiple ECS clusters residing in different VPCs. All ECS services that belong to a specific namespace can communicate with existing services in that namespace, provided that network-level connectivity exists.
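
If you want to create the namespace yourself in AWS Cloud Map, a CLI command along these lines is enough; the namespace name matches the example used later in this post.

$ aws servicediscovery create-http-namespace --name "svc-namespace"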

You can also see a list of Cloud Map namespaces in Namespaces in the left navigation pane of the Amazon ECS console. When you select a namespace, it shows a list of services with the same namespace from two different ECS clusters with database services (db-mysql, db-redis) and backend services (webui, appserver).

When you create an ECS cluster, you can select one of the namespaces in the Default namespaces of the Networking setting. ECS Service Connect is enabled for all new ECS services running in both AWS Fargate and Amazon EC2 instances. To enable all existing services, you would need to redeploy with either a new version of ECS-optimized Amazon Machine Image (AMI), or with a new Fargate Agent that supports ECS Service Connect.

Or, you can simply create a cluster via the AWS Command Line Interface (AWS CLI) with the --service-connect-defaults parameter and a default Cloud Map namespace name for service discovery purposes.

$ aws ecs create-cluster \
    --cluster-name "svc-cluster-2" \
    --service-connect-defaults namespace=svc-namespace

This command will create an ECS cluster with the namespace on AWS’s behalf. If you would like to use an already existing Cloud Map namespace, you can simply pass the name of the existing namespace here.

Next, let’s create a service with a task definition and expose your web user-interface server using ECS Service Connect.

$ aws ecs create-service \
    --cluster "svc-cluster-2" \
    --service-name "webui" \
    --task-definition "webui-svc-cluster" \
    --service-connect-configuration '{
        "enabled": true,
        "namespace": "svc-namespace",
        "services": [
            {
                "portName": "webui-port",
                "discoveryName": "webui-svc",
                "clientAliases": [
                    {
                        "port": 80,
                        "dnsName": "webui-svc-domain"
                    }
                ]
            }
        ]
    }'

In this command, portName is a reference to the container port, and clientAliases assigns the port number (required) and DNS name (optional), overriding the discovery name that is used in the endpoint. Each service has an endpoint URL that contains the protocol, a DNS name, and the port. You can select the protocol and port name in the task definition or the ECS service configuration. For example, an endpoint could be http://webui:80, grpc://appserver:8080, or http://db-redis:8888.

In the ECS console, you can see this configuration of ECS Service Connect for the webui service in the svc-cluster-2 cluster.

As you can see, you can run the same workloads across different clusters with the same clientAlias and namespace name for high availability. ECS Service Connect will intelligently load balance the traffic to the ECS tasks. To connect to services running in different ECS clusters, you need to specify the same namespace name for all your ECS services that need to talk to each other. ECS Service Connect will make your services discoverable to all other services in the same namespace.

Improving Service Resilience with Observability Data
You can collect traffic metrics with ECS Service Connect observability capabilities. By default, for each ECS service, you can see the number of healthy and unhealthy endpoints, along with inbound and outbound traffic volume.

ECS Service Connect supports HTTP/1, HTTP/2, gRPC, and TCP protocols. So you can collect the number of requests, number of HTTP errors, and average call latency. For gRPC and TCP, you can see the total number of active connections. All of these metrics are pushed to Amazon CloudWatch or other AWS analytics services via custom log routing.

In the Advanced menu, you can publish ECS Service Connect Agent logs for help in debugging in case of issues.

These metrics are only visible in the original interface of the CloudWatch console. When you use the CloudWatch console, switch to the original interface to see the additional metric dimensions of “discovery name” and “target discovery name” under the ECS grouping.

The default settings provide you with a starting point for building resilient applications, and you can fine-tune parameters to limit the impact of failures, latency spikes, and network fluctuations on your application behavior using AWS Management Console or dedicated ECS APIs.

Now Available
Amazon ECS Service Connect is available in all commercial Regions, except China, where Amazon ECS is available. ECS Service Connect is fully supported in AWS CloudFormation, AWS CDK, AWS Copilot, and AWS Proton for infrastructure provisioning, code deployments, and monitoring of your services. To learn more, see the Amazon ECS Service Connect Developer Guide.

My colleagues, Hemanth AVS, Senior Container Specialist SA, and Satya Vajrapu, Senior DevOps Consultant, prepared a hands-on workshop to demonstrate an example of the ECS Service Connect. Join CON303 Networking, service mesh, and service discovery with Amazon ECS when you attend AWS re:Invent 2022.

Give it a try, and please send feedback to AWS re:Post for Amazon ECS or through your usual AWS support contacts.

Channy

Now Open the 30th AWS Region – Asia Pacific (Hyderabad) Region in India

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/now-open-the-30th-aws-region-asia-pacific-hyderabad-region-in-india/

In November 2020, Jeff announced the upcoming AWS Asia Pacific (Hyderabad) as the second Region in India. Yes! Today we are announcing the general availability of the 30th AWS Region, Asia Pacific (Hyderabad) Region, with three Availability Zones and the ap-south-2 API name.

The Asia Pacific (Hyderabad) Region is located in the state of Telangana. As the capital and the largest city in Telangana, Hyderabad is already an important talent hub for IT professionals and entrepreneurs. For example, AWS Hyderabad User Groups has more than 4,000 community members and holds active meetups, including an upcoming Community Day in December 2022.

The new Hyderabad Region gives customers an additional option for running their applications and serving end users from data centers located in India. Customers with data-residency requirements arising from statutes, regulations, and corporate policy can run workloads and securely store data in India while serving end users with even lower latency.


AWS Services in the Asia Pacific (Hyderabad) Region
In the new Hyderabad Region, you can use C5, C5d, C6g, M5, M5d, M6gd, R5, R5d, R6g, I3, I3en, T3, and T4g instances, and use a long list of AWS services including: Amazon API Gateway, AWS AppConfig, AWS Application Auto Scaling, Amazon Aurora, Amazon EC2 Auto Scaling, AWS Config, AWS Certificate Manager, AWS Cloud Control API, AWS CloudFormation, AWS CloudTrail, Amazon CloudWatch, Amazon CloudWatch Events, Amazon CloudWatch Logs, AWS CodeDeploy, AWS Database Migration Service, AWS Direct Connect, Amazon DynamoDB, Amazon Elastic Block Store (Amazon EBS), Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Registry (Amazon ECR), Amazon Elastic Container Service (Amazon ECS), Amazon ElastiCache, Amazon EMR, Elastic Load Balancing, Elastic Load Balancing – Network (NLB), Amazon EventBridge, AWS Fargate, AWS Health Dashboard, AWS Identity and Access Management (IAM), Amazon Kinesis Data Streams, AWS Key Management Service (AWS KMS), AWS Lambda, AWS Marketplace, Amazon OpenSearch Service, Amazon Relational Database Service (Amazon RDS), Amazon Redshift, Amazon Route 53, AWS Secrets Manager, Amazon Simple Storage Service (Amazon S3), Amazon S3 Glacier, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), AWS Step Functions, AWS Support API, Amazon Simple Workflow Service (Amazon SWF), AWS Systems Manager, AWS Trusted Advisor, VM Import/Export, Amazon Virtual Private Cloud (Amazon VPC), AWS VPN, and AWS X-Ray.
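
To start exploring the new Region from the AWS CLI, you can point any command at the ap-south-2 API name; for example, the following lists its Availability Zones. Note that newer Regions such as this one may first need to be enabled for your account in your AWS account settings.

$ aws ec2 describe-availability-zones --region ap-south-2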

AWS in India
AWS has a long-standing history of helping drive digital transformation in India. AWS first established a presence in the country in 2011, with the opening of an office in Delhi. In 2016, AWS launched the Asia Pacific (Mumbai) Region giving enterprises, public sector organizations, startups, and SMBs access to state-of-the-art public cloud infrastructure. In May 2019, AWS expanded the Region to include a third Availability Zone to support rapid customer growth and provide more choice, flexibility, the ability to replicate workloads across more Availability Zones, and even higher availability.

There are currently 33 Amazon CloudFront edge locations in India: Mumbai (10), New Delhi (7), Chennai (7), Bangalore (4), Hyderabad (3), and Kolkata (2). The edge locations work in concert with a CloudFront Regional edge cache in Mumbai to speed delivery of content. There are six AWS Direct Connect locations, all of which connect to the Asia Pacific (Mumbai) Region: two in Mumbai, one in Chennai, one in Hyderabad, one in Delhi, and one in Bangalore. Finally, the first AWS Local Zone in India launched in Delhi, bringing selected AWS services very close to a particular geographic area. We announced plans to launch three more AWS Local Zones in India, in the cities of Chennai, Bengaluru, and Kolkata.

AWS is also investing in the future of the Indian technology community and workforce, training tech professionals to expand their skillset and cloud knowledge. In fact, since 2017, AWS has trained over three million individuals in India on cloud skills. AWS has worked with government officials, educational institutes, and corporate organizations to achieve this milestone, which has included first-time learners and mid-career professionals alike.

AWS continues to invest in upskilling local developers, students, and the next generation of IT leaders in India through programs such as AWS Academy, AWS Educate, AWS re/Start, and other Training and Certification programs.

AWS Customers in India
We have many amazing customers in India that are doing incredible things with AWS, for example:

  • SonyLIV is the first Over the top (OTT) service in India born on the AWS Cloud. SonyLIV launched Kaun Banega Crorepati (KBC) interactive game show to allow viewers to submit answers to questions on the show in real time via their mobile devices. SonyLIV uses Amazon ElastiCache to support real-time, in-memory caching at scale, Amazon CloudFront as a low-latency content delivery network, and Amazon SQS as a highly available message queuing service.
  • DocOnline is a digital healthcare platform that provides video or phone doctor consultations to over 3.5 million families in 10 specialties and 14 Indian languages. DocOnline delivers over 100,000 consultations, diagnostic tests, and medicines every year. DocOnline has built its entire business on AWS to power its online consultation services 24-7 and to continuously measure and improve health outcomes. Being in the Healthcare domain, DocOnline needs to comply with regulatory guidelines, including Data Residency, PII security, and Disaster Recovery in seismic zones. With the AWS Asia Pacific (Hyderabad) Region, DocOnline can ensure critical patient data is hosted in India on the most secure, extensive, and reliable cloud platform while serving customers with even faster response times.
  • ICICI Lombard General Insurance is one of the first among the large insurance companies in India to move over 140+ applications, including its core application, to AWS. The rapid advances in technology and computing power delivered by cloud computing are poised to radically change the way insurance is delivered as well as consumed. ICICI Lombard has launched new generation products like cyber insurance, telehealth, cashless homecare, and IoT-based risk management solutions for marine transit insurance, providing seamless integration with various digital partners for digital distribution of insurance products and virtual motor claims inspection solutions, which have seen adoption increase from 61 percent last year to 80 percent this year. ICICI Lombard was able to process group health endorsements for their corporate customers in less than a day as compared to 10–12 days earlier. ICICI Lombard is looking at the cloud for further transformative possibilities in real-time inspection of risk and personalized underwriting.
  • Ministry of Health and Family Welfare (MoHFW), Government of India, needed a highly reliable, scalable, and resilient technical infrastructure to power a large-scale COVID-19 vaccination drive for India’s more than 1.3 billion citizens in 2021. To facilitate the required performance and speed, the MoHFW engaged India’s Ministry of Electronics and Information Technology to build and launch the Co-WIN application powered by AWS, which scales in seconds to handle user registrations and consistently supports 10 million vaccinations daily.

You can find more customer stories in India.

Available Now
The new Hyderabad Region is ready to support your business. You can find a detailed list of the services available in this Region on the AWS Regional Services List.

With this launch, AWS now spans 96 Availability Zones within 30 geographic Regions around the world, with three new Regions launched in 2022, including the AWS Middle East (UAE) Region, the AWS Europe (Zurich) Region, and the AWS Europe (Spain) Region. We have also announced plans for 15 more Availability Zones and five more AWS Regions in Australia, Canada, Israel, New Zealand, and Thailand.

To learn more, see the Global Infrastructure page, and please send feedback through your usual AWS support contacts in India.

— Channy

AWS Week in Review – October 24, 2022

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-week-in-review-october-24-2022/

Last week, we announced plans to launch the AWS Asia Pacific (Bangkok) Region, which will become our third AWS Region in Southeast Asia. This Region will have three Availability Zones and will give AWS customers in Thailand the ability to run workloads and store data that must remain in-country.

In the Works – AWS Region in Thailand
With this big news, AWS announced a 190 billion baht (US $5 billion) investment to drive Thailand’s digital future over the next 15 years. It includes capital expenditures on the construction of data centers, operational expenses related to ongoing utilities and facility costs, and the purchase of goods and services from Regional businesses.

Since we first opened an office in Bangkok in 2015, AWS has launched 10 edge locations in Bangkok for Amazon CloudFront, a highly secure and programmable content delivery network (CDN). In 2020, we launched AWS Outposts in Thailand, a family of fully managed solutions delivering AWS infrastructure and services to virtually any on-premises or edge location for a truly consistent hybrid experience. This year, we also plan the upcoming launch of an AWS Local Zone in Bangkok, which will enable customers to deliver applications that require single-digit millisecond latency to end users in Thailand.

Photo courtesy of Conor McNamara, Managing Director, ASEAN at AWS

The new AWS Region in Thailand is also part of our broader, multifaceted investment in the country, covering our local team, partners, skills, and the localization of services, including Amazon Transcribe, Amazon Translate, and Amazon Connect.

Many customers in Thailand have chosen AWS to run their workloads to accelerate innovation, increase agility, and drive cost savings, such as 2C2P, CP All Plc., Digital Economy Promotion Agency, Energy Response Co. Ltd. (ENRES), PTT Global Public Company Limited (PTT), Siam Cement Group (SCG), Sukhothai Thammathirat Open University, The Stock Exchange of Thailand, Papyrus Studio, and more.

For example, Dr. Werner Vogels, CTO of Amazon.com, introduced the story of Papyrus Studio, a large film studio and one of the first customers in Thailand.

“Customer stories like Papyrus Studio inspire us at AWS. The cloud can allow a small company to rapidly scale and compete globally. It also provides new opportunities to create, innovate, and identify business opportunities that just aren’t possible with conventional infrastructure.”

For more information on how to enable AWS and get support in Thailand, contact our AWS Thailand team.

Last Week’s Launches
My favorite news of last week was the launch of dark mode as a beta feature in the AWS Management Console. In Unified Settings, you can choose between three settings for visual mode: Browser default, Light, and Dark. Browser default applies the default dark or light setting of the browser, Dark applies the new built-in dark mode, and Light maintains the current look and feel of the AWS console. Choose your favorite!

Here are some launches that caught my eye for web, mobile, and IoT application developers:

New AWS Amplify Library for Swift – We announce the general availability of Amplify Library for Swift (previously Amplify iOS). Developers can use Amplify Library for Swift via the Swift Package Manager to build apps for iOS and macOS (currently in beta) platforms with Auth, Storage, Geo, and more features.

The Amplify Library for Swift is open source on GitHub, and we deeply appreciate the feedback we have gotten from the community. To learn more, see Introducing the AWS Amplify Library for Swift in the AWS Front-End Web & Mobile Blog or Amplify Library for Swift documentation.

New Amazon IVS Chat SDKs – Amazon Interactive Video Service (Amazon IVS) now provides SDKs for stream chat with support for web, Android, and iOS. The Amazon IVS stream chat SDKs support common functions for chat room resource management, sending and receiving messages, and managing chat room participants.

Amazon IVS is a managed live-video streaming service; you can stream to it using the broadcast SDKs or standard streaming software such as Open Broadcaster Software (OBS). The service provides cross-platform player SDKs for playback of Amazon IVS streams, making low-latency live video available to any viewer around the world. It also offers a Chat Client Messaging SDK. For more information, see Getting Started with Amazon IVS Chat in the AWS documentation.
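If you want to experiment with the chat APIs from the command line before wiring up the SDKs, a minimal sketch with the AWS CLI might look like the following; the room name, message limits, room ARN, and user ID are placeholders, and the exact parameters are worth confirming in the Amazon IVS Chat API reference:

# Create a chat room (the limits shown are illustrative placeholders)
$ aws ivschat create-room \
    --name my-stream-chat \
    --maximum-message-length 500 \
    --maximum-message-rate-per-second 10

# Issue a short-lived chat token that a client app uses to join the room
$ aws ivschat create-chat-token \
    --room-identifier arn:aws:ivschat:us-west-2:123456789012:room/AbCdEfGhIjKl \
    --user-id viewer-1 \
    --capabilities SEND_MESSAGE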

New AWS Parameters and Secrets Lambda Extension – This is a new extension for AWS Lambda developers to retrieve parameters from AWS Systems Manager Parameter Store and secrets from AWS Secrets Manager. Lambda function developers can use this extension to improve their application performance, as it decreases the latency and the cost of retrieving parameters and secrets.

Previously, you had to initialize either the core library of a service or the entire service SDK inside a Lambda function for retrieving secrets and parameters. Now you can simply use the extension. To learn more, see AWS Systems Manager Parameter Store documentation and AWS Secrets Manager documentation.
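As a rough sketch of how the extension is called at runtime, code running inside your Lambda function sends plain HTTP requests to the extension’s local endpoint, which listens on localhost port 2773 by default; the parameter and secret names below are placeholders:

# Read a Systems Manager parameter through the extension's local cache
$ curl -s \
    -H "X-Aws-Parameters-Secrets-Token: ${AWS_SESSION_TOKEN}" \
    "http://localhost:2773/systemsmanager/parameters/get?name=%2Fmy-app%2Fdb-host"

# Read a secret from AWS Secrets Manager the same way
$ curl -s \
    -H "X-Aws-Parameters-Secrets-Token: ${AWS_SESSION_TOKEN}" \
    "http://localhost:2773/secretsmanager/get?secretId=my-app%2Fdb-credentials"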

New FreeRTOS Long Term Support Version – We announce the second release of FreeRTOS Long Term Support (LTS) – FreeRTOS 202210.00 LTS. FreeRTOS LTS offers a more stable foundation than standard releases as manufacturers deploy and later update devices in the field. This release includes new and upgraded libraries such as AWS IoT Fleet Provisioning, Cellular LTE-M Interface, coreMQTT, and FreeRTOS-Plus-TCP.

All libraries included in this FreeRTOS LTS version will receive security and critical bug fixes until October 2024. With an LTS release, you can continue to maintain your existing FreeRTOS code base and avoid any potential disruptions resulting from FreeRTOS version upgrades. To learn more, see the FreeRTOS announcement.

Here is some news on performance improvement and increasing capacity:

Up to 10x Faster Amazon Aurora Snapshot Exports – Amazon Aurora MySQL-Compatible Edition for MySQL 5.7 and 8.0 now exports snapshots to Amazon S3 up to 10x faster. The performance improvement is automatically applied to all types of database snapshot exports, including manual snapshots, automated system snapshots, and snapshots created by the AWS Backup service. For more information, see Exporting DB cluster snapshot data to Amazon S3 in the Amazon Aurora documentation.
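If you prefer the command line over the console, a snapshot export can be started with a call along these lines; every identifier, ARN, bucket, role, and key below is a placeholder for your own resources:

$ aws rds start-export-task \
    --export-task-identifier my-snapshot-export \
    --source-arn arn:aws:rds:us-east-1:123456789012:cluster-snapshot:my-cluster-snapshot \
    --s3-bucket-name my-export-bucket \
    --iam-role-arn arn:aws:iam::123456789012:role/rds-s3-export-role \
    --kms-key-id alias/my-export-key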

Up to 3x More Amazon RDS Read Capacity – Amazon Relational Database Service (RDS) for MySQL, MariaDB, and PostgreSQL now supports 15 read replicas per instance, including up to 5 cross-Region read replicas, delivering up to 3x the previous read capacity. For more information, see Working with read replicas in the Amazon RDS documentation.
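For reference, adding another read replica from the command line is a single call; the instance identifiers and instance class here are placeholders:

$ aws rds create-db-instance-read-replica \
    --db-instance-identifier mydb-replica-15 \
    --source-db-instance-identifier mydb \
    --db-instance-class db.r6g.large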

2x More AWS Snowball Edge Compute Capacity – The AWS Snowball Edge Compute Optimized device has doubled its compute capacity to 104 vCPUs and its memory capacity to 416 GB RAM, and it is now fully SSD with 28 TB of NVMe storage. The updated device is ideal when you need dense compute resources to run complex workloads such as machine learning inference or video analytics at the rugged, mobile edge, such as in trucks, aircraft, or ships. You can get started by ordering a Snowball Edge device on the AWS Snow Family console.

2x Higher Amazon SQS FIFO Default Quota – Amazon Simple Queue Service (SQS) has increased the default quota to 6,000 transactions per second per API action, double the previous 3,000 throughput quota for high throughput mode for FIFO (first in, first out) queues, in all AWS Regions where Amazon SQS FIFO queues are available. For a detailed breakdown of default throughput quotas per Region, see Quotas related to messages in the Amazon SQS documentation.
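Note that the 6,000 TPS quota applies to FIFO queues with high throughput mode enabled; a minimal sketch for creating such a queue with the AWS CLI (the queue name is a placeholder) looks like this:

$ aws sqs create-queue \
    --queue-name my-orders.fifo \
    --attributes '{
        "FifoQueue": "true",
        "DeduplicationScope": "messageGroup",
        "FifoThroughputLimit": "perMessageGroupId"
      }'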

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some other news items that you may find interesting:

22 New or Updated Open Datasets on AWS – We released 22 new or updated datasets, including Amazonia-1 imagery, Bitcoin and Ethereum data, and elevation data over the Arctic and Antarctica. The full list of publicly available datasets is on the Registry of Open Data on AWS and is now also discoverable on AWS Data Exchange.

Sustainability with AWS Partners (ft. AWS On Air) – This episode covers a broad discipline of environmental, social, and governance (ESG) issues across all regions, organization types, and industries. AWS Sustainability & Climate Tech provides a comprehensive portfolio of AWS Partner solutions built on AWS that address climate change events and the United Nation’s Sustainable Development Goals (SDG).

AWS Open Source News and Updates #131 – This newsletter covers the latest open-source projects, such as Amazon EMR Toolkit for VS Code, a VS Code extension that makes it easier to develop Spark jobs on EMR, and AWS CDK For Discourse, sample code that demonstrates how to create a full environment for Discourse. Remember to check out Open Source at AWS and keep up to date with all our activity in open source by following us on @AWSOpen.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS re:Invent 2022 Attendee Guide – Browse re:Invent 2022 attendee guides, curated by AWS Heroes, AWS industry teams, and AWS Partners. Each guide contains recommended sessions, tips and tricks for building your agenda, and other useful resources. Also, seat reservations for all sessions are now open for all re:Invent attendees. You can still register for AWS re:Invent either offline or online.

AWS AI/ML Innovation Day on October 25 – Join us for this year’s AWS AI/ML Innovation Day, where you’ll hear from Bratin Saha and other leaders in the field about the great strides AI/ML has made in the past and the promises awaiting us in the future.

AWS Container Day at Kubecon 2022 on October 25–28 – Come join us at KubeCon + CloudNativeCon North America 2022, where we’ll be hosting AWS Container Day Featuring Kubernetes on October 25 and educational sessions at our booth on October 26–28. Throughout the event, our sessions focus on security, cost optimization, GitOps/multi-cluster management, hybrid and edge compute, and more.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Week in Review!

— Channy

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

AWS IoT FleetWise Now Generally Available – Easily Collect Vehicle Data and Send to the Cloud

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-iot-fleetwise-now-generally-available-easily-collect-vehicle-data-and-send-to-the-cloud/

Today we announce the general availability of AWS IoT FleetWise, a fully managed AWS service that makes it easier to collect, transform, and transfer vehicle data to the cloud. At AWS re:Invent 2021, we previewed AWS IoT FleetWise; since then, we have listened to customer feedback and improved features for various near-real-time vehicle data processing use cases.

With AWS IoT FleetWise, automakers, fleet operators, and automotive suppliers can take the complex variability out of collecting data from vehicle fleets at scale. You can access standardized fleet-wide vehicle data and avoid developing custom data collection systems, or you can integrate AWS IoT FleetWise to enhance your existing systems. AWS IoT FleetWise enables intelligent data collection that sends the exact data you need from the vehicle to the cloud. You can use the data to analyze vehicle fleet health to more quickly identify potential maintenance issues or make in-vehicle infotainment systems smarter. Furthermore, you can use it to train machine learning (ML) models that improve autonomous driving and advanced driver assistance systems (ADAS).

For example, electric vehicle (EV) battery temperature is a critical metric that should be continuously analyzed for the entire vehicle fleet. In order to avoid costly continuous data ingestion, you may want to optimize the data collection by setting a threshold on EV battery temperature. The results of this analysis would be provided to the automaker’s quality engineering department, enabling fast assessment of the criticality and possible root causes of any issues identified at certain temperatures. Based on the root cause analysis, the automaker can then take short-term actions to support the driver affected by the issue, as well as midterm actions to improve vehicle quality.

How AWS IoT FleetWise Works
AWS IoT FleetWise provides a vehicle modeling framework that you can use to model your vehicle and its sensors and actuators in the cloud. To enable secure communication between your vehicle and the cloud, AWS IoT FleetWise also provides the AWS IoT FleetWise Edge Agent application, which you can download and install on in-vehicle electronic control units (ECUs) such as the gateway or the in-vehicle infotainment controller. You define data collection schemes in the cloud and deploy them to your vehicle.

The AWS IoT FleetWise Edge Agent running in your vehicle uses data collection schemes to control what data to collect and when to transfer it to the cloud. Data collected and ingested through AWS IoT FleetWise Edge Agent software goes directly into your Amazon Timestream table or Amazon Simple Storage Service (Amazon S3) repositories via AWS IoT Core.

AWS IoT FleetWise Features
To get started with AWS IoT FleetWise, you can register your account and configure the settings via the AWS console. AWS IoT FleetWise automatically registers your AWS account, IAM role, and Amazon Timestream resources.

The Edge Agent software is a C++ application distributed as source code and is available on GitHub to collect, decode, normalize, cache, and ingest vehicle data to AWS. It supports multiple deployment options, such as vehicle gateways, infotainment systems, telematics control units (TCUs), or aftermarket devices. When vehicles are connected to the cloud, the Edge Agent continually receives data collection schemes and collects, decodes, normalizes and ingests the transformed vehicle data to AWS.

Let’s see the benefits and features of AWS IoT FleetWise:

Signal catalog
A signal catalog contains a collection of vehicle signals. Signals are fundamental structures that you define to contain vehicle data and its metadata. A signal can be a sensor and its status, an attribute (static information such as the manufacturer), a branch that represents a nested structure such as the Vehicle.Powertrain.combustionEngine expression, or an actuator such as the state of a vehicle device. For example, you can create a sensor to receive in-vehicle temperature values and store its metadata, including a sensor name, a data type, and a unit.

Signals in a signal catalog can be used to model vehicles that use different protocols and data formats. For example, there are two cars made by different automakers: one uses the Controller Area Network (CAN) to transmit the in-vehicle temperature data and the other uses On-board Diagnostic (OBD) protocol.

You can define a sensor in the signal catalog to receive in-vehicle temperature values. This sensor can be used to represent the thermocouples in both cars, irrespective of how this temperature data is available within the vehicle networks. For more information, see Create and manage signal catalogs in the AWS documentation.
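As a rough sketch only (the node JSON shapes are worth confirming in the AWS IoT FleetWise API reference), registering such a sensor from the command line could look like the following, with the catalog name, branch, and unit as placeholder values:

$ aws iotfleetwise create-signal-catalog \
    --name my-signal-catalog \
    --nodes '[
        {"branch": {"fullyQualifiedName": "Vehicle"}},
        {"sensor": {"fullyQualifiedName": "Vehicle.InVehicleTemperature",
                    "dataType": "DOUBLE",
                    "unit": "Celsius"}}
      ]'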

Vehicle models
Vehicle models are virtual declarative representations that standardize the format of your vehicles and define relationships between signals in the vehicles. Vehicle models enforce consistent information across multiple vehicles of the same type so that you can quickly configure and create a vehicle fleet. In each vehicle model, you can add signals, including attributes, branches (signal hierarchies), sensors, and actuators.

You can define condition-based schemes to control what data to collect, such as in-vehicle temperature values that are greater than 40 degrees. You can also define time-based schemes to control how often to collect data. For more information, see Create and manage vehicle models in the AWS documentation.

When a decoder manifest is associated with a vehicle model, you can create a vehicle. Each vehicle corresponds to an AWS IoT thing. You can use an existing AWS IoT thing to create a vehicle or set AWS IoT FleetWise to automatically create an AWS IoT thing for your vehicle. For more information, see Provision vehicles in the AWS documentation. After you create vehicles, you can create campaigns for them.
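As a rough illustration rather than an official walkthrough, creating a vehicle from the command line and letting AWS IoT FleetWise create the corresponding AWS IoT thing could look like this; the vehicle name and manifest ARNs are placeholders:

$ aws iotfleetwise create-vehicle \
    --vehicle-name my-vehicle-01 \
    --model-manifest-arn arn:aws:iotfleetwise:us-east-1:123456789012:model-manifest/my-vehicle-model \
    --decoder-manifest-arn arn:aws:iotfleetwise:us-east-1:123456789012:decoder-manifest/my-decoder \
    --association-behavior CreateIotThing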

Campaigns
A campaign gives the AWS IoT FleetWise Edge Agent instructions on how to select, collect, and transfer data to the cloud. You can create a campaign with the vehicle attributes that you added when creating vehicles and a data collection scheme. You can define the data collection scheme manually, using either condition-based logical expressions such as $variable.myVehicle.InVehicleTemperature > 40.0, or time-based data collection with a period in milliseconds, such as every 10,000–60,000 milliseconds. To learn more, see Create a campaign in the AWS documentation.

After you create and approve the campaign, AWS IoT FleetWise automatically deploys the campaign to the listed vehicles. The AWS IoT FleetWise Edge Agent software doesn’t start collecting data until a running campaign is deployed to the vehicle. If you want to pause collecting data from vehicles connected to the campaign, on the Campaign summary page, choose Suspend. To resume collecting data from vehicles connected to the campaign, choose Resume.
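As a rough sketch only (the exact JSON shape of the collection scheme is worth confirming in the AWS IoT FleetWise API reference), a condition-based campaign could be created and approved from the command line along these lines, with all names and ARNs as placeholders:

$ aws iotfleetwise create-campaign \
    --name high-cabin-temperature \
    --signal-catalog-arn arn:aws:iotfleetwise:us-east-1:123456789012:signal-catalog/my-catalog \
    --target-arn arn:aws:iotfleetwise:us-east-1:123456789012:fleet/my-fleet \
    --signals-to-collect '[{"name": "myVehicle.InVehicleTemperature"}]' \
    --collection-scheme '{"conditionBasedCollectionScheme":
        {"expression": "$variable.myVehicle.InVehicleTemperature > 40.0",
         "minimumTriggerIntervalMs": 10000}}'

# Approve the campaign so it can be deployed to the listed vehicles
$ aws iotfleetwise update-campaign \
    --name high-cabin-temperature \
    --action APPROVE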

Demo – Visualizing Vehicle Data
Here is a demo that aims to show how AWS IoT FleetWise can make it easy to collect vehicle data and use it to build visualizing applications. In this demo, you can simulate two kinds of vehicles, an NXP GoldBox powered by an Automotive Grade Linux distribution that runs the AWS IoT FleetWise agent as an AWS IoT Greengrass component or a completely virtual vehicle implemented as an AWS Graviton ARM-based Amazon EC2 instance. To learn more, see the getting started guide and source code in the GitHub repository.

The vehicle in CARLA Simulator can self-drive or be driven with a game steering wheel connected to your desktop. You can watch a live demo video.

Data is collected by AWS IoT FleetWise and stored in the Amazon Timestream table, and visualized on a Grafana Dashboard.

Customer and Partner Voices
During the preview period, we heard lots of feedback from our customers and partners in the automotive industry, such as automakers, fleet operators, and automotive suppliers.

For example, Hyundai Motor Group (HMG) is a global vehicle manufacturer that offers consumers a technology-rich lineup of cars, sport utility vehicles, and electrified vehicles. HMG has used AWS services such as Amazon SageMaker to reduce the training time of its ML models for autonomous driving.

Hae Young Kwon, vice president and head of the infotainment development group at HMG, said:

“As a leading global vehicle manufacturer, we have come to appreciate the breadth and depth of AWS services to help create new connected vehicle capabilities. With more data available from our expanding global fleet of connected cars, we look forward to leveraging AWS IoT FleetWise to discover how we can build more personalized ownership experiences for our customers.”

LG CNS is a global IT service provider and AWS Premier Consulting Partner that is transforming smart transportation services by building an advanced transportation system that is convenient and safe by maximizing the operational efficiency of multiple modes of transport, including buses, subways, taxis, railways, and airplanes.

Jae Seung Lee, vice president at LG CNS, said:

“At LG CNS, we are committed to advancing the technology that is powering the future of transportation. By using AWS IoT FleetWise, we are creating a new data platform that allows us to ingest, analyze, and simulate vehicle conditions in real-time. With these advanced insights, our customers can gain a better understanding of their vehicles and, as a result, improve decision-making about their fleets.”

Bridgestone is a global leader in tires and rubber building on its expertise to provide solutions for safe and sustainable mobility. Bridgestone has worked with AWS for several years to develop a system that delivers insights derived from the interaction between a tire and a vehicle using advanced machine learning capabilities on Amazon SageMaker.

Brian Goldstine, president of mobility solutions and fleet management at Bridgestone Americas Inc., said:

“Bridgestone has been working with AWS to transform the digital services we provide to our automotive manufacturer, fleet, and retail customers. We look forward to exploring how AWS IoT FleetWise will make it easier for our customers to collect detailed tire data, which can provide new insights for their products and applications.”

Renesas Electronics Corporation is a global leader in microcontrollers, analog, power, and system on chips (SoC) products. Renesas launched cellular-to-cloud IoT development platforms and its cloud development kits to run on AWS IoT Core and FreeRTOS.

Yusuke Kawasaki, director at Renesas Electronics Corporation, said:

“The volume of connected vehicle data is forecast to increase dramatically over the next few years, driven by new and evolving customer expectations. As a result, Renesas is focused on addressing the needs of automotive engineers facing increasing system complexity. Incorporating AWS IoT FleetWise into our vehicle gateway solution will enable our customers to enjoy our market-ready approach for large-scale data collection and accelerate their cloud development strategy. We look forward to further collaborating with AWS to provide a better and simpler development environment for our customers.”

By working with AWS IoT FleetWise Partners, you can take advantage of solutions to streamline your IoT projects, reduce the risk of your efforts, and accelerate time to value. To learn more how AWS accelerates the automotive industry’s digital transformation, see AWS for Automotive.

Now Available
AWS IoT FleetWise is now generally available in the US East (N. Virginia) and Europe (Frankfurt) Regions. You pay for the vehicles you have created and messages per vehicle per month. Additional services used alongside AWS IoT FleetWise, such as AWS IoT Core and Amazon Timestream, are billed separately. For more detail, see the AWS IoT FleetWise pricing page.

To learn more, see the AWS IoT FleetWise resources page including documentations, videos, and blog posts. Please send feedback to AWS re:Post for AWS IoT FleetWise or through your usual AWS support contacts.

Channy

New – AWS Support App in Slack to Manage Support Cases

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-aws-support-app-in-slack-to-manage-support-cases/

ChatOps speeds up software development and operations by enabling DevOps teams to use chat clients and chatbots to communicate and run tasks. DevOps engineers have increasingly moved their monitoring, system management, continuous integration (CI), and continuous delivery (CD) workflows to chat applications in order to streamline activities in a single place and enable better collaboration within organizations.

For example, AWS Chatbot enables ChatOps for AWS to monitor and respond to operational events. AWS Chatbot processes AWS service notifications from Amazon Simple Notification Service (Amazon SNS) and forwards them to your Slack channel or Amazon Chime chat rooms so teams can analyze and act on them immediately, regardless of location. However, AWS Support customers had to switch applications from Slack to the AWS Support Center console to access and engage with AWS Support, moving them away from critical operation channels where essential group communications take place.

Today we are announcing the new AWS Support App, which enables you to manage your technical, billing, and account support cases, request service quota increases, and initiate live chats with AWS Support engineers directly in your Slack channels. You can then search for, respond to, and participate in group chats with AWS Support engineers to resolve support cases from your Slack channels.

With the AWS Support App in Slack, you can integrate AWS Support into your team workflows to improve collaboration. When creating, updating, or monitoring a support case status, your team members keep up to date in real time. They can also easily search previous cases to find recommendations and solutions and instantly share those details with all team members without having to switch applications.

Configuring the AWS Support App in Slack
The AWS Support App in Slack is now available to all customers with Business, Enterprise On-ramp, or Enterprise Support at no additional charge. If you have a Basic or Developer plan, you can upgrade your support plan.

For connecting your Slack workspace and channel for your organization, you should have access to add apps to your Slack workspace and an AWS Identity and Access Management (IAM) user or role with the required permissions. To learn more, see examples of IAM policies to manage access.

To get started with the AWS Support App in Slack, visit the AWS Support Center console and choose Authorize workspace.

When prompted to give permissions to access your Slack workspace, you can select your workspace to connect and choose Allow.

Now you can see your workspace on the Slack configuration page. To add more workspaces, choose Add workspace and repeat this step. You can add up to five workspaces to your account.

After you authorize your Slack workspace, you can add your Slack channels by choosing Add channel. You can add up to 20 channels for a single account. A single Slack channel can have up to 100 AWS accounts.

Choose the workspace name that you previously authorized, enter the Slack channel ID (the value included in the channel link that looks like C01234A5BCD) for the channel where you invited the AWS Support App with the /invite @awssupport command, and select the IAM role that you created for the AWS Support App.

You can also configure how you get notified about cases by choosing at least one of the notification types: New and reopened cases, Case correspondences, or Resolved cases. If you select High-severity cases, you are notified only for cases that affect a production system or a higher severity level.

After adding a new channel, you can now open the Slack channel and manage support cases and live chats with AWS Support engineers.

Managing Support Cases in the Slack Channel
After you add your Slack workspace and channel, you can create, search, resolve, and reopen your support case in your Slack channel.

In your Slack channel, when you enter the /awssupport create-case command, you can create a support case and specify the subject, description, issue type, service, category, severity, and contact method: either email and Slack notifications or live chat in Slack.

If you choose Live chat in Slack, you can enter the names of other members. AWS Support App will create a new chat channel for the created support case and will automatically add you, the members that you specified, and AWS Support engineers.

After reviewing the information you provided, you can create the support case. You can also choose Share to channel to share the case details with the channel.

In your Slack channel, when you enter the /awssupport search-case command, you can search support cases for a specific AWS account, date range, and case status, such as open or resolved.

You can choose See details to see more information about a case. When you see details for a support case, you can resolve or reopen specific support cases directly.

Initiating Live Chat Sessions with AWS Support Engineers
If you chose the live chat option when you created your case, the AWS Support App creates a chat channel for you and an AWS Support engineer. You can use this chat channel to communicate with a support engineer and any others that you invited to the live chat.

To join a live chat session with AWS Support, navigate to the channel name that the AWS Support App created for you. The live channel name contains your support case ID, such as awscase-1234567890. Anyone who joins your live chat channel can view details about this specific support case. We strongly recommend that you only add users that require access to your support cases.

When a support engineer joins the channel, you can chat with a support engineer about your support case and upload any file attachments to the channel. The AWS Support App automatically saves your files and chat log to your case correspondence.

To stop chatting with the support agent, choose End chat or enter the /awssupport endchat command. The support agent will leave the channel and the AWS Support App will stop recording the live chat. You can find the chat history attached to the case correspondence for this support case. If the issue has been resolved, you can choose Resolve case from the pinned message to show the case details in the chat channel or enter the /awssupport resolve command.

When you manage support cases or join live chats for your account in the Slack channel, you can view the case correspondences to determine whether the case has been updated in the Slack channel. You can also audit the Support API calls the application made on behalf of users via logs in AWS CloudTrail. To learn more, see Logging AWS Support API calls using AWS CloudTrail.

Requesting Service Quota Increases
In your Slack channel, when you enter the /awssupport service-quota-increase command, you can request to increase the service quota for a specific AWS account, AWS Region, service name, quota name, and requested value for the quota increase.

Now Available
The AWS Support App in Slack is now available to all customers with Business, Enterprise On-ramp, or Enterprise Support at no additional charge. If you have a Basic or Developer plan, you can upgrade your support plan. To learn more, see Manage support cases with the AWS Support App or contact your usual AWS Support contacts.

Channy

Happy 10th Anniversary, Amazon S3 Glacier – A Decade of Cold Storage in the Cloud

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/happy-10th-anniversary-amazon-s3-glacier-a-decade-of-cold-storage-in-the-cloud/

Ten years ago, on August 20, 2012, AWS announced the general availability of Amazon Glacier, secure, reliable, and extremely low-cost storage designed for data archiving and backup. At the time, I was an AWS customer, and it almost felt like an April Fools’ joke: long-term, secure, and durable cloud storage that allowed me to archive large amounts of data at a very low cost.

In Jeff’s original blog post for this launch, he noted that:

Glacier provides, at a cost as low as $0.01 (one US penny, one one-hundredth of a dollar) per Gigabyte per month, extremely low-cost archive storage. You can store a little bit, or you can store a lot (terabytes, petabytes, and beyond). There’s no upfront fee, and you pay only for the storage that you use. You don’t have to worry about capacity planning, and you will never run out of storage space.

Ten years later, Amazon S3 Glacier has evolved to be the best place in the world for you to store your archive data. The Amazon S3 Glacier storage classes are purpose-built for data archiving, providing you with the highest performance, most retrieval flexibility, and the lowest cost archive storage in the cloud.

You can now choose from three archive storage classes optimized for different access patterns and storage duration – Amazon S3 Glacier Instant Retrieval, Amazon S3 Glacier Flexible Retrieval (formerly Amazon S3 Glacier), and Amazon S3 Glacier Deep Archive. We’ll dive into each of these storage classes in a bit.

A Decade of Innovation in Amazon S3 Glacier
To understand how we got here, we’ll walk through the last decade and revisit some of the most significant Amazon S3 Glacier launches that fundamentally changed archive storage forever:

August 2012 – Amazon Glacier: Archival Storage for One Penny per GB per Month
We launched Amazon Glacier to store any amount of data with high durability at a cost that allows you to get rid of your tape libraries and all the operational complexity and overhead that have been part of data archiving for decades. Amazon Glacier was modeled on S3’s durability and dependability but designed and built from the ground up to offer archival storage at an extremely low cost. At that time, Glacier introduced the concept of a “vault” for storing archival data. You could retrieve your archival data by initiating a request, and the data was made available for download in 3–5 hours.
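If you want to see what that original workflow looks like from the command line, the native Glacier API is still available today; here is a minimal sketch, where the vault name and file are placeholders (the hyphen for --account-id means the current account):

$ aws glacier create-vault --account-id - --vault-name my-archive-vault

# Upload an archive to the vault; note the returned archiveId, which you
# need later to retrieve or delete the archive
$ aws glacier upload-archive \
    --account-id - \
    --vault-name my-archive-vault \
    --body backup-2012.tar.gz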

November 2012 – Archiving Amazon S3 Data to Glacier
While Glacier was purpose-built from the ground up for archival data, many customers had object data that originated in S3 warmer storage that they would eventually want to move to colder storage. To make that easy for customers, Amazon S3’s Lifecycle Management (aka Lifecycle Rule) integrated S3 and Glacier and made the details visible via the storage class of each object. Lifecycle Management allows you to define time-based rules that can start Transition (changing S3 storage class to Glacier) and Expiration (deletion of objects). In 2014, we combined the flexibility of S3 versioned objects with Glacier, helping you to further reduce your overall storage costs.
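Such a lifecycle rule is a small piece of bucket configuration; here is a minimal sketch with the AWS CLI, where the bucket name, prefix, and day counts are placeholders to adapt:

$ aws s3api put-bucket-lifecycle-configuration \
    --bucket my-archive-bucket \
    --lifecycle-configuration '{
        "Rules": [{
          "ID": "archive-old-logs",
          "Status": "Enabled",
          "Filter": {"Prefix": "logs/"},
          "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
          "Expiration": {"Days": 365}
        }]
      }'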

November 2016 – Glacier Price Reductions and Additional Retrieval Options for Glacier
As part of AWS’s long-term focus on reducing costs and passing along those savings to customers, we reduced the price of Glacier storage to $0.004 (less than half a cent) per GB per month in the US East (N. Virginia) Region, down from $0.007 in 2015 and $0.010 in 2012. To keep storage costs very low while giving you flexibility in how quickly you can retrieve your data, we also introduced two more retrieval options, priced based on the amount of data that you stored in Glacier and the rate at which you retrieved it. You could select expedited retrieval (typically taking 1–5 minutes), bulk retrieval (5–12 hours), or the existing standard retrieval method (3–5 hours).
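With the native Glacier API, the retrieval tier is chosen when you start a retrieval job; a minimal sketch with the AWS CLI (the vault name and archive ID are placeholders) looks like this:

$ aws glacier initiate-job \
    --account-id - \
    --vault-name my-archive-vault \
    --job-parameters '{
        "Type": "archive-retrieval",
        "ArchiveId": "EXAMPLE-ARCHIVE-ID",
        "Tier": "Expedited"
      }'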

November 2018 – Amazon S3 Glacier Storage Class to Integrate S3 Experiences
Glacier customers appreciated the way they could easily move data from S3 to Glacier via S3 lifecycle management and wanted us to expand on that capability so they could use the most common S3 APIs to operate directly on S3 Glacier objects. So, we added support for the S3 PUT API to S3 Glacier, which enables you to use the standard S3 PUT API and select any storage class, including S3 Glacier, to store the data. Data can be stored directly in S3 Glacier, eliminating the need to upload to S3 Standard and immediately transition to S3 Glacier with a zero-day lifecycle policy. So, you could PUT to S3 Glacier like any other S3 storage class.
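In practice, that means an upload can target an archive storage class directly; for example (the bucket name and key are placeholders):

$ aws s3 cp quarterly-backup.tar.gz \
    s3://my-archive-bucket/backups/quarterly-backup.tar.gz \
    --storage-class GLACIER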

March 2019 – Amazon S3 Glacier Deep Archive – the Lowest Cost Storage in the Cloud
While the original Glacier service offered an extremely low price for archival storage, we challenged ourselves to see if we could find a way to invent an even lower priced storage offering for very cold data. The Amazon S3 Glacier Deep Archive storage class delivers the lowest cost storage, up to 75 percent lower cost (than S3 Glacier Flexible Retrieval), for long-lived archive data that is accessed less than once per year and is retrieved asynchronously. At just $0.00099 per GB-month (or $1 per TB-month), S3 Glacier Deep Archive offers the lowest cost storage in the cloud at prices significantly lower than storing and maintaining data in on-premises tape or archiving data off-site.

November 2020 – Amazon S3 Intelligent-Tiering adds Archive Access and Deep Archive Access tiers
In November 2018, we launched Amazon S3 Intelligent-Tiering, the only cloud storage class that delivers automatic storage cost savings, up to 95 percent when data access patterns change, without performance impact or operational overhead. In order to offer customers the simplicity and flexibility of S3 Intelligent-Tiering with the low storage cost of archival data, we added the Archive Access tier, which provides the same performance and pricing as the S3 Glacier storage class, as well as the Deep Archive Access tier, which offers the same performance and pricing as the S3 Glacier Deep Archive storage class.
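Opting a bucket into these optional archive tiers is done with a per-bucket Intelligent-Tiering configuration; a minimal sketch (the bucket name, configuration ID, and day thresholds are placeholders) looks like this:

$ aws s3api put-bucket-intelligent-tiering-configuration \
    --bucket my-data-bucket \
    --id archive-tiering \
    --intelligent-tiering-configuration '{
        "Id": "archive-tiering",
        "Status": "Enabled",
        "Tierings": [
          {"Days": 90,  "AccessTier": "ARCHIVE_ACCESS"},
          {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"}
        ]
      }'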

November 2021 – Amazon S3 Glacier Flexible Retrieval and S3 Glacier Instant Retrieval
The Amazon S3 Glacier storage class was renamed to Amazon S3 Glacier Flexible Retrieval and now includes free bulk retrievals along with an additional 10 percent price reduction across all Regions, making it optimized for use cases such as backup and disaster recovery.

Additionally, customers asked us for a storage solution that had the low costs of Glacier but allowed for fast access when data was needed very quickly. So, we introduced Amazon S3 Glacier Instant Retrieval, a new archive storage class that delivers the lowest cost storage for long-lived data that is rarely accessed and requires milliseconds retrieval. You can save up to 68 percent on storage costs compared to using the S3 Standard-Infrequent Access (S3 Standard-IA) storage class when your data is accessed once per quarter.

The Amazon S3 Intelligent-Tiering storage class also recently added a new Archive Instant Access tier, which provides the same performance and pricing as the S3 Glacier Instant Retrieval storage class and delivers automatic cost savings of up to 68 percent for customers using S3 Intelligent-Tiering with long-lived data.

Then and Now
Customers across all industries and verticals use the S3 Glacier storage classes for every imaginable archival workload. Accessing and using the S3 Glacier storage classes through the S3 APIs and S3 console provides enhanced functionality for data management and cost optimization.

As we discussed above, you can now choose from three archive storage classes optimized for different access patterns and storage duration:

  • S3 Glacier Instant Retrieval – For archive data that needs immediate access, such as medical images, news media assets, or genomics data, choose the S3 Glacier Instant Retrieval storage class, an archive storage class that delivers the lowest cost storage with milliseconds retrieval.
  • S3 Glacier Flexible Retrieval – For archive data that does not require immediate access but needs to have the flexibility to retrieve large sets of data at no cost, such as backup or disaster recovery use cases, choose the S3 Glacier Flexible Retrieval storage class, with retrieval in minutes or free bulk retrievals in 12 hours.
  • S3 Glacier Deep Archive – For retaining data for 7–10 years or longer to meet customer needs and regulatory compliance requirements, such as financial services, healthcare, media and entertainment, and public sector, choose the S3 Glacier Deep Archive storage class, the lowest cost storage in the cloud with data retrieval within 12–48 hours.

Watch a brief introduction video for an overview of the S3 Glacier storage classes.

All S3 Glacier storage classes are designed for 99.999999999% (11 9s) of durability for objects. Data is redundantly stored across three or more Availability Zones that are physically separated within an AWS Region. Here are some comparisons across the S3 Glacier storage classes at a glance:

S3 Glacier Instant Retrieval vs. S3 Glacier Flexible Retrieval vs. S3 Glacier Deep Archive:

  • Availability: 99.9% (Instant Retrieval), 99.99% (Flexible Retrieval), 99.99% (Deep Archive)
  • Availability SLA: 99% (Instant Retrieval), 99.9% (Flexible Retrieval), 99.9% (Deep Archive)
  • Minimum capacity charge per object: 128 KB (Instant Retrieval), 40 KB (Flexible Retrieval), 40 KB (Deep Archive)
  • Minimum storage duration charge: 90 days (Instant Retrieval), 90 days (Flexible Retrieval), 180 days (Deep Archive)
  • Retrieval charge: per GB for all three storage classes
  • Retrieval time: milliseconds (Instant Retrieval); Expedited 1–5 minutes, Standard 3–5 hours, or free Bulk 5–12 hours (Flexible Retrieval); Standard within 12 hours or Bulk within 48 hours (Deep Archive)

For data with changing access patterns that you want to automatically archive based on the last access of that data, choose the S3 Intelligent-Tiering storage class. Doing so will optimize storage costs by automatically moving data to the most cost-effective access tier when access patterns change. Its Archive Instant Access, Archive Access, and Deep Archive Access tiers have the same performance as S3 Glacier Instant Retrieval, S3 Glacier Flexible Retrieval, and S3 Glacier Deep Archive respectively. To learn more, see the blog post Automatically archive and restore data with Amazon S3 Intelligent-Tiering.

To get started with S3 Glacier, see the blog post Best practices for archiving large datasets with AWS for key considerations and actions when planning your cold data storage patterns. You can also use a hands-on lab tutorial that will help you get started with the S3 Glacier storage classes in just 20 minutes, and start archiving your data in the S3 Glacier storage classes in the S3 console.

Happy Birthday, Amazon S3 Glacier!
During the last AWS Storage Day 2022, Kevin Miller, VP & GM of Amazon S3, mentioned the 10th anniversary of S3 Glacier and its pace of innovation for many customer use cases throughout his interview with theCUBE.

In this expanding world of data growth, you have to have an archiving strategy. Everyone has archival data — every company, every vertical, and every industry. There is an archiving need not only for companies that have been around for a while but also for digital native businesses.

Lots of AWS customers such as Nasdaq, Electronic Arts, and NASCAR have used S3 Glacier storage classes for their backup and archiving workloads. The following are some additional recent customer-authored blogs focusing on AWS archiving best practices from customers in the financial, media, gaming, and software industries.

A big thank you to all of our S3 Glacier customers from around the world! Over 90 percent of S3’s roadmap has come directly from feedback from customers like you. We will never stop listening to you, as your feedback and ideas are essential to how we improve the service. Thank you for trusting us and for constantly raising the bar and pushing us to improve to lower costs, simplify your storage, increase your agility, and allow you to innovate faster.

In accordance with Customer Obsession, one of the Amazon Leadership Principles, your feedback is always welcome! If you want to see new S3 Glacier features and capabilities, please send any feedback to AWS re:Post for S3 Glacier or through your usual AWS Support contacts.

– Channy

New – HTTP/3 Support for Amazon CloudFront

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-http-3-support-for-amazon-cloudfront/

Amazon CloudFront is a content delivery network (CDN) service, a network of interconnected servers located geographically closer to users so that content reaches their devices much faster. Amazon CloudFront reduces latency by delivering data through 410+ globally dispersed Points of Presence (PoPs) with automated network mapping and intelligent routing.

With Amazon CloudFront, content, API requests and responses, or applications can be delivered over Hypertext Transfer Protocol (HTTP) versions 1.1 and 2.0, using the latest version of Transport Layer Security (TLS) to encrypt and secure communication between the user client and CloudFront.

Today we are adding HTTP version 3.0 (HTTP/3) support for Amazon CloudFront. HTTP/3 uses QUIC, a user datagram protocol-based, stream-multiplexed, and secure transport protocol that combines and improves upon the capabilities of existing transmission control protocol (TCP), TLS, and HTTP/2. Now, you can enable HTTP/3 for end user connections in all new and existing CloudFront distributions on all edge locations worldwide, and there is no additional charge for using this feature.

What is HTTP/3?
HTTP/3 uses QUIC to overcome many of TCP’s limitations and bring those benefits to HTTP. When using existing HTTP/2 over TCP and TLS, TCP needs a handshake to establish a session between a client and server, and TLS also needs its own handshake to ensure that the session is secured. Each handshake has to make the full round trip between client and server, which can take a long time when the client and server are far apart, network-wise. But QUIC only needs a single handshake to establish a secure session.

Also, TCP is understood and manipulated by a myriad of different middleboxes, such as firewalls and network address translation (NAT) devices. QUIC uses UDP as its basis to allow packet flows in an enterprise or public network and is fully encrypted, including the metadata, which makes middleboxes unable to inspect or manipulate its details.

HTTP/3 streams are multiplexed independently to eliminate head-of-line blocking between requests and responses. This is possible because stream multiplexing occurs in the transport layer as opposed to the application layer like HTTP/2 over TCP. This enables web applications to perform faster, especially over slow networks and latency-sensitive connections.

Benefits of HTTP/3 on CloudFront
Our customers always want to provide a faster, more responsive, and more secure experience on the web for their end users. HTTP/3 provides benefits to all CloudFront customers in the form of faster connection times, stream multiplexing, client-side connection migration, and fewer round trips in the handshake process to reduce error rates.

QUIC connections over UDP support connection reuse with a connection ID that is independent of IP address/port tuples, so connections survive network changes without interruption or impact for users. Customers operating in countries with low network connectivity will see improved performance from their applications.

CloudFront’s HTTP/3 support provides enhanced security built on top of s2n-quic, an open-source Rust implementation of the QUIC protocol added to our set of AWS encryption open-source libraries, both with a strong emphasis on efficiency and performance.

If you enable HTTP/3 in your CloudFront distributions, users can make HTTP/3 viewer requests to CloudFront edge locations. Past the edge location, we have highly reliable networks within the AWS Cloud, and CloudFront continues to use HTTP/1.1 for origin fetches. So, you don’t need to make any server-side changes to make your content accessible via HTTP/3.

For some types of applications, like those requiring an HTTP client library to make HTTP requests, customers may need to update their HTTP client library to a version that supports HTTP/3. But if for some operational reason clients cannot establish a QUIC connection, they can fall back to another supported protocol such as HTTP/1.1 or HTTP/2.

How to Enable HTTP/3
To enable HTTP/3 connections, you can edit the distribution configuration through the CloudFront console. You can select HTTP/3 in Supported HTTP versions on an existing distribution or create a new distribution, without any changes to your origin. You can also use the UpdateDistribution API or a CloudFormation template.
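From the AWS CLI, the change amounts to setting the HttpVersion field in the distribution configuration; here is a rough sketch, where the distribution ID and ETag value are placeholders:

# Fetch the current configuration; note the ETag value in the output
$ aws cloudfront get-distribution-config --id E1A2B3C4D5E6F7 > dist.json

# Save only the DistributionConfig object to dist-config.json, change
# "HttpVersion" to "http2and3", then push the update using the ETag value
$ aws cloudfront update-distribution \
    --id E1A2B3C4D5E6F7 \
    --if-match E2QWRUHAPOMQZL \
    --distribution-config file://dist-config.json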

After deploying your distribution, you can connect with a browser that supports HTTP/3, such as the latest versions of Google Chrome, Mozilla Firefox, and Microsoft Edge, or Apple Safari after turning HTTP/3 on manually. To learn more about web browser support, see the Can I Use – HTTP/3 Support page.

From web developer tools in your browser, you can see the HTTP/3 requests made when a page is loaded from the CloudFront. The image below is an example of Mozilla Firefox.

You can also add HTTP/3 support to Curl and test from the command line:

$ curl --http3 -i https://d1e0fmnut9xxxxx.cloudfront.net/speed.html
HTTP/3 200
content-type: text/html
content-length: 9286
date: Fri, 05 Aug 2022 15:49:52 GMT
last-modified: Thu, 28 Jul 2022 00:50:38 GMT
etag: "d928997023f6479537940324aeddabb3"
x-amz-version-id: mdUmFuUfVaSHPseoVPRoOKGuUkzWeUhK
accept-ranges: bytes
server: AmazonS3
vary: Origin
x-cache: Miss from cloudfront
via: 1.1 6e4f43c5af08f740d02d21f990dfbe80.cloudfront.net (CloudFront)
x-amz-cf-pop: ICN54-C2
alt-svc: h3=":443"; ma=86400
x-amz-cf-id: 6fy8rrUrtqDMrgoc7iJ73kzzXzHz7LQDg73R0lez7_nEXa3h9uAlCQ==

Customer Stories
Several AWS customers, including Snap, Zillow, AC3/Movember, Audible, and Skyscanner, have already enabled HTTP/3 on their CloudFront distributions. Here are some of their voices:

Snap Inc. is a social media company that offers Snapchat, an app that gives its community around the world a fast and fun way to connect with close friends. On AWS, Snap now supports more than 306 million Snapchat users sending over 5.4 billion Snaps daily with 20 percent less latency than its prior architecture.

Mahmoud Ragab, Software Engineering Manager at Snapchat said:

“Snapchat helps millions of people around the world to share moments with friends. At Snapchat, we strive to be the fastest way to communicate. This is why we have been partnering with Amazon Cloudfront for fast, high-performance, low latency content delivery, leveraging QUIC on Cloudfront.

It offers significant advantages while sending and receiving content, especially in networks with lossy signals and intermittent connectivity. Improvements offered by QUIC, like zero round-trip time (0-RTT) connection setup and improved congestion control enables an average of 10% reduction in time to first byte (TTFB) while lowering overall error rates. Lower network latencies and errors make Snapchat better for people all over the world.

With early access to QUIC, we’ve been able to experiment and quickly iterate and improve server-side implementation and optimize integration between the client and the server. Both companies will continue to collaborate together as QUIC is made more widely available.”

Zillow is a real estate tech company that offers its customers an on-demand experience for selling, buying, renting, and financing with transparency and a nearly seamless end-to-end service. Since 2015, Zillow has increased the availability of its imaging system by using Amazon S3 and Amazon CloudFront.

Craig Link, Chief Cloud Architect at Zillow said:

“We are excited about the launch of HTTP/3 support for Amazon CloudFront. Enabling HTTP/3 on CloudFront was a seamless transition and our synthetic test and ad-hoc usage continued working without issue.”

AC3 is an Australia-based AWS Managed Services partner and has supported our customer, Movember Foundation, one of the leading charities for men’s health. Running an international charity that handles donations, data, events, and localized websites in 21 countries can pose some technical challenges. Born in the cloud, Movember has leveraged AWS technology in adopting new working models, ensuring a flexible IT platform, and innovating faster.

Greg Cockburn, Head of Hyperscale Cloud at AC3 said:

“AC3 is excited to work with their longtime partner Movember enabling HTTP3 on their CloudFront distributions serving web and API frontends and is encouraged by the performance improvements seen in the initial results.”

Now Available
The HTTP/3 support for Amazon CloudFront is now available in all 410+ CloudFront edge locations worldwide with no additional charge for using this feature. To learn more, see the FAQ and Developer Guide of Amazon CloudFront. Please send feedback to AWS re:Post for Amazon CloudFront or through your usual AWS support contacts.

Channy

AWS Week in Review – August 15, 2022

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/aws-week-in-review-august-15-2022/

I love the AWS Twitch channel for watching interesting online live shows such as AWS On Air, Containers from the Couch, and Serverless Office Hours.

Last week, AWS Storage Day 2022 was hosted virtually on the AWS Twitch channel and covered recent announcements and insights that address customers’ needs to reduce and optimize storage costs and build data resiliency into their organization. For example, we pre-announced Amazon File Cache, an upcoming new service on AWS that accelerates and simplifies hybrid cloud workloads. To learn more, watch the on-demand recording.

Two weeks ago, AWS Silicon Innovation Day 2022 was also hosted on the AWS Twitch channel. This event covered an overview of our history of silicon development and provided useful sessions on specific AWS chip innovations such as AWS Nitro, AWS Graviton, AWS Inferentia, and AWS Trainium. To learn more, watch the on-demand recording. If you don’t want to miss useful live events or online shows like these, check out the upcoming live schedule!

Last Week’s Launches
Here are some launches that caught my eye last week:

AWS Private 5G – With the general availability of AWS Private 5G, you can easily build your own private mobile network, with the required hardware and software for 4G/LTE mobile networks provided by AWS. This cool new service lets you easily install, operate, and scale a high-reliability, low-latency private cellular network in a matter of days, and it does not require any specialized expertise. You pay only for the network coverage and capacity that you need.

AWS DeepRacer Student Community Races – Educators and event organizers can now create their own private virtual autonomous racing leagues for students, powered by 1/18th scale race cars driven by reinforcement learning. They can select their own track, race date, and time and invite students to participate through a unique link for their event. To learn more, see the AWS DeepRacer Developer Guide.

Amazon SageMaker Updates – Amazon SageMaker Automatic Model Tuning now supports specifying multiple alternate SageMaker training instance types to make tuning jobs more robust when the preferred instance type is not available due to insufficient capacity. SageMaker Model Pipelines supports securely sharing pipeline entities across AWS accounts and access to shared pipelines through direct API calls. SageMaker Canvas expands capabilities to better prepare and analyze data, including replacing missing values and outliers and the flexibility to choose different sample sizes for your datasets.

Amazon Personalize Updates – Amazon Personalize supports incremental bulk dataset imports, a new option for updating your data and improving the quality of your recommendations. Also, Amazon Personalize allows you to promote specific items in all users’ recommendations based on rules that align with your business goals.

AWS Partner Program Updates – We announce the new AWS Transfer Family Delivery Program for AWS Partners that helps customers build sophisticated Managed File Transfer (MFT) and business-to-business (B2B) file exchange solutions with AWS Transfer Family. Also, we introduce the new AWS Supply Chain Competency, featuring top AWS Partners who provide professional services and cloud-native supply chain solutions on AWS.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Here are some other news items that you may find interesting:

AWS CDK for Terraform – Two years ago, AWS began collaborating with HashiCorp to develop Cloud Development Kit for Terraform (CDKTF), an open-source tool that provides a developer-friendly workflow for deploying cloud infrastructure with Terraform in their preferred programming language. The CDKTF is now generally available, so try CDK for Terraform and AWS CDK.
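If you want to try it, bootstrapping a CDKTF project takes only a few commands; this is a rough sketch, and the project directory name and provider version constraint are placeholders:

# Install the CDKTF CLI and create a new TypeScript project
$ npm install --global cdktf-cli@latest
$ mkdir my-cdktf-app && cd my-cdktf-app
$ cdktf init --template=typescript --providers="aws@~>4.0"

# Synthesize and deploy the stack defined in main.ts
$ cdktf deploy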

Smithy Interface Definition Language (IDL) 2.0 – Smithy is Amazon’s next-generation API modeling language, based on our experience building tens of thousands of services and generating SDKs. This release focuses on improving the developer experience of authoring Smithy models and using code generated from Smithy models.

Serverless Snippets Collection – The AWS Serverless Developer Advocate team introduces the snippets collection to enable reusable, tested, and recommended snippets driven and maintained by the community. Builders can use serverless snippets to find and integrate tools and code examples to help with their development workflow. I recommend searching other useful resources such as Serverless patterns and workflows collection to get started on your serverless application.

Upcoming AWS Events
Check your calendars and sign up for these AWS events:

AWS Summit

AWS Summit – Registration is open for upcoming in-person AWS Summits that might be close to you in August and September: Anaheim (August 18), Chicago (August 28), Canberra (August 31), Ottawa (September 8), New Delhi (September 9), and Mexico City (September 21–22).

AWS Innovate – Data Edition – On August 23, learn how a modern data strategy can support your present and future use cases, including steps to build an end-to-end data solution to store and access, analyze and visualize, and even predict.

AWS Innovate – For Every Application Edition – On August 25, learn about a wide selection of AWS solutions across compute, storage, networking, hybrid, and edge infrastructure to help you scale application resources seamlessly and optimally.

Although these two Innovate events will be held in the Asia Pacific and Japan time zones, you can view on-demand videos for two months following your registration.

Also, we are preparing 16 upcoming online tech talks on August 15–26 that cover a range of topics and expertise levels and feature technical deep dives, demonstrations, customer examples, and live Q&A with AWS experts.

That’s all for this week. Check back next Monday for another Week in Review!

— Channy

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

New – Run Visual Studio Software on Amazon EC2 with User-Based License Model

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-run-visual-studio-software-on-amazon-ec2-with-user-based-license-model/

We announce the general availability of license-included Visual Studio software on Amazon Elastic Compute Cloud (Amazon EC2) instances. You can now purchase fully compliant AWS-provided licenses of Visual Studio with a per-user subscription fee. Amazon EC2 provides preconfigured Amazon Machine Images (AMIs) of Visual Studio Enterprise 2022 and Visual Studio Professional 2022. You can launch on-demand Windows instances including Visual Studio and Windows Server licenses without long-term licensing commitments.

Amazon EC2 provides a broad choice of instances, so customers not only pay for what their end users actually use but can also give those users the right capacity and hardware. You simply launch EC2 instances using license-included AMIs, and multiple authorized users can connect to them using Remote Desktop software. Your administrator can authorize users centrally using AWS License Manager and AWS Managed Microsoft Active Directory (AD).

Configure Visual Studio License with AWS License Manager
As a prerequisite, your administrator needs to create an instance of AWS Managed Microsoft AD and allow AWS License Manager to onboard to it by accepting the required permissions. To set up authorized users, see the AWS Managed Microsoft AD documentation.
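
If you prefer to script this prerequisite, the following is a minimal sketch with the AWS CLI; the directory name, password, VPC, and subnet values are placeholders, and the directory ID reuses the example value shown later in this post:

$ aws ds create-microsoft-ad \
         --name corp.example.com \
         --password '<StrongPassword>' \
         --vpc-settings VpcId=vpc-0abc1234,SubnetIds=subnet-0aaa1111,subnet-0bbb2222 \
         --edition Standard

# Register the directory with License Manager for a user-based subscription product
$ aws license-manager-user-subscriptions register-identity-provider \
         --product VISUAL_STUDIO_PROFESSIONAL \
         --identity-provider '{"ActiveDirectoryIdentityProvider": {"DirectoryId": "d-9067b110b5"}}'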

AWS License Manager makes it easier to manage your software licenses from vendors such as Microsoft, SAP, Oracle, and IBM across AWS and on-premises environments. To display a list of available Visual Studio software licenses, select User-based subscriptions in the AWS License Manager console.

You can see the products that support user-based subscriptions. Each product has a descriptive name, a count of the users subscribed to use it, and an indication of whether the subscription has been activated for use with a directory. You are also required to purchase Remote Desktop Services SAL licenses in the same way as Visual Studio, by authorizing users for those licenses.

When you select Visual Studio Professional, you can see product details and subscribed users. By selecting Subscribe users, you can add authorized users to the license of Visual Studio Professional software.

You can also perform these administrative tasks with the AWS Command Line Interface (AWS CLI) via the AWS License Manager APIs. For example, you can subscribe a user to the product in your Active Directory:

$ aws license-manager-user-subscriptions start-product-subscription \
         --username vscode2 \
         --product VISUAL_STUDIO_PROFESSIONAL \
         --identity-provider '{"ActiveDirectoryIdentityProvider": {"DirectoryId": "d-9067b110b5"}}' \
         --endpoint-url https://license-manager-user-subscriptions.us-east-1.amazonaws.com
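
You can then confirm the subscription; here is a hedged sketch that lists the subscribed users for the product, reusing the same example directory ID:

$ aws license-manager-user-subscriptions list-product-subscriptions \
         --product VISUAL_STUDIO_PROFESSIONAL \
         --identity-provider '{"ActiveDirectoryIdentityProvider": {"DirectoryId": "d-9067b110b5"}}'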

To launch a Windows instance with preconfigured Visual Studio software, go to the EC2 console and select Launch instances. In the Application and OS Images (Amazon Machine Image) section, search for “Visual Studio on EC2,” and you can find AMIs under the Quickstart AMIs and AWS Marketplace AMIs tabs.
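
If you would rather search for these AMIs from the command line, here is a rough sketch; the name filter is an assumption and may not match the exact AMI naming:

$ aws ec2 describe-images \
         --owners amazon \
         --filters "Name=name,Values=*Visual Studio*" "Name=platform,Values=windows" \
         --query 'Images[].{Id:ImageId,Name:Name}' \
         --output table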

After launching your Windows instance, your administrator associates a user with the product in the Instances screen of the License Manager console. The screen lists instances that were launched from an AMI providing the specified product, and you can associate users with them.
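
This console association step also has a CLI equivalent; a minimal sketch follows, where the instance ID is a placeholder:

$ aws license-manager-user-subscriptions associate-user \
         --username vscode2 \
         --instance-id i-0123456789abcdef0 \
         --identity-provider '{"ActiveDirectoryIdentityProvider": {"DirectoryId": "d-9067b110b5"}}'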

These steps will be performed by the administrators who are responsible for managing users, instances, and costs across the organization. To learn more about administrative tasks, see User-based subscriptions in AWS License Manager.

Run Visual Studio Software on EC2 Instances
Once administrators authorize end users and launch the instances, you can connect remotely to the Visual Studio instances through Remote Desktop software, using the AD account information shared by your administrator. That’s all!

The instances deployed for user-based subscriptions must remain as managed nodes with AWS Systems Manager. For more information, see Troubleshooting managed node availability and Troubleshooting SSM Agent in the AWS Systems Manager User Guide.
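
To confirm that an instance is still reporting as a managed node, here is a hedged check with the Systems Manager CLI; the instance ID is a placeholder:

$ aws ssm describe-instance-information \
         --filters "Key=InstanceIds,Values=i-0123456789abcdef0" \
         --query 'InstanceInformationList[].{Id:InstanceId,Ping:PingStatus,Agent:AgentVersion}'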

Now Available
License-included Visual Studio on Amazon EC2 is now available in all AWS commercial and public Regions. You are billed per user for Visual Studio licenses through a monthly subscription and per vCPU for license-included Windows Server instances on EC2. You can use On-Demand Instances, Reserved Instances, and Savings Plans pricing models as you do today for EC2 instances.

To learn more, visit our License Manager User Based Subscriptions documentation, and please send any feedback to AWS re:Post for EC2 or through your usual AWS Support contacts.

— Channy