Tag Archives: S3 Select

FICO: Fraud Detection and Anti-Money Laundering with AWS Lambda and AWS Step Functions

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/fico-fraud-detection-and-anti-money-laundering-with-aws-lambda-and-aws-step-functions/

In this episode of This is My Architecture, filmed in 2018 on the last day of re:Invent (a learning conference hosted by Amazon Web Services for the global cloud computing community), FICO lead Software Engineer Sven Ahlfeld talks to AWS Solutions Architect Tom Jones about how the company uses a combination of AWS Lambda and AWS Step Functions to architect an on-demand solution for fraud detection and anti-money laundering.

When you think of FICO, you probably thing credit score. And that’s true: founded in 1956, FICO introduced analytic solutions–such as credit scoring–that have made credit more widely available in the US and around the world. As well, the FICO score is the standard measure of consumer risk in the US.

In the video, Sven explains that FICO is making software to meet regulatory compliance goals and requirements, in this case to tackle money laundering. FICO ingests a massive amounts of customer data in the form of financial documents into S3, and then uses S3 to trigger and analyze each document for a number of different fraud and money laundering characteristics.

Key architecture components are designed to be immutable, assuring that the EC2 instances doing the analysis work themselves can’t be compromised and tampered with. But as well, an unchangeable instance can scale very fast and allow for the ability to ingest a large amount of documents. It can also scale back when there is less demand. The immutable images also support regulatory requirements for the various needs and regulations of localities around the world.

 

*Check out more This Is My Architecture videos on YouTube.

About the author

Annik StahlAnnik Stahl is a Senior Program Manager in AWS, specializing in blog and magazine content as well as customer ratings and satisfaction. Having been the face of Microsoft Office for 10 years as the Crabby Office Lady columnist, she loves getting to know her customers and wants to hear from you.

Optimizing a Lift-and-Shift for Cost Effectiveness and Ease of Management

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-cost/

Lift-and-shift is the process of migrating a workload from on premise to AWS with little or no modification. A lift-and-shift is a common route for enterprises to move to the cloud, and can be a transitionary state to a more cloud native approach. This is the second blog post in a three-part series which investigates how to optimize a lift-and-shift workload. The first post is about performance.

A key concern that many customers have with a lift-and-shift is cost. If you move an application as is  from on-prem to AWS, is there any possibility for meaningful cost savings? By employing AWS services, in lieu of self-managed EC2 instances, and by leveraging cloud capability such as auto scaling, there is potential for significant cost savings. In this blog post, we will discuss a number of AWS services and solutions that you can leverage with minimal or no change to your application codebase in order to significantly reduce management costs and overall Total Cost of Ownership (TCO).

Automate

Even if you can’t modify your application, you can change the way you deploy your application. The adopting-an-infrastructure-as-code approach can vastly improve the ease of management of your application, thereby reducing cost. By templating your application through Amazon CloudFormation, Amazon OpsWorks, or Open Source tools you can make deploying and managing your workloads a simple and repeatable process.

As part of the lift-and-shift process, rationalizing the workload into a set of templates enables less time to spent in the future deploying and modifying the workload. It enables the easy creation of dev/test environments, facilitates blue-green testing, opens up options for DR, and gives the option to roll back in the event of error. Automation is the single step which is most conductive to improving ease of management.

Reserved Instances and Spot Instances

A first initial consideration around cost should be the purchasing model for any EC2 instances. Reserved Instances (RIs) represent a 1-year or 3-year commitment to EC2 instances and can enable up to 75% cost reduction (over on demand) for steady state EC2 workloads. They are ideal for 24/7 workloads that must be continually in operation. An application requires no modification to make use of RIs.

An alternative purchasing model is EC2 spot. Spot instances offer unused capacity available at a significant discount – up to 90%. Spot instances receive a two-minute warning when the capacity is required back by EC2 and can be suspended and resumed. Workloads which are architected for batch runs – such as analytics and big data workloads – often require little or no modification to make use of spot instances. Other burstable workloads such as web apps may require some modification around how they are deployed.

A final alternative is on-demand. For workloads that are not running in perpetuity, on-demand is ideal. Workloads can be deployed, used for as long as required, and then terminated. By leveraging some simple automation (such as AWS Lambda and CloudWatch alarms), you can schedule workloads to start and stop at the open and close of business (or at other meaningful intervals). This typically requires no modification to the application itself. For workloads that are not 24/7 steady state, this can provide greater cost effectiveness compared to RIs and more certainty and ease of use when compared to spot.

Amazon FSx for Windows File Server

Amazon FSx for Windows File Server provides a fully managed Windows filesystem that has full compatibility with SMB and DFS and full AD integration. Amazon FSx is an ideal choice for lift-and-shift architectures as it requires no modification to the application codebase in order to enable compatibility. Windows based applications can continue to leverage standard, Windows-native protocols to access storage with Amazon FSx. It enables users to avoid having to deploy and manage their own fileservers – eliminating the need for patching, automating, and managing EC2 instances. Moreover, it’s easy to scale and minimize costs, since Amazon FSx offers a pay-as-you-go pricing model.

Amazon EFS

Amazon Elastic File System (EFS) provides high performance, highly available multi-attach storage via NFS. EFS offers a drop-in replacement for existing NFS deployments. This is ideal for a range of Linux and Unix usecases as well as cross-platform solutions such as Enterprise Java applications. EFS eliminates the need to manage NFS infrastructure and simplifies storage concerns. Moreover, EFS provides high availability out of the box, which helps to reduce single points of failure and avoids the need to manually configure storage replication. Much like Amazon FSx, EFS enables customers to realize cost improvements by moving to a pay-as-you-go pricing model and requires a modification of the application.

Amazon MQ

Amazon MQ is a managed message broker service that provides compatibility with JMS, AMQP, MQTT, OpenWire, and STOMP. These are amongst the most extensively used middleware and messaging protocols and are a key foundation of enterprise applications. Rather than having to manually maintain a message broker, Amazon MQ provides a performant, highly available managed message broker service that is compatible with existing applications.

To use Amazon MQ without any modification, you can adapt applications that leverage a standard messaging protocol. In most cases, all you need to do is update the application’s MQ endpoint in its configuration. Subsequently, the Amazon MQ service handles the heavy lifting of operating a message broker, configuring HA, fault detection, failure recovery, software updates, and so forth. This offers a simple option for reducing management overhead and improving the reliability of a lift-and-shift architecture. What’s more is that applications can migrate to Amazon MQ without the need for any downtime, making this an easy and effective way to improve a lift-and-shift.

You can also use Amazon MQ to integrate legacy applications with modern serverless applications. Lambda functions can subscribe to MQ topics and trigger serverless workflows, enabling compatibility between legacy and new workloads.

Integrating Lift-and-Shift Workloads with Lambda via Amazon MQ

Figure 1: Integrating Lift-and-Shift Workloads with Lambda via Amazon MQ

Amazon Managed Streaming Kafka

Lift-and-shift workloads which include a streaming data component are often built around Apache Kafka. There is a certain amount of complexity involved in operating a Kafka cluster which incurs management and operational expense. Amazon Kinesis is a managed alternative to Apache Kafka, but it is not a drop-in replacement. At re:Invent 2018, we announced the launch of Amazon Managed Streaming Kafka (MSK) in public preview. MSK provides a managed Kafka deployment with pay-as-you-go pricing and an acts as a drop-in replacement in existing Kafka workloads. MSK can help reduce management costs and improve cost efficiency and is ideal for lift-and-shift workloads.

Leveraging S3 for Static Web Hosting

A significant portion of any web application is static content. This includes videos, image, text, and other content that changes seldom, if ever. In many lift-and-shifted applications, web servers are migrated to EC2 instances and host all content – static and dynamic. Hosting static content from an EC2 instance incurs a number of costs including the instance, EBS volumes, and likely, a load balancer. By moving static content to S3, you can significantly reduce the amount of compute required to host your web applications. In many cases, this change is non-disruptive and can be done at the DNS or CDN layer, requiring no change to your application.

Reducing Web Hosting Costs with S3 Static Web Hosting

Figure 2: Reducing Web Hosting Costs with S3 Static Web Hosting

Conclusion

There are numerous opportunities for reducing the cost of a lift-and-shift. Without any modification to the application, lift-and-shift workloads can benefit from cloud-native features. By using AWS services and features, you can significantly reduce the undifferentiated heavy lifting inherent in on-prem workloads and reduce resources and management overheads.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

Optimizing a Lift-and-Shift for Performance

Post Syndicated from Jonathan Shapiro-Ward original https://aws.amazon.com/blogs/architecture/optimizing-a-lift-and-shift-for-performance/

Many organizations begin their cloud journey with a lift-and-shift of applications from on-premise to AWS. This approach involves migrating software deployments with little, or no, modification. A lift-and-shift avoids a potentially expensive application rewrite but can result in a less optimal workload that a cloud native solution. For many organizations, a lift-and-shift is a transitional stage to an eventual cloud native solution, but there are some applications that can’t feasibly be made cloud-native such as legacy systems or proprietary third-party solutions. There are still clear benefits of moving these workloads to AWS, but how can they be best optimized?

In this blog series post, we’ll look at different approaches for optimizing a black box lift-and-shift. We’ll consider how we can significantly improve a lift-and-shift application across three perspectives: performance, cost, and security. We’ll show that without modifying the application we can integrate services and features that will make a lift-and-shift workload cheaper, faster, more secure, and more reliable. In this first blog, we’ll investigate how a lift-and-shift workload can have improved performance through leveraging AWS features and services.

Performance gains are often a motivating factor behind a cloud migration. On-premise systems may suffer from performance bottlenecks owing to legacy infrastructure or through capacity issues. When performing a lift-and-shift, how can you improve performance? Cloud computing is famous for enabling horizontally scalable architectures but many legacy applications don’t support this mode of operation. Traditional business applications are often architected around a fixed number of servers and are unable to take advantage of horizontal scalability. Even if a lift-and-shift can’t make use of auto scaling groups and horizontal scalability, you can achieve significant performance gains by moving to AWS.

Scaling Up

The easiest alternative to scale up to compute is vertical scalability. AWS provides the widest selection of virtual machine types and the largest machine types. Instances range from small, burstable t3 instances series all the way to memory optimized x1 series. By leveraging the appropriate instance, lift-and-shifts can benefit from significant performance. Depending on your workload, you can also swap out the instances used to power your workload to better meet demand. For example, on days in which you anticipate high load you could move to more powerful instances. This could be easily automated via a Lambda function.

The x1 family of instances offers considerable CPU, memory, storage, and network performance and can be used to accelerate applications that are designed to maximize single machine performance. The x1e.32xlarge instance, for example, offers 128 vCPUs, 4TB RAM, and 14,000 Mbps EBS bandwidth. This instance is ideal for high performance in-memory workloads such as real time financial risk processing or SAP Hana.

Through selecting the appropriate instance types and scaling that instance up and down to meet demand, you can achieve superior performance and cost effectiveness compared to running a single static instance. This affords lift-and-shift workloads far greater efficiency that their on-prem counterparts.

Placement Groups and C5n Instances

EC2 Placement groups determine how you deploy instances to underlying hardware. One can either choose to cluster instances into a low latency group within a single AZ or spread instances across distinct underlying hardware. Both types of placement groups are useful for optimizing lift-and-shifts.

The spread placement group is valuable in applications that rely on a small number of critical instances. If you can’t modify your application  to leverage auto scaling, liveness probes, or failover, then spread placement groups can help reduce the risk of simultaneous failure while improving the overall reliability of the application.

Cluster placement groups help improve network QoS between instances. When used in conjunction with enhanced networking, cluster placement groups help to ensure low latency, high throughput, and high network packets per second. This is beneficial for chatty applications and any application that leveraged physical co-location for performance on-prem.

There is no additional charge for using placement groups.

You can extend this approach further with C5n instances. These instances offer 100Gbps networking and can be used in placement group for the most demanding networking intensive workloads. Using both placement groups and the C5n instances require no modification to your application, only to how it is deployed – making it a strong solution for providing network performance to lift-and-shift workloads.

Leverage Tiered Storage to Optimize for Price and Performance

AWS offers a range of storage options, each with its own performance characteristics and price point. Through leveraging a combination of storage types, lift-and-shifts can achieve the performance and availability requirements in a price effective manner. The range of storage options include:

Amazon EBS is the most common storage service involved with lift-and-shifts. EBS provides block storage that can be attached to EC2 instances and formatted with a typical file system such as NTFS or ext4. There are several different EBS types, ranging from inexpensive magnetic storage to highly performant provisioned IOPS SSDs. There are also storage-optimized instances that offer high performance EBS access and NVMe storage. By utilizing the appropriate type of EBS volume and instance, a compromise of performance and price can be achieved. RAID offers a further option to optimize EBS. EBS utilizes RAID 1 by default, providing replication at no additional cost, however an EC2 instance can apply other RAID levels. For instance, you can apply RAID 0 over a number of EBS volumes in order to improve storage performance.

In addition to EBS, EC2 instances can utilize the EC2 instance store. The instance store provides ephemeral direct attached storage to EC2 instances. The instance store is included with the EC2 instance and provides a facility to store non-persistent data. This makes it ideal for temporary files that an application produces, which require performant storage. Both EBS and the instance store are expose to the EC2 instance as block level devices, and the OS can use its native management tools to format and mount these volumes as per a traditional disk – requiring no significant departure from the on prem configuration. In several instance types including the C5d and P3d are equipped with local NVMe storage which can support extremely IO intensive workloads.

Not all workloads require high performance storage. In many cases finding a compromise between price and performance is top priority. Amazon S3 provides highly durable, object storage at a significantly lower price point than block storage. S3 is ideal for a large number of use cases including content distribution, data ingestion, analytics, and backup. S3, however, is accessible via a RESTful API and does not provide conventional file system semantics as per EBS. This may make S3 less viable for applications that you can’t easily modify, but there are still options for using S3 in such a scenario.

An option for leveraging S3 is AWS Storage Gateway. Storage Gateway is a virtual appliance than can be run on-prem or on EC2. The Storage Gateway appliance can operate in three configurations: file gateway, volume gateway and tape gateway. File gateway provides an NFS interface, Volume Gateway provides an iSCSI interface, and Tape Gateway provides an iSCSI virtual tape library interface. This allows files, volumes, and tapes to be exposed to an application host through conventional protocols with the Storage Gateway appliance persisting data to S3. This allows an application to be agnostic to S3 while leveraging typical enterprise storage protocols.

Using S3 Storage via Storage Gateway

Figure 1: Using S3 Storage via Storage Gateway

Conclusion

A lift-and-shift can achieve significant performance gains on AWS by making use of a range of instance types, storage services, and other features. Even without any modification to the application, lift-and-shift workloads can benefit from cutting edge compute, network, and IO which can help realize significant, meaningful performance gains.

About the author

Dr. Jonathan Shapiro-Ward is an AWS Solutions Architect based in Toronto. He helps customers across Canada to transform their businesses and build industry leading cloud solutions. He has a background in distributed systems and big data and holds a PhD from the University of St Andrews.

How to Efficiently Extract and Query Tagged Resources Using the AWS Resource Tagging API and S3 Select (SQL)

Post Syndicated from Marcilio Mendonca original https://aws.amazon.com/blogs/architecture/how-to-efficiently-extract-and-query-tagged-resources-using-the-aws-resource-tagging-api-and-s3-select-sql/

AWS customers can use tags to assign metadata to their AWS resources. Each tag is a simple label consisting of a customer-defined key and an optional value that can make it easier to manage, search for, and filter resources. Although there are no inherent types of tags, they enable customers to categorize resources by multiple criteria such as purpose, owner and, environment.

Once a tagging strategy is defined and enforced, customers can use the AWS Tag Editor to view and manage tags on their AWS resources, regardless of service or region. They can use the tag editor to search for resources by resource type, region, or tag, and then manage the tags applied to those resources.

However, customers have asked for guidance on how to build custom automation mechanisms to extract and query tagged resources so that they can extend the built-in functionalities of the Tag Editor. For instance, customers can build automation to generate custom CSV files for tagged resources and perhaps use SQL to query those resources. In addition, automation allows customers to add validation checks to their CI/CD deployment pipelines, for instance, to check whether resources have been properly tagged.

In this blog post, we introduce a simple yet efficient AWS architecture for extracting and querying tagged resources based on AWS cloud-native features such as the Resource Tagging API and S3 Select. We provide sample code for the architecture discussed that can help customers to customize and/or extend the architecture for their own purpose. By relying on AWS cloud-native features, customers can save time and reduce costs while still being able to do customizations.

For customers unfamiliar with the Resource Tagging API and the S3 Select features, below is a very brief introduction.

Resource Tagging API
AWS customers can use the Resource Tagging API to programatically access the same resource group operations that had been accessible only from the AWS Management Console by now using the AWS SDKs or the AWS Command Line Interface (CLI). By doing so, customers can build automation that fits their need, e.g., code that extract, export, and queries tagged resources.

For further details, please read Resource Groups Tagging – Reference

S3 Select
S3 Select enables applications to retrieve only a subset of data from an object by using simple SQL expressions. By using S3 Select to retrieve only the data needed by the application, customers can achieve drastic performance increases – in many cases you can get as much as a 400% improvement.

For further details, please read:

The Overall Solution Architecture

The figure above depict the overall architecture discussed in this post. It is a simple yet efficient architecture for extracting and querying tagged resources based on AWS cloud-native features. The Resource Tagging API is used to extract tagged resources from one or more AWS accounts via the Python AWS SDK, then a custom CSV file is generated and pushed to S3. Once in S3, the tagged resources file can now be efficiently queried via S3 Select also using Python AWS SDK. By leveraging S3 Select, we can now use SQL to query tagged resources and save on S3 data transfer costs since only the filtered results will be returned directly from S3. Pretty neat, eh?

The Extract Process
The extract process was built using Python 3 and relies on the Resource Tagging API to fetch pages of tagged resources and export them to CSV using the csv Python library.

We start importing the required libraries (boto3 is the AWS SDK for Python, argparse helps managing input parameters, and csv supports building valid CSV files):

import boto3
import argparse
import csv

Then, we define the header columns to use when generating the CSV files containing all tagged resources and the writeToCsv function:

field_names = ['ResourceArn', 'TagKey', 'TagValue']

def writeToCsv(writer, args, tag_list):
    for resource in tag_list:
        print("Extracting tags for resource: " +
              resource['ResourceARN'] + "...")
        for tag in resource['Tags']:
            row = dict(
                ResourceArn=resource['ResourceARN'], TagKey=tag['Key'], TagValue=tag['Value'])
            writer.writerow(row)

We take the CSV output file path as a required parameter so that users can specificy the desired output file name using the argparse library:

def input_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--output", required=True,
                        help="Output CSV file (eg, /tmp/tagged-resources.csv)")
    return parser.parse_args()

And then, we implement the main extract logic that uses the Resource Tagging API (see boto3.client(‘resourcegroupstaggingapi’) in the code below). Note that we fetch 50 resources at a time and write them to the CSV output file until no more resources are found.

def main():
    args = input_args()
    restag = boto3.client('resourcegroupstaggingapi')
    with open(args.output, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, quoting=csv.QUOTE_ALL,
                                delimiter=',', dialect='excel', fieldnames=field_names)
        writer.writeheader()
        response = restag.get_resources(ResourcesPerPage=50)
        writeToCsv(writer, args, response['ResourceTagMappingList'])
        while 'PaginationToken' in response and response['PaginationToken']:
            token = response['PaginationToken']
            response = restag.get_resources(
                ResourcesPerPage=50, PaginationToken=token)
            writeToCsv(writer, args, response['ResourceTagMappingList'])

if __name__ == '__main__':
    main()

The extract procedure is pretty simple and illustrates well how to use the Resource Tagging API to customize the output. It will also use the default credentials in your account.

Here is how the extract process can be triggered for the QA account (assuming the python source file is named aws-tagged-resources-extractor.py and that there is a QA_AWS_ACCOUNT AWS profile defined in your ~/.aws/credentials file).

export AWS_PROFILE=QA_AWS_ACCOUNT
python aws-tagged-resources-extractor.py --output /tmp/qa-tagged-resources.csv

The extract procedure can be applied to other AWS accounts by updating the AWS_PROFILE environment variable accordingly.

The extract procedure can be applied to other AWS accounts by updating the AWS_PROFILE environment variable accordingly.

The ‘Upload to S3’ Process
Once file /tmp/qa-tagged-resources.csv is generated, it can be upload to an S3 bucket using the AWS CLI (or one could extend the extract sample code above to do so):

aws s3 cp /tmp/qa-tagged-resources.csv s3://[REPLACE-WITH-YOUR-S3-BUCKET]

The Query Process
Once the CSV files containing tagged resources for different AWS accounts are uploaded to S3, we can now use S3 Select to perform familiar SQL queries against these files. Another advantage of using S3 Select is that it reduces the amount of data transferred from S3 which is especially relevant in our case when accounts have a very large number of tagged resources.

We again use the boto3 and argparse libraries (Python 3). Required input parameters include the S3 bucket (–bucket) and the S3 key (–key). The SQL query parameter (–query) is optional and will return all results if not provided.

import boto3
import argparse

def input_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--bucket", required=True, help="SQL query to filter tagged resources output")
    parser.add_argument("--key", required=True, help="SQL query to filter tagged resources output")
    parser.add_argument("--query", default="select * from s3object", help="SQL query to filter tagged resources output")
    return parser.parse_args()

The main query logic is shown below. It uses the boto3.client(‘s3’) to initialize an s3 client that is later used to query the tagged resources CSV file in S3 via the select_object_content() function. This function takes the S3 bucket name, S3 key, and query as parameters. Check the [Boto3] (http://boto3.readthedocs.io/en/latest/reference/services/s3.html) API reference for details on this function and its inputs and outputs.

def main():
    args = input_args()
    s3 = boto3.client('s3')
    response = s3.select_object_content(
        Bucket=args.bucket,
        Key=args.key,
        ExpressionType='SQL',
        Expression=args.query,
        InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}},
        OutputSerialization = {'CSV': {}},
    )

    for event in response['Payload']:
        if 'Records' in event:
            records = event['Records']['Payload'].decode('utf-8')
            print(records)
            
if __name__ == '__main__':
    main()

Here’s a few examples of how to trigger the query procedure against the CSV files stored in S3 (assuming the Python source file for the query procedure is called aws-tagged-resources-querier). We assume that the S3 bucket is located in a single account referenced by profile CENTRAL_AWS_ACCOUNT.

Return the resource ARNs of all route tables containing a tag named ‘aws:cloudformation:stack-name’ in the QA AWS account

export AWS_PROFILE= CENTRAL_AWS_ACCOUNT
python aws-tagged-resources-querier \
     --bucket [REPLACE-WITH-YOUR-S3-BUCKET] \
     --key qa-tagged-resources.csv \
     --query "select ResourceArn from s3object s \
              where s.ResourceArn like 'arn:aws:ec2%route-table%' \
                and s.TagKey='aws:cloudformation:stack-name'"

We invite readers to build more sophisticated SQL queries.

Summary
In this blog post, we introduced a simple yet efficient AWS architecture for extracting and querying tagged resources based on AWS cloud-native features such as the Resource Tagging API and S3 Select. We provided sample code that can help customers to customize and/or extend the architecture for their own purpose. By relying on AWS cloud-native features, customers can save time and reduce costs while still being able to do customizations.

The “extract” process discussed above is available in the AWS Serverless Repository under an application called aws-tag-explorer. Check it out!

Happy Resource Tagging!

About the Author

Marcilio Mendonca is a Sr. Consultant in the Global DevOps Team at AWS Professional Services. In the past years, he has been helping AWS customers to design, build and deploy best-in-class cloud-native AWS applications using VMs, containers and serverless architectures. Prior to joining AWS, Marcilio was a Software Development Engineer with Amazon. Marcilio also holds a PhD in Computer Science.

 

 

AWS Online Tech Talks – April & Early May 2018

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-april-early-may-2018/

We have several upcoming tech talks in the month of April and early May. Come join us to learn about AWS services and solution offerings. We’ll have AWS experts online to help answer questions in real-time. Sign up now to learn more, we look forward to seeing you.

Note – All sessions are free and in Pacific Time.

April & early May — 2018 Schedule

Compute

April 30, 2018 | 01:00 PM – 01:45 PM PTBest Practices for Running Amazon EC2 Spot Instances with Amazon EMR (300) – Learn about the best practices for scaling big data workloads as well as process, store, and analyze big data securely and cost effectively with Amazon EMR and Amazon EC2 Spot Instances.

May 1, 2018 | 01:00 PM – 01:45 PM PTHow to Bring Microsoft Apps to AWS (300) – Learn more about how to save significant money by bringing your Microsoft workloads to AWS.

May 2, 2018 | 01:00 PM – 01:45 PM PTDeep Dive on Amazon EC2 Accelerated Computing (300) – Get a technical deep dive on how AWS’ GPU and FGPA-based compute services can help you to optimize and accelerate your ML/DL and HPC workloads in the cloud.

Containers

April 23, 2018 | 11:00 AM – 11:45 AM PTNew Features for Building Powerful Containerized Microservices on AWS (300) – Learn about how this new feature works and how you can start using it to build and run modern, containerized applications on AWS.

Databases

April 23, 2018 | 01:00 PM – 01:45 PM PTElastiCache: Deep Dive Best Practices and Usage Patterns (200) – Learn about Redis-compatible in-memory data store and cache with Amazon ElastiCache.

April 25, 2018 | 01:00 PM – 01:45 PM PTIntro to Open Source Databases on AWS (200) – Learn how to tap the benefits of open source databases on AWS without the administrative hassle.

DevOps

April 25, 2018 | 09:00 AM – 09:45 AM PTDebug your Container and Serverless Applications with AWS X-Ray in 5 Minutes (300) – Learn how AWS X-Ray makes debugging your Container and Serverless applications fun.

Enterprise & Hybrid

April 23, 2018 | 09:00 AM – 09:45 AM PTAn Overview of Best Practices of Large-Scale Migrations (300) – Learn about the tools and best practices on how to migrate to AWS at scale.

April 24, 2018 | 11:00 AM – 11:45 AM PTDeploy your Desktops and Apps on AWS (300) – Learn how to deploy your desktops and apps on AWS with Amazon WorkSpaces and Amazon AppStream 2.0

IoT

May 2, 2018 | 11:00 AM – 11:45 AM PTHow to Easily and Securely Connect Devices to AWS IoT (200) – Learn how to easily and securely connect devices to the cloud and reliably scale to billions of devices and trillions of messages with AWS IoT.

Machine Learning

April 24, 2018 | 09:00 AM – 09:45 AM PT Automate for Efficiency with Amazon Transcribe and Amazon Translate (200) – Learn how you can increase the efficiency and reach your operations with Amazon Translate and Amazon Transcribe.

April 26, 2018 | 09:00 AM – 09:45 AM PT Perform Machine Learning at the IoT Edge using AWS Greengrass and Amazon Sagemaker (200) – Learn more about developing machine learning applications for the IoT edge.

Mobile

April 30, 2018 | 11:00 AM – 11:45 AM PTOffline GraphQL Apps with AWS AppSync (300) – Come learn how to enable real-time and offline data in your applications with GraphQL using AWS AppSync.

Networking

May 2, 2018 | 09:00 AM – 09:45 AM PT Taking Serverless to the Edge (300) – Learn how to run your code closer to your end users in a serverless fashion. Also, David Von Lehman from Aerobatic will discuss how they used [email protected] to reduce latency and cloud costs for their customer’s websites.

Security, Identity & Compliance

April 30, 2018 | 09:00 AM – 09:45 AM PTAmazon GuardDuty – Let’s Attack My Account! (300) – Amazon GuardDuty Test Drive – Practical steps on generating test findings.

May 3, 2018 | 09:00 AM – 09:45 AM PTProtect Your Game Servers from DDoS Attacks (200) – Learn how to use the new AWS Shield Advanced for EC2 to protect your internet-facing game servers against network layer DDoS attacks and application layer attacks of all kinds.

Serverless

April 24, 2018 | 01:00 PM – 01:45 PM PTTips and Tricks for Building and Deploying Serverless Apps In Minutes (200) – Learn how to build and deploy apps in minutes.

Storage

May 1, 2018 | 11:00 AM – 11:45 AM PTBuilding Data Lakes That Cost Less and Deliver Results Faster (300) – Learn how Amazon S3 Select And Amazon Glacier Select increase application performance by up to 400% and reduce total cost of ownership by extending your data lake into cost-effective archive storage.

May 3, 2018 | 11:00 AM – 11:45 AM PTIntegrating On-Premises Vendors with AWS for Backup (300) – Learn how to work with AWS and technology partners to build backup & restore solutions for your on-premises, hybrid, and cloud native environments.

Amazon S3 Update: New Storage Class and General Availability of S3 Select

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/amazon-s3-update-new-storage-class-general-availability-of-s3-select/

I’ve got two big pieces of news for anyone who stores and retrieves data in Amazon Simple Storage Service (S3):

New S3 One Zone-IA Storage Class – This new storage class is 20% less expensive than the existing Standard-IA storage class. It is designed to be used to store data that does not need the extra level of protection provided by geographic redundancy.

General Availability of S3 Select – This unique retrieval option lets you retrieve subsets of data from S3 objects using simple SQL expressions, with the possibility for a 400% performance improvement in the process.

Let’s take a look at both!

S3 One Zone-IA (Infrequent Access) Storage Class
This new storage class stores data in a single AWS Availability Zone and is designed to provide eleven 9’s (99.99999999%) of data durability, just like the other S3 storage classes. Unlike those other classes, it is not designed to be resilient to the physical loss of an AZ due to major event such as an earthquake or a flood, and data could be lost in the unlikely event that an AZ is destroyed. S3 One Zone-IA storage gives you a lower cost option for secondary backups of on-premises data and for data that can be easily re-created. You can also use it as the target of S3 Cross-Region Replication from another AWS region.

You can specify the use of S3 One Zone-IA storage when you upload a new object to S3:

You can also make use of it as part of an S3 lifecycle rule:

You can set up a lifecycle rule that moves previous versions of an object to S3 One Zone-IA after 30 or more days:

And you can modify the storage class of an existing object:

You can also manage storage classes using the S3 API, CLI, and CloudFormation templates.

The S3 One Zone-IA storage class can be used in all public AWS regions. As I noted earlier, pricing is 20% lower than for the S3 Standard-IA storage class (see the S3 Pricing page for more info). There’s a 30 day minimum retention period, and a 128 KB minimum object size.

General Availability of S3 Select
Randall wrote a detailed introduction to S3 Select last year and showed you how you can use it to retrieve selected data from within S3 objects. During the preview we added support for server-side encryption and the ability to run queries from the S3 Console.

I used a CSV file of airport codes to exercise the new console functionality:

This file contains listings for over 9100 airports, so it makes for useful test data but it definitely does not test the limits of S3 Select in any way. I select the file, open the More menu, and choose Select from:

The console sets the file format and compression according to the file name and the encryption status. I set delimiter and click Show file preview to verify that my settings are correct. Then I click Next to proceed:

I type SQL expressions in the SQL editor and click Run SQL to issue the query:

Or:

I can also issue queries from the AWS SDKs. I initiate the select operation:

s3 = boto3.client('s3', region_name='us-west-2')

r = s3.select_object_content(
        Bucket='jbarr-us-west-2',
        Key='sample-data/airportCodes.csv',
        ExpressionType='SQL',
        Expression="select * from s3object s where s.\"Country (Name)\" like '%United States%'",
        InputSerialization = {'CSV': {"FileHeaderInfo": "Use"}},
        OutputSerialization = {'CSV': {}},
)

And then I process the stream of results:

for event in r['Payload']:
    if 'Records' in event:
        records = event['Records']['Payload'].decode('utf-8')
        print(records)
    elif 'Stats' in event:
        statsDetails = event['Stats']['Details']
        print("Stats details bytesScanned: ")
        print(statsDetails['BytesScanned'])
        print("Stats details bytesProcessed: ")
        print(statsDetails['BytesProcessed'])

S3 Select is available in all public regions and you can start using it today. Pricing is based on the amount of data scanned and the amount of data returned.

Jeff;

Glenn’s Take on re:Invent 2017 – Part 3

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-2017-part-3/

Glenn Gore here, Chief Architect for AWS. I was in Las Vegas last week — with 43K others — for re:Invent 2017. I checked in to the Architecture blog here and here with my take on what was interesting about some of the bigger announcements from a cloud-architecture perspective.

In the excitement of so many new services being launched, we sometimes overlook feature updates that, while perhaps not as exciting as Amazon DeepLens, have significant impact on how you architect and develop solutions on AWS.

Amazon DynamoDB is used by more than 100,000 customers around the world, handling over a trillion requests every day. From the start, DynamoDB has offered high availability by natively spanning multiple Availability Zones within an AWS Region. As more customers started building and deploying truly-global applications, there was a need to replicate a DynamoDB table to multiple AWS Regions, allowing for read/write operations to occur in any region where the table was replicated. This update is important for providing a globally-consistent view of information — as users may transition from one region to another — or for providing additional levels of availability, allowing for failover between AWS Regions without loss of information.

There are some interesting concurrency-design aspects you need to be aware of and ensure you can handle correctly. For example, we support the “last writer wins” reconciliation where eventual consistency is being used and an application updates the same item in different AWS Regions at the same time. If you require strongly-consistent read/writes then you must perform all of your read/writes in the same AWS Region. The details behind this can be found in the DynamoDB documentation. Providing a globally-distributed, replicated DynamoDB table simplifies many different use cases and allows for the logic of replication, which may have been pushed up into the application layers to be simplified back down into the data layer.

The other big update for DynamoDB is that you can now back up your DynamoDB table on demand with no impact to performance. One of the features I really like is that when you trigger a backup, it is available instantly, regardless of the size of the table. Behind the scenes, we use snapshots and change logs to ensure a consistent backup. While backup is instant, restoring the table could take some time depending on its size and ranges — from minutes to hours for very large tables.

This feature is super important for those of you who work in regulated industries that often have strict requirements around data retention and backups of data, which sometimes limited the use of DynamoDB or required complex workarounds to implement some sort of backup feature in the past. This often incurred significant, additional costs due to increased read transactions on their DynamoDB tables.

Amazon Simple Storage Service (Amazon S3) was our first-released AWS service over 11 years ago, and it proved the simplicity and scalability of true API-driven architectures in the cloud. Today, Amazon S3 stores trillions of objects, with transactional requests per second reaching into the millions! Dealing with data as objects opened up an incredibly diverse array of use cases ranging from libraries of static images, game binary downloads, and application log data, to massive data lakes used for big data analytics and business intelligence. With Amazon S3, when you accessed your data in an object, you effectively had to write/read the object as a whole or use the range feature to retrieve a part of the object — if possible — in your individual use case.

Now, with Amazon S3 Select, an SQL-like query language is used that can work with delimited text and JSON files, as well as work with GZIP compressed files. We don’t support encryption during the preview of Amazon S3 Select.

Amazon S3 Select provides two major benefits:

  • Faster access
  • Lower running costs

Serverless Lambda functions, where every millisecond matters when you are being charged, will benefit greatly from Amazon S3 Select as data retrieval and processing of your Lambda function will experience significant speedups and cost reductions. For example, we have seen 2x speed improvement and 80% cost reduction with the Serverless MapReduce code.

Other AWS services such as Amazon Athena, Amazon Redshift, and Amazon EMR will support Amazon S3 Select as well as partner offerings including Cloudera and Hortonworks. If you are using Amazon Glacier for longer-term data archival, you will be able to use Amazon Glacier Select to retrieve a subset of your content from within Amazon Glacier.

As the volume of data that can be stored within Amazon S3 and Amazon Glacier continues to scale on a daily basis, we will continue to innovate and develop improved and optimized services that will allow you to work with these magnificently-large data sets while reducing your costs (retrieval and processing). I believe this will also allow you to simplify the transformation and storage of incoming data into Amazon S3 in basic, semi-structured formats as a single copy vs. some of the duplication and reformatting of data sometimes required to do upfront optimizations for downstream processing. Amazon S3 Select largely removes the need for this upfront optimization and instead allows you to store data once and process it based on your individual Amazon S3 Select query per application or transaction need.

Thanks for reading!

Glenn contemplating why CSV format is still relevant in 2017 (Italy).