Tag Archives: Amazon Glacier

AWS Online Tech Talks – April & Early May 2018

Post Syndicated from Betsy Chernoff original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-april-early-may-2018/

We have several upcoming tech talks in the month of April and early May. Come join us to learn about AWS services and solution offerings. We’ll have AWS experts online to help answer questions in real time. Sign up now to learn more. We look forward to seeing you.

Note – All sessions are free and in Pacific Time.

April & early May — 2018 Schedule

Compute

April 30, 2018 | 01:00 PM – 01:45 PM PT Best Practices for Running Amazon EC2 Spot Instances with Amazon EMR (300) – Learn best practices for scaling big data workloads and for processing, storing, and analyzing big data securely and cost-effectively with Amazon EMR and Amazon EC2 Spot Instances.

May 1, 2018 | 01:00 PM – 01:45 PM PT How to Bring Microsoft Apps to AWS (300) – Learn more about how to save significant money by bringing your Microsoft workloads to AWS.

May 2, 2018 | 01:00 PM – 01:45 PM PT Deep Dive on Amazon EC2 Accelerated Computing (300) – Get a technical deep dive on how AWS’ GPU and FPGA-based compute services can help you to optimize and accelerate your ML/DL and HPC workloads in the cloud.

Containers

April 23, 2018 | 11:00 AM – 11:45 AM PT New Features for Building Powerful Containerized Microservices on AWS (300) – Learn about how this new feature works and how you can start using it to build and run modern, containerized applications on AWS.

Databases

April 23, 2018 | 01:00 PM – 01:45 PM PT ElastiCache: Deep Dive Best Practices and Usage Patterns (200) – Learn about Redis-compatible in-memory data store and cache with Amazon ElastiCache.

April 25, 2018 | 01:00 PM – 01:45 PM PT Intro to Open Source Databases on AWS (200) – Learn how to tap the benefits of open source databases on AWS without the administrative hassle.

DevOps

April 25, 2018 | 09:00 AM – 09:45 AM PT Debug your Container and Serverless Applications with AWS X-Ray in 5 Minutes (300) – Learn how AWS X-Ray makes debugging your Container and Serverless applications fun.

Enterprise & Hybrid

April 23, 2018 | 09:00 AM – 09:45 AM PT An Overview of Best Practices of Large-Scale Migrations (300) – Learn about the tools and best practices on how to migrate to AWS at scale.

April 24, 2018 | 11:00 AM – 11:45 AM PT Deploy your Desktops and Apps on AWS (300) – Learn how to deploy your desktops and apps on AWS with Amazon WorkSpaces and Amazon AppStream 2.0.

IoT

May 2, 2018 | 11:00 AM – 11:45 AM PT How to Easily and Securely Connect Devices to AWS IoT (200) – Learn how to easily and securely connect devices to the cloud and reliably scale to billions of devices and trillions of messages with AWS IoT.

Machine Learning

April 24, 2018 | 09:00 AM – 09:45 AM PT Automate for Efficiency with Amazon Transcribe and Amazon Translate (200) – Learn how you can increase the efficiency and reach of your operations with Amazon Translate and Amazon Transcribe.

April 26, 2018 | 09:00 AM – 09:45 AM PT Perform Machine Learning at the IoT Edge using AWS Greengrass and Amazon SageMaker (200) – Learn more about developing machine learning applications for the IoT edge.

Mobile

April 30, 2018 | 11:00 AM – 11:45 AM PT Offline GraphQL Apps with AWS AppSync (300) – Come learn how to enable real-time and offline data in your applications with GraphQL using AWS AppSync.

Networking

May 2, 2018 | 09:00 AM – 09:45 AM PT Taking Serverless to the Edge (300) – Learn how to run your code closer to your end users in a serverless fashion. Also, David Von Lehman from Aerobatic will discuss how they used Lambda@Edge to reduce latency and cloud costs for their customers’ websites.

Security, Identity & Compliance

April 30, 2018 | 09:00 AM – 09:45 AM PT Amazon GuardDuty – Let’s Attack My Account! (300) – Amazon GuardDuty Test Drive – Practical steps on generating test findings.

May 3, 2018 | 09:00 AM – 09:45 AM PT Protect Your Game Servers from DDoS Attacks (200) – Learn how to use the new AWS Shield Advanced for EC2 to protect your internet-facing game servers against network layer DDoS attacks and application layer attacks of all kinds.

Serverless

April 24, 2018 | 01:00 PM – 01:45 PM PT Tips and Tricks for Building and Deploying Serverless Apps In Minutes (200) – Learn how to build and deploy apps in minutes.

Storage

May 1, 2018 | 11:00 AM – 11:45 AM PT Building Data Lakes That Cost Less and Deliver Results Faster (300) – Learn how Amazon S3 Select and Amazon Glacier Select increase application performance by up to 400% and reduce total cost of ownership by extending your data lake into cost-effective archive storage.

May 3, 2018 | 11:00 AM – 11:45 AM PT Integrating On-Premises Vendors with AWS for Backup (300) – Learn how to work with AWS and technology partners to build backup & restore solutions for your on-premises, hybrid, and cloud native environments.

AWS Achieves Spain’s ENS High Certification Across 29 Services

Post Syndicated from Oliver Bell original https://aws.amazon.com/blogs/security/aws-achieves-spains-ens-high-certification-across-29-services/

AWS has achieved Spain’s Esquema Nacional de Seguridad (ENS) High certification across 29 services. To successfully achieve the ENS High Standard, BDO España conducted an independent audit and attested that AWS meets confidentiality, integrity, and availability standards. This provides the assurance needed by Spanish Public Sector organizations wanting to build secure applications and services on AWS.

The National Security Framework, regulated under Royal Decree 3/2010, was developed through close collaboration between ENAC (Entidad Nacional de Acreditación), the Ministry of Finance and Public Administration and the CCN (National Cryptologic Centre), and other administrative bodies.

The following AWS Services are ENS High accredited across our Dublin and Frankfurt Regions:

  • Amazon API Gateway
  • Amazon DynamoDB
  • Amazon Elastic Container Service
  • Amazon Elastic Block Store
  • Amazon Elastic Compute Cloud
  • Amazon Elastic File System
  • Amazon Elastic MapReduce
  • Amazon ElastiCache
  • Amazon Glacier
  • Amazon Redshift
  • Amazon Relational Database Service
  • Amazon Simple Queue Service
  • Amazon Simple Storage Service
  • Amazon Simple Workflow Service
  • Amazon Virtual Private Cloud
  • Amazon WorkSpaces
  • AWS CloudFormation
  • AWS CloudTrail
  • AWS Config
  • AWS Database Migration Service
  • AWS Direct Connect
  • AWS Directory Service
  • AWS Elastic Beanstalk
  • AWS Key Management Service
  • AWS Lambda
  • AWS Snowball
  • AWS Storage Gateway
  • Elastic Load Balancing
  • VM Import/Export

AWS Online Tech Talks – January 2018

Post Syndicated from Ana Visneski original https://aws.amazon.com/blogs/aws/aws-online-tech-talks-january-2018/

Happy New Year! Kick off 2018 right by expanding your AWS knowledge with a great batch of new Tech Talks. We’re covering some of the biggest launches from re:Invent including Amazon Neptune, Amazon Rekognition Video, AWS Fargate, AWS Cloud9, Amazon Kinesis Video Streams, AWS PrivateLink, AWS Single Sign-On and more!

January 2018 – Schedule

Noted below are the upcoming scheduled live, online technical sessions being held during the month of January. Make sure to register ahead of time so you won’t miss out on these free talks conducted by AWS subject matter experts.

Webinars featured this month are:

Monday, January 22

Analytics & Big Data
11:00 AM – 11:45 AM PT Analyze your Data Lake, Fast @ Any Scale  Lvl 300

Database
01:00 PM – 01:45 PM PT Deep Dive on Amazon Neptune Lvl 200

Tuesday, January 23

Artificial Intelligence
09:00 AM – 09:45 AM PT How to get the most out of Amazon Rekognition Video, a deep learning based video analysis service Lvl 300

Containers
11:00 AM – 11:45 AM PT Introducing AWS Fargate Lvl 200

Serverless
01:00 PM – 02:00 PM PT Overview of Serverless Application Deployment Patterns Lvl 400

Wednesday, January 24

DevOps
09:00 AM – 09:45 AM PT Introducing AWS Cloud9  Lvl 200

Analytics & Big Data
11:00 AM – 11:45 AM PT Deep Dive: Amazon Kinesis Video Streams Lvl 300

Database
01:00 PM – 01:45 PM PT Introducing Amazon Aurora with PostgreSQL Compatibility Lvl 200

Thursday, January 25

Artificial Intelligence
09:00 AM – 09:45 AM PT Introducing Amazon SageMaker Lvl 200

Mobile
11:00 AM – 11:45 AM PT Ionic and React Hybrid Web/Native Mobile Applications with Mobile Hub Lvl 200

IoT
01:00 PM – 01:45 PM PT Connected Product Development: Secure Cloud & Local Connectivity for Microcontroller-based Devices Lvl 200

Monday, January 29

Enterprise
11:00 AM – 11:45 AM PT Enterprise Solutions Best Practices 100: Achieving Business Value with AWS Lvl 100

Compute
01:00 PM – 01:45 PM PT Introduction to Amazon Lightsail Lvl 200

Tuesday, January 30

Security, Identity & Compliance
09:00 AM – 09:45 AM PT Introducing Managed Rules for AWS WAF Lvl 200

Storage
11:00 AM – 11:45 AM PT  Improving Backup & DR – AWS Storage Gateway Lvl 300

Compute
01:00 PM – 01:45 PM PT  Introducing the New Simplified Access Model for EC2 Spot Instances Lvl 200

Wednesday, January 31

Networking
09:00 AM – 09:45 AM PT  Deep Dive on AWS PrivateLink Lvl 300

Enterprise
11:00 AM – 11:45 AM PT Preparing Your Team for a Cloud Transformation Lvl 200

Compute
01:00 PM – 01:45 PM PT  The Nitro Project: Next-Generation EC2 Infrastructure Lvl 300

Thursday, February 1

Security, Identity & Compliance
09:00 AM – 09:45 AM PT  Deep Dive on AWS Single Sign-On Lvl 300

Storage
11:00 AM – 11:45 AM PT How to Build a Data Lake in Amazon S3 & Amazon Glacier Lvl 300

Now Open – AWS EU (Paris) Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-eu-paris-region/

Today we are launching our 18th AWS Region, our fourth in Europe. The Region is located in the Paris area, and AWS customers can use it to better serve customers in and around France.

The Details
The new EU (Paris) Region provides a broad suite of AWS services including Amazon API Gateway, Amazon Aurora, Amazon CloudFront, Amazon CloudWatch, CloudWatch Events, Amazon CloudWatch Logs, Amazon DynamoDB, Amazon Elastic Compute Cloud (EC2), EC2 Container Registry, Amazon ECS, Amazon Elastic Block Store (EBS), Amazon EMR, Amazon ElastiCache, Amazon Elasticsearch Service, Amazon Glacier, Amazon Kinesis Streams, Polly, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Route 53, Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), Amazon Simple Storage Service (S3), Amazon Simple Workflow Service (SWF), Amazon Virtual Private Cloud, Auto Scaling, AWS Certificate Manager (ACM), AWS CloudFormation, AWS CloudTrail, AWS CodeDeploy, AWS Config, AWS Database Migration Service, AWS Direct Connect, AWS Elastic Beanstalk, AWS Identity and Access Management (IAM), AWS Key Management Service (KMS), AWS Lambda, AWS Marketplace, AWS OpsWorks Stacks, AWS Personal Health Dashboard, AWS Server Migration Service, AWS Service Catalog, AWS Shield Standard, AWS Snowball, AWS Snowball Edge, AWS Snowmobile, AWS Storage Gateway, AWS Support (including AWS Trusted Advisor), Elastic Load Balancing, and VM Import.

The Paris Region supports all sizes of C5, M5, R4, T2, D2, I3, and X1 instances.

There are also four edge locations for Amazon Route 53 and Amazon CloudFront: three in Paris and one in Marseille, all with AWS WAF and AWS Shield. Check out the AWS Global Infrastructure page to learn more about current and future AWS Regions.

The Paris Region will benefit from three AWS Direct Connect locations. Telehouse Voltaire is available today. AWS Direct Connect will also become available at Equinix Paris in early 2018, followed by Interxion Paris.

All AWS infrastructure regions around the world are designed, built, and regularly audited to meet the most rigorous compliance standards and to provide high levels of security for all AWS customers. These include ISO 27001, ISO 27017, ISO 27018, SOC 1 (Formerly SAS 70), SOC 2 and SOC 3 Security & Availability, PCI DSS Level 1, and many more. This means customers benefit from all the best practices of AWS policies, architecture, and operational processes built to satisfy the needs of even the most security sensitive customers.

AWS is certified under the EU-US Privacy Shield, and the AWS Data Processing Addendum (DPA) is GDPR-ready and available now to all AWS customers to help them prepare for May 25, 2018 when the GDPR becomes enforceable. The current AWS DPA, as well as the AWS GDPR DPA, allows customers to transfer personal data to countries outside the European Economic Area (EEA) in compliance with European Union (EU) data protection laws. AWS also adheres to the Cloud Infrastructure Service Providers in Europe (CISPE) Code of Conduct. The CISPE Code of Conduct helps customers ensure that AWS is using appropriate data protection standards to protect their data, consistent with the GDPR. In addition, AWS offers a wide range of services and features to help customers meet the requirements of the GDPR, including services for access controls, monitoring, logging, and encryption.

From Our Customers
Many AWS customers are preparing to use this new Region. Here’s a small sample:

Societe Generale, one of the largest banks in France and the world, has accelerated their digital transformation while working with AWS. They developed SG Research, an application that makes reports from Societe Generale’s analysts available to corporate customers in order to improve the decision-making process for investments. The new AWS Region will reduce latency between applications running in the cloud and in their French data centers.

SNCF is the national railway company of France. Their mobile app, powered by AWS, delivers real-time traffic information to 14 million riders. Extreme weather, traffic events, holidays, and engineering works can cause usage to peak at hundreds of thousands of users per second. They are planning to use machine learning and big data to add predictive features to the app.

Radio France, the French public radio broadcaster, offers seven national networks, and uses AWS to accelerate its innovation and stay competitive.

Les Restos du Coeur is a French charity that provides assistance to the needy, delivering food packages and participating in their social and economic integration back into French society. It uses AWS for its CRM system to track the assistance given to each of its beneficiaries and the impact this is having on their lives.

AlloResto by JustEat (a leader in the French FoodTech industry) is using AWS to scale during traffic peaks and to accelerate their innovation process.

AWS Consulting and Technology Partners
We are already working with a wide variety of consulting, technology, managed service, and Direct Connect partners in France. Here’s a partial list:

AWS Premier Consulting Partners: Accenture, Capgemini, Claranet, CloudReach, DXC, and Edifixio.

AWS Consulting Partners: ABC Systemes, Atos International SAS, CoreExpert, Cycloid, Devoteam, LINKBYNET, Oxalide, Ozones, Scaleo Information Systems, and Sopra Steria.

AWS Technology Partners: Axway, Commerce Guys, MicroStrategy, Sage, Software AG, Splunk, Tibco, and Zerolight.

AWS in France
We have been investing in Europe, with a focus on France, for the last 11 years. We have also been developing documentation and training programs to help our customers to improve their skills and to accelerate their journey to the AWS Cloud.

As part of our commitment to AWS customers in France, we plan to train more than 25,000 people in the coming years, helping them develop highly sought after cloud skills. They will have access to AWS training resources in France via AWS Academy, AWSome days, AWS Educate, and webinars, all delivered in French by AWS Technical Trainers and AWS Certified Trainers.

Use it Today
The EU (Paris) Region is open for business now and you can start using it today!

Jeff;

Now Open – AWS China (Ningxia) Region

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/now-open-aws-china-ningxia-region/

Today we launched our 17th Region globally, and the second in China. The AWS China (Ningxia) Region, operated by Ningxia Western Cloud Data Technology Co. Ltd. (NWCD), is generally available now and provides customers another option to run applications and store data on AWS in China.

The Details
At launch, the new China (Ningxia) Region, operated by NWCD, supports Auto Scaling, AWS Config, AWS CloudFormation, AWS CloudTrail, Amazon CloudWatch, CloudWatch Events, Amazon CloudWatch Logs, AWS CodeDeploy, AWS Direct Connect, Amazon DynamoDB, Amazon Elastic Compute Cloud (EC2), Amazon Elastic Block Store (EBS), Amazon EC2 Systems Manager, AWS Elastic Beanstalk, Amazon ElastiCache, Amazon Elasticsearch Service, Elastic Load Balancing, Amazon EMR, Amazon Glacier, AWS Identity and Access Management (IAM), Amazon Kinesis Streams, Amazon Redshift, Amazon Relational Database Service (RDS), Amazon Simple Storage Service (S3), Amazon Simple Notification Service (SNS), Amazon Simple Queue Service (SQS), AWS Support API, AWS Trusted Advisor, Amazon Simple Workflow Service (SWF), Amazon Virtual Private Cloud, and VM Import. Visit the AWS China Products page for additional information on these services.

The Region supports all sizes of C4, D2, M4, T2, R4, I3, and X1 instances.

Check out the AWS Global Infrastructure page to learn more about current and future AWS Regions.

Operating Partner
To comply with China’s legal and regulatory requirements, AWS has formed a strategic technology collaboration with NWCD to operate and provide services from the AWS China (Ningxia) Region. Founded in 2015, NWCD is a licensed datacenter and cloud services provider, based in Ningxia, China. NWCD joins Sinnet, the operator of the AWS China (Beijing) Region, as an AWS operating partner in China. Through these relationships, AWS provides its industry-leading technology, guidance, and expertise to NWCD and Sinnet, while NWCD and Sinnet operate and provide AWS cloud services to local customers. While the cloud services offered in both AWS China Regions are the same as those available in other AWS Regions, the AWS China Regions are different in that they are isolated from all other AWS Regions and operated by AWS’s Chinese partners separately from all other AWS Regions. Customers using the AWS China Regions enter into customer agreements with Sinnet and NWCD, rather than with AWS.

Use it Today
The AWS China (Ningxia) Region, operated by NWCD, is open for business, and you can start using it now! Starting today, Chinese developers, startups, and enterprises, as well as government, education, and non-profit organizations, can leverage AWS to run their applications and store their data in the new AWS China (Ningxia) Region, operated by NWCD. Customers already using the AWS China (Beijing) Region, operated by Sinnet, can select the AWS China (Ningxia) Region directly from the AWS Management Console, while new customers can request an account at www.amazonaws.cn to begin using both AWS China Regions.

Jeff;

Glenn’s Take on re:Invent 2017 – Part 3

Post Syndicated from Glenn Gore original https://aws.amazon.com/blogs/architecture/glenns-take-on-reinvent-2017-part-3/

Glenn Gore here, Chief Architect for AWS. I was in Las Vegas last week (with 43K others) for re:Invent 2017. In Parts 1 and 2 of this series on the Architecture blog, I gave my take on what was interesting about some of the bigger announcements from a cloud-architecture perspective.

In the excitement of so many new services being launched, we sometimes overlook feature updates that, while perhaps not as exciting as Amazon DeepLens, have significant impact on how you architect and develop solutions on AWS.

Amazon DynamoDB is used by more than 100,000 customers around the world, handling over a trillion requests every day. From the start, DynamoDB has offered high availability by natively spanning multiple Availability Zones within an AWS Region. As more customers started building and deploying truly global applications, there was a need to replicate a DynamoDB table to multiple AWS Regions, allowing read/write operations to occur in any Region where the table was replicated. DynamoDB global tables now provide this replication. The update is important for providing a globally consistent view of information as users move from one Region to another, and for providing additional levels of availability, allowing for failover between AWS Regions without loss of information.

There are some interesting concurrency-design aspects you need to be aware of and ensure you can handle correctly. For example, we support “last writer wins” reconciliation where eventual consistency is being used and an application updates the same item in different AWS Regions at the same time. If you require strongly consistent reads and writes, you must perform all of them in the same AWS Region. The details behind this can be found in the DynamoDB documentation. Providing a globally distributed, replicated DynamoDB table simplifies many different use cases and allows replication logic that may have been pushed up into the application layers to be moved back down into the data layer.
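To make "last writer wins" concrete, here is a minimal Python sketch of how two concurrent regional writes to the same item could be reconciled. It is an illustration only, not DynamoDB's internal mechanism: the `ItemVersion` shape, the millisecond timestamps, and the tie-break on Region name are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ItemVersion:
    """One regional copy of an item (hypothetical shape, for illustration)."""
    value: str
    updated_at_ms: int  # write timestamp in milliseconds (assumed available)
    region: str         # Region where the write was accepted


def last_writer_wins(a: ItemVersion, b: ItemVersion) -> ItemVersion:
    """Return the version with the later timestamp.

    Ties are broken by Region name so that every replica picks the same
    winner; that tie-break is an assumption of this sketch.
    """
    return max(a, b, key=lambda v: (v.updated_at_ms, v.region))


dublin = ItemVersion("shipped", updated_at_ms=1_512_000_000_500, region="eu-west-1")
oregon = ItemVersion("cancelled", updated_at_ms=1_512_000_000_900, region="us-west-2")

winner = last_writer_wins(dublin, oregon)
print(winner.value)  # the Oregon write is later, so the Dublin write is discarded
```

The practical consequence is that the earlier write is silently lost, which is why workloads that cannot tolerate this should keep all reads and writes for an item in one Region.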

The other big update for DynamoDB is that you can now back up your DynamoDB table on demand with no impact to performance. One of the features I really like is that when you trigger a backup, it is available instantly, regardless of the size of the table. Behind the scenes, we use snapshots and change logs to ensure a consistent backup. While backup is instant, restoring the table could take some time depending on its size, ranging from minutes to hours for very large tables.

This feature is super important for those of you who work in regulated industries, which often have strict requirements around data retention and backups. In the past, those requirements sometimes limited the use of DynamoDB or required complex workarounds to implement some sort of backup feature, which often incurred significant additional costs due to increased read transactions on the DynamoDB tables.

Amazon Simple Storage Service (Amazon S3) was our first-released AWS service over 11 years ago, and it proved the simplicity and scalability of true API-driven architectures in the cloud. Today, Amazon S3 stores trillions of objects, with transactional requests per second reaching into the millions! Dealing with data as objects opened up an incredibly diverse array of use cases ranging from libraries of static images, game binary downloads, and application log data, to massive data lakes used for big data analytics and business intelligence. With Amazon S3, when you accessed your data in an object, you effectively had to write/read the object as a whole or use the range feature to retrieve a part of the object — if possible — in your individual use case.

Now, with Amazon S3 Select, you can use an SQL-like query language to retrieve just the part of an object that you need. It works with delimited text and JSON files, including GZIP-compressed files. We don’t support encryption during the preview of Amazon S3 Select.
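As a rough sketch of what such a query looks like from Python, the snippet below builds the arguments for the `select_object_content` call in boto3 (the AWS SDK for Python) and shows how the streamed result could be read. The bucket, key, and column names are hypothetical, and the request shape follows the generally available API, which may differ from the preview described in this post.

```python
def build_select_request(bucket: str, key: str, expression: str) -> dict:
    """Build keyword arguments for an S3 Select query over a GZIP-compressed CSV object."""
    return {
        "Bucket": bucket,
        "Key": key,
        "ExpressionType": "SQL",
        "Expression": expression,
        "InputSerialization": {
            "CSV": {"FileHeaderInfo": "USE"},  # first row holds the column names
            "CompressionType": "GZIP",
        },
        "OutputSerialization": {"CSV": {}},
    }


def run_select(request: dict) -> str:
    """Send the query and join the streamed chunks (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without the SDK

    response = boto3.client("s3").select_object_content(**request)
    chunks = []
    for event in response["Payload"]:
        if "Records" in event:
            chunks.append(event["Records"]["Payload"].decode("utf-8"))
    return "".join(chunks)


request = build_select_request(
    bucket="example-log-bucket",         # hypothetical bucket name
    key="logs/2017/12/requests.csv.gz",  # hypothetical object key
    expression="SELECT s.path, s.status FROM S3Object s WHERE s.status = '500'",
)
```

Calling `run_select(request)` would return only the matching rows, so the caller transfers and parses a fraction of the object instead of downloading and decompressing all of it.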

Amazon S3 Select provides two major benefits:

  • Faster access
  • Lower running costs

Serverless Lambda functions, where every millisecond matters when you are being charged, will benefit greatly from Amazon S3 Select as data retrieval and processing of your Lambda function will experience significant speedups and cost reductions. For example, we have seen 2x speed improvement and 80% cost reduction with the Serverless MapReduce code.

Other AWS services such as Amazon Athena, Amazon Redshift, and Amazon EMR, as well as partner offerings including Cloudera and Hortonworks, will support Amazon S3 Select. If you are using Amazon Glacier for longer-term data archival, you will be able to use Amazon Glacier Select to retrieve a subset of your content from within Amazon Glacier.

As the volume of data that can be stored within Amazon S3 and Amazon Glacier continues to scale on a daily basis, we will continue to innovate and develop improved and optimized services that will allow you to work with these magnificently-large data sets while reducing your costs (retrieval and processing). I believe this will also allow you to simplify the transformation and storage of incoming data into Amazon S3 in basic, semi-structured formats as a single copy vs. some of the duplication and reformatting of data sometimes required to do upfront optimizations for downstream processing. Amazon S3 Select largely removes the need for this upfront optimization and instead allows you to store data once and process it based on your individual Amazon S3 Select query per application or transaction need.

Thanks for reading!

Glenn contemplating why CSV format is still relevant in 2017 (Italy).

AWS Achieves FedRAMP JAB Moderate Provisional Authorization for 20 Services in the AWS US East/West Region

Post Syndicated from Chris Gile original https://aws.amazon.com/blogs/security/aws-achieves-fedramp-jab-moderate-authorization-for-20-services-in-us-eastwest/

The AWS US East/West Region has received a Provisional Authority to Operate (P-ATO) from the Joint Authorization Board (JAB) at the Federal Risk and Authorization Management Program (FedRAMP) Moderate baseline.

Though AWS has maintained an AWS US East/West Region Agency-ATO since early 2013, this announcement represents AWS’s carefully deliberated move to the JAB for the centralized maintenance of our P-ATO for 10 services already authorized. This also includes the addition of 10 new services to our FedRAMP program (see the complete list of services below). This doubles the number of FedRAMP Moderate services available to our customers to enable increased use of the cloud and support modernized IT missions. Our public sector customers now can leverage this FedRAMP P-ATO as a baseline for their own authorizations and look to the JAB for centralized Continuous Monitoring reporting and updates. In a significant enhancement for our partners that build their solutions on the AWS US East/West Region, they can now achieve FedRAMP JAB P-ATOs of their own for their Platform as a Service (PaaS) and Software as a Service (SaaS) offerings.

In line with FedRAMP security requirements, our independent FedRAMP assessment was completed in partnership with a FedRAMP accredited Third Party Assessment Organization (3PAO) on our technical, management, and operational security controls to validate that they meet or exceed FedRAMP’s Moderate baseline requirements. Effective immediately, you can begin leveraging this P-ATO for the following 20 services in the AWS US East/West Region:

  • Amazon Aurora (MySQL)*
  • Amazon CloudWatch Logs*
  • Amazon DynamoDB
  • Amazon Elastic Block Store
  • Amazon Elastic Compute Cloud
  • Amazon EMR*
  • Amazon Glacier*
  • Amazon Kinesis Streams*
  • Amazon RDS (MySQL, Oracle, Postgres*)
  • Amazon Redshift
  • Amazon Simple Notification Service*
  • Amazon Simple Queue Service*
  • Amazon Simple Storage Service
  • Amazon Simple Workflow Service*
  • Amazon Virtual Private Cloud
  • AWS CloudFormation*
  • AWS CloudTrail*
  • AWS Identity and Access Management
  • AWS Key Management Service
  • Elastic Load Balancing

* Services with first-time FedRAMP Moderate authorizations

We continue to work with the FedRAMP Project Management Office (PMO), other regulatory and compliance bodies, and our customers and partners to ensure that we are raising the bar on our customers’ security and compliance needs.

To learn more about how AWS helps customers meet their security and compliance requirements, see the AWS Compliance website. To learn about what other public sector customers are doing on AWS, see our Government, Education, and Nonprofits Case Studies and Customer Success Stories. To review the public posting of our FedRAMP authorizations, see the FedRAMP Marketplace.

– Chris Gile, Senior Manager, AWS Public Sector Risk and Compliance

Event-Driven Computing with Amazon SNS and AWS Compute, Storage, Database, and Networking Services

Post Syndicated from Christie Gifrin original https://aws.amazon.com/blogs/compute/event-driven-computing-with-amazon-sns-compute-storage-database-and-networking-services/

Contributed by Otavio Ferreira, Manager, Software Development, AWS Messaging

Like other developers around the world, you may be tackling increasingly complex business problems. A key success factor, in that case, is the ability to break down a large project scope into smaller, more manageable components. A service-oriented architecture guides you toward designing systems as a collection of loosely coupled, independently scaled, and highly reusable services. Microservices take this even further. To improve performance and scalability, they promote fine-grained interfaces and lightweight protocols.

However, the communication among isolated microservices can be challenging. Services are often deployed onto independent servers and don’t share any compute or storage resources. Also, you should avoid hard dependencies among microservices, to preserve maintainability and reusability.

If you apply the pub/sub design pattern, you can effortlessly decouple and independently scale out your microservices and serverless architectures. A pub/sub messaging service, such as Amazon SNS, promotes event-driven computing that statically decouples event publishers from subscribers, while dynamically allowing for the exchange of messages between them. An event-driven architecture also introduces the responsiveness needed to deal with complex problems, which are often unpredictable and asynchronous.

What is event-driven computing?

Given the context of microservices, event-driven computing is a model in which subscriber services automatically perform work in response to events triggered by publisher services. This paradigm can be applied to automate workflows while decoupling the services that collectively and independently work to fulfil these workflows. Amazon SNS is an event-driven computing hub, in the AWS Cloud, that has native integration with several AWS publisher and subscriber services.
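The decoupling described here can be illustrated with a toy, in-memory hub in Python. This is only a sketch of the pub/sub pattern, not of how Amazon SNS is implemented; the `InMemoryHub` class, the topic name, and the event fields are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Handler = Callable[[dict], None]


class InMemoryHub:
    """Toy stand-in for a pub/sub hub: publishers know topic names, not subscribers."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, event: dict) -> int:
        """Deliver the event to every subscriber and return the delivery count."""
        handlers = list(self._subscribers[topic])
        for handler in handlers:
            handler(event)
        return len(handlers)


hub = InMemoryHub()
billed, shipped = [], []

# Two independent subscriber services react to the same event.
hub.subscribe("order-created", lambda e: billed.append(e["order_id"]))
hub.subscribe("order-created", lambda e: shipped.append(e["order_id"]))

delivered = hub.publish("order-created", {"order_id": "A-1001"})
print(delivered, billed, shipped)
```

The publisher never references the billing or shipping code, so either subscriber can be added, removed, or scaled without changing the publisher.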

Which AWS services publish events to SNS natively?

Several AWS services have been integrated as SNS publishers and, therefore, can natively trigger event-driven computing for a variety of use cases. In this post, I specifically cover AWS compute, storage, database, and networking services, as depicted below.

Compute services

  • Auto Scaling: Helps you ensure that you have the correct number of Amazon EC2 instances available to handle the load for your application. You can configure Auto Scaling lifecycle hooks to trigger events, as Auto Scaling resizes your EC2 cluster. As an example, you may want to warm up the local cache store on newly launched EC2 instances, and also download log files from other EC2 instances that are about to be terminated. To make this happen, set an SNS topic as your Auto Scaling group’s notification target, then subscribe two Lambda functions to this SNS topic. The first function is responsible for handling scale-out events (to warm up cache upon provisioning), whereas the second is in charge of handling scale-in events (to download logs upon termination).

  • AWS Elastic Beanstalk: An easy-to-use service for deploying and scaling web applications and web services developed in a number of programming languages. You can configure event notifications for your Elastic Beanstalk environment so that notable events can be automatically published to an SNS topic, then pushed to topic subscribers. As an example, you may use this event-driven architecture to coordinate your continuous integration pipeline (such as Jenkins CI). That way, whenever an environment is created, Elastic Beanstalk publishes this event to an SNS topic, which triggers a subscribing Lambda function, which then kicks off a CI job against your newly created Elastic Beanstalk environment.

  • Elastic Load Balancing: Automatically distributes incoming application traffic across Amazon EC2 instances, containers, or other resources identified by IP addresses. You can configure CloudWatch alarms on Elastic Load Balancing metrics, to automate the handling of events derived from Classic Load Balancers. As an example, you may leverage this event-driven design to automate latency profiling in an Amazon ECS cluster behind a Classic Load Balancer. In this example, whenever your ECS cluster breaches your load balancer latency threshold, an event is posted by CloudWatch to an SNS topic, which then triggers a subscribing Lambda function. This function runs a task on your ECS cluster to trigger a latency profiling tool, hosted on the cluster itself. This can enhance your latency troubleshooting exercise by making it timely.
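The Auto Scaling example above can be sketched in Python. This is a minimal illustration, not the post's own code: it assumes the Lambda function receives the standard SNS event shape (a JSON string in Records[].Sns.Message) and that lifecycle hook messages carry LifecycleTransition and EC2InstanceId fields. The task names warm_cache and download_logs are hypothetical placeholders, and a single dispatcher stands in for the two separate functions described in the text.

```python
import json

# Lifecycle transition names that Auto Scaling places in hook notifications.
LAUNCHING = "autoscaling:EC2_INSTANCE_LAUNCHING"
TERMINATING = "autoscaling:EC2_INSTANCE_TERMINATING"


def lifecycle_messages(event):
    """Yield decoded lifecycle hook messages from an SNS-triggered Lambda event."""
    for record in event.get("Records", []):
        # SNS delivers the published payload as a JSON string in Sns.Message.
        message = json.loads(record["Sns"]["Message"])
        # Skip anything that is not a lifecycle transition (assumed here to
        # include test notifications sent when a hook is first configured).
        if "LifecycleTransition" in message:
            yield message


def plan_actions(event):
    """Map each lifecycle message to a (task_name, instance_id) pair."""
    actions = []
    for message in lifecycle_messages(event):
        instance_id = message["EC2InstanceId"]
        if message["LifecycleTransition"] == LAUNCHING:
            actions.append(("warm_cache", instance_id))  # hypothetical task name
        elif message["LifecycleTransition"] == TERMINATING:
            actions.append(("download_logs", instance_id))  # hypothetical task name
    return actions


def handler(event, context):
    """Lambda entry point: report which follow-up tasks the event calls for."""
    return plan_actions(event)
```

In a real deployment each task would call out to the instance (and then complete the lifecycle action); the sketch stops at deciding which task applies so that the routing logic stays easy to test.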

Storage services

  • Amazon S3: Object storage built to store and retrieve any amount of data. You can enable S3 event notifications, and automatically get them posted to SNS topics, to automate a variety of workflows. For instance, imagine that you have an S3 bucket to store incoming resumes from candidates, and a fleet of EC2 instances to encode these resumes from their original format (such as Word or text) into a portable format (such as PDF). In this example, whenever new files are uploaded to your input bucket, S3 publishes these events to an SNS topic, which in turn pushes these messages into subscribing SQS queues. Then, encoding workers running on EC2 instances poll these messages from the SQS queues; retrieve the original files from the input S3 bucket; encode them into PDF; and finally store them in an output S3 bucket.

  • Amazon EFS: Provides simple and scalable file storage, for use with Amazon EC2 instances, in the AWS Cloud. You can configure CloudWatch alarms on EFS metrics, to automate the management of your EFS systems. For example, consider a highly parallelized genomics analysis application that runs against an EFS system. By default, this file system is instantiated on the “General Purpose” performance mode. Although this performance mode allows for lower latency, it might eventually impose a scaling bottleneck. Therefore, you may leverage an event-driven design to handle it automatically. Basically, as soon as the EFS metric “Percent I/O Limit” breaches 95%, CloudWatch could post this event to an SNS topic, which in turn would push this message into a subscribing Lambda function. This function automatically creates a new file system, this time on the “Max I/O” performance mode, then switches the genomics analysis application to this new file system. As a result, your application starts experiencing higher I/O throughput rates.

  • Amazon Glacier: A secure, durable, and low-cost cloud storage service for data archiving and long-term backup. You can set a notification configuration on an Amazon Glacier vault so that when a job completes, a message is published to an SNS topic. Retrieving an archive from Amazon Glacier is a two-step asynchronous operation, in which you first initiate a job, and then download the output after the job completes. Therefore, SNS saves you from polling your Amazon Glacier vault to check whether your job has completed. As usual, you may subscribe SQS queues, Lambda functions, and HTTP endpoints to your SNS topic, to be notified when your Amazon Glacier job is done.

  • AWS Snowball: A petabyte-scale data transport solution that uses secure appliances to transfer large amounts of data. You can leverage Snowball notifications to automate workflows related to importing data into and exporting data from AWS. More specifically, whenever your Snowball job status changes, Snowball can publish this event to an SNS topic, which in turn can broadcast the event to all its subscribers. As an example, imagine a Geographic Information System (GIS) that distributes high-resolution satellite images to users via Web browser. In this example, the GIS vendor could capture up to 80 TB of satellite images; create a Snowball job to import these files from an on-premises system to an S3 bucket; and provide an SNS topic ARN to be notified upon job status changes in Snowball. After Snowball changes the job status from “Importing” to “Completed”, Snowball publishes this event to the specified SNS topic, which delivers this message to a subscribing Lambda function, which finally creates a CloudFront web distribution for the target S3 bucket, to serve the images to end users.
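For the S3 resume-encoding example above, the worker's first job is to unwrap each SQS message. The sketch below assumes the default (non-raw) SNS delivery, in which the SQS body is an SNS envelope whose Message field holds the S3 event as a JSON string. The helper names uploaded_objects and output_key, and the bucket and key values in the usage note, are illustrative assumptions rather than anything defined in the post.

```python
import json
import posixpath
from urllib.parse import unquote_plus


def uploaded_objects(sqs_body):
    """Return (bucket, key) pairs for objects created in one SQS message body."""
    envelope = json.loads(sqs_body)             # outer layer: SNS envelope
    s3_event = json.loads(envelope["Message"])  # inner layer: S3 event notification
    objects = []
    for record in s3_event.get("Records", []):
        # Only react to uploads; other event types are ignored.
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded, with spaces encoded as "+".
        key = unquote_plus(record["s3"]["object"]["key"])
        objects.append((bucket, key))
    return objects


def output_key(key, extension=".pdf"):
    """Derive the key for the encoded copy (hypothetical naming scheme)."""
    return posixpath.splitext(key)[0] + extension
```

A worker would call uploaded_objects on each message it receives, download and encode every listed object, and write the result under output_key(key) in the output bucket. For instance, an upload recorded as 2018/jane+doe.docx would be read back as "2018/jane doe.docx" and stored as "2018/jane doe.pdf".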

Database services

  • Amazon RDS: Makes it easy to set up, operate, and scale a relational database in the cloud. RDS leverages SNS to broadcast notifications when RDS events occur. As usual, these notifications can be delivered via any protocol supported by SNS, including SQS queues, Lambda functions, and HTTP endpoints. As an example, imagine that you own a social network website that has experienced organic growth, and needs to scale its compute and database resources on demand. In this case, you could provide an SNS topic to listen to RDS DB instance events. When the “Low Storage” event is published to the topic, SNS pushes this event to a subscribing Lambda function, which in turn leverages the RDS API to increase the storage capacity allocated to your DB instance. The provisioning itself takes place within the specified DB maintenance window.

  • Amazon ElastiCache: A web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud. ElastiCache can publish messages using Amazon SNS when significant events happen on your cache cluster. This feature can be used to refresh the list of servers on client machines connected to individual cache node endpoints of a cache cluster. For instance, an ecommerce website fetches product details from a cache cluster, with the goal of offloading a relational database and speeding up page load times. Ideally, you want to make sure that each web server always has an updated list of cache servers to which to connect. To automate this node discovery process, you can get your ElastiCache cluster to publish events to an SNS topic. Thus, when ElastiCache event “AddCacheNodeComplete” is published, your topic then pushes this event to all subscribing HTTP endpoints that serve your ecommerce website, so that these HTTP servers can update their list of cache nodes.

  • Amazon Redshift: A fully managed data warehouse that makes it simple to analyze data using standard SQL and BI (Business Intelligence) tools. Amazon Redshift uses SNS to broadcast relevant events so that data warehouse workflows can be automated. As an example, imagine a news website that sends clickstream data to a Kinesis Firehose stream, which then loads the data into Amazon Redshift, so that popular news and reading preferences might be surfaced on a BI tool. At some point though, this Amazon Redshift cluster might need to be resized, and the cluster enters a read-only mode. Hence, this Amazon Redshift event is published to an SNS topic, which delivers this event to a subscribing Lambda function, which finally deletes the corresponding Kinesis Firehose delivery stream, so that clickstream data uploads can be put on hold. At a later point, after Amazon Redshift publishes the event that the maintenance window has been closed, SNS notifies a subscribing Lambda function accordingly, so that this function can re-create the Kinesis Firehose delivery stream, and resume clickstream data uploads to Amazon Redshift.

  • AWS DMS: Helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. DMS also uses SNS to provide notifications when DMS events occur, which can automate database migration workflows. As an example, you might create data replication tasks to migrate an on-premises MS SQL database, composed of multiple tables, to MySQL. Thus, if replication tasks fail due to incompatible data encoding in the source tables, these events can be published to an SNS topic, which can push these messages into a subscribing SQS queue. Then, encoders running on EC2 can poll these messages from the SQS queue, encode the source tables into a compatible character set, and restart the corresponding replication tasks in DMS. This is an event-driven approach to a self-healing database migration process.
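The RDS “Low Storage” example above might look roughly like the following Lambda function. Several things here are assumptions, not details from the post: the event subscription is taken to be limited to the low storage category (so every message calls for growth), the DB instance name is read from a "Source ID" field in the message, and the 20 percent growth step is arbitrary. The describe_db_instances and modify_db_instance calls follow the boto3 RDS client, with ApplyImmediately=False so that the change waits for the maintenance window.

```python
import json


def next_allocation(current_gib, growth_percent=20):
    """Return a larger allocation in GiB, growing by at least 1 GiB."""
    return current_gib + max(1, current_gib * growth_percent // 100)


def grow_storage(rds_client, instance_id, growth_percent=20):
    """Request more storage for one DB instance and return the new size."""
    described = rds_client.describe_db_instances(DBInstanceIdentifier=instance_id)
    current_gib = described["DBInstances"][0]["AllocatedStorage"]
    target_gib = next_allocation(current_gib, growth_percent)
    rds_client.modify_db_instance(
        DBInstanceIdentifier=instance_id,
        AllocatedStorage=target_gib,
        ApplyImmediately=False,  # defer the change to the maintenance window
    )
    return target_gib


def handler(event, context, rds_client=None):
    """Lambda entry point; rds_client can be injected for local testing."""
    if rds_client is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access
        rds_client = boto3.client("rds")
    results = {}
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        instance_id = message["Source ID"]  # assumed field name in the RDS event message
        results[instance_id] = grow_storage(rds_client, instance_id)
    return results
```

Passing the client in as a parameter keeps the sizing logic separate from the AWS calls, which makes it straightforward to check the function against a stub before wiring it to a real topic.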

Networking services

  • Amazon Route 53: A highly available and scalable cloud-based DNS (Domain Name System). Route 53 health checks monitor the health and performance of your web applications, web servers, and other resources. You can set CloudWatch alarms and get automated Amazon SNS notifications when the status of your Route 53 health check changes. As an example, imagine an online payment gateway that reports the health of its platform to merchants worldwide, via a status page. This page is hosted on EC2 and fetches platform health data from DynamoDB. In this case, you could configure a CloudWatch alarm for your Route 53 health check, so that when the alarm threshold is breached, and the payment gateway is no longer considered healthy, then CloudWatch publishes this event to an SNS topic, which pushes this message to a subscribing Lambda function, which finally updates the DynamoDB table that populates the status page. This event-driven approach avoids any kind of manual update to the status page visited by merchants.

  • AWS Direct Connect (AWS DX): Makes it easy to establish a dedicated network connection from your premises to AWS, which can reduce your network costs, increase bandwidth throughput, and provide a more consistent network experience than Internet-based connections. You can monitor physical DX connections using CloudWatch alarms, and send SNS messages when alarms change their status. As an example, when a DX connection state shifts to 0 (zero), indicating that the connection is down, this event can be published to an SNS topic, which can fan out this message to impacted servers through HTTP endpoints, so that they might reroute their traffic through a different connection instead. This is an event-driven approach to connectivity resilience.
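Several of the examples above (Elastic Load Balancing, EFS, Route 53, Direct Connect) share one shape: a CloudWatch alarm changes state, SNS delivers the alarm message, and a Lambda function reacts. The sketch below follows the Route 53 status page case. It relies on the AlarmName, NewStateValue, and StateChangeTime fields of the alarm message; the status wording, the item layout, and the table name platform-status are hypothetical.

```python
import json

# Hypothetical mapping from CloudWatch alarm states to the wording shown to merchants.
STATUS_BY_ALARM_STATE = {
    "OK": "operational",
    "ALARM": "degraded",
    "INSUFFICIENT_DATA": "unknown",
}


def status_item(alarm_message):
    """Translate one CloudWatch alarm message into a status page record."""
    return {
        "component": alarm_message["AlarmName"],
        "status": STATUS_BY_ALARM_STATE.get(alarm_message["NewStateValue"], "unknown"),
        "updated_at": alarm_message["StateChangeTime"],
    }


def handler(event, context, table=None):
    """Lambda entry point; table can be injected for local testing."""
    if table is None:
        import boto3  # imported lazily so the sketch can be exercised without AWS access
        table = boto3.resource("dynamodb").Table("platform-status")  # hypothetical table name
    items = []
    for record in event.get("Records", []):
        item = status_item(json.loads(record["Sns"]["Message"]))
        table.put_item(Item=item)  # overwrite the component's current status
        items.append(item)
    return items
```

The same handler skeleton would serve the other alarm-driven examples; only the body of the reaction (running an ECS task, creating a file system, notifying servers) changes.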

More event-driven computing on AWS

In addition to SNS, event-driven computing is also addressed by Amazon CloudWatch Events, which delivers a near real-time stream of system events that describe changes in AWS resources. With CloudWatch Events, you can route each event type to one or more targets, such as Lambda functions, SNS topics, and SQS queues.

Many AWS services publish events to CloudWatch. As an example, you can get CloudWatch Events to capture events on your ETL (Extract, Transform, Load) jobs running on AWS Glue and push failed ones to an SQS queue, so that you can retry them later.
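To make the AWS Glue example more concrete, here is a sketch of an event pattern that selects failed Glue job runs, together with a deliberately simplified matcher that shows how such a pattern is compared against an event. The matcher only handles exact-value lists and nested objects; it illustrates the idea and is not the service's full matching logic. The job name in the usage note is made up.

```python
# Event pattern selecting failed AWS Glue job runs.
GLUE_FAILURE_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Job State Change"],
    "detail": {"state": ["FAILED"]},
}


def matches(pattern, event):
    """Return True if every field in the pattern is satisfied by the event.

    Simplified rules: a list means "the event value must be one of these",
    and a dict means "apply the same rules to the nested object".
    """
    for key, expected in pattern.items():
        if key not in event:
            return False
        actual = event[key]
        if isinstance(expected, dict):
            if not isinstance(actual, dict) or not matches(expected, actual):
                return False
        elif actual not in expected:
            return False
    return True
```

With this pattern, an event such as {"source": "aws.glue", "detail-type": "Glue Job State Change", "detail": {"jobName": "nightly-etl", "state": "FAILED"}} matches and would be forwarded to the SQS queue, while the same event with state SUCCEEDED would not.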

Conclusion

Amazon SNS is a pub/sub messaging service that AWS customers worldwide can use as an event-driven computing hub. By capturing events natively triggered by AWS services, such as EC2, S3, and RDS, you can automate and optimize many kinds of workflows, including scaling, testing, encoding, profiling, broadcasting, discovery, failover, and much more. Business use cases presented in this post ranged from recruiting websites, to scientific research, geographic systems, social networks, retail websites, and news portals.

Start now by visiting Amazon SNS in the AWS Management Console, or by trying the AWS 10-Minute Tutorial, Send Fan-out Event Notifications with Amazon SNS and Amazon SQS.

 

Hot Startups on AWS – October 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/hot-startups-on-aws-october-2017/

In 2015, the Centers for Medicare and Medicaid Services (CMS) reported that healthcare spending made up 17.8% of the U.S. GDP – that’s almost $3.2 trillion or $9,990 per person. By 2025, the CMS estimates this number will increase to nearly 20%. As cloud technology evolves in the healthcare and life science industries, we are seeing how companies of all sizes are using AWS to provide powerful and innovative solutions to customers across the globe. This month we are excited to feature the following startups:

  • ClearCare – helping home care agencies operate efficiently and grow their business.
  • DNAnexus – providing a cloud-based global network for sharing and managing genomic data.

ClearCare (San Francisco, CA)

ClearCare envisions a future where home care is the only choice for aging in place. Home care agencies play a critical role in the economy and their communities by significantly lowering the overall cost of care, reducing the number of hospital admissions, and bending the cost curve of aging. Patients receiving home care typically have multiple chronic conditions and functional limitations, driving over $190 billion in healthcare spending in the U.S. each year. To offset these costs, health insurance payers are developing in-home care management programs for patients. ClearCare’s goal is to help home care agencies leverage technology to improve costs, outcomes, and quality of life for the aging population. The company’s powerful software platform is specifically designed for use by non-medical, in-home care agencies to manage their businesses.

Founder and CEO Geoff Nudd created ClearCare because of his own grandmother’s need for care. Keeping family members and caregivers up to date on a loved one’s well-being can be difficult, so Geoff created what is now ClearCare’s Family Room, which enables caregivers and agency staff to check schedules and receive real-time updates about what’s happening in the home. Since then, agencies have provided feedback on other areas of their businesses that could be streamlined. ClearCare has now built over 20 modules to help home care agencies optimize operations with services including a telephony service, billing and payroll, and more. ClearCare now serves over 4,000 home care agencies, representing 500,000 caregivers and 400,000 seniors.

Using AWS, ClearCare is able to spin up reliable infrastructure for proofs of concept and iterate on those systems to quickly get value to market. The company runs many AWS services including Amazon Elasticsearch Service, Amazon RDS, and Amazon CloudFront. Amazon EMR and Amazon Athena have enabled ClearCare to build a Hadoop-based ETL and data warehousing system that processes terabytes of data each day. By utilizing these managed services, ClearCare has been able to go from concept to customer delivery in less than three months.

To learn more about ClearCare, check out their website.

DNAnexus (Mountain View, CA)

DNAnexus is accelerating the application of genomic data in precision medicine by providing a cloud-based platform for sharing and managing genomic and biomedical data and analysis tools. The company was founded in 2009 by Stanford graduate student Andreas Sundquist and two Stanford professors, Arend Sidow and Serafim Batzoglou, to address the need for scaling secondary analysis of next-generation sequencing (NGS) data in the cloud. The founders quickly learned that users needed a flexible solution to build complex analysis workflows and tools that enable them to share and manage large volumes of data. DNAnexus is optimized to address the challenges of security, scalability, and collaboration for organizations that are pursuing genomic-based approaches to health, both in clinics and research labs. DNAnexus has a global customer base – spanning North America, Europe, Asia-Pacific, South America, and Africa – that runs a million jobs each month and is doubling its storage year-over-year. The company currently stores more than 10 petabytes of biomedical and genomic data. That is equivalent to approximately 100,000 genomes, or in simpler terms, over 50 billion Facebook photos!

DNAnexus is working with its customers to help expand their translational informatics research, which includes expanding into clinical trial genomic services. This will help companies developing different medicines to better stratify clinical trial populations and develop companion tests that enable the right patient to get the right medicine. In collaboration with Janssen Human Microbiome Institute, DNAnexus is also launching Mosaic – a community platform for microbiome research.

AWS provides DNAnexus and its customers the flexibility to grow and scale research programs. Building the technology infrastructure required to manage these projects in-house is expensive and time-consuming. DNAnexus removes that barrier for labs of any size by using AWS scalable cloud resources. The company deploys its customers’ genomic pipelines on Amazon EC2, using Amazon S3 for high-performance, high-durability storage, and Amazon Glacier for low-cost data archiving. DNAnexus is also an AWS Life Sciences Competency Partner.

Learn more about DNAnexus here.

-Tina

AWS HIPAA Eligibility Update (October 2017) – Sixteen Additional Services

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-hipaa-eligibility-post-update-october-2017-sixteen-additional-services/

Our Health Customer Stories page lists just a few of the many customers that are building and running healthcare and life sciences applications on AWS. Customers like Verge Health, Care Cloud, and Orion Health trust AWS with Protected Health Information (PHI) and Personally Identifiable Information (PII) as part of their efforts to comply with HIPAA and HITECH.

Sixteen More Services
In my last HIPAA Eligibility Update I shared the news that we added eight additional services to our list of HIPAA eligible services. Today I am happy to let you know that we have added another sixteen services to the list, bringing the total up to 46. Here are the newest additions, along with some short descriptions and links to some of my blog posts to jog your memory:

Amazon Aurora with PostgreSQL Compatibility – This brand-new addition to Amazon Aurora allows you to encrypt your relational databases using keys that you create and manage through AWS Key Management Service (KMS). When you enable encryption for an Amazon Aurora database, the underlying storage is encrypted, as are automated backups, read replicas, and snapshots. Read New – Encryption at Rest for Amazon Aurora to learn more.

Amazon CloudWatch Logs – You can use the logs to monitor and troubleshoot your systems and applications. You can monitor your existing system, application, and custom log files in near real-time, watching for specific phrases, values, or patterns. Log data can be stored durably and at low cost, for as long as needed. To learn more, read Store and Monitor OS & Application Log Files with Amazon CloudWatch and Improvements to CloudWatch Logs and Dashboards.

Amazon Connect – This self-service, cloud-based contact center makes it easy for you to deliver better customer service at a lower cost. You can use the visual designer to set up your contact flows, manage agents, and track performance, all without specialized skills. Read Amazon Connect – Customer Contact Center in the Cloud and New – Amazon Connect and Amazon Lex Integration to learn more.

Amazon ElastiCache for Redis – This service lets you deploy, operate, and scale an in-memory data store or cache that you can use to improve the performance of your applications. Each ElastiCache for Redis cluster publishes key performance metrics to Amazon CloudWatch. To learn more, read Caching in the Cloud with Amazon ElastiCache and Amazon ElastiCache – Now With a Dash of Redis.

Amazon Kinesis Streams – This service allows you to build applications that process or analyze streaming data such as website clickstreams, financial transactions, social media feeds, and location-tracking events. To learn more, read Amazon Kinesis – Real-Time Processing of Streaming Big Data and New: Server-Side Encryption for Amazon Kinesis Streams.

Amazon RDS for MariaDB – This service lets you set up scalable, managed MariaDB instances in minutes, and offers high performance, high availability, and a simplified security model that makes it easy for you to encrypt data at rest and in transit. Read Amazon RDS Update – MariaDB is Now Available to learn more.

Amazon RDS SQL Server – This service lets you set up scalable, managed Microsoft SQL Server instances in minutes, and also offers high performance, high availability, and a simplified security model. To learn more, read Amazon RDS for SQL Server and .NET support for AWS Elastic Beanstalk and Amazon RDS for Microsoft SQL Server – Transparent Data Encryption (TDE).

Amazon Route 53 – This is a highly available Domain Name System (DNS) service. It translates names like www.example.com into IP addresses. To learn more, read Moving Ahead with Amazon Route 53.

AWS Batch – This service lets you run large-scale batch computing jobs on AWS. You don’t need to install or maintain specialized batch software or build your own server clusters. Read AWS Batch – Run Batch Computing Jobs on AWS to learn more.

AWS CloudHSM – A cloud-based Hardware Security Module (HSM) for key storage and management at cloud scale. Designed for sensitive workloads, CloudHSM lets you manage your own keys using FIPS 140-2 Level 3 validated HSMs. To learn more, read AWS CloudHSM – Secure Key Storage and Cryptographic Operations and AWS CloudHSM Update – Cost Effective Hardware Key Management at Cloud Scale for Sensitive & Regulated Workloads.

AWS Key Management Service – This service makes it easy for you to create and control the encryption keys used to encrypt your data. It uses HSMs to protect your keys, and is integrated with AWS CloudTrail in order to provide you with a log of all key usage. Read New AWS Key Management Service (KMS) to learn more.

AWS Lambda – This service lets you run event-driven application or backend code without thinking about or managing servers. To learn more, read AWS Lambda – Run Code in the Cloud, AWS Lambda – A Look Back at 2016, and AWS Lambda – In Full Production with New Features for Mobile Devs.

Lambda@Edge – You can use this new feature of AWS Lambda to run Node.js functions across the global network of AWS locations without having to provision or manage servers, in order to deliver rich, personalized content to your users with low latency. Read Lambda@Edge – Intelligent Processing of HTTP Requests at the Edge to learn more.

AWS Snowball Edge – This is a data transfer device with 100 terabytes of on-board storage as well as compute capabilities. You can use it to move large amounts of data into or out of AWS, as a temporary storage tier, or to support workloads in remote or offline locations. To learn more, read AWS Snowball Edge – More Storage, Local Endpoints, Lambda Functions.

AWS Snowmobile – This is an exabyte-scale data transfer service. Pulled by a semi-trailer truck, each Snowmobile packs 100 petabytes of storage into a ruggedized 45-foot long shipping container. Read AWS Snowmobile – Move Exabytes of Data to the Cloud in Weeks to learn more (and to see some of my finest LEGO work).

AWS Storage Gateway – This hybrid storage service lets your on-premises applications use AWS cloud storage (Amazon Simple Storage Service (S3), Amazon Glacier, and Amazon Elastic File System) in a simple and seamless way, with storage for volumes, files, and virtual tapes. To learn more, read The AWS Storage Gateway – Integrate Your Existing On-Premises Applications with AWS Cloud Storage and File Interface to AWS Storage Gateway.

And there you go! Check out my earlier post for a list of resources that will help you to build applications that comply with HIPAA and HITECH.

Jeff;

 

AWS Hot Startups – September 2017

Post Syndicated from Tina Barr original https://aws.amazon.com/blogs/aws/aws-hot-startups-september-2017/

As consumers continue to demand faster, simpler, and more on-the-go services, FinTech companies are responding with ever more innovative solutions to fit everyone’s needs and to improve customer experience. This month, we are excited to feature the following startups—all of whom are disrupting traditional financial services in unique ways:

  • Acorns – allowing customers to invest spare change automatically.
  • Bondlinc – improving the bond trading experience for clients, financial institutions, and private banks.
  • Lenda – reimagining homeownership with a secure and streamlined online service.

Acorns (Irvine, CA)

Driven by the belief that anyone can grow wealth, Acorns is relentlessly pursuing ways to help make that happen. Currently the fastest-growing micro-investing app in the U.S., Acorns takes mere minutes to get started and is helping over 2.2 million people grow their wealth. And unlike other FinTech apps, Acorns is focused on helping America’s middle class – namely the 182 million citizens who make less than $100,000 per year – and looking after their financial best interests.

Acorns is able to help their customers effortlessly invest their money, little by little, by offering ETF portfolios put together by Dr. Harry Markowitz, a Nobel Laureate in economic sciences. They also offer a range of services, including “Round-Ups,” whereby customers can automatically invest spare change from everyday purchases, and “Recurring Investments,” through which customers can set up automatic transfers of just $5 per week into their portfolio. Additionally, Found Money, Acorns’ earning platform, can help anyone spend smarter as the company connects customers to brands like Lyft, Airbnb, and Skillshare, who then automatically invest in customers’ Acorns accounts.

The Acorns platform runs entirely on AWS, allowing them to deliver a secure and scalable cloud-based experience. By utilizing AWS, Acorns is able to offer an exceptional customer experience and fulfill its core mission. Acorns uses Terraform to manage services such as Amazon EC2 Container Service, Amazon CloudFront, and Amazon S3. They also use Amazon RDS and Amazon Redshift for data storage, and Amazon Glacier to manage document retention.

Acorns is hiring! Be sure to check out their careers page if you are interested.

Bondlinc (Singapore)

Eng Keong, Founder and CEO of Bondlinc, has long wanted to standardize, improve, and automate the traditional workflows that revolve around bond trading. As a former trader at BNP Paribas and Jefferies & Company, E.K. – as Keong is known – had personally seen how manual processes led to information bottlenecks in over-the-counter practices. This drove him, along with future Bondlinc CTO Vincent Caldeira, to start a new service that maximizes efficiency, information distribution, and accessibility for both clients and bankers in the bond market.

Currently, bond trading requires banks to spend a significant amount of resources retrieving data from expensive and restricted institutional sources, performing suitability checks, and attaching required documentation before presenting all relevant information to clients – usually by email. Bankers are often overwhelmed by these time-consuming tasks, which means clients don’t always get proper access to time-sensitive bond information and pricing. Bondlinc bridges this gap between banks and clients by providing a variety of solutions, including easy access to basic bond information and analytics, updates of new issues and relevant news, consolidated management of your portfolio, and a chat function between banker and client. By making the bond market much more accessible to clients, Bondlinc is taking private banking to the next level, while improving efficiency of the banks as well.

As a startup running on AWS since inception, Bondlinc has built and operated its SaaS product by leveraging Amazon EC2, Amazon S3, Elastic Load Balancing, and Amazon RDS across multiple Availability Zones to provide its customers (namely, financial institutions) a highly available and seamlessly scalable product distribution platform. Bondlinc also makes extensive use of Amazon CloudWatch, AWS CloudTrail, and Amazon SNS to meet the stringent operational monitoring, auditing, compliance, and governance requirements of its customers. Bondlinc is currently experimenting with Amazon Lex to build a conversational interface into its mobile application via a chat-bot that provides trading assistance services.

To see how Bondlinc works, request a demo at Bondlinc.com.

Lenda (San Francisco, CA)

Lenda is a digital mortgage company founded by seasoned FinTech entrepreneur Jason van den Brand. Jason wanted to create a smarter, simpler, and more streamlined system for people to either get a mortgage or refinance their homes. With Lenda, customers can find out if they are pre-approved for loans, and receive accurate, real-time mortgage rate quotes from industry-experienced home loan advisors. Lenda’s advisors support customers through the loan process by providing financial advice and guidance for a seamless experience.

Lenda’s innovative platform allows borrowers to complete their home loans online from start to finish. Through a savvy combination of being a direct lender with proprietary technology, Lenda has simplified the mortgage application process to save customers time and money. With an interactive dashboard, customers know exactly where they are in the mortgage process and can manage all of their documents in one place. The company recently received its Series A funding of $5.25 million, and van den Brand shared that most of the capital investment will be used to improve Lenda’s technology and fulfill the company’s mission, which is to reimagine homeownership, starting with home loans.

AWS allows Lenda to scale its business while providing a secure, easy-to-use system for a faster home loan approval process. Currently, Lenda uses Amazon S3, Amazon EC2, Amazon CloudFront, Amazon Redshift, and Amazon WorkSpaces.

Visit Lenda.com to find out more.

Thanks for reading and see you in October for another round of hot startups!

-Tina

New – AWS Resource Tagging API

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-resource-tagging-api/

AWS customers frequently use tags to organize their Amazon EC2 instances, Amazon EBS volumes, Amazon S3 buckets, and other resources. Over the past couple of years we have been working to make tagging more useful and more powerful. For example, we have added support for tagging during Auto Scaling, the ability to use up to 50 tags per resource, console-based support for the creation of resources that share a common tag (also known as resource groups), and the option to use Config Rules to enforce the use of tags.

As customers grow to the point where they are managing thousands of resources, each with up to 50 tags, they have been looking to us for additional tooling and options to simplify their work. Today I am happy to announce that our new Resource Tagging API is now available. You can use these APIs from the AWS SDKs or via the AWS Command Line Interface (CLI). You now have programmatic access to the same resource group operations that had been accessible only from the AWS Management Console.

Recap: Console-Based Resource Group Operations
Before I get into the specifics of the new API functions, I thought you would appreciate a fresh look at the console-based grouping and tagging model. I already have the ability to find and then tag AWS resources using a search that spans one or more regions. For example, I can select a long list of regions and then search them for my EC2 instances like this:

After I locate and select all of the desired resources, I can add a new tag key by clicking Create a new tag key and entering the desired tag key:

Then I enter a value for each instance (the new ProjectCode column):

Then I can create a resource group that contains all of the resources that are tagged with P100:

After I have created the resource group, I can locate all of the resources by clicking on the Resource Groups menu:

To learn more about this feature, read Resource Groups and Tagging for AWS.

New API for Resource Tagging
The API that we are announcing today gives you the power to tag, untag, and locate resources using tags, all from your own code. With these new API functions, you are now able to operate on multiple resource types with a single set of functions.

Here are the new functions:

TagResources – Add tags to up to 20 resources at a time.

UntagResources – Remove tags from up to 20 resources at a time.

GetResources – Get a list of resources, with optional filtering by tags and/or resource types.

GetTagKeys – Get a list of all of the unique tag keys used in your account.

GetTagValues – Get all tag values for a specified tag key.
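
Because TagResources accepts at most 20 resources per call, larger jobs have to be split into batches. Here is a minimal Python sketch of that batching. It assumes a client object that exposes a tag_resources(ResourceARNList=..., Tags=...) call and returns a dict that may include a FailedResourcesMap entry, which is how the Resource Groups Tagging API client in the AWS SDK for Python behaves as far as I know. The helper names chunked and tag_in_batches are illustrative and are not part of the API.

```python
from typing import Dict, Iterator, List

MAX_RESOURCES_PER_CALL = 20  # limit stated above for TagResources


def chunked(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def tag_in_batches(client, arns: List[str], tags: Dict[str, str]) -> Dict[str, dict]:
    """Tag every ARN, at most 20 per call, and collect reported failures.

    `client` is assumed to expose tag_resources(ResourceARNList=..., Tags=...)
    and to return a dict that may contain a "FailedResourcesMap" entry.
    """
    failed: Dict[str, dict] = {}
    for batch in chunked(arns, MAX_RESOURCES_PER_CALL):
        response = client.tag_resources(ResourceARNList=batch, Tags=tags)
        failed.update(response.get("FailedResourcesMap", {}))
    return failed
```

With the SDK installed and credentials configured, the client would typically be created with boto3.client("resourcegroupstaggingapi"). Passing the client in as a parameter keeps the batching logic easy to exercise with a stand-in object.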

These functions support the following AWS services and resource types:

AWS Service: Resource Types
Amazon CloudFront: Distribution.
Amazon EC2: AMI, Customer Gateway, DHCP Option, EBS Volume, Instance, Internet Gateway, Network ACL, Network Interface, Reserved Instance, Reserved Instance Listing, Route Table, Security Group – EC2 Classic, Security Group – VPC, Snapshot, Spot Batch, Spot Instance Request, Spot Instance, Subnet, Virtual Private Gateway, VPC, VPN Connection.
Amazon ElastiCache: Cluster, Snapshot.
Amazon Elastic File System: Filesystem.
Amazon Elasticsearch Service: Domain.
Amazon EMR: Cluster.
Amazon Glacier: Vault.
Amazon Inspector: Assessment.
Amazon Kinesis: Stream.
Amazon Machine Learning: Batch Prediction, Data Source, Evaluation, ML Model.
Amazon Redshift: Cluster.
Amazon Relational Database Service: DB Instance, DB Option Group, DB Parameter Group, DB Security Group, DB Snapshot, DB Subnet Group, Event Subscription, Read Replica, Reserved DB Instance.
Amazon Route 53: Domain, Health Check, Hosted Zone.
Amazon S3: Bucket.
Amazon WorkSpaces: WorkSpace.
AWS Certificate Manager: Certificate.
AWS CloudHSM: HSM.
AWS Directory Service: Directory.
AWS Storage Gateway: Gateway, Virtual Tape, Volume.
Elastic Load Balancing: Load Balancer, Target Group.

Things to Know
Here are a couple of things to keep in mind when you build code or write scripts that use the new API functions or the CLI equivalents:

Compatibility – The older, service-specific functions remain available and you can continue to use them.

Write Permission – The new tagging API adds another layer of permission on top of existing policies that are specific to a single AWS service. For example, you will need to have access to both tag:TagResources and ec2:CreateTags in order to add a tag to an EC2 instance.

Read Permission – You will need to have access to tag:GetResources, tag:GetTagKeys, and tag:GetTagValues in order to call functions that access tags and tag values.

Pricing – There is no charge for the use of these functions or for tags.
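
To make the two permission layers concrete, here is a hedged Python sketch that builds an IAM policy document covering the write and read cases described above. The action names come from this post; the Sid values and the function name are illustrative, and the wildcard Resource is an assumption made for brevity that you would normally narrow for production use.

```python
import json


def build_tagging_policy() -> dict:
    """Return an IAM policy document that allows tagging EC2 resources
    through the Resource Tagging API, plus the read-only tag calls."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "WriteTags",  # illustrative statement ID
                "Effect": "Allow",
                # Both layers are needed to add a tag to an EC2 instance.
                "Action": ["tag:TagResources", "ec2:CreateTags"],
                "Resource": "*",  # assumption: scope this down in practice
            },
            {
                "Sid": "ReadTags",  # illustrative statement ID
                "Effect": "Allow",
                "Action": [
                    "tag:GetResources",
                    "tag:GetTagKeys",
                    "tag:GetTagValues",
                ],
                "Resource": "*",
            },
        ],
    }


if __name__ == "__main__":
    # Print the document so that it can be pasted into a policy editor.
    print(json.dumps(build_tagging_policy(), indent=2))
```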

Available Now
The new functions are supported by the latest versions of the AWS SDKs. You can use them to tag and access resources in all commercial AWS regions.

Jeff;

 

AWS Achieves FedRAMP Authorization for New Services in the AWS GovCloud (US) Region

Post Syndicated from Chad Woolf original https://aws.amazon.com/blogs/security/aws-achieves-fedramp-authorization-for-a-wide-array-of-services/

Today, we’re pleased to announce an array of AWS services that are available in the AWS GovCloud (US) Region and have achieved Federal Risk and Authorization Management Program (FedRAMP) High authorizations. The FedRAMP Joint Authorization Board (JAB) has issued Provisional Authority to Operate (P-ATO) approvals, which are effective immediately. If you are a federal or commercial customer, you can use these services to process and store your critical workloads in the AWS GovCloud (US) Region’s authorization boundary with data up to the high impact level.

The services newly available in the AWS GovCloud (US) Region include database, storage, data warehouse, security, and configuration automation solutions that will help you increase your ability to manage data in the cloud. For example, with AWS CloudFormation, you can deploy AWS resources by automating configuration processes. AWS Key Management Service (KMS) enables you to create and control the encryption keys used to secure your data. Amazon Redshift enables you to analyze all your data cost effectively by using existing business intelligence tools, and it automates common administrative tasks for managing, monitoring, and scaling your data warehouse.

Our federal and commercial customers can now leverage our FedRAMP P-ATO to access the following services:

  • CloudFormation – CloudFormation gives developers and systems administrators an easy way to create and manage a collection of related AWS resources, provisioning and updating them in an orderly and predictable fashion. You can use sample templates in CloudFormation, or create your own templates to describe the AWS resources and any associated dependencies or run-time parameters required to run your application.
  • Amazon DynamoDB – Amazon DynamoDB is a fast and flexible NoSQL database service for all applications that need consistent, single-digit-millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models.
  • Amazon EMR – Amazon EMR provides a managed Hadoop framework that makes it efficient and cost effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in EMR, and interact with data in other AWS data stores such as Amazon S3 and DynamoDB.
  • Amazon Glacier – Amazon Glacier is a secure, durable, and low-cost cloud storage service for data archiving and long-term backup. Customers can reliably store large or small amounts of data for as little as $0.004 per gigabyte per month, a significant savings compared to on-premises solutions.
  • KMS – KMS is a managed service that makes it easier for you to create and control the encryption keys used to encrypt your data, and uses Hardware Security Modules (HSMs) to protect the security of your keys. KMS is integrated with other AWS services to help you protect the data you store with these services. For example, KMS is integrated with CloudTrail to provide you with logs of all key usage and help you meet your regulatory and compliance needs.
  • Redshift – Redshift is a fast, fully managed, petabyte-scale data warehouse that makes it simple and cost effective to analyze all your data by using your existing business intelligence tools.
  • Amazon Simple Notification Service (SNS) – Amazon SNS is a fast, flexible, fully managed push notification service that lets you send individual messages or “fan out” messages to large numbers of recipients. SNS makes it simple and cost effective to send push notifications to mobile device users and email recipients or even send messages to other distributed services.
  • Amazon Simple Queue Service (SQS) – Amazon SQS is a fully-managed message queuing service for reliably communicating among distributed software components and microservices at any scale. Using SQS, you can send, store, and receive messages between software components at any volume, without losing messages or requiring other services to be always available.
  • Amazon Simple Workflow Service (SWF) – Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. SWF is a fully managed state tracker and task coordinator in the cloud.

AWS works closely with the FedRAMP Program Management Office (PMO), National Institute of Standards and Technology (NIST), and other federal regulatory and compliance bodies to ensure that we provide you with the cutting-edge technology you need in a secure and compliant fashion. We are working with our authorizing officials to continue to expand the scope of our authorized services, and we are fully committed to ensuring that AWS GovCloud (US) continues to offer government customers the most comprehensive mix of functionality and security.

– Chad

AWS Monthly Online Tech Talks – March, 2017

Post Syndicated from Tara Walker original https://aws.amazon.com/blogs/aws/aws-monthly-online-tech-talks-march-2017/

Unbelievably, it is March already. As you enter into the madness of March, don’t forget to take some time to learn more about the latest service innovations from AWS. Each month, we have a series of webinars targeting best practices and new service features in the AWS Cloud.

I have shared below the schedule for the live, online technical sessions scheduled for the month of March. Remember these talks are free, but they fill up quickly so register ahead of time. All scheduled times for the online tech talks are shown in Pacific Time (PT).

Webinars featured this month are as follows:

Tuesday, March 21

Big Data

9:00 AM – 10:00 AM: Deploying a Data Lake in AWS

Databases

10:30 AM – 11:30 AM: Optimizing the Data Tier for Serverless Web Applications

IoT

12:00 Noon – 1:00 PM: One Click Enterprise IoT Services

 

Wednesday, March 22

Databases

10:30 – 11:30 AM: ElastiCache Deep Dive: Best Practices and Usage Patterns

Mobile

12:00 Noon – 1:00 PM: A Deeper Dive into Apache MXNet on AWS

 

Thursday, March 23

IoT

9:00 – 10:00 AM: Developing Applications with the IoT Button

Compute

10:30 – 11:30 AM: Automating Management of Amazon EC2 Instances with Auto Scaling

 

Friday, March 24

Compute

10:30 – 11:30 AM: An Overview of Designing Microservices Based Applications on AWS

 

Monday, March 27

AI

9:00 – 10:00 AM: How to get the most out of Amazon Polly, a text-to-speech service

 

Tuesday, March 28

Compute

10:30 AM – 11:30 AM: Getting the Most Out of the New Amazon EC2 Reserved Instances Enhancements

Getting Started

12:00 Noon – 1:30 PM: Getting Started with AWS

 

Wednesday, March 29

Security

9:00 – 10:00 AM: Best Practices for Managing Security Operations in AWS

Storage

10:30 – 11:30 AM: Deep Dive on Amazon S3

Big Data

12:00 Noon – 1:00 PM: Log Analytics with Amazon Elasticsearch Service and Amazon Kinesis

 

Thursday, March 30

Storage

9:00 – 10:00 AM: Active Archiving with Amazon S3 and Tiering to Amazon Glacier

Mobile

10:30 AM – 11:30 AM: Deep Dive on Amazon Cognito

Compute

12:00 Noon – 1:00 PM: Building a Development Workflow for Serverless Applications

 

The AWS Online Tech Talks series covers a broad range of topics at varying technical levels. These technical sessions are led by AWS solutions architects and engineers and feature live demonstrations & customer examples. You can also check out the AWS on-demand webinar series on the AWS YouTube channel.

Tara

 

What’s the Diff: Hot and Cold Data Storage

Post Syndicated from Peter Cohen original https://www.backblaze.com/blog/whats-the-diff-hot-and-cold-data-storage/

Hot And Cold Storage

Differentiating cloud data storage by “temperature” is common practice when it comes to describing the tiered storage setups offered by various cloud storage providers. “Hot” and “cold” describe how often that data is accessed. What’s the actual difference, and how does each temperature fit your cloud storage strategy? Let’s take a look.

First of all, let’s get this out of the way: There’s no set industry definition of what hot and cold actually mean. So some of this may need to be adapted to your specific circumstances. You’re bound to see some variance or disagreement if you research the topic.

Hot Storage

“Hot” storage is data you need to access right away, where performance is at a premium. Hot storage often goes hand in hand with cloud computing. If you’re depending on cloud services not only to store your data but also to process it, you’re looking at hot storage.

Business-critical information that needs to be accessed frequently and quickly is hot storage. If performance is of the essence – if you need the data stored on SSDs instead of hard drives, because speed is that much of a factor – then that’s hot storage.

High-performance primary storage comes at a price, though. Cloud data storage providers charge a premium for hot data storage, because it’s resource-intensive. Microsoft’s Azure Hot Blobs and Amazon AWS services don’t come cheap.

Read on for how our B2 Cloud Storage fits the hot storage model. But first, let’s talk about cold storage.

Cold Storage

“Cold” storage is information that you don’t need to access very often. Inactive data that doesn’t need to be accessed for months, years, decades, potentially ever. That’s the sort of content that cold storage is ideal for. Practical examples of data suitable for cold storage include old projects, records you might need for auditing or bookkeeping purposes at some point in the future, or other content you only need to access infrequently.

Data retrieval and response time for cold cloud storage systems are typically slower than services designed for active data manipulation. Practical examples of cold cloud storage include services like Amazon Glacier and Google Coldline.

Storage prices for cold cloud storage systems are typically lower than warm or hot storage. But cold storage often incurs higher per-operation costs than other kinds of cloud storage. Access to the data typically requires patience and planning.

Apocryphally, “cold” storage meant just that: Data physically stored away from the hot machines running the media. Today, cold storage is still sometimes used to describe purely offline storage – that is, data that’s not stored in the cloud at all. Sometimes this is data that you might want to quarantine from the Internet altogether – for example, cryptocurrency like Bitcoin. Sometimes this is that old definition of cold storage: data that is archived on some sort of durable medium and stored in a secure offsite facility.

How B2 Cloud Storage Fits the Cold and Hot Model

We’ve designed B2 Cloud Storage to be instantly available. With B2, you won’t have delays accessing your information like you might have with offline or some nearline systems. Your data is available when you need it.

B2 is built on the physical architecture and advanced software framework we’ve been developing for the past decade to power our signature backup services. B2 Cloud Storage sports multiple layers of redundancy to make sure that your data is stored safely and is available when you need it.

We’ve taken the concept of hot storage a step further by offering reliable, affordable, and scalable storage in the cloud for a mere fraction of what others charge. We’re one-quarter the price of Amazon.

B2 Cloud Storage changes the pricing model for cloud storage. B2 changes the pricing model so much that our customers have found it economical to migrate away altogether from slow, inconvenient and frustrating cold storage and offline archival systems. Our media and entertainment customers are using B2 instead of LTO tape systems, for example.

What Temperature Is Your Cloud Storage?

Different organizations have different needs, so there’s no right answer about what temperature your cloud data should be. It’s imperative to your bottom line that you don’t pay for more than what you need. That’s why we’ve designed B2 to be an affordable and reliable cloud storage solution. Get started today and you’ll get the first 10GB for free!

Have a different idea of what hot and cold storage are? Have questions that aren’t answered here? Join the discussion!

The post What’s the Diff: Hot and Cold Data Storage appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Create Tables in Amazon Athena from Nested JSON and Mappings Using JSONSerDe

Post Syndicated from Rick Wiggins original https://aws.amazon.com/blogs/big-data/create-tables-in-amazon-athena-from-nested-json-and-mappings-using-jsonserde/

Most systems use JavaScript Object Notation (JSON) to log event information. Although JSON is efficient and flexible, deriving information from it is difficult.

In this post, you will use the tightly coupled integration of Amazon Kinesis Firehose for log delivery, Amazon S3 for log storage, and Amazon Athena with JSONSerDe to run SQL queries against these logs without the need for data transformation or insertion into a database. It’s done in a completely serverless way. There’s no need to provision any compute.

Amazon SES provides highly detailed logs for every message that travels through the service and, with SES event publishing, makes them available through Firehose. However, parsing detailed logs for trends or compliance data would require a significant investment in infrastructure and development time. Athena is a boon to these data seekers because it can query this dataset at rest, in its native format, with zero code or architecture. On top of that, it uses largely native SQL queries and syntax.

Walkthrough: Establishing a dataset

We start with a dataset of an SES send event that looks like this:

{
	"eventType": "Send",
	"mail": {
		"timestamp": "2017-01-18T18:08:44.830Z",
		"source": "[email protected]",
		"sourceArn": "arn:aws:ses:us-west-2:111222333:identity/[email protected]",
		"sendingAccountId": "111222333",
		"messageId": "01010159b2c4471e-fc6e26e2-af14-4f28-b814-69e488740023-000000",
		"destination": ["[email protected]"],
		"headersTruncated": false,
		"headers": [{
				"name": "From",
				"value": "[email protected]"
			}, {
				"name": "To",
				"value": "[email protected]"
			}, {
				"name": "Subject",
				"value": "Bounced Like a Bad Check"
			}, {
				"name": "MIME-Version",
				"value": "1.0"
			}, {
				"name": "Content-Type",
				"value": "text/plain; charset=UTF-8"
			}, {
				"name": "Content-Transfer-Encoding",
				"value": "7bit"
			}
		],
		"commonHeaders": {
			"from": ["[email protected]"],
			"to": ["[email protected]"],
			"messageId": "01010159b2c4471e-fc6e26e2-af14-4f28-b814-69e488740023-000000",
			"subject": "Test"
		},
		"tags": {
			"ses:configuration-set": ["Firehose"],
			"ses:source-ip": ["54.55.55.55"],
			"ses:from-domain": ["amazon.com"],
			"ses:caller-identity": ["root"]
		}
	},
	"send": {}
}

This dataset contains a lot of valuable information about this SES interaction. There are thousands of datasets in the same format to parse for insights. Getting this data is straightforward.

1. Create a configuration set in the SES console or CLI that uses a Firehose delivery stream to send and store logs in S3 in near real-time.

2. Use SES to send a few test emails. Be sure to define your new configuration set during the send.

To do this, when you create your message in the SES console, choose More options. This will display more fields, including one for Configuration Set.
You can also use your SES verified identity and the AWS CLI to send messages to the mailbox simulator addresses.

$ aws ses send-email --to [email protected] --from [email protected] --subject "Bounced Like a Bad Check" --text "This should bounce" --configuration-set-name Firehose

3. Select your S3 bucket to see that logs are being created.

Walkthrough: Querying with Athena

Amazon Athena is an interactive query service that makes it easy to use standard SQL to analyze data resting in Amazon S3. Athena requires no servers, so there is no infrastructure to manage. You pay only for the queries you run. It works with a variety of standard data formats, including CSV, JSON, ORC, and Parquet.

You now need to supply Athena with information about your data and define the schema for your logs with a Hive-compliant DDL statement. Athena uses Presto, a distributed SQL engine, to run queries. It also uses Apache Hive DDL syntax to create, drop, and alter tables and partitions. Athena uses an approach known as schema-on-read, which allows you to use this schema at the time you execute the query. Essentially, you are going to be creating a mapping for each field in the log to a corresponding column in your results.

If you are familiar with Apache Hive, you might find creating tables on Athena to be pretty similar. You can create tables by writing the DDL statement in the query editor or by using the wizard or JDBC driver. An important part of this table creation is the SerDe, a short name for “Serializer and Deserializer.” Because your data is in JSON format, you will be using org.openx.data.jsonserde.JsonSerDe, natively supported by Athena, to help you parse the data. Along the way, you will address two common problems with Hive/Presto and JSON datasets:

  • Nested or multi-level JSON.
  • Forbidden characters (handled with mappings).

In the Athena Query Editor, use the following DDL statement to create your first Athena table. For LOCATION, use the path to the S3 bucket for your logs:

CREATE EXTERNAL TABLE sesblog (
  eventType string,
  mail struct<`timestamp`:string,
              source:string,
              sourceArn:string,
              sendingAccountId:string,
              messageId:string,
              destination:string,
              headersTruncated:boolean,
              headers:array<struct<name:string,value:string>>,
              commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>
              > 
  )           
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION 's3://<YOUR BUCKET HERE>/FH2017/' 

In this DDL statement, you are declaring each of the fields in the JSON dataset along with its Presto data type. You are using Hive collection data types like Array and Struct to set up groups of objects.

Walkthrough: Nested JSON

Defining the mail key is interesting because the JSON inside is nested three levels deep. In the example, you are creating a top-level struct called mail which has several other keys nested inside. This includes fields like messageId and destination at the second level. You can also see that the field timestamp is surrounded by the backtick (`) character. timestamp is also a reserved Presto data type so you should use backticks here to allow the creation of a column of the same name without confusing the table creation command. On the third level is the data for headers. It contains a group of entries in name:value pairs. You define this as an array with the structure of <name:string,value:string> defining your schema expectations here. You must enclose `from` in the commonHeaders struct with backticks to allow this reserved word column creation.
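
To see the three levels outside of Athena, here is a small Python sketch that walks a trimmed event of the same shape. The event below is a hypothetical stand-in (the example.com addresses and the message ID are placeholders, not values from the logs above), and header_value is an illustrative helper name.

```python
import json

# Trimmed, hypothetical event in the same shape as the SES send event.
EVENT = json.loads("""
{
  "eventType": "Send",
  "mail": {
    "timestamp": "2017-01-18T18:08:44.830Z",
    "messageId": "example-message-id",
    "destination": ["recipient@example.com"],
    "headers": [
      {"name": "From", "value": "sender@example.com"},
      {"name": "Subject", "value": "Bounced Like a Bad Check"}
    ],
    "commonHeaders": {"from": ["sender@example.com"], "subject": "Test"}
  }
}
""")


def header_value(event, name):
    """Return the value of the first header called `name`, or None."""
    # Level 1: the top-level "mail" struct.
    mail = event["mail"]
    # Level 2: "headers" is an array nested inside that struct.
    for header in mail["headers"]:
        # Level 3: each array element is a struct with name and value.
        if header["name"] == name:
            return header["value"]
    return None


if __name__ == "__main__":
    print(EVENT["eventType"])               # first level
    print(EVENT["mail"]["messageId"])       # second level
    print(header_value(EVENT, "Subject"))   # third level
```

The struct and array declarations in the DDL mirror this nesting one for one: each Python dict corresponds to a struct, and each list corresponds to an array.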

Now that you have created your table, you can fire off some queries!

SELECT * FROM sesblog limit 10;

This output shows your two top-level columns (eventType and mail) but this isn’t useful except to tell you there is data being queried. You can use some nested notation to build more relevant queries to target data you care about.

“Which messages did I bounce from Monday’s campaign?”

SELECT eventtype as Event,
       mail.destination as Destination, 
       mail.messageId as MessageID,
       mail.timestamp as Timestamp
FROM sesblog
WHERE eventType = 'Bounce' and mail.timestamp like '2017-01-09%'

“How many messages have I bounced to a specific domain?”

SELECT COUNT(*) as Bounces 
FROM sesblog
WHERE eventType = 'Bounce' and mail.destination like '%amazonses.com%'

“Which messages did I bounce to the domain amazonses.com?”

SELECT eventtype as Event,
       mail.destination as Destination, 
       mail.messageId as MessageID 
FROM sesblog
WHERE eventType = 'Bounce' and mail.destination like '%amazonses.com%'

There are much deeper queries that can be written from this dataset to find the data relevant to your use case. You might have noticed that your table creation did not specify a schema for the tags section of the JSON event. You’ll do that next.

Walkthrough: Handling forbidden characters with mappings

Here is a major roadblock you might encounter during the initial creation of the DDL to handle this dataset: you have little control over the data format provided in the logs and Hive uses the colon (:) character for the very important job of defining data types. You need to give the JSONSerDe a way to parse these key fields in the tags section of your event. This is some of the most crucial data in an auditing and security use case because it can help you determine who was responsible for a message creation.

In the Athena query editor, use the following DDL statement to create your second Athena table. For LOCATION, use the path to the S3 bucket for your logs:

CREATE EXTERNAL TABLE sesblog2 (
  eventType string,
  mail struct<`timestamp`:string,
              source:string,
              sourceArn:string,
              sendingAccountId:string,
              messageId:string,
              destination:string,
              headersTruncated:boolean,
              headers:array<struct<name:string,value:string>>,
              commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>,
              tags:struct<ses_configurationset:string,ses_source_ip:string,ses_from_domain:string,ses_caller_identity:string>
              > 
  )           
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.ses_configurationset"="ses:configuration-set",
  "mapping.ses_source_ip"="ses:source-ip", 
  "mapping.ses_from_domain"="ses:from-domain", 
  "mapping.ses_caller_identity"="ses:caller-identity"
  )
LOCATION 's3://<YOUR BUCKET HERE>/FH2017/' 

In your new table creation, you have added a section for SERDEPROPERTIES. This allows you to give the SerDe some additional information about your dataset. For your dataset, you are using the mapping property to work around your data containing a column name with a colon smack in the middle of it. ses:configuration-set would be interpreted as a column named ses with the datatype of configuration-set. Unlike your earlier implementation, you can’t surround an operator like that with backticks. The JSON SERDEPROPERTIES mapping section allows you to account for any illegal characters in your data by remapping the fields during the table’s creation.

For example, you have simply defined that the column in the ses data known as ses:configuration-set will now be known to Athena and your queries as ses_configurationset. This mapping doesn’t do anything to the source data in S3. This is a Hive concept only. It won’t alter your existing data. You have set up mappings in the Properties section for the four fields in your dataset (changing all instances of colon to the better-supported underscore) and in your table creation you have used those new mapping names in the creation of the tags struct.
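
The renaming can be pictured with a short Python sketch. This is only an illustration of the idea and not how the SerDe is implemented. The dictionary holds the four mappings from the DDL above, kept explicit because the names do not follow a single rule (ses:configuration-set becomes ses_configurationset, with the hyphen dropped). The helper names serde_properties and apply_mappings are hypothetical.

```python
# Column name used in the DDL -> key as it appears in the JSON logs.
MAPPINGS = {
    "ses_configurationset": "ses:configuration-set",
    "ses_source_ip": "ses:source-ip",
    "ses_from_domain": "ses:from-domain",
    "ses_caller_identity": "ses:caller-identity",
}


def serde_properties(mappings):
    """Render a WITH SERDEPROPERTIES clause for the given mappings."""
    lines = ['  "mapping.{}"="{}"'.format(col, key)
             for col, key in mappings.items()]
    return "WITH SERDEPROPERTIES (\n" + ",\n".join(lines) + "\n  )"


def apply_mappings(tags, mappings):
    """Return a copy of `tags` keyed by column name, as a query sees it.

    Keys without a mapping are passed through unchanged, and the input
    dict (standing in for the source data in S3) is left untouched.
    """
    by_json_key = {key: col for col, key in mappings.items()}
    return {by_json_key.get(key, key): value for key, value in tags.items()}


if __name__ == "__main__":
    print(serde_properties(MAPPINGS))
    print(apply_mappings({"ses:source-ip": ["54.55.55.55"]}, MAPPINGS))
```

Generating the clause from one dictionary keeps the struct field names and the mapping entries from drifting apart when you add a new tag.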

Now that you have access to these additional authentication and auditing fields, your queries can answer some more questions.

“Who is creating all of these bounced messages?”

SELECT eventtype as Event,
         mail.timestamp as Timestamp,
         mail.tags.ses_source_ip as SourceIP,
         mail.tags.ses_caller_identity as AuthenticatedBy,
         mail.commonHeaders."from" as FromAddress,
         mail.commonHeaders.to as ToAddress
FROM sesblog2
WHERE eventtype = 'Bounce'

Of special note here is the handling of the column mail.commonHeaders."from". Because from is a reserved operational word in Presto, surround it in quotation marks (") to keep it from being interpreted as an action.

Walkthrough: Querying using SES custom tagging

What makes this mail.tags section so special is that SES will let you add your own custom tags to your outbound messages. Now you can label messages with tags that are important to you, and use Athena to report on those tags. For example, if you wanted to add a Campaign tag to track a marketing campaign, you could use the --tags flag to send a message from the SES CLI:

$ aws ses send-email --to [email protected] --from [email protected] --subject "Perfume Campaign Test" --text "Buy our Smells" --configuration-set-name Firehose --tags Name=Campaign,Value=Perfume

This results in a new entry in your dataset that includes your custom tag.

…
		"tags": {
			"ses:configuration-set": ["Firehose"],
			"Campaign": ["Perfume"],
			"ses:source-ip": ["54.55.55.55"],
			"ses:from-domain": ["amazon.com"],
			"ses:caller-identity": ["root"],
			"ses:outgoing-ip": ["54.240.27.11"]
		}
…

You can then create a third table to account for the Campaign tagging.

CREATE EXTERNAL TABLE sesblog3 (
  eventType string,
  mail struct<`timestamp`:string,
              source:string,
              sourceArn:string,
              sendingAccountId:string,
              messageId:string,
              destination:string,
              headersTruncated:boolean,
              headers:array<struct<name:string,value:string>>,
              commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>,
              tags:struct<ses_configurationset:string,Campaign:string,ses_source_ip:string,ses_from_domain:string,ses_caller_identity:string>
              > 
  )           
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.ses_configurationset"="ses:configuration-set",
  "mapping.ses_source_ip"="ses:source-ip", 
  "mapping.ses_from_domain"="ses:from-domain", 
  "mapping.ses_caller_identity"="ses:caller-identity"
  )
LOCATION 's3://<YOUR BUCKET HERE>/FH2017/' 

You can then query on this custom value, which you can define on each outbound email.

SELECT eventtype as Event,
       mail.destination as Destination, 
       mail.messageId as MessageID,
       mail.tags.Campaign as Campaign
FROM sesblog3
where mail.tags.Campaign like '%Perfume%'


Walkthrough: Building your own DDL programmatically with hive-json-schema

In all of these examples, your table creation statements were based on a single SES interaction type, send. SES has other interaction types like delivery, complaint, and bounce, all of which have some additional fields. I’ll leave you with this, a master DDL that can parse all the different SES eventTypes and can create one table where you can begin querying your data.

Building a properly working JSONSerDe DDL by hand is tedious and a bit error-prone, so this time around you’ll be using an open source tool commonly used by AWS Support. All you have to do manually is set up your mappings for the unsupported SES columns that contain colons.

This sample JSON file contains all possible fields from across the SES eventTypes. It has been run through hive-json-schema, which is a great starting point to build nested JSON DDLs.
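
To give a feel for what a schema generator does with a sample document, here is a deliberately simplified Python sketch that derives a Hive type string from a parsed JSON value. It is not the hive-json-schema tool and makes no claim to match its output. The function name hive_type, the short reserved-word list, and the choice to treat empty objects and empty arrays as strings are all assumptions made for illustration.

```python
# The two reserved names this post runs into; a real tool would know more.
RESERVED = {"timestamp", "from"}


def hive_type(value):
    """Return a Hive type string for a parsed JSON value (simplified)."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "int"
    if isinstance(value, float):
        return "double"
    if isinstance(value, dict):
        if not value:
            return "string"       # assumption: an empty object has no fields to declare
        fields = []
        for key, child in value.items():
            if ":" in key:
                # Colons clash with the name:type syntax and need a mapping.
                raise ValueError("key %r needs a SerDe mapping first" % key)
            name = "`%s`" % key if key.lower() in RESERVED else key
            fields.append("%s:%s" % (name, hive_type(child)))
        return "struct<%s>" % ",".join(fields)
    if isinstance(value, list):
        # Assume a homogeneous array; fall back to string when it is empty.
        return "array<%s>" % (hive_type(value[0]) if value else "string")
    return "string"               # strings and null both land here
```

The ValueError for keys that contain a colon marks the one step that still has to be handled by hand with SERDEPROPERTIES mappings.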

Here is the resulting “master” DDL to query all types of SES logs:

CREATE EXTERNAL TABLE sesmaster (
  eventType string,
  complaint struct<arrivaldate:string, 
                   complainedrecipients:array<struct<emailaddress:string>>,
                   complaintfeedbacktype:string, 
                   feedbackid:string, 
                   `timestamp`:string, 
                   useragent:string>,
  bounce struct<bouncedrecipients:array<struct<action:string, diagnosticcode:string, emailaddress:string, status:string>>,
                bouncesubtype:string, 
                bouncetype:string, 
                feedbackid:string,
                reportingmta:string, 
                `timestamp`:string>,
  mail struct<`timestamp`:string,
              source:string,
              sourceArn:string,
              sendingAccountId:string,
              messageId:string,
              destination:string,
              headersTruncated:boolean,
              headers:array<struct<name:string,value:string>>,
              commonHeaders:struct<`from`:array<string>,to:array<string>,messageId:string,subject:string>,
              tags:struct<ses_configurationset:string,ses_source_ip:string,ses_outgoing_ip:string,ses_from_domain:string,ses_caller_identity:string>
              >,
  send string,
  delivery struct<processingtimemillis:int,
                  recipients:array<string>, 
                  reportingmta:string, 
                  smtpresponse:string, 
                  `timestamp`:string>
  )           
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  "mapping.ses_configurationset"="ses:configuration-set",
  "mapping.ses_source_ip"="ses:source-ip", 
  "mapping.ses_from_domain"="ses:from-domain", 
  "mapping.ses_caller_identity"="ses:caller-identity",
  "mapping.ses_outgoing_ip"="ses:outgoing-ip"
  )
LOCATION 's3://<YOUR BUCKET HERE>/FH2017/'
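
As a minimal sketch of how the mapped columns can then be queried, the following statement counts send events per source IP. It assumes the sesmaster table above has been created and that your logs contain events with eventType set to Send; the aliases source_ip and messages are illustrative names only. Because of the SerDe mapping, the ses:source-ip tag is addressed as mail.tags.ses_source_ip.

```sql
-- Sketch only: count sent messages per source IP using the mapped tag column.
-- Athena lowercases column names, so eventType is referenced as eventtype.
SELECT mail.tags.ses_source_ip AS source_ip,
       count(*) AS messages
FROM sesmaster
WHERE eventtype = 'Send'
GROUP BY mail.tags.ses_source_ip
ORDER BY messages DESC
LIMIT 10;
```

Nested struct fields are reached with dot notation, so the same pattern works for the other mapped tags, such as mail.tags.ses_caller_identity.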

Conclusion

In this post, you’ve seen how to use Amazon Athena in real-world use cases to query the JSON used in AWS service logs. Some of these use cases are operational, such as bounce and complaint handling. Others report on trends and marketing data, such as querying deliveries from a campaign. Still others support audit and security, such as answering the question, “Which machine or user is sending all of these messages?” You’ve also seen how to handle both nested JSON and SerDe mappings so that you can query your dataset in its native format, without changing the data to get your queries running.

With the new Amazon QuickSight suite of tools, you also now have a data source that can be used to build dashboards. This makes reporting on this data even easier. For information about using Athena as a QuickSight data source, see this blog post.

There are also optimizations you can make to these tables to improve query performance, such as setting up partitions so that queries read only the data you need and scan less data overall. If you only need to report on data for a limited period of time, you could optionally set up an S3 lifecycle configuration to transition old data to Amazon Glacier or to delete it altogether.
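
The partitioning idea can be sketched as follows. This is an illustration under stated assumptions, not a definitive setup: sesmaster_by_day is a hypothetical table created with the same columns as sesmaster plus a PARTITIONED BY (dt string) clause, and the date-based S3 prefix is an assumed layout that you would replace with the one your delivery stream actually writes. Run each statement separately in Athena.

```sql
-- Sketch only: register one day of logs as a partition of a hypothetical
-- table, sesmaster_by_day, that was created with PARTITIONED BY (dt string).
-- The 2017/06/01/ prefix is an assumed layout; adjust it to your bucket.
ALTER TABLE sesmaster_by_day ADD IF NOT EXISTS
  PARTITION (dt = '2017-06-01')
  LOCATION 's3://<YOUR BUCKET HERE>/FH2017/2017/06/01/';

-- Filtering on the partition column limits the scan to that day's objects.
SELECT count(*) AS bounces
FROM sesmaster_by_day
WHERE dt = '2017-06-01'
  AND eventtype = 'Bounce';
```

Because Athena charges by the amount of data scanned, restricting queries to the partitions you need can reduce both run time and cost.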

Feel free to leave questions or suggestions in the comments.

About the Author

Rick Wiggins is a Cloud Support Engineer for AWS Premium Support. He works with our customers to build solutions for Email, Storage and Content Delivery, helping them spend more time on their business and less time on infrastructure. In his spare time, he enjoys traveling the world with his family and volunteering at his children’s school teaching lessons in Computer Science and STEM.

 

 
