All posts by Larry Heathcote

Your guide to Amazon Kinesis sessions, chalk talks, and workshops at AWS re:Invent 2018

Post Syndicated from Larry Heathcote original https://aws.amazon.com/blogs/big-data/your-guide-to-amazon-kinesis-sessions-chalk-talks-and-workshops-at-aws-reinvent-2018/

AWS re:Invent 2018 is almost here! This post includes a list of Amazon Kinesis sessions, chalk talks, and workshops at AWS re:Invent 2018. You can choose the link next to each session description for the session schedule. Use the information to help schedule your conference week in Las Vegas to learn more about Amazon Kinesis.

Sessions

ANT208 – Serverless Video Ingestion & Analytics with Amazon Kinesis Video Streams

Amazon Kinesis Video Streams makes it easy to capture live video, play it back, and store it for real-time and batch-oriented ML-driven analytics. In this session, we first dive deep on the top five best practices for getting started and scaling with Amazon Kinesis Video Streams. Next, we demonstrate streaming video from a standard USB camera connected to a laptop, and we show live playback in a standard browser within minutes. We also have on stage members of the Amazon Go team, who are building the next generation of physical retail store experiences powered by their “just walk out” technology. They walk through the technical details of their integration with Kinesis Video Streams and highlight their successes and difficulties along the way.

ANT310 – Architecting for Real-Time Insights with Amazon Kinesis

Amazon Kinesis makes it easy to speed up the time it takes for you to get valuable, real-time insights from your streaming data. In this session, we walk through the most popular applications that customers implement using Amazon Kinesis, including streaming extract-transform-load, continuous metric generation, and responsive analytics. Our customer Autodesk joins us to describe how they created real-time metrics generation and analytics using Amazon Kinesis and Amazon Elasticsearch Service. They walk us through their architecture and the best practices they learned in building and deploying their real-time analytics solution.

ANT322-R – High Performance Data Streaming with Amazon Kinesis: Best Practices

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, we dive deep into best practices for Kinesis Data Streams and Kinesis Data Firehose to get the most performance out of your data streaming applications. Our customer NICE inContact joins us to discuss how they utilize Amazon Kinesis Data Streams to make real-time decisions on customer contact routing and agent assignments for their Call Center as a Service (CCaaS) platform. NICE inContact walks through their architecture and requirements for low-latency, accurate processing to be as responsive as possible to changes.

ANT322-R1 – High Performance Data Streaming with Amazon Kinesis: Best Practices

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this session, we dive deep into best practices for Kinesis Data Streams and Kinesis Data Firehose to get the most performance out of your data streaming applications. Comcast uses Amazon Kinesis Data Streams to build a Streaming Data Platform that centralizes data exchanges. It is foundational to the way their data analysts and data scientists derive real-time insights from the data. In the second part of this talk, Comcast zooms in on how to properly scale a Kinesis stream. We first list the factors to consider to avoid scaling issues with standard Kinesis stream consumption, and then we see how the new fan-out feature changes these scaling considerations.
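
As a rough illustration of the fan-out consideration mentioned above (this is not code from the session; the stream and consumer names are placeholders), the following boto3 sketch registers an enhanced fan-out consumer on an existing stream:

```python
import boto3

kinesis = boto3.client("kinesis")

# Look up the ARN of an existing stream (placeholder name).
stream_arn = kinesis.describe_stream_summary(StreamName="example-stream")[
    "StreamDescriptionSummary"
]["StreamARN"]

# An enhanced fan-out consumer gets its own 2 MB/s of read throughput per shard,
# instead of sharing the standard GetRecords limit with every other consumer.
consumer = kinesis.register_stream_consumer(
    StreamARN=stream_arn,
    ConsumerName="example-fanout-consumer",
)["Consumer"]

print(consumer["ConsumerARN"], consumer["ConsumerStatus"])
```

With fan-out, adding another consumer no longer eats into the shared per-shard read limit, which is the scaling consideration this talk revisits.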

SRV316-R & SRV316-R1 – Serverless Stream Processing Pipeline Best Practices

Streaming data has traditionally been analyzed using batch processing in DWH/Hadoop environments. Common use cases include data lakes, data science, and machine learning (ML). Creating serverless data-driven architectures and serverless streaming solutions with services like Amazon Kinesis, AWS Lambda, and Amazon Athena can solve real-time ingestion, storage, and analytics challenges, and help you focus on application logic without managing infrastructure. In this session, we introduce design patterns and best practices, and we share customer journeys from batch to real-time insights in building modern serverless data-driven applications. Hear how Intel built the Intel Pharma Analytics Platform using a serverless architecture. This AI cloud-based offering enables remote monitoring of patients using an array of sensors, wearable devices, and ML algorithms to objectively quantify the impact of interventions and power clinical studies across various therapeutic conditions.

SEC402-R – AWS, I Choose You: Pokemon’s Battle against the Bots

Join us for this advanced-level talk to learn about Pokemon’s journey defending against DDoS attacks and bad bots with AWS WAF, AWS Shield, and other AWS services. We go through their initial challenges and the evolution of their bot mitigation solution, which includes offline log analysis and dynamic updates of bad-bot IPs along with rate-based rules. This is an advanced talk and assumes some knowledge of Amazon DynamoDB, Amazon Kinesis Data Firehose, Amazon Kinesis Data Analytics, AWS Firewall Manager, AWS Shield, and AWS WAF.

Chalk Talks

ANT358 – Serverless Stream Processing Tips & Tricks

Streaming data ingestion and near real-time analysis give you immediate insights into your data. By using AWS Lambda with Amazon Kinesis, you can obtain these insights without the need to manage servers. But are you doing this in the most effective way? In this interactive session, we review the best practices for using Lambda with Kinesis, and how to avoid common pitfalls.
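
As a minimal sketch of the pattern this talk covers (not the session's material; the record format is an assumption), a Lambda function subscribed to a Kinesis stream receives a batch of base64-encoded records and processes each one:

```python
import base64
import json

def lambda_handler(event, context):
    # Each invocation receives a batch of records from the Kinesis event source mapping.
    for record in event["Records"]:
        # Kinesis payloads arrive base64-encoded.
        payload = base64.b64decode(record["kinesis"]["data"])
        data = json.loads(payload)
        # Process the decoded record; keep per-record work small so the
        # batch finishes well within the function timeout.
        print(record["kinesis"]["partitionKey"], data)
    # Returning normally checkpoints the whole batch; raising an exception
    # causes the batch to be retried.
    return {"records_processed": len(event["Records"])}
```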

ANT359 – Considerations for Building Your First Streaming Application

Do you want to increase your knowledge of AWS big data web services and launch your first big data application on the cloud? In this chalk talk, we provide an overview of many of the AWS analytics services, including Amazon EMR, Amazon Kinesis, Amazon Athena, and Amazon Redshift. We discuss how they are architected together to solve common big data problems, such as ingestion, ETL, and real-time analytics.

ANT360 – Don’t Wait Until Tomorrow: From Batch to Streaming

In recent years, there has been explosive growth in the number of connected devices and real-time data sources. Data is being produced continuously and its production rate is accelerating. Businesses can no longer wait for hours or days to use this data. To gain the most valuable insights, they must use this data immediately so they can react quickly to new information. In this chalk talk, we discuss how to take advantage of streaming data sources to analyze and react in near-real time. In addition, we present different options for how to solve a real-world scenario and walk through those solutions.

ANT361 – Using Amazon Kinesis Data Streams as a Low-Latency Message Bus

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. In this chalk talk, we dive deep into best practices for Kinesis Data Streams and how to optimize for low-latency, multi-consumer solutions.
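
For illustration only, the snippet below shows the basic producer side of such a message bus with boto3; the stream name and message shape are placeholders. Keying messages by entity keeps all messages for that entity on one shard, preserving their order for consumers:

```python
import json
import boto3

kinesis = boto3.client("kinesis")

def publish_event(event_body: dict, entity_id: str) -> None:
    """Write one message to the stream, keyed so that all messages for
    the same entity land on the same shard and stay ordered."""
    kinesis.put_record(
        StreamName="example-message-bus",  # placeholder stream name
        Data=json.dumps(event_body).encode("utf-8"),
        PartitionKey=entity_id,
    )

publish_event({"type": "order_updated", "order_id": "1234"}, entity_id="1234")
```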

BAP328-R & BAP328-R1 – Architectures for Gaining Data Insights into Your Contact Center Experience

Join us for a deep dive into using Amazon Kinesis Data Analytics for insight into what’s happening with the contacts and agents in your Amazon Connect contact center. Learn how to leverage AWS analytics and ML services to inspect, transform, and gain insight into the customer’s journey through your contact center. We also show you how to use Alexa for Business to receive timely voice-activated business intelligence on your contact center’s performance.

Workshops

Before starting a workshop, you should have a basic understanding of Amazon Kinesis. Please bring your laptop and power supply to the workshop.

ANT213-R – Build Your First Big Data Application on AWS

Do you want to increase your knowledge of AWS big data web services and launch your first big data application on the cloud? In this session, we walk you through simplifying big data processing as a data bus comprising ingest, store, process, and visualize. You will build a big data application using AWS managed services, including Amazon Athena, Amazon Kinesis, Amazon DynamoDB, and Amazon S3. Along the way, we review architecture design patterns for big data applications and give you access to a take-home lab so you can rebuild and customize the application yourself.

ANT213-R1 – Build Your First Big Data Application on AWS

Do you want to increase your knowledge of AWS big data web services and launch your first big data application on the cloud? In this session, we walk you through simplifying big data processing as a data bus comprising ingest, store, process, and visualize. You will build a big data application using AWS managed services, including Amazon Athena, Amazon Kinesis, Amazon DynamoDB, and Amazon S3. Along the way, we review architecture design patterns for big data applications and give you access to a take-home lab so you can rebuild and customize the application yourself.

ANT357 – Stream Video, Analyze It in Real Time, and Share It in Real Time

Video is ‘big data.’ Image sensors—in our smartphones, smart home devices, and traffic cameras—are getting Internet-connected. Massive streams of video data are generated, but currently not mined for real-time insights to drive businesses forward. In this workshop, learn to capture, process, and analyze video streams. Build and configure your camera device’s media pipeline to start streaming video into the AWS Cloud using Amazon Kinesis Video Streams. Next, build and deploy your own machine learning (ML) model in Amazon SageMaker to generate inferences about objects or activities in your video stream. Finally, build a browser-based web player to view the video in Live and On-Demand modes, including the analyzed video stream. In this workshop, you use Amazon Kinesis Video Streams, Amazon SageMaker, Amazon Rekognition Video, and Amazon ECS.
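
As a hedged sketch of the consumption side (not the workshop's actual lab code; the stream name is a placeholder), the snippet below uses boto3 to find the media endpoint for a Kinesis video stream and start reading fragments from the live position:

```python
import boto3

# Kinesis Video Streams control-plane client: look up the endpoint that
# serves media for this stream.
kvs = boto3.client("kinesisvideo")
endpoint = kvs.get_data_endpoint(
    StreamName="example-camera-stream",  # placeholder stream name
    APIName="GET_MEDIA",
)["DataEndpoint"]

# Data-plane client pointed at that endpoint.
media = boto3.client("kinesis-video-media", endpoint_url=endpoint)
stream = media.get_media(
    StreamName="example-camera-stream",
    StartSelector={"StartSelectorType": "NOW"},
)

# The payload is a stream of MKV fragments; read a chunk to confirm data is flowing.
chunk = stream["Payload"].read(1024)
print(len(chunk), "bytes received")
```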

ANT362 – Use Streaming Data to Gain Real-Time Insights into Your Business

In recent years, there has been explosive growth in the number of connected devices and real-time data sources. Because of this, data is being continuously produced, and its production rate is accelerating. Businesses can no longer wait for hours or days to use this data. To gain the most valuable insights, they must use this data immediately so they can react quickly to new information. In this workshop, you will learn how to take advantage of streaming data sources to analyze and react in near real time. We provide several requirements for a real-world streaming data scenario, and you’re tasked with creating a solution that successfully satisfies the requirements using services such as Amazon Kinesis, AWS Lambda, and Amazon SNS.
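
One possible shape for such a solution, purely as an illustrative sketch (the topic ARN, threshold, and record fields are assumptions, not workshop requirements), is a Lambda function that reads Kinesis records and publishes an Amazon SNS alert when a value crosses a threshold:

```python
import base64
import json
import os
import boto3

sns = boto3.client("sns")
# Placeholder topic; supplied through the function's environment in this sketch.
ALERT_TOPIC_ARN = os.environ.get(
    "ALERT_TOPIC_ARN", "arn:aws:sns:us-east-1:123456789012:example-alerts"
)

def lambda_handler(event, context):
    for record in event["Records"]:
        reading = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # Illustrative rule: alert whenever a sensor value crosses a fixed threshold.
        if reading.get("value", 0) > 100:
            sns.publish(
                TopicArn=ALERT_TOPIC_ARN,
                Subject="Streaming threshold exceeded",
                Message=json.dumps(reading),
            )
```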

ANT318-R – Build, Deploy, and Serve Machine Learning Models on Streaming Data Using Amazon SageMaker, Apache Spark on Amazon EMR, and Amazon Kinesis

As data grows exponentially in organizations, there is an increasing need to use machine learning (ML) to gather insights from this data at scale and to use those insights to perform real-time predictions on incoming data. In this workshop, we walk you through how to train an Apache Spark model using Amazon SageMaker, pointing to Apache Livy running on an Amazon EMR Spark cluster. We also show you how to host the Spark model on Amazon SageMaker to serve a RESTful inference API. Finally, we show you how to use the RESTful API to serve real-time predictions on streaming data from Amazon Kinesis Data Streams.
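
To make the last step concrete, here is a hedged sketch (the stream name, endpoint name, and payload format are placeholders, not the workshop's actual resources) that reads records from a Kinesis data stream and sends each one to a hosted SageMaker endpoint for a prediction:

```python
import json
import boto3

kinesis = boto3.client("kinesis")
runtime = boto3.client("sagemaker-runtime")

# Placeholder names; the workshop provisions its own stream and endpoint.
STREAM_NAME = "example-feature-stream"
ENDPOINT_NAME = "example-spark-model-endpoint"

# Read from the first shard only, for brevity.
shard_id = kinesis.describe_stream(StreamName=STREAM_NAME)["StreamDescription"]["Shards"][0]["ShardId"]
iterator = kinesis.get_shard_iterator(
    StreamName=STREAM_NAME, ShardId=shard_id, ShardIteratorType="LATEST"
)["ShardIterator"]

records = kinesis.get_records(ShardIterator=iterator)["Records"]
for record in records:
    # Send each record to the hosted model and read back the prediction.
    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/json",
        Body=record["Data"],
    )
    print(json.loads(response["Body"].read()))
```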

ANT318-R1 – Build, Deploy, and Serve Machine Learning Models on Streaming Data Using Amazon SageMaker, Apache Spark on Amazon EMR, and Amazon Kinesis

As data grows exponentially in organizations, there is an increasing need to use machine learning (ML) to gather insights from this data at scale and to use those insights to perform real-time predictions on incoming data. In this workshop, we walk you through how to train an Apache Spark model using Amazon SageMaker, pointing to Apache Livy running on an Amazon EMR Spark cluster. We also show you how to host the Spark model on Amazon SageMaker to serve a RESTful inference API. Finally, we show you how to use the RESTful API to serve real-time predictions on streaming data from Amazon Kinesis Data Streams.

GPSWS406 – Advanced Serverless Data Processing

In this hands-on workshop, you learn best practices and architectural patterns for building streaming data processing pipelines without servers. Using Amazon Kinesis, AWS Lambda, and other services, you have the opportunity to build, deploy, and monitor an application to ingest and process high-velocity data at scale. This advanced workshop assumes that you have experience writing Lambda functions and understand the basics of the AWS serverless platform, so come ready to dive into the deep end. Bring your laptop with a full keyboard. We provide a sandbox AWS account for you to use during the workshop.

MAE309 – Build an AWS Analytics Solution to Monitor the Video Streaming Experience

In this workshop, we build and deploy an end-to-end analytics solution for monitoring the video streaming experience. We integrate an open source video player with Amazon Kinesis Data Streams to capture events in real time. We explore the data available for capture and a variety of use cases: from generating alerts on poor experience to content recommendations based on user behavior. We also show you how this real-time data can be archived in a data lake and further used to generate reports of aggregate performance and experience across a number of dimensions.

ADT401 – Real-Time Web Analytics with Amazon Kinesis Data Analytics

Knowing what users are doing on your websites in real time provides insights you can act on without waiting for delayed batch processing of clickstream data. Watching the immediate impact on user behavior after new releases, detecting and responding to anomalies, situational awareness, and evaluating trends are all benefits of real-time website analytics. In this workshop, we build a cost-optimized platform to capture web beacon traffic, analyze it for interesting metrics, and display it on a customized dashboard. We start by deploying the Web Analytics Solution Accelerator. Once the core is complete, we extend the solution to capture new and interesting metrics, process those with Amazon Kinesis Data Analytics, and display new graphs on the custom dashboard. Participants come away with a fully functional system for capturing, analyzing, and displaying valuable website metrics in real time.

GAM305 – Dynamic Encounters for Veteran Players Using Machine Learning

Are you trying to keep your game fresh for long-time players but don’t have the resources to keep building new handcrafted content? Join this session to learn how to launch dynamic content for groups of players without relying on static techniques like instancing or spawn points. We dive into Amazon Kinesis and Amazon Kinesis Data Analytics for real-time data collection and hotspot detection, and machine learning for encounter-building based on observed player behavior.

Conclusion

We look forward to seeing you at AWS re:Invent 2018 in Las Vegas. In addition to the sessions described in this blog post, please stop by the Analytics booth during Expo hours to learn more about Amazon Kinesis.

 


About the Author

Larry Heathcote is a Principal Product Marketing Manager at Amazon Web Services. Larry is passionate about seeing the results of data-driven insights on business outcomes. He enjoys family time, home projects, grilling out, and classic barbeque.

 

 

Amazon Redshift – 2017 Recap

Post Syndicated from Larry Heathcote original https://aws.amazon.com/blogs/big-data/amazon-redshift-2017-recap/

We have been busy adding new features and capabilities to Amazon Redshift, and we wanted to give you a glimpse of what we’ve been doing over the past year. In this article, we recap a few of our enhancements and provide a set of resources that you can use to learn more and get the most out of your Amazon Redshift implementation.

In 2017, we made more than 30 announcements about Amazon Redshift. We listened to you, our customers, and delivered Redshift Spectrum, a feature of Amazon Redshift that gives you the ability to extend analytics to your data lake—without moving data. We launched new DC2 nodes, doubling performance at the same price. We also announced many new features that provide greater scalability, better performance, more automation, and easier ways to manage your analytics workloads.

To see a full list of our launches, visit our what’s new page—and be sure to subscribe to our RSS feed.

Major launches in 2017

Amazon Redshift Spectrum—extend analytics to your data lake, without moving data

We launched Amazon Redshift Spectrum to give you the freedom to store data in Amazon S3, in open file formats, and have it available for analytics without the need to load it into your Amazon Redshift cluster. It enables you to easily join datasets across Redshift clusters and S3 to provide unique insights that you would not be able to obtain by querying independent data silos.

With Redshift Spectrum, you can run SQL queries against data in an Amazon S3 data lake as easily as you analyze data stored in Amazon Redshift. And you can do it without loading data or resizing the Amazon Redshift cluster based on growing data volumes. Redshift Spectrum separates compute and storage to meet workload demands for data size, concurrency, and performance. Redshift Spectrum scales processing across thousands of nodes, so results are fast, even with massive datasets and complex queries. You can query open file formats that you already use—such as Apache Avro, CSV, Grok, ORC, Apache Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV—directly in Amazon S3, without any data movement.
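
As a rough sketch of what this looks like in practice (the cluster endpoint, credentials, IAM role, catalog database, and S3 location below are placeholders, not examples from the post), you define an external schema and table over files in S3 and then query them with ordinary SQL:

```python
import psycopg2

# Connection details are placeholders; Amazon Redshift speaks the PostgreSQL wire protocol.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="example-password",
)
conn.autocommit = True  # external DDL statements cannot run inside a transaction block

with conn.cursor() as cur:
    # Register an external schema backed by a data catalog database
    # (IAM role and catalog names are placeholders).
    cur.execute("""
        CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum_demo
        FROM DATA CATALOG DATABASE 'spectrum_db'
        IAM_ROLE 'arn:aws:iam::123456789012:role/ExampleSpectrumRole'
        CREATE EXTERNAL DATABASE IF NOT EXISTS
    """)
    # Define a table over Parquet files sitting in S3; no data is loaded into the cluster.
    cur.execute("""
        CREATE EXTERNAL TABLE spectrum_demo.clicks (
            user_id    VARCHAR(64),
            url        VARCHAR(2048),
            event_time TIMESTAMP
        )
        STORED AS PARQUET
        LOCATION 's3://example-bucket/clicks/'
    """)
    # Query the S3-resident data like any other table; joins with local tables work too.
    cur.execute("SELECT count(*) FROM spectrum_demo.clicks")
    print(cur.fetchone()[0])
```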

“For complex queries, Redshift Spectrum provided a 67 percent performance gain,” said Rafi Ton, CEO, NUVIAD. “Using the Parquet data format, Redshift Spectrum delivered an 80 percent performance improvement. For us, this was substantial.”

To learn more about Redshift Spectrum, watch our AWS Summit session Intro to Amazon Redshift Spectrum: Now Query Exabytes of Data in S3, and read our announcement blog post Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data.

DC2 nodes—twice the performance of DC1 at the same price

We launched second-generation Dense Compute (DC2) nodes to provide low latency and high throughput for demanding data warehousing workloads. DC2 nodes feature powerful Intel E5-2686 v4 (Broadwell) CPUs, fast DDR4 memory, and NVMe-based solid state disks (SSDs). We’ve tuned Amazon Redshift to take advantage of the better CPU, network, and disk on DC2 nodes, providing up to twice the performance of DC1 at the same price. Our DC2.8xlarge instances now provide twice the memory per slice of data and an optimized storage layout with 30 percent better storage utilization.

“Redshift allows us to quickly spin up clusters and provide our data scientists with a fast and easy method to access data and generate insights,” said Bradley Todd, technology architect at Liberty Mutual. “We saw a 9x reduction in month-end reporting time with Redshift DC2 nodes as compared to DC1.”

Read our customer testimonials to see the performance gains our customers are experiencing with DC2 nodes. To learn more, read our blog post Amazon Redshift Dense Compute (DC2) Nodes Deliver Twice the Performance as DC1 at the Same Price.

Performance enhancements—3x to 5x faster queries

On average, our customers are seeing 3x to 5x performance gains for most of their critical workloads.

We introduced short query acceleration to speed up execution of queries such as reports, dashboards, and interactive analysis. Short query acceleration uses machine learning to predict the execution time of a query, and to move short running queries to an express short query queue for faster processing.

We launched results caching to deliver sub-second response times for queries that are repeated, such as dashboards, visualizations, and those from BI tools. Results caching has an added benefit of freeing up resources to improve the performance of all other queries.

We also introduced late materialization to reduce the amount of data scanned for queries with predicate filters, by applying those filters in batches before fetching the data blocks for the remaining columns. For example, if only 10 percent of the table rows satisfy the predicate filters, Amazon Redshift can potentially save 90 percent of the I/O for the remaining columns to improve query performance.

We launched query monitoring rules and pre-defined rule templates. These features make it easier for you to set metrics-based performance boundaries for workload management (WLM) queries, and specify what action to take when a query goes beyond those boundaries. For example, for a queue that’s dedicated to short-running queries, you might create a rule that aborts queries that run for more than 60 seconds. To track poorly designed queries, you might have another rule that logs queries that contain nested loops.
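
As a hedged sketch of how such rules can be expressed (the queue layout, rule names, and thresholds are illustrative, and the exact metric names should be checked against the WLM documentation), the rules live in the cluster parameter group's wlm_json_configuration parameter:

```python
import json
import boto3

# One queue with two query monitoring rules, mirroring the examples above:
# abort anything that runs longer than 60 seconds, and log queries whose
# nested-loop joins return a large number of rows. Queue settings and
# thresholds are illustrative placeholders.
wlm_config = [
    {
        "query_concurrency": 5,
        "rules": [
            {
                "rule_name": "abort_long_running",
                "predicate": [
                    {"metric_name": "query_execution_time", "operator": ">", "value": 60}
                ],
                "action": "abort",
            },
            {
                "rule_name": "log_nested_loops",
                "predicate": [
                    {"metric_name": "nested_loop_join_row_count", "operator": ">", "value": 1000000}
                ],
                "action": "log",
            },
        ],
    }
]

redshift = boto3.client("redshift")
redshift.modify_cluster_parameter_group(
    ParameterGroupName="example-parameter-group",  # placeholder parameter group
    Parameters=[
        {"ParameterName": "wlm_json_configuration", "ParameterValue": json.dumps(wlm_config)}
    ],
)
```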

Customer insights

Amazon Redshift and Redshift Spectrum serve customers across a variety of industries and sizes, from startups to large enterprises. Visit our customer page to see the success that customers are having with our recent enhancements. Learn how companies like Liberty Mutual Insurance saw a 9x reduction in month-end reporting time using DC2 nodes. On this page, you can find case studies, videos, and other content that show how our customers are using Amazon Redshift to drive innovation and business results.

In addition, check out these resources to learn about the success our customers are having building out a data warehouse and data lake integration solution with Amazon Redshift:

Partner solutions

You can enhance your Amazon Redshift data warehouse by working with industry-leading experts. Our AWS Partner Network (APN) Partners have certified their solutions to work with Amazon Redshift. They offer software, tools, integration, and consulting services to help you at every step. Visit our Amazon Redshift Partner page and choose an APN Partner. Or, use AWS Marketplace to find and immediately start using third-party software.

To see what our Partners are saying about Amazon Redshift Spectrum and our DC2 nodes mentioned earlier, read these blog posts:

Resources

Blog posts

Visit the AWS Big Data Blog for a list of all Amazon Redshift articles.

YouTube videos

GitHub

Our community of experts contributes on GitHub to provide tips and hints that can help you get the most out of your deployment. Visit GitHub frequently to get the latest technical guidance, code samples, administrative task automation utilities, the analyze & vacuum schema utility, and more.

Customer support

If you are evaluating or considering a proof of concept with Amazon Redshift, or you need assistance migrating your on-premises or other cloud-based data warehouse to Amazon Redshift, our team of product experts and solutions architects can help you with architecting, sizing, and optimizing your data warehouse. Contact us using this support request form, and let us know how we can assist you.

If you are an Amazon Redshift customer, we offer a no-cost health check program. Our team of database engineers and solutions architects give you recommendations for optimizing Amazon Redshift and Amazon Redshift Spectrum for your specific workloads. To learn more, email us at [email protected].

If you have any questions, email us at [email protected].

 


Additional Reading

If you found this post useful, be sure to check out Amazon Redshift Spectrum – Exabyte-Scale In-Place Queries of S3 Data, Using Amazon Redshift for Fast Analytical Reports, and How to Migrate Your Oracle Data Warehouse to Amazon Redshift Using AWS SCT and AWS DMS.


About the Author

Larry Heathcote is a Principal Product Marketing Manager at Amazon Web Services for data warehousing and analytics. Larry is passionate about seeing the results of data-driven insights on business outcomes. He enjoys family time, home projects, grilling out, and the taste of classic barbeque.