
New – Amazon FSx for OpenZFS

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-amazon-fsx-for-openzfs/

Last month, my colleague Bill Vass said that we are “slowly adding additional file systems” to Amazon FSx. I’d question Bill’s definition of slow, given that his team has launched Amazon FSx for Lustre, Amazon FSx for Windows File Server, and Amazon FSx for NetApp ONTAP in less than three years.

Amazon FSx for OpenZFS
Today I am happy to announce Amazon FSx for OpenZFS, the newest addition to the Amazon FSx family. Just like the other members of the family, this new addition lets you use a popular file system without having to deal with hardware provisioning, software configuration, patching, backups, and the like. You can create a file system in minutes and begin to enjoy the benefits of OpenZFS right away: transparent compression, continuous integrity verification, snapshots, and copy-on-write. Even better, you get all of these benefits without having to develop the specialized expertise that has traditionally been needed to set up and administer OpenZFS.

FSx for OpenZFS is powered by the AWS Graviton family processors and AWS SRD (Scalable Reliable Datagram) Networking, and can deliver up to 1 million IOPS with latencies of 100-200 microseconds, along with up to 4 GB/second of uncompressed throughput, up to 12 GB/second of compressed throughput, and up to 12.5 GB/second throughput to cached data. FSx for OpenZFS supports the OpenZFS Adaptive Replacement Cache (ARC) and uses memory in the file server to provide faster performance. It also supports advanced NFS performance features such as session trunking and NFS delegation, allowing you to get very high throughput and IOPS from a single client, while still safely caching frequently accessed data on the client side.

FSx for OpenZFS volumes can be accessed from cloud or on-premises Linux, MacOS, and Windows clients via industry-standard NFS protocols (v3, v4, v4.1, and v4.2). Cloud clients can be Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (EKS) clusters, Amazon WorkSpaces virtual desktops, and VMware Cloud on AWS. Your data is stored in encrypted form and replicated within an AWS Availability Zone, with components replaced automatically and transparently as necessary.

You can use FSx for OpenZFS to address your highly demanding machine learning, EDA (Electronic Design Automation), media processing, financial analytics, code repository, DevOps, and web content management workloads. With performance that is close to local storage, FSx for OpenZFS is great for these and other latency-sensitive workloads that manipulate and sequentially access many small files. Finally, because you can create, mount, use, and delete file systems as needed, you can now use OpenZFS in a dynamic, agile fashion.

Using Amazon FSx for OpenZFS
I can create an OpenZFS file system using the AWS Management Console, CLI, APIs, or AWS CloudFormation. From the FSx Console I click Create file system and choose Amazon FSx for OpenZFS:

I can choose Quick create (and use recommended best-practice configurations), or Standard create (and set all of the configuration options myself). I’ll take the easy route and use the recommended best practices to get started. I enter a name (Jeff-OpenZFS) select the amount of SSD storage that I need, choose a VPC & subnet, and click Next:

The console shows me that I can edit many of the attributes of my file system later if necessary. I review the settings and click Create file system:
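If you prefer the CLI or the API, the equivalent request is a single call. Here's a minimal sketch, assuming the single-AZ deployment type that Quick create uses; the storage capacity, throughput capacity, subnet ID, and security group ID shown here are illustrative placeholders:

# Create a small FSx for OpenZFS file system (values are placeholders)
aws fsx create-file-system \
    --file-system-type OPENZFS \
    --storage-capacity 64 \
    --subnet-ids subnet-0123456789abcdef0 \
    --security-group-ids sg-0123456789abcdef0 \
    --open-zfs-configuration DeploymentType=SINGLE_AZ_1,ThroughputCapacity=64 \
    --tags Key=Name,Value=Jeff-OpenZFS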

My file system is ready within a minute or two, and I click Attach to get the proper commands to mount it to my client:
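The Attach dialog shows commands specific to my file system; in general they look like this sketch, assuming an NFS v4.1 mount from a Linux client (the file system DNS name is a placeholder for the one displayed in the console):

# Create a mount point and mount the file system's root volume over NFS v4.1
sudo mkdir -p /fsx
sudo mount -t nfs -o nfsvers=4.1 fs-0123456789abcdef0.fsx.us-east-1.amazonaws.com:/fsx /fsx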

To be more precise, I am mounting the root volume (/fsx) of my file system. Once it is mounted, I can use it as I would any other file system. After I add some files to it, I can use the Action menu in the console to create a backup:

I can restore the backup to a new file system:

As I noted earlier, each file system can deliver up to 4 gigabytes per second of throughput for uncompressed data. I can look at total throughput and other metrics in the console:

I can set throughput capacity of each volume when I create it, and then change it later if necessary:

Changes take effect within minutes. The file system remains active and mounted while the change is put into effect, but some operations may pause momentarily:

A single OpenZFS file system can contain multiple volumes, each with separate quotas (overall volume storage, per-user storage, and per-group storage) and compression settings. When I use the quick create option a root volume named fsx is created for me; I can click Create volume to create more volumes at any time:

The new volume exists within the namespace hierarchy of the parent, and can be mounted separately or accessed from the parent.
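Volumes can also be created programmatically. Here's a hedged sketch using the CLI; the parent volume ID is a placeholder, and I'm assuming the quota and compression options exposed by the CreateVolume API:

# Create a child volume with a 100 GiB quota and Zstandard compression (IDs are placeholders)
aws fsx create-volume \
    --volume-type OPENZFS \
    --name project-data \
    --open-zfs-configuration ParentVolumeId=fsvol-0123456789abcdef0,StorageCapacityQuotaGiB=100,DataCompressionType=ZSTD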

Things to Know
Here are a couple of quick facts to wrap up this post:

Pricing – Pricing is based on the provisioned storage capacity, throughput, and IOPS.

Regions – Amazon FSx for OpenZFS is available in the US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Canada (Central), Asia Pacific (Tokyo), and Europe (Frankfurt) Regions.

In the Works – We are working on additional features including storage scaling, IOPS scaling, a high availability option, and another storage class.

Now Available
Amazon FSx for OpenZFS is available now and you can start using it today!

Jeff;

AWS Nitro SSD – High Performance Storage for your I/O-Intensive Applications

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/aws-nitro-ssd-high-performance-storage-for-your-i-o-intensive-applications/

We love to solve difficult problems for our customers! As you have seen through the years, innovation at AWS takes many forms, and encompasses both hardware and software.

One of my favorite examples of customer-driven innovation is AWS Nitro System, which I first wrote about back in mid-2018. In that post I told you how Nitro System would allow us to innovate more quickly than ever, with the goal of creating instances that would run even more types of workloads. I also shared the basic building blocks, as they existed at that time, including Nitro Cards to accelerate and offload network and storage I/O, the Nitro Security Chip to monitor and protect hardware resources, and the Nitro Hypervisor to manage memory and CPU allocation with very low overhead.

Today I would like to tell you about one more building block!

AWS Nitro SSD
For decades, traditional hard drives (sometimes jokingly referred to as spinning rust) were the primary block storage devices. Today, while spinning rust still has its place, most high-performance storage is based on more modern Solid State Drives (SSD). Open up an SSD and you will find lots of flash memory and a firmware-driven processor that manages access to the memory and supports higher-level functions such as block mapping, encryption, caching, wear leveling, and so forth.

The scale of the AWS Cloud and the range of customer use cases that it supports gives us some valuable insights into the ways that today’s applications, database engines, and operating systems make use of block storage. As a result, after delivering several generations of EC2 instances we saw an opportunity to do better. Our goal was to allow I/O-intensive workloads (relational databases, NoSQL databases, data warehouses, search engines, and analytics engines to name a few) to run faster and with more predictable performance.

Today I would like to tell you about the AWS Nitro SSD. The first generation of these devices were used to power io2 Block Express EBS volumes, and allow us to give you EBS volumes with lots of IOPS, plenty of throughput, and a maximum volume size of 64 TiB. The Im4gn and Is4gen instances that I wrote about earlier today make use of the second generation of AWS Nitro SSDs, as will many future EC2 instances, including the I4i instances that we preannounced today.

The AWS Nitro SSDs are designed to be installed and to operate at cloud scale. While this sounds like a simple exercise in manufacturing and installing more devices, the reality is a lot more complex and a lot more interesting. As I noted earlier, the firmware inside of each device is responsible for implementing many lower-level functions. As our customers push the devices to their limits, they expect us to be able to diagnose and resolve any performance inconsistencies they observe. Building our own devices allows us to design in operational telemetry and diagnostics, along with mechanisms that enable us to install firmware updates at cloud scale & at cloud speed. Taking this even further, we developed our own code to manage the instance-level storage in order to further improve the reliability and debug-ability, and to deliver consistent performance.

On the performance side, our deep understanding of cloud workloads led us to engineer the devices so that they can deliver maximum performance under a sustained, continuous load. SSDs are built from fast, dense flash memory. Due to the characteristics of this semiconductor memory, each cell can only be written, erased, and then rewritten a limited number of times. In order to make the devices last as long as possible, the firmware is responsible for a process known as wear leveling. I don’t understand the details, but I assume that this includes some sort of mapping from logical block numbers to physical cells in a way that evens out the number of cycles over time. There’s some housekeeping (a form of garbage collection) involved in this process, and garden-variety SSDs can slow down (creating latency spikes) at unpredictable times when dealing with a barrage of writes. We also took advantage of our database expertise and built a very sophisticated, power-fail-safe journal-based database into the SSD firmware.

The second generation of AWS Nitro SSDs were designed to avoid latency spikes and deliver great I/O performance on real-world workloads. Our benchmarks show instances that use the AWS Nitro SSDs, such as the new Im4gn and Is4gen, deliver 75% lower latency variability than I3 instances, giving you more consistent performance.

Putting all of this together, there’s a very tight, rapidly rotating flywheel in action here because the team that builds the Nitro SSDs is part of the AWS storage team, and also has operational responsibilities. Like all teams at AWS, they watch the metrics day-in and day-out, and can efficiently deploy new firmware using a CI/CD model.

Join the Team
As is always the case, there's more innovation ahead, and we have some awesome open positions on the teams that design the AWS Nitro SSDs.

Jeff;

New Storage-Optimized Amazon EC2 Instances (Im4gn and Is4gen) Powered by AWS Graviton2 Processors

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-storage-optimized-amazon-ec2-instances-im4gn-and-is4gen-powered-by-aws-graviton2-processors/

EC2 storage-optimized instances are designed to deliver high disk I/O performance, and plenty of storage. Our customers use them to host high-performance real-time databases, distributed file systems, data warehouses, key-value stores, and more. Over the years we have released multiple generations of storage-optimized instances including the HS1 (2012), I2 (2013), D2 (2015), I3 (2017), I3en (2019), and D3/D3en (2020).

As I look back on all of these launches, it is interesting to see how we continue to provide an ever-increasing set of options that make each successive generation an even better fit for the diverse (and also ever-increasing) needs of our customers. HS1 instances were available in just one size, D2 and I2 in four, I3 in six, and I3en in eight. These instances give our customers the freedom to choose the size that best meets their current needs while also giving them room to scale up or down if those needs happen to change.

Im4gn and Is4gen
Today I am happy to introduce the two newest families of storage-optimized instances, Im4gn and Is4gen, powered by Graviton2 processors. Both instances offer up to 30 TB of NVMe storage using AWS Nitro SSD devices that are custom-built by AWS. As part of our drive to innovate on behalf of our customers, we turned our attention to storage and designed devices that were optimized to support high-speed access to large amounts of data. The AWS Nitro SSDs reduce I/O latency by up to 60% and also reduce latency variability by up to 75% when compared to the third generation of storage-optimized instances. As a result you get faster and more predictable performance for your I/O-intensive EC2 workloads.

Im4gn instances are a great fit for applications that require large amounts of dense SSD storage and high compute performance, but are not especially memory intensive, such as social games, session storage, chatbots, and search engines. Here are the specs:

Instance Name  | vCPUs | Memory  | Local NVMe Storage (AWS Nitro SSD) | Read Throughput (128 KB Blocks) | EBS-Optimized Bandwidth | Network Bandwidth
im4gn.large    | 2     | 8 GiB   | 937 GB                             | 250 MB/s                        | Up to 9.5 Gbps          | Up to 25 Gbps
im4gn.xlarge   | 4     | 16 GiB  | 1.875 TB                           | 500 MB/s                        | Up to 9.5 Gbps          | Up to 25 Gbps
im4gn.2xlarge  | 8     | 32 GiB  | 3.75 TB                            | 1 GB/s                          | Up to 9.5 Gbps          | Up to 25 Gbps
im4gn.4xlarge  | 16    | 64 GiB  | 7.5 TB                             | 2 GB/s                          | 9.5 Gbps                | 25 Gbps
im4gn.8xlarge  | 32    | 128 GiB | 15 TB (2 x 7.5 TB)                 | 4 GB/s                          | 19 Gbps                 | 50 Gbps
im4gn.16xlarge | 64    | 256 GiB | 30 TB (4 x 7.5 TB)                 | 8 GB/s                          | 38 Gbps                 | 100 Gbps

Im4gn instances provide up to 40% better price performance and up to 44% lower cost per TB of storage compared to I3 instances. The new instances are available in the AWS US West (Oregon), US East (Ohio), US East (N. Virginia), and Europe (Ireland) Regions as On-Demand, Spot, Savings Plan, and Reserved instances.

Is4gen instances are a great fit for applications that do large amounts of random I/O to large amounts of SSD storage. This includes shared file systems, stream processing, social media monitoring, and streaming platforms, all of which can use the increased storage density to retain more data locally. Here are the specs:

Instance Name  | vCPUs | Memory  | Local NVMe Storage (AWS Nitro SSD) | Read Throughput (128 KB Blocks) | EBS-Optimized Bandwidth | Network Bandwidth
is4gen.medium  | 1     | 6 GiB   | 937 GB                             | 250 MB/s                        | Up to 9.5 Gbps          | Up to 25 Gbps
is4gen.large   | 2     | 12 GiB  | 1.875 TB                           | 500 MB/s                        | Up to 9.5 Gbps          | Up to 25 Gbps
is4gen.xlarge  | 4     | 24 GiB  | 3.75 TB                            | 1 GB/s                          | Up to 9.5 Gbps          | Up to 25 Gbps
is4gen.2xlarge | 8     | 48 GiB  | 7.5 TB                             | 2 GB/s                          | Up to 9.5 Gbps          | Up to 25 Gbps
is4gen.4xlarge | 16    | 96 GiB  | 15 TB (2 x 7.5 TB)                 | 4 GB/s                          | 9.5 Gbps                | 25 Gbps
is4gen.8xlarge | 32    | 192 GiB | 30 TB (4 x 7.5 TB)                 | 8 GB/s                          | 19 Gbps                 | 50 Gbps

Is4gen instances provide 15% lower cost per TB of storage and up to 48% better compute performance compared to I3en instances. The new instances are available in the AWS US West (Oregon), US East (Ohio), US East (N. Virginia), and Europe (Ireland) Regions as On-Demand, Spot, Savings Plan, and Reserved instances.

Available Now
As I never get tired of saying, these new instances are available now and you can start using them today. You can use Amazon Linux 2, Ubuntu 18.04.05 (and newer), Red Hat Enterprise Linux 8.0, and SUSE Enterprise Server 15 (and newer) AMIs, along with the container-optimized ECS and EKS AMIs. Learn more about the Im4gn and Is4gen instances.
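Because these instances are Graviton2-based, be sure to launch a 64-bit Arm AMI. As a quick sketch, one way to do that is to look up the latest Amazon Linux 2 arm64 AMI through its public SSM parameter and pass it to run-instances (the subnet ID and key pair name below are placeholders):

# Look up the latest Amazon Linux 2 arm64 AMI via its public SSM parameter
AMI_ID=$(aws ssm get-parameters \
    --names /aws/service/ami-amazon-linux-latest/amzn2-ami-hvm-arm64-gp2 \
    --query 'Parameters[0].Value' --output text)

# Launch an im4gn.xlarge instance (subnet ID and key pair name are placeholders)
aws ec2 run-instances \
    --image-id "$AMI_ID" \
    --instance-type im4gn.xlarge \
    --subnet-id subnet-0123456789abcdef0 \
    --key-name my-key-pair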

Jeff;

PS – As of this launch twelve EC2 instance types are now powered by Graviton2 processors! To learn more, visit the Graviton2 page.

Machine Learning-Powered Amazon Connect, Now With Call Summarization

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/machine-learning-powered-amazon-connect-now-with-call-summarization/

At AWS our mission is to make machine learning (ML) accessible to data scientists, developers, and business users. To help businesses easily leverage the power of ML, we create purpose-built solutions that embed ML and deep learning technologies directly into a business process to address real customer needs, rather than leaving companies to sort it out on their own.

One place where we have seen ML have an impact is within the contact center—the place you receive and respond to customer inquiries and issues. Because of the growing role of customer experience (CX) and the increase in contactless commerce via phone or email, contact centers are essential to maintaining the human connections that businesses depend on. However, analog or outdated methods make it difficult to address every customer need in an effective way that delivers timely resolutions, creates great experiences, and fosters customer loyalty.

Embedding AWS ML technologies into a cloud contact center solution helps decrease the friction of calls, chats, and other engagements. It also makes it possible to automate outdated processes.

Amazon Connect is an easy-to-use, cloud-based, ML-powered contact center service that helps companies of any size deliver superior customer service at a lower cost.

Let me take three examples with Voice ID, Wisdom, and Contact Lens.

Amazon Connect Voice ID
ML capabilities can help streamline the customer experience for authentication. Instead of asking customers to repeat their email address and their mother’s maiden name several times, ML-powered voice identification establishes a digital voice print associated with each customer’s unique voice, and then recognizes that voice print at the beginning of each subsequent call. Voice identification provides a confidence score that may be used to automate authentication workflows.

Amazon Connect Wisdom
ML might also help search the vast documentation and knowledge base to find the most relevant answers to the questions raised by the customer. ML helps resolve customer issues faster and better.

Contact Lens for Amazon Connect
ML technologies also shine at analyzing the tone and content of a conversation, capturing customer sentiment in the moment, and learning from it. ML can help transcribe calls, track customer sentiment, detect common issues and customer trends, or even pinpoint discrepancies.

At just about the same time last year, I announced the addition of real-time capabilities for Contact Lens. This lets supervisors identify when to assist an agent on live calls so that they can provide guidance via chat or have the agent transfer the call. Last September, we added support for eight new languages, ending up with a total of 21 languages for post-call analytics and 12 languages for both post-call and real-time analytics.

Contact Lens Adds Call Summarization
But we didn’t stop there. Today, I am pleased to announce the addition of a new capability that helps you improve customer experience and agent and supervisor productivity by automatically summarizing the important aspects of each customer call.

You told us that keeping notes of customer conversations is time consuming, especially for agents who must take notes during the call and manually import them into your CRM tool afterward. In the end, this means more time for us, the customers, waiting in the queue for an agent to become available. Automatically generated call transcripts don’t save time for supervisors either: it is time consuming for supervisors to read full call transcripts to understand what happened during a customer conversation.

How it Works
Starting today, Contact Lens has added a summary of the key moments in a conversation. It is enabled by default, and there is no additional configuration step. You may toggle the Show transcript summary button to show or hide the summary when you don’t need it.

Contact Lens - Show Transcript Summary - Toggle button

Once a call is analyzed, the summary is available on the contact detail page.

Contact Lens identifies and summarizes the sections corresponding to Issue (e.g., lost package), Outcome (e.g., customer refund), and Action item (e.g., send a follow-up mail confirming the refund was processed). A manager can quickly see where there’s an action to send a customer a follow-up email and take action to ensure it happens.

Contact Lens Call Summary Example

The call summary is also available in JSON format. Contact Lens uploads these files to the S3 bucket of your choice. Having access to the JSON lets you import the summaries programmatically into your CRM or other tools.

... redacted for brevity ...

"IssuesDetected": [
{
   "CharacterOffsets": {
      "BeginOffsetChar": 31,
      "EndOffsetChar": 73
   },
   "Text": "I would like to cancel my subscription"
}
]
...
"ActionItemsDetected": [
 {
   "CharacterOffsets": {
      "BeginOffsetChar": 32,
      "EndOffsetChar": 116
   },
   "Text": "I will send you an email with details"
 }
 ]
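For example, here is a hedged sketch of pulling one of these analysis files from the bucket you configured and listing the detected issues with the AWS CLI and jq; the bucket name and object key are placeholders, and the exact path depends on your instance configuration:

# Copy a Contact Lens analysis file from S3 (bucket and key are placeholders)
aws s3 cp s3://my-connect-analytics-bucket/Analysis/Voice/2021/11/24/contact-id_analysis.json .

# Print the text of every detected issue, wherever it appears in the document
jq -r '.. | objects | .IssuesDetected? // empty | .[].Text' contact-id_analysis.json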

Availability and Pricing
Call summarization by Contact Lens is available in all AWS Regions where Contact Lens is available today. We support post-call analytics in the US West (Oregon), US East (N. Virginia), Canada (Central), Europe (London), Europe (Frankfurt), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Tokyo), and Asia Pacific (Sydney) regions. We support real-time analytics in the US West (Oregon), US East (N. Virginia), Canada (Central), Europe (London), Europe (Frankfurt), Asia Pacific (Seoul), Asia Pacific (Tokyo), and Asia Pacific (Sydney) regions.

Call summarization comes at no additional cost on top of the usual charges for Contact Lens, which is why we chose to enable it by default. Contact Lens is priced at $0.015 per minute of voice conversation analyzed. Most of our Contact Lens customers analyze millions of conversation minutes per month. The price drops to $0.0125 per minute when you analyze more than 5 million minutes per month.

If you do not have Contact Lens enabled on your call center, go ahead and start using it today.

— seb

New for AWS Control Tower – Region Deny and Guardrails to Help You Meet Data Residency Requirements

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-aws-control-tower-region-deny-and-guardrails-to-help-you-meet-data-residency-requirements/

Many customers, such as those in highly regulated industries and the public sector, want to have control over where their data is stored and processed. AWS already offers many tools and features to comply with local laws and regulations, but we want to provide a simplified way to translate data residency requirements into controls that can be applied to single- and multi-account environments.

Starting today, you can use AWS Control Tower to deploy data residency preventive and detective controls, referred to as guardrails. These guardrails will prevent provisioning resources in unwanted AWS Regions by restricting access to AWS APIs through service control policies (SCPs) built and managed by AWS Control Tower. In this way, content cannot be created or transferred outside of your selected Regions at the infrastructure level. In this context, content can be software (including machine images), data, text, audio, video, or images hosted on AWS for processing or storage. For example, AWS customers in Germany can deny access to AWS services in Regions outside of Frankfurt with the exception of global services such as AWS Identity and Access Management (IAM) and AWS Organizations.

AWS Control Tower also offers guardrails to further control data residency in underlying AWS service options, for example, blocking Amazon Simple Storage Service (Amazon S3) cross-region replication or blocking the creation of internet gateways.

The AWS account used for managing AWS Control Tower is not restricted by the new Region deny settings. That account can be used for remediation if you have data in an unwanted Region before enabling Region deny.

Detective guardrails are implemented via AWS Config rules and can further detect unexpected configuration changes that should not be allowed.

You still retain a shared responsibility model for data residency at the application level, but these controls can help you restrict what infrastructure and application teams can do on AWS.

Using Data Residency Guardrails in AWS Control Tower
To use the new data residency guardrails, you need to have created a landing zone using AWS Control Tower. See Plan your AWS Control Tower landing zone for more information.

To see all the new controls that are available, I select Guardrails on the left pane of the AWS Control Tower console and then find those in the Data Residency category. I sort results by Behavior. Guardrails that have a Prevention behavior are implemented as SCPs. Those that have a Detection behavior are implemented as AWS Config rules.

Console screenshot.

The most interesting guardrail is probably the one denying access to AWS based on the requested AWS Region. I choose it from the list and find that it is different from the other guardrails because it affects all Organizational Units (OUs) and cannot be activated here but must be activated in the landing zone settings.

Console screenshot.

Below the Overview, in the Guardrail components, there is a link to the full SCP for this guardrail, and I can see the list of the AWS APIs that, when this setting is enabled, are still going to be allowed towards non-governed Regions. Depending on your requirements, some of those services, such as Amazon CloudFront or AWS Global Accelerator, can be further limited by a custom SCP.
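The exact policy is built and maintained by AWS Control Tower, but a simplified sketch of this kind of Region deny SCP looks like the following; the exempted global services and allowed Regions shown here are illustrative, not the full list used by the guardrail:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "DenyRequestsOutsideGovernedRegions",
      "Effect": "Deny",
      "NotAction": [
        "iam:*",
        "organizations:*",
        "route53:*",
        "cloudfront:*",
        "support:*"
      ],
      "Resource": "*",
      "Condition": {
        "StringNotEquals": {
          "aws:RequestedRegion": ["us-east-1", "us-west-2", "eu-west-1", "eu-central-1"]
        }
      }
    }
  ]
}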

In the Landing zone settings, the Region deny guardrail is currently not enabled. I choose Modify settings and then enable the Region deny settings.

Console screenshot.

Below the Region deny settings, there is the list of AWS Regions governed by the landing zone. Those will be the regions allowed when I enable Region deny.

Console screenshot.

In my case, I have four governed Regions, two in the US and two in Europe:

  • US East (N. Virginia), which is also the home Region for the landing zone
  • US West (Oregon)
  • Europe (Ireland)
  • Europe (Frankfurt)

I choose Update landing zone at the bottom. The update of the landing zone takes a few minutes to complete. Now, the vast majority of the AWS APIs are blocked if they are not directed to one of those governed Regions. Let’s do a few tests.

Testing Region Deny in a Sandbox Account
Using AWS Single Sign-On, I copy the AWS credentials to use the sandbox account with AWSAdministratorAccess permissions. In a terminal, I paste the commands setting the environment variables to use those credentials.

Console screenshot.

Now, I try to start a new Amazon Elastic Compute Cloud (Amazon EC2) instance in US East (Ohio), one of the non-governed Regions. In a landing zone, the default VPC is replaced by a VPC managed by AWS Control Tower. To start the instance, I need to specify a VPC subnet. Let’s find a subnet ID that I can use.

aws ec2 describe-subnets --query 'Subnets[0].SubnetId' --region us-east-2

An error occurred (UnauthorizedOperation) when calling the DescribeSubnets operation:
You are not authorized to perform this operation.

As expected, I am not authorized to perform this operation in US East (Ohio). Let’s try to start an EC2 instance without passing the subnet ID.

aws ec2 run-instances --image-id ami-0dd0ccab7e2801812 --region us-east-2 \
    --instance-type t3.small                                     

An error occurred (UnauthorizedOperation) when calling the RunInstances operation:
You are not authorized to perform this operation.
Encoded authorization failure message: <ENCODED MESSAGE>

Again, I am not authorized. More information is included in the encoded authorization failure message that I can decode as described in this article:

aws sts decode-authorization-message --encoded-message <ENCODED MESSAGE>

The decoded message (which I have omitted for brevity) tells me that there was an explicit deny on my request and includes the full SCP that caused the deny. This information is really useful for debugging this kind of error.

Now, let’s try in US East (N. Virginia), one of the four governed regions.

aws ec2 describe-subnets --query 'Subnets[0].SubnetId' --region us-east-1
"subnet-0f3580c0c5e56c210"

This time, the command returns the subnet ID of the first subnet returned by the request. Let’s start an instance in US East (N. Virginia) using this subnet.

aws ec2 run-instances --image-id  ami-04ad2567c9e3d7893 --region us-east-1 \
    --instance-type t3.small --subnet-id subnet-0f3580c0c5e56c210

As expected, it works, and I can see the EC2 instance running in the console.

Console screenshot.

Similarly, APIs for other AWS services are limited by the Region deny settings. For example, I can’t create an S3 bucket in a non-governed Region.

Console screenshot.

When I try to create the bucket, I get an access denied error.

Console screenshot.

As expected, the creation of an S3 bucket works in a governed Region.

Even if someone gives this account access to a bucket in a non-governed Region, I would not be able to copy any data into that bucket.
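As a hedged sketch, a cross-Region copy like the following (the bucket name is a placeholder) is expected to be rejected:

# Attempt to copy data into a bucket in a non-governed Region (bucket name is a placeholder);
# with the Region deny guardrail enabled, the underlying PutObject call is denied by the SCP
aws s3 cp ./report.csv s3://partner-bucket-in-ap-southeast-1/report.csv --region ap-southeast-1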

Other preventive guardrails can enforce data residency, for example:

  • Disallow cross-region networking for Amazon EC2, Amazon CloudFront, and AWS Global Accelerator
  • Disallow internet access for an Amazon VPC instance managed by a customer
  • Disallow Amazon Virtual Private Network (VPN) connections

Now, let’s see how detective guardrails work.

Testing Detective Guardrails in a Sandbox Account
I enable the following guardrails for all accounts in the sandbox OU:

  • Detect whether Amazon EBS snapshots are restorable by all AWS accounts
  • Detect whether public routes exist in the route table for an internet gateway

Now, I want to see what happens if I go against these guardrails. In the EC2 console, I create an EBS snapshot for the volume of the EC2 instance I started before. Then, I modify permissions to share it with all AWS accounts.

Console screenshot.

Then, in the VPC console, I create an internet gateway, attach it to the AWS Control Tower managed VPC, and update the route table of one of the private subnets to use the internet gateway.

Console screenshot.

After a few minutes, the noncompliant resources in the sandbox account are found by the detective guardrails.

Console screenshot.

I look at the information provided by the guardrails and update my configuration to fix the issues. In a multi-account setup I’d contact the account owner and ask for remediation.

Availability and Pricing
You can use data-residency guardrails to control resources in any AWS Region. To create a landing zone, you should start from one of the Regions where AWS Control Tower is offered. For more information, see the AWS Regional Services List. There is no additional cost for this feature. You pay the costs of other services used, such as AWS Config.

This feature provides you with a framework of controls and guidance for setting up a multi-account environment that addresses data residency requirements. Depending on your use case, you may use any subset of the new data residency guardrails.

Set up guardrails based on your data residency requirements with AWS Control Tower.

Danilo

New – AWS Outposts Servers in Two Form Factors

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-aws-outposts-servers-in-two-form-factors/

AWS Outposts gives you on-premises compute and storage that is monitored and managed by AWS, and controlled by the same, familiar AWS APIs. You may already know about the AWS Outposts rack, which occupies a full 42U rack.

Last year I told you that we were working on new sizes of Outposts suitable for locations such as branch offices, factories, retail stores, health clinics, hospitals, and cell sites that are space-constrained and need access to low-latency compute capacity. Today we are launching three AWS Outposts servers, all powered by AWS Nitro System and with your choice of x86 or Arm/Graviton2 processors. Here’s an overview:

Name / Rack Size (Catalog ID) | EC2 Instance Capacity | Processor / Architecture | vCPUs | Memory  | Local NVMe SSD Storage
Outposts 1U (STBKRBE)         | c6gd.16xlarge         | Graviton2 / Arm          | 64    | 128 GiB | 3.8 TB (2 x 1.9 TB)
Outposts 2U (LMXAD41)         | c6id.16xlarge         | Intel Ice Lake / x86     | 64    | 128 GiB | 3.8 TB (2 x 1.9 TB)
Outposts 2U (KOSKFSF)         | c6id.32xlarge         | Intel Ice Lake / x86     | 128   | 256 GiB | 7.6 TB (4 x 1.9 TB)

You can create VPC subnets on each Outpost, and you can launch Amazon Elastic Compute Cloud (Amazon EC2) instances from EBS-backed AMIs in the parent region. The c6gd.16xlarge model supports six instance sizes, as follows:

Instance Name | vCPUs | Memory  | Local Storage
c6gd.large    | 2     | 4 GiB   | 118 GB
c6gd.xlarge   | 4     | 8 GiB   | 237 GB
c6gd.2xlarge  | 8     | 16 GiB  | 474 GB
c6gd.4xlarge  | 16    | 32 GiB  | 950 GB
c6gd.8xlarge  | 32    | 64 GiB  | 1.9 TB
c6gd.16xlarge | 64    | 128 GiB | 3.8 TB

The c6id.16xlarge model supports all but the largest of the following instance sizes, and the c6id.32xlarge supports all of them:

Instance Name | vCPUs | Memory  | Local Storage
c6id.large    | 2     | 4 GiB   | 118 GB
c6id.xlarge   | 4     | 8 GiB   | 237 GB
c6id.2xlarge  | 8     | 16 GiB  | 474 GB
c6id.4xlarge  | 16    | 32 GiB  | 950 GB
c6id.8xlarge  | 32    | 64 GiB  | 1.9 TB
c6id.16xlarge | 64    | 128 GiB | 3.8 TB
c6id.32xlarge | 128   | 256 GiB | 7.6 TB

Within each of your Outposts servers, you can launch any desired mix of instance sizes as long as you remain within the overall processing and storage available. You can create Amazon Elastic Container Service (Amazon ECS) clusters (Amazon Elastic Kubernetes Service (EKS) is coming soon), and the code you run on-premises can make use of the entire lineup of services in the AWS Cloud.

Each Outposts server connects to the cloud via the public Internet or across a private AWS Direct Connect line. Additionally, each Outposts server supports a Local Network Interface (LNI) that provides a Layer 2 presence on your local network for AWS service endpoints.

Outposts servers incorporate many powerful Nitro features, including high-speed networking and enhanced security. The security model is locked down: administrative access is blocked, which protects against tampering and human error. Additionally, data at rest is protected by a NIST-compliant physical security key.

While I was writing this post, I stopped in to say hello to the design and development team, and met with my colleague Bianca Nagy to learn more about the Outposts server:

Ordering Outposts Servers
Let’s walk through the process of ordering an Outposts server from the AWS Management Console. I visit the AWS Outposts Console, make sure that I am in the desired AWS Region, and click Place order to get started:

I click Servers, and then choose the desired configuration. I pick the c6gd.16xlarge, and click Next to proceed:

Then I create a new Outpost:

And a new Site:

Then I review my payment options and select my shipping address:

On the next page I review all of my options, click Place order, and await delivery:

In general, we expect to be able to deliver Outposts servers in two to six weeks, starting in the first quarter of 2022. After you receive yours, you or a member of your IT team can mount it in a 19″ rack or position it on a flat surface, cable it to power and networking, and power the device on. You then use a set of temporary AWS credentials to confirm the identity of the device, and to verify that the device is able to use DHCP to obtain an IP address. Once the device has established connectivity to the designated AWS parent region, we will finalize the provisioning of EC2 instance capacity and make it available to you.

After that, you are ready to launch instances and to deploy your on-premises applications.

We will monitor hardware performance and will contact you if your device is in need of maintenance. We will ship a replacement device for arrival within 2 business days. You can migrate your workloads to a redundant device, and use tracking information & notifications to track delivery status. When the replacement arrives, you install it and then destroy the physical security key in the old one before shipping it back to AWS.

Outposts API Update
We are also enhancing the Outposts API as part of this launch. Here are some of the new functions:

ListCatalogItems – Get a list of items in the Outposts catalog, with optional filtering by EC2 family or supported storage options.

GetCatalogItem – Get full information about a single item in the Outposts catalog.

GetSiteAddress – Get the physical address of a site where an Outposts rack or server is installed.

You can use the information returned by GetCatalogItem to place an order that contains the desired quantity of one or more catalog items.
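Here's a quick sketch of what that looks like from the command line, assuming the CLI equivalents of these API functions; the catalog item ID is one of the ones from the table above:

# List the items in the Outposts catalog
aws outposts list-catalog-items

# Get the details of a single catalog item, such as the 1U Graviton2 server above
aws outposts get-catalog-item --catalog-item-id STBKRBE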

Things to Know
Here are a couple of important things to know about Outposts servers:

Availability – Outposts servers are available for order to most locations where Outposts racks are available (currently 23 regions and 49 countries), with more to follow in 2022.

Ordering at Scale – I showed you the console-based ordering process above, and also gave you a glimpse at the Outposts API. If you need hundreds or thousands of devices, get in touch and we will give you a template that you can fill in and then upload.

re:Invent 2021 Outposts Server Selfie Challenge
If you attend AWS re:Invent, be sure to visit the AWS Hybrid kiosk in the AWS Booth (#1719) to see the new Outposts Servers up close and personal. While you are there, take a fun & creative selfie, tag it with #AWSOutposts & #AWSPromotion, and share it on Twitter. I will post my three favorites at the end of the show!

Jeff;

Announcing Amazon SageMaker Canvas – a Visual, No Code Machine Learning Capability for Business Analysts

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/announcing-amazon-sagemaker-canvas-a-visual-no-code-machine-learning-capability-for-business-analysts/

For an organization facing business problems and dealing with data on a daily basis, the ability to build systems that can predict business outcomes is very important. This ability lets you solve problems and move faster by automating slow processes and embedding intelligence in your IT systems.

But how do you make sure that all teams and individual decision makers in the organization are empowered to create these machine learning (ML) systems at scale, and without depending on other data science and data engineering teams? As a business user or data analyst, you’d like to build and use prediction systems based on the data that you analyze and process every day, without having to learn about hundreds of algorithms, training parameters, evaluation metrics, and deployment best practices.

Today, I’m excited to announce the general availability of Amazon SageMaker Canvas, a new visual, no code capability that allows business analysts to build ML models and generate accurate predictions without writing code or requiring ML expertise. Its intuitive user interface lets you browse and access disparate data sources in the cloud or on-premises, combine datasets with the click of a button, train accurate models, and then generate new predictions once new data is available.

SageMaker Canvas leverages the same technology as Amazon SageMaker to automatically clean and combine your data, create hundreds of models under the hood, select the best performing one, and generate new individual or batch predictions. It supports multiple problem types such as binary classification, multi-class classification, numerical regression, and time series forecasting. These problem types let you address business-critical use cases, such as fraud detection, churn reduction, and inventory optimization, without writing a single line of code.

SageMaker Canvas in Action
Imagine that I’m an e-commerce manager who needs to predict whether or not a product will be shipped on time. The datasets at my disposal consist of a product catalog and the historical shipping dataset, both in CSV format.

First, I enter the SageMaker Canvas application where all of my models and datasets are created and inspected.

I select Import, and upload two CSV files: ProductData.csv and ShippingData.csv. I have 120 products and 10,000 shipping records.

I could also fetch data from Amazon Simple Storage Service (Amazon S3) or connect to other cloud or on-premises data sources, such as Amazon Redshift or Snowflake. For this use case, I prefer to upload 1.6 MB of data directly from my computer.

Before confirming the import, I have a chance to preview the two datasets, their columns, and their respective values. For example, each product has a ComputerBrand, ScreenSize, and PackageWeight. In addition to useful columns such as ShippingOrigin, OrderDate, and ShippingPriority, each record in the shipping dataset also contains OnTimeDelivery, which is either On Time or Late. This column will be used by SageMaker Canvas to generate a prediction model based on historical data.

After a few seconds of processing, the datasets are ready, and I decide to join them to create a single dataset containing both product and shipping information. This is an optional step that often lets you increase the precision of a prediction model.

Now I can simply drag and drop the two datasets: SageMaker Canvas will automatically identify the shared ProductId column and apply an Inner Join transformation.

The join preview lets me visualize the resulting columns, identify missing or invalid values, and optionally deselect unwanted columns.

I select Save joined data and provide a new name for this joined dataset, which now includes 16 columns and 10,000 records.

Next, I want to create a model and start by selecting New model in the Models section on the left menu. I call it On Time Prediction Model.

The first step is selecting a dataset.

I select a target column that my model will predict: OnTimeDelivery.

SageMaker Canvas shows me the value distribution and already recommends the most appropriate model type: two categories classification.

Before proceeding with the model training, I have the option to generate an analysis report. This analysis gives me two very important pieces of information: the estimated accuracy and the impact of each column.

The estimated accuracy of 99.9% gives me confidence, but then I notice that the highest impact is provided by the ActualShippingDays column. Unfortunately, this column is not available in advance and I can’t use it for my predictions. So I deselect it and run the analysis again.

The new estimated accuracy is 94.2%, which is still pretty high. The most impactful columns are ShippingPriority, YShippingDistance, XShippingDistance, and Carrier. This is great because all of this information is available in advance and can be used for a prediction. On the other hand, product-related columns, such as PackageWeight and ScreenSize, have very small impacts on the prediction. This means that in the future I could simplify the overall process by feeding only shipping information into the training and prediction phases.

I’m happy with the analysis insights. Therefore, I decide to proceed and build a prediction model by selecting the Standard build option.

Now I can go for a walk, attend a few productive meetings, or simply spend some time with family. SageMaker Canvas is doing all of the work for me, training hundreds of models behind the scenes. It will select the best performing one, so that I can start generating accurate predictions in a couple of hours. Of course, the training duration will vary depending on the dataset size and problem type.

After about an hour and a half, the model is ready and the console lets me analyze its accuracy and the column impacts visually. I’m also happy to see that the model predicts the correct value 95.8% of the time, which is even higher than the estimated accuracy.

Optionally, I could also inspect advanced metrics such as Precision, Recall, F1 Score, and so on. These metrics help me understand how the model is performing and what kind of false positives and false negatives I can expect from this model.

From here, I could share the model into Amazon SageMaker Studio or continue using the Canvas UI to generate new predictions.

I decide to continue with the intuitive UI and select Predict. Now I can work with individual records or with a dataset for batch predictions.

When selecting Single prediction, SageMaker Canvas simplifies my life and lets me start from an existing record. I modify the column values and get immediate feedback on the prediction and the corresponding feature importance.

This quick feedback loop and intuitive UI allows me to use the ML model without having to write custom code. In case I decide to integrate the model into an automated production system, the Amazon SageMaker Studio integration lets me share the model easily with other data scientists in my team.

Generally Available Today
SageMaker Canvas is generally available in US East (Ohio), US East (N. Virginia), US West (Oregon), Europe (Frankfurt), and Europe (Ireland). You can start using it with your local datasets, as well as data already stored on Amazon S3, Amazon Redshift, or Snowflake. With just a few clicks, you’ll prepare and join your datasets, analyze estimated accuracy, verify which columns are impactful, train the best performing model, and generate new individual or batch predictions. We’re excited to hear your feedback and help you solve even more business problems with ML.

Alex

Amazon Kinesis Data Streams On-Demand – Stream Data at Scale Without Managing Capacity

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/amazon-kinesis-data-streams-on-demand-stream-data-at-scale-without-managing-capacity/

Today we are launching Amazon Kinesis Data Streams On-demand, a new capacity mode. This capacity mode eliminates capacity provisioning and management for streaming workloads.

Kinesis Data Streams is a fully managed, serverless service for real-time processing of streamed data at a massive scale. Kinesis Data Streams can take any amount of data, from any number of sources, and scale up and down as needed. Creating a new data stream has been easy since we announced Kinesis Data Streams in November 2013: to get started, you only need to specify the number of shards to provision for your stream.

Shards are the way to define capacity in Kinesis Data Streams. Each shard can ingest 1 MB/s and 1,000 records/second, and egress up to 2 MB/s. You can add or remove shards using the Kinesis Data Streams APIs to adjust the stream capacity to the throughput needs of your workload. This lets you make sure that producer and consumer applications don’t experience any throttling.
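In provisioned mode, resharding is an explicit operation. For example, here is a minimal sketch of scaling a stream to 10 shards with the CLI (the stream name is a placeholder):

# Scale a provisioned stream to 10 shards with uniform scaling
aws kinesis update-shard-count \
    --stream-name my-data-stream \
    --target-shard-count 10 \
    --scaling-type UNIFORM_SCALING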

As customers adopt data streaming broadly, workloads with data traffic that can increase by millions of events in a few minutes are becoming more common. For these volatile traffic patterns, customers carefully plan capacity, monitor throughput, and in some cases develop processes that automatically change the Kinesis Data Streams stream capacity.

Kinesis Data Streams On-Demand Mode
That is why today we are announcing Kinesis Data Streams On-demand. This new capacity mode eliminates the need for provisioning and managing the capacity for streaming data. With Kinesis Data Streams On-demand, the stream automatically scales its capacity in response to varying data traffic. Customers are charged per gigabyte of data written, read, and stored in the stream, in a pay-per-throughput fashion.

Data streams in the on-demand mode have the same high durability, high availability, low latency, security, and deep AWS integrations that Kinesis Data Streams already provides. Moreover, there are no new APIs to write or read data. All existing Kinesis Data Streams integrations work in the on-demand mode.

Kinesis Data Streams uses the partition key to distribute data across shards. That is why when using Kinesis Data Streams On-demand, you still must specify a partition key for each record to write data into a data stream, as you do today in Kinesis Data Streams using the provisioned mode. In Kinesis Data Streams On-demand, the data stream automatically adapts to handle uneven data distribution patterns. But you must be careful that no partition key exceeds a shard’s limits. If this happens, then you will receive write throttles, and then you can retry these requests.

When a new data stream is created using Kinesis Data Streams On-demand, it gets created with the default capacity of 4 MB/s and 4,000 records per second for writes. Kinesis Data Streams On-demand can automatically scale up to 200 MB/s and 200,000 records per second for writes.

Kinesis Data Streams On-demand accommodates up to double its previous peak write throughput observed in the last 30 days. As your data stream’s write throughput hits a new peak, Kinesis Data Streams automatically scales the stream’s capacity.

For example, if your data stream has a write throughput that varies between 10 MB/s and 40 MB/s, Kinesis Data Streams will make sure that you can easily burst to double the peak—80 MB/s. And, if later on that same data stream reaches a new peak of 50 MB/s, then Kinesis Data Streams will make sure that there is enough capacity to ingest 100 MB/s. However, write throttling can occur if your traffic grows more than double the previous peak in less than 15 minutes.

When to Use Kinesis Data Streams On-demand
On-demand mode is great for customers that have an unknown or variable workload, or who simply don’t want to deal with capacity management. On-demand mode works best for workloads that have even partition key distribution. For example, you run a mobile game that has variable traffic through the week or day, as customers play mostly on nights or weekends. Or, you run a streaming platform that hosts live shows, and you see a sudden increase in demand depending on the guests you have.

In addition, you can switch between on-demand and provisioned mode twice a day. For example, you run an e-commerce site with predictable traffic. But, starting next month, there will be many marketing campaigns launched globally. You don’t know the impact that those will have on the site traffic. Switch your Kinesis Data Streams to on-demand mode, and you can enjoy automated capacity planning and management for your data streams.

Get Started with Kinesis Data Streams On-demand
Create a new data stream with Kinesis Data Streams On-demand from the AWS console, AWS SDKs, AWS Command Line Interface (CLI), and AWS CloudFormation.

To create one from the console, visit the Kinesis console and choose Create data stream. When selecting the capacity mode, select On-demand.

Creating a data stream

At the end of the page, all of the settings for the new data stream are presented. These settings can be changed after the data stream has been created.

Data stream settings
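The equivalent with the CLI is a single call, and switching an existing stream is one more; the stream name, account ID, and ARN below are placeholders:

# Create a new data stream in on-demand capacity mode
aws kinesis create-stream \
    --stream-name my-data-stream \
    --stream-mode-details StreamMode=ON_DEMAND

# Switch an existing provisioned stream to on-demand mode
aws kinesis update-stream-mode \
    --stream-arn arn:aws:kinesis:us-east-1:123456789012:stream/my-data-stream \
    --stream-mode-details StreamMode=ON_DEMAND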

Let’s See This in Action!
For this demo, I want to show you how the new Kinesis Data Streams capability works. This situation is best described by looking at the following Amazon CloudWatch graphs. The green line represents the bytes ingested successfully into the stream, and the red line shows the percentage of traffic that is throttled.

First, we will start with a stream provisioned with five shards. For the first three minutes, we are sending a load of 4 MB/s. You can see that the stream can handle the load.

At the time stamp 21:19, we increase the load to 12 MB/s. Now the stream cannot handle the load, and the throttles start (the red line climbs until about 60 percent of requests are being throttled).

Increase the load on a provisioned stream

At the time stamp 21:23, we change the stream capacity from provisioned to on-demand. You can do that on-the-fly without affecting the stream. See that it takes a very short time for the stream to handle the load when converting from one capacity mode to the other.

In a few minutes (time stamp 21:24) the throttles start to drop as the stream starts scaling up. The stream capacity doubles to 10 shards first (time stamp 21:26), and the stream keeps scaling up until each shard has a load of less than 0.5 MB/s. In this way, if the stream suddenly receives double the amount of load, then it has the capacity ready to handle it.

Change to on-demand mode

At the time stamp 21:26, the load in the stream is increased to 18 MB/s. You can see the green line climbing to 350,000 records – there are no throttles, and the stream ends this demo with 40 open shards. This means that if suddenly the stream receives a load of 40 MB/s, then it could handle it with no problem.

Increase the load

Available Now!
Amazon Kinesis Data Streams On-demand is available globally in all commercial Regions.

You can learn more about the capacity modes in the Amazon Kinesis Data Streams Developer Guide.

Marcia

Introducing Amazon Redshift Serverless – Run Analytics At Any Scale Without Having to Manage Data Warehouse Infrastructure

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/introducing-amazon-redshift-serverless-run-analytics-at-any-scale-without-having-to-manage-infrastructure/

We’re seeing the use of data analytics expanding among new audiences within organizations, for example with users like developers and line of business analysts who don’t have the expertise or the time to manage a traditional data warehouse. Also, some customers have variable workloads with unpredictable spikes, and it can be very difficult for them to constantly manage capacity.

With Amazon Redshift, you use SQL to analyze structured and semi-structured data across data warehouses, operational databases, and data lakes. Today, I am happy to introduce the public preview of Amazon Redshift Serverless, a new capability that makes it super easy to run analytics in the cloud with high performance at any scale. Just load your data and start querying. There is no need to set up and manage clusters. You pay for the duration in seconds when your data warehouse is in use, for example, while you are querying or loading data. There is no charge when your data warehouse is idle.

Amazon Redshift Serverless automatically provisions the right compute resources for you to get started. As your demand evolves with more concurrent users and new workloads, your data warehouse scales seamlessly and automatically to adapt to the changes. You can optionally specify the base data warehouse size to have additional control on cost and application-specific SLAs.

With the new serverless option, you can continue to query data in other AWS data stores, such as Amazon Simple Storage Service (Amazon S3) data lakes and Amazon Aurora and Amazon Relational Database Service (RDS) databases.

Amazon Redshift Serverless is ideal when it is difficult to predict compute needs, such as variable workloads, periodic workloads with idle time, and steady-state workloads with spikes. This approach is also a good fit for ad-hoc analytics, where you want to get started quickly, and for test and development environments.

Let’s see how this works in practice.

Using Amazon Redshift Serverless
I go to the Amazon Redshift console and choose the new serverless option. The first time, I set up the serverless endpoint and configure networking and security.

I confirm the default settings that use all subnets in my default Amazon Virtual Private Cloud (VPC) and its default security group. Data is always encrypted, and I use the default AWS-owned key. Optionally, I can customize all settings. Now or later, I can associate the AWS Identity and Access Management (IAM) roles that grant permissions to access other AWS resources, for example, to load data from an S3 bucket. The configuration of the serverless endpoint will be shared by all my serverless data warehouses in the same AWS account and Region.

Console screenshot.

To query data, I use Amazon Redshift Query Editor V2, a new free web-based tool that we made available a few months back. The query editor provides quick access to a few sample datasets to make it easy to learn Amazon Redshift’s SQL capabilities: TPC-H, TPC-DS, and tickit, a dataset containing information on ticket sales for events.

For a quick test, I use the tickit sample dataset so I don’t need to load any data. I prepare a query to get the list of tickets sold per date, sorted to see the dates with more sales first:

SELECT caldate, sum(qtysold) as sumsold
FROM   tickit.sales, tickit.date
WHERE  sales.dateid = date.dateid 
GROUP BY caldate
ORDER BY sumsold DESC;

By using the web-based query editor, I don’t need to configure a SQL client or set up the network permissions to reach the serverless endpoint. Instead, I just write my SQL query and run it.

Console screenshot.

I am a visual person. I enable the Chart option on the right of the result table and select a bar chart.

Console screenshot.

Satisfied with the clarity of the chart, I export it as an image file. In this way, I can quickly share it or include it in a report.

Bar chart

Amazon Redshift Serverless supports all of the rich SQL functionality of Amazon Redshift, such as semi-structured data support. I can use any JDBC/ODBC-compliant tool or the Amazon Redshift Data API to query my data. To migrate data, I can take a snapshot of an Amazon Redshift provisioned cluster and restore it as serverless. Then, I just need to update my SQL applications to use the new serverless endpoint.
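
If I want to script that access, the same tickit query can go through the Data API from the AWS CLI. The following is a minimal sketch; the database name is an assumption, and the way the call targets the serverless endpoint (for example, a workgroup or database parameter) may differ during the preview, so check the Data API reference:

# Submit the query asynchronously and capture the statement ID (database name is an assumption)
STATEMENT_ID=$(aws redshift-data execute-statement \
    --database dev \
    --sql "SELECT caldate, SUM(qtysold) AS sumsold FROM tickit.sales, tickit.date WHERE sales.dateid = date.dateid GROUP BY caldate ORDER BY sumsold DESC;" \
    --query 'Id' --output text)

# Check that the statement finished, then fetch the result set as JSON
aws redshift-data describe-statement --id "$STATEMENT_ID"
aws redshift-data get-statement-result --id "$STATEMENT_ID"

Because the Data API returns results as JSON over HTTPS, I can feed them into a Lambda function or a notebook without managing database connections or drivers.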

Availability and Pricing
Amazon Redshift Serverless is available in public preview in the following AWS Regions: US East (N. Virginia), US West (N. California, Oregon), Europe (Frankfurt, Ireland), Asia Pacific (Tokyo).

With Amazon Redshift Serverless, you pay separately for the compute and storage you use. Compute capacity is measured in Redshift Processing Units (RPUs), and you pay for the workloads in RPU-hours with per-second billing. For storage, you pay for data stored in Amazon Redshift-managed storage and storage used for snapshots, similar to what you’d pay with a provisioned cluster using RA3 instances.

To control your costs, you can specify usage limits and define actions that Amazon Redshift automatically takes if those limits are reached. You can specify usage limits in RPU-hours, associated with a daily, weekly, or monthly duration. Setting higher usage limits can improve the overall throughput of the system, especially for workloads that need to handle high concurrency while maintaining consistently high performance.

Compute resources automatically shut down behind the scenes when there is no activity and resume when you are loading data or when queries come in. When accessing your S3 data lake via the new serverless endpoint, you do not pay for Amazon Redshift Spectrum separately. You have a unified serverless experience and pay for data lake queries also in RPU-seconds. For more information, see the Amazon Redshift pricing page.

The serverless endpoint is configured at the AWS account level. If you have multiple teams or projects and want to manage costs separately, you can use separate AWS accounts. You can share data between your provisioned clusters and serverless endpoint and between serverless endpoints across accounts.

To help you get started, we provide $500 in AWS credits upfront to try the Amazon Redshift Serverless public preview. You get the credits when you first create a database with Amazon Redshift Serverless. These credits are used to cover your costs for compute, storage, and snapshot usage of Amazon Redshift Serverless only.

Start using Amazon Redshift Serverless today to run and scale analytics without having to provision and manage data warehouse clusters.

Danilo

Announcing Amazon EMR Serverless (Preview): Run big data applications without managing servers

Post Syndicated from Damon Cortesi original https://aws.amazon.com/blogs/big-data/announcing-amazon-emr-serverless-preview-run-big-data-applications-without-managing-servers/

Today we’re happy to announce Amazon EMR Serverless, a new option in Amazon EMR that makes it easy and cost-effective for data engineers and analysts to run petabyte-scale data analytics in the cloud. With EMR Serverless, you can run applications built using open-source frameworks such as Apache Spark, Hive, and Presto, without having to configure, manage, optimize, or secure clusters. EMR Serverless automatically provisions and scales the compute and memory resources required by your applications, and you only pay for the resources that your applications use.

In this post, we discuss the benefits of EMR Serverless, walk you through the core concepts of EMR Serverless and how you can use it, and show you a quick demo.

Overview of EMR Serverless

Tens of thousands of customers use Amazon EMR, a managed service for running open-source analytics frameworks such as Apache Spark, Hive, and Presto, for large-scale data analytics applications. With Amazon EMR, you can provision clusters of any size in minutes. Amazon EMR automatically installs and configures the frameworks you choose, and provides a performance-optimized runtime that is compatible with, and over twice as fast as, standard open-source frameworks.

Amazon EMR customers have full control over cluster configuration. The ability to customize clusters allows you to optimize for cost and performance based on workload requirements. For example, you can use Amazon Elastic Compute Cloud (Amazon EC2) memory optimized instances to run SQL workloads with low latency, or use the EC2 Graviton2-based instances to improve performance. You can also use EC2 Spot Instances, which are integrated in Amazon EMR so that you can take advantage of unused EC2 capacity in the AWS Cloud to obtain instances at up to a 90% discount compared to On-Demand prices. If you run your applications on Kubernetes, you can use Amazon EMR on Amazon EKS to run your Amazon EMR analytics applications on Amazon Elastic Kubernetes Service (Amazon EKS) clusters.

However, tuning clusters for optimal cost and performance requires engineers to have deep knowledge of the underlying analytics frameworks. Furthermore, the specific compute and memory resources needed to optimally run applications depend on various factors, such as the schedule and complexity of data processing jobs and the volume of data being processed. When these characteristics change over time, you need to reevaluate and reconfigure clusters. In addition, administrators have to secure and monitor the clusters to ensure that they’re compliant with corporate security policies, and adjust security settings each time the cluster is reconfigured. Many customers don’t need this level of customization and control, and want a simpler way to process data using open-source frameworks on the Amazon EMR performance-optimized runtime.

With this in mind, we built EMR Serverless. With EMR Serverless, you can get all the benefits of running Amazon EMR, but with a serverless environment. We had the following goals in mind when we built EMR Serverless:

  • Provide a simpler experience – EMR Serverless is simple to use because you don’t have to configure, optimize, operate, or secure clusters. You don’t have to worry about instance types or cluster sizes, or about applying OS patches. You simply specify the framework and version that you want to use for your application, and submit your data processing jobs. You still get all the benefits that you expect out of Amazon EMR—open-source compatibility, open-source version currency, and performance-optimized runtime—but without the need to manage clusters.
  • No need to guess cluster sizes – EMR Serverless eliminates the need to right-size clusters for varying jobs and data sizes. With EMR Serverless, you create an application using an open-source framework version, and submit jobs to the application. EMR Serverless automatically adds and removes workers at different stages of processing your job. As a result, you don’t have to reconfigure when data volumes change, and you only pay for what your jobs require. You can control costs by specifying the minimum and maximum number of concurrent workers, and the vCPU and memory per worker.
  • Retain Amazon EMR’s performance-optimized runtime and open-source currency – EMR Serverless includes the Amazon EMR performance-optimized runtime for Apache Spark, Hive, and Presto. The Amazon EMR runtime is API-compatible and over twice as fast as standard open-source frameworks, so your jobs run faster and incur lower compute costs.
  • Seamless integration with EMR Studio – EMR Serverless includes EMR Studio, which provides fully managed serverless Jupyter Notebooks and familiar open-source tools such as Spark UI and Tez UI to help you develop, visualize, and debug your applications.
  • Automatic and fine-grained scaling – EMR Serverless automatically scales up workers at each stage of processing your job and scales them down when they’re not required. You’re charged for aggregate vCPU, memory, and storage resources used from the time a worker starts running until it stops, rounded up to the nearest second with a 1-minute minimum. For example, your job may require 10 workers for the first 10 minutes of processing the job, and 50 workers for the next 5 minutes. With fine-grained automatic scaling, you only incur cost for 10 workers for 10 minutes and 50 workers for 5 minutes. As a result, you don’t have to pay for underutilized resources.
  • Resilience to Availability Zone failures – EMR Serverless is a Regional service. When you submit jobs to an EMR Serverless application, it can run in any Availability Zone in the Region. A job is run in a single Availability Zone to avoid the performance implications of network traffic across Availability Zones. If an Availability Zone is impaired, a job submitted to your EMR Serverless application is automatically run in a different (healthy) Availability Zone. When using resources in a private VPC, we recommend that you specify the VPC configuration for multiple Availability Zones so that EMR Serverless can automatically select a healthy one.
  • Enable shared applications – When you submit jobs to an EMR Serverless application, you can specify the AWS Identity and Access Management (IAM) role that must be used by the job to access AWS resources such as Amazon Simple Storage Service (Amazon S3) objects. As a result, different IAM principals can run jobs on a single EMR Serverless application, and each job can only access the AWS resources that the IAM principal is allowed to access. This enables you to set up scenarios where a single application with a pre-initialized pool of workers is made available to multiple tenants wherein each tenant can submit jobs using a different IAM role but use the common pool of pre-initialized workers to immediately process requests.
  • Enable interactive applications – Interactive applications that allow data scientists and analysts to run interactive SQL queries for data exploration require a fast response time to user requests. For such interactive applications, EMR Serverless allows you to pre-initialize a pool of workers. You can start your EMR Serverless application and pre-initialize the pool of workers as soon as a user starts the application, and stop the application to stop workers when no interactive users are active. If processing user requests requires more workers than what have been pre-initialized, EMR Serverless automatically adds more workers up to the maximum concurrent limits that you specify. Therefore, by controlling the number of workers to pre-initialize and the maximum concurrent workers, you can optimize user experience and cost for your interactive applications.
  • Make it easy to switch from one deployment model to another – The same Amazon EMR releases are provided for applications using EMR clusters, Amazon EMR on EKS, and EMR Serverless. When you build an application using an Amazon EMR release (for example a Spark job using Amazon EMR release 6.4), you can choose to run it on an EMR cluster, Amazon EMR on EKS, or EMR Serverless without having to rewrite the application. This allows you to build applications for a given framework version, and retain the flexibility to change the deployment model based on future operational needs.

Core concepts

In this section, we discuss the core concepts in EMR Serverless: applications, jobs, workers, and pre-initialized workers.

Application

With EMR Serverless, you can create one or more applications that use open-source analytics frameworks. To create an application, you specify the open-source framework that you want to use (for example, Apache Spark or Apache Hive), the Amazon EMR release for the open-source framework version (for example, Amazon EMR release 6.4, which corresponds to Apache Spark 3.1.2), and a name for your application. After you create an application, you can submit data processing jobs or interactive requests to your application.
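
As a rough sketch of what this looks like outside the console, the following AWS CLI call creates a Spark application pinned to Amazon EMR release 6.4. The application name is mine, and the exact CLI surface may change while EMR Serverless is in preview:

# Create a Spark application on Amazon EMR release 6.4 and save its ID for later job submissions
APPLICATION_ID=$(aws emr-serverless create-application \
    --type SPARK \
    --name my-spark-app \
    --release-label emr-6.4.0 \
    --query 'applicationId' --output text)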

The following are a few examples where you may want to create multiple applications:

  • To use different open-source frameworks (for example, Hive or Spark)
  • To use different versions of open-source frameworks for different use cases (for example, use a newer version of Spark for a new application without having to upgrade older applications)
  • To perform A/B testing when upgrading from one version to another (for example, migrating from Spark 2.4 to Spark 3.1)
  • To maintain separate logical environments for test and production scenarios
  • To provide separate logical environments for different teams with independent cost controls and usage tracking
  • To logically separate different line-of-business applications (for example, finance vs. marketing)

Job

A job is a request submitted to an EMR Serverless application that is asynchronously run and tracked through completion. You can run multiple jobs concurrently in an application.
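
Submitting a job is a single call against the application. Here is a hedged AWS CLI sketch; the script location, output path, and IAM role ARN are placeholders, and the preview CLI may differ slightly from what is shown:

# Submit a Spark job to the application created earlier; EMR Serverless scales workers as needed
aws emr-serverless start-job-run \
    --application-id "$APPLICATION_ID" \
    --execution-role-arn arn:aws:iam::123456789012:role/EMRServerlessJobRole \
    --job-driver '{
        "sparkSubmit": {
            "entryPoint": "s3://my-bucket/scripts/wordcount.py",
            "entryPointArguments": ["s3://my-bucket/output/"]
        }
    }'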

Workers

An EMR Serverless application internally uses workers to run your jobs. By default, each application uses workers with 4 vCPUs, 30 GB of memory, and 20 GB of local storage per worker. You can customize this configuration.

Pre-initialized workers

EMR Serverless provides an optional feature to pre-initialize workers when your application starts up, so that the workers are ready to process requests immediately when a job is submitted to the application. Pre-initialized workers allow you to maintain a warm pool of workers for the application so that it can provide a sub-second response to start processing requests.
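
To sketch how this might look, the application can be created with an initial capacity configuration. The JSON shape below is an assumption based on the worker defaults described above, so treat it as illustrative rather than a definitive API reference:

# Pre-initialize one Spark driver and three executors so jobs start against warm workers
aws emr-serverless create-application \
    --type SPARK \
    --name my-interactive-app \
    --release-label emr-6.4.0 \
    --initial-capacity '{
        "DRIVER":   {"workerCount": 1, "workerConfiguration": {"cpu": "4vCPU", "memory": "30GB"}},
        "EXECUTOR": {"workerCount": 3, "workerConfiguration": {"cpu": "4vCPU", "memory": "30GB"}}
    }'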

Common usage patterns applied to EMR Serverless

Now let’s examine some common usage scenarios and how EMR Serverless provides you a simple solution.

Pattern #1: Data pipelines

Data pipelines are the backbone of your analytics workloads. A common pattern with data pipelines is to start a cluster, run a job, and stop the cluster when the job is complete. Because data is separated from compute, the inputs and outputs for each job are persisted separately from the cluster (for example, in Amazon S3). These steps are frequently automated using workflow orchestration applications such as Apache Airflow. You can also use AWS services such as AWS Step Functions and Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to create such workflows.

Although automating these steps isn’t complex, data engineers have to spend time determining the appropriate EC2 instance and cluster size. They have to determine the Availability Zone where the cluster is run, and handle failover. They have to test their applications when adopting OS updates. When data sizes change over time, they have to resize clusters, or use features like Amazon EMR managed scaling that automatically resize clusters. EMR Serverless provides a simpler solution by eliminating the need for you to handle these scenarios. You simply choose the open-source framework and version for your application, and submit jobs. You don’t have to worry about instance selection, cluster sizes, cluster startup, cluster resize, stopping nodes, Availability Zone failover, or OS updates.

Pattern #2: Shared clusters

Another common pattern is for teams to use a shared long-running cluster to run multiple jobs. In this case, engineers implement queues in Apache YARN for different workloads on a common cluster, and set up rules to automatically scale the cluster up or down based on overall workload. With Amazon EMR on EC2 clusters, you can use Amazon EMR managed scaling, a feature that automatically scales clusters up or down depending on the workload. With EMR Serverless, workers are assigned to each job when required, so your jobs get the resources they need. Moreover, because you only pay for the workers that your jobs require, you don’t incur cost for over-provisioned resources. Finally, because each job can specify the IAM role that should be used to access AWS resources when running the job, you don’t have to set up complex configurations to manage queues and permissions.

Pattern #3: Interactive workloads

A third pattern of use is when teams keep a cluster of instances available to support interactive analysis. In this case, the cluster is set up and initialized with applications that wait for interactive user requests. Applications are pre-initialized so that they can immediately start processing user requests and provide an interactive user experience. EMR Serverless enables this scenario without requiring you to manage clusters. You can specify the number of workers that you want to pre-initialize when you start an EMR Serverless application. Subsequently, when users submit requests, the pre-initialized workers can be used to immediately process user requests. If processing the user requests requires more workers than what you have chosen to pre-initialize, EMR Serverless automatically adds more workers (up to the maximum concurrent limit that you specify). When the requests are processed, EMR Serverless automatically reverts back to maintaining the pre-initialized workers that you specified. You can control when the pre-initialized workers start by controlling when to start and stop your EMR Serverless application. For example, you can start your application when users begin interactive analysis and turn it off when there are no user requests and the application remains idle.

Demo

Conclusion

In this post, we discussed the core concepts and common usage patterns of EMR Serverless, and showed you a quick demo. EMR Serverless is in preview, and during the preview you can run workloads using Spark 3.1.2 and Hive 2.0 through the API, the AWS Command Line Interface (AWS CLI), and the SDKs. Sign up for the preview now, and for more information, see the EMR Serverless documentation.


About the Authors

Damon Cortesi is a Principal Developer Advocate with Amazon Web Services.

Mehul Y. Shah is the GM for Amazon EMR.

Abhishek Sinha is a Principal Product Manager at Amazon Web Services.

AWS Lake Formation – General Availability of Cell-Level Security and Governed Tables with Automatic Compaction

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-lake-formation-general-availability-of-cell-level-security-and-governed-tables-with-automatic-compaction/

A data lake can help you break down data silos and combine different types of analytics into a centralized repository. You can store all of your structured and unstructured data in this repository. However, setting up and managing data lakes involve a lot of manual, complicated, and time-consuming tasks. AWS Lake Formation makes it easy to set up a secure data lake in days instead of weeks or months.

Today, I am excited to share the general availability of some new features that simplify even further loading data, optimizing storage, and managing access to a data lake:

  • Governed Tables – A new type of Amazon Simple Storage Service (Amazon S3) table that makes it simple and reliable to ingest and manage data at any scale. Governed tables support ACID transactions that let multiple users concurrently and reliably insert and delete data across multiple governed tables. ACID transactions also let you run queries that return consistent and up-to-date data. In case of errors in your extract, transform, and load (ETL) processes, or during an update, changes are not committed and will not be visible.
  • Storage Optimization with Automatic Compaction for governed tables – When this option is enabled, Lake Formation automatically compacts small S3 objects in your governed tables into larger objects to optimize access via analytics engines, such as Amazon Athena and Amazon Redshift Spectrum. By using automatic compaction, you don’t have to implement custom ETL jobs that read, merge, and compress data into new files, and then replace the original files.
  • Granular Access Control with Row and Cell-Level Security – You can control access to specific rows and columns in query results and within AWS Glue ETL jobs based on the identity of who is performing the action. In this way, you don’t have to create (and keep updated) subsets of your data for different roles and regulations. This works for both governed and traditional S3 tables.

Using Governed Tables, ACID Transactions, and Automatic Compaction
In the Lake Formation console, I can enable governed data access and management at table creation. Automatic compaction is enabled by default, and it can be disabled using the AWS Command Line Interface (CLI) or AWS SDKs.

Console screenshot.

Governed tables have a manifest that tracks the S3 objects that are part of the table’s data. I can use the UpdateTableObjects API to keep the manifest updated when adding new objects to the table, and I can call it using the AWS CLI and SDKs. This API is implicitly used by the AWS Glue ETL library.

Moreover, I have access to new Lake Formation APIs to start, commit, or cancel a transaction. I can use these APIs to wrap data loading, data transformation, and output consistent and up-to-date data.
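
To make that flow concrete, here is a hedged AWS CLI sketch of wrapping a data load in a transaction. The database, table, and S3 object details are placeholders, and the exact parameter shapes may differ from what is shown, so verify them against the Lake Formation API reference:

# Start a read/write transaction and capture its ID
TX_ID=$(aws lakeformation start-transaction \
    --transaction-type READ_AND_WRITE \
    --query 'TransactionId' --output text)

# Register a newly written S3 object in the governed table's manifest (ETag and size are placeholders)
aws lakeformation update-table-objects \
    --database-name sales_db \
    --table-name orders_governed \
    --transaction-id "$TX_ID" \
    --write-operations '[{"AddObject": {"Uri": "s3://my-data-lake/orders/part-0001.parquet", "ETag": "0123456789abcdef0123456789abcdef", "Size": 1048576}}]'

# Commit so that queries see a consistent view; cancel-transaction would roll it back instead
aws lakeformation commit-transaction --transaction-id "$TX_ID"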

Using Row and Cell-Level Security
There are many use cases where, for a table, you want to restrict access to specific columns, rows, or a combination that depends on the role of the user accessing the data. For example, a company with offices in the US, Germany, and France can create a filter for analysts based in the European Union (EU) to limit access to EU-based customers.

Console screenshot.

The filter can enforce that some columns, such as date of birth (dob) and phone, are not accessible to those analysts. Moreover, access to individual rows can be filtered by using filter expressions. You can configure row filter expressions with a SQL-compatible syntax based on the open-source PartiQL language. In this case, only rows with country equal to Germany or France (country='DE' OR country='FR') are visible.

Console screenshot.
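
For automation, the same filter can be expressed through the Lake Formation API. The sketch below uses names I made up (catalog ID, database, table, and filter name), and the request shape is an assumption, so double-check it against the CreateDataCellsFilter documentation:

# Create a data cells filter that hides dob and phone and keeps only EU rows
cat > filter.json <<'EOF'
{
    "TableCatalogId": "123456789012",
    "DatabaseName": "customers_db",
    "TableName": "customers",
    "Name": "eu-analysts-filter",
    "RowFilter": {"FilterExpression": "country = 'DE' OR country = 'FR'"},
    "ColumnWildcard": {"ExcludedColumnNames": ["dob", "phone"]}
}
EOF

aws lakeformation create-data-cells-filter --table-data file://filter.json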

Availability and Pricing
These new features are available today in the following AWS Regions: US East (N. Virginia), US West (Oregon), Europe (Ireland), US East (Ohio), and Asia Pacific (Tokyo).

When querying governed tables, or tables secured with row and cell-level security, you pay by the amount of data scanned (with a 10MB minimum). When using governed tables, transaction metadata is charged by the number of S3 objects tracked, and you pay for the number of transaction requests. Automatic compaction is charged based on the data processed. For more information, see the AWS Lake Formation pricing page.

While implementing these features, we introduced a new Lake Formation Storage API that is integrated with tools such as AWS Glue, Amazon Athena, Amazon Redshift Spectrum, and Amazon QuickSight. You can use this storage API directly in your applications to query tables with a SQL-like syntax (joins are not supported) and get the benefits of governed tables and cell-level security.

See the detailed blog series published during the preview to learn more:

Effective data lakes using AWS Lake Formation

Take advantage of these new features to simplify the creation and management of your data lake.

Danilo

Join the Preview – Amazon EC2 C7g Instances Powered by New AWS Graviton3 Processors

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/join-the-preview-amazon-ec2-c7g-instances-powered-by-new-aws-graviton3-processors/

We announced the first generation AWS-designed Graviton processor in late 2018, and followed it up with the second generation Graviton2 a year later. Today, AWS customers make use of twelve different Graviton2-powered instances including the new X2gd instances that are designed for memory-intensive workloads. All Graviton processors include dedicated cores & caches for each vCPU, along with additional security features courtesy of the AWS Nitro System; the Graviton2 processors add support for always-on memory encryption.

C7g in the Works
I am thrilled to tell you about our upcoming C7g instances. Powered by new Graviton3 processors, these instances are going to be a great match for your compute-intensive workloads: HPC, batch processing, electronic design automation (EDA), media encoding, scientific modeling, ad serving, distributed analytics, and CPU-based machine learning inferencing.

While we are still optimizing these instances, it is clear that the Graviton3 is going to deliver amazing performance. In comparison to the Graviton2, the Graviton3 will deliver up to 25% more compute performance and up to twice as much floating point & cryptographic performance. On the machine learning side, Graviton3 includes support for bfloat16 data and will be able to deliver up to 3x better performance.

Graviton3 processors also include a new pointer authentication feature that is designed to improve security. Before return addresses are pushed onto the stack, they are first signed with a secret key and additional context information, including the current value of the stack pointer. When the signed addresses are popped off the stack, they are validated before being used. An exception is raised if the address is not valid, thereby blocking attacks that work by overwriting the stack contents with the address of harmful code. We are working with operating system and compiler developers to add additional support for this feature, so please get in touch if this is of interest to you.

C7g instances will be available in multiple sizes (including bare metal), and are the first in the cloud industry to be equipped with DDR5 memory. In addition to drawing less power, this memory delivers 50% higher bandwidth than the DDR4 memory used in the current generation of EC2 instances.

On the network side, C7g instances will offer up to 30 Gbps of network bandwidth and Elastic Fabric Adapter (EFA) support.

Join the Preview
We are now running a preview of the C7g instances so that you can be among the first to experience all of this power. Sign up now, take an instance for a spin, and let me know what you think!

Jeff;

New – Use Amazon S3 Event Notifications with Amazon EventBridge

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-use-amazon-s3-event-notifications-with-amazon-eventbridge/

We launched Amazon EventBridge in mid-2019 to make it easy for you to build powerful, event-driven applications at any scale. Since that launch, we have added several important features including a Schema Registry, the power to Archive and Replay Events, support for Cross-Region Event Bus Targets, and API Destinations to allow you to send events to any HTTP API. With support for a very long list of destinations and the ability to do pattern matching, filtering, and routing of events, EventBridge is an incredibly powerful and flexible architectural component.

S3 Event Notifications
Today we are making it even easier for you to use EventBridge to build applications that react quickly and efficiently to changes in your S3 objects. This is a new, “directly wired” model that is faster, more reliable, and more developer-friendly than ever. You no longer need to make additional copies of your objects or write specialized, single-purpose code to process events.

At this point you might be thinking that you already had the ability to react to changes in your S3 objects, and wondering what’s going on here. Back in 2014 we launched S3 Event Notifications to SNS Topics, SQS Queues, and Lambda functions. This was (and still is) a very powerful feature, but using it at enterprise-scale can require coordination between otherwise-independent teams and applications that share an interest in the same objects and events. Also, EventBridge can already extract S3 API calls from CloudTrail logs and use them to do pattern matching & filtering. Again, very powerful and great for many kinds of apps (with a focus on auditing & logging), but we always want to do even better.

Net-net, you can now configure S3 Event Notifications to directly deliver to EventBridge! This new model gives you several benefits including:

Advanced Filtering – You can filter on many additional metadata fields, including object size, key name, and time range. This is more efficient than using Lambda functions that need to make calls back to S3 to get additional metadata in order to make decisions on the proper course of action. S3 only publishes events that match a rule, so you save money by only paying for events that are of interest to you.

Multiple Destinations – You can route the same event notification to your choice of 18 AWS services including Step Functions, Kinesis Firehose, Kinesis Data Streams, and HTTP targets via API Destinations. This is a lot easier than creating your own fan-out mechanism, and will also help you to deal with those enterprise-scale situations where independent teams want to do their own event processing.

Fast, Reliable Invocation – Patterns are matched (and targets are invoked) quickly and directly. Because S3 provides at-least-once delivery of events to EventBridge, your applications will be more reliable.

You can also take advantage of other EventBridge features, including the ability to archive and then replay events. This allows you to reprocess events in case of an error or if you add a new target to an event bus.

Getting Started
I can get started in minutes. I start by enabling EventBridge notifications on one of my S3 buckets (jbarr-public in this case). I open the S3 Console, find my bucket, open the Properties tab, scroll down to Event notifications, and click Edit:

I select On, click Save changes, and I’m ready to roll:
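
If I want to script this step instead of using the console, a minimal sketch with the AWS CLI looks like this. The empty EventBridgeConfiguration block is how I expect the setting to be expressed, so confirm it against the s3api reference:

# Turn on EventBridge delivery for all supported event types on the bucket
aws s3api put-bucket-notification-configuration \
    --bucket jbarr-public \
    --notification-configuration '{"EventBridgeConfiguration": {}}'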

Now I use the EventBridge Console to create a rule. I start, as usual, by entering a name and a description:

Then I define a pattern that matches the bucket and the events of interest:
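
A roughly equivalent setup from the command line might look like the following sketch (the rule name is mine, and the SNS topic ARN is a placeholder for the topic I use later in this walkthrough):

# Create an EventBridge rule that matches Object Created events for the bucket
aws events put-rule \
    --name s3-bucket-action \
    --event-pattern '{
        "source": ["aws.s3"],
        "detail-type": ["Object Created"],
        "detail": {"bucket": {"name": ["jbarr-public"]}}
    }'

# Send matching events to the SNS topic used as the target
aws events put-targets \
    --rule s3-bucket-action \
    --targets 'Id=sns-target,Arn=arn:aws:sns:us-east-1:123456789012:BucketAction'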

One pattern can match one or more buckets and one or more events; the following events are supported:

  • Object Created
  • Object Deleted
  • Object Restore Initiated
  • Object Restore Completed
  • Object Restore Expired
  • Object Tags Added
  • Object Tags Deleted
  • Object ACL Updated
  • Object Storage Class Changed
  • Object Access Tier Changed

Then I choose the default event bus, and set the target to an SNS topic (BucketAction) which publishes the messages to my Amazon email address:

I click Create, and I am all set. To test it out, I simply upload some files to my bucket and await the messages:

The message contains all of the interesting and relevant information about the event, and (after some unquoting and formatting), looks like this:

{
    "version": "0",
    "id": "2d4eba74-fd51-3966-4bfa-b013c9da8ff1",
    "detail-type": "Object Created",
    "source": "aws.s3",
    "account": "348414629041",
    "time": "2021-11-13T00:00:59Z",
    "region": "us-east-1",
    "resources": [
        "arn:aws:s3:::jbarr-public"
    ],
    "detail": {
        "version": "0",
        "bucket": {
            "name": "jbarr-public"
        },
        "object": {
            "key": "eb_create_rule_mid_1.png",
            "size": 99797,
            "etag": "7a72374e1238761aca7778318b363232",
            "version-id": "a7diKodKIlW3mHIvhGvVphz5N_ZcL3RG",
            "sequencer": "00618F003B7286F496"
        },
        "request-id": "4Z2S00BKW2P1AQK8",
        "requester": "348414629041",
        "source-ip-address": "72.21.198.68",
        "reason": "PutObject"
    }
}

My initial event pattern was very simple, and matched only the bucket name. I can use content-based filtering to write more complex and more interesting patterns. For example, I could use numeric matching to set up a pattern that matches events for objects that are smaller than 1 megabyte:

{
    "source": [
        "aws.s3"
    ],
    "detail-type": [
        "Object Created",
        "Object Deleted",
        "Object Tags Added",
        "Object Tags Deleted"
    ],

    "detail": {
        "bucket": {
            "name": [
                "jbarr-public"
            ]
        },
        "object" : {
            "size": [{"numeric" :["<=", 1048576 ] }]
        }
    }
}

Or, I could use prefix matching to set up a pattern that looks for objects uploaded to a “subfolder” (which doesn’t really exist) of a bucket:

"object": {
  "key" : [{"prefix" : "uploads/"}]
}

You can use all of this in conjunction with all of the existing EventBridge features, including Archive/Replay. You can also access the CloudWatch metrics for each of your rules:

Available Now
This feature is available now and you can start using it today in all commercial AWS Regions. You pay $1 for every 1 million events that match a rule; check out the EventBridge Pricing page for more information.

Jeff;

New – Amazon EBS Snapshots Archive

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/new-amazon-ebs-snapshots-archive/

I am pleased to announce the availability of Amazon EBS Snapshots Archive, a new storage tier for the long-term retention of Amazon Elastic Block Store (EBS) snapshots of your EBS volumes.

In a nutshell, EBS is an easy-to-use high-performance block storage service for your Amazon Elastic Compute Cloud (Amazon EC2) instances. An EBS volume mounted to your EC2 instances lets you boot an operating system and store data for your most performance-demanding workloads. You may use EBS snapshots to create point-in-time copies of your volume data. The first snapshot of a volume contains all of the data written into that volume. Subsequent snapshots are incremental. Snapshots are stored on Amazon Simple Storage Service (Amazon S3), and they may be shared between AWS accounts and AWS Regions.

The ability to take frequent snapshots and easily restore volumes makes EBS snapshots an obvious choice for your data management strategy, alongside other backup options. The incremental nature of snapshots makes them cost-effective for daily and weekly backups that need immediate restores. However, you were telling us that business compliance and regulatory needs have meant that you needed to retain EBS snapshots for longer periods of time (months or years). For example, snapshots taken at the end of a project, or snapshots for test and development preserved for future project releases. The vast majority of these snapshots are taken and never read. For these snapshots, you are looking to lower your storage costs. Today, to benefit from lower storage costs, you may have written complex scripts involving temporary EC2 instances to restore snapshots, mount the corresponding volumes, and transfer the data to lower-cost storage tiers, such as Amazon Glacier.

EBS Snapshots Archive provides a low-cost storage tier to archive full, point-in-time copies of EBS Snapshots that you must retain for 90 days or more for regulatory and compliance reasons, or for future project releases. Now, you can easily archive and manage EBS Snapshots, thereby eliminating the need for custom scripts and third-party tools to manage these snapshots. This lets you move your rarely accessed snapshots to EBS Snapshots Archive to achieve up to 75% lower storage costs, and avoid licensing costs for third-party tools. Furthermore, you can retrieve an archived snapshot within 24-72 hours, and, once restored, use the snapshot to recover an EBS volume.

As per usual, let me show you how it works.

How to Get Started
I have a snapshot available in the US East (N. Virginia) Region, and I want to archive this snapshot for compliance reasons. I open the AWS Management Console, navigate to EC2, then to Snapshots. I select the snapshot I want to archive, and select the Actions menu. I select the Archive snapshot menu option.

EBS Snapshot Archive - create snapshot

I carefully read the confirmation message :-), and I select Archive snapshot.

EBS Snapshot Archive - create snapshot - confirmation

I may monitor the progress of the archive operation with the new Storage Tier tab at the bottom of the screen. After some time, depending on the size of the snapshot, the Tiering status becomes ✅ Archival completed.

EBS Snapshot Archive - create snapshot - archival completed

Archived snapshots stay visible in the console. The new Storage tier column indicates the tier used for storage (Standard or Archive).
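
If I prefer to script the archival step, a minimal sketch with the AWS CLI looks like this. The snapshot ID is a placeholder, so check the EC2 CLI reference for the exact parameters:

# Move a snapshot from the standard tier to the archive tier
aws ec2 modify-snapshot-tier \
    --snapshot-id snap-0123456789abcdef0 \
    --storage-tier archive

# Check the tiering progress for the snapshot
aws ec2 describe-snapshot-tier-status \
    --filters Name=snapshot-id,Values=snap-0123456789abcdef0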

How do I Restore a Volume?
Restoring a volume from EBS Snapshots Archive is a two-step process. First, I retrieve the snapshot from EBS Snapshots Archive to its original snapshot ID, using the RestoreSnapshotTier API call or the AWS Management Console. It takes between 24-72 hours to retrieve the snapshot from the archive, depending on the snapshot size. Once retrieved, the snapshot appears as a regular snapshot in my account. At this stage, I hydrate the retrieved snapshot into an EBS volume using the default snapshot restore or Fast Snapshot Restore (FSR) for expedited restores, just like usual.

A CloudWatch event is generated when the snapshot is restored. You may listen to this event to avoid polling for the status with the API.

A CreateVolume API call on an archived snapshot will fail. You must restore a snapshot from archive before you use it to create a volume.

Using the AWS Management Console, I select the snapshot that I want to restore, I select the Actions menu, and then I select the Restore snapshot from archive menu option.

EBS Snapshot Archive - create snapshot - restore archive

I have the choice to restore the snapshot permanently, or just temporarily. At the end of the temporary duration, the standard tier snapshot is deleted, and only the archive is preserved.

EBS Snapshot Archive - create snapshot - restore archive - confirmation

After a while, depending on the snapshot size, the archive is restored to standard storage and may be used to recreate a volume, just like usual. I may monitor the progress of the retrieval and the lifetime for temporarily restored archives in the new Storage tier tab in the bottom half of the screen. Temporary restored snapshots may be kept for up to 180 days.
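
The same retrieval can be automated. Here is a hedged CLI sketch of both restore modes; the snapshot ID is a placeholder:

# Retrieve the archived snapshot to the standard tier for 10 days, then re-archive it automatically
aws ec2 restore-snapshot-tier \
    --snapshot-id snap-0123456789abcdef0 \
    --temporary-restore-days 10

# Or make the restore permanent instead
aws ec2 restore-snapshot-tier \
    --snapshot-id snap-0123456789abcdef0 \
    --permanent-restore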

Pricing and Availability
EBS Snapshots Archive is available for you today in 17 AWS Regions. At the time of launch, it is not available in the two Regions in China, Asia Pacific (Seoul), Asia Pacific (Osaka), Canada (Central), and South America (São Paulo).

As per usual, you pay as-you-go, with no minimum or fixed fees. There are two metrics that influence EBS Snapshots Archive billing: data storage and data retrieval. We charge you $0.0125 per GB-month of stored data and $0.03 per GB retrieved. You are charged for a 90-day period at minimum. This means that if you delete a snapshot archive or permanently restore it less than 90 days after creation, then we charge for the full 90-day period. The EBS pricing page has the details.

Go ahead and start to configure long-term storage for your EBS snapshots today.

— seb

New – AWS Control Tower Account Factory for Terraform

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-aws-control-tower-account-factory-for-terraform/

AWS Control Tower makes it easier to set up and manage a secure, multi-account AWS environment. AWS Control Tower uses AWS Organizations to create what is called a landing zone, bringing ongoing account management and governance based on our experience working with thousands of customers.

If you use AWS CloudFormation to manage your infrastructure as code, you can customize your AWS Control Tower landing zone using Customizations for AWS Control Tower, a solution that helps you deploy custom templates and policies to individual accounts and organizational units (OUs) within your organization.

But what if you use Terraform to manage your AWS infrastructure?

Today, I am happy to share the availability of AWS Control Tower Account Factory for Terraform (AFT), a new Terraform module maintained by the AWS Control Tower team that allows you to provision and customize AWS accounts through Terraform using a deployment pipeline. The source code for the development pipeline can be stored in AWS CodeCommit, GitHub, GitHub Enterprise, or BitBucket. With AFT, you can automate the creation of fully functional accounts that have access to all the resources they need to be productive. The module works with Terraform open source, Terraform Enterprise, and Terraform Cloud.

Architectural diagram.

Let’s see how this works in practice.

Using AWS Control Tower Account Factory for Terraform
First, I create a main.tf file that uses the AWS Control Tower Account Factory for Terraform (AFT) module:

module "aft" {
  source = "git@github.com:aws-ia/terraform-aws-control_tower_account_factory.git"

  # Required Parameters
  ct_management_account_id    = "123412341234"
  log_archive_account_id      = "234523452345"
  audit_account_id            = "345634563456"
  aft_management_account_id   = "456745674567"
  ct_home_region              = "us-east-1"
  tf_backend_secondary_region = "us-west-2"

  # Optional Parameters
  terraform_distribution = "oss"
  vcs_provider           = "codecommit"

  # Optional Feature Flags
  aft_feature_delete_default_vpcs_enabled = false
  aft_feature_cloudtrail_data_events      = false
  aft_feature_enterprise_support          = false
}

The first six parameters are required. As a prerequisite, I need to pass the ID of four AWS accounts in my AWS organization:

  • ct_management_account_id – AWS Control Tower management account
  • log_archive_account_id – Log Archive account
  • audit_account_id – Audit account
  • aft_management_account_id – AFT management account

Then, I have to pass two AWS Regions:

  • ct_home_region – The Region from which this module will be executed. This must be the same Region where AWS Control Tower is deployed.
  • tf_backend_secondary_region – The backend primary Region is the same as the AFT Region. This parameter defines the secondary Region to replicate to. AFT creates a backend to track its own state, and that backend is also used by Terraform when using the open-source version.

The other parameters are optional and are set to their default value in the previous main.tf file:

  • terraform_distribution – To select between Terraform open source (default), Enterprise, or Cloud
  • vcs_provider – To choose the version control system to use between AWS CodeCommit (default), GitHub, GitHub Enterprise, or BitBucket.

These feature flags are disabled by default and can be omitted unless you want to enable them:

  • aft_feature_delete_default_vpcs_enabled – To automatically delete the default VPC for new accounts.
  • aft_feature_cloudtrail_data_events – To enable AWS CloudTrail data events for new accounts. Be aware that this option, usually required for compliance in highly regulated environments, can have an impact on your costs.
  • aft_feature_enterprise_support – To automatically enroll new accounts with Enterprise Support (if you have an Enterprise Support Plan).

First, I initialize the project and download the plugins:

terraform init

Then, I use AWS Single Sign-On to log in with the AWS Control Tower management account and start the deployment:

terraform apply

I confirm with a yes and, after some time, the deployment is complete.

Now, I use AWS SSO again to log in with the AFT management account. In the AWS CodeCommit console, I find four repositories that I can use to customize the accounts created with AFT.

Console screenshot.

These repositories are used by pipelines managed by AWS CodePipeline to automate the account creation:

  • aft-account-request – This is where I place requests for accounts provisioned and managed by AFT.
  • aft-global-customizations – I can use this repository to customize all provisioned accounts with customer-defined resources. The resources can be created through Terraform or through Python.
  • aft-account-customizations – Here, I can customize provisioned accounts depending on the value of the account_customizations_name parameter in the aft-account-request repository. In this way, I can create different sets of customizations depending on the role the account will be used for.
  • aft-account-provisioning-customizations – This repository uses AWS Step Functions to customize the provisioning process for new accounts and simplify the integration with additional environments. State machines can use AWS Lambda functions, Amazon Elastic Container Service (Amazon ECS) or AWS Fargate tasks, custom activities hosted either on AWS or on-premises, or Amazon Simple Notification Service (SNS) and Amazon Simple Queue Service (SQS) to communicate with external applications.

Currently, these four repositories are all empty. To start, I use the code in the sources/aft-customizations-repos folder in the GitHub repo of the AFT Terraform module.

Using the example in the aft-account-request repository, I prepare a template to create a couple of AWS accounts. One of the two accounts is for a software developer.

To help software developers be productive quickly, I create a specific account customization. In the template, I set the parameter account_customizations_name equal to developer-customization.

Then, in the aft-account-customizations repository, I create a developer-customization folder where I put a Terraform template to automatically create an AWS Cloud9 EC2-based development environment for new accounts of that type. Optionally, I can extend that with my Python code, for example, to invoke internal or external APIs. Using this approach, all new accounts for software developers will have their development environment ready as they go through the delivery pipeline.

I push the changes to the main branch (first for the aft-account-customizations repository, then for the aft-account-request). This triggers the execution of the pipeline. After a few minutes, the two new accounts are ready to be used.

You can customize accounts created by AFT based on your unique requirements. For example, you can provide each account with its own specific security setup (such as IAM roles or security groups) and storage (for example, pre-configured Amazon Simple Storage Service (Amazon S3) buckets).

Availability and Pricing
AWS Control Tower Account Factory for Terraform (AFT) works in any Region where AWS Control Tower is available. There are no additional costs when using AFT. You pay for the services used by the solution. For example, when you set up AWS Control Tower, you will begin to incur costs for AWS services configured to set up your landing zone and mandatory guardrails.

When building this solution, we worked together with HashiCorp. Armon Dadgar, HashiCorp Co-Founder and CTO, told us: “Managing cloud environments with hundreds or thousands of users can be a complex and time-consuming process. Using a software delivery pipeline integrating Terraform and AWS Control Tower makes it easier to achieve consistent governance and compliance requirements across all accounts.”

The pipeline provides an account creation process that monitors when account provisioning is complete and then triggers additional Terraform modules to enhance the account with further customizations. You can configure the pipeline to use your own custom Terraform modules or pick from pre-published Terraform modules for common products and configurations.

Simplify and standardize AWS account creation using AWS Control Tower Account Factory for Terraform.

Danilo

New – Recycle Bin for EBS Snapshots

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-recycle-bin-for-ebs-snapshots/

It is easy to create EBS Snapshots, and just as easy to either delete them manually or to use the Data Lifecycle Manager to delete them automatically in accord with your organization’s retention model. Sometimes, as it turns out, it is a bit too easy to delete snapshots, and a well-intended cleanup effort or a wayward script can sometimes go a bit overboard!

New Recycle Bin
In order to give you more control over the deletion process, we are launching a Recycle Bin for EBS Snapshots. As you will see in a moment, you can now set up rules to retain deleted snapshots so that you can recover them after an accidental deletion. You can think of this as a two-level model, where individual AWS users are responsible for the initial deletion, and then a designated “Recycle Bin Administrator” (as specified by an IAM role) manages retention and recovery.

Rules can apply to all snapshots, or to snapshots that include a specified set of tag/value pairs. Each rule specifies a retention period (between one day and one year), after which the snapshot is permanently deleted.

Let’s Recycle!
I open the Recycle Bin Console, select the region of interest, and click Create retention rule to begin:

I call my first rule KeepAll, and set it to retain all deleted EBS snapshots for 4 days:

I add a tag (User) to the rule, and click Create retention rule:

Because Apply to all resources is checked, this is a general rule that applies when there are no applicable rules that specify one or more tags.

Then I create a second rule (KeepDev) that retains snapshots tagged with a Mode of Dev for just one day:

If two different tag-based rules match the same resource, then the one with the longer retention period applies.

Here are my retention rules:

Here are my EBS snapshots. As you can see, the first three are tagged with a Mode of Dev:

In an effort to save several cents per month, I impulsively delete them all:

And they are gone:

Later in the day, a member of my developer team messages me in a panic and lets me know that they desperately need the latest snapshot of the development server’s code. I open the Recycle Bin and I locate the snapshot (DevServer_2021_10_6):

I select the snapshot and click Recover:

Then I confirm my intent:

And the snapshot is available once again:

As has always been the case, Fast Snapshot Restore is disabled when a snapshot is deleted. With this launch, it will remain disabled when a snapshot is restored.

All of this functionality (creating rules, listing resources in the Recycle Bin, and restoring them) is also available from the CLI and via the Recycle Bin APIs.
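
As a quick sketch, the two retention rules above could also be created with the Recycle Bin CLI. The parameter shapes below are what I expect the rbin commands to accept, so double-check them against the CLI reference:

# A general rule that keeps all deleted EBS snapshots for 4 days
aws rbin create-rule \
    --resource-type EBS_SNAPSHOT \
    --retention-period RetentionPeriodValue=4,RetentionPeriodUnit=DAYS \
    --description "KeepAll"

# A tag-scoped rule that keeps snapshots tagged Mode=Dev for 1 day
aws rbin create-rule \
    --resource-type EBS_SNAPSHOT \
    --retention-period RetentionPeriodValue=1,RetentionPeriodUnit=DAYS \
    --resource-tags ResourceTagKey=Mode,ResourceTagValue=Dev \
    --description "KeepDev"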

Things to Know
Here are a couple of things to know about the new Recycle Bin:

IAM Support – As I mentioned earlier, you can use AWS Identity and Access Management (IAM) to grant access to this feature, and should consider creating an empowered user known as the Recycle Bin Administrator.

Rule Changes – You can make changes to your retention rules at any time, but be aware that the rules are evaluated (and the retention period is set) when you delete a snapshot. Changing a rule after an item has been deleted will not alter the retention period for the item.

Pricing – Resources that are in the Recycle Bin are charged the usual price, but be aware that creating rules with long retention periods could increase your AWS bill. On a related note, be sure that keeping deleted snapshots around does not violate your organization’s data retention policies. There is no charge for deleting or recovering a resource.

In the Bin – Resources in the Recycle Bin are immutable. If a resource is recovered, all of its existing metadata (tags and so forth) is also recovered intact.

Recycling – We will do our best to recycle all of the zeroes and all of the ones when a resource in your Recycle Bin reaches the end of its retention period!

Jeff;

Introducing Karpenter – An Open-Source High-Performance Kubernetes Cluster Autoscaler

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/introducing-karpenter-an-open-source-high-performance-kubernetes-cluster-autoscaler/

Today we are announcing that Karpenter is ready for production. Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler built with AWS. It helps improve your application availability and cluster efficiency by rapidly launching right-sized compute resources in response to changing application load. Karpenter also provides just-in-time compute resources to meet your application’s needs and will soon automatically optimize a cluster’s compute resource footprint to reduce costs and improve performance.

Before Karpenter, Kubernetes users needed to dynamically adjust the compute capacity of their clusters to support applications using Amazon EC2 Auto Scaling groups and the Kubernetes Cluster Autoscaler. Nearly half of Kubernetes customers on AWS report that configuring cluster auto scaling using the Kubernetes Cluster Autoscaler is challenging and restrictive.

When Karpenter is installed in your cluster, Karpenter observes the aggregate resource requests of unscheduled pods and makes decisions to launch new nodes and terminate them to reduce scheduling latencies and infrastructure costs. Karpenter does this by observing events within the Kubernetes cluster and then sending commands to the underlying cloud provider’s compute service, such as Amazon EC2.

Karpenter is an open-source project licensed under the Apache License 2.0. It is designed to work with any Kubernetes cluster running in any environment, including all major cloud providers and on-premises environments. We welcome contributions to build additional cloud providers or to improve core project functionality. If you find a bug, have a suggestion, or have something to contribute, please engage with us on GitHub.

Getting Started with Karpenter on AWS
To get started with Karpenter in any Kubernetes cluster, ensure there is some compute capacity available, and install it using the Helm charts provided in the public repository. Karpenter also requires permissions to provision compute resources on the provider of your choice.

Once installed in your cluster, the default Karpenter provisioner observes incoming Kubernetes pods that cannot be scheduled due to insufficient compute resources in the cluster, and automatically launches new resources to meet their scheduling and resource requirements.

I want to show a quick start using Karpenter in an Amazon EKS cluster based on Getting Started with Karpenter on AWS. It requires the installation of AWS Command Line Interface (AWS CLI), kubectl, eksctl, and Helm (the package manager for Kubernetes). After setting up these tools, create a cluster with eksctl. This example configuration file specifies a basic cluster with one initial node.

cat <<EOF > cluster.yaml
---
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: eks-karpenter-demo
  region: us-east-1
  version: "1.20"
managedNodeGroups:
  - instanceType: m5.large
    amiFamily: AmazonLinux2
    name: eks-karpenter-demo-ng
    desiredCapacity: 1
    minSize: 1
    maxSize: 5
EOF
$ eksctl create cluster -f cluster.yaml

Karpenter itself can run anywhere, including on self-managed node groups, managed node groups, or AWS Fargate. Karpenter will provision EC2 instances in your account.

Next, you need to create necessary AWS Identity and Access Management (IAM) resources using the AWS CloudFormation template and IAM Roles for Service Accounts (IRSA) for the Karpenter controller to get permissions like launching instances following the documentation. You also need to install the Helm chart to deploy Karpenter to your cluster.

$ helm repo add karpenter https://charts.karpenter.sh
$ helm repo update
$ helm upgrade --install --skip-crds karpenter karpenter/karpenter --namespace karpenter \
  --create-namespace --set serviceAccount.create=false --version 0.5.0 \
  --set controller.clusterName=eks-karpenter-demo \
  --set controller.clusterEndpoint=$(aws eks describe-cluster --name eks-karpenter-demo --query "cluster.endpoint" --output json) \
  --wait # for the defaulting webhook to install before creating a Provisioner

Karpenter provisioners are a Kubernetes resource that enables you to configure the behavior of Karpenter in your cluster. When you create a default provisioner, without further customization besides what is needed for Karpenter to provision compute resources in your cluster, Karpenter automatically discovers node properties such as instance types, zones, architectures, operating systems, and purchase types of instances. You don’t need to define these spec:requirements if there is no explicit business requirement.

cat <<EOF | kubectl apply -f -
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # Requirements that constrain the parameters of provisioned nodes.
  # Operators { In, NotIn } are supported to enable including or excluding values.
  requirements:
    - key: node.k8s.aws/instance-type # If not included, all instance types are considered
      operator: In
      values: ["m5.large", "m5.2xlarge"]
    - key: "topology.kubernetes.io/zone" # If not included, all zones are considered
      operator: In
      values: ["us-east-1a", "us-east-1b"]
    - key: "kubernetes.io/arch" # If not included, all architectures are considered
      operator: In
      values: ["arm64", "amd64"]
    - key: "karpenter.sh/capacity-type" # If not included, the webhook for the AWS cloud provider will default to on-demand
      operator: In
      values: ["spot", "on-demand"]
  provider:
    instanceProfile: KarpenterNodeInstanceProfile-eks-karpenter-demo
  ttlSecondsAfterEmpty: 30
EOF

The ttlSecondsAfterEmpty value configures Karpenter to terminate empty nodes. If this value is disabled, nodes will never scale down due to low utilization. To learn more, see Provisioner custom resource definitions (CRDs) on the Karpenter site.
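
You can check that the provisioner was created with standard kubectl commands against the new custom resource:

$ kubectl get provisioners
$ kubectl describe provisioner default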

Karpenter is now active and ready to begin provisioning nodes in your cluster. Create some pods using a deployment, and watch Karpenter provision nodes in response.

$ kubectl create deployment inflate \
          --image=public.ecr.aws/eks-distro/kubernetes/pause:3.2

Let’s scale the deployment and check out the logs of the Karpenter controller.

$ kubectl scale deployment inflate --replicas 10
$ kubectl logs -f -n karpenter $(kubectl get pods -n karpenter -l karpenter=controller -o name)
2021-11-23T04:46:11.280Z        INFO    controller.allocation.provisioner/default       Starting provisioning loop      {"commit": "abc12345"}
2021-11-23T04:46:11.280Z        INFO    controller.allocation.provisioner/default       Waiting to batch additional pods        {"commit": "abc12345"}
2021-11-23T04:46:12.452Z        INFO    controller.allocation.provisioner/default       Found 9 provisionable pods      {"commit": "abc12345"}
2021-11-23T04:46:13.689Z        INFO    controller.allocation.provisioner/default       Computed packing for 10 pod(s) with instance type option(s) [m5.large]  {"commit": "abc12345"}
2021-11-23T04:46:16.228Z        INFO    controller.allocation.provisioner/default       Launched instance: i-01234abcdef, type: m5.large, zone: us-east-1a, hostname: ip-192-168-0-0.ec2.internal    {"commit": "abc12345"}
2021-11-23T04:46:16.265Z        INFO    controller.allocation.provisioner/default       Bound 9 pod(s) to node ip-192-168-0-0.ec2.internal  {"commit": "abc12345"}
2021-11-23T04:46:16.265Z        INFO    controller.allocation.provisioner/default       Watching for pod events {"commit": "abc12345"}

The provisioner’s controller listens for pod events; in this case it launched a new instance and bound the provisionable pods to the new node.

Now, delete the deployment. After 30 seconds (ttlSecondsAfterEmpty = 30), Karpenter should terminate the empty nodes.

$ kubectl delete deployment inflate
$ kubectl logs -f -n karpenter $(kubectl get pods -n karpenter -l karpenter=controller -o name)
2021-11-23T04:46:18.953Z        INFO    controller.allocation.provisioner/default       Watching for pod events {"commit": "abc12345"}
2021-11-23T04:49:05.805Z        INFO    controller.Node Added TTL to empty node ip-192-168-0-0.ec2.internal {"commit": "abc12345"}
2021-11-23T04:49:35.823Z        INFO    controller.Node Triggering termination after 30s for empty node ip-192-168-0-0.ec2.internal {"commit": "abc12345"}
2021-11-23T04:49:35.849Z        INFO    controller.Termination  Cordoned node ip-192-168-0-0.ec2.internal   {"commit": "abc12345"}
2021-11-23T04:49:36.521Z        INFO    controller.Termination  Deleted node ip-192-168-0-0.ec2.internal    {"commit": "abc12345"}

If you delete a node with kubectl, Karpenter will gracefully cordon, drain, and shut down the corresponding instance. Under the hood, Karpenter adds a finalizer to the node object, which blocks deletion until all pods are drained and the instance is terminated.
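
As a quick illustration, you can view the finalizer on a Karpenter-provisioned node and trigger a clean termination yourself; the node name below is the example one from the logs above:

# Show the finalizer that Karpenter added to the node object
$ kubectl get node ip-192-168-0-0.ec2.internal -o jsonpath='{.metadata.finalizers}'
# Deleting the node cordons and drains it, then terminates the backing EC2 instance
$ kubectl delete node ip-192-168-0-0.ec2.internal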

Things to Know
Here are a couple of things to keep in mind about Karpenter features:

Accelerated Computing: Karpenter works with all kinds of Kubernetes applications, but it performs particularly well for use cases that require rapidly provisioning and deprovisioning large numbers of diverse compute resources. Examples include batch jobs that train machine learning models, run simulations, or perform complex financial calculations. You can use the nvidia.com/gpu, amd.com/gpu, and aws.amazon.com/neuron custom resources for use cases that require accelerated EC2 instances.
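
For example, a pod that requests a GPU can be declared as follows; this is a minimal sketch, and Karpenter can only satisfy it if the provisioner allows GPU-capable instance types and the NVIDIA device plugin advertises the resource on the node:

cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: gpu-example
spec:
  containers:
    - name: app
      image: public.ecr.aws/eks-distro/kubernetes/pause:3.2
      resources:
        limits:
          nvidia.com/gpu: 1 # extended resource advertised by the NVIDIA device plugin
EOF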

Provisioners Compatibility: Karpenter provisioners are designed to work alongside static capacity management solutions like Amazon EKS managed node groups and EC2 Auto Scaling groups. You may choose to manage the entirety of your capacity using provisioners, a mixed model with both dynamically and statically managed capacity, or a fully static approach. We recommend not running Kubernetes Cluster Autoscaler at the same time as Karpenter because both systems scale up nodes in response to unschedulable pods. If configured together, both systems will race to launch or terminate instances for these pods.

Join our Karpenter Community
Karpenter’s community is open to everyone. Give it a try, and join our working group meeting, or follow our roadmap for future releases that interest you. As I said, we welcome any contributions such as bug reports, new features, corrections, or additional documentation.

To learn more about Karpenter, see the documentation and demo video from AWS Container Day.

Channy

New – AWS Marketplace for Containers Anywhere to Deploy Your Kubernetes Cluster in Any Environment

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-aws-marketplace-for-containers-anywhere-to-deploy-your-kubernetes-cluster-in-any-environment/

More than 300,000 customers use AWS Marketplace today to find, subscribe to, and deploy third-party software packaged as Amazon Machine Images (AMIs), software-as-a-service (SaaS), and containers. Customers can find and subscribe to containerized third-party applications in AWS Marketplace and deploy them on Amazon Elastic Container Service (Amazon ECS) and Amazon Elastic Kubernetes Service (Amazon EKS).

Many customers that run Kubernetes applications on AWS also want to deploy them on premises because of constraints such as latency and data governance requirements. And once they have deployed a Kubernetes application, they need additional tools to govern it through license tracking, billing, and upgrades.

Today, we announce AWS Marketplace for Containers Anywhere, a set of capabilities that allows AWS customers to find, subscribe to, and deploy third-party Kubernetes applications from AWS Marketplace on any Kubernetes cluster in any environment. This capability makes the AWS Marketplace more useful for customers who run containerized workloads.

With this launch, you can deploy third-party Kubernetes applications to on-premises environments using Amazon EKS Anywhere, or to any self-managed Kubernetes cluster on premises or in Amazon Elastic Compute Cloud (Amazon EC2). This gives you a single catalog for finding container images regardless of where you eventually plan to deploy them.

With AWS Marketplace for Containers Anywhere, you get the same benefits as with any other product in AWS Marketplace, including consolidated billing, flexible payment options, and lower pricing for long-term contracts. You can find vetted, security-scanned, third-party Kubernetes applications, manage upgrades with a few clicks, and track all licenses and bills. You can move applications between environments without purchasing duplicate licenses. After you have subscribed to an application using this feature, you can migrate it to AWS by deploying the Helm charts provided by the independent software vendor (ISV) onto your Kubernetes clusters on AWS without changing your licenses.

Getting Started with AWS Marketplace for Containers Anywhere
You can get started by visiting AWS Marketplace. Filter the catalog on the Helm Chart delivery method to find Kubernetes-based applications that you can deploy on AWS and on premises.

When you have chosen a product to subscribe to, select Continue to Subscribe.

Once you accept the seller’s end user license agreement (EULA), select Create Contract and Continue to Configuration.

You can configure the software deployment using the dropdowns. Once the Fulfillment option and Software Version are selected, choose Continue to Launch.

To deploy on Amazon EKS, you have the option to deploy the application on a new EKS cluster or to copy and paste commands into an existing cluster. You can also deploy into self-managed Kubernetes on EC2 by choosing the self-managed Kubernetes option in the supported services.

To deploy on premises or in EC2, you can select EKS Anywhere and then take an additional step to request a license token on the AWS Marketplace launch page. You will then use commands provided by AWS Marketplace to download the container images and Helm charts from the AWS Marketplace Elastic Container Registry (ECR), create the service account, and apply the token for IAM Roles for Service Accounts on your cluster.
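
The exact commands are generated for each product on the launch page, but the general shape is an OCI registry login followed by a chart pull; every value in angle brackets below is a placeholder that AWS Marketplace fills in for you:

$ export HELM_EXPERIMENTAL_OCI=1 # needed for OCI registry support in Helm versions before 3.8
$ aws ecr get-login-password --region us-east-1 | \
  helm registry login --username AWS --password-stdin <marketplace-ecr-registry>
$ helm pull oci://<marketplace-ecr-registry>/<seller>/<chart> --version <chart-version>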

To upgrade or renew your existing software licenses, you can go to the AWS Marketplace website for a self-service upgrade or renewal experience. You can also negotiate a private offer directly with ISVs to upgrade and renew the application. After you subscribe to the new offer, the license is automatically updated in AWS License Manager. You can view all the licenses you have purchased from AWS Marketplace using AWS License Manager, including the application capabilities you’re entitled to and the expiration date.

Launch Partners of AWS Marketplace for Containers Anywhere
Here is the list of launch partners that support an on-premises deployment option. Try them out today!

  • D2iQ delivers the leading independent platform for enterprise-grade Kubernetes implementations at scale and across environments, including cloud, hybrid, edge, and air-gapped.
  • HAProxy Technologies offers widely used software load balancers to deliver websites and applications with the utmost performance, observability, and security at any scale and in any environment.
  • Isovalent builds open-source software and enterprise solutions such as Cilium and eBPF solving networking, security, and observability needs for modern cloud-native infrastructure.
  • JFrog‘s “liquid software” mission is to power the world’s software updates through the seamless, secure flow of binaries from developers to the edge.
  • Kasten by Veeam provides Kasten K10, a data management platform purpose-built for Kubernetes, an easy-to-use, scalable, and secure system for backup and recovery, disaster recovery, and application mobility.
  • Nirmata, the creator of Kyverno, provides open source and enterprise solutions for policy-based security and automation of production Kubernetes workloads and clusters.
  • Palo Alto Networks, the global cybersecurity leader, is shaping the cloud-centric future with technology that is transforming the way people and organizations operate.
  • Prosimo‘s SaaS combines cloud networking, performance, security, AI powered observability and cost management to reduce enterprise cloud deployment complexity and risk.
  • Solodev is an enterprise CMS and digital ecosystem for building custom cloud apps, from content to crypto. Get access to DevOps, training, and 24/7 support—powered by AWS.
  • Trilio, a leader in cloud-native data protection for Kubernetes, OpenStack, and Red Hat Virtualization environments, offers solutions for backup and recovery, migration, and application mobility.

If you are interested in offering your Kubernetes application on AWS Marketplace, register and modify your product to integrate with AWS License Manager APIs using the provided AWS SDK. Integrating with AWS License Manager will allow the application to check licenses procured through AWS Marketplace.
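
As a rough, hypothetical sketch, a license check from inside your application maps to the License Manager CheckoutLicense operation; the SKU, fingerprint, and entitlement name below are made-up placeholder values, not a specific product's integration:

# Hypothetical license check (all values are placeholders)
$ aws license-manager checkout-license \
  --product-sku "example-product-sku" \
  --checkout-type PROVISIONAL \
  --key-fingerprint "example-issuer-fingerprint" \
  --entitlements "Name=ExampleEntitlement,Unit=Count,Value=1" \
  --client-token "$(uuidgen)"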

Next, you would create a new container product on AWS Marketplace with a contract offer by submitting details of the listing, including the product information, license options, and pricing. The details would be reviewed, approved, and published by AWS Marketplace Technical Account Managers. You would then submit the new container image to AWS Marketplace ECR and add it to a newly created container product through the self-service Marketplace Management Portal. All container images are scanned for Common Vulnerabilities and Exposures (CVEs).

Finally, the product listing and container images would be published and accessible by customers on AWS Marketplace’s customer website. To learn more details about creating container products on AWS Marketplace, visit Getting started as a seller and Container-based products in the AWS documentation.

Available Now
AWS Marketplace for Containers Anywhere is available now in all Regions that support AWS Marketplace. You can start using it directly from the products of our launch partners.

Give it a try, and please send us feedback either in the AWS forum for AWS Marketplace or through your usual AWS support contacts.

Channy

Announcing AWS Data Exchange for APIs: Find, Subscribe to, and Use Third-party APIs with Consistent Authentication

Post Syndicated from Alex Casalboni original https://aws.amazon.com/blogs/aws/data-exchange-for-apis-find-subscribe-use-third-party-apis-consistent-authentication/

Data is at the center of many processes and products, whether it’s a large-scale dataset used to train machine learning models, a relational database, or an API-based integration. AWS Data Exchange lets you discover, subscribe to, and use hundreds of file-based datasets via Amazon Simple Storage Service (Amazon S3) offered by third parties such as Reuters, Foursquare, Change Healthcare, Vortexa, IMDb, and many more. Additionally, AWS Data Exchange for Amazon Redshift makes it even easier to ingest third-party data in your Amazon Redshift data warehouse, without any manual processing or transformation.

However, in many cases your data projects require more than static datasets because you need frequent and synchronous retrieval of small amounts of information – for example, you might need to fetch a stock price every hour. Data APIs let you answer specific questions quickly and without having to build ad-hoc data pipelines to ingest, process, and analyze bulk datasets. But each API provider has its own ease of use, SDK, documentation, and authentication mechanisms, which makes this harder than it needs to be.

Today, I’m happy to announce the general availability of AWS Data Exchange for APIs, a new capability that lets you find, subscribe to, and use third-party APIs with consistent access via AWS SDKs, as well as consistent AWS-native authentication and governance. This simplifies life for developers and IT administrators who have to integrate and secure access to multiple third-party APIs.

Now you can make RESTful or GraphQL API calls directly to AWS Data Exchange and receive synchronous responses that contain the information you need, using the AWS SDK in the programming language of your choice. We take care of integrating with the API provider, implementing proper authentication, managing the API subscription, and ensuring charges appear on your AWS bill. You can manage API access centrally with AWS Identity and Access Management (IAM).

As a data provider, you make your API discoverable by millions of AWS customers by listing it in the AWS Data Exchange catalog using an OpenAPI specification and fronting it with an Amazon API Gateway endpoint.

AWS Data Exchange for APIs in Action
First, I look for an API product in the AWS Data Exchange catalog, review its subscription terms, support information, and auto-renewal. Each API product might include multiple public or private subscription offers and periods.

I select Subscribe and a couple of minutes later I’m successfully subscribed.

Within the API product, I select an entitled data set and its latest revision.

Each API revision contains one or more API assets that correspond to a specific API endpoint and a unique Asset ARN.

AWS Data Exchange takes care of invoking API endpoints with the correct authentication.

All I need to do is check the Integration notes, which include instructions and code snippets based on the AWS Command Line Interface (CLI).
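
For reference, the CLI call has roughly this shape; it maps to the SendApiAsset operation, the IDs are the ones from my subscription, and the path and query string depend entirely on the provider's API:

$ aws dataexchange send-api-asset \
  --data-set-id 4b3fbabc31171662851531b8576a3411 \
  --revision-id e8e78e921af12c76499edc40f92e3082 \
  --asset-id 557d858c317efdfb5b6c9a2860ec4a03 \
  --method GET \
  --path "/"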

Of course, I could implement the very same API call with my favorite programming language using one of the AWS SDKs.

For example, here’s how I’d implement a simple wrapper function in Python:

import json
import boto3

# boto3 signs the request with my IAM credentials; AWS Data Exchange then
# forwards it to the provider's API with the correct provider-side authentication
adx = boto3.client('dataexchange')

def get_api_response(path, method="GET", querystring=None, headers=None, body=None):
    # DataSetId, RevisionId, and AssetId identify the entitled API asset
    return adx.send_api_asset(
        DataSetId="4b3fbabc31171662851531b8576a3411",
        RevisionId="e8e78e921af12c76499edc40f92e3082",
        AssetId="557d858c317efdfb5b6c9a2860ec4a03",
        Method=method,
        Path=path,
        QueryStringParameters=querystring or {},
        RequestHeaders=headers or {},
        Body=json.dumps(body or {}),
    )

Please note that there are no hard-coded credentials in the code above because all the authorization happens via AWS Identity and Access Management (IAM).
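
For example, a minimal identity-based policy that allows calling subscribed APIs could look like the following sketch; in practice you would scope Resource down to specific data set ARNs instead of using a wildcard:

$ cat <<'EOF' > adx-send-api-asset.json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "dataexchange:SendApiAsset",
      "Resource": "*"
    }
  ]
}
EOF
$ aws iam put-user-policy --user-name example-user \
  --policy-name adx-send-api-asset \
  --policy-document file://adx-send-api-asset.json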

And that’s how you make your first API call via AWS Data Exchange for APIs.

Available Today
AWS Data Exchange for APIs is generally available in all AWS Regions where AWS Data Exchange is available. We’re looking forward to helping you simplify and centralize the management and governance of third-party APIs while we take care of the undifferentiated heavy lifting for you.

Today you can start integrating third-party APIs such as Infutor, Variety Business Intelligence, IMDb, PeopleDataLabs, Neustar, Experian, Foursquare, PredictHQ, WeatherTrends International, and many more.

If you’re a developer, check out the new AWS Data Exchange for APIs documentation to learn more about subscribing and using APIs. If you’re an API provider, check out the new publishing documentation to learn more about publishing new APIs on the AWS Data Exchange catalog.

Alex

AWS Security Profiles: Jenny Brinkley, Director, AWS Security

Post Syndicated from Maddie Bacon original https://aws.amazon.com/blogs/security/aws-security-profiles-jenny-brinkley-director-aws-security/

In the week leading up to AWS re:Invent 2021, we’ll share conversations we’ve had with people at AWS who will be presenting, and get a sneak peek at their work.


How long have you been at AWS, and what do you do in your current role?

I’ve been at AWS for 5½ years. I get to focus on the future of security and compliance. It gives me a lot of space to experiment and try new things, which is how I like to operate.

How did you get started in AWS Security?

I joined AWS through a startup acquisition, and I actually didn’t think I was going to go with the acquisition. I thought AWS would be way too big and move way too slow. I love being in environments where I get to move fast and be entrepreneurial. I started on the product side. I was able to learn what it takes to build and ship products at the scale of AWS – which is on another level and mind-blowing.

Then, like others at AWS, I was able to reinvent myself, find different passions, and experiment with new things. One of those areas for me was compliance. I started to get perspective on how that space was being defined by regulatory activity for the cloud, and it started opening my mind in different ways.

I started thinking, how do you make compliance easier for customers? How do you work with regulated entities to understand how to audit, and to understand the function of how the cloud operates? From there, my career has been about changing how to think about product, about how to make security easier. Layering in this compliance aspect, too, means I get to play in all these different worlds, work with internal and external customers, and work to simplify security, while also understanding where and how compliance fits in, without slowing down innovation.

How do you explain your job to non-tech friends?

I explain my work as removing the fear around security. You go see images of people in hoodies, with darkened faces, and binary code running behind them, and my job is to break that perception and walk in the light – yes, that’s my nod to Olivia Pope in Scandal. I love the idea of that gladiator mentality. You’re going in and solving the big problems, but you’re also creating more visibility and transparency around how security operates. And you’re doing this without making anyone afraid that they’re being watched or monitored, and without holding back innovation. My job is to provide that transparency and clarity, and give people prescriptive guidance on how to operate securely on AWS.

What are you currently working on that you’re excited about?

So much! That’s what I really love about my job – I get to play in a lot of spaces, and the context switching is something that really fuels me. One of the top projects I’m working on is something we just released in response to an ask from the White House, which I feel really privileged to work on. We released a new Cybersecurity Awareness training which is now available to everyone in the world. You can access this training right now, and you can share it with your grandparents or implement it in your corporation or small business. We were able to take a training product we built for all Amazon employees–and then externalize it. The size and the scope is something I’m really excited about. Making security easier for everybody is a big mission for us.

Another big area is up-skilling. You hear a lot about security jobs being the future, so we’re building everything from apprenticeships to new learning paths for anyone interested in security. We’re thinking about how we can build quick learning modules for people to listen to on the go. That’s something I get really excited about in this job – creating opportunities for people to understand that security jobs and opportunities are vast. If you’re curious and want to learn new things, AWS is endless.

You’re presenting at re:Invent this year – can you give readers a sneak peek at what you’re covering?

I am partnering with Eric Brandwine, AWS VP/Distinguished Engineer, for a session called Introverts and extroverts collide: Build an inclusive workforce (SEC204). Eric and I are night and day in terms of how we work. In our talk, we’ll touch on some of the challenges we had when we first started working together, and how we found value in our different approaches.

We’ll be discussing how he solves problems with technology and how I solve problems regarding people, and thinking about how that empathetic layer resonates between the two perspectives. Not every problem needs technology, and not every problem needs a people-focused solution. But, humans are behind any of those aspects of impact.

We’ll give prescriptive guidance to customers on how they should think about their security culture as it relates to people and as it relates to technology. We’ll talk about how those two worlds can blend together in a way that empowers an entire organization to prioritize security, and that they shouldn’t be afraid of it. We want to help bridge the gaps between the technologists and the empathetic individuals who think about how the technology lands in use cases across a business.

From your perspective, what’s the most important thing leaders can do to create an inclusive work environment?

Listening. Sitting back, getting the feedback, being vulnerable, asking the questions. So much of what we need to do now is practice that listening skill, really understand the motivations of our teams, and then try to create these safe working environments where people feel comfortable sharing their perspectives. It’s not that you’re going to act on everything everyone’s talking about, but at least you get diverse perspectives and points of view to help create an inclusive work environment that makes everyone want to show up, support each other, and do the best work possible.

What’s your favorite Leadership Principle at Amazon and why?

I have two. One is Learn and Be Curious because that is how I like to operate. I think, “what if…” or “why can’t we…”. Then Think Big pairs with “why can’t we…” The culture within AWS really supports that. On a daily basis, we can flip the script on how we think about our jobs and how we position the business.

If you’re entrepreneurial and like to create, this place is like a magic playground. Some people look at my job and they’re so confused with all the different things I get to do – but it goes back to that context switching. I believe that Learn and Be Curious and Think Big fit in that realm for me–I feel like I can be anything, I can do anything. I also had parents who told me as a kid that I could do anything and be anything, so I think that’s just who I am. Those two leadership principles help me to produce and do my best work.

What’s the thing you’re most proud of in your career?

That’s hard. It’s a couple of things. I’ve had a lot of incredible opportunities. One of which was being involved in a startup. We raised the money quickly, we worked with incredible customers, we solved really challenging business issues. The fact that I was able to bring that here to AWS, in a way that now hundreds of thousands of people get to see the kind of work we’re able to produce, is pretty cool.

But honestly, working with some of our new hires who are just getting into the workforce–especially with our diverse candidates–I’m at a place in my career where I want to create opportunities for others. I’m working to create safe spaces for people to operate and do their best work and really break down barriers for people who might not otherwise get those opportunities. That’s what I’m most excited about for the future, and also the most proud about–giving people opportunities to work in careers they never thought were available to them. I love that, and I get to do it daily.

If you had to pick any other job, what would you want to do?

Sports agent. I think I’d be so good at it. I would love to go work with young athletes, especially with the new NCAA ruling that college athletes can get paid for the use of their likeness. I would love to help them develop really interesting business plans.

If you have feedback about this post, submit comments in the Comments section below.

Want more AWS Security news? Follow us on Twitter.


Jenny Brinkley

Jenny leads efforts for AWS Security to understand where compliance and security is headed. In her role as a Director, she helps teams understand how to consider security when building their services and deliverables. Prior to joining AWS, Jenny co-founded a security start-up, harvest.ai, that was acquired by AWS in April 2016.

Author

Maddie Bacon

Maddie (she/her) is a technical writer for AWS Security with a passion for creating meaningful content. She previously worked as a security reporter and editor at TechTarget and has a BA in Mathematics. In her spare time, she enjoys reading, traveling, and all things Harry Potter.