Tag Archives: Amazon Transcribe

Analyze media content using AWS AI services

Post Syndicated from Jack Bradham original https://aws.amazon.com/blogs/architecture/analyze-media-content-using-aws-ai-services/

Organizations managing large audio and video archives face significant challenges in extracting value from their media content. Consider a radio network with thousands of broadcast hours across multiple stations and the challenges they face to efficiently verify ad placements, identify interview segments, and analyze programming patterns.

In this post, we demonstrate how you can automatically transform unstructured media files into searchable, analyzable content. By combining Amazon Transcribe, Amazon Bedrock, Amazon QuickSight, and Amazon Q, organizations can achieve the following:

  • Process and transcribe media files upon upload
  • Identify commercials, interviews, and program segments
  • Extract insights using foundation models (FMs)
  • Create a searchable knowledge base
  • Generate rich visualizations for decision-making
  • Enable natural language queries across their media archive
  • Visualize complex information with intuitive graphics

In the following sections, we explore how these AWS services work together to help organizations unlock the full potential of their media content, whether for advertising compliance, content analysis, or discovering specific segments within thousands of hours of recordings.

Solution overview

This solution provides an event-driven media analysis pipeline that transforms how you manage and extract value from your content:

  • Streamline content management – Automatically process media files the moment they’re uploaded, saving time and reducing manual work
  • Unlock deeper insights – Generate accurate transcriptions that capture not just words, but the full context of your content—including speakers, timing, and key moments
  • Harness AI – Automatically extract meaningful insights and uncover hidden patterns in your media without extensive manual review
  • Build a searchable knowledge base – Turn scattered media files into a discoverable catalog that your entire team can use
  • Build a customizable interface – Create a customizable UI to search the catalog
  • Create powerful visualizations – Bring your insights to life with intuitive visualizations that make complex information immediately understandable

The following diagram illustrates our architecture.

Media Analysis Architecture

This event-driven architecture automatically processes and analyzes multimedia content using AWS services. The workflow consists of the following steps:

  1. A user uploads media files to an Amazon Simple Storage Service (Amazon S3) bucket. A “New Media” event triggers the first AWS Step Functions workflow. This workflow handles the initial cataloging based on values in the file name and launches the transcription process.
  2. Amazon Transcribe converts the audio into accurate, readable text. The transcribed content is securely saved to an S3 bucket for further analysis. A “Transcription Complete” event triggers the next step.
  3. A second Step Functions workflow processes the transcription. Using predefined prompts, Amazon Bedrock analyzes the transcripts to extract meaningful information. Key insights extracted from the transcript are stored in an S3 data lake.
  4. The processed results are organized systematically, structured by date (year/month/day) and tagged with relevant attributes. This organized data enables natural language queries through Amazon Q when used as a knowledge base, interactive visualizations using QuickSight, and straightforward content discovery and analysis.
  5. Amazon Athena serves as the data exploration tool to query the data lake. Athena is used as the data source in QuickSight, which turns complex data into clear, compelling visuals.

This architecture automatically transforms raw media content into searchable, analyzable data while maintaining an organized hierarchy for efficient access and analysis. The event-driven design provides automatic processing of new content as it arrives, and the combination of AWS AI services enables deep content understanding and insight extraction. Each AWS service plays a crucial role in transforming your media content:

  • Amazon Bedrock – Reviews content after transcription for insights and entity extraction:
    • Uses advanced FMs to analyze transcripts
    • Identifies commercials, interviews, and program segments
    • Extracts meaningful insights from content
  • Amazon EventBridge – Triggers actions in the cataloging workflow:
    • Monitors for new media files and completed transcriptions
    • Automatically triggers Step Functions workflows
  • AWS Lambda – Handles custom code actions needed in the workflow:
    • Facilitates interaction with Amazon Bedrock
    • Executes custom prompts on transcripts
    • Enables flexible, scalable processing
  • Amazon Q – Serves as the frontend and Retrieval Augmented Generation (RAG) engine:
    • Addresses enterprise generative AI needs by providing a turnkey solution with built-in security features like single sign-on (SSO) integration and responsible AI governance policies
    • Allows businesses to quickly deploy AI assistance while maintaining compliance, data privacy, and security standards
    • Enables natural language queries across the media archive
    • Links results to the source media files
    • Provides conversational access to content
  • Amazon QuickSight – Turns insights in beautiful visualization for better consumption:
    • Creates interactive dashboards and visualizations
    • Displays comprehensive media analytics
    • Helps track advertising, programming, and content patterns
  • Amazon S3 – Stores assets and the catalog:
    • Securely stores raw media files, transcripts, and processed data
    • Automatically triggers processing when new content is uploaded
  • AWS Step Functions – Orchestrates the entire content processing workflow:
    • Manages transcription and AI analysis steps
    • Provides robust error handling and automatic retries
  • Amazon Transcribe – Converts speech to accurate, readable text:
    • Identifies speakers and timestamps
    • Provides accurate transcriptions of audio content

Security considerations

Although this post focuses on the technical implementation of media content analysis, it’s important to acknowledge that production deployments should include comprehensive security measures:

  • Data storage security (Amazon S3):
    • Enable server-side encryption using AWS Key Management Service (AWS KMS) keys
    • Apply bucket policies restricting access to authorized principals only
    • Enable Amazon S3 Block Public Access at account and bucket levels
    • Enable versioning for data recovery
    • Implement lifecycle policies for data retention
    • Enable S3 access logging
    • Use presigned URLs for temporary access
  • Identity and Access Management (IAM):
    • Create dedicated service roles with minimum required permissions for:
      • Step Functions execution
      • Amazon Transcribe jobs
      • Amazon Bedrock API calls
      • Athena queries
    • Implement role-based access control
    • Regularly rotate credentials
    • Enable multi-factor authentication (MFA) for all users
    • Use AWS Organizations for multi-account management
  • Network security:
    • Deploy virtual private cloud (VPC) endpoints for:
      • Amazon S3
      • Athena
      • QuickSight
    • Implement network access control lists (ACLs) and security groups
    • Enable VPC Flow Logs
    • Use AWS PrivateLink where applicable
    • Configure route tables to control traffic flow
  • Data encryption:
    • Implement AWS KMS encryption for S3 objects
    • Use TLS 1.2+ for all API communications
    • Enable automatic key rotation in AWS KMS
    • Implement envelope encryption for sensitive data
  • Monitoring and detection:
    • Enable AWS CloudTrail for API activity logging
    • Configure Amazon GuardDuty for threat detection
    • Set up Amazon CloudWatch:
      • Metrics for service health
      • Alarms for security events
      • Log groups for application logs
    • Enable S3 server access logging
    • Configure VPC Flow Logs
  • Access controls:
    • Implement fine-grained access controls for:
      • Amazon Bedrock model access
      • Athena query permissions
      • QuickSight dashboard sharing
    • Conduct regular access reviews

Additionally, compliance requirements and data governance policies might impact how you implement this solution in your environment.

These security considerations are crucial but beyond the scope of this post. We recommend consulting AWS security best practices and working with your security team to implement appropriate measures for your specific use case. For more information on AWS security best practices, refer to Best Practices for Security, Identity, & Compliance.

The following sections walk you through setting up each component of the architecture to help you transform raw media into actionable insights.

Prerequisites

The following are the prerequisites to follow along this post:

Create S3 buckets

For this solution, we create three distinct buckets to support the media analytics workflow:

  • Raw media bucket for incoming files
  • Transcription outputs bucket
  • Processed insights bucket

For instructions on creating buckets, refer to Creating a general purpose bucket.

Configure EventBridge

You can enable event notifications on the raw media bucket to trigger your automated workflow through EventBridge. Establish your automation backbone by monitoring S3 bucket activities. When new media arrives or transcription completes, EventBridge will trigger the appropriate workflow, providing continuous processing. For further instructions, refer to Creating rules that react to events in Amazon EventBridge.

The following are two example triggers that can be used to filter events and trigger Step Functions workflows. The following is an example filter for new media files:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["rawinputbucket"]
    }
  }
}

The following is an example filter for new transcripts added to the data lake:

{
  "source": ["aws.s3"],
  "detail-type": ["Object Created"],
  "detail": {
    "bucket": {
      "name": ["business-data-lake"]
    },
    "object": {
      "key": [{
        "suffix": ".transcription"
      }]
    }
  }
}

Create Step Functions workflows

We design the orchestration layer with two key workflows. The first handles media intake and transcription, and the second manages AI analysis. Each workflow includes safeguards for potential failures and retry mechanisms. For further instructions, refer to Learn how to get started with Step Functions.

The following diagram shows an example of processing new media uploads for indexing and transcription.

Media Analysis - Step Function Example

The following diagram shows an example of the Step Functions workflow that analyzes the transcription.

Set up Amazon Transcribe

To create an Amazon Transcribe job, you need permissions to do so. You can implement a speech-to-text conversion with powerful features like language detection, speaker identification, and custom vocabulary support to provide accurate transcription of your media content. For further instructions, refer to How Amazon Transcribe works.

Configure Amazon Bedrock

You can power your AI analysis engine by setting up precise prompts that extract meaningful insights. Amazon Bedrock processes transcripts to identify key segments, speakers, and topics, transforming raw text into structured data. For instructions, refer to Design a prompt.

The following is a sample prompt:

You will be reviewing a radio transcription to identify advertisements and extract relevant details. Your task is to analyze the provided transcript and output the results in a specific JSON format based on a given schema.
Please follow these steps to complete the task:
1. Carefully read through the entire transcript.
2. Identify all advertisements within the transcript. Look for clear indicators such as product mentions, promotional language, or transitions from regular content to commercial content.
3. For each advertisement you identify, determine the following information:
    - Company: The name of the company being advertised
    - Start time: The timestamp in the transcript where the ad begins
    - End time: The timestamp in the transcript where the ad ends
    - Product: The specific product or service being advertised
4. Format your findings into a JSON object that follows the provided schema. Each advertisement should be a separate object within an array.
5. Ensure these fields in your response are provided for each advertisement.. All are required fields: company, starttime, endtime, product. 
6. Use precise timestamps for start and end times. If exact times are not available, make a best estimate based on the transcript's context.
7. If a particular field is unclear or not explicitly mentioned in the transcript, you may use "Unknown" as the value.
8. Only respond with json and nothing else. Do not provide comments or explain your answer. 
9. Surround the JSON response with standard ```json markers
Here's an example of how your output should be formatted:
{
"advertisements": [
        {
            "company": "TechGadgets Inc.",
            "starttime": "00:05:30",
            "endtime": "00:06:15",
            "product": "SmartHome Hub"
        },
        {
            "company": "FreshFoods Market",
            "starttime": "00:15:45",
            "endtime": "00:16:30",
            "product": "Organic Produce Delivery Service"
        }
    ]
}
Do not add any fields that are not specified in the schema, and ensure all required fields are present for each advertisement.

Create a structured data lake

We create a hierarchical data organization strategy that enables efficient access and analysis. You can use AWS Glue crawlers to automatically discover and catalog your media metadata. For instructions, refer to Using crawlers to populate the Data Catalog. Configure Athena tables to enable SQL-based querying of your media insights:

CREATE OR REPLACE VIEW "commercials_view" AS 
SELECT
  metadata.market market
, metadata.station_call station_call
, metadata.format_type format_type
, CAST(metadata.timestamp AS timestamp) timestamp
, ads.company adCompany
, ads.product adProduct
, ads.starttime
, ads.endtime
FROM
  (commercials
CROSS JOIN UNNEST(advertisements) t (ads))

Set Up Amazon Q

You can enable natural language interaction with your media archive using Amazon Q Business. Configure the knowledge base and metadata to make your content searchable and accessible through conversational queries. Use the processed insights S3 buckets to configure the knowledge base. For instructions, refer to Getting started with Amazon Q Business.

The following screenshot shows example conversations with an AI assistant.

Build QuickSight dashboards

With QuickSight, you can create visual analytics that bring your data to life. Connect to Athena views to display advertising patterns, content analysis, and performance metrics in interactive dashboards. For more information, refer to Tutorial: Create an Amazon QuickSight dashboard.

The following screenshots are a few examples of dashboards created for a fictitious radio station as part of our use case.

Validate and optimize your media analytics solution

After you implement your media analytics architecture, follow these critical steps to achieve robust performance and alignment with your organization’s needs. First, configure a comprehensive testing approach. Imagine you’re preparing to launch your media analysis solution. Your testing journey begins with accuracy validation:

  • Compare transcription outputs against original media
  • Verify AI-generated insights for precision
  • Use representative sample sets from your content library

You start by taking a recently processed radio show and comparing its transcription against the original broadcast. Your team meticulously reviews the AI-generated insights, checking if key moments like ad transitions or interview segments are correctly identified. To make sure your system works across all content types, you select diverse samples from your library—perhaps a morning talk show, an evening news segment, and a weekend sports broadcast. Next, you delve into performance benchmarking:

  • Measure processing time for different media types
  • Evaluate resource utilization across AWS services
  • Identify potential bottlenecks in the workflow

Time how long it takes to process different types of media files, from short commercial segments to lengthy program recordings. As you watch how your AWS services respond under various loads, you can monitor resource consumption patterns. This helps you identify processing bottlenecks—for instance, you might discover that certain file types take longer to transcribe or that concurrent processing needs optimization. Finally, you put yourself in your users’ shoes for a thorough user experience assessment:

  • Test natural language queries with Amazon Q
  • Validate search result relevance
  • Gather feedback from potential end-users

Team members can interact with Amazon Q, asking questions they would naturally pose when searching for specific content. For example, you can test whether searching for “interviews about climate change last week” returns relevant results. Gathering feedback from potential users—perhaps a content manager with different needs than a compliance officer—provides invaluable insights. Their real-world experiences guide your refinements and make sure the system serves its intended purpose. This comprehensive testing approach, combining structured evaluation with real-world scenarios, sets the stage for a robust and user-friendly media analysis solution. As your media analysis solution moves from initial deployment to production, optimizing its performance becomes crucial for both cost-efficiency and user satisfaction. A radio network processing thousands of hours of content weekly might find that even small improvements in transcription accuracy or processing speed can lead to significant cost savings and better content discoverability. Similarly, a marketing team analyzing ad placements across multiple stations needs precise insights to make data-driven decisions about advertising effectiveness. With these business imperatives in mind, consider the following configuration optimization strategies:

  • Transcription refinement:
    • Adjust language models for domain-specific terminology
    • Fine-tune speaker identification settings
    • Implement custom vocabularies for improved accuracy
  • AI insight generation:
    • Refine prompts for more targeted analysis
    • Experiment with different AI models
    • Align extraction parameters with business objectives
  • Scalability considerations:
    • Test workflow performance with increasing media volumes
    • Implement appropriate auto scaling configurations
    • Monitor cost-effectiveness of your architecture
  • Continuous improvement:
    • Establish regular review cycles
    • Track key performance metrics
    • Iterate on your solution based on real-world usage

We recommend starting with a pilot implementation and gradually expanding your media analytics capabilities.

Clean up

To avoid incurring ongoing charges, clean up the resources you created as part of this solution:

  1. Delete QuickSight resources:
    1. Delete dashboards created for media analytics.
    2. Delete the datasets connected to Athena.
    3. If no longer needed, delete the QuickSight Enterprise subscription.
  2. Delete S3 buckets:
    1. Empty and delete the raw media bucket, transcription outputs bucket, and processed insights bucket.
  3. Remove EventBridge rules:
    1. Delete the rules created for monitoring S3 bucket activities.
    2. Remove targets associated with these rules.
  4. Delete Step Functions workflows:
    1. Delete the media intake and transcription workflow.
    2. Delete the AI analysis workflow.
  5. Remove Lambda functions:
    1. Delete Lambda functions created for interaction with Amazon Bedrock.
    2. Remove associated IAM roles and policies.
  6. Clean up data lake components:
    1. Delete Athena views and tables.
    2. Remove AWS Glue crawlers and databases.
    3. Delete stored query results.
  7. Remove Amazon Q configurations:
    1. Delete knowledge bases created.
    2. Remove custom configurations.
  8. Remove Amazon Bedrock settings:
    1. Remove custom prompts.
    2. Disable access to FMs if no longer needed.
  9. Delete Amazon Transcribe settings:
    1. Remove custom vocabularies.
    2. Delete stored transcription jobs.
  10. Remove IAM resources:
    1. Delete custom IAM roles created for this solution.
    2. Remove associated IAM policies.
  11. Complete additional cleanup:
    1. Delete CloudWatch Logs groups associated with Lambda functions.
    2. Remove CloudWatch alarms or metrics created for monitoring.
    3. Delete saved queries in Athena.

Common use cases

Organizations in different sectors can use this architecture to unlock value from their audio and video content. You can adapt this solution to meet your specific needs, such as managing broadcast media, corporate communications, educational materials, and more. Let’s explore how different industries might apply this technology:

  • Media and broadcasting:
    • Track advertising compliance
    • Verify media placement accuracy
    • Analyze broadcast content at scale
  • Corporate and enterprise:
    • Convert meeting recordings into searchable knowledge bases
    • Identify key decisions and action items
    • Enhance organizational knowledge management
  • Education and training:
    • Create comprehensive, topic-based course catalogs
    • Index training materials for quick retrieval
    • Support continuous learning initiatives
  • Legal services:
    • Generate precise, timestamped transcripts
    • Develop searchable legal proceeding archives
    • Improve document review efficiency
  • Healthcare:
    • Extract critical medical insights from consultations
    • Categorize patient interaction data
    • Support clinical documentation processes
  • Government and public sector:
    • Build comprehensive public meeting archives
    • Implement automated topic categorization
    • Enhance transparency and accessibility
  • Customer service:
    • Analyze call recordings for quality improvement
    • Identify service trends and customer pain points
    • Drive continuous customer experience enhancement

This media analytics architecture demonstrates notable versatility. By using AI, organizations can transform raw audio and video content into structured, meaningful insights that drive decision-making across industries.

Conclusion

In this post, we demonstrated how to use AWS services to convert unstructured media content into actionable intelligence. By combining Amazon Transcribe, Amazon Bedrock, QuickSight, and Amazon Q, you can create a scalable, automated solution for media analysis that adapts to your organizational needs.

This solution offers the following key architectural advantages:

  • Automated media file processing at scale
  • AI-powered insight generation
  • Natural language search capabilities
  • Interactive decision-making visualizations
  • Flexible, maintainable infrastructure

Organizations can now convert content into searchable knowledge, extract insights automatically, develop data-driven content strategies, and enhance operational efficiency through automation.

As audio and video content generation continues to accelerate, the ability to efficiently process and extract value becomes increasingly critical. This architecture provides a robust foundation for current needs while remaining adaptable to future technological innovations.

We invite you to explore how this media analytics solution can address your organization’s unique challenges. Consider your specific use cases and unlock the insights waiting to be discovered in your media archives.


About the authors

The serverless attendee’s guide to AWS re:Invent 2024

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/the-serverless-attendees-guide-to-aws-reinvent-2024/

AWS re:Invent 2024 offers an extensive selection of serverless and application integration content.

AWS re:Invent Banner

AWS re:Invent Banner

For detailed descriptions and schedule, visit the AWS re:Invent Session Catalog.

Join AWS serverless experts and community members at the AWS Modern Apps and Open Source Zone in the AWS Expo Village. This serves as a hub for serverless discussions at re:Invent. While you are there, enjoy a free coffee and learn about serverless architectures at the Serverlesspresso booth. There are two this year, another one at the Certificate Lounge. The AWS Expo Village also includes Serverless and Serverless Containers booths.

Don’t have a ticket yet? Join us in Las Vegas from November 28-December 2, 2022 by registering for re:Invent 2024.

This guide organizes the sessions into categories to help you find the content this is most relevant to you.

Session Types

  • Breakout Sessions are lecture-style presentations covering architecture, best practices, and deep dives into AWS services.
  • Workshops are 2-hour hands-on sessions where you work through tasks in AWS accounts using AWS services. Laptops are required and AWS credits are provided.
  • Chalk Talks are highly interactive 60-minute sessions with smaller audiences, focused on technical deep dives with whiteboards for architectural discussions.
  • Builders’ Sessions are 60-minute small-group sessions led by an AWS expert who guides you through a technical problem using AWS services.
  • Code Talks are 60-minute live coding sessions where AWS experts show how to build solutions using AWS services.

Leadership session: Nick Coult, Usman Khalid, Kathleen deValk

  • SVS211: Celebrating 10 years of pioneering serverless and containers – Breakout.
    • Explore how serverless has evolved to help organizations drive the highest performance, availability, and security at low costs.

Getting started sessions

Are you new to serverless or taking your first steps? Hear from AWS experts and customers on best practices and strategies for building serverless workloads. Get hands-on with services by attending a workshop or builders session. Create the next great “to do” app or add a new customer experience for a theme park.

  • SVS202: Thinking serverless – Chalk Talk
    • Learn how to approach building solutions with a serverless mindset by breaking down business problems into serverless building blocks.
  • SVS205: Building a serverless web application for a theme park – Workshop
    • Learn how to build a complete serverless web application for a theme park called Innovator Island.
  • SVS201: Getting started with serverless patterns – Workshop
    • Learn how to recognize and apply common serverless patterns by building production-ready code for a serverless application.
  • SVS204: Write less code: Building applications with a serverless mindset – Builders Session
    • Get more value by using built-in integrations between AWS services through configuration rather than writing glue code.
  • SVS207: Effectively model costs for your serverless applications – Chalk Talk
    • Gain insights into modeling the cost of serverless applications on AWS by considering request loads, payload sizes, and service pricing.
  • API201: The AWS Step Functions workshop – Workshop
    • Learn about the features of AWS Step Functions through hands-on interactive modules.
  • API204: Building event-driven architectures – Workshop
    • Learn about the basics of event-driven design using examples involving Amazon SNS, Amazon SQS, AWS Lambda, Amazon EventBridge, and more.
  • API205: Unlock the power of an exceptional serverless developer experience – Code Talk
    • Learn how to accelerate your serverless development with AWS tools, including Amazon Q Developer integrated into IDEs.
  • SEG209: Getting started building serverless SaaS architectures
    • Discover how to build your first serverless application, and learn how to handle multi-tenant architectures for SaaS applications.

Understanding serverless architectures

  • SVS208: Balance consistency and developer freedom with platform engineering – Breakout
    • Learn how platform teams can provide opinionated security, cost, observability, reliability, and sustainability patterns while maintaining developer flexibility.
  • SVS209: Containers or serverless functions: A path for cloud-native success – Breakout
    • Explore the fundamental differences between containers and serverless functions through real-world scenarios and insights into choosing the right approach.
  • OPN301: Level up your serverless applications with Powertools for AWS Lambda – Workshop
    • Learn why Powertools for AWS Lambda can be the developer toolkit of choice for serverless workloads.
  • DEV341: From single to multi-tenant: Scaling a mission-critical serverless app
    • Explore how to transition a mission-critical application from a single-tenant to a multi-tenant architecture
  • DEV337: Zero to production serverless in 8 weeks
    • Hear about a real-world project journey, from concept to production in only eight weeks. Expect practical insights, mistakes, tips, and how using the right technologies and development process can deliver results fast.

Building event-driven applications

  • API204: Building event-driven architectures – Workshop
    • Learn about the basics of event-driven design using examples involving Amazon SNS, Amazon SQS, AWS Lambda, Amazon EventBridge, and more.
  • API206: How event-driven architectures can go wrong and how to fix them – Chalk Talk
    • Explore common event-driven pitfalls including YOLO events, god events, observability soup, event loops, and surprise bills.
  • DEV321: Choosing the right serverless compute services
    • Learn when to use AWS serverless compute services like AWS Lambda and Amazon ECS on AWS Fargate and how to integrate them into your application architectures.
  • API307: Event-driven architectures at scale: Manage millions of events – Breakout
    • Discover proven patterns for building high-scale event-driven systems that can be effectively managed across a distributed organization with Amazon EventBridge.
  • SVS206: Building an event sourcing system using AWS serverless technologies – Chalk Talk
    • Explore strategies for building effective event sourcing architectures using AWS serverless technologies to store application state as an append-only event log.
  • COP408: Coding for serverless observability
    • Join this code talk to learn best practices for collecting signals from your serverless applications. Dive deep into techniques to effectively instrument your applications to provide you with optimal observability.

Incorporating orchestration

  • API201: The AWS Step Functions workshop – Workshop
    • Learn about the features of AWS Step Functions through hands-on interactive modules.
  • API203: Building common orchestrated workflows with AWS Step Functions – Builders Session
    • Build three orchestrated workflows, including streamlined data processing with Distributed Map state, external system integration using callback, and implementing the saga pattern.
  • API207: Optimize data processing with built-in AWS Step Functions features – Chalk Talk
    • Learn to optimize your serverless data processing workflows at scale using AWS Step Functions features, including intrinsic functions and Distributed Map state.
  • API402: Building advanced workflows with AWS Step Functions – Breakout
    • Learn how you can use generative AI to generate state machines automatically from textual descriptions and chat with your workflow to optimize it.

Understanding integration patterns

  • API208: Building an integration strategy for the future – Breakout
    • Boost productivity and create better customer experiences by building a modern integration strategy using AWS application, data, and file integration services.
  • API306: Integration patterns for distributed systems – Breakout
    • Learn about common design trade-offs for distributed systems and how to navigate them with design patterns, illustrated with real-world examples.
  • API311: Application integration for platform builders – Breakout
    • Explore the implementation of application integration using serverless components in enterprise environments.

Building APIs and frontends

  • SVS203: Create your first API from scratch with OpenAPI and Amazon API Gateway – Builders Session
    • Learn how to design and provision complete APIs using infrastructure as code following the OpenAPI specification.
  • API303: Building modern API architectures: Which front door should I use? – Chalk Talk
    • Explore options for building modern APIs including REST, GraphQL, and real-time APIs along with their benefits and drawbacks.
  • API304: Building rate-limited solutions on AWS – Chalk Talk
    • Learn some of the best ways to build rate limiting into your systems for improved reliability.
  • API305: Asynchronous frontends: Building seamless event-driven experiences – Breakout
    • Explore patterns to enable asynchronous, event-driven integrations with the frontend designed for architects and frontend, backend, and full-stack engineers.

Diving deep into advanced topics

  • SVS401: Best practices for serverless developers – Breakout
    • Discover architectural best practices, optimizations, and useful shortcuts for building production-ready serverless workloads.
  • SVS403: From serverful to serverless Java – Workshop
    • Learn how to bring your traditional Java Spring application to AWS Lambda with minimal effort and iteratively apply optimizations.
  • SVS406: Scale streaming workloads with AWS Lambda – Chalk Talk
    • Learn how to implement parallel processing techniques for ordered and unordered use cases to address throughput limitations in streaming data processing.

Processing data

  • SVS404: Building serverless distributed data processing workloads – Workshop
    • Learn how serverless technologies like AWS Step Functions and AWS Lambda can help you simplify management and scaling of distributed data processing.
  • API401: Multi-tenant Amazon SQS queues: Mitigating noisy neighbors – Chalk Talk
    • Explore advanced strategies for managing multi-tenant Amazon SQS queues and effective mitigation techniques, including shuffle sharding and overflow queues.
  • SVS321: AWS Lambda and Apache Kafka for real-time data processing applications – Breakout
    • Gain practical insights into building scalable, serverless data processing applications by integrating AWS Lambda with Apache Kafka.

Incorporating generative AI

  • API209: Generative AI at scale: Serverless workflows for enterprise-ready apps – Workshop
    • Learn to build enterprise-ready, scalable generative AI applications that can scale from serving 100 to 100,000 users.
  • API310: Build a meeting summarization solution with generative AI & serverless – Code Talk
    • See live coding of a serverless application for producing meeting summaries with generative AI using Amazon Transcribe and Amazon Bedrock, orchestrated with AWS Step Functions.
  • SVS319: Unlock the power of generative AI with AWS Serverless – Breakout
    • Learn to harness AWS Serverless to build robust, cost-effective generative AI applications. Explore using AWS Step Functions to orchestrate complex AI workflows.
  • SVS325: Secure access to enterprise generative AI with serverless AI gateway – Chalk Talk
    • Explore how to architect a serverless AI gateway on AWS to securely integrate and consume large language models from multiple providers.

Additional resources

For social activities see the Unofficial list of AWS re:Invent Conference and Vendor Parties.

If you are attending re:Invent, connect at our AWS Modern Apps and Open Source Zone in the AWS Expo Village. The AWS Expo Village also includes Serverless and Serverless Containers booths.

If you can not join us in-person, breakout sessions will be available via our YouTube channel after the event.

We look forward to seeing you at re:Invent 2024! For more serverless learning resources, visit Serverless Land.

AWS Weekly Roundup: AWS BuilderCards at re:Invent 2024, AWS Community Day, Amazon Bedrock, vector databases, and more (Nov 18, 2024)

Post Syndicated from Elizabeth Fuentes original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-aws-buildercards-at-reinvent-2024-aws-community-day-amazon-bedrock-vector-databases-and-more-nov-18-2024/

This week, we wrapped up the final 2024 Latin America Amazon Web Services (AWS) Community Days of the year in Brazil, with multiple parallel events taking place. In Goiânia, we had Marcelo Palladino, senior developer advocate, and Marcelo Paiva, AWS Community Builder, as keynote speakers. Florianópolis feature Ana Cunha, senior developer advocate, and in Santiago de Chile, I had the honor to share the stage with Rossana Suarez, AWS Container Hero, as keynote speakers. These events, organized by communities for communities, provide opportunities to network, learn something new, and immerse yourself in the community. In a community, everyone grows together, and no one is left behind.

AWS Lambda celebrates its 10th anniversary, the service that introduced me to AWS and remains my favorite. Born from customer needs, it revolutionized cloud computing by allowing code execution without server management. Since its inception, documented in this LinkedIn post by Dr. Werner Vogels, Chief Technology Officer at Amazon.com, through the original PR/FAQ document, the service has grown significantly, introducing features such as 1ms billing precision and support for 10GB memory. Thank you AWS Lambda, here’s to many more anniversaries.

Amazon invests $110 million to support AI research at universities using Trainium chips. The initiative provides computing resources using AWS Trainium chips, enabling researchers to develop new AI architectures and machine learning innovations that will be open-sourced for broader advancement. Check out the Linkedin post by Matt Garman, CEO at AWS.

Last week’s launches
AWS BuilderCards second edition at re:Invent 2024Jeff Barr announced the launch of the second edition of AWS BuilderCards at re:Invent 2024. It includes improvements to the design and game mechanics, plus a new add-on pack on generative AI. Over 15,000 sets have been distributed at previous events, with excellent user feedback. They’ll be available for online purchase after re:Invent.

Amazon EventBridge announces up to 94% improvement in end-to-end latency for Event BusesAmazon EventBridge has improved end-to-end latency for Event Buses by up to 94%, reducing average latency from 2235.23ms (measured in January 2023) to 129.33ms (measured in August 2024 at P99). This enhancement enables faster processing for time-sensitive applications such as fraud detection, industrial automation, and gaming across all AWS Regions where Amazon EventBridge is available, including the AWS GovCloud (US) Regions, at no additional cost to you.

Introducing resource control policies (RCPs), a new type of authorization policy in AWS OrganizationsResource control policies (RCPs), a new authorization policy in AWS Organizations. RCPs allow centralized control over maximum permissions granted to resources, complementing service control policies (SCPs) that control permissions for principals. RCPs can restrict external access to resources like Amazon Simple Storage Service (Amazon S3) buckets, enforcing a data perimeter across the organization.

Replicate changes from databases to Apache Iceberg tables using Amazon Data Firehose (in preview) – A new preview capability in Amazon Data Firehose that captures and replicates database changes to Apache Iceberg tables on Amazon S3. This feature supports PostgreSQL and MySQL databases, providing a simple solution to stream database updates without impacting performance. It automatically handles data partitioning and schema evolution, eliminating the need for complex ETL processes.

Amazon S3 now supports up to 1 million buckets per AWS account– Amazon S3 has increased its default bucket quota from 100 to 10,000 per AWS account. Customers can now request increases up to 1 million buckets. The first 2,000 buckets are free, with a small monthly fee applying thereafter for additional buckets.

Amazon Keyspaces (for Apache Cassandra) reduces prices by up to 75%Amazon Keyspaces (for Apache Cassandra) announces significant price reductions of up to 75%. The service reduces on-demand mode pricing by up to 56% for single-region and 65% for multi-region usage. Time-to-live (TTL) delete prices are also reduced by 75%.

Centrally managing root access for customers using AWS OrganizationsAWS Identity and Access Management (IAM) launches a new capability for centrally managing root access in AWS Organizations. This feature allows security teams to remove long-term root credentials from member accounts and use temporary, task-scoped root sessions for specific actions. The solution enhances security by eliminating permanent root credentials while maintaining the ability to perform necessary privileged operations.

Amazon DynamoDB reduces prices for on-demand throughput and global tablesAmazon DynamoDB announces significant price reductions, cutting on-demand mode throughput costs by 50% and global tables by up to 67%. Multi-region replicated writes now match single-region pricing. These changes make on-demand mode the recommended choice for most DynamoDB workloads.

Amazon Q Developer plugins for Datadog and Wiz now generally availableAmazon Q Developer now offers plugins for Datadog and Wiz services, allowing users to access these partners features directly through the AWS Console. Users can query information using natural language commands like @datadog or @wiz to get real-time updates and security insights.

Other AWS blog posts
Here are some additional projects and blog posts that you might find interesting:

Introducing Stable Diffusion 3.5 Large in Amazon SageMaker JumpStart – This powerful 8.1 billion parameter model enables high-quality, photorealistic image generation from text prompts. Customers can seamlessly deploy and use the model in Amazon SageMaker JumpStart, benefiting from Amazon SageMaker security and machine learning operations (MLOps) capabilities.

Transcribe, translate, and summarize live streams in your browser with AWS AI and generative AI services – This blog post explains how we developed a Chrome extension that uses AI services to enhance live streaming experiences. The extension use Amazon Transcribe, Amazon Translate, and Amazon Bedrock to provide real-time transcription, translation, and summarization of live streams directly in the browser. It supports over 50 languages for transcription and 75 for translation, making content globally accessible.

Simplify automotive damage processing with Amazon Bedrock and vector databases –This blog post presents a solution combining Amazon Bedrock and vector databases to streamline automotive damage assessment. The system uses AI to analyze vehicle damage images, provide cost estimates, and match with similar cases from existing datasets. It use Anthropic’s Claude 3 and Amazon Titan Multimodal Embeddings, for efficient, accurate processing.

Revolutionize trip planning with Amazon Bedrock and Amazon Location Service – Amazon Bedrock and Amazon OpenSearch Service vector databases combine to automate automotive damage assessment, using AI to analyze images and match them with historical data for accurate repair estimates.

Upcoming AWS events
Check your calendars and sign up for upcoming AWS events:

AWS Community Days – Join community-led conferences featuring technical discussions, workshops, and hands-on labs driven by expert AWS users and industry leaders from around the world. Upcoming AWS Community Days are scheduled for November 23 in Indonesia, and on December 14 in Kochi, India.

AWS re:Invent 2024 – Join us in Las Vegas to learn all things AWS. Our annual conference is the best—and fastest—way to grow your skills. If you can’t join us in person, you can attend virtually by registering at
Watch re:Invent online.

Browse all upcoming AWS led in-person and virtual events and developer-focused events.

Create your AWS Builder ID and reserve your alias. Builder ID is a universal login credential that gives users access to AWS tools and resources, including over 600 free training courses, community features, and developer tools such as Amazon Q Developer beyond the AWS Management Console.

That’s all for this week. Check back next Monday for another Weekly Roundup!

Thanks to Odina Jacobs for the AWS Community Chile photo.

Eli

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

AWS Weekly Roundup: Agentic workflows, Amazon Transcribe, AWS Lambda insights, and more (October 21, 2024)

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-agentic-workflows-amazon-transcribe-aws-lambda-insights-and-more-october-21-2024/

Agentic workflows are quickly becoming a cornerstone of AI innovation, enabling intelligent systems to autonomously handle and refine complex tasks in a way that mirrors human problem-solving. Last week, we launched Serverless Agentic Workflows with Amazon Bedrock, a new short course developed in collaboration with Dr. Andrew Ng and DeepLearning.AI.

Serverless Agentic Workflows with Amazon Bedrock

This hands-on course, taught by my colleague Mike Chambers, teaches how to build serverless agents that can handle complex tasks without the hassle of managing infrastructure. You will learn everything you need to know about integrating tools, automating workflows, and deploying responsible agents with built-in guardrails on Amazon Web Services (AWS) with Amazon Bedrock. The hands-on labs provided with the course let you apply your knowledge directly in an AWS environment, hosted by AWS Partner Vocareum. Find more information and enroll for free on the DeepLearning.AI course page.

Now, let’s turn our attention to other exciting news in the AWS universe from last week.

Last week’s launches
Here are some launches that got my attention:

Amazon Transcribe now supports streaming transcription in 30 additional languagesAmazon Transcribe has expanded its support to include 30 additional languages, bringing the total number of supported languages to 54. This enhancement helps you reach a broader global audience and improves accessibility across various industries, including contact centers, broadcasting, and e-learning. The expanded language support allows for more efficient content moderation, improved agent productivity, and automatic subtitling for live events and meetings.

AWS Lambda console now surfaces key function insights and supports real-time log analytics – The AWS Lambda console now features a built-in Amazon CloudWatch Metrics Insights dashboard and supports CloudWatch Logs Live Tail, providing instant visibility into critical function metrics and real-time log streaming. You can now identify and troubleshoot errors or performance issues for your Lambda functions without leaving the console, as well as view and analyze logs in real time as they become available. You can reduce context switching and accelerate the development and troubleshooting processes for serverless applications. Check out the launch post for more details.

Amazon Bedrock Model Evaluation now supports evaluating custom model import models – You can now evaluate custom models you’ve imported to Amazon Bedrock using the model evaluation feature. This helps you to complete the full cycle of selecting, customizing, and evaluating models before deploying them. To evaluate an imported model, select the custom model from the list of models to evaluate in the model selector tool when creating an evaluation job.

Amazon Q in AWS Supply Chain – You can now use Amazon Q, an interactive AI assistant, to analyze your supply chain data in AWS Supply Chain and get insights to operate your supply chain more efficiently. Amazon Q can answer your supply chain questions by diving into your data. This reduces the time spent searching for information and streamlines finding answers to improve your supply chain operations.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Here are some additional news items and posts that you might find interesting:

New Amazon OpenSearch Service YouTube channel – The channel offers bite-sized tutorials, curated content, and organized playlists on topics such as log analytics, semantic search, vector databases, and operational best practices. You can also provide feedback to influence future channel content and the OpenSearch Service roadmap. Check out the launch post for more details and subscribe to the Amazon OpenSearch Service YouTube channel.

Deploying Generative AI Applications with NVIDIA NIM Microservices on Amazon Elastic Kubernetes Service (Amazon EKS) – This post shows you how to use Amazon EKS to orchestrate the deployment of pods containing NVIDIA NIM microservices, to enable quick-to-setup and optimized large-scale large language model (LLM) inference on Amazon EC2 G5 instances. It also demonstrates how to scale (both pod and cluster) by monitoring for custom metrics through Prometheus, and how you can load balance using an Application Load Balancer.

Instant Well-Architected CDK Resources with Solutions Constructs Factories – You can now create well-architected AWS resources such as Amazon Simple Storage Service (Amazon S3) buckets and AWS Step Functions state machines with a single function call using the new AWS Solutions Constructs Factories. These factories handle all the best practices configuration for you while still allowing customization. Try using a Constructs factory the next time you need to deploy one of the supported resources.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS GenAI LoftsAWS GenAI LoftsAWS GenAI Lofts are about more than just the tech, they bring together startups, developers, investors, and industry experts. Whether you’re looking to gain deep insights, or get your questions answered by generative AI pros, our GenAI Lofts have you covered and provide everything you need to start building your next innovation. Join events in London (through October 25), Seoul (October 30–November 6), São Paulo (through November 20), and Paris (through November 25).

AWS Community DaysAWS Community Days – Join community-led conferences that feature technical discussions, workshops, and hands-on labs led by expert AWS users and industry leaders from around the world: Malta (November 8), Chile (November 9), and Kochi, India (December 14).

AWS re:Invent 2024AWS re:InventRegistration is now open for the annual tech extravaganza, taking place December 2–6 in Las Vegas. At re:Invent 2024, you’ll get a front row seat to hear real stories from customers and AWS leaders about navigating pressing topics, such as generative AI. Learn about new product launches, watch demos, and get behind-the-scenes insights during five headline-making keynotes.

You can browse all upcoming in-person and virtual events.

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Antje

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Amazon Transcribe Call Analytics adds new generative AI-powered call summaries (preview)

Post Syndicated from Veliswa Boya original https://aws.amazon.com/blogs/aws/amazon-transcribe-call-analytics-adds-new-generative-ai-powered-call-summaries-preview/

We are announcing generative artificial intelligence (AI)-powered call summarization in Amazon Transcribe Call Analytics in preview. Powered by Amazon Bedrock, this feature helps businesses improve customer experience, and agent and supervisor productivity by automatically summarizing customer service calls. Amazon Transcribe Call Analytics provides machine learning (ML)-powered analytics that allows contact centers to understand the sentiment, trends, and policy compliance of customer conversations to improve their experience and identify crucial feedback. A single API call is all it takes to extract transcripts, rich insights, and summaries from your customer conversations.

We understand that as a business, you want to maintain an accurate historical record of key conversation points, including action items associated with each conversation. To do this, agents summarize notes after the conversation has ended and enter these in their CRM system, a process that is time-consuming and subject to human error. Now imagine the customer trust erosion that follows when the agent fails to correctly capture and act upon important action items discussed during conversations.

How it works
Starting today, to assist agents and supervisors with the summarization of customer conversations, Amazon Transcribe Call Analytics will generate a concise summary of a contact center interaction that captures key components such as why the customer called, how the issue was addressed, and what follow-up actions were identified. After completing a customer interaction, agents can directly proceed to help the next customer since they don’t have to summarize a conversation, resulting in reduced customer wait times and improved agent productivity. Further, supervisors can review the summary when investigating a customer issue to get a gist of the conversation, without having to listen to the entire call recording or read the transcript.

Exploring Amazon Transcribe Call Analytics in the console
To see how this works visually, I first create an Amazon Simple Storage Service (Amazon S3) bucket in the relevant AWS Region. I then upload the audio file to the S3 bucket.

Audio file in S3 bucket

To create an analytics job that transcribes the audio and provides additional analytics about the conversation that the customer and the agent were having, I go to the Amazon Transcribe Call Analytics console. I select Post-call Analytics in the left hand navigation bar and then choose Create job.

Create Post-call analytics job

Next I enter a job name making sure to keep the language settings based on the language in the audio file.

Job settings

In the Amazon S3 URI path, I provide the link to the audio file uploaded in the first screenshot shown in this post.

Audio file details

In Role name, I select Create an IAM role which will have access to the Amazon S3 bucket, then choose Next.

Create IAM Role

I enable Generative call summarization, and then choose Create job.

Configure job

After a few minutes, the job’s status will change from In progress to Complete, indicating that it was completed successfully.

Job status

Select the job, and the next screen will show the transcript and a new tab, Generative call summarization – preview.

You can also download the transcript to view the analytics and summary.

Now available
Generative call summarization in Amazon Transcribe Call Analytics is available today in English in US East (N. Virginia) and US West (Oregon).

With generative call summarization in Amazon Transcribe Call Analytics, you pay as you go and are billed monthly based on tiered pricing. For more information, see Amazon Transcribe pricing.

Learn more:

Veliswa

Unstructured data management and governance using AWS AI/ML and analytics services

Post Syndicated from Sakti Mishra original https://aws.amazon.com/blogs/big-data/unstructured-data-management-and-governance-using-aws-ai-ml-and-analytics-services/

Unstructured data is information that doesn’t conform to a predefined schema or isn’t organized according to a preset data model. Unstructured information may have a little or a lot of structure but in ways that are unexpected or inconsistent. Text, images, audio, and videos are common examples of unstructured data. Most companies produce and consume unstructured data such as documents, emails, web pages, engagement center phone calls, and social media. By some estimates, unstructured data can make up to 80–90% of all new enterprise data and is growing many times faster than structured data. After decades of digitizing everything in your enterprise, you may have an enormous amount of data, but with dormant value. However, with the help of AI and machine learning (ML), new software tools are now available to unearth the value of unstructured data.

In this post, we discuss how AWS can help you successfully address the challenges of extracting insights from unstructured data. We discuss various design patterns and architectures for extracting and cataloging valuable insights from unstructured data using AWS. Additionally, we show how to use AWS AI/ML services for analyzing unstructured data.

Why it’s challenging to process and manage unstructured data

Unstructured data makes up a large proportion of the data in the enterprise that can’t be stored in a traditional relational database management systems (RDBMS). Understanding the data, categorizing it, storing it, and extracting insights from it can be challenging. In addition, identifying incremental changes requires specialized patterns and detecting sensitive data and meeting compliance requirements calls for sophisticated functions. It can be difficult to integrate unstructured data with structured data from existing information systems. Some view structured and unstructured data as apples and oranges, instead of being complementary. But most important of all, the assumed dormant value in the unstructured data is a question mark, which can only be answered after these sophisticated techniques have been applied. Therefore, there is a need to being able to analyze and extract value from the data economically and flexibly.

Solution overview

Data and metadata discovery is one of the primary requirements in data analytics, where data consumers explore what data is available and in what format, and then consume or query it for analysis. If you can apply a schema on top of the dataset, then it’s straightforward to query because you can load the data into a database or impose a virtual table schema for querying. But in the case of unstructured data, metadata discovery is challenging because the raw data isn’t easily readable.

You can integrate different technologies or tools to build a solution. In this post, we explain how to integrate different AWS services to provide an end-to-end solution that includes data extraction, management, and governance.

The solution integrates data in three tiers. The first is the raw input data that gets ingested by source systems, the second is the output data that gets extracted from input data using AI, and the third is the metadata layer that maintains a relationship between them for data discovery.

The following is a high-level architecture of the solution we can build to process the unstructured data, assuming the input data is being ingested to the raw input object store.

Unstructured Data Management - Block Level Architecture Diagram

The steps of the workflow are as follows:

  1. Integrated AI services extract data from the unstructured data.
  2. These services write the output to a data lake.
  3. A metadata layer helps build the relationship between the raw data and AI extracted output. When the data and metadata are available for end-users, we can break the user access pattern into additional steps.
  4. In the metadata catalog discovery step, we can use query engines to access the metadata for discovery and apply filters as per our analytics needs. Then we move to the next stage of accessing the actual data extracted from the raw unstructured data.
  5. The end-user accesses the output of the AI services and uses the query engines to query the structured data available in the data lake. We can optionally integrate additional tools that help control access and provide governance.
  6. There might be scenarios where, after accessing the AI extracted output, the end-user wants to access the original raw object (such as media files) for further analysis. Additionally, we need to make sure we have access control policies so the end-user has access only to the respective raw data they want to access.

Now that we understand the high-level architecture, let’s discuss what AWS services we can integrate in each step of the architecture to provide an end-to-end solution.

The following diagram is the enhanced version of our solution architecture, where we have integrated AWS services.

Unstructured Data Management - AWS Native Architecture

Let’s understand how these AWS services are integrated in detail. We have divided the steps into two broad user flows: data processing and metadata enrichment (Steps 1–3) and end-users accessing the data and metadata with fine-grained access control (Steps 4–6).

  1. Various AI services (which we discuss in the next section) extract data from the unstructured datasets.
  2. The output is written to an Amazon Simple Storage Service (Amazon S3) bucket (labeled Extracted JSON in the preceding diagram). Optionally, we can restructure the input raw objects for better partitioning, which can help while implementing fine-grained access control on the raw input data (labeled as the Partitioned bucket in the diagram).
  3. After the initial data extraction phase, we can apply additional transformations to enrich the datasets using AWS Glue. We also build an additional metadata layer, which maintains a relationship between the raw S3 object path, the AI extracted output path, the optional enriched version S3 path, and any other metadata that will help the end-user discover the data.
  4. In the metadata catalog discovery step, we use the AWS Glue Data Catalog as the technical catalog, Amazon Athena and Amazon Redshift Spectrum as query engines, AWS Lake Formation for fine-grained access control, and Amazon DataZone for additional governance.
  5. The AI extracted output is expected to be available as a delimited file or in JSON format. We can create an AWS Glue Data Catalog table for querying using Athena or Redshift Spectrum. Like the previous step, we can use Lake Formation policies for fine-grained access control.
  6. Lastly, the end-user accesses the raw unstructured data available in Amazon S3 for further analysis. We have proposed integrating Amazon S3 Access Points for access control at this layer. We explain this in detail later in this post.

Now let’s expand the following parts of the architecture to understand the implementation better:

  • Using AWS AI services to process unstructured data
  • Using S3 Access Points to integrate access control on raw S3 unstructured data

Process unstructured data with AWS AI services

As we discussed earlier, unstructured data can come in a variety of formats, such as text, audio, video, and images, and each type of data requires a different approach for extracting metadata. AWS AI services are designed to extract metadata from different types of unstructured data. The following are the most commonly used services for unstructured data processing:

  • Amazon Comprehend – This natural language processing (NLP) service uses ML to extract metadata from text data. It can analyze text in multiple languages, detect entities, extract key phrases, determine sentiment, and more. With Amazon Comprehend, you can easily gain insights from large volumes of text data such as extracting product entity, customer name, and sentiment from social media posts.
  • Amazon Transcribe – This speech-to-text service uses ML to convert speech to text and extract metadata from audio data. It can recognize multiple speakers, transcribe conversations, identify keywords, and more. With Amazon Transcribe, you can convert unstructured data such as customer support recordings into text and further derive insights from it.
  • Amazon Rekognition – This image and video analysis service uses ML to extract metadata from visual data. It can recognize objects, people, faces, and text, detect inappropriate content, and more. With Amazon Rekognition, you can easily analyze images and videos to gain insights such as identifying entity type (human or other) and identifying if the person is a known celebrity in an image.
  • Amazon Textract – You can use this ML service to extract metadata from scanned documents and images. It can extract text, tables, and forms from images, PDFs, and scanned documents. With Amazon Textract, you can digitize documents and extract data such as customer name, product name, product price, and date from an invoice.
  • Amazon SageMaker – This service enables you to build and deploy custom ML models for a wide range of use cases, including extracting metadata from unstructured data. With SageMaker, you can build custom models that are tailored to your specific needs, which can be particularly useful for extracting metadata from unstructured data that requires a high degree of accuracy or domain-specific knowledge.
  • Amazon Bedrock – This fully managed service offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon with a single API. It also offers a broad set of capabilities to build generative AI applications, simplifying development while maintaining privacy and security.

With these specialized AI services, you can efficiently extract metadata from unstructured data and use it for further analysis and insights. It’s important to note that each service has its own strengths and limitations, and choosing the right service for your specific use case is critical for achieving accurate and reliable results.

AWS AI services are available via various APIs, which enables you to integrate AI capabilities into your applications and workflows. AWS Step Functions is a serverless workflow service that allows you to coordinate and orchestrate multiple AWS services, including AI services, into a single workflow. This can be particularly useful when you need to process large amounts of unstructured data and perform multiple AI-related tasks, such as text analysis, image recognition, and NLP.

With Step Functions and AWS Lambda functions, you can create sophisticated workflows that include AI services and other AWS services. For instance, you can use Amazon S3 to store input data, invoke a Lambda function to trigger an Amazon Transcribe job to transcribe an audio file, and use the output to trigger an Amazon Comprehend analysis job to generate sentiment metadata for the transcribed text. This enables you to create complex, multi-step workflows that are straightforward to manage, scalable, and cost-effective.

The following is an example architecture that shows how Step Functions can help invoke AWS AI services using Lambda functions.

AWS AI Services - Lambda Event Workflow -Unstructured Data

The workflow steps are as follows:

  1. Unstructured data, such as text files, audio files, and video files, are ingested into the S3 raw bucket.
  2. A Lambda function is triggered to read the data from the S3 bucket and call Step Functions to orchestrate the workflow required to extract the metadata.
  3. The Step Functions workflow checks the type of file, calls the corresponding AWS AI service APIs, checks the job status, and performs any postprocessing required on the output.
  4. AWS AI services can be accessed via APIs and invoked as batch jobs. To extract metadata from different types of unstructured data, you can use multiple AI services in sequence, with each service processing the corresponding file type.
  5. After the Step Functions workflow completes the metadata extraction process and performs any required postprocessing, the resulting output is stored in an S3 bucket for cataloging.

Next, let’s understand how can we implement security or access control on both the extracted output as well as the raw input objects.

Implement access control on raw and processed data in Amazon S3

We just consider access controls for three types of data when managing unstructured data: the AI-extracted semi-structured output, the metadata, and the raw unstructured original files. When it comes to AI extracted output, it’s in JSON format and can be restricted via Lake Formation and Amazon DataZone. We recommend keeping the metadata (information that captures which unstructured datasets are already processed by the pipeline and available for analysis) open to your organization, which will enable metadata discovery across the organization.

To control access of raw unstructured data, you can integrate S3 Access Points and explore additional support in the future as AWS services evolve. S3 Access Points simplify data access for any AWS service or customer application that stores data in Amazon S3. Access points are named network endpoints that are attached to buckets that you can use to perform S3 object operations. Each access point has distinct permissions and network controls that Amazon S3 applies for any request that is made through that access point. Each access point enforces a customized access point policy that works in conjunction with the bucket policy that is attached to the underlying bucket. With S3 Access Points, you can create unique access control policies for each access point to easily control access to specific datasets within an S3 bucket. This works well in multi-tenant or shared bucket scenarios where users or teams are assigned to unique prefixes within one S3 bucket.

An access point can support a single user or application, or groups of users or applications within and across accounts, allowing separate management of each access point. Every access point is associated with a single bucket and contains a network origin control and a Block Public Access control. For example, you can create an access point with a network origin control that only permits storage access from your virtual private cloud (VPC), a logically isolated section of the AWS Cloud. You can also create an access point with the access point policy configured to only allow access to objects with a defined prefix or to objects with specific tags. You can also configure custom Block Public Access settings for each access point.

The following architecture provides an overview of how an end-user can get access to specific S3 objects by assuming a specific AWS Identity and Access Management (IAM) role. If you have a large number of S3 objects to control access, consider grouping the S3 objects, assigning them tags, and then defining access control by tags.

S3 Access Points - Unstructured Data Management - Access Control

If you are implementing a solution that integrates S3 data available in multiple AWS accounts, you can take advantage of cross-account support for S3 Access Points.

Conclusion

This post explained how you can use AWS AI services to extract readable data from unstructured datasets, build a metadata layer on top of them to allow data discovery, and build an access control mechanism on top of the raw S3 objects and extracted data using Lake Formation, Amazon DataZone, and S3 Access Points.

In addition to AWS AI services, you can also integrate large language models with vector databases to enable semantic or similarity search on top of unstructured datasets. To learn more about how to enable semantic search on unstructured data by integrating Amazon OpenSearch Service as a vector database, refer to Try semantic search with the Amazon OpenSearch Service vector engine.

As of writing this post, S3 Access Points is one of the best solutions to implement access control on raw S3 objects using tagging, but as AWS service features evolve in the future, you can explore alternative options as well.


About the Authors

Sakti Mishra is a Principal Solutions Architect at AWS, where he helps customers modernize their data architecture and define their end-to-end data strategy, including data security, accessibility, governance, and more. He is also the author of the book Simplify Big Data Analytics with Amazon EMR. Outside of work, Sakti enjoys learning new technologies, watching movies, and visiting places with family.

Bhavana Chirumamilla is a Senior Resident Architect at AWS with a strong passion for data and machine learning operations. She brings a wealth of experience and enthusiasm to help enterprises build effective data and ML strategies. In her spare time, Bhavana enjoys spending time with her family and engaging in various activities such as traveling, hiking, gardening, and watching documentaries.

Sheela Sonone is a Senior Resident Architect at AWS. She helps AWS customers make informed choices and trade-offs about accelerating their data, analytics, and AI/ML workloads and implementations. In her spare time, she enjoys spending time with her family—usually on tennis courts.

Daniel Bruno is a Principal Resident Architect at AWS. He had been building analytics and machine learning solutions for over 20 years and splits his time helping customers build data science programs and designing impactful ML products.

Amazon Chime SDK Call Analytics: Real-Time Voice Tone Analysis and Speaker Search

Post Syndicated from Sébastien Stormacq original https://aws.amazon.com/blogs/aws/amazon-chime-sdk-call-analytics-real-time-voice-tone-analysis-and-speaker-search/

Today, I am pleased to announce the availability of Amazon Chime SDK call analytics, a new set of capabilities that helps make it easier and cost effective to record and generate insights on real-time audio calls: transcription, voice tone analysis, and speaker search. We’ve also improved the Amazon Chime SDK section of the AWS Management Console to let you integrate machine learning (ML)-based services, such as these new call analytics capabilities or Amazon Transcribe into your audio applications in just a few steps.

Voice Analytics: Voice Tone Analysis and Speaker Search
Voice analytics delivers real-time insights into audio conversations. It helps detect and classify participants expressing a positive, neutral, or negative tone. Typically, enterprises working in regulated industries have obligations to record or want to analyze conversations between employees and their business partners, customers, or suppliers.

Voice tone analysis uses ML to extract sentiment from a speech signal based on a joint analysis of lexical and linguistic information as well as acoustic and tonal information. Voice tone analysis for live calls are delivered in the data lake of your choice, on top of which you can create your own dashboards to visualize the data.

Let’s take an example from the finance industry. Trading room supervisors are sometimes required to record all the trading conversations occurring on the floor. Voice tone analysis helps them meet their regulatory requirements. They can also deliver these insights to the traders to help to improve their productivity. But finance is not the only industry that needs to record and analyze calls. We have received similar requests from customers in Business Process Outsourcing (BPO), public sector, healthcare, telecom, and insurance industries.

Alongside with voice tone analysis, your applications can now benefit from speaker search to help match speakers to an existing database. It only requires a short sample to recognize a speaker based on their voice stored in a database of known voices. Speaker search helps your applications expedite caller lookup and enrich call records and transcripts with identity attribution. Speaker search delivers a suggested unique internal identifier for the speaker and a confidence score. The decision to match current the speaker with a known speaker from your organization is up to your application. Some of our customers plan to use speaker search for real-time speaker labeling on communication happening over trading turrets, which are shared devices.

Integration with AI Services in the AWS Management Console
We want to make it easier for developers to add these capabilities into existing telephony applications without requiring expertise in telephony, cloud infrastructure, or AI.

This is why we added a easier-to-use graphical configuration in the Amazon Chime SDK section of the console. On the console, you can choose the AWS AI service you want to use to analyze real-time audio data: voice analytics, Amazon Transcribe, or Amazon Transcribe Call Analytics. Whether you choose to use voice analytics or Amazon Transcribe to generate insights, you don’t have to write any integration code. We manage the integrations with AWS AI services and your voice-based or telephony applications. The console helps you define where you want to send the analytics data: an Amazon Kinesis stream or an Amazon Simple Storage Service (Amazon S3) bucket. Voice analytics can send real-time notifications to a function deployed on AWS Lambda, or an SQS queue or Amazon Simple Notification Service (Amazon SNS) topic.

To visualize insights, call analytics also delivers analyses to a data lake of your choice. You can then use Amazon QuickSight or Tableau to build dashboards and get insights from real-time media. These dashboards can be embedded in apps, wikis, and portals. Of course, we don’t leave you alone with your data. You can download prebuilt dashboards as AWS CloudFormation templates to deploy into your own AWS account. The link to download these templates is available on the console.

Finally, call analytics can generate real-time alerts by posting events to Amazon EventBridge. You can route these events to any destination of your choice, on your AWS account or supported third-party applications.

When using call analytics, you can reduce the initial project time to generate insights from real-time audio from months to days.

How It Works
I’d like to show you how it works.

On the Amazon Chime SDK section of the console, I open Configuration under Call Analytics on the left-side menu. Then, I select Create configuration.

Amazon Chime SDK - Create configuration

I give a name to my configuration. Optionally, I may also associate tags.

Amazon Chime SDK - Configuration first step

Under Configure analytics service, I can choose between Amazon Chime SDK voice analytics or Amazon Transcribe services to analyse calls. For this demo, I select Voice analytics.

Amazon Chime SDK - Configuration second step

I configure where to send the analysis. Voice analytics results are always sent to Kinesis. I specify a Kinesis data stream I created previously. When I want to use a business intelligence tool such as Quicksight to create a dashboard with analytics results, I also specify an S3 bucket to receive the analysis.

The console also gives me the link to the CloudFormation templates I can use to create the voice analytics dashboards.

Finally, I choose a Lambda function, SQS queue, or SNS topic that will receive notifications of events such as when the analytics are available, a new voice enrollment occurs, or the result of a voice verification. In the later case, the payload looks as follow:

{
    ...common to all events...
    "detail-type": "SpeakerSearchStatus",
    "detail": {
        "taskId": "uuid",
        "detailStatus": "IdentificationSuccessful",
        "speakerSearchDetails" : {
            "results": [
                {
                    "voiceProfileId": "guid",
                    "confidenceScore": "0.94",
                },
                {
                    "voiceProfileId": "guid",
                    "confidenceScore": "0.92",
                },
                {
                    "voiceProfileId": "guid",
                    "confidenceScore": "0.91",
                },
                ... (up to 10)
            ]
        },
        "isCaller": false,
        "voiceConnectorId": "guid",
        "transactionId": "guid"

        ...details from Voice connector
    }
}

For this demo, I choose an existing SQS queue.

Amazon Chime SDK - Configuration third step

Under Consent acknowledgment, I select all the boxes and select Next.

Amazon Chime SDK - Configuration second step consent

The next step is only available when I didn’t specify any analytics service in the previous step. It allows us to configure voice recordings. Recordings are available when no analytics are selected.

Under Configure access permissions, I choose a previously created AWS Identity and Access Management (IAM) role allowing the Amazon Chime SDK to access the other AWS services I configured: the Kinesis data stream, S3 bucket, and Lambda function, SQS queue, or SNS topic. The console may create an IAM role for me if I don’t have one already.

Amazon Chime SDK - Configuration four step

The next step is available if I selected Amazon Transcribe service under Configure analytics service. It allows me to configure real-time alerts through EventBridge. I may configure rules to send messages based on keyword match, sentiment detected, or issue detection.

The final step is Review and Create my configuration. I review the configuration details and then, I select Create configuration.

Finally, I link this configuration to a voice connector under the Voice Connector section, on the Streaming tab.

That’s it! As I mentioned earlier, no glue between AWS services or AI knowledge is required.

After the data arrives on Kinesis or your S3 bucket, you can point your preferred business reporting solution at it. When you use the QuickSight template we provide, you can get started in minutes with a high-level overview and a deep-dive view, as shown on the following screenshot.

Chime SDK Call Analytics - dashboard general

Chime SDK Call Analytics - dashboard deep dive

The deep-dive dashboard gives you graphical representations about the distribution of agent and customer sentiments and emotions. You also get a detailed analysis and transcript of the conversation.

Pricing and Availability
Adopting these capabilities in your audio applications requires no up-front infrastructure investment; you will be charged based only on your usage. Pricing is per minute of audio data analyzed. Visit Amazon Chime SDK pricing for details.

Call analytics is available in the following AWS Regions: US East (Ohio, N. Virginia), Asia Pacific (Singapore), and Europe (Frankfurt).

In this post, I discussed Amazon Chime SDK call analytics, a new set of capabilities that makes it easier and cost-effective to record and generate insights on real-time audio calls. With their focus on ease of use, these new capabilities are particularly well adapted to customers with minimal knowledge of cloud infrastructure, telephony, and ML.

Start today and configure your first dashboard!

— seb

Automating adverse events reporting for pharma with Amazon Connect and Amazon Lex

Post Syndicated from Siva Thangavel original https://aws.amazon.com/blogs/architecture/automating-adverse-events-reporting-for-pharma-with-amazon-connect-and-amazon-lex/

Every pharmaceutical company manufacturing medicine must provide customers nationwide with a method to report adverse events following medicine usage as well as emergency assistance as needed. To comply with regulatory policy and enable an Adverse Events Reporting System (AERS), pharma companies must provide dedicated, toll-free phone numbers and contact center agents to handle inbound calls.

But they must also be prepared for sudden spikes in call volume, which can increase contact center agents’ workloads and lead to long wait times for customers. With these limitations comes the possibility that customers may not be able to report adverse events.

Further still, as medicine status keeps changing, all agents must be retrained to handle calls and extend support. Pharma companies incur significant costs for training and onboarding additional agents, as well as the physical infrastructure to support their work.

To overcome these challenges, we designed a self-service Interactive Voice Response (IVR) solution with Amazon Connect. The IVR solution handles customer calls without agent involvement. It captures customer information and records data into an enterprise AERS. It also provides an option to receive a link to an Adverse Events (AE) portal using Short Message Service (SMS), or to be routed to a live agent queue.

In this blog post, we introduce a reference architecture for this use case. This framework can help other pharma companies solve similar problems.

Solution overview

Let’s explore how the IVR solution architecture routes customer calls step by step, as shown in Figure 1.

Adverse events reporting architecture diagram

Figure 1. Adverse events reporting architecture diagram

  1.  Callers who dial in to report a medicine-related AE are routed to the Amazon Lex chatbot through IVR in Amazon Connect.
  2. Callers can proceed to IVR self-service functions, such as understanding the intent of a customer call and the AE.
  3. AEs are analyzed with Amazon SageMaker for a decision on whether to complete the call on IVR or forward it to an agent.
  4. If the caller remains on the self-service option, the bot captures information from 15 to 20 essential questions.
  5. The bot follows a hybrid workflow that allows for guided responses where appropriate and free-text conversations using AWS Lambda. It confirms captured AE information with the caller before closing the call and submitting information to the AERS system.
  6. The bot provides the option to route the call to a human agent contextually.
  7. The bot provides the option to share an AE reporting link over SMS using Amazon Simple Notification System (SNS), so the caller can access it through a mobile device to continue AE reporting outside of the call.
  8. The bot records customer interactions in AERS using Amazon DynamoDB, leveraging the current validated process used by the AE portal team
  9. The bot makes call recordings available for auditing, monitoring, and training purposes. These recordings are not be provided to live agents.
  10. Standard analytics are available to help the business continuously train the bot and measure its performance.

Leveraging IVR as an extended solution

Recorded customer calls can be used for further analytics with Amazon Transcribe. Actionable insights can be derived from the text using a machine learning (ML) model such as AE detection at scale. A (Named Entity Recognition model (NER) model can also identify medicines and caller types.

Further, all recorded calls may be stored in a secure AWS ecosystem and archived for longer durations for compliance purposes. Storage costs can be optimized by setting up a policy to move old calls to Amazon Simple Storage Service Glacier (Amazon S3 Glacier) storage classes and calls over two years old to the Amazon S3 Deep Glacier storage class. This results in significant cost saving and helps companies archive at scale.

Finally, the Amazon Lex bot can be enhanced and continuously trained with additional intents and utterances to handle complex AE reporting for various drugs. This provides significant cost saving and operational efficiencies as bots can be trained faster than human agents, as well as at scale.

Conclusion: Using IVR to better manage AE reporting

This IVR solution was deployed for a pharma company and helped handle unusually high call volumes for AE reporting with its current agent population. It resulted in cost savings in contact center operations and significantly improved the customer experience by reducing wait times.

The IVR solution can also be used with any existing contact center platform to first forward the calls to Amazon Connect for initial triage, and then handover to existing platforms for agent involvement. This adds intelligence to existing contact centers.

This blog post demonstrates how pharma companies can leverage the self-service option for AERS to handle any AE reporting call. With solution enhancements using Amazon SageMaker models, it can quickly be transformed to handle calls for any medicine. They can also:

  • Incorporate related information into the model, such as the age, gender, or existing AEs to further improve the ML prediction performance
  • Leverage audio data augmentation plus handcrafted features to help yield better predictions
  • Use the audio-based diagnostic prediction in an Amazon Connect contact flow to triage the targeted group of incoming calls and escalate to a doctor for follow up if necessary
  • Allow call center agents to use the intelligence provided by the acoustic classification in conjunction with Contact Lens for Amazon Connect, which provides a turn-by-turn transcript; real-time alerts; automated call categorization based on keywords and phrases; sentiment analysis, and sensitive data redaction—truly making it a real-time intelligent solution.

The IVR solution can also be used for other industry use cases where a series of data is collected from customers. This solution improves the customer experience and can be implemented without increasing call center agent counts.

New for Amazon Transcribe – Real-Time Analytics During Live Calls

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-amazon-transcribe-real-time-analytics-during-live-calls/

The experience customers have when interacting with a contact center can have a profound impact on them. For this reason, we launched Amazon Transcribe Call Analytics last year to help you analyze customer call recordings and get insights into issues and trends related to customer satisfaction and agent performance.

To assist agents in resolving live calls faster, we are introducing today real-time call analytics in Amazon Transcribe Call Analytics. Real-time call analytics provides APIs for developers to accurately transcribe live calls and at the same time identify customer experience issues and sentiment in real time. Transcribe Call Analytics uses state-of-the-art machine learning capabilities to automatically assess thousands of in-progress calls and detect customer experience issues, such as repeated requests to speak to a manager or cancel a subscription.

With a few clicks, supervisors and analysts can create categories in the AWS console to identify customer experience issues using criteria such as specific terms such as “not happy,” “poor quality,” and “cancel my subscription.” Transcribe Call Analytics analyzes in-progress calls in real time to detect when a category is met. Developers can use those signals, along with sentiment trends from the API, to build a proactive system that alerts supervisors about emerging issues or assists agents with relevant information to solve customer issues.

Transcribe Call Analytics also provides a real-time transcript of the live conversation that supervisors can use to quickly get up to speed on the customer interaction and assess the appropriate action. The in-call transcript also eliminates the need for customers to repeat themselves if the call is transferred to another agent. Agents can focus all their attention on the customer during the call instead of taking notes for entry in a CRM system because Transcribe Call Analytics includes an automated call summarization capability, which identifies the issue, outcome, and action item associated with a call.

Transcribe Call Analytics is a foundational API for AWS Contact Center Intelligence solutions such as post-call analytics and the updated real-time call analytics with agent assist solution using the new real-time capabilities.

Let’s see how this works in practice.

Exploring Real-Time Call Analytics in the Console
To see how this works visually, I use the Amazon Transcribe console. First, I create a category to be notified if some terms are used in the call that would require an escalation. I choose Category Management from the navigation pane and then Create category.

I enter Escalation as the name for the category. I select REAL_TIME in the Category type dropdown. Then, I choose Create from scratch.

Console screenshot.

I only need one rule for this category. In the Rule type dropdown, I select Transcript content match. In the next three options, I choose to trigger the rule when any of the words are mentioned during the entire call, and the speaker is either the customer or the agent. Now, I can enter the words or phrases to look for in the transcript. In this case, I enter cancel, canceled, cancelled, manager, and supervisor. In your case, you might be more specific depending on your business. For example, if subscriptions are your business, you can look for the phrase cancel my subscription.

Console screenshot.

Now that the category has been created, I use one of the sample calls in the console to test it. I choose Real-Time Analytics in the navigation pane. By choosing Configure advanced settings, I can configure the personally identifiable information (PII) identification and redaction settings. For example, I can choose to identify personal data such as email addresses or redact financial data like bank account numbers.

With no additional charge, I can enable Post-call Analytics so that, at the end of the call, I receive the output of the transcription job in an Amazon Simple Storage Service (Amazon S3) bucket. This output is in a similar format to what I’d receive if I were analyzing a call recording with Transcribe Call Analytics. In this way, I can use the post-call analytics output derived from the audio stream in any process I already have in place for output of analytics generated from call recordings, for example, to update dashboards or generate periodic reports.

With Insurance complaints in Step 1: Specify input audio selected, I choose Start streaming. In the Transcription output section of the console, I receive in real-time the transcription of the call. The words of the customer and agent appear as they are pronounced. Each sentence is flagged with its recognized sentiment (positive, neutral, or negative). The Escalation category that I just configured is found in two sentences, first, when the customer mentions that their insurance has been canceled, and then when the agent mentions their manager. Also, part of a sentence is underlined because an issue has been detected.

Console screenshot.

Using the Download dropdown, I download the full JSON transcript. If I am only interested in the transcription, I can download the text transcript. The JSON transcript contains an array where each item is similar to what I’d get in real time when using the real-time call analytics API.

Using the Live Call Analytics With Agent Assist (LCA) Solution
You can use the open-source real-time call analytics with agent assist solution for your contact center or as an inspiration of what Amazon Transcribe enables for developers. Let’s look at a couple of screenshots to understand how it works.

Here there is a list of on-going calls with the overall sentiment, the sentiment trend (is it improving or not?), and the categories found in real-time during the call that can be used for specific activities.

Screenshot from the real-time call analytics with agent assist solution.

When selecting a call from the list, you have access to more in-depth information, including the call transcript and the issues found during the on-going call. This allows to take action quickly to help resolve the call.

Screenshot from the real-time call analytics with agent assist solution.

Availability and Pricing
Amazon Transcribe Call Analytics with real-time capabilities is available today in US (N. Virginia, Oregon), Canada (Central), Europe (Frankfurt, London), and Asia Pacific (Seoul, Sydney, Tokyo) and supports US English, British English, Australian English, US Spanish, Canadian French, French, German, Italian, and Brazilian Portuguese.

With Amazon Transcribe Call Analytics, you pay as you go and are billed monthly based on tiered pricing. For more information, see Amazon Transcribe pricing.

As part of the AWS Free Tier, you can get started with Amazon Transcribe Call Analytics for free, including the new real-time call analytics API. You can analyze up to 60 minutes of call audio monthly for free for the first 12 months. For more information, see the AWS Free Tier page.

If you’re at re:Invent, you can learn more about this new capability in session AIM307 – JPMorganChase real-time agent assist for contact center productivity. I will update this post when the recording of the session is publicly available.

Start analyzing contact center conversations in real-time to improve your customers’ experience.

Danilo

Building a low-code speech “you know” counter using AWS Step Functions

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/building-a-low-code-speech-you-know-counter-using-aws-step-functions/

This post is written by Doug Toppin, Software Development Engineer, and Kishore Dhamodaran, Solutions Architect.

In public speaking, filler phrases can distract the audience and reduce the value and impact of what you are telling them. Reviewing recordings of presentations can be helpful to determine whether presenters are using filler phrases. Instead of manually reviewing prior recordings, automation can process media files and perform a speech-to-text function. That text can then be processed to report on the use of filler phrases.

This blog explains how to use AWS Step Functions, Amazon EventBridge, Amazon Transcribe and Amazon Athena to report on the use of the common phrase “you know” in media files. These services can automate and reduce the time required to find the use of filler phrases.

Step Functions can automate and chain together multiple activities and other Amazon services. Amazon Transcribe is a speech to text service that uses media files as input and produces textual transcripts from them. Athena is an interactive query service that makes it easier to analyze data in Amazon S3 using standard SQL. Athena enables the use of standard SQL to query data in S3.

This blog shows a low-code, configuration driven approach to implementing this solution. Low-code means writing little or no custom software to perform a function. Instead, you use a configuration drive approach using service integrations where state machine tasks call AWS services using existing SDKs, APIs, or interfaces. A configuration driven approach in this example is using Step Functions’ Amazon States Language (ASL) to tie actions together rather than writing traditional code. This requires fewer details for data management and error handling combined with a visual user interface for composing the workflow. As the actions and logic are clearly defined with the visual workflow, this reduces maintenance.

Solution overview

The following diagram shows the solution architecture.

SolutionOverview

Solution Overview

  1. You upload a media file to an Amazon S3 Media bucket.
  2. The media file upload to S3 triggers an EventBridge rule.
  3. The EventBridge rule starts the Step Functions state machine execution.
  4. The state machine invokes Amazon Transcribe to process the media file.
  5. The transcription output file is stored in the Amazon S3 Transcript bucket.
  6. The state machine invokes Athena to query the textual transcript for the filler phrase. This uses the AWS Glue table to describe the format of the transcription results file.
  7. The filler phrase count determined by Athena is returned and stored in the Amazon S3 Results bucket.

Prerequisites

  1. An AWS account and an AWS user or role with sufficient permissions to create the necessary resources.
  2. Access to the following AWS services: Step Functions, Amazon Transcribe, Athena, and Amazon S3.
  3. Latest version of the AWS Serverless Application Model (AWS SAM) CLI, which helps developers create and manage serverless applications in the AWS Cloud.
  4. Test media files (for example, the Official AWS Podcast).

Example walkthrough

  1. Clone the GitHub repository to your local machine.
  2. git clone https://github.com/aws-samples/aws-stepfunctions-examples.git
  3. Deploy the resources using AWS SAM. The deploy command processes the AWS SAM template file to create the necessary resources in AWS. Choose you-know as the stack name and the AWS Region that you want to deploy your solution to.
  4. cd aws-stepfunctions-examples/sam/app-low-code-you-know-counter/
    sam deploy --guided

Use the default parameters or replace with different values if necessary. For example, to get counts of a different filler phrase, replace the FillerPhrase parameter.

GlueDatabaseYouKnowP Name of the AWS Glue database to create.
AthenaTableName Name of the AWS Glue table that is used by Athena to query the results.
FillerPhrase The filler phrase to check.
AthenaQueryPreparedStatementName Name of the Athena prepared statement used to run SQL queries on.
AthenaWorkgroup Athena workgroup to use
AthenaDataCatalog The data source for running the Athena queries
SAM Deploy

SAM Deploy

Running the filler phrase counter

  1. Navigate to the Amazon S3 console and upload an mp3 or mp4 podcast recording to the bucket named bucket-{account number}-{Region}-you-know-media.
  2. Navigate to the Step Functions console. Choose the running state machine, and monitor the execution of the transcription state machine.
  3. State Machine Execution

    State Machine Execution

  4. When the execution completes successfully, select the QueryExecutionSuccess task to examine the output and see the filler phrase count.
  5. State Machine Output

    State Machine Output

  6. Amazon Transcribe produces the transcript text of the media file. You can examine the output in the Results bucket. Using the S3 console, navigate to the bucket, choose the file matching the media file name and use ‘Query with S3 Select’ to view the content.
  7. If the transcription job does not execute, the state machine reports the failure and exits.
  8. State Machine Fail

    State Machine Fail

Exploring the state machine

The state machine orchestrates the transcription processing:

State Machine Explore

State Machine Explore

The StartTranscriptionJob task starts the transcription job. The Wait state adds a 60-second delay before checking the status of the transcription job. Until the status of the job changes to FAILED or COMPLETED, the choice state continues.

When the job successfully completes, the AthenaStartQueryExecutionUsingPreparedStatement task starts the Athena query, and stores the results in the S3 results bucket. The AthenaGetQueryResults task retrieves the count from the resultset.

The TranscribeMediaBucket holds the media files to be uploaded. The configuration sends the upload notification event to EventBridge:

      
   NotificationConfiguration:
     EventBridgeConfiguration:
       EventBridgeEnabled: true
	  

The TranscribeResultsBucket has an associated policy to provide access to Amazon Transcribe. Athena stores the output from the queries performed by the state machine in the AthenaQueryResultsBucket .

When a media upload occurs, the YouKnowTranscribeStateMachine uses Step Functions’ native event integration to trigger an EventBridge rule. This contains an event object similar to:

{
  "version": "0",
  "id": "99a0cb40-4b26-7d74-dc59-c837f5346ac6",
  "detail-type": "Object Created",
  "source": "aws.s3",
  "account": "012345678901",
  "time": "2022-05-19T22:21:10Z",
  "region": "us-east-2",
  "resources": [
    "arn:aws:s3:::bucket-012345678901-us-east-2-you-know-media"
  ],
  "detail": {
    "version": "0",
    "bucket": {
      "name": "bucket-012345678901-us-east-2-you-know-media"
    },
    "object": {
      "key": "Podcase_Episode.m4a",
      "size": 202329,
      "etag": "624fce93a981f97d85025e8432e24f48",
      "sequencer": "006286C2D604D7A390"
    },
    "request-id": "B4DA7RD214V1QG3W",
    "requester": "012345678901",
    "source-ip-address": "172.0.0.1",
    "reason": "PutObject"
  }
}

The state machine allows you to prepare parameters and use the direct SDK integrations to start the transcription job by calling the Amazon Transcribe service’s API. This integration means you don’t have to write custom code to perform this function. The event triggering the state machine execution contains the uploaded media file location.


  StartTranscriptionJob:
	Type: Task
	Comment: Start a transcribe job on the provided media file
	Parameters:
	  Media:
		MediaFileUri.$: States.Format('s3://{}/{}', $.detail.bucket.name, $.detail.object.key)
	  TranscriptionJobName.$: "$.detail.object.key"
	  IdentifyLanguage: true
	  OutputBucketName: !Ref TranscribeResultsBucket
	Resource: !Sub 'arn:${AWS::Partition}:states:${AWS::Region}:${AWS::AccountId}:aws-sdk:transcribe:startTranscriptionJob'

The SDK uses aws-sdk:transcribe:getTranscriptionJob to get the status of the job.


  GetTranscriptionJob:
	Type: Task
	Comment: Retrieve the status of an Amazon Transcribe job
	Parameters:
	  TranscriptionJobName.$: "$.TranscriptionJob.TranscriptionJobName"
	Resource: !Sub 'arn:${AWS::Partition}:states:${AWS::Region}:${AWS::AccountId}:aws-sdk:transcribe:getTranscriptionJob'
	Next: TranscriptionJobStatus

The state machine uses a polling loop with a delay to check the status of the transcription job.


  TranscriptionJobStatus:
	Type: Choice
	Choices:
	- Variable: "$.TranscriptionJob.TranscriptionJobStatus"
	  StringEquals: COMPLETED
	  Next: AthenaStartQueryExecutionUsingPreparedStatement
	- Variable: "$.TranscriptionJob.TranscriptionJobStatus"
	  StringEquals: FAILED
	  Next: Failed
	Default: Wait

When the transcription job completes successfully, the filler phrase counting process begins.

An Athena prepared statement performs the query with the transcription job name as a runtime parameter. The AWS SDK starts the query and the state machine execution pauses, waiting for the results to return before progressing to the next state:

athena:startQueryExecution.sync

When the query completes, Step Functions uses the SDK integration to retrieve the results using athena:getQueryResults:

athena:getQueryResults

It creates an Athena prepared statement to pass the transcription jobname as a parameter for the query execution:

  ResultsQueryPreparedStatement:
    Type: AWS::Athena::PreparedStatement
    Properties:
      Description: Create a statement that allows the use of a parameter for specifying an Amazon Transcribe job name in the Athena query
      QueryStatement: !Sub >-
        select cardinality(regexp_extract_all(results.transcripts[1].transcript, '${FillerPhrase}')) AS item_count from "${GlueDatabaseYouKnow}"."${AthenaTableName}" where jobname like ?
      StatementName: !Ref AthenaQueryPreparedStatementName
      WorkGroup: !Ref AthenaWorkgroup

There are several opportunities to enhance this tool. For example, adding support for multiple filler phrases. You could build a larger application to upload media and retrieve the results. You could take advantage of Amazon Transcribe’s real-time transcription API to display the results while a presentation is in progress to provide immediate feedback to the presenter.

Cleaning up

  1. Navigate to the Amazon Transcribe console. Choose Transcription jobs in the left pane, select the jobs created by this example, and choose Delete.
  2. Cleanup Delete

    Cleanup Delete

  3. Navigate to the S3 console. In the Find buckets by name search bar, enter “you-know”. This shows the list of buckets created for this example. Choose each of the radio buttons next to the bucket individually and choose Empty.
  4. Cleanup S3

    Cleanup S3

  5. Use the following command to delete the stack, and confirm the stack deletion.
  6. sam delete

Conclusion

Low-code applications can increase developer efficiency by reducing the amount of custom code required to build solutions. They can also enable non-developer roles to create automation to perform business functions by providing drag-and-drop style user interfaces.

This post shows how a low-code approach can build a tool chain using AWS services. The example processes media files to produce text transcripts and count the use of filler phrases in those transcripts. It shows how to process EventBridge data and how to invoke Amazon Transcribe and Athena using Step Functions state machines.

For more serverless learning resources, visit Serverless Land.

AWS Week In Review – June 6, 2022

Post Syndicated from Antje Barth original https://aws.amazon.com/blogs/aws/aws-week-in-review-june-6-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

I’ve just come back from a long (extended) holiday weekend here in the US and I’m still catching up on all the AWS launches that happened this past week. I’m particularly excited about some of the data, machine learning, and quantum computing news. Let’s have a look!

Last Week’s Launches
The launches that caught my attention last week are the following:

Amazon EMR Serverless is now generally available Amazon EMR Serverless allows you to run big data applications using open-source frameworks such as Apache Spark and Apache Hive without configuring, managing, and scaling clusters. The new serverless deployment option for Amazon EMR automatically scales resources up and down to provide just the right amount of capacity for your application, and you only pay for what you use. To learn more, check out Channy’s blog post and listen to The Official AWS Podcast episode on EMR Serverless.

AWS PrivateLink is now supported by additional AWS services AWS PrivateLink provides private connectivity between your virtual private cloud (VPC), AWS services, and your on-premises networks without exposing your traffic to the public internet. The following AWS services just added support for PrivateLink:

  • Amazon S3 on Outposts has added support for PrivateLink to perform management operations on your S3 storage by using private IP addresses in your VPC. This eliminates the need to use public IPs or proxy servers. Read the June 1 What’s New post for more information.
  • AWS Panorama now supports PrivateLink, allowing you to access AWS Panorama from your VPC without using public endpoints. AWS Panorama is a machine learning appliance and software development kit (SDK) that allows you to add computer vision (CV) to your on-premises cameras. Read the June 2 What’s New post for more information.
  • AWS Backup has added PrivateLink support for VMware workloads, providing direct access to AWS Backup from your VMware environment via a private endpoint within your VPC. Read the June 3 What’s New post for more information.

Amazon SageMaker JumpStart now supports incremental model training and automatic tuning – Besides ready-to-deploy solution templates for common machine learning (ML) use cases, SageMaker JumpStart also provides access to more than 300 pre-trained, open-source ML models. You can now incrementally train all the JumpStart models with new data without training from scratch. Through this fine-tuning process, you can shorten the training time to reach a better model. SageMaker JumpStart now also supports model tuning with SageMaker Automatic Model Tuning from its pre-trained model, solution templates, and example notebooks. Automatic tuning allows you to automatically search for the best hyperparameter configuration for your model.

Amazon Transcribe now supports automatic language identification for multi-lingual audioAmazon Transcribe converts audio input into text using automatic speech recognition (ASR) technology. If your audio recording contains more than one language, you can now enable multi-language identification, which identifies all languages spoken in the audio file and creates a transcript using each identified language. Automatic language identification for multilingual audio is supported for all 37 languages that are currently supported for batch transcriptions. Read the What’s New post from Amazon Transcribe to learn more.

Amazon Braket adds support for Borealis, the first publicly accessible quantum computer that is claimed to offer quantum advantage – If you are interested in quantum computing, you’ve likely heard the term “quantum advantage.” It refers to the technical milestone when a quantum computer outperforms the world’s fastest supercomputers on a well-defined task. Until now, none of the devices claimed to demonstrate quantum advantage have been accessible to the public. The Borealis device, a new photonic quantum processing unit (QPU) from Xanadu, is the first publicly available quantum computer that is claimed to have achieved quantum advantage. Amazon Braket, the quantum computing service from AWS, has just added support for Borealis. To learn more about how you can test a quantum advantage claim for yourself now on Amazon Braket, check out the What’s New post covering the addition of Borealis support.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
Some other updates and news that you may have missed:

New AWS Heroes – A warm welcome to our newest AWS Heroes! The AWS Heroes program is a worldwide initiative that acknowledges individuals who have truly gone above and beyond to share knowledge in technical communities. Get to know them in the June 2022 introduction blog post!

AWS open-source news and updates – My colleague Ricardo Sueiras writes this weekly open-source newsletter in which he highlights new open-source projects, tools, and demos from the AWS Community. Read edition #115 here.

Upcoming AWS Events
Join me in Las Vegas for Amazon re:MARS 2022. The conference takes place June 21–24 and is all about the latest innovations in machine learning, automation, robotics, and space. I will deliver a talk on how machine learning can help to improve disaster response. Say “Hi!” if you happen to be around and see me.

We also have more AWS Summits coming up over the next couple of months, both in-person and virtual.

In Europe:

In North America:

In South America:

Find an AWS Summit near you, and get notified when registration opens in your area.

Imagine Conference 2022You can now register for IMAGINE 2022 (August 3, Seattle). The IMAGINE 2022 conference is a no-cost event that brings together education, state, and local leaders to learn about the latest innovations and best practices in the cloud.

Sign up for the SQL Server Database Modernization webinar on June 21 to learn how to modernize and cost-optimize Microsoft SQL Server on AWS.

That’s all for this week. Check back next Monday for another Week in Review!

— Antje

Extract Insights From Customer Conversations with Amazon Transcribe Call Analytics

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/extract-insights-from-customer-conversations-with-amazon-transcribe-call-analytics/

In 2017, we launched Amazon Transcribe, an automatic speech recognition (ASR) service that makes it easy to add speech-to-text capabilities to any application. Today, I’m very happy to announce the availability of Amazon Transcribe Call Analytics, a new feature that lets you easily extract valuable insights from customer conversations with a single API call.

Each discussion with potential or existing customers is an opportunity to learn about their needs and expectations. For example, it’s important for customer service teams to figure out the main reasons why customers are calling them, and measure customer satisfaction during these calls. Likewise, salespeople try to gauge customer interest, and their reaction to a particular sales pitch.

Thus, many customers and partners would like to add call analytics capabilities in different applications, regardless of their contact center provider. They often need to analyze more than phone calls, for example web-based audio and video calls. So far, they’ve typically done this by stitching AI services and dedicated ML models together, and they’ve asked us for a simpler solution.

We got to work and built Amazon Transcribe Call Analytics, a new addition to Transcribe and a key enhancement to AWS Contact Center Intelligence. If you can’t wait to try it, feel free to jump now to the AWS console. If you’d like to learn more, read on!

Introducing Amazon Transcribe Call Analytics
Based on ASR implemented in Transcribe, Transcribe Call Analytics adds natural language processing (NLP) capabilities specifically trained on customer calls, and optimized to provide highly accurate call transcripts and actionable insights. With a simple API call, developers can now easily add call analytics to any application, and extract customer insights from conversations without having to build AI pipelines and train custom ML models.

Key features of Transcribe Call Analytics include:

  • Timestamped turn-by-turn call transcription in 21 languages.
  • Issue detection, which picks up the shortest set of contiguous words in a conversation turn that represents the reason why the customer is calling. This works out of the box without any configuration or training.
  • Call categorization based on conversational characteristics:
    • Matching specific words and phrases,
    • Detecting non-talk time,
    • Detecting interruptions,
    • Analyzing sentiment for the customer and the agent.
  • Call characteristics such as:
    • How quickly and loudly a customer or agent are speaking,
    • Detecting non-talk time,
    • Detecting interruptions.
  • Redaction of sensitive data from the text transcript and the corresponding audio file.

For example, you can create rules to flag calls where customers interrupt the agent, exhibit negative sentiment, and say “I want to speak with the manager”. These calls certainly did not go well, and are worth analyzing in detail! You can also look for calls where agents don’t use pre-defined greetings (“Welcome to ACME Support, how can I help you today?”) within the first 15 seconds, to measure script compliance and help supervisors identify agent coaching opportunities. Another popular scenario is to create rules that flag mentions of your specific products and services (“Your ACME Turbo 2000 vacuum cleaner isn’t working like it should”), in order to pick up any emerging trends you’d need to be aware of.

Last but not least, you can further process the text transcript with other AI services such as Amazon Translate, or with custom NLP models built with Amazon SageMaker.

Now, let’s do a quick demo.

Extracting Insights with Amazon Transcribe Call Analytics
Here’s a fictitious support call, where a lady calls her bank to report that she’s lost her credit and debit cards. The sound file is a stereo WAV file (16-bit, 8KHz).

Transcribe Call Analytics requires that the agent and the customer are recorded in their own channel. We’ll also need to tell which is the agent channel. In a stereo file, the left channel is usually the first channel (channel #0), and the right channel is the second one (channel #1). This is the case for this call.

If you’re not sure which is which, you can easily use the versatile ffmpeg open source tool to extract each channel to a separate audio file.

$ ffmpeg -i demo-call.wav -map_channel 0.0.0 channel0.wav -map_channel 0.0.1 channel1.wav

You can use the same technique to extract audio channels from other file types, such as video files, and recombine them to a stereo audio file. You’ll find more information in the ffmpeg documentation.

Now that I’m sure that the agent is in channel #1, I use the AWS CLI to upload the audio file to an S3 bucket.

$ aws s3 cp launch-call.wav s3://jsimon-transcribe-useast1/demo-call.wav --region us-east-1

Opening the Transcribe Call Analytics console, I see that call category templates are available.

Call categories

I decide to create one for supervisor escalations. Then, with a couple of clicks, I create a custom call category named welcome-message, to check if the agent starts the call with an appropriate welcome. I could add several phrases to check for if needed. We recommend that you use short sentences to minimize the chance of filler words popping up (‘hmm’, ‘err’, and so on).

Call category

Then, I create a call analytics job using the general model available in Transcribe. I also enable automatic language detection.

Creating a job

Then, I define the location of the audio file in S3, flagging channel #1 as the agent channel.

Creating a job

I decide to store the transcript in the default S3 bucket created by Transcribe in my account. I could also use my own bucket if needed. Then, I pick an AWS Identity and Access Management (IAM) role with sufficient permissions, and I launch the job.

A minute later or so, the job is complete. The console contains a preview of the text transcript, as well as a link to the full JSON transcript.

Viewing the transcript

As the agent used the proper welcome sentence in the first 15 seconds, the call is tagged with the category I created earlier.

Call categories

Downloading the JSON transcript, each sentence in the conversation is enriched with metadata on per-word loudness, measured on a 0-100 range with 100 being extremely loud. Here’s the first sentence:

"BeginOffsetMillis":440,"EndOffsetMillis":4960,
"Sentiment":"NEUTRAL",
"ParticipantRole":"AGENT",
"LoudnessScores":[78.68,80.4,81.91,78.95,82.34],
"Content":"Hello and thank you for calling the bank. This is Ashley speaking, how may I help you today?"

Looking at the next sentence, I see that Transcribe Call Analytics automatically detected what the customer issue is. The corresponding text is in bold:

"Content": "Hi um uh you just need to cancel my card. Um I have a debit card and a credit card.",
"IssuesDetected":[{"UnredactedCharacterOffsets":{"Begin": 26,"End": 40}}. . .

At the end of the transcript, I see global call statistics (duration, talk time, words per minute, matched categories). Transcribe also gives me overall sentiment information, meaured from -5 (extremely negative) to +5 (extremely positive). I also get a a breakdown in four quarters.

"Sentiment":{"OverallSentiment":{"AGENT":2.6,"CUSTOMER":0.2},
"SentimentByPeriod":{"QUARTER":
{"AGENT":[
{"Score":1.9,"BeginOffsetMillis":0,"EndOffsetMillis":68457},
{"Score":-0.7,"BeginOffsetMillis":68457,"EndOffsetMillis":136915},
{"Score":5.0,"BeginOffsetMillis":136915,"EndOffsetMillis":205372},
{"Score":3.0,"BeginOffsetMillis":205372,"EndOffsetMillis":273830}],
"CUSTOMER":[
{"Score":-1.7,"BeginOffsetMillis":0,"EndOffsetMillis":68165},
{"Score":0.0,"BeginOffsetMillis":68165,"EndOffsetMillis":136330},
{"Score":0.0,"BeginOffsetMillis":136330,"EndOffsetMillis":204495},
{"Score":2.1,"BeginOffsetMillis":204495,"EndOffsetMillis":272660}]}}}

We can see that the customer started the call with negative sentiment, moving quickly to neutral sentiment, and ending the call with positive sentiment. This is a good sign that the call was handled satisfactorily, and that the customer problem was solved.

If you’d like to convert the transcript to a Word document with additional visualizations, my colleague Andrew Kane has built a nice tool and made it available on Github. Here’s a sample report produced by his tool.

Andrew's tool

AWS Customers and Partners Are Using Amazon Transcribe Call Analytics

Ben Rigby, the SVP, Global Head of Product & Engineering, Artificial Intelligence, Automation, and Workforce at Talkdesk told us, “Our customers are processing millions of customer service calls in their contact centers a year and have a critical need to extract actionable conversation insights to ensure positive business outcomes. As an AWS Contact Center Intelligence partner, we further enhanced our call transcription capabilities with Amazon Transcribe. With the launch of Amazon Transcribe Call Analytics, we’re excited to add even more AI capabilities to our Speech Analytics and QM Assist products. These deeper insights can provide agents and supervisors with the data they need to improve the speed and quality of their customer service while boosting workforce productivity.

Praphul Kumar, the Chief Product Officer of SuccessKPI adds, “Amazon Transcribe Call Analytics API enables us to add ML-based capabilities to our platform faster and at a lower cost. This new API removes the need to integrate multiple AI services together and develop custom machine learning models in certain areas. With Transcribe Call Analytics, we will be able to provide conversation insights such as sentiment, non-talk time, and call categories to gauge agent performance. This helps to drive better call outcomes, reduce agent turnover, uncover agent coaching opportunities, and measure call script compliance. Combining AWS services into SuccessKPI’s experience analytics platform was a no brainer. We are looking forward to bringing this valuable capability into the hands of large enterprises and government agencies.

Getting Started
A single API call is all it takes to extract rich insights from your customer conversations. You can start using Amazon Transcribe Call Analytics today in the following regions:

  • US West (Oregon), US East (N. Virginia),
  • Canada (Central),
  • Europe (London), Europe (Frankfurt),
  • Asia Pacific (Mumbai), Asia Pacific (Singapore), Asia Pacific (Seoul), Asia Pacific (Tokyo), Asia Pacific (Sydney).

Please give this new feature a try in the AWS console, and let us know what you think. We always look forward to your feedback! You can send it through your usual AWS Support contacts or post it on the AWS Forum for Amazon Transcribe.

One last thing: if you’re looking for an easy to use omnichannel cloud contact center, you should definitely take a look at Amazon Connect and its ML powered analytics, Contact Lens.

– Julien

Amazon Transcribe Now Supports Automatic Language Identification

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-transcribe-now-supports-automatic-language-identification/

In 2017, we launched Amazon Transcribe, an automatic speech recognition service that makes it easy for developers to add a speech-to-text capability to their applications. Since then, we added support for more languages, enabling customers globally to transcribe audio recordings in 31 languages, including 6 in real-time.

A popular use case for Amazon Transcribe is transcribing customer calls. This allows companies to analyze the transcribed text using natural language processing techniques to detect sentiment or to identify the most common call causes. If you operate in a country with multiple official languages or across multiple regions, your audio files can contain different languages. Thus, files have to be tagged manually with the appropriate language before transcription can take place. This typically involves setting up teams of multi-lingual speakers, which creates additional costs and delays in processing audio files.

The media and entertainment industry often uses Amazon Transcribe to convert media content into accessible and searchable text files. Use cases include generating subtitles or transcripts, moderating content, and more. Amazon Transcribe is also used by operations team for quality control, for example checking that audio and video are in sync thanks to the timestamps present in the extracted text. However, other problems couldn’t be easily solved, such as verifying that the main spoken language in your videos is correctly labeled to avoid streaming video in the wrong language.

Today, I’m extremely happy to announce that Amazon Transcribe can now automatically identify the dominant language in an audio recording. This feature will help customers build more efficient transcription workflows by getting rid of manual tagging. In addition to the examples mentioned above, you can now also easily use Amazon Transcribe to automatically recognize and transcribe voicemails, meetings, and any form of recorded communication.

Introducing Automatic Language Identification
With a minimum of 30 seconds of audio, Amazon Transcribe can efficiently generate transcripts in the spoken language without wasting time and resources on manual tagging. Automatic identification of the dominant language is available in batch transcription mode for all 31 languages. Thanks to sampling techniques, language identification happens much faster than the transcription itself, in the matter of seconds.

If you’re already using Amazon Transcribe for speech recognition, you just need to enable the feature in the StartTranscriptionJob API. Before your transcription job is complete, the response of the GetTranscriptionJob API will tell the dominant language of the audio recording, and its confidence score between 0 and 1. The transcript lists the top five languages and their respective confidence scores.

Of course, if you want to use Amazon Transcribe exclusively for automatic language identification, you can simply process the API response and ignore the transcript. In this case, you should stick to short 30-45 second audio recordings to minimize costs.

You can also restrict languages that Amazon Transcribe tries to identify, by passing a list of languages to the StartTranscriptionJob API. For example, if your company call center only receives calls in English, Spanish and French, then restricting identifiable languages to this list will increase language identification accuracy.

Now, I’d like to show you how easy it us to use this new feature!

Detecting the Dominant Language With Amazon Transcribe
First, let’s try a high quality sample. I’ll use the audio track from one of my breakout sessions at AWS Summit Paris 2019. I can easily download it using the youtube-dl tool.

$ youtube-dl -f bestaudio https://www.youtube.com/watch?v=AFN5jaTurfA
$ mv AWS\ \&\ EarthCube\ _\ Deep\ learning\ démarrer\ avec\ MXNet\ et\ Tensorflow\ en\ 10\ minutes-AFN5jaTurfA.m4a video.m4a

Using ffmpeg, I shorten the audio clip to 1 minute.

$ ffmpeg -i video.m4a -ss 00:00:00.00 -t 00:01:00.00 video-1mn.m4a

Then, I upload the clip to an Amazon Simple Storage Service (S3) bucket.

$ aws s3 cp video-1mn.m4a s3://jsimon-transcribe-uswest2/

Next, I use the AWS CLI to run a transcription job on this audio clip, with language identification enabled.

$ awscli transcribe start-transcription-job --transcription-job-name video-test --identify-language --media MediaFileUri=s3://jsimon-transcribe-uswest2/video-1mn.m4a

Waiting only a few seconds, I check the status of the job. I could also use a Amazon CloudWatch event to be notified that language identification is complete.

$ awscli transcribe get-transcription-job --transcription-job-name video-test
{
    "TranscriptionJob": {
        "TranscriptionJobName": "video-test",
        "TranscriptionJobStatus": "IN_PROGRESS",
        "LanguageCode": "fr-FR",
        "MediaSampleRateHertz": 44100,
        "MediaFormat": "mp4",
        "Media": {
        "MediaFileUri": "s3://jsimon-transcribe-uswest2/video-1mn.m4a"
    },
    "Transcript": {},
    "StartTime": 1593704323.312,
"CreationTime": 1593704323.287,

    "Settings": {
        "ChannelIdentification": false,
        "ShowAlternatives": false
    },
    "IdentifyLanguage": true,
    "IdentifiedLanguageScore": 0.915885329246521
    }
}

As highlighted in the output, the dominant language has been correctly detected in seconds, with a high confidence score of 91.59%. A few more seconds later, the transcription job is complete. Running the same CLI call, I can retrieve a link to the transcription, which also includes the top 5 languages for the audio clip, sorted by decreasing score.

"language_identification":[{"score":"0.9159","code":"fr-FR"},{"score":"0.0839","code":"fr-CA"},{"score":"0.0001","code":"en-GB"},{"score":"0.0001","code":"pt-PT"},{"score":"0.0001","code":"de-CH"}]

Adding up French and Canadian French, we pretty much get a score of 100%, so there’s no doubt that this clip is in French. In some cases, you may not care for that level of detail, and you’ll see in the next example how to restrict the list of detected languages.

Restricting the List of Detected Languages
As customer call transcription is a popular use case for Amazon Transcribe, here is a 40-second audio clip (WAV, 8KHz, 16-bit resolution), where I’m reading a paragraph from the French version of the Amazon Transcribe page. As you can hear, quality is pretty awful, and I added background music (Bach-ground, actually) for good measure.

Again, I upload the clip to an S3 bucket, and I use the AWS CLI to transcribe it. This time, I restrict the list of languages to French, Spanish, German, US English, and British English.

$ aws s3 cp speech-8k.wav s3://jsimon-transcribe-uswest2/
$ awscli transcribe start-transcription-job --transcription-job-name speech-8k-test --identify-language --media MediaFileUri=s3://jsimon-transcribe-uswest2/speech-8k.wav --language-options fr-FR es-ES de-DE en-US en-GB

A few seconds later, I check the status of the job.

$ awscli transcribe get-transcription-job --transcription-job-name speech-8k-test
{
    "TranscriptionJob": {
    "TranscriptionJobName": "speech-8k-test",
    "TranscriptionJobStatus": "IN_PROGRESS",
    "LanguageCode": "fr-FR",
    "MediaSampleRateHertz": 8000,
    "MediaFormat": "wav",
    "Media": {
        "MediaFileUri": "s3://jsimon-transcribe-uswest2/speech-8k.wav"
    },
    "Transcript": {},
    "StartTime": 1593705151.446,
"CreationTime": 1593705151.423,

    "Settings": {
        "ChannelIdentification": false,
        "ShowAlternatives": false
    },
    "IdentifyLanguage": true,
    "LanguageOptions": [
        "fr-FR","es-ES","de-DE","en-US","en-GB"
    ],
    "IdentifiedLanguageScore": 0.9995
    }
}

As highlighted in the output, the dominant language has been correctly detected with a very high confidence score in spite of the terrible audio quality. Restricting the list of languages certainly helps, and you should use it whenever possible.

Getting Started
Automatic Language Identification is available today in these regions:

  • US East (N. Virginia), US East (Ohio), US West (N. California), US West (Oregon), AWS GovCloud (US-West).
  • Canada (Central).
  • South America (São Paulo).
  • Europe (Ireland), Europe (London), Europe (Paris), Europe (Frankfurt).
  • Middle East (Bahrain).
  • Asia Pacific (Hong Kong), Asia Pacific (Mumbai), Asia Pacific (Tokyo), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney).

There is no additional charge on top of the existing pricing. Give it a try, and please send us feedback either through your usual AWS Support contacts, or on the AWS Forum for Amazon Transcribe.

– Julien