Tag Archives: Amazon Transcribe

Amazon Transcribe Medical – Real-Time Automatic Speech Recognition for Healthcare Customers

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-transcribe-medical-real-time-automatic-speech-recognition-for-healthcare-customers/

In 2017, we launched Amazon Transcribe, an automatic speech recognition service that makes it easy for developers to add speech-to-text capability to their applications: today, we’re extremely happy to extend it to medical speech with Amazon Transcribe Medical.

When I was a child, my parents – both medical doctors – often spent evenings recording letters and exam reports with a microcassette recorder, so that their secretary could later type them and archive them. That was a long time ago, but according to a 2017 study by the University of Wisconsin and the American Medical Association, primary care physicians in the US spend a staggering 6 hours per day entering their medical reports in electronic health record (EHR) systems, now a standard requirement at healthcare providers.

I don’t think that anyone would argue that doctors should go back to paper reports: working with digital data is so much more efficient. Still, could they be spared these long hours of administrative work? Surely, that time would be better spent engaging with patients, and getting a little extra rest after a busy day at the hospital?

Introducing Amazon Transcribe Medical
Thanks to Amazon Transcribe Medical, physicians will now be able to easily and quickly dictate their clinical notes and see their speech converted to accurate text in real-time, without any human intervention. Clinicians can use natural speech and do not have to explicitly call out punctuation like “comma” or “full stop”. This text can then be automatically fed to downstream applications such as EHR systems, or to AWS language services such as Amazon Comprehend Medical for entity extraction.

In the spirit of fully managed services, Transcribe Medical frees you from any infrastructure work, and lets you scale effortlessly while only paying for what you actually use: no upfront fees for costly licenses! As you would expect, Transcribe Medical is also HIPAA compliant.

From a technical perspective, all you have to do is capture audio using your device’s microphone, and send PCM audio to a streaming API based on the popular Websocket protocol. This API will respond with a series of JSON blobs with the transcribed text, as well as word-level time stamps, punctuation, etc. Optionally, you can save this data to an Amazon Simple Storage Service (S3) bucket.

Amazon Transcribe Medical In Action
Let’s do a quick demo with medical text from MT Samples, a great collection of real-life anonymized medical transcripts that are free to use and distribute.

I’m using a streaming application modified for Transcribe Medical, and you’ll be able to do the same in the AWS console. You can view a video recording of this demo here.

Now Available!
You can start using Amazon Transcribe Medical today in the US East (N. Virginia) and US West (Oregon) regions.

Give it a try, and please share your feedback in the AWS forum for Amazon Transcribe, or with your usual AWS support contacts.

– Julien

Amazon Transcribe Now Supports Mandarin and Russian

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-transcribe-now-supports-mandarin-and-russian/

As speech is central to human interaction, artificial intelligence research has long focused on speech recognition, the first step in designing and building systems allowing humans to interact intuitively with machines. The diversity in languages, accents and voices makes this an incredibly difficult problem, requiring expert skills, extremely large data sets, and vast amounts of computing power to train efficient models.

In order to help organizations and developers use speech recognition in their applications, we launched Amazon Transcribe at AWS re:Invent 2017, an automatic speech recognition service. Thanks to Amazon Transcribe, customers such as VideoPeel, Echo360, or GE Appliances have been able to quickly and easily add speech recognition capabilities to their applications and devices.

A single API call is all that it takes… and you don’t need to know the first thing about machine learning. You can analyze audio files stored in Amazon Simple Storage Service (S3) and have the service return a text file of the transcribed speech. You can also send a live audio stream to Amazon Transcribe and receive a stream of transcripts in real time.

Since launch, the team has constantly added new languages, and today we are happy to announce support for Mandarin and Russian, bringing the total number of supported languages to 16.

Introducing Mandarin
Working with Amazon Transcribe is extremely simple: let me show you how to get started in just a few minutes.

Let’s try Mandarin first. Starting from this Little Red Riding Hood video, I extracted the audio track, saved it in MP3 format, and uploaded it to one of my Amazon Simple Storage Service (S3) buckets. Here’s the actual file.

Then, I started a transcription job using the AWS CLI:

$ aws transcribe start-transcription-job--media MediaFileUri=https://s3-us-west-2.amazonaws.com/jsimon-transcribe-demo/little_red_riding_hood-mandarin.mp3 --media-format mp3 --language-code zh-CN --transcription-job-name little_red_riding_hood-mandarin

After a few minutes, the job is complete. Looking at the AWS console, I can either download it using the URL provided by Amazon Transcribe, or read it directly.

Unfortunately, I don’t speak Mandarin, but using Amazon Translate, this text is about a sick grandmother and a big bad wolf, so it looks like Amazon Transcribe did its job!

Introducing Russian
Let’s try Russian now, using the dialogue in this short video.

Здравствуйте!Greetings!
Добрый день!Good day!
Давайте познакомимся. Меня зовут Слава.Let’s introduce ourselves. My name is Slava.
Очень приятно, а меня – Наташа.Nice to meet you, and mine – Natasha.
Наташа, кто вы по профессии?Natasha, what is your profession?
Я врач. А вы?I (am a) doctor. And you?
Я инженер.I (am an) engineer.

This time, I will ask Amazon Transcribe to perform speaker identification too.

$ aws transcribe start-transcription-job --media MediaFileUri=https://s3-us-west-2.amazonaws.com/jsimon-transcribe-demo/russian-dialogue.mp3 --media-format mp3 --language-code ru-RU --transcription-job-name russian_dialogue --settings ShowSpeakerLabels=true,MaxSpeakerLabels=2

Here is the result.

As you can see, not only has Amazon Transcribe faithfully converted speech to text, it has also correctly assigned each sentence to the correct speaker.

Now Available!
You can start using these two new languages today in the following regions:

  • Americas: US East (Ohio), US East (N. Virginia), US West (N. California), US West (Oregon), AWS GovCloud (US-West), Canada (Central), South America (Sao Paulo).
  • Europe: EU (Frankfurt), EU (Ireland), EU (London), EU (Paris).
  • Asia Pacific: Asia Pacific (Mumbai), Asia Pacific (Seoul), Asia Pacific (Singapore), Asia Pacific (Sydney).

The free tier covers 60 minutes for the first 12 months, starting from your first transcription request.

As always, we’d love to hear your feedback: please post it to the AWS forum for Amazon Transcribe, or send it through your usual AWS contacts.

Julien;

Amazon Transcribe Streaming Now Supports WebSockets

Post Syndicated from Brandon West original https://aws.amazon.com/blogs/aws/amazon-transcribe-streaming-now-supports-websockets/

I love services like Amazon Transcribe. They are the kind of just-futuristic-enough technology that excites my imagination the same way that magic does. It’s incredible that we have accurate, automatic speech recognition for a variety of languages and accents, in real-time. There are so many use-cases, and nearly all of them are intriguing. Until now, the Amazon Transcribe Streaming API available has been available using HTTP/2 streaming. Today, we’re adding WebSockets as another integration option for bringing real-time voice capabilities to the things you build.

In this post, we are going to transcribe speech in real-time using only client-side JavaScript in a browser. But before we can build, we need a foundation. We’ll review just enough information about Amazon Transcribe, WebSockets, and the Amazon Transcribe Streaming API to broadly explain the demo. For more detailed information, check out the Amazon Transcribe docs.

If you are itching to see things in action, you can head directly to the demo, but I recommend taking a quick read through this post first.

What is Amazon Transcribe?

Amazon Transcribe applies machine learning models to convert speech in audio to text transcriptions. One of the most powerful features of Amazon Transcribe is the ability to perform real-time transcription of audio. Until now, this functionality has been available via HTTP/2 streams. Today, we’re announcing the ability to connect to Amazon Transcribe using WebSockets as well.

For real-time transcription, Amazon Transcribe currently supports British English (en-GB), US English (en-US), French (fr-FR), Canadian French (fr-CA), and US Spanish (es-US).

What are WebSockets?

WebSockets are a protocol built on top of TCP, like HTTP. While HTTP is great for short-lived requests, it hasn’t historically been good at handling situations that require persistent real-time communications. While an HTTP connection is normally closed at the end of the message, a WebSocket connection remains open. This means that messages can be sent bi-directionally with no bandwidth or latency added by handshaking and negotiating a connection. WebSocket connections are full-duplex, meaning that the server can client can both transmit data at the same time. They were also designed for cross-domain usage, so there’s no messing around with cross-origin resource sharing (CORS) as there is with HTTP.

HTTP/2 streams solve a lot of the issues that HTTP had with real-time communications, and the first Amazon Transcribe Streaming API available uses HTTP/2. WebSocket support opens Amazon Transcribe Streaming up to a wider audience, and makes integrations easier for customers that might have existing WebSocket-based integrations or knowledge.

How the Amazon Transcribe Streaming API Works

Authorization

The first thing we need to do is authorize an IAM user to use Amazon Transcribe Streaming WebSockets. In the AWS Management Console, attach the following policy to your user:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "transcribestreaming",
            "Effect": "Allow",
            "Action": "transcribe:StartStreamTranscriptionWebSocket",
            "Resource": "*"
        }
    ]
}

Authentication

Transcribe uses AWS Signature Version 4 to authenticate requests. For WebSocket connections, use a pre-signed URL, that contains all of the necessary information is passed as query parameters in the URL. This gives us an authenticated endpoint that we can use to establish our WebSocket.

Required Parameters

All of the required parameters are included in our pre-signed URL as part of the query string. These are:

  • language-code: The language code. One of en-US, en-GB, fr-FR, fr-CA, es-US.
  • sample-rate: The sample rate of the audio, in Hz. Max of 16000 for en-US and es-US, and 8000 for the other languages.
  • media-encoding: Currently only pcm is valid.
  • vocabulary-name: Amazon Transcribe allows you to define custom vocabularies for uncommon or unique words that you expect to see in your data. To use a custom vocabulary, reference it here.

Audio Data Requirements

There are a few things that we need to know before we start sending data. First, Transcribe expects audio to be encoded as PCM data. The sample rate of a digital audio file relates to the quality of the captured audio. It is the number of times per second (Hz) that the analog signal is checked in order to generate the digital signal. For high-quality data, a sample rate of 16,000 Hz or higher is recommended. For lower-quality audio, such as a phone conversation, use a sample rate of 8,000 Hz. Currently, US English (en-US) and US Spanish (es-US) support sample rates up to 48,000 Hz. Other languages support rates up to 16,000 Hz.

In our demo, the file lib/audioUtils.js contains a downsampleBuffer() function for reducing the sample rate of the incoming audio bytes from the browser, and a pcmEncode() function that takes the raw audio bytes and converts them to PCM.

Request Format

Once we’ve got our audio encoding as PCM data with the right sample rate, we need to wrap it in an envelope before we send it across the WebSocket connection. Each messages consists of three headers, followed by the PCM-encoded audio bytes in the message body. The entire message is then encoded as a binary event stream message and sent. If you’ve used the HTTP/2 API before, there’s one difference that I think makes using WebSockets a bit more straightforward, which is that you don’t need to cryptographically sign each chunk of audio data you send.

Response Format

The messages we receive follow the same general format: they are binary-encoded event stream messages, with three headers and a body. But instead of audio bytes, the message body contains a Transcript object. Partial responses are returned until a natural stopping point in the audio is determined. For more details on how this response is formatted, check out the docs and have a look at the handleEventStreamMessage() function in main.js.

Let’s See the Demo!

Now that we’ve got some context, let’s try out a demo. I’ve deployed it using AWS Amplify Console – take a look, or push the button to deploy your own copy. Enter the Access ID and Secret Key for the IAM User you authorized earlier, hit the Start Transcription button, and start speaking into your microphone.

Deploy to Amplify Console

The complete project is available on GitHub. The most important file is lib/main.js. This file defines all our required dependencies, wires up the buttons and form fields in index.html, accesses the microphone stream, and pushes the data to Transcribe over the WebSocket. The code has been thoroughly commented and will hopefully be easy to understand, but if you have questions, feel free to open issues on the GitHub repo and I’ll be happy to help. I’d like to extend a special thanks to Karan Grover, Software Development Engineer on the Transcribe team, for providing the code that formed that basis of this demo.

Intuit: Serving Millions of Global Customers with Amazon Connect

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/intuit-serving-millions-of-global-customers-with-amazon-connect/

Recently, Bill Schuller, Intuit Contact Center Domain Architect met with AWS’s Simon Elisha to discuss how Intuit manages its customer contact centers with AWS Connect.

As a 35-year-old company with an international customer base, Intuit is widely known as the maker of Quick Books and Turbo Tax, among other software products. Its 50 million customers can access its global contact centers not just for password resets and feature explanations, but for detailed tax interpretation and advice. As you can imagine, this presents a challenge of scale.

Using Amazon Connect, a self-service, cloud-based contact center service, Intuit has been able to provide a seamless call-in experience to Intuit customers from around the globe. When a customer calls in to Amazon Connect, Intuit is able to do a “data dip” through AWS Lambda out to the company’s CRM system (in this case, SalesForce) in order to get more information from the customer. At this point, Intuit can leverage other services like Amazon Lex for national language feedback and then get the customer to the right person who can help. When the call is over, instead of having that important recording of the call locked up in a proprietary system, the audio is moved into an S3 bucket, where Intuit can do some post-call processing. It can also be sent it out to third parties for analysis, or Intuit can use Amazon Transcribe or Amazon Comprehend to get a transcription or sentiment analysis to understand more about what happened during that particular call.

Watch the video below to understand the reasons why Intuit decided on this set of AWS services (hint: it has to do with the ability to experiment with speed and scale but without the cost overhead).

*Check out more This Is My Architecture video series.

About the author

Annik StahlAnnik Stahl is a Senior Program Manager in AWS, specializing in blog and magazine content as well as customer ratings and satisfaction. Having been the face of Microsoft Office for 10 years as the Crabby Office Lady columnist, she loves getting to know her customers and wants to hear from you.