Tag Archives: Amazon Translate

Amazon Translate now supports Office documents

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-translate-now-supports-office-documents/

Whether your organization is a multinational enterprise present in many countries, or a small startup hungry for global success, translating your content into local languages may be an enduring challenge. Indeed, text data often comes in many formats, and processing it may require several different tools. Also, because these tools may not all support the same language pairs, you may have to convert certain documents to intermediate formats, or even resort to manual translation. All of these issues add extra cost, and create unnecessary complexity in building consistent and automated translation workflows.

Amazon Translate aims to solve these problems in a simple and cost-effective fashion. Using either the AWS console or a single API call, Amazon Translate makes it easy for AWS customers to quickly and accurately translate text in 55 different languages and variants.

Earlier this year, Amazon Translate introduced batch translation for plain text and HTML documents. Today, I’m very happy to announce that batch translation now also supports Office documents, namely .docx, .xlsx and .pptx files as defined by the Office Open XML standard.

Introducing Amazon Translate for Office Documents
The process is extremely simple. As you would expect, source documents have to be stored in an Amazon Simple Storage Service (S3) bucket. Please note that no document may be larger than 20 Megabytes, or have more than 1 million characters.

Each batch translation job processes a single file type and a single source language. Thus, we recommend that you organize your documents in a logical fashion in S3, storing each file type and each language under its own prefix.

Then, using either the AWS console or the StartTextTranslationJob API in one of the AWS language SDKs, you can launch a translation job, passing:

  • the input and output location in S3,
  • the file type,
  • the source and target languages.

Once the job is complete, you can collect translated files at the output location.
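For illustration, here is a minimal sketch of that call using the AWS SDK for Python (Boto3); the job name, bucket paths, and IAM role ARN are placeholders rather than values from this walkthrough:

import boto3

translate = boto3.client('translate')

# Start a batch job translating .docx files from English to French.
# The job name, bucket paths, and role ARN below are placeholders.
response = translate.start_text_translation_job(
    JobName='office-docs-en-fr',
    InputDataConfig={
        'S3Uri': 's3://my-input-bucket/docx/en/',
        'ContentType': 'application/vnd.openxmlformats-officedocument.wordprocessingml.document'
    },
    OutputDataConfig={'S3Uri': 's3://my-output-bucket/translated/'},
    DataAccessRoleArn='arn:aws:iam::123456789012:role/TranslateBatchRole',
    SourceLanguageCode='en',
    TargetLanguageCodes=['fr']
)
print(response['JobId'], response['JobStatus'])

You can then poll describe_text_translation_job with the returned JobId to check when the job completes.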

Let’s do a quick demo!

Translating Office Documents
Using the S3 console, I first upload a few .docx documents to one of my buckets.

S3 files

Then, moving to the Translate console, I create a new batch translation job, giving it a name, and selecting both the source and target languages.

Creating a batch job

Then, I define the location of my documents in S3, and their format, .docx in this case. Optionally, I could apply a custom terminology, to make sure specific words are translated exactly the way that I want.

Likewise, I define the output location for translated files. Please make sure that this path exists, as Translate will not create it for you.

Creating a batch job

Finally, I set the AWS Identity and Access Management (IAM) role, giving my Translate job the appropriate permissions to access S3. Here, I use an existing role that I created previously, but you could also let Translate create one for you. Then, I click on ‘Create job’ to launch the batch job.

Creating a batch job

The job starts immediately.

Batch job running

A little while later, the job is complete. All three documents have been translated successfully.

Viewing a completed job

Translated files are available at the output location, as visible in the S3 console.

Viewing translated files

Downloading one of the translated files, I can open it and compare it to the original version.

Comparing files

For small scale use, it’s extremely easy to use the AWS console to translate Office files. Of course, you can also use the Translate API to build automated workflows.

Automating Batch Translation
In a previous post, we showed you how to automate batch translation with an AWS Lambda function. You could expand on this example, and add language detection with Amazon Comprehend. For instance, here’s how you could combine the DetectDominantLanguage API with the Python-docx open source library to detect the language of .docx files.

import boto3
from docx import Document  # from the python-docx package

# Open the document and gather enough text for a reliable detection
document = Document('blog_post.docx')
text = ' '.join(p.text for p in document.paragraphs if p.text.strip())[:5000]

# Ask Comprehend for the dominant language and print the top candidate
comprehend = boto3.client('comprehend')
response = comprehend.detect_dominant_language(Text=text)
top_language = response['Languages'][0]
code = top_language['LanguageCode']
score = top_language['Score']
print("%s, %f" % (code, score))

Pretty simple! You could also detect the type of each file based on its extension, and move it to the proper input location in S3. Then, you could schedule a Lambda function with CloudWatch Events to periodically translate files, and send a notification by email. Of course, you could use AWS Step Functions to build more elaborate workflows. Your imagination is the limit!
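To illustrate that routing step, here is a minimal sketch, assuming a hypothetical <file-type>/<language-code>/ prefix layout in the input bucket and reusing a language code detected as above; the function name and layout are illustrative only:

import os

import boto3

s3 = boto3.client('s3')

def route_document(bucket, key, language_code):
    """Move an uploaded file under a <file-type>/<language> prefix so that each
    batch translation job only sees a single format and source language."""
    extension = os.path.splitext(key)[1].lstrip('.').lower()   # e.g. 'docx'
    destination = f"{extension}/{language_code}/{os.path.basename(key)}"
    s3.copy_object(Bucket=bucket,
                   CopySource={'Bucket': bucket, 'Key': key},
                   Key=destination)
    s3.delete_object(Bucket=bucket, Key=key)
    return destination

For example, route_document('my-input-bucket', 'incoming/report.docx', 'en') would place the file at docx/en/report.docx.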

Getting Started
You can start translating Office documents today in the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), Europe (Ireland), Europe (London), Europe (Frankfurt), and Asia Pacific (Seoul).

If you’ve never tried Amazon Translate, did you know that the free tier offers 2 million characters per month for the first 12 months, starting from your first translation request?

Give it a try, and let us know what you think. We’re looking forward to your feedback: please post it to the AWS Forum for Amazon Translate, or send it to your usual AWS support contacts.

– Julien

Translating documents at enterprise scale with serverless

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/translating-documents-at-enterprise-scale-with-serverless/

For organizations operating in multiple countries, helping customers in different languages is an everyday reality. But in many IT systems, data remains static in a single language, making it difficult or impossible for international customers to use. In this blog post, I show how you can automate language translation at scale to solve a number of common enterprise problems.

Many types of data are good targets for translation: product catalog information to share with a geographically broad customer base, customer emails and interactions in multiple languages to translate back to a single language for analytics, or even resource files for mobile applications, so that string processing into different languages can be automated during the build process.

Building the machine learning models for language translation is extraordinarily complex, so fortunately you have a pay-as-you-go service available in Amazon Translate. This can accurately translate text between 54 languages, and automatically detects the source language.

Developing a scalable translation solution for thousands of documents can be challenging using traditional, server-based architecture. Using a serverless approach, this becomes much easier since you can use storage and compute services that scale for you – Amazon S3 and AWS Lambda:

Integrating S3 with Translate via Lambda.

In this post, I show two solutions that provide an event-based architecture for automated translation. In the first, S3 invokes a Lambda function when objects are stored. It immediately reads the file and requests a translation from Amazon Translate. The second solution explores a more advanced method for scaling to large numbers of documents, queuing the requests and tracking their state. The walkthrough creates resources covered in the AWS Free Tier but you may incur costs for usage.

To set up both example applications, visit the GitHub repo and follow the instructions in the README.md file. Both applications use the AWS Serverless Application Model (SAM) to make it easy to deploy in your AWS account.

Translating in near real time

In the first application, the workflow is straightforward. The source text is sent immediately to Translate for processing, and the result is saved back into the S3 bucket. It provides near real-time translation whenever an object is saved. This uses the following architecture:

Architecture for the first example application.

  1. The source text is saved in the Batching S3 bucket.
  2. The S3 put event invokes the Batching Lambda function. Since Translate has a limit of 5,000 characters per request, it slices the contents of the input into parts small enough for processing (a sketch of this slicing step follows the list).
  3. The resulting parts are saved in the Translation S3 bucket.
  4. The S3 put events invoke the Translation function, which scales up concurrently depending on the number of parts.
  5. Amazon Translate returns the translations back to the Lambda function, which saves the results in the Translation bucket.
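As a rough sketch of that slicing step (this is not the repo’s actual Batching code; it assumes individual lines stay below the limit and uses a conservative 4,500-character cap):

def split_text(text, max_length=4500):
    """Split text into chunks small enough for a single translation request,
    breaking on line boundaries so sentences are not cut in half."""
    chunks, current = [], ''
    for line in text.splitlines(keepends=True):
        if current and len(current) + len(line) > max_length:
            chunks.append(current)
            current = ''
        current += line
    if current:
        chunks.append(current)
    return chunks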

The repo’s SAM template allows you to specify a list of target languages, as a space-delimited list of supported language codes. In this case, any text uploaded to S3 is translated into French, Spanish, and Italian:

Parameters:
  TargetLanguage:
    Type: String
    Default: 'fr es it'

Testing the application

  1. Deploy the first application by following the README.md in the GitHub repo. Note the application’s S3 Translation and Batching bucket names shown in the output:

    Output values after SAM deployment.
  2. The testdata directory contains several sample text files. Change into this directory, then upload coffee.txt to the S3 bucket, replacing your-bucket below with your Translation bucket name:
    cd ./testdata/
    aws s3 cp ./coffee.txt s3://your-bucket
  3. The application invokes the translation workflow, and within a couple of seconds you can list the output files in the translations folder:
    aws s3 ls s3://your-bucket/translations/

    Translations output.

  4. Create an output directory, then download the translations to your local machine to view the contents:
    mkdir output
    aws s3 cp s3://your-bucket/translations/ ./output/ --recursive
    more ./output/coffee-fr.txt
    more ./output/coffee-es.txt
    more ./output/coffee-it.txt
  5. For the next step, translate several text files containing test data. Copy these to the Translation bucket, replacing your-bucket below with your bucket name:
    aws s3 cp ./ s3://your-bucket --include "*.txt" --exclude "*/*" --recursive
  6. After a few seconds, list the files in the translations folder to see your translated files:
    aws s3 ls s3://your-bucket/translations/

    Listing the translated files.

  7. Finally, translate a larger file using the batching process. Copy this file to the Batching S3 bucket (replacing your-bucket with this bucket name):
    cd ../testdata-batching/
    aws s3 cp ./your-filename.txt s3://your-bucket
  8. Since this is a larger file, the batching Lambda function breaks it apart into smaller text files in the Translation bucket. List these files in the terminal, together with their translations:
    aws s3 ls s3://your-bucket
    aws s3 ls s3://your-bucket/translations/

    Listing the output files.

In this example, you can translate a reasonable number of text files for a trivial use-case. However, in an enterprise environment where there could be thousands of files in a single bucket, you need a more robust architecture. The second application introduces a more resilient approach.

Scaling up the translation solution

In an enterprise environment, the application must handle long documents and large quantities of documents. Amazon Translate has service limits in place per account – you can request an increase via an AWS Support Center ticket if needed. However, S3 can ingest a large number of objects quickly, so the application should decouple these two services.

The next example uses Amazon SQS for decoupling, and also introduces Amazon DynamoDB for tracking the status of each translation. The new architecture looks like this:

Decoupled translation architecture.

  1. A downstream process saves text objects in the Batching S3 bucket.
  2. The Batching function breaks these files into smaller parts, saving these in the Translation S3 bucket.
  3. When an object is saved in this bucket, it invokes the Add to Queue function. This writes a message to an SQS queue, and logs the item in a DynamoDB table (a sketch of this function follows the list).
  4. The Translation function receives messages from the SQS queue, and requests translations from the Amazon Translate service.
  5. The function updates the item as completed in the DynamoDB table, and stores the output translation in the Results S3 bucket.
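As a minimal sketch of the Add to Queue function (the environment variable names, message format, and DynamoDB attribute names are assumptions; the deployed application defines its own):

import json
import os
import urllib.parse

import boto3

sqs = boto3.client('sqs')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])   # hypothetical variable names
queue_url = os.environ['QUEUE_URL']

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Queue the document part for translation...
        sqs.send_message(QueueUrl=queue_url,
                         MessageBody=json.dumps({'bucket': bucket, 'key': key}))

        # ...and record its status so progress can be tracked
        table.put_item(Item={'id': key, 'status': 'Queued'})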

Testing the application

This test uses a much larger text document – the text version of the novel War and Peace, which is over 3 million characters long. It’s recommended that you use a shorter piece of text for the walkthrough, between 20 and 50 kilobytes, to minimize cost on your AWS bill.

  1. Deploy the second application by following the README.md in the GitHub repo, and note the application’s S3 bucket name and DynamoDB table name.
    The output values from the SAM deployment.
  2. Download your text sample and then upload it to the Batching bucket. Replace your-bucket with your bucket name and your-text.txt with your text file name:
    aws s3 cp ./your-text.txt s3://your-bucket/
  3. The batching process creates smaller files in the Translation bucket. After a few seconds, list the files in the Translation bucket (replacing the bucket name below with your own):
    aws s3 ls s3://patterns-translations-v2/ --recursive --summarize

    Listing the translated files.

  4. To see the status of the translations, navigate to the DynamoDB console. Select Tables in the left-side menu and then choose the application’s DynamoDB table. Select the Items tab:

    Listing items in the DynamoDB table.

    This shows each translation file and a status of Queue or Translated.

  5. As translations complete, these appear in the Results bucket:
    aws s3 ls s3://patterns-results-v2/ --summarize

    Listing the output translations.

How this works

In the second application, the SQS queue acts as a buffer between the Batching process and the Translation process. The Translation Lambda function fetches messages from the SQS queue when they are available, and submits the source file to Amazon Translate. This throttles the overall speed of processing.

There are configuration settings you can change in the SAM template to vary the speed of throughput:

  • Translator function: this consumes messages from the SQS queue. The BatchSize configured in the SAM template is set to one message per invocation, which is equivalent to processing one source file at a time. You can set a BatchSize value from 1 to 10, so you could increase this from the application’s default.
  • Function concurrency: the SAM template sets the Loader function’s concurrency to 1, using the ReservedConcurrentExecutions attribute. In effect, this means only one instance of the function runs at any time, and it fetches the next batch from SQS as soon as the current one finishes. The concurrency is a multiplier – as this value is increased, the translation throughput increases proportionately, if there are messages available in SQS.
  • Amazon Translate limits: the service limits in place are designed to protect you from higher-than-intended usage. If you need higher soft limits, open an AWS Support Center ticket.

Combining these settings, you have considerable control over the speed of processing. The defaults in the sample application are set at the lowest values possible so you can observe the queueing mechanism.

Conclusion

Automated translation using deep learning enables you to make documents available at scale to an international audience. For organizations operating globally, this can improve your user experience and increase customer access to your company’s products and services.

In this post, I show how you can create a serverless application to process large numbers of files stored in S3. The write operation in the S3 bucket triggers the process, and you use SQS to buffer the workload between S3 and the Amazon Translate service. This solution also uses DynamoDB to help track the state of the translated files.

Converting call center recordings into useful data for analytics

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/converting-call-center-recordings-into-useful-data-for-analytics/

Many businesses operate call centers that record conversations with customers for training or regulatory purposes. These vast collections of audio offer unique opportunities for improving customer service. However, since audio data is mostly unsearchable, it’s usually archived in these systems and never analyzed for insights.

Developing machine learning models for accurately understanding and transcribing speech is also a major challenge. These models require large datasets for optimal performance, along with teams of experts to build and maintain the software. This puts it out of reach for the majority of businesses and organizations. Fortunately, you can use AWS services to handle this difficult problem.

In this blog post, I show how you can use a serverless approach to analyze audio data from your call center. You can clone this application from the GitHub repo and modify to meet your needs. The solution uses Amazon ML services, together with scalable storage, and serverless compute. The example application has the following architecture:

The architecture for the call center audio analyzer.

For call center analysis, this application is useful to determine the types of general topics that customers are calling about. It can also detect the sentiment of the conversation, so if the call is a compliment or a complaint, you could take additional action. When combined with other metadata such as caller location or time of day, this can yield important insights to help you improve customer experience. For example, you might discover there are common service issues in a geography at a certain time of day.

To set up the example application, visit the GitHub repo and follow the instructions in the README.md file.

How the application works

A key part of the serverless solution is Amazon S3, an object store that scales to meet your storage needs. When new objects are stored, this triggers AWS Lambda functions, which scale to keep pace with S3 usage. The application coordinates activities between the S3 bucket and two managed Machine Learning (ML) services, storing the results in an Amazon DynamoDB table.

The ML services used are:

  • Amazon Transcribe, which transcribes audio data into JSON output, using a process called automatic speech recognition. This can understand 31 languages and dialects, and identify different speakers in a customer support call.
  • Amazon Comprehend, which offers sentiment analysis as one of its core features. This service returns an array of scores to estimate the probability that the input text is positive, negative, neutral, or mixed.

Sample application architecture.

  1. A downstream process, such as a call recording system, stores audio data in the application’s S3 bucket.
  2. When the MP3 objects are stored, this triggers the Transcribe function. The function creates a new job in the Amazon Transcribe service.
  3. When the transcription process finishes, Transcribe stores the JSON result in the same S3 bucket.
  4. This JSON object triggers the Sentiment function, which requests a sentiment analysis from the Comprehend service.
  5. After receiving the sentiment scores, this function stores the results in a DynamoDB table (a sketch of steps 4 and 5 follows this list).
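Here is a minimal sketch of those two steps (the table name, attribute names, and the truncation of long transcripts are assumptions, not the application’s actual code):

import json
import os
import urllib.parse

import boto3

s3 = boto3.client('s3')
comprehend = boto3.client('comprehend')
table = boto3.resource('dynamodb').Table(os.environ['TABLE_NAME'])   # hypothetical name

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Read the Transcribe JSON output and extract the transcript text
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
        transcript = json.loads(body)['results']['transcripts'][0]['transcript']

        # DetectSentiment accepts a limited amount of UTF-8 text per request,
        # so a long transcript is truncated here for simplicity
        sentiment = comprehend.detect_sentiment(Text=transcript[:4500], LanguageCode='en')

        # Store the transcript, the overall rating, and the per-category scores
        table.put_item(Item={
            'id': key,
            'transcript': transcript,
            'sentiment': sentiment['Sentiment'],
            'scores': {k: str(v) for k, v in sentiment['SentimentScore'].items()}
        })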

There is only one bucket used in the application. The two Lambda functions are triggered by the same bucket, using different object suffixes. This is configured in the SAM template, shown here:

  SentimentFunction:
    ...
      Events:
        FileUpload:
          Type: S3
          Properties:
            Bucket: !Ref InputS3Bucket
            Events: s3:ObjectCreated:*
            Filter: 
              S3Key:
                Rules:
                  - Name: suffix
                    Value: '.json'              

  TranscribeFunction:
    ... 
      Events:
        FileUpload:
          Type: S3
          Properties:
            Bucket: !Ref InputS3Bucket
            Events: s3:ObjectCreated:*
            Filter: 
              S3Key:
                Rules:
                  - Name: suffix
                    Value: '.mp3'    

Testing the application

To test the application, you need an MP3 audio file containing spoken text. For example, in my testing, I use audio files of a person reading business reviews representing positive, neutral, and negative experiences.

  1. After cloning the GitHub repo, follow the instructions in the README.md file to deploy the application. Note the name of the S3 bucket output in the deployment.SAM deployment CLI output
  2. Upload your test MP3 files using this command in a terminal, replacing your-bucket-name with the deployed bucket name:
    aws s3 cp .\ s3://your-bucket-name --recursive
    Once executed, your terminal shows the uploaded media files:

    Uploading sample media files.

  3. Navigate to the Amazon Transcribe console, and choose Transcription jobs in the left-side menu. The MP3 files you uploaded appear here as separate jobs:

    Amazon Transcribe jobs in progress
  4. Once the Status column shows all pending jobs as Complete, navigate to the DynamoDB console.
  5. Choose Tables from the left-side menu and select the table created by the deployment. Choose the Items tab:

    Sentiment scores in the DynamoDB table
    Each MP3 file appears as a separate item with a sentiment rating and a probability for each sentiment category. It also includes the transcript of the audio.

Handling multiple languages

One of the most useful aspects of serverless architecture is the ability to add functionality easily. For call centers handling multiple languages, ideally you should translate to a common language for sentiment scoring. With this application, it’s easy to add an extra step to the process to translate the transcription language to a base language:

Advanced application architecture

A new Translate Lambda function is invoked by the S3 JSON suffix filter and creates text output in a common base language. The sentiment scoring function is triggered by new objects with the suffix TXT.

In this modified case, when the MP3 audio file is uploaded to S3, you can append the language identifier as metadata to the object. For example, to upload an MP3 with a French language identifier using the AWS CLI:

aws s3 cp .\test-audio-fr.mp3 s3://your-bucket --metadata Content-Language=fr-FR

The first Lambda function passes the language identifier to the Transcribe service. In the Transcribe console, the language appears in the new job:

French transcription job complete
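To sketch what that first function might do (the handler structure, the job naming, and the fallback language are assumptions, not the application’s actual code; user metadata set with --metadata comes back lower-cased, without the x-amz-meta- prefix):

import urllib.parse

import boto3

s3 = boto3.client('s3')
transcribe = boto3.client('transcribe')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Read the language identifier from the object metadata,
        # falling back to US English if none was set
        metadata = s3.head_object(Bucket=bucket, Key=key)['Metadata']
        language = metadata.get('content-language', 'en-US')

        transcribe.start_transcription_job(
            TranscriptionJobName=key.replace('/', '-').replace('.', '-'),
            Media={'MediaFileUri': f's3://{bucket}/{key}'},
            MediaFormat='mp3',
            LanguageCode=language,
            OutputBucketName=bucket   # the JSON result lands back in the same bucket
        )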

After the job finishes, the JSON output is stored in the same S3 bucket. It shows the transcription from the French language audio:

French transcription output

The new Translate Lambda function passes the transcript value into the Amazon Translate service. This converts the French to English and saves the translation as a text file. The sentiment Lambda function now uses the contents of this text file to generate the sentiment scores.
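A minimal sketch of that translation step might look like this (the truncation and output key naming are simplifications; a full implementation would split long transcripts into smaller requests):

import json
import urllib.parse

import boto3

s3 = boto3.client('s3')
translate = boto3.client('translate')

def handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Read the transcript from the Transcribe JSON output
        body = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
        transcript = json.loads(body)['results']['transcripts'][0]['transcript']

        # Translate to the base language; the result is saved as a .txt object,
        # which in turn triggers the sentiment scoring function
        translated = translate.translate_text(Text=transcript[:4000],
                                              SourceLanguageCode='auto',
                                              TargetLanguageCode='en')
        s3.put_object(Bucket=bucket,
                      Key=key.replace('.json', '.txt'),
                      Body=translated['TranslatedText'].encode('utf-8'))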

This approach allows you to accept audio in a wide range of spoken languages but standardize your analytics in one base language.

Developing for extensibility

You might want to take action on phone calls that have a negative sentiment score, or publish scores to other applications in your organization. This architecture makes it simple to extend functionality once DynamoDB saves the sentiment scores. By using DynamoDB Streams, you can invoke a Lambda function each time a record is created or updated in the underlying DynamoDB table:

Adding notifications to the application

In this case, the routing function could trigger an email via Amazon SES where the sentiment score is negative. For example, this could email a manager to follow up with the customer. Alternatively, you may choose to publish all scores and results to any downstream application with Amazon EventBridge. By publishing events to the default event bus, you can allow consuming applications to build new functionality without needing any direct integration.
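As a sketch of such a routing function consuming the DynamoDB stream (the event source name, the detail shape, and the attribute names are made up for illustration; the stream must be configured to include new images):

import json

import boto3

events = boto3.client('events')

def handler(event, context):
    for record in event['Records']:
        if record['eventName'] not in ('INSERT', 'MODIFY'):
            continue
        item = record['dynamodb']['NewImage']

        # Publish every new score to the default event bus; downstream rules
        # can then decide what to do, e.g. email a manager on negative calls
        events.put_events(Entries=[{
            'Source': 'callcenter.analyzer',        # hypothetical source name
            'DetailType': 'SentimentScored',
            'Detail': json.dumps({
                'id': item['id']['S'],
                'sentiment': item['sentiment']['S']
            })
        }])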

Deferred execution in Amazon Transcribe

The services used in the example application are all highly scalable and highly available, and can handle significant amounts of data. Amazon Transcribe allows up to 100 concurrent transcription jobs – see the service limits and quotas for more information.

The service also provides a mechanism for deferred execution, which allows you to hold jobs in a queue. When the number of executing jobs falls below the concurrent execution limit, the service takes the next job from this queue. This effectively means you can submit any number of jobs to the Transcribe service, and it manages the queue and processing automatically.

To use this feature, there are two additional attributes used in the startTranscriptionJob method of the AWS.TranscribeService object. When added to the Lambda handler in the Transcribe function, the code looks like this:

Deferred execution for Amazon Transcribe

After setting AllowDeferredExecution to true, you must also provide an IAM role ARN in the DataAccessRoleArn attribute. For more information on how to use this feature, see the Transcribe documentation for job execution settings.
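The screenshot above shows the JavaScript SDK; as a rough boto3 equivalent (the job name, media location, and role ARN are placeholders), the same two settings look like this:

import boto3

transcribe = boto3.client('transcribe')

# Placeholders: job name, media location, and role ARN are illustrative only
transcribe.start_transcription_job(
    TranscriptionJobName='call-recording-0001',
    Media={'MediaFileUri': 's3://my-audio-bucket/call-recording-0001.mp3'},
    MediaFormat='mp3',
    LanguageCode='en-US',
    OutputBucketName='my-audio-bucket',
    JobExecutionSettings={
        'AllowDeferredExecution': True,    # queue the job if the concurrency limit is reached
        'DataAccessRoleArn': 'arn:aws:iam::123456789012:role/TranscribeDataAccessRole'
    }
)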

Conclusion

In this blog post, I show how to transcribe the content of audio files and calculate a sentiment score. This can be useful for organizations wanting to analyze saved audio for customer calls, webinars, or team meetings.

This solution uses Amazon ML services to handle the audio and text analysis, and serverless services like S3 and Lambda to manage the storage and business logic. The serverless application here can scale to handle large amounts of production data. You can also easily extend the application to provide new functionality, built specifically for your organization’s use-case.

To learn more about building serverless applications at scale, visit the AWS Serverless website.

22 New Languages And Variants, 6 New Regions For Amazon Translate

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/22-new-languages-and-variants-6-new-regions-for-amazon-translate/

Just a few weeks ago, I told you about 7 new languages supported by Amazon Translate, our fully managed service for machine translation. Well, here I am again, announcing no less than 22 new languages and variants, as well as 6 additional AWS Regions where Translate is now available.

Introducing 22 New Languages And Variants
That’s what I call an update! In addition to existing languages, Translate now supports: Afrikaans, Albanian, Amharic, Azerbaijani, Bengali, Bosnian, Bulgarian, Croatian, Dari, Estonian, Canadian French, Georgian, Hausa, Latvian, Pashto, Serbian, Slovak, Slovenian, Somali, Swahili, Tagalog, and Tamil. Congratulations if you can name all countries and regions of origin: I couldn’t!

With these, Translate now supports a total of 54 languages and variants, and 2804 language pairs. The full list is available in the documentation.

Whether you are expanding your retail operations globally like Regatta, analyzing employee surveys like Siemens, or enabling multilingual chat in customer engagement like Verint, the new language pairs will help you further streamline and automate your translation workflows, by delivering fast, high-quality, and affordable language translation.

Introducing 6 New AWS Regions
In addition to existing regions, you can now use Translate in US West (N. California), Europe (London), Europe (Paris), Europe (Stockholm), Asia Pacific (Hong Kong) and Asia Pacific (Sydney). This brings to 17 the number of regions where Translate is available.

This expansion is great news for many customers who will now be able to translate data in the region where it’s stored, without having to invoke the service in another region. Again, this will make workflows simpler, faster, and even more cost-effective.

Using Amazon Translate
In the last post, I showed you how to use Translate with the AWS SDK for C++. In the continued spirit of language diversity, let’s use the SDK for Ruby this time. Just run gem install aws-sdk to install it.

The simple program below opens a text file, then reads and translates one line at a time. As you can see, translating only requires one simple API call. Of course, it’s the same with other programming languages: call an API and get the job done!

require 'aws-sdk'

if ARGV.length != 2
  puts "Usage: translate.rb <filename> <target language code>"
  exit
end

translate = Aws::Translate::Client.new(region: 'eu-west-1')

File.open(ARGV[0], "r") do |f|
  f.each_line do |line|
    resp = translate.translate_text({
      text: line,
      source_language_code: "auto",
      target_language_code: ARGV[1],
    })
    puts(resp.translated_text)
  end
end

Here’s an extract from “Notes on Structured Programming”, a famous computer science paper published by E.W. Dijkstra in 1972.

In my life I have seen many programming courses that were essentially like the usual kind of driving lessons, in which one is taught how to handle a car instead of how to use a car to reach one’s destination. My point is that a program is never a goal in itself; the purpose of a program is to evoke computations and the purpose of the computations is to establish a desired effect. Although the program is the final product made by the programmer, the possible computations evoked by it – the “making” of which is left to the machine! – are the true subject matter of his trade. For instance, whenever a programmer states that his program is correct, he really makes an assertion about the computations it may evoke.

Let’s translate it to a few languages: how about Albanian, Hausa, Pashto and Tagalog?

$ ruby translate.rb dijkstra.txt sq
Në jetën time kam parë shumë kurse programimi që ishin në thelb si lloji i zakonshëm i mësimeve të vozitjes, në të cilën mësohet se si të merret me një makinë në vend se si të përdorësh një makinë për të arritur destinacionin e dikujt. Pika ime është se një program nuk është kurrë një qëllim në vetvete; qëllimi i një programi është të ndjell llogaritjet dhe qëllimi i llogaritjeve është të krijojë një efekt të dëshiruar. Megjithëse programi është produkti përfundimtar i bërë nga programuesi, llogaritjet e mundshme të evokuara nga ai - “bërja” e të cilit i është lënë makinë! - janë çështja e vërtetë subjekt i tregtisë së tij. Për shembull, sa herë që një programues thotë se programi i tij është i saktë, ai me të vërtetë bën një pohim në lidhje me llogaritjet që mund të ndjell.

$ ruby translate.rb dijkstra.txt ha
A rayuwata na ga kwasa-kwasai da dama da suka kasance da gaske kamar irin darussan tuki da aka saba da su, inda ake koya wa mutum yadda zai rike mota maimakon yadda zai yi amfani da mota don kaiwa mutum makoma. Dalilina shi ne, shirin ba shi da wata manufa a kanta; manufar shirin shi ne tayar da komfuta kuma manufar ƙididdigar ita ce kafa tasirin da ake so. Ko da yake shirin shine samfurin karshe da mai shiryawa ya yi, ƙididdigar da za a iya amfani da ita - “yin” wanda aka bar shi zuwa na'ura! - su ne batun gaskiya game da cinikinsa. Alal misali, duk lokacin da mai shiryawa ya ce shirinsa daidai ne, yana yin tabbaci game da ƙididdigar da zai iya fitarwa.

$ ruby translate.rb dijkstra.txt ps
زما په ژوند کې ما د پروګرام کولو ډیری کورسونه لیدلي دي چې په اصل کې د معمول ډول ډول چلولو درسونو په څیر وو، په کوم کې چې دا درس ورکول کیږي چې څنګه د موټر سره معامله وکړي ترڅو د چا منزل ته ورسیږي. زما ټکی دا دی چې یو پروګرام هېڅکله هم په ځان کې هدف نه دی؛ د یوه پروګرام هدف دا دی چې محاسبه راوباسي، او د محاسبې هدف دا دی چې یو مطلوب اثر رامنځته کړي. که څه هم دا پروګرام وروستی محصول دی چې د پروګرام لخوا جوړ شوی، هغه ممکنه حسابونه چې د هغه لخوا رامینځته شوي - د چا «جوړولو» ماشین ته پریښودل کیږي! - اصلي مسله د هغه د سوداګرۍ موضوع ده. د مثال په توګه، کله چې یو پروګرام کوونکی وايي چې د هغه پروګرام سم دی، هغه په حقیقت کې د هغه محاسبې په اړه یو ادعا کوي چې هغه یې کولی شي.

$ ruby translate.rb dijkstra.txt tl
Sa aking buhay nakita ko ang maraming mga kurso sa programming na karaniwang tulad ng karaniwang uri ng mga aralin sa pagmamaneho, kung saan itinuturo kung paano haharapin ang isang kotse upang makapunta sa patutunguhan ng isang tao. Ang aking punto ay ang isang programa ay hindi kailanman naglalayong mismo; ang layunin ng isang programa ay upang gumuhit ng mga kalkulasyon, at ang layunin ng accounting ay upang lumikha ng nais na epekto. Kahit na ang programa ay ang huling produkto na nilikha ng programa, ang mga posibleng kalkulasyon na nilikha niya - na nag-iiwan ng isang tao na “bumuo” ng makina! - Ang pangunahing isyu ay ang paksa ng kanyang negosyo. Halimbawa, kapag sinabi ng isang programmer na tama ang kanyang programa, talagang gumagawa siya ng claim tungkol sa pagkalkula na maaari niyang gawin.

Available Now!
The new languages and the new regions are available today. If you’ve never tried Amazon Translate, did you know that the free tier offers 2 million characters per month for the first 12 months, starting from your first translation request?

Also, which languages should Translate support next? We’re looking forward to your feedback: please post it to the AWS Forum for Amazon Translate, or send it to your usual AWS support contacts.

Julien

New languages for Amazon Translate: Greek, Hungarian, Romanian, Thai, Ukrainian, Urdu and Vietnamese

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/new-languages-for-amazon-translate-greek-hungarian-romanian-thai-ukrainian-urdu-and-vietnamese/

Technical Evangelists travel quite a lot, and the number one question that we get from customers when presenting Amazon Translate is: “Is my native language supported?”. Well, I’m happy to announce that starting today, we’ll be able to answer “yes” if your language is Greek, Hungarian, Romanian, Thai, Ukrainian, Urdu or Vietnamese. In fact, using Amazon Translate, we could even say “ναί”, “igen”, “da”, “ใช่”, “так”, “جی ہاں” and “có”… hopefully with a decent accent!

With these additions, Amazon Translate now supports 32 languages: Arabic, Chinese (Simplified), Chinese (Traditional), Czech, Danish, Dutch, English, Finnish, French, German, Greek, Hebrew, Hindi, Hungarian, Indonesian, Italian, Japanese, Korean, Malay, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Spanish, Swedish, Thai, Turkish, Ukrainian, Urdu and Vietnamese.

Between these languages, the service supports 987 translation combinations: you can see the full list of supported language pairs on this documentation page.

Using Amazon Translate
Amazon Translate is extremely simple to use. Let’s quickly test it in the AWS console on one of my favourite poems:

Developers will certainly prefer to invoke the TranslateText API. Here’s an example with the AWS CLI.

$ aws translate translate-text --source-language-code auto --target-language-code hu --text "Les sanglots longs des violons de l’automne blessent mon coeur d’une langueur monotone"
{
    "TranslatedText": "Az őszi hegedű hosszú zokogása monoton bágyadtsággal fáj a szívem",
    "SourceLanguageCode": "fr",
    "TargetLanguageCode": "hu"
}

Of course, this API is also available in any of the AWS SDKs. In the continued spirit of language diversity, how about an example in C++? Here’s a short program translating a text file stored on disk.

#include <aws/core/Aws.h>
#include <aws/core/utils/Outcome.h>
#include <aws/translate/TranslateClient.h>
#include <aws/translate/model/TranslateTextRequest.h>
#include <aws/translate/model/TranslateTextResult.h>

#include <fstream>
#include <iostream>
#include <string>

#define MAX_LINE_LENGTH 5000

int main(int argc, char **argv) {
  if (argc != 4) {
    std::cout << "Usage: translate_text_file 'target language code' 'input file' 'output file'"
         << std::endl;
    return -1;
  }

  const Aws::String target_language = argv[1];
  const std::string input_file = argv[2];
  const std::string output_file = argv[3];

  std::ifstream fin(input_file.c_str(), std::ios::in);
  if (!fin.good()) {
    std::cerr << "Input file is invalid." << std::endl;
    return -1;
  }

  std::ofstream fout(output_file.c_str(), std::ios::out);
  if (!fout.good()) {
    std::cerr << "Output file is invalid." << std::endl;
    return -1;
  }

  Aws::SDKOptions options;
  Aws::InitAPI(options);
  {
    Aws::Translate::TranslateClient translate_client;
    Aws::Translate::Model::TranslateTextRequest request;
    request = request.WithSourceLanguageCode("auto").WithTargetLanguageCode(target_language);

    Aws::String line;
    while (getline(fin, line)) {
      if (line.empty()) {
        continue;
      }

      if (line.length() > MAX_LINE_LENGTH) {
        std::cerr << "Line is too long." << std::endl;
        break;
      }

      request.SetText(line);
      auto outcome = translate_client.TranslateText(request);

      if (outcome.IsSuccess()) {
        auto translation = outcome.GetResult().GetTranslatedText();
        fout << translation << std::endl;
      } else {
        std::cout << "TranslateText error: " << outcome.GetError().GetExceptionName()
             << " - " << outcome.GetError().GetMessage() << std::endl;
        break;
      }
    }
  }
  Aws::ShutdownAPI(options);
}

Once the code has been built, let’s translate the full poem to Thai:

$ translate_text_file th verlaine.txt verlaine-th.txt

$ cat verlaine-th.txt

“เสียงสะอื้นยาวของไวโอลินฤดูใบไม้ร่วงทำร้ายหัวใจของฉันด้วยความอ่อนเพลียที่น่าเบื่อ ทั้งหมดหายใจไม่ออกและซีดเมื่อชั่วโมงดังผมจำได้ว่าวันเก่าและร้องไห้ และฉันไปที่ลมเลวร้ายที่พาฉันออกไปจากที่นี่ไกลกว่าเช่นใบไม้ที่ตายแล้ว” - Paul Verlaine บทกวีของดาวเสาร์

As you can see, it’s extremely simple to integrate Amazon Translate into your own applications. A single API call is really all that it takes!

Available Now!
These new languages are available today in all regions where Amazon Translate is available. The free tier offers 2 million characters per month for the first 12 months, starting from your first translation request.

We’re looking forward to your feedback! Please post it to the AWS Forum for Amazon Translate, or send it to your usual AWS support contacts.

Julien