Tag Archives: Amazon Translate

AWS Weekly Roundup—Reserve GPU capacity for short ML workloads, Finch is GA, and more—November 6, 2023

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/aws/aws-weekly-roundup-reserve-gpu-capacity-for-short-ml-workloads-finch-is-ga-and-more-november-6-2023/

The year is coming to an end, and there are only 50 days until Christmas and 21 days to AWS re:Invent! If you are in Las Vegas, come and say hi to me. I will be around the Serverlesspresso booth most of the time.

Last week’s launches
Here are some launches that got my attention during the previous week.

Amazon EC2 – Amazon EC2 announced Capacity Blocks for ML. This means that you can now reserve GPU compute capacity for your short-duration ML workloads. Learn more about this launch on the feature page and announcement blog post.

Finch – Finch is now generally available. Finch is an open source tool for local container development on macOS (using Intel or Apple Silicon). It provides a command line developer tool for building, running, and publishing Linux containers on macOS. Learn more about Finch in this blog post written by Phil Estes or on the Finch website.

AWS X-Ray – AWS X-Ray now supports W3C format trace IDs for distributed tracing. AWS X-Ray supports trace IDs generated through OpenTelemetry or any other framework that conforms to the W3C Trace Context specification.

Amazon Translate – Amazon Translate introduces a brevity customization that reduces translation output length. You can enable it for real-time translations where you need shorter output, for example to meet caption size limits. The resulting translation is not literal, but it preserves the underlying message.
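
If you want to try brevity from code, the following is a minimal sketch using the AWS SDK for Python (Boto3). The caption text is made up for illustration, and brevity is only supported for certain language pairs:

import boto3

translate = boto3.client("translate")

# Request a shorter, non-literal translation by enabling the Brevity setting.
response = translate.translate_text(
    Text="Please fasten your seatbelt and remain seated while the seatbelt sign is on.",
    SourceLanguageCode="en",
    TargetLanguageCode="es",
    Settings={"Brevity": "ON"},
)
print(response["TranslatedText"])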

AWS IAM – IAM expanded action last accessed information to 60 more services. This functionality is very useful for fine-tuning role permissions, identifying unused permissions, and granting roles only the permissions they need.
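
As a minimal Boto3 sketch of pulling this information, you can generate an action-level last-accessed report for a role; the role ARN below is a placeholder:

import time

import boto3

iam = boto3.client("iam")

# Start an action-level last-accessed report for a role.
job = iam.generate_service_last_accessed_details(
    Arn="arn:aws:iam::123456789012:role/MyAppRole",
    Granularity="ACTION_LEVEL",
)

# Poll until the report is ready, then list when each service was last used.
while True:
    report = iam.get_service_last_accessed_details(JobId=job["JobId"])
    if report["JobStatus"] != "IN_PROGRESS":
        break
    time.sleep(1)

for service in report["ServicesLastAccessed"]:
    print(service["ServiceName"], service.get("LastAuthenticated", "never used"))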

AWS IAM Access Analyzer – IAM Access Analyzer policy generator expanded support to over 200 AWS services to help you create fine-grained policies based on your AWS CloudTrail access activity.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS news
Some other news and blog posts that you may have missed:

AWS Compute Blog – Daniel Wirjo and Justin Plock wrote a very interesting article about how to send and receive webhooks on AWS using different AWS serverless services. This is a good read if you work with webhooks in your application: it shows not only how to build these solutions but also what to consider when building them.

AWS Storage Blog – Bimal Gajjar and Andrew Peace wrote a very useful blog post about how to handle event ordering and duplicate events with Amazon S3 Event Notifications. This is a common challenge for many customers.

Amazon Science Blog – David Fan wrote an article about how to build better foundation models for video representation, based on a paper that Prime Video presented at a conference on this topic.

The Official AWS Podcast – Listen each week for updates on the latest AWS news and deep dives into exciting use cases. There are also official AWS podcasts in several languages. Check out the ones in French, German, Italian, and Spanish.

AWS open-source news and updates – This is a newsletter curated by my colleague Ricardo to bring you the latest open source projects, posts, events, and more.

Upcoming AWS events
Check your calendars and sign up for these AWS events:

AWS Community Days – Join a community-led conference run by AWS user group leaders in your region: Ecuador (November 7), Mexico (November 11), Montevideo (November 14), Central Asia (Kazakhstan, Uzbekistan, Kyrgyzstan, and Mongolia on November 17–18), and Guatemala (November 18).

AWS re:Invent (November 27–December 1) – Join us to hear the latest from AWS, learn from experts, and connect with the global cloud community. Browse the session catalog and attendee guides and check out the highlights for generative artificial intelligence (AI).

That’s all for this week. Check back next Monday for another Weekly Roundup!

— Marcia

This post is part of our Weekly Roundup series. Check back each week for a quick roundup of interesting news and announcements from AWS!

Build a multilingual dashboard with Amazon Athena and Amazon QuickSight

Post Syndicated from Francesco Marelli original https://aws.amazon.com/blogs/big-data/build-a-multilingual-dashboard-with-amazon-athena-and-amazon-quicksight/

Amazon QuickSight is a serverless business intelligence (BI) service used by organizations of any size to make better data-driven decisions. QuickSight dashboards can also be embedded into SaaS apps and web portals to seamlessly provide interactive dashboards, natural language query, and data analysis capabilities to app users. QuickSight Demo Central contains many dashboards, feature showcases, and tips and tricks that you can use; the QuickSight Embedded Analytics Developer Portal provides details on how to embed dashboards in your applications.

The QuickSight user interface currently supports 15 languages that you can choose on a per-user basis. The selected language localizes the text that QuickSight generates for UI components; it isn’t applied to the data displayed in the dashboards.

This post describes how to create multilingual dashboards at the data level by creating new columns that contain the translated text and providing a language selection parameter and associated control to display data in the selected language in a QuickSight dashboard. You can create new columns with the translated text in several ways; in this post we create new columns using Amazon Athena user-defined functions implemented in the GitHub project sample Amazon Athena UDFs for text translation and analytics using Amazon Comprehend and Amazon Translate. This approach makes it easy to automatically create columns with translated text using neural machine translation provided by Amazon Translate.

Solution overview

The following diagram illustrates the architecture of this solution.

Architecture

For this post, we use the sample SaaS-Sales.csv dataset and follow these steps:

  1. Copy the dataset to a bucket in Amazon Simple Storage Service (Amazon S3).
  2. Use Athena to define a database and table to read the CSV file.
  3. Create a new table in Parquet format with the columns with the translated text.
  4. Create a new dataset in QuickSight.
  5. Create the parameter and control to select the language.
  6. Create dynamic multilingual calculated fields.
  7. Create an analysis with the multilingual calculated fields.
  8. Publish the multilingual dashboard.
  9. Create parametric headers and titles for visuals for use in an embedded dashboard.

An alternative approach might be to directly upload the CSV dataset to QuickSight and create the new columns with translated text as QuickSight calculated fields, for example using the ifelse() conditional function to directly assign the translated values.

Prerequisites

To follow the steps in this post, you need to have an AWS account with an active QuickSight Standard Edition or Enterprise Edition subscription.

Copy the dataset to a bucket in Amazon S3

Use the AWS Command Line Interface (AWS CLI) to create the S3 bucket qs-ml-blog-data and copy the dataset under the prefix saas-sales in your AWS account. You must follow the bucket naming rules to create your bucket. See the following code:

$ MY_BUCKET=qs-ml-blog-data
$ PREFIX=saas-sales

$ aws s3 mb s3://${MY_BUCKET}/

$ aws s3 cp \
    "s3://ee-assets-prod-us-east-1/modules/337d5d05acc64a6fa37bcba6b921071c/v1/SaaS-Sales.csv" \
    "s3://${MY_BUCKET}/${PREFIX}/SaaS-Sales.csv" 

Define a database and table to read the CSV file

Use the Athena query editor to create the database qs_ml_blog_db:

CREATE DATABASE IF NOT EXISTS qs_ml_blog_db;

Then create the new table qs_ml_blog_db.saas_sales:

CREATE EXTERNAL TABLE IF NOT EXISTS qs_ml_blog_db.saas_sales (
  row_id bigint, 
  order_id string, 
  order_date string, 
  date_key bigint, 
  contact_name string, 
  country_en string, 
  city_en string, 
  region string, 
  subregion string, 
  customer string, 
  customer_id bigint, 
  industry_en string, 
  segment string, 
  product string, 
  license string, 
  sales double, 
  quantity bigint, 
  discount double, 
  profit double)
ROW FORMAT DELIMITED 
  FIELDS TERMINATED BY ',' 
STORED AS INPUTFORMAT 
  'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 
  'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION
  's3://<MY_BUCKET>/saas-sales/'
TBLPROPERTIES (
  'areColumnsQuoted'='false', 
  'classification'='csv', 
  'columnsOrdered'='true', 
  'compressionType'='none', 
  'delimiter'=',', 
  'skip.header.line.count'='1', 
  'typeOfData'='file')

Create a new table in Parquet format with the columns with the translated text

We want to translate the columns country_en, city_en, and industry_en to German, Spanish, and Italian. To do this in a scalable and flexible way, we use the GitHub project sample Amazon Athena UDFs for text translation and analytics using Amazon Comprehend and Amazon Translate.

After you set up the user-defined functions following the instructions in the GitHub repo, run the following SQL query in Athena to create the new table qs_ml_blog_db.saas_sales_ml with the translated columns using the translate_text user-defined function and some other minor changes:

CREATE TABLE qs_ml_blog_db.saas_sales_ml WITH (
    format = 'PARQUET',
    parquet_compression = 'SNAPPY',
    external_location = 's3://<MY_BUCKET>/saas-sales-ml/'
) AS 
USING EXTERNAL FUNCTION translate_text(text_col VARCHAR, sourcelang VARCHAR, targetlang VARCHAR, terminologyname VARCHAR) RETURNS VARCHAR LAMBDA 'textanalytics-udf'
SELECT 
row_id,
order_id,
date_parse("order_date",'%m/%d/%Y') as order_date,
date_key,
contact_name,
country_en,
translate_text(country_en, 'en', 'de', NULL) as country_de,
translate_text(country_en, 'en', 'es', NULL) as country_es,
translate_text(country_en, 'en', 'it', NULL) as country_it,
city_en,
translate_text(city_en, 'en', 'de', NULL) as city_de,
translate_text(city_en, 'en', 'es', NULL) as city_es,
translate_text(city_en, 'en', 'it', NULL) as city_it,
region,
subregion,
customer,
customer_id,
industry_en,
translate_text(industry_en, 'en', 'de', NULL) as industry_de,
translate_text(industry_en, 'en', 'es', NULL) as industry_es,
translate_text(industry_en, 'en', 'it', NULL) as industry_it,
segment,
product,
license,
sales,
quantity,
discount,
profit
FROM qs_ml_blog_db.saas_sales
;

Run three simple queries, one per column, to verify that the new translated columns were generated successfully. We include a screenshot after each query showing its results.

SELECT 
distinct(country_en),
country_de,
country_es,
country_it
FROM qs_ml_blog_db.saas_sales_ml 
ORDER BY country_en
limit 10
;

Original and translated values for column Country

SELECT 
distinct(city_en),
city_de,
city_es,
city_it
FROM qs_ml_blog_db.saas_sales_ml 
ORDER BY city_en
limit 10
;

Original and translated values for column City

SELECT 
distinct(industry_en),
industry_de,
industry_es,
industry_it
FROM qs_ml_blog_db.saas_sales_ml 
ORDER BY industry_en
limit 10
;

Original and translated values for column Industry

Now you can use the new table saas_sales_ml as input to create a dataset in QuickSight.

Create a dataset in QuickSight

To create your dataset in QuickSight, complete the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose Create a dataset.
  3. Choose Athena.
  4. For Data source name, enter athena_primary.
  5. For Athena workgroup, choose primary.
  6. Choose Create data source.
    New Athena data source
  7. Select the saas_sales_ml table previously created and choose Select.
    Choose your table
  8. Choose to import the table to SPICE and choose Visualize to start creating the new dashboard.
    Finish dataset creation

In the analysis section, you receive a message that informs you that the table was successfully imported to SPICE.

SPICE import complete

Create the parameter and control to select the language

To create the parameter and associate the control that you use to select the language for the dashboard, complete the following steps:

  1. In the analysis section, choose Parameters and Create one.
  2. For Name, enter Language.
  3. For Data type, choose String.
  4. For Values, select Single value.
  5. For Static default value, enter English.
  6. Choose Create.
    Create new parameter
  7. To connect the parameter to a control, choose Control.
    Connect parameter to control
  8. For Display name, choose Language.
  9. For Style, choose Dropdown.
  10. For Values, select Specific values.
  11. For Define specific values, enter English, German, Italian, and Spanish (one value per line).
  12. Select Hide Select all option from the control values if the parameter has a default configured.
  13. Choose Add.
    Define control properties

The control is now available, linked to the parameter and displayed in the Controls section of the current sheet in the analysis.

Language control preview

Create dynamic multilingual calculated fields

You’re now ready to create the calculated fields whose value will change based on the currently selected language.

  1. In the menu bar, choose Add and choose Add calculated field.
    Add calculated field
  2. Use the ifelse conditional function to evaluate the value of the Language parameter and select the correct column in the dataset to assign the value to the calculated field.
  3. Create the Country calculated field using the following expression:
    ifelse(
        ${Language} = 'English', {country_en},
        ${Language} = 'German', {country_de},
        ${Language} = 'Italian', {country_it},
        ${Language} = 'Spanish', {country_es},
        {country_en}
    )

  4. Choose Save.
    Calculated field definition in Amazon QuickSight
  5. Repeat the process for the City calculated field:
    ifelse(
        ${Language} = 'English', {city_en},
        ${Language} = 'German', {city_de},
        ${Language} = 'Italian', {city_it},
        ${Language} = 'Spanish', {city_es},
        {city_en}
    )

  6. Repeat the process for the Industry calculated field:
    ifelse(
        ${Language} = 'English', {industry_en},
        ${Language} = 'German', {industry_de},
        ${Language} = 'Italian', {industry_it},
        ${Language} = 'Spanish', {industry_es},
        {industry_en}
    )

The calculated fields are now available and ready to use in the analysis.

Calculated fields available in analysis

Create an analysis with the multilingual calculated fields

Create an analysis with two donut charts and a pivot table that use the three multilingual fields. In the subtitle of the visuals, use the string Language: <<$Language>> to display the currently selected language. The following screenshot shows our analysis.

Analysis with Language control - English

If you choose a new language from the Language control, the visuals adapt accordingly. The following screenshot shows the analysis in Italian.

Analysis with Language control - Italian

You’re now ready to publish the analysis as a dashboard.

Publish the multilingual dashboard

In the menu bar, choose Share and Publish dashboard.

Publish dashboard menu

Publish the new dashboard as “Multilingual dashboard,” leave the advanced publish options at their default values, and choose Publish dashboard.

Publish dashboard with name

The dashboard is now ready.

Published dashboard

We can take the multilingual features one step further by embedding the dashboard and controlling the parameters in the external page using the Amazon QuickSight Embedding SDK.

Create parametric headers and titles for visuals for use in an embedded dashboard

When embedding a QuickSight dashboard, the locale and parameter values can be set programmatically from JavaScript. This is useful for setting default values and changing the settings for localization and the default data language. The following steps show how to use these features by modifying the dashboard we have created so far, embedding it in an HTML page, and using the Amazon QuickSight Embedding SDK to dynamically set the value of parameters used to display titles, legends, headers, and more in translated text. The full code for the HTML page is provided in the appendix of this post.

Create new parameters for the titles and the headers of the visuals in the analysis, the sheet name, visuals legends, and control labels as per the following table.

Name Data type Values Static default value
city String Single value City
country String Single value Country
donut01title String Single value Sales by Country
donut02title String Single value Quantity by Industry
industry String Single value Industry
Language String Single value English
languagecontrollabel String Single value Language
pivottitle String Single value Sales by Country, City and Industry
sales String Single value Sales
sheet001name String Single value Summary View

The parameters are now available on the Parameters menu.

You can now use the parameters inside each sheet title, visual title, legend title, column header, axis label, and more in your analysis. The following screenshots provide examples that illustrate how to insert these parameters into each title.

First, we insert the sheet name.

Then we add the language control name.

We edit the donut charts’ titles.

Donut chart title

We also add the donut charts’ legend titles.

Donut chart legend title

In the following screenshot, we specify the pivot table row names.

Pivot table row names

We also specify the pivot table column names.

Pivot table column names

Publish the analysis to a new dashboard and follow the steps in the post Embed interactive dashboards in your apps and portals in minutes with Amazon QuickSight’s new 1-click embedding feature to embed the dashboard in an HTML page hosted in your website or web application.

The example HTML page provided in the appendix of this post contains one control to switch among the four languages you created in the dataset in the previous sections, with the option to automatically sync the QuickSight UI locale when changing the language, and one control to independently change the UI locale as required.

The following screenshots provide some examples of combinations of data language and QuickSight UI locale.

The following is an example of English data language with the English QuickSight UI locale.

Embedded dashboard with English language and English locale

The following is an example of Italian data language with the synchronized Italian QuickSight UI locale.

Embedded dashboard with Italian language and synced Italian locale

The following screenshot shows German data language with the Japanese QuickSight UI locale.

Embedded dashboard with German language and Japanese locale

Conclusion

This post demonstrated how to automatically translate data using machine learning and build a multilingual dashboard with Athena, QuickSight, and Amazon Translate, and how to add advanced multilingual features with QuickSight embedded dashboards. You can use the same approach to display different values for dimensions as well as metrics depending on the values of one or more parameters.

QuickSight provides a 30-day free trial subscription for four users; you can get started immediately. You can learn more and ask questions about QuickSight in the Amazon QuickSight Community.

Appendix: Embedded dashboard host page

The full code for the HTML page is as follows:

<!DOCTYPE html>
<html>
    <head>
        <title>Amazon QuickSight Multilingual Embedded Dashboard</title>
        <script src="https://unpkg.com/amazon-quicksight-embedding-sdk@1/dist/quicksight-embedding-js-sdk.min.js"></script>
        <script type="text/javascript">

            var url = "https://<<YOUR_AMAZON_QUICKSIGHT_REGION>>.quicksight.aws.amazon.com/sn/embed/share/accounts/<<YOUR_AWS_ACCOUNT_ID>>/dashboards/<<DASHBOARD_ID>>?directory_alias=<<YOUR_AMAZON_QUICKSIGHT_ACCOUNT_NAME>>"
            var defaultLanguageOptions = 'en_US'
            var dashboard

            var trns = {
                en_US: {
                    locale: "en-US",
                    language: "English",
                    languagecontrollabel: "Language",
                    sheet001name: "Summary View",
                    sales: "Sales",
                    country: "Country",
                    city: "City",
                    industry: "Industry",
                    quantity: "Quantity",
                    by: "by",
                    and: "and"
                },
                de_DE: {
                    locale: "de-DE",
                    language: "German",
                    languagecontrollabel: "Sprache",
                    sheet001name: "Zusammenfassende Ansicht",
                    sales: "Umsätze",
                    country: "Land",
                    city: "Stadt",
                    industry: "Industrie",
                    quantity: "Anzahl",
                    by: "von",
                    and: "und"
                },
                it_IT: {
                    locale: "it-IT",
                    language: "Italian",
                    languagecontrollabel: "Lingua",
                    sheet001name: "Prospetto Riassuntivo",
                    sales: "Vendite",
                    country: "Paese",
                    city: "Città",
                    industry: "Settore",
                    quantity: "Quantità",
                    by: "per",
                    and: "e"
                },
                es_ES: {
                    locale: "es-ES",
                    language: "Spanish",
                    languagecontrollabel: "Idioma",
                    sheet001name: "Vista de Resumen",
                    sales: "Ventas",
                    country: "Paìs",
                    city: "Ciudad",
                    industry: "Industria",
                    quantity: "Cantidad",
                    by: "por",
                    and: "y"
                }
            }

            function setLanguageParameters(l){

                return {
                            Language: trns[l]['language'],
                            languagecontrollabel: trns[l]['languagecontrollabel'],
                            sheet001name: trns[l]['sheet001name'],
                            donut01title: trns[l]['sales']+" "+trns[l]['by']+" "+trns[l]['country'],
                            donut02title: trns[l]['quantity']+" "+trns[l]['by']+" "+trns[l]['industry'],
                            pivottitle: trns[l]['sales']+" "+trns[l]['by']+" "+trns[l]['country']+", "+trns[l]['city']+" "+trns[l]['and']+" "+trns[l]['industry'],
                            sales: trns[l]['sales'],
                            country: trns[l]['country'],
                            city: trns[l]['city'],
                            industry: trns[l]['industry'],
                        }
            }

            function embedDashboard(lOpts, forceLocale) {

                var languageOptions = defaultLanguageOptions
                if (lOpts) languageOptions = lOpts

                var containerDiv = document.getElementById("embeddingContainer");
                containerDiv.innerHTML = ''

                parameters = setLanguageParameters(languageOptions)

                if(!forceLocale) locale = trns[languageOptions]['locale']
                else locale = forceLocale

                var options = {
                    url: url,
                    container: containerDiv,
                    parameters: parameters,
                    scrolling: "no",
                    height: "AutoFit",
                    loadingHeight: "930px",
                    width: "1024px",
                    locale: locale
                };

                dashboard = QuickSightEmbedding.embedDashboard(options);
            }

            function onLangChange(langSel) {

                var l = langSel.value

                if(!document.getElementById("changeLocale").checked){
                    dashboard.setParameters(setLanguageParameters(l))
                }
                else {
                    var selLocale = document.getElementById("locale")
                    selLocale.value = trns[l]['locale']
                    embedDashboard(l)
                }
            }

            function onLocaleChange(obj) {

                var locl = obj.value
                var lang = document.getElementById("lang").value

                document.getElementById("changeLocale").checked = false
                embedDashboard(lang,locl)

            }

            function onSyncLocaleChange(obj){

                if(obj.checked){
                    var selLocale = document.getElementById('locale')
                    var selLang = document.getElementById('lang').value
                    selLocale.value = trns[selLang]['locale']
                    embedDashboard(selLang, trns[selLang]['locale'])
                }            
            }

        </script>
    </head>

    <body onload="embedDashboard()">

        <div style="text-align: center; width: 1024px;">
            <h2>Amazon QuickSight Multilingual Embedded Dashboard</h2>

            <span>
                <label for="lang">Language</label>
                <select id="lang" name="lang" onchange="onLangChange(this)">
                    <option value="en_US" selected>English</option>
                    <option value="de_DE">German</option>
                    <option value="it_IT">Italian</option>
                    <option value="es_ES">Spanish</option>
                </select>
            </span>

            &nbsp;-&nbsp;
            
            <span>
                <label for="changeLocale">Sync UI Locale with Language</label>
                <input type="checkbox" id="changeLocale" name="changeLocale" onchange="onSyncLocaleChange(this)">
            </span>

            &nbsp;|&nbsp;

            <span>
                <label for="locale">QuickSight UI Locale</label>
                <select id="locale" name="locale" onchange="onLocaleChange(this)">
                    <option value="en-US" selected>English</option>
                    <option value="da-DK">Dansk</option>
                    <option value="de-DE">Deutsch</option>
                    <option value="ja-JP">日本語</option>
                    <option value="es-ES">Español</option>
                    <option value="fr-FR">Français</option>
                    <option value="it-IT">Italiano</option>
                    <option value="nl-NL">Nederlands</option>
                    <option value="nb-NO">Norsk</option>
                    <option value="pt-BR">Português</option>
                    <option value="fi-FI">Suomi</option>
                    <option value="sv-SE">Svenska</option>
                    <option value="ko-KR">한국어</option>
                    <option value="zh-CN">中文 (简体)</option>
                    <option value="zh-TW">中文 (繁體)</option>            
                </select>
            </span>
        </div>

        <div id="embeddingContainer"></div>

    </body>

</html>

About the Author

Francesco Marelli is a principal solutions architect at Amazon Web Services. He specializes in the design and implementation of analytics, data management, and big data systems. Francesco also has strong experience in systems integration and in the design and implementation of applications. He is passionate about music, collecting vinyl records, and playing bass.

Build a multi-language notification system with Amazon Translate and Amazon Pinpoint

Post Syndicated from Praveen Allam original https://aws.amazon.com/blogs/architecture/build-a-multi-language-notification-system-with-amazon-translate-and-amazon-pinpoint/

Organizations with global operations can struggle to notify their customers of any business-related announcements or notifications in different languages. Their customers want to receive notifications in their local language and communication preference. Organizations often rely on complicated third-party services or individuals to manually translate the notifications. This can lead to a loss of revenue due to delayed communication and additional operational expenses.

This blog post demonstrates how to build a straightforward, cost-effective, and scalable multi-language notification system using AWS Serverless technologies. You can post a business-related announcement or notification in English, and based on the customer profile data, it will convert this announcement or notification into different languages. Additionally, the system will also deliver these translated announcements or notifications as an email, voice, or SMS.

Example of a multi-language notification use case

A restaurant franchise company is adding a new item to their menu and plans to release it in North America, Germany, and France. The corporate office has decided to send the following notification.

The company is adding a new item to the menu, and this will go live by May 10. Please ensure you are prepared for this change and plan accordingly.

The franchise owners in Germany want to receive the notifications in the German language, whereas the franchise owners in France want to receive it in French. North American franchises want to receive it in English.

Solution design for multi-language notification system

The solution in Figure 1 demonstrates how to build a multi-language notification system using Amazon Translate and Amazon Pinpoint.

AWS Serverless technologies scale automatically, provide built-in high availability, and follow a pay-for-use billing model, which increases agility and optimizes costs. The system built with this solution is invoked using REST API endpoints. Once this solution is deployed, it can be integrated with any frontend application where users can log in and send out notification events.

Figure 1 illustrates the architecture of this solution.

Figure 1. Solution architecture for multi-language notification system

1. The restaurant franchise will log in to their UI to type the notification message in English. Upon submission, the notification message is sent to the Amazon API Gateway REST endpoint.
Note: In this solution, there is no UI available. You will use a terminal to submit the message.

2. Amazon API Gateway will send this message to Amazon Simple Queue Service (SQS), which will keep the HTTP requests asynchronous.

3. The SQS queue will invoke the SQS AWS Lambda function.

4. The SQS Lambda function invokes the AWS Step Functions state machine. This SQS Lambda function is used as a proxy mechanism to start the state machine workflow. AWS Step Functions orchestrates the notification workflow process. The workflow validates the message, converts it into different languages, and notifies the customers in their preferred way of communication (email, voice, or SMS). It also handles errors in any of the steps by using an SQS dead-letter queue.

5. The message entered must be validated to ensure that organizational standards are followed. To perform the message validation, we use Amazon Comprehend. Comprehend’s sentiment analysis determines whether to send or flag the message; all flagged messages are sent for review. (A minimal code sketch of this check appears after the following examples.)

  • In the preceding example use case message, the neutral sentiment score is 0.85 confidence. If you set the acceptable score to anything greater than 0.5 confidence, then it is a valid message. Once it passes the validation step, the workflow proceeds to the next step.
  • If the message is vague or unclear, the sentiment score might be less than 0.5 confidence. For example, with the message We are adding a dish; be ready for it, the sentiment score might be only 0.45 confidence. This is under the acceptable score, and the message will not be processed further.
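
The following is a minimal Boto3 sketch of this validation step; the 0.5 threshold mirrors the acceptable score discussed above, and the surrounding Step Functions plumbing is omitted:

import boto3

comprehend = boto3.client("comprehend")

ACCEPTABLE_SCORE = 0.5  # assumed threshold from the example above

message = ("The company is adding a new item to the menu, and this will go live "
           "by May 10. Please ensure you are prepared for this change and plan accordingly.")

result = comprehend.detect_sentiment(Text=message, LanguageCode="en")
neutral_score = result["SentimentScore"]["Neutral"]

# Flag the message for review if the neutral confidence score is too low.
if neutral_score > ACCEPTABLE_SCORE:
    print(f"Valid message (neutral score {neutral_score:.2f}); continue the workflow")
else:
    print(f"Flagged for review (neutral score {neutral_score:.2f})")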

6. After the message is successfully validated, the message is translated into various languages depending on the customers’ profiles. The Translate Lambda function determines the number of unique languages by referring to the customer profile data in the Amazon DynamoDB table. The function then uses Amazon Translate to translate the message into the different languages required for that notification event (a code sketch of this step appears after the translated examples below). In our example use case, the converted messages will look as follows:

  • German (de):

Das Unternehmen fügt dem Menü einen neuen Punkt hinzu, der bis zum 10. Mai live geschaltet wird. Bitte stellen Sie sicher, dass Sie auf diese Änderung vorbereitet sind und planen Sie entsprechend.

  • French (fr):

La société ajoute un nouvel article au menu, qui sera mis en ligne d’ici le 10 mai. Assurez-vous d’être prêt pour ce changement et de planifier en conséquence.
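
As a rough sketch of what the Translate Lambda function does, the following Boto3 snippet translates the announcement into the two target languages; in the actual solution, the language list comes from the DynamoDB customer profile table rather than being hardcoded:

import boto3

translate = boto3.client("translate")

message = ("The company is adding a new item to the menu, and this will go live "
           "by May 10. Please ensure you are prepared for this change and plan accordingly.")

# The unique target languages would come from the customer profile data in
# DynamoDB; they are hardcoded here for illustration.
for target in ("de", "fr"):
    result = translate.translate_text(
        Text=message,
        SourceLanguageCode="en",
        TargetLanguageCode=target,
    )
    print(target, "->", result["TranslatedText"])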

7. The last step in the workflow is to build the notification logic and deliver the notifications. The Amazon Pinpoint Lambda function retrieves the customer’s profile from the Amazon DynamoDB table. It then parses each record for a given notification event to find out the delivery mode (email, voice, or SMS message). The function then builds the notification logic using Amazon Pinpoint. Amazon Pinpoint notifies each customer either by email, voice, or SMS.
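
As an illustration of the delivery step, here is a hedged sketch that sends one translated message as an SMS with Amazon Pinpoint. The project ID and phone number are placeholders, and the real function chooses the channel (email, voice, or SMS) per customer profile:

import boto3

pinpoint = boto3.client("pinpoint")

APPLICATION_ID = "<PINPOINT_PROJECT_ID>"  # placeholder Amazon Pinpoint project ID

pinpoint.send_messages(
    ApplicationId=APPLICATION_ID,
    MessageRequest={
        # The destination address and channel come from the DynamoDB profile.
        "Addresses": {"+15555550123": {"ChannelType": "SMS"}},
        "MessageConfiguration": {
            "SMSMessage": {
                # Translated message from the Translate step (truncated here).
                "Body": "Das Unternehmen fügt dem Menü einen neuen Punkt hinzu ...",
                "MessageType": "TRANSACTIONAL",
            }
        },
    },
)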

Code repository

The code for this solution is available on GitHub. Review the README file for detailed instructions on how to download and run the solution in your AWS account.

Conclusion

Organizations that operate internationally often struggle to build a multi-language notification system to communicate business-related announcements or notifications to their customers in different languages. Communicating these announcements or notifications in a variety of formats such as email, voice, and SMS can be time-consuming. Our solution addresses these challenges using AWS services with fewer steps than traditional third-party options. This solution also features automatic scaling, built-in high availability, and a pay-for-use billing model to increase agility and optimize costs. These technologies not only decrease infrastructure management tasks like capacity provisioning and patching, but also provide a better customer experience.

Further reading:

Field Notes: How to Prepare Large Text Files for Processing with Amazon Translate and Amazon Comprehend

Post Syndicated from Veeresh Shringari original https://aws.amazon.com/blogs/architecture/field-notes-how-to-prepare-large-text-files-for-processing-with-amazon-translate-and-amazon-comprehend/

Biopharmaceutical manufacturing is a highly regulated industry where deviation documents are used to optimize manufacturing processes. Deviation documents in biopharmaceutical manufacturing processes are geographically diverse, spanning multiple countries and languages. The document corpus is complex, with additional requirements for complete encryption. Therefore, to reduce downtime and increase process efficiency, it is critical to automate the ingestion and understanding of deviation documents. For this workflow, a large biopharma customer needed to translate and classify documents at their manufacturing site.

The customer’s challenge included translation and classification of paragraph-sized text documents into statement types. First, the tokenizer previously used was failing for certain languages. Second, after tokenization, large paragraphs needed to be sliced into chunks smaller than 5,000 bytes to facilitate consumption by Amazon Translate and Amazon Comprehend. Because sentences and paragraphs differ in size, the customer needed to slice them in a way that preserved each sentence’s and paragraph’s context and meaning.

This blog post describes a solution to tokenize text documents into appropriate-sized chunks for easy consumption by Amazon Translate and Amazon Comprehend.

Overview of solution

The solution is divided into the following steps. Text data coming from the AWS Glue output is transformed and stored in Amazon Simple Storage Service (Amazon S3) in a .txt file. This transformed data is passed into the sentence tokenizer with slicing and encryption using AWS Key Management Service (AWS KMS). This data is now ready to be fed into Amazon Translate and Amazon Comprehend, and then to a Bidirectional Encoder Representations from Transformers (BERT) model for clustering. All of the models are developed and managed in Amazon SageMaker.

Prerequisites

For this walkthrough, you should have the following prerequisites:

The architecture in Figure 1 shows a complete document classification and clustering workflow running the sentence tokenizer solution (step 4) as an input to Amazon Translate and Amazon Comprehend. The complete architecture also uses AWS Glue crawlers, Amazon Athena, Amazon S3, AWS KMS, and SageMaker.

Figure 1. Higher level architecture describing use of the tokenizer in the system

Solution steps

  1. Ingest the streaming data from the daily pharma supply chain incidents from the AWS Glue crawlers and Athena-based view tables. AWS Glue is used for ETL (extract, transform, and load), while Athena helps analyze the integrity of the data in Amazon S3.
  2. Ingest the streaming data into Amazon S3, which is AWS KMS encrypted. This limits any unauthorized access to the secured files, as required for the healthcare domain.
  3. Enable the CloudWatch logs. CloudWatch logs help to store, monitor, and access error messages logged by SageMaker.
  4. Open the SageMaker notebook using the AWS Management Console, and navigate to the integrated development environment (IDE) with the Python notebook.

Solution description

Initialize the Amazon S3 client, and retrieve the SageMaker execution role with get_execution_role.

Figure 2. Code sample to initialize Amazon S3 Client execution roles

Figure 3 shows the code for tokenizing large paragraphs into sentences. This helps to feed chunks of at most 5,000 bytes to Amazon Translate and Amazon Comprehend. Additionally, in the regulated environment, data at rest and in transit is encrypted using AWS KMS (via an S3 IO object) before being chunked into 5,000-byte files using a last-in-first-out (LIFO) process.

Figure 3. Code sample with file chunking function and AWS KMS encryption

Figure 4 shows the function for writing the file chunks to objects in Amazon S3, and objects are AWS KMS encrypted.

Figure 4. Code sample for writing chunked 5,000-byte sized data to Amazon S3

Code sample

The following example code details the tokenizer and chunking tool which we subsequently run through SageMaker:
https://gitlab.aws.dev/shringv/aws-samples-aws-tokenizer-slicing-file
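
If you cannot access the repository, the following simplified sketch illustrates the core idea: sentence-level tokenization with NLTK and greedy packing of sentences into chunks of at most 5,000 bytes, written back to Amazon S3 with AWS KMS encryption. The bucket and file names are placeholders, and the repository’s implementation differs in detail (for example, it uses a LIFO process):

import boto3
import nltk

# Download the sentence tokenizer models (newer NLTK versions may need "punkt_tab").
nltk.download("punkt", quiet=True)

MAX_BYTES = 5000  # input size limit for Amazon Translate and Amazon Comprehend

def chunk_text(text, max_bytes=MAX_BYTES):
    """Pack whole sentences into chunks no larger than max_bytes (UTF-8)."""
    chunks, current = [], ""
    for sentence in nltk.sent_tokenize(text):
        candidate = (current + " " + sentence).strip()
        if current and len(candidate.encode("utf-8")) > max_bytes:
            chunks.append(current)
            current = sentence  # a single sentence over the limit would need further splitting
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks

s3 = boto3.client("s3")
with open("deviation_document.txt") as f:
    text = f.read()

for i, chunk in enumerate(chunk_text(text)):
    # Server-side encryption with AWS KMS, as required in regulated environments.
    s3.put_object(
        Bucket="my-secured-bucket",
        Key=f"chunks/part-{i:04d}.txt",
        Body=chunk.encode("utf-8"),
        ServerSideEncryption="aws:kms",
    )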

Cleaning up

To avoid incurring future charges, delete the resources (like S3 objects) used for the practice files after you have completed implementation of the solution.

Conclusion

In this blog post, we presented a solution that incorporates sentence-level tokenization with rules governing expected sentence size. The solution includes automation scripts to split larger files into chunks of at most 5,000 bytes to facilitate consumption by Amazon Translate and Amazon Comprehend. The solution is effective for tokenizing and chunking multi-language files in complex environments. Furthermore, the solution secures file exchange using AWS KMS, as required by regulated industries.

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

Speed Up Translation Jobs with a Fully Automated Translation System Assistant

Post Syndicated from Narcisse Zekpa original https://aws.amazon.com/blogs/architecture/speed-up-translation-jobs-with-a-fully-automated-translation-system-assistant/

Like other industries, translation and localization companies face the challenge of providing fast delivery at a low cost. To address this challenge, organizations use Machine Translation (MT) to complement their translator teams. MT is the use of automated software that translates text without the need for human involvement.

One of the most recent advancements is Active Custom Translation (ACT). ACT helps tailor translated text to a specific language style or terminology, per customer specifications. In the past, organizations built custom models to include ACT in their translation system. Amazon Translate has an Active Custom Translation feature, which helps customers integrate configurable MT capabilities into their translation systems, without needing to build it themselves.

This blog describes an end-to-end automated translation flow, including guidelines to manage the data involved in the ACT process. The solution combines Amazon Translate with other Amazon Web Services (AWS) such as AWS DataSync and AWS Lambda. Before exploring this architecture, let’s explain a few basic concepts specific to the translation and localization industry.

Standard translation concepts

Translation Memory. It is common to reuse previously generated outputs as components for machine translation systems. This data is commonly called Translation Memory, and is stored and exchanged according to standardized formats (TMX, TSV, or CSV).

Source Text. Translation input data is commonly exchanged as XML Localization Interchange File Format (XLIFF) documents. Amazon Translate recently added the support of XLIFF documents for batch processing.

Figure 1 illustrates a standard translation flow involving machine translation and translation memory. Once the output has been reviewed and finalized, it is part of the company’s intellectual property (IP). It can then be reincorporated into the flywheel as an input to future translation jobs.

Figure 1: Translation workflow using machine translation

Translation assistant solution walkthrough

When using Amazon Translate in batch mode, you must:

  • Gather together and make translation input data available to the Translation job
  • Monitor the processing and retrieval of the output
  • Implement improvised processes to integrate your Translation Management System (TMS) with AWS, as needed

As you can see, this can involve many manual steps. You must download large files, upload them into Amazon Simple Storage Service (Amazon S3), and configure jobs. The solution shown in Figure 2 automates these activities.

Figure 2: Automated batch ACT translation solution architecture

Translation automation activities:

  1. Upload the translation job input data (source files, custom terminology, translation memory files).
  2. Initiate the preprocessing step. Scan input files and identify language pairs.
  3. Create an Amazon Simple Queue Service (SQS) message per language pair and translation project.
  4. Create S3 buckets and prefixes for each translation job.
  5. Create an Amazon Translate job.
  6. Initiate a post-processing workflow, see Figure 3 (AWS Step Functions).
  7. Copy the Translation output into the output bucket.
  8. Publish an Amazon SNS notification to inform on job completion status.
  9. Download translated files back into customer environment.

In this scenario, translators are operating from their company’s internal infrastructure, although their TMS can also be hosted on the cloud. They first collect the translation input data from their TMS and drop the files onto a shared file server. These files can be XLIFF, TMX, or CSV. We use AWS DataSync to orchestrate and initiate the data transfer from on-premises into an Amazon S3 staging bucket. AWS DataSync provides a few advantages:

  • A low code solution that manages the upload/download of translation data from/to AWS
  • The ability to schedule the synchronization for both upstream and downstream and control the frequency. This allows for batching translation jobs and optimizes usage and cost for Amazon Translate
  • A single point of access to translation data, which reduces the need to manage user accounts and grants access to the data

Once the files are uploaded into the input bucket, DataSync generates an event through Amazon EventBridge. This notification invokes an AWS Lambda function that pushes a message into an Amazon SQS queue. The message contains the list of files to be translated in the current batch. SQS decouples the data upload from the actual processing. Using this workflow provides scalability, service quota limit control, and better error handling.

The queue initiates another Lambda function that creates a file hierarchy in S3 for each translation job. File-naming conventions can be used as a key to separate jobs from each other. The function also prepares translation memory and custom terminology when required. Lastly, it creates and submits the translation job.
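
To make the submission step concrete, here is a hedged Boto3 sketch of starting a batch translation job with XLIFF input and an ACT parallel data resource. The role ARN, S3 paths, and parallel data name are placeholders invented for illustration:

import uuid

import boto3

translate = boto3.client("translate")

response = translate.start_text_translation_job(
    JobName=f"batch-{uuid.uuid4().hex[:8]}",
    InputDataConfig={
        "S3Uri": "s3://my-staging-bucket/jobs/job-001/input/",
        "ContentType": "application/x-xliff+xml",  # XLIFF batch input
    },
    OutputDataConfig={"S3Uri": "s3://my-staging-bucket/jobs/job-001/output/"},
    DataAccessRoleArn="arn:aws:iam::123456789012:role/TranslateBatchRole",
    SourceLanguageCode="en",
    TargetLanguageCodes=["de"],
    ParallelDataNames=["customer-style-parallel-data"],  # enables ACT
)
print(response["JobId"], response["JobStatus"])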

The post-processing AWS Step Functions workflow

Amazon Translate can emit events to EventBridge upon job completion or failure. We use this capability to invoke a post-processing AWS Step Functions workflow. For instance, some customers must flag machine-translated segments within an XLIFF file so their translators can quickly identify them for manual review.

The flow implemented in the state machine does the following (shown in Figure 3):

  • Verifies output of Amazon Translate. Checks for completeness, confirms all segments successfully translated
  • Enriches the translation data. Flags machine translated segments by comparing input and output
  • Copies output to staging bucket. Prepares for final upload
  • Sends SNS notifications to alert operators. Notifies that the batch is complete

Figure 3: Post-processing Step Functions workflow

This solution is entirely serverless, which frees you from maintaining the infrastructure or software platform. You can focus on the core business logic, and what really differentiates you from your competitors.

As the number of translation projects grows over time, you can also take advantage of Amazon S3 storage classes to optimize document archiving. A translation service provider can define specific rules per customer or per project. These rules can be configured automatically as the data is copied into S3. The result is that files can be transferred to cheaper storage tiers with predefined retention periods.
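
Such rules can be expressed as an S3 lifecycle configuration. The following Boto3 sketch shows one possible per-customer rule; the bucket name, prefix, tiers, and periods are illustrative only:

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="translation-archive-bucket",
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "archive-customer-a",
                "Filter": {"Prefix": "customer-a/"},
                "Status": "Enabled",
                # Move documents to cheaper tiers as they age.
                "Transitions": [
                    {"Days": 30, "StorageClass": "STANDARD_IA"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                # Predefined retention period for this customer.
                "Expiration": {"Days": 730},
            }
        ]
    },
)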

Conclusion

In this blog, we’ve described a solution that helps you automate the collection and transfer of translation data. It also assists in the scheduling and orchestration of translation jobs. This leads to greater productivity, reduced cost, and faster time to market. Using AWS, you can decrease maintenance and create a highly scalable and cost-effective solution. Because of the AWS pay-as-you-go model, you can assess the price per project. This information can be used in your pricing model and passed along as service options to your own customers.

To get started with Amazon Translate or read more, check out these blogs: