Translating content dynamically by using Amazon S3 Object Lambda

Post Syndicated from James Beswick original https://aws.amazon.com/blogs/compute/translating-content-dynamically-by-using-amazon-s3-object-lambda/

This post is written by Sandeep Mohanty, Senior Solutions Architect.

The recent launch of Amazon S3 Object Lambda creates many possibilities to transform data in S3 buckets dynamically. S3 Object Lambda can be used with other AWS serverless services to transform content stored in S3 in many creative ways. One example is using S3 Object Lambda with Amazon Translate to translate and serve content from S3 buckets on demand.

Amazon Translate is a serverless machine translation service that delivers fast and customizable language translation. With Amazon Translate, you can localize content such as websites and applications to serve a diverse set of users.

Using S3 Object Lambda with Amazon Translate, you do not need to translate content in advance for all possible permutations of source to target languages. Instead, you can transform content in near-real time using a data driven model. This can serve multiple language-specific applications simultaneously.

S3 Object Lambda enables you to process and transform data using Lambda functions as objects are being retrieved from S3 by a client application. S3 GET object requests invoke the Lambda function and you can customize it to transform the content to meet specific requirements.

For example, if you run a website or mobile application with global visitors, you must provide translations in multiple languages. Artifacts such as forms, disclaimers, or product descriptions can be translated to serve a diverse global audience using this approach.

Solution architecture

This is the high-level architecture diagram for the example application that translates dynamic content on demand:

Solution architecture

In this example, you create an S3 Object Lambda that intercepts S3 GET requests for an object. It then translates the file to a target language, passed as an argument appended to the S3 object key. At a high level, the steps can be summarized as follows:

  1. Create a Lambda function to translate data from a source language to a target language using Amazon Translate.
  2. Create an S3 Object Lambda Access Point from the S3 console.
  3. Select the Lambda function created in step 1.
  4. Provide a supporting S3 Access Point to give S3 Object Lambda access to the original object.
  5. Retrieve a file from S3 by invoking the S3 GetObject API, and pass the Object Lambda Access Point ARN as the bucket name instead of the actual S3 bucket name.

Creating the Lambda function

In the first step, you create the Lambda function, DynamicFileTranslation. This Lambda function is invoked by an S3 GET Object API call and it translates the requested object. The target language is passed as an argument appended to the S3 object key, corresponding to the object being retrieved.

For example, for the object key passed in the S3 GetObject API call is customized to look something like “ContactUs/contact-us.txt#fr”, the characters after the pound sign represent the code for the target language. In this case, ‘fr’ is French. The full list of supported languages and language codes can be found here.

This Lambda function dynamically translates the content of an object in S3 to a target language:

import json
import boto3
from urllib.parse import urlparse, unquote
from pathlib import Path
def lambda_handler(event, context):
    print(event)

   # Extract the outputRoute and outputToken from the object context
    object_context = event["getObjectContext"]
    request_route = object_context["outputRoute"]
    request_token = object_context["outputToken"]

   # Extract the user requested URL and the supporting access point arn
    user_request_url = event["userRequest"]["url"]
    supporting_access_point_arn = event["configuration"]["supportingAccessPointArn"]

    print("USER REQUEST URL: ", user_request_url)
   
   # The S3 object key is after the Host name in the user request URL.
   # The user request URL looks something like this, 
   # https://<User Request Host>/ContactUs/contact-us.txt#fr.
   # The target language code in the S3 GET request is after the "#"
    
    user_request_url = unquote(user_request_url)
    result = user_request_url.split("#")
    user_request_url = result[0]
    targetLang = result[1]
    
   # Extract the S3 Object Key from the user requested URL
    s3Key = str(Path(urlparse(user_request_url).path).relative_to('/'))
       
   # Get the original object from S3
    s3 = boto3.resource('s3')
    
   # To get the original object from S3,use the supporting_access_point_arn 
    s3Obj = s3.Object(supporting_access_point_arn, s3Key).get()
    srcText = s3Obj['Body'].read()
    srcText = srcText.decode('utf-8')

   # Translate original text
    translateClient = boto3.client('translate')
    response = translateClient.translate_text(
                                                Text = srcText,
                                                SourceLanguageCode='en',
                                                TargetLanguageCode=targetLang)
    
  # Write object back to S3 Object Lambda
    s3client = boto3.client('s3')
    s3client.write_get_object_response(
                                        Body=response['TranslatedText'],
                                        RequestRoute=request_route,
                                        RequestToken=request_token )
    
    return { 'statusCode': 200 }

The code in the Lambda function:

  • Extracts the outputRoute and outputToken from the object context. This defines where the WriteGetObjectResponse request is delivered.
  • Extracts the user-requested url from the event object.
  • Parses the S3 object key and the target language that is appended to the S3 object key.
  • Calls S3 GetObject to fetch the raw text of the source object.
  • Invokes Amazon Translate with the raw text extracted.
  • Puts the translated output back to S3 using the WriteGetObjectResponse API.

Configuring the Lambda IAM role

The Lambda function needs permissions to call back to the S3 Object Lambda access point with the WriteGetObjectResponse. It also needs permissions to call S3 GetObject and Amazon Translate. Add the following permissions to the Lambda execution role:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "S3ObjectLambdaAccess",
            "Effect": "Allow",
            "Action": [
                "s3:GetObject",
                "s3-object-lambda:WriteGetObjectResponse"
            ],
            "Resource": [
                "<arn of your S3 access point/*>”,
                "<arn of your Object Lambda accesspoint>"
            ]
        },
        {
            "Sid": "AmazonTranslateAccess",
            "Effect": "Allow",
            "Action": "translate:TranslateText",
            "Resource": "*"
        }
    ]
}

Deploying the Lambda using an AWS SAM template

Alternatively, deploy the Lambda function with the IAM role by using an AWS SAM template. The code for the Lambda function and the AWS SAM template is available for download from GitHub.

Creating the S3 Access Point

  1. Navigate to the S3 console and create a bucket with a unique name.
  2. In the S3 console, select “Access Points” and choose “Create access point”. Enter a name for the access point.
  3. For Bucket name, enter the S3 bucket name you entered in step 1.Create access point

This access point is the supporting access point for the Object Lambda Access Point you create in the next step. Keep all other settings on this page as default.

After creating the S3 access point, create the S3 Object Lambda Access Point using the supporting Access Point. The Lambda function you created earlier uses the supporting Access Point to download the original untransformed objects from S3.

Create Object Lambda Access Point

In the S3 console, go to the Object Lambda Access Point configuration and create an Object Lambda Access Point. Enter a name.

Create Object Lambda Access Point

For the Lambda function configurations, associate this with the Lambda function created earlier. Select the latest version of the Lambda function and keep all other settings as default.

Select Lambda function

To understand how to use the other settings of the S3 Object Lambda configuration, refer to the product documentation.

Testing dynamic content translation

In this section, you create a Python script and invoke the S3 GetObject API twice. First, against the S3 bucket and then against the Object Lambda Access Point. You can then compare the output to see how content is transformed using Object Lambda:

  1. Upload a text file to the S3 bucket using the Object Lambda Access Point you configured. For example, upload a sample “Contact Us” file in English to S3.
  2. To use the object Lambda Access Point, locate its ARN from the Properties tab of the Object Lambda Access Point.Object Lambda Access Point
  3. Create a local file called s3ol_client.py that contains the following Python script:
    import json
    import boto3
    import sys, getopt
      
    def main(argv):
    	
      try:	
        targetLang = sys.argv[1]
        print("TargetLang = ", targetLang)
        
        s3 = boto3.client('s3')
        s3Bucket = "my-s3ol-bucket"
        s3Key = "ContactUs/contact-us.txt"
    
        # Call get_object using the S3 bucket name
        response = s3.get_object(Bucket=s3Bucket,Key=s3Key)
        print("Original Content......\n")
        print(response['Body'].read().decode('utf-8'))
    
        print("\n")
    
        # Call get_object using the S3 Object Lambda access point ARN
        s3Bucket = "arn:aws:s3-object-lambda:us-west-2:123456789012:accesspoint/my-s3ol-access-point"
        s3Key = "ContactUs/contact-us.txt#" + targetLang
        response = s3.get_object(Bucket=s3Bucket,Key=s3Key)
        print("Transformed Content......\n")
        print(response['Body'].read().decode('utf-8'))
    
        return {'Success':200}             
      except:
        print("\n\nUsage: s3ol_client.py <Target Language Code>")
    
    #********** Program Entry Point ***********
    if __name__ == '__main__':
        main(sys.argv[1: ])
    
  4. Run the client program from the command line, passing the target language code as an argument. The full list of supported languages and codes in Amazon Translate can be found here.python s3ol_client.py "fr"

The output looks like this:

Example output

The first output is the original content that was retrieved when calling GetObject with the S3 bucket name. The second output is the transformed content when calling GetObject against the Object Lambda access point. The content is transformed by the Lambda function as it is being retrieved, and translated to French in near-real time.

Conclusion

This blog post shows how you can use S3 Object Lambda with Amazon Translate to simplify dynamic content translation by using a data driven approach. With user-provided data as arguments, you can dynamically transform content in S3 and generate a new object.

It is not necessary to create a copy of this new object in S3 before returning it to the client. We also saw that it is not necessary for the object with the same name to exist in the S3 bucket when using S3 Object Lambda. This pattern can be used to address several real world use cases that can benefit from the ability to transform and generate S3 objects on the fly.

For more serverless learning resources, visit Serverless Land.