Tag Archives: python

Building resilient serverless applications using chaos engineering

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/building-resilient-serverless-applications-using-chaos-engineering/

This post is written by Suranjan Choudhury (Head of TME and ITeS SA) and Anil Sharma (Sr PSA, Migration) 

Chaos engineering is the process of stressing an application in testing or production environments by creating disruptive events, such as outages, observing how the system responds, and implementing improvements. Chaos engineering helps you create the real-world conditions needed to uncover hidden issues and performance bottlenecks that are challenging to find in distributed applications.

You can build resilient distributed serverless applications using AWS Lambda and test Lambda functions in real-world operating conditions using chaos engineering. This blog shows an approach to injecting chaos into Lambda functions without making any change to the Lambda function code. It uses the AWS Fault Injection Simulator (FIS) service to create experiments that inject disruptions for Lambda-based serverless applications.

AWS FIS is a managed service that performs fault injection experiments on your AWS workloads. AWS FIS is used to set up and run fault experiments that simulate real-world conditions to discover application issues that are difficult to find otherwise. You can improve application resilience and performance using results from FIS experiments.

The sample code in this blog introduces random faults to existing Lambda functions, like an increase in response times (latency) or random failures. You can observe application behavior under introduced chaos and make improvements to the application.

Approaches to inject chaos in Lambda functions

AWS FIS currently does not support injecting faults in Lambda functions. However, there are two main approaches to inject chaos in Lambda functions: using external libraries or using Lambda layers.

Developers have created libraries to introduce failure conditions to Lambda functions, such as chaos_lambda and failure-lambda. These libraries allow developers to inject elements of chaos into Python and Node.js Lambda functions. To inject chaos using these libraries, developers must decorate the existing Lambda function’s code. Decorator functions wrap the existing Lambda function, adding chaos at runtime. This approach requires developers to change the existing Lambda functions.
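As a rough illustration of this decorator pattern (a hand-rolled sketch, not the actual API of chaos_lambda or failure-lambda), a wrapper can add latency or random errors around an existing handler:

import random
import time


def inject_chaos(max_delay_seconds=2, failure_rate=0.2):
    """Illustrative decorator that adds latency and random failures to a handler."""
    def decorator(handler):
        def wrapper(event, context):
            time.sleep(random.uniform(0, max_delay_seconds))  # inject random latency
            if random.random() < failure_rate:                # inject random failure
                raise RuntimeError("Chaos experiment: injected failure")
            return handler(event, context)
        return wrapper
    return decorator


@inject_chaos(max_delay_seconds=2, failure_rate=0.2)
def lambda_handler(event, context):
    return {"statusCode": 200, "body": "Hello from Lambda"}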

You can also use Lambda layers to inject chaos, requiring no change to the function code, as the fault injection is separated. Since the Lambda layer is deployed separately, you can independently change the element of chaos, like latency in response or failure of the Lambda function. This blog post discusses this approach.

Injecting chaos in Lambda functions using Lambda layers

A Lambda layer is a .zip file archive that contains supplementary code or data. Layers usually contain library dependencies, a custom runtime, or configuration files. This blog creates an FIS experiment that uses Lambda layers to inject disruptions in existing Lambda functions for Java, Node.js, and Python runtimes.

The Lambda layer contains the fault injection code. It runs before the Lambda function’s handler is invoked and injects random latency or errors. Injecting random latency simulates real-world unpredictable conditions. The Java, Node.js, and Python chaos injection layers provided are generic and reusable. You can use them to inject chaos in your Lambda functions.

The Chaos Injection Lambda Layers

Java Lambda Layer for Chaos Injection

The chaos injection layer for Java Lambda functions uses the JAVA_TOOL_OPTIONS environment variable. This environment variable allows specifying the initialization of tools, specifically the launching of native or Java programming language agents. The JAVA_TOOL_OPTIONS variable has a javaagent parameter that points to the chaos injection layer. This layer uses Java’s premain method and the Byte Buddy library to modify the Lambda function’s Java class at runtime.

When the Lambda function is invoked, the JVM uses the class specified with the javaagent parameter and invokes its premain method before the Lambda function’s handler invocation. The Java premain method injects chaos before Lambda runs.

The FIS experiment adds the layer association and the JAVA_TOOL_OPTIONS environment variable to the Lambda function.
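To illustrate what that change amounts to (this is not the experiment’s actual automation code; the function name, layer ARN, and agent path are placeholders), the equivalent call with the AWS SDK for Python looks like this:

import boto3

lambda_client = boto3.client("lambda")

# Attach the chaos injection layer and point the JVM at its agent class.
lambda_client.update_function_configuration(
    FunctionName="JavaChaosInjectionExampleFn",  # placeholder function name
    Layers=["arn:aws:lambda:eu-west-2:123456789012:layer:chaos-injection-java:1"],
    Environment={
        "Variables": {"JAVA_TOOL_OPTIONS": "-javaagent:/opt/chaos-agent.jar"}
    },
)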

Python and Node.js Lambda Layer for Chaos Injection

When injecting chaos in Python and Node.js functions, the Lambda function’s handler is replaced with a function in the respective layer by the FIS aws:ssm:start-automation-execution action. The automation, which is an SSM document, saves the original Lambda function’s handler in AWS Systems Manager Parameter Store, so that the changes can be rolled back once the experiment is finished.

The layer function contains the logic to inject chaos. At runtime, the layer function is invoked, injecting chaos in the Lambda function. The layer function in turn invokes the Lambda function’s original handler, so that the functionality is fulfilled.
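A minimal sketch of such a Python layer handler, assuming (for illustration only, not the sample repository’s exact code) that the original handler’s name is stored in a Parameter Store parameter referenced by an environment variable:

import importlib
import os
import random
import time

import boto3

ssm = boto3.client("ssm")


def handler(event, context):
    # Inject chaos first: random latency and, occasionally, a failure.
    time.sleep(random.uniform(0, 2))
    if random.random() < 0.2:
        raise RuntimeError("Chaos experiment: injected failure")

    # Look up the original handler that the SSM automation saved in Parameter Store.
    original = ssm.get_parameter(Name=os.environ["ORIGINAL_HANDLER_PARAM"])["Parameter"]["Value"]
    module_name, function_name = original.rsplit(".", 1)

    # Invoke the original handler so the function's business logic still runs.
    module = importlib.import_module(module_name)
    return getattr(module, function_name)(event, context)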

The result in all runtimes (Java, Python, or Node.js) is the invocation of the original Lambda function with latency or failure injected. The observed changes are random latency or failure injected by the layer.

Once the experiment is completed, a provided SSM document rolls back the layer’s association to the Lambda function and, in the case of the Java runtime, removes the environment variable.

Sample FIS experiments using SSM and Lambda layers

In the sample code provided, Lambda layers are included for the Python, Node.js, and Java runtimes, along with sample Lambda functions for each runtime.

The sample deploys the Lambda layers, the Lambda functions, the FIS experiment template, the AWS Identity and Access Management (IAM) roles needed to run the experiment, and the AWS Systems Manager (SSM) documents. An AWS CloudFormation template is provided for deployment.

Step 1: Complete the prerequisites

  • To deploy the sample code, clone the repository locally:
    git clone https://github.com/aws-samples/chaosinjection-lambda-samples.git
  • Complete the prerequisites documented here.

Step 2: Deploy using AWS CloudFormation

The CloudFormation template provided with this blog deploys the sample code. Execute the runCfn.sh script.

When this is complete, it returns the StackId that CloudFormation created.

Step 3: Run the chaos injection experiment

By default, the experiment is configured to inject chaos in the Java sample Lambda function. To change it to Python or Node.js Lambda functions, edit the experiment template and configure it to inject chaos using steps from here.

Step 4: Start the experiment

From the FIS Console, choose Start experiment.

Wait until the experiment state changes to “Completed”.

Step 5: Run your test

At this stage, you can inject chaos into your Lambda function. Run the Lambda functions and observe their behavior.

1. Invoke the Lambda function using the command below:

aws lambda invoke --function-name NodeChaosInjectionExampleFn out --log-type Tail --query 'LogResult' --output text | base64 -d

2. The CLI command’s output displays the logs created by the Lambda layer, showing the latency introduced in this invocation.

In this example, the output shows that the Lambda layer injected 1799ms of random latency to the function.

The experiment injects random latency or failure in the Lambda function. Running the Lambda function again results in a different latency or failure. At this stage, you can test the application and observe its behavior under conditions that may occur in the real world, such as increased latency or a Lambda function failure.

Step 6: Roll back the experiment

To roll back the experiment, run the SSM document for rollback. This rolls back the Lambda function to the state before chaos injection. Run this command:

aws ssm start-automation-execution \
--document-name "InjectLambdaChaos-Rollback" \
--document-version "\$DEFAULT" \
--parameters \
'{"FunctionName":["FunctionName"],"LayerArn":["LayerArn"],"assumeRole":["RoleARN"]}' \
--region eu-west-2

Cleaning up

To avoid incurring future charges, clean up the resources created by the CloudFormation template by running the following CLI command. Update the stack name to the one you provided when creating the stack.

aws cloudformation delete-stack --stack-name myChaosStack

Using FIS experiment results

You can use FIS experiment results to validate expected system behavior. An example of expected behavior is: “If application latency increases by 10%, there is less than a 1% increase in sign-in failures.” After the experiment is completed, evaluate whether the application resiliency aligns with your business and technical expectations.

Conclusion

This blog explains an approach for testing reliability and resilience in Lambda functions using chaos engineering. This approach allows you to inject chaos in Lambda functions without changing the Lambda function code, with clear segregation of chaos injection and business logic. It provides a way for developers to focus on building business functionality using Lambda functions.

The Lambda layers that inject chaos can be developed and managed separately. This approach uses AWS FIS to run experiments that inject chaos using Lambda layers and test a serverless application’s performance and resiliency. Using the insights from the FIS experiment, you can find, fix, or document risks that surface in the application while testing.

For more serverless learning resources, visit Serverless Land.

How to add notifications and manual approval to an AWS CDK Pipeline

Post Syndicated from Jehu Gray original https://aws.amazon.com/blogs/devops/how-to-add-notifications-and-manual-approval-to-an-aws-cdk-pipeline/

A deployment pipeline typically comprises several stages such as dev, test, and prod, which ensure that changes undergo testing before reaching the production environment. To improve the reliability and stability of release processes, DevOps teams must review Infrastructure as Code (IaC) changes before applying them in production. As a result, implementing a mechanism for notification and manual approval that grants stakeholders improved access to changes in their release pipelines has become a popular practice for DevOps teams.

Notifications keep development teams and stakeholders informed in real-time about updates and changes to deployment status within release pipelines. Manual approvals establish thresholds for transitioning a change from one stage to the next in the pipeline. They also act as a guardrail to mitigate risks arising from errors and rework because of faulty deployments.

Please note that manual approvals, as described in this post, are not a replacement for the use of automation. Instead, they complement automated checks within the release pipeline.

In this blog post, we describe how to set up notifications and add a manual approval stage to an AWS Cloud Development Kit (AWS CDK) pipeline.

Concepts

CDK Pipeline

CDK Pipelines is a construct library for painless continuous delivery of CDK applications. CDK Pipelines can automatically build, test, and deploy changes to CDK resources. CDK Pipelines are self-mutating, which means that as application stages or stacks are added, the pipeline automatically reconfigures itself to deploy those new stages or stacks. A pipeline needs to be manually deployed only once; afterwards, it keeps itself up to date from the source code repository by pulling the changes pushed to the repository.

Notifications

Adding notifications to a pipeline provides visibility into changes made to the environment by utilizing the NotificationRule construct. You can also use this rule to notify pipeline users of important changes, such as when a pipeline starts execution. Notification rules specify both the events and the targets, such as an Amazon Simple Notification Service (Amazon SNS) topic or AWS Chatbot clients configured for Slack, which represent the nominated recipients of the notifications. An SNS topic is a logical access point that acts as a communication channel, while Chatbot is an AWS service that enables DevOps and software development teams to use messaging program chat rooms to monitor and respond to operational events.

Manual Approval

In a CDK pipeline, you can incorporate an approval action at a specific stage, where the pipeline should pause, allowing a team member or designated reviewer to manually approve or reject the action. When an approval action is ready for review, a notification is sent out to alert the relevant parties. This combination of notifications and approvals ensures timely and efficient decision-making regarding crucial actions within the pipeline.

Solution Overview

The solution describes a simple web service composed of an AWS Lambda function that returns a static web page served by Amazon API Gateway. Since Continuous Integration and Continuous Deployment (CI/CD) are important components of most web projects, the team implements a CDK pipeline for their web project.

There are two important stages in this CDK pipeline: the Pre-production stage for testing and the Production stage, which contains the end product for users.

The flow of the CI/CD process to update the website starts when a developer pushes a change to the repository using their Integrated Development Environment (IDE). An Amazon CloudWatch event triggers the CDK Pipeline. Once the changes reach the pre-production stage for testing, the CI/CD process halts. This is because a manual approval gate is between the pre-production and production stages. So, it becomes a stakeholder’s responsibility to review the changes in the pre-production stage before approving them for production. The pipeline includes an SNS notification that notifies the stakeholder whenever the pipeline requires manual approval.

After approving the changes, the CI/CD process proceeds to the production stage and the updated version of the website becomes available to the end user. If the approver rejects the changes, the process ends at the pre-production stage with no impact to the end user.

The following diagram illustrates the solution architecture.

Figure 1. This image shows the CDK pipeline process in our solution and how applications or updates are deployed using AWS Lambda Function to end users.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Add notification to the pipeline

In this tutorial, perform the following steps:

  • Add the import statements for AWS CodeStar notifications and SNS to the import section of pipeline_stack.py:
import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.pipelines as pipelines
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs
  • Ensure the pipeline is built by calling the build_pipeline() function.

pipeline.build_pipeline()

  • Create an SNS topic.

topic = sns.Topic(self, "MyTopic1")

  • Add a subscription to the topic. This specifies where the notifications are sent (Add the stakeholders’ email here).

topic.add_subscription(subs.EmailSubscription("[email protected]"))

  • Define a rule. This contains the source for notifications, the event trigger, and the target.

rule = notifications.NotificationRule(self, "NotificationRule", )

  • Assign the source the value pipeline.pipeline. The first pipeline is the name of the CDK pipeline variable, and the .pipeline property exposes the underlying CodePipeline pipeline.

source=pipeline.pipeline,

  • Define the events to be monitored. Specify notifications for when the pipeline starts, when it fails, when the execution succeeds, and finally when manual approval is needed.
events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed","codepipeline-pipeline-pipeline-execution-succeeded", 
"codepipeline-pipeline-manual-approval-needed"],
  • For the complete list of supported event types for pipelines, see here
  • Finally, add the target. The target here is the topic created previously.

targets=[topic]

The combination of all the steps becomes:

pipeline.build_pipeline()
topic = sns.Topic(self, "MyTopic1")
topic.add_subscription(subs.EmailSubscription("[email protected]"))
rule = notifications.NotificationRule(self, "NotificationRule",
source=pipeline.pipeline,
events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed","codepipeline-pipeline-pipeline-execution-succeeded", 
"codepipeline-pipeline-manual-approval-needed"],
targets=[topic]
)

Adding Manual Approval

  • Add the ManualApprovalStep import to the aws_cdk.pipelines import statement.
from aws_cdk.pipelines import (
CodePipeline,
CodePipelineSource,
ShellStep,
ManualApprovalStep
)
  • Add the ManualApprovalStep to the production stage. The code must be added to the add_stage() function.
 prod = WorkshopPipelineStage(self, "Prod")
        prod_stage = pipeline.add_stage(prod,
            pre = [ManualApprovalStep('PromoteToProduction')])

When a stage is added to a pipeline, you can specify the pre and post steps, which are arbitrary steps that run before or after the contents of the stage. You can use them to add validations like manual or automated gates to the pipeline. It is recommended to put manual approval gates in the set of pre steps, and automated approval gates in the set of post steps. So, the manual approval action is added as a pre step that runs after the pre-production stage and before the production stage.

  • The final version of the pipeline_stack.py becomes:
from constructs import Construct
import aws_cdk as cdk
import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs
from aws_cdk import (
    Stack,
    aws_codecommit as codecommit,
    aws_codepipeline as codepipeline,
    pipelines as pipelines,
    aws_codepipeline_actions as cpactions,
    
)
from aws_cdk.pipelines import (
    CodePipeline,
    CodePipelineSource,
    ShellStep,
    ManualApprovalStep
)
from pipeline_stage import WorkshopPipelineStage  # assumed import: module defining the application stage used below


class WorkshopPipelineStack(cdk.Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        
        # Creates a CodeCommit repository called 'WorkshopRepo'
        repo = codecommit.Repository(
            self, "WorkshopRepo", repository_name="WorkshopRepo",
            
        )
        
        #Create the Cdk pipeline
        pipeline = pipelines.CodePipeline(
            self,
            "Pipeline",
            
            synth=pipelines.ShellStep(
                "Synth",
                input=pipelines.CodePipelineSource.code_commit(repo, "main"),
                commands=[
                    "npm install -g aws-cdk",  # Installs the cdk cli on Codebuild
                    "pip install -r requirements.txt",  # Instructs Codebuild to install required packages
                    "npx cdk synth",
                ]
                
            ),
        )

        
         # Create the Pre-Prod Stage and its API endpoint
        deploy = WorkshopPipelineStage(self, "Pre-Prod")
        deploy_stage = pipeline.add_stage(deploy)
    
        deploy_stage.add_post(
            
            pipelines.ShellStep(
                "TestViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
            )
    
        
        )
        deploy_stage.add_post(
            pipelines.ShellStep(
                "TestAPIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create the Prod Stage with the Manual Approval Step
        prod = WorkshopPipelineStage(self, "Prod")
        prod_stage = pipeline.add_stage(prod,
            pre = [ManualApprovalStep('PromoteToProduction')])
        
        prod_stage.add_post(
            
            pipelines.ShellStep(
                "ViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
                
            )
            
        )
        prod_stage.add_post(
            pipelines.ShellStep(
                "APIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create The SNS Notification for the Pipeline
        
        pipeline.build_pipeline()
        
        topic = sns.Topic(self, "MyTopic")
        topic.add_subscription(subs.EmailSubscription("[email protected]"))
        rule = notifications.NotificationRule(self, "NotificationRule",
            source = pipeline.pipeline,
            events = ["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed", "codepipeline-pipeline-manual-approval-needed", "codepipeline-pipeline-manual-approval-succeeded"],
            targets=[topic]
            )
  
    

When a commit is made with git commit -am "Add manual Approval" and changes are pushed with git push, the pipeline automatically self-mutates to add the new approval stage.

Now when the developer pushes changes to update the build environment or the end user application, the pipeline execution stops at the point where the approval action was added. The pipeline won’t resume unless a manual approval action is taken.

Figure 2. This image shows the pipeline with the added Manual Approval action.

Since there is a notification rule that includes the approval action, an email notification is sent with the pipeline information and approval status to the stakeholder(s) subscribed to the SNS topic.

Figure 3. This image shows the SNS email notification sent when the pipeline starts.

After pushing the updates to the pipeline, the reviewer or stakeholder can use the AWS Management Console to access the pipeline to approve or deny changes based on their assessment of these changes. This process helps eliminate any potential issues or errors and ensures only changes deemed relevant are made.

Figure 4. This image shows the review action that gives the stakeholder the ability to approve or reject any changes.

If a reviewer rejects the action, or if no approval response is received within seven days of the pipeline stopping for the review action, the pipeline status is “Failed.”

Figure 5. This image depicts when a stakeholder rejects the action.

If a reviewer approves the changes, the pipeline continues its execution.

Figure 6. This image depicts when a stakeholder approves the action.

Considerations

It is important to consider any potential drawbacks before integrating a manual approval process into a CDK pipeline. One such consideration is that its implementation may delay the delivery of updates to end users. An example of this is the business hours limitation: the pipeline process might be constrained by the availability of stakeholders during business hours. This can result in delays if changes are made outside regular working hours and require approval when stakeholders are not immediately accessible.

Clean up

To avoid incurring future charges, delete the resources. Use cdk destroy via the command line to delete the created stack.

Conclusion

Adding notifications and manual approval to CDK Pipelines provides better visibility and control over the changes made to the pipeline environment. These features ideally complement the existing automated checks to ensure that all updates are reviewed before deployment. This reduces the risk of potential issues arising from bugs or errors. The ability to approve or deny changes through the AWS Management Console makes the review process simple and straightforward. Additionally, SNS notifications keep stakeholders updated on the status of the pipeline, ensuring a smooth and seamless deployment process.

Jehu Gray

Jehu Gray is an Enterprise Solutions Architect at Amazon Web Services where he helps customers design solutions that fit their needs. He enjoys exploring what’s possible with IaC such as CDK.

Abiola Olanrewaju

Abiola Olanrewaju is an Enterprise Solutions Architect at Amazon Web Services where he helps customers design and implement scalable solutions that drive business outcomes. He has a keen interest in Data Analytics, Security and Automation.

Serge Poueme

Serge Poueme is a Solutions Architect on the AWS for Games Team. He started his career as a software development engineer and enjoys building new products. At AWS, Serge focuses on improving the Builders Experience for game developers and optimizing server hosting using containers. When he is not working, he enjoys playing Far Cry or FIFA on his Xbox.

Optimizing data with automated intelligent document processing solutions

Post Syndicated from Deependra Shekhawat original https://aws.amazon.com/blogs/architecture/optimizing-data-with-automated-intelligent-document-processing-solutions/

Many organizations struggle to effectively manage and derive insights from the large amount of unstructured data locked in emails, PDFs, images, scanned documents, and more. The variety of formats, document layouts, and text makes it difficult for any standard Optical Character Recognition (OCR) to extract key insights from these data sources.

To help organizations overcome these document management and information extraction challenges, AWS offers connected, pre-trained artificial intelligence (AI) service APIs that help drive business outcomes from these document-based rich data sources.

This blog post describes a cost-effective, scalable automated intelligent document processing solution that leverages a Natural Language Processing (NLP) engine using Amazon Textract and Amazon Comprehend. This solution helps customers take advantage of industry-leading machine learning (ML) technology in their document workflows without the need for in-house ML expertise.

Customer document management challenges

Customers across industry verticals experience the following document management challenges:

  • Extraction process accuracy varies significantly when applied to diverse sources; specifically handwritten text, images, and scanned documents.
  • Existing scripting and rule-based solutions cannot provide customer domain or problem-specific classifiers.
  • Traditional document management systems cannot consider feedback from domain experts to improve the learning process.
  • Personally Identifiable Information (PII) data handling is not robust or customizable, causing data privacy leakage concerns.
  • Many manual interventions are required to complete the entire process.

Automated intelligent document processing solution

We introduced an automated intelligent document processing implementation to address key document management challenges. At the heart of the solution is an NLP engine that combines Amazon Textract and Amazon Comprehend.

The full solution also leverages other AWS services as described in the following diagram (Figure 1) and steps to develop and operate a cost-effective and scalable architecture for document processing. It effectively extracts text from document types including PDFs, images, scanned documents, Microsoft Excel workbooks, and more.

AI-based intelligent document processing engine

Figure 1: AI-based intelligent document processing engine

Solution overview

Let’s explore the automated intelligent document processing solution step by step.

  1. The document upload engine or business users upload the respective files or documents through a custom web application to the designated Amazon Simple Storage Service (Amazon S3) bucket.
  2. The event-based architecture signals an Amazon S3 push event to invoke the respective AWS Lambda function to start document pre-processing.
  3. The Lambda function evaluates the document payload, leverages Amazon Simple Queue Service (Amazon SQS) for async processing, prepares document metadata, stores it in Amazon DynamoDB, and calls the NLP engine to perform the information extraction process.
  4. The NLP engine leverages Amazon Textract for text extraction from a variety of sources and leverages document metadata to optimize the appropriate API calls (for example, form, tabular, or PDF); a minimal sketch of these calls is shown after this list.
    • Amazon Textract output is fed into Amazon Comprehend, which consumes the extracted text and performs entity parsing, line/paragraph-based sentiment analysis, and document/paragraph classification. For better accuracy, we leverage a custom classifier within Amazon Comprehend.
    • Amazon Comprehend also provides key APIs to mask PII data before it is used for any further consumption. The solution offers the ability to configure masking rules for each PII entity per masking requirements.
    • To ensure the solution can handle data from Microsoft Excel workbooks, we developed a custom parser using Python running inside an AWS Lambda function. This function is invoked depending on the document metadata.
  5. Output of Amazon Comprehend is then fed to ML models deployed using Amazon SageMaker depending on additional use cases configured by the customer to complement the overall process with ML-based recommendations, predictions, and personalization.
  6. Once the NLP engine completes its processing, the job completion notification event signals another AWS Lambda function and updates the status in the respective Amazon SQS queue.
  7. The Lambda post-processing function parses the resultant content generated by the NLP engine and stores it in the Amazon DynamoDB and Amazon S3 bucket. This step is responsible for the required data augmentation, key entities validation, and default value assignment to create a data structure that could be consumed by the presentation/visualization layer.
  8. Users get the flexibility to see the extracted information and compare it with the original document extract in the custom user interface (UI). They can provide their feedback on extraction and entity parsing accuracy. From a user access management perspective, Amazon Cognito provides authorization and authentication.
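The following sketch illustrates the Textract and Comprehend calls referenced in step 4, using the AWS SDK for Python. The bucket and object names are placeholders, the document is assumed to be a single-page image, and production code would typically use the asynchronous Textract APIs for multi-page PDFs:

import boto3

textract = boto3.client("textract")
comprehend = boto3.client("comprehend")

# Extract raw text from the uploaded document stored in Amazon S3.
extraction = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "document-input-bucket", "Name": "claims/claim-001.png"}}
)
text = " ".join(
    block["Text"] for block in extraction["Blocks"] if block["BlockType"] == "LINE"
)

# Feed the extracted text into Amazon Comprehend for entity parsing and sentiment analysis.
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
sentiment = comprehend.detect_sentiment(Text=text, LanguageCode="en")
print(sentiment["Sentiment"], [e["Type"] for e in entities["Entities"]])

# Detect PII entities and mask them before the text is used for further consumption.
pii = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
masked = list(text)
for entity in pii["Entities"]:
    for i in range(entity["BeginOffset"], entity["EndOffset"]):
        masked[i] = "*"
masked_text = "".join(masked)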

Customer benefits

The automated intelligent document processing solution helps customers:

  • Increase overall document management efficiency by 50-60%, leveraging automation and nullifying manual interventions
  • Reduce in-house team involvement in administrative activities by up to 70% using integrated and connected processing workflows
  • Gain better visibility into key contractual obligations with features such as Document Classification (helps properly route documents to the respective process/team) and Obligation Extraction
  • Utilize a UI-based feedback mechanism for in-house domain experts/reviewers to see and validate the extracted information and offer feedback to inform further model training

From a cost-optimization perspective, depending on document type and required information, only the respective Amazon Textract API calls are submitted. (For example, it is not worth using form/table-based Textract API calls for a Know Your Customer (KYC) document such as a driver’s license or passport when the AnalyzeID API is the most efficient solution.)
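For example, a sketch of routing a KYC document to the AnalyzeID API with the AWS SDK for Python (the bucket and key are placeholders):

import boto3

textract = boto3.client("textract")

# Identity documents such as driver's licenses or passports are best served
# by the purpose-built AnalyzeID API rather than form/table analysis.
response = textract.analyze_id(
    DocumentPages=[
        {"S3Object": {"Bucket": "document-input-bucket", "Name": "kyc/drivers-license.png"}}
    ]
)

for document in response["IdentityDocuments"]:
    for field in document["IdentityDocumentFields"]:
        print(field["Type"]["Text"], "=", field["ValueDetection"]["Text"])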

To maximize solution benefits, customers should invest time in building well-defined taxonomies ahead of using the document processing solution to accommodate their own use cases or industry domain-specific requirements. Their taxonomy input highlights only relevant keys and takes respective actions in case the requires keys are not extracted.

Vertical industry use cases

As mentioned, this document processing solution can be used across industry segments. Let’s explore some practical use cases. For example, it can help insurance industry professionals to accelerate claim processing and customer KYC-related processes. By extracting the key entities from the claim documents, mapping them against the customer defined taxonomy, and integrating with Amazon SageMaker models for anomaly detection (anomalous claims), insurance providers can improve claim management and customer satisfaction.

In the healthcare industry, the solution can help with medical records and report processing, key medical entity extraction, and customer data masking.

The document processing solution can help the banking industry by automating check processing and delivering the ability to extract key entities like payer, payee, date, and amount from the checks.

Conclusion

Manual document processing is resource-intensive, time consuming, and costly. Customers need to allocate resources to process large volumes of documents, lowering business agility. Their employees are performing manual “stare and compare” tasks, potentially reducing worker morale and preventing them from focusing where their efforts are better placed.

Intelligent document processing helps businesses overcome these challenges by automating the classification, extraction, and analysis of data. This expedites decision cycles, allocates resources to high-value tasks, and reduces costs.

Pre-trained APIs of AWS AI services allow for quick classification, extraction, and analysis of data from scores of documents. This solution also has industry-specific features that can quickly process specialized industry-specific documents. This blog discussed the foundational architecture that helps accelerate the implementation of any specific document processing use case.

Kids’ coding languages

Post Syndicated from Marc Scott original https://www.raspberrypi.org/blog/kids-coding-languages/

Programming is becoming an increasingly useful skill in today’s society. As we continue to rely more and more on software and digital technology, knowing how to code is also more and more valuable. That’s why many parents are looking for ways to introduce their children to programming. You might find it difficult to know where to begin, with so many different kids’ coding languages and platforms available. In this blog post, we explore how children can progress through different programming languages to realise their potential as proficient coders and creators of digital technology.

Two kids share their Scratch coding project on a laptop.

ScratchJr

Everyone needs to start somewhere, and one great option for children aged 5–7 is ScratchJr (Scratch Junior), a visual programming language with drag-and-drop blocks for creating simple programs. ScratchJr is available for free on Android and iOS mobile devices. It’s great for introducing young children to the basics of programming, and they can use it to create interactive stories and games.

Scratch

Moving on from ScratchJr, there’s its web-based sibling Scratch. Scratch offers drag-and-drop blocks for creating programs and comes with an assortment of graphics, sounds, and music for your child to bring their programs to life. This visual programming language is designed specifically for children to learn programming fundamentals. Scratch is available in multiple spoken languages and is perfect for beginners. It allows kids to create interactive stories, animations, and games with ease.

The Raspberry Pi Foundation has a wealth of free Scratch resources we have created specifically for young people who are beginners, such as the ‘Introduction to Scratch’ project path. And if your child is interested in physical computing to interact with the real world using code, they can also learn how to use electronic components, such as buzzers and LEDs, with Scratch and a Raspberry Pi computer.  

Young person using a laptop to code in Scratch, our favourite of all kids' coding languages.

MakeCode

Another fun option for children who want to explore coding and physical computing is the micro:bit. This is a small programmable device with an LED display, buttons, and sensors, and it can be used to create games, animations, interactive projects, and lots more. To control a micro:bit, a visual programming language called MakeCode can be used. The micro:bit can also be programmed using Scratch or text-based languages such as Python, offering an easy transition for children as their coding skills progress. Have a look at our free collection of micro:bit resources to learn more.

HTML

Everyone is familiar with websites, but fewer people know how they are coded. HTML is a markup language that is used to create the webpages we use every day. It’s a great language for children to learn because they can see the results of their code in real time, in their web browser. They can use HTML and CSS to create simple webpages that include links, videos, pictures, and interactive elements, all the while learning how websites are structured and designed. We have many free web design resources for your child, including a basic ‘Introduction to web development’ project path.

Three kids coding at laptops.

Python 

If your child is becoming confident with Scratch and HTML, then using Python is the recommended next stage in their learning. Python is a high-level text-based programming language that is easy to read and learn. It is a popular choice for beginners as it has a simple syntax that often reads like plain English. Many free Python projects for young people are available on our website, including the ‘Introduction to Python’ path.
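For example, a few lines of Python that a beginner can read almost like a sentence:

# Print a greeting for each name that starts with the letter A
names = ["Ada", "Grace", "Alan"]
for name in names:
    if name.startswith("A"):
        print("Hello,", name)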

A kid coding in Python on a laptop.

The Python community is also really welcoming and has produced a myriad of online tutorials and videos to help learners explore this language. Python can be used to do some very powerful things with ease, which is why it is so popular. For example, it is relatively simple to create Python programs to engage in machine learning and data analysis. If you wanted to explore large language models such as GPT, on which the ChatGPT chatbot is based, then Python would be the language of choice.

JavaScript 

JavaScript is the language of the web, and if your child has become proficient in HTML, then this is the next language for them. JavaScript is used to create interactive websites and web applications. As young people become more comfortable with programming, JavaScript is a useful language to progress to, given how ubiquitous the web is today. It can be tricky to learn, but like Python, it has a vast number of libraries of functions that people have already created for it to achieve things more quickly. These libraries make JavaScript a very powerful language to use.

Try out kids’ coding languages

There are many different programming languages, and each one has its own strengths and weaknesses. Some are easy to learn and use, some are really fast, and some are very secure.

Two kids coding together on Code Club World.

Starting with visual languages such as Scratch or MakeCode allows your child to begin to understand the basic concepts of programming without needing any developed reading and keyboard skills. Once their understanding and skills have improved, they can try out text-based languages, find the one that they are comfortable with, and then continue to learn. It’s fairly common for people who are proficient in one programming language to learn other languages quite quickly, so don’t worry about which programming language your child starts with.

Whether your child is interested in working in software development or just wants to learn a valuable — and creative — skill, helping them learn to code and try out different kids’ coding languages is a great way for you to open up new opportunities for them.

The post Kids’ coding languages appeared first on Raspberry Pi Foundation.

Test our new Code Editor for young people

Post Syndicated from Phil Howell original https://www.raspberrypi.org/blog/code-editor-beta-testing/

We are building a new online text-based Code Editor to help young people aged 7 and older learn to write code. It’s free and designed for young people who attend Code Clubs and CoderDojos, students in schools, and learners at home.

The interface of the beta version of the Raspberry Pi Foundation's Code Editor.
The Code Editor interface

At this stage of development, the Code Editor enables learners to:

  • Write and run Python code right in their browser, with no setup required. The interface is simple and intuitive, which makes getting started with text-based coding easier.
  • Save their code using their Raspberry Pi Foundation account. We want learners to easily build on projects they start in the classroom at home, or bring a project they’ve started at home to their coding club.
A young person at a CoderDojo uses the Raspberry Pi Foundation's Code Editor.

We’ve chosen Python as the first programming language our Code Editor supports because it is popular in schools, CoderDojos, and Code Clubs. Many educators and young people like Python because they see it as similar to the English language. It is often the text-based language young people learn when they take their first steps away from a block-based programming environment, such as Scratch.

Python is also widely used by professional programmers and usually tops at least one of the industry-standard indexes that ranks programming languages.

We will be adding support for web development languages (HTML/CSS/JavaScript) to the Editor in the near future.

We’re also planning to add features such as project sharing and collaboration, which we know young people will love. We want the Editor to be safe, accessible, and age-appropriate. As safeguarding is always at the core of what we do, we’ll only make new features available once we’ve ensured they comply with the ICO’s age-appropriate design code and our safeguarding policies.

Test the Code Editor and tell us what you think

We are inviting you to test the Code Editor as part of what we call the beta phase of development. As the Editor is still in development, some things might not look or work as well as we’d like — and this is why we need your help. 

A text output in the beta version of the Raspberry Pi Foundation's Code Editor.
Text output in the Code Editor

We’d love you to try the Editor out and let us know what worked well for you, what didn’t work well, and what you’d like to see next.

You can now try out the Code Editor in the first two projects of our ‘Intro to Python’ path. We’ve included a feedback form for you to let us know which project you tried, and what you think of the Editor. We’d love to hear from you.

Your feedback helps us decide what to do next. Based on what learners, educators, volunteers, teachers, and parents tell us, we will make the improvements to the Editor that matter most to the young people we aim to support.

Where next for the Code Editor?

One of our long-term goals is to engage millions of young people in learning about computing and how to create with digital technologies. We’re developing the Code Editor with three main aims in mind.

1. Supporting young people’s learning journeys

We aim to build the Code Editor so it:

  • Suits beginners and also supports them as their confidence and independence grows, so they can take on their own coding projects in a familiar environment
  • Helps learners to transition from block-based to text-based, informed by our deep understanding of pedagogy and computing education
  • Brings together project instructions and code editing into a single interface so that young people do not have to switch screens, which makes coding easier

2. Removing barriers to accessing computing education

Our work on the Code Editor will:

  • Ensure it works well on mobile and tablet devices, and low-cost computers including the Raspberry Pi 4 2GB
  • Support localisation and translation, so we can tailor the Editor for the needs of young people all over the world

3. Making learning to program engaging for more young people

We want to offer a Code Editor that:

  • Enables young people to build a vast variety of projects because it supports graphic user interface output and supplies images and sprites for use in multimedia projects

We’re also planning on making the Editor available as an open source project so that other projects and organisations focussed on helping people learn to code can benefit. More on this soon.

Our work on the Code Editor has been generously funded by the Algorand Foundation and Endless, and we thank them for their support. If you are interested in partnering with us to fund this key work, please reach out to us via email.

The post Test our new Code Editor for young people appeared first on Raspberry Pi Foundation.

Serverless ICYMI Q1 2023

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/serverless-icymi-q1-2023/

Welcome to the 21st edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

Artificial intelligence (AI) technologies, ChatGPT, and DALL-E are creating significant interest in the industry at the moment. Find out how to integrate serverless services with ChatGPT and DALL-E to generate unique bedtime stories for children.

Example notification of a story hosted with Next.js and App Runner

Serverless Land is a website maintained by the Serverless Developer Advocate team to help you build serverless applications and includes workshops, code examples, blogs, and videos. There is now enhanced search functionality so you can search across resources, patterns, and video content.

ServerlessLand search

AWS Lambda

AWS Lambda has improved how concurrency works with Amazon SQS. You can now control the maximum number of concurrent Lambda function invocations for an SQS event source.

The launch blog post explains the scaling behavior of Lambda using this architectural pattern, challenges this feature helps address, and a demo of maximum concurrency in action.

Maximum concurrency is set to 10 for the SQS queue.
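As an illustrative sketch (the event source mapping UUID is a placeholder), the same setting can be applied to an existing SQS event source mapping with the AWS SDK for Python:

import boto3

lambda_client = boto3.client("lambda")

# Cap concurrent invocations driven by this SQS event source at 10.
lambda_client.update_event_source_mapping(
    UUID="a1b2c3d4-5678-90ab-cdef-11111EXAMPLE",  # placeholder mapping identifier
    ScalingConfig={"MaximumConcurrency": 10},
)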

AWS Lambda Powertools is an open-source library to help you discover and incorporate serverless best practices more easily. Lambda Powertools for .NET is now generally available and currently focused on three observability features: distributed tracing (Tracer), structured logging (Logger), and asynchronous business and application metrics (Metrics). Powertools is also available for the Python, Java, and TypeScript/Node.js programming languages.

To learn more:

Lambda announced a new feature, runtime management controls, which provide more visibility and control over when Lambda applies runtime updates to your functions. The runtime controls are optional capabilities for advanced customers that require more control over their runtime changes. You can now specify a runtime management configuration for each function with three settings: Automatic (default), Function update, or Manual.

There are three new Amazon CloudWatch metrics for asynchronous Lambda function invocations: AsyncEventsReceived, AsyncEventAge, and AsyncEventsDropped. You can track the asynchronous invocation requests sent to Lambda functions to monitor any delays in processing and take corrective actions if required. The launch blog post explains the new metrics and how to use them to troubleshoot issues.
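For example (a sketch with a placeholder function name), the new metrics can be queried with the AWS SDK for Python:

from datetime import datetime, timedelta

import boto3

cloudwatch = boto3.client("cloudwatch")

# Track the age of queued asynchronous events for a function over the last hour.
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/Lambda",
    MetricName="AsyncEventAge",
    Dimensions=[{"Name": "FunctionName", "Value": "my-async-function"}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Maximum"],
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])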

Lambda now supports Amazon DocumentDB change streams as an event source. You can use Lambda functions to process new documents, track updates to existing documents, or log deleted documents. You can use any programming language that is supported by Lambda to write your functions.

There is a helpful blog post suggesting best practices for developing portable Lambda functions that allow you to port your code to containers if you later choose to.

AWS Step Functions

AWS Step Functions has expanded its AWS SDK integrations with support for 35 additional AWS services including Amazon EMR Serverless, AWS Clean Rooms, AWS IoT FleetWise, AWS IoT RoboRunner and 31 other AWS services. In addition, Step Functions also added support for 1000+ new API actions from new and existing AWS services such as Amazon DynamoDB and Amazon Athena. For the full list of added services, visit AWS SDK service integrations.

Amazon EventBridge

Amazon EventBridge has launched the AWS Controllers for Kubernetes (ACK) for EventBridge and Pipes. This allows you to manage EventBridge resources, such as event buses, rules, and pipes, using the Kubernetes API and resource model (custom resource definitions).

EventBridge event buses now also support enhanced integration with Service Quotas. Your quota increase requests for limits such as PutEvents transactions-per-second, number of rules, and invocations per second among others will be processed within one business day or faster, enabling you to respond quickly to changes in usage.

AWS SAM

The AWS Serverless Application Model (SAM) Command Line Interface (CLI) has added the sam list command. You can now show resources defined in your application, including the endpoints, methods, and stack outputs required to test your deployed application.

AWS SAM has a preview of sam build support for building and packaging serverless applications developed in Rust. You can use cargo-lambda in the AWS SAM CLI build workflow and AWS SAM Accelerate to iterate on your code changes rapidly in the cloud.

You can now use AWS SAM connectors as a source resource parameter. Previously, you could only define AWS SAM connectors as an AWS::Serverless::Connector resource. Now you can add the resource attribute on a connector’s source resource, which makes templates more readable and easier to update over time.

AWS SAM connectors now also support multiple destinations to simplify your permissions. You can now use a single connector between a single source resource and multiple destination resources.

In October 2022, AWS released OpenID Connect (OIDC) support for AWS SAM Pipelines. This improves your security posture by creating integrations that use short-lived credentials from your CI/CD provider. There is a new blog post on how to implement it.

Find out how best to build serverless Java applications with the AWS SAM CLI.

AWS App Runner

AWS App Runner now supports retrieving secrets and configuration data stored in AWS Secrets Manager and AWS Systems Manager (SSM) Parameter Store in an App Runner service as runtime environment variables.

App Runner also now supports incoming requests based on the HTTP 1.0 protocol, and has added service-level concurrency, CPU, and memory utilization metrics.

Amazon S3

Amazon S3 now automatically applies default encryption to all new objects added to S3, at no additional cost and with no impact on performance.

You can now use an S3 Object Lambda Access Point alias as an origin for your Amazon CloudFront distribution to tailor or customize data to end users. For example, you can resize an image depending on the device that an end user is visiting from.

S3 has introduced Mountpoint for S3, a high performance open source file client that translates local file system API calls to S3 object API calls like GET and LIST.

S3 Multi-Region Access Points now support datasets that are replicated across multiple AWS accounts. They provide a single global endpoint for your multi-region applications, and dynamically route S3 requests based on policies that you define. This helps you to more easily implement multi-Region resilience, latency-based routing, and active-passive failover, even when data is stored in multiple accounts.

Amazon Kinesis

Amazon Kinesis Data Firehose now supports streaming data delivery to Elastic. This is an easier way to ingest streaming data to Elastic and consume the Elastic Stack (ELK Stack) solutions for enterprise search, observability, and security without having to manage applications or write code.

Amazon DynamoDB

Amazon DynamoDB now supports table deletion protection to protect your tables from accidental deletion when performing regular table management operations. You can set the deletion protection property for each table; it is disabled by default.
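For example (a sketch with a placeholder table name), the property can be enabled on an existing table with the AWS SDK for Python:

import boto3

dynamodb = boto3.client("dynamodb")

# Turn on deletion protection for an existing table (it is disabled by default).
dynamodb.update_table(
    TableName="orders",  # placeholder table name
    DeletionProtectionEnabled=True,
)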

Amazon SNS

Amazon SNS now supports AWS X-Ray active tracing to visualize, analyze, and debug application performance. You can now view traces that flow through Amazon SNS topics to destination services, such as Amazon Simple Queue Service, Lambda, and Kinesis Data Firehose, in addition to traversing the application topology in Amazon CloudWatch ServiceLens.

SNS also now supports setting content-type request headers for HTTPS notifications so applications can receive their notifications in a more predictable format. Topic subscribers can create a DeliveryPolicy that specifies the content-type value that SNS assigns to their HTTPS notifications, such as application/json, application/xml, or text/plain.

EDA Visuals collection added to Serverless Land

The Serverless Developer Advocate team has extended Serverless Land and introduced EDA visuals. These are small, bite-sized visuals to help you understand concepts and patterns about event-driven architectures. Find out about batch processing vs. event streaming, commands vs. events, message queues vs. event brokers, and point-to-point messaging. Discover bounded contexts, migrations, idempotency, claims, enrichment and more!

EDA Visuals

To learn more:

Serverless Repos Collection on Serverless Land

There is also a new section on Serverless Land containing helpful code repositories. You can search for code repos to use for examples, learning or building serverless applications. You can also filter by use-case, runtime, and level.

Serverless Repos Collection

Serverless Blog Posts

January

Jan 12 – Introducing maximum concurrency of AWS Lambda functions when using Amazon SQS as an event source

Jan 20 – Processing geospatial IoT data with AWS IoT Core and the Amazon Location Service

Jan 23 – AWS Lambda: Resilience under-the-hood

Jan 24 – Introducing AWS Lambda runtime management controls

Jan 24 – Best practices for working with the Apache Velocity Template Language in Amazon API Gateway

February

Feb 6 – Previewing environments using containerized AWS Lambda functions

Feb 7 – Building ad-hoc consumers for event-driven architectures

Feb 9 – Implementing architectural patterns with Amazon EventBridge Pipes

Feb 9 – Securing CI/CD pipelines with AWS SAM Pipelines and OIDC

Feb 9 – Introducing new asynchronous invocation metrics for AWS Lambda

Feb 14 – Migrating to token-based authentication for iOS applications with Amazon SNS

Feb 15 – Implementing reactive progress tracking for AWS Step Functions

Feb 23 – Developing portable AWS Lambda functions

Feb 23 – Uploading large objects to Amazon S3 using multipart upload and transfer acceleration

Feb 28 – Introducing AWS Lambda Powertools for .NET

March

Mar 9 – Server-side rendering micro-frontends – UI composer and service discovery

Mar 9 – Building serverless Java applications with the AWS SAM CLI

Mar 10 – Managing sessions of anonymous users in WebSocket API-based applications

Mar 14 – Implementing an event-driven serverless story generation application with ChatGPT and DALL-E

Videos

Serverless Office Hours – Tues 10AM PT

Weekly office hours live stream. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

January

Jan 10 – Building .NET 7 high performance Lambda functions

Jan 17 – Amazon Managed Workflows for Apache Airflow at Scale

Jan 24 – Using Terraform with AWS SAM

Jan 31 – Preparing your serverless architectures for the big day

February

Feb 07 – Visually design and build serverless applications

Feb 14 – Multi-tenant serverless SaaS

Feb 21 – Refactoring to Serverless

Feb 28 – EDA visually explained

March

Mar 07 – Lambda cookbook with Python

Mar 14 – Succeeding with serverless

Mar 21 – Lambda Powertools .NET

Mar 28 – Server-side rendering micro-frontends

FooBar Serverless YouTube channel

Marcia Villalba frequently publishes new videos on her popular serverless YouTube channel. You can view all of Marcia’s videos at https://www.youtube.com/c/FooBar_codes.

January

Jan 12 – Serverless Badge – A new certification to validate your Serverless Knowledge

Jan 19 – Step functions Distributed map – Run 10k parallel serverless executions!

Jan 26 – Step Functions Intrinsic Functions – Do simple data processing directly from the state machines!

February

Feb 02 – Unlock the Power of EventBridge Pipes: Integrate Across Platforms with Ease!

Feb 09 – Amazon EventBridge Pipes: Enrichment and filter of events Demo with AWS SAM

Feb 16 – AWS App Runner – Deploy your apps from GitHub to Cloud in Record Time

Feb 23 – AWS App Runner – Demo hosting a Node.js app in the cloud directly from GitHub (AWS CDK)

March

Mar 02 – What is Amazon DynamoDB? What are the most important concepts? What are the indexes?

Mar 09 – Choreography vs Orchestration: Which is Best for Your Distributed Application?

Mar 16 – DynamoDB Single Table Design: Simplify Your Code and Boost Performance with Table Design Strategies

Mar 23 – 8 Reasons You Should Choose DynamoDB for Your Next Project and How to Get Started

Sessions with SAM & Friends

AWS SAM & Friends

Eric Johnson is exploring how developers are building serverless applications. We spend time talking about AWS SAM as well as other tools like AWS CDK, Terraform, Wing, and AMPT.

Feb 16 – What’s new with AWS SAM

Feb 23 – AWS SAM with AWS CDK

Mar 02 – AWS SAM and Terraform

Mar 10 – Live from ServerlessDays ANZ

Mar 16 – All about AMPT

Mar 23 – All about Wing

Mar 30 – SAM Accelerate deep dive

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Why Python keeps growing, explained

Post Syndicated from Rizel Scarlett original https://github.blog/2023-03-02-why-python-keeps-growing-explained/

Which programming language has been around for more than three decades and continues to grow in popularity each year?

If you guessed Python, you nailed it. In the 2022 Octoverse report, we found that Python remains the second most-used programming language on GitHub. Interestingly, Python’s use grew more than 22 percent year over year with more than four million developers on GitHub using it at some point in 2022.

In this article, we’ll dive into a brief history of Python, its benefits, and its use cases, and seek to answer why a programming language conceived in the 1980s continues to dominate development. And, since this is GitHub, we’ll also offer a few useful tips and tricks for developers who are new to Python, as well as those experienced in it.

So, what is Python? 🤔

Python is a high-level, interpreted programming language with a simple syntax, which makes it easily readable and extremely user- and beginner-friendly. Originally built to satisfy Guido van Rossum’s desire for a programming language that was simple to use and beautiful to look at, Python was first released to the world in 1991.

Fun fact: Python was named after the BBC TV show, “Monty Python’s Flying Circus.”

Since its development, it has grown to have widespread applicability for developers, data scientists, researchers, and more. But how, you may ask, can a coding language be simple and beautiful to look at? Here’s some proof:

Python

print("Hello world.")

vs.

Java

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello world.");
    }
}

Since Python is a general-purpose language, it can be used in a variety of applications, and its uncomplicated nature makes it an excellent language for automating tasks, building websites or software, and analyzing data.

Python also has several other characteristics that make it popular amongst developers and engineers. These include:

  • It’s easy to read. Python code uses English keywords rather than punctuation, and its line breaks help define the code blocks. In practice, this means you can identify what the code is designed to do simply by looking at it.

  • It’s open source. You can download the source code, modify it, and use it however you want.

  • It’s portable. Some languages require you to modify code to run on different platforms, but Python is a cross-platform language, which means you can run the same code on any operating system with a Python interpreter.

  • It’s extendable. Python code can be written in other languages (such as C++), and users can add low-level modules to the Python interpreter to customize and optimize their tools.

  • It has a broad standard library. This library is available for anyone to access and means that users don’t have to write code for every single function—they can access built-in modules that help with issues in everyday programming and more.

What is Python commonly used for? 💻

Python can be used for just about anything, from web and software development to machine learning and artificial intelligence (AI). Let’s take a look at some of its most common use cases.

import antigravity

def main():
    antigravity.fly()

if __name__ == '__main__':
    main()

Run this snippet to check out an inside joke among Python developers.

Using Python for web and software development

Python is a popular language for web and software development because you can create complex, multi-protocol applications while maintaining concise, readable syntax. In fact, some of the most popular applications were built with Python. Plus, Python’s open source community provides developers with an extensive amount of reusable code, frameworks, and support. Case in point: Django is one of the most-used Python frameworks, designed by experienced developers to help others accelerate their application build times and avoid issues that might stall their progress.

Using Python for task automation

One of Python’s key benefits is its ability to automate manual, repetitive tasks. With Python, you can learn how to automate just about anything by using either built-in modules or pre-written code from its robust library. Or you can write your own custom scripts to perform specific actions. For example, you can easily automate emails with the “smtplib” module or copy files with the “shutil” module. Python also has a robust set of testing frameworks, which makes it an excellent language for test automation. Frameworks such as Pytest, Behave, and Robot allow developers to write simple yet effective tests to ensure the quality of their builds.

Using Python for machine learning and data science

Here’s a fun fact: Python is the top preferred language for data science and research. Since its syntax is easily understandable and adaptable, people with little-to-no development experience can easily learn Python and use it to manipulate data for research, reporting, predictable or regression analyses, and more. Collecting and parsing data can be a time-consuming task for data scientists. Python is also one of the top languages for training machine learning (ML) models. Through specific algorithms, these models can analyze and identify patterns in data to make predictions or decisions based on that data. They also constantly evolve based on outputs of previous datasets to confront new variables. Data scientists and developers training ML models often utilize libraries, such as NumPy, Pandas, and Matplotlib, to automate functions like cleaning, data transformation, and visualization.

Using Python for financial analysis

Similar to how Python can assist data scientists with the heavy lift of large data sets, Python is widely used in the financial industry to quickly perform complex computations. Stock markets generate huge amounts of data, and Python can be used to import data on stock prices and generate strategies through algorithms to identify trading opportunities. The language can also be used for portfolio optimization, risk management, financial modeling and visualization, cryptocurrency analysis, and even fraud detection.

Using Python for artificial intelligence

Python can also be found in some of the most complex artificial intelligence (AI) technologies, and it’s actually one of the preferred languages for AI. Python’s concise and readable code allows developers to create consistent, reliable systems, and its vast library provides a number of frameworks, like PyBrain, which offer developers powerful algorithms for machine learning tasks. Plus, Python’s visualization capabilities can help convert these large datasets for AI or ML into comprehensible graphs or reports. Interestingly enough, OpenAI, the artificial intelligence research lab, uses the Python framework PyTorch as its standard framework for deep learning, which trains its AI systems.

In addition to its relative simplicity to learn, there are a few other reasons why Python continues to consistently grow in popularity. These include:

  • It’s more productive. Compared to some other more complex programming languages like C++, Python’s syntax allows users to do more with less and cut down on time and effort to write the same lines of code.

  • It has an expansive, supportive community of users. Even the best developers run into problems, and this is where user communities can become an invaluable resource. Python has a huge community with documentation, tutorials, tips, and tricks to master the language. The Python community on GitHub, for example, offers everything from information on the latest version of the language to bug reports and update notes.

  • It’s academic. Python has become the go-to language in academia with some students even encountering Python as early as elementary school. (Believe it or not, there are children’s picture books dedicated to Python.) While computer science students are often taught Python, its use extends beyond that discipline into other areas of STEM and academic research. For example, Python can be used to solve differential equations, perform statistical analyses, simulate and track particle diffusion, and more.

  • It has high corporate demand. Because of its wide scale applicability in development and data analysis work, learning and knowing Python is often considered a top-skill among job seekers. According to Statista, Python was the third most demanded language in 2022 by recruiters worldwide.

The bottom line

Python is everywhere—and it’s been used to build a significant number of the technologies, websites, and even systems most people encounter on a daily basis. It powers everything from your favorite video streaming service to the ML algorithms that can help you make your next cryptocurrency trade. And for an even broader scope example (pun absolutely intended), NASA uses Python to power data analysis with its sophisticated James Webb Space Telescope, which makes it one of the few programming languages that is, quite literally, out of this world. 🚀

How to get started with Python 📓

A quick Google search will yield hundreds of resources out there to jumpstart your Python journey—and that can quickly get a little overwhelming. To simplify things, here are a few helpful GitHub repositories to help you get started with Python:

To get started, download the latest version of Python.

Start building on GitHub today

GitHub offers two easier ways to start working with Python: GitHub Codespaces and GitHub Copilot.

You can start building today for free with GitHub Codespaces, which gives every developer on GitHub 60 free hours per month to spin up a development environment in the cloud from any device, at speed. Check out the Django quick start template to begin coding right in your browser!

You can also use GitHub Copilot, GitHub’s AI pair programmer, to write your first lines of Python. Here’s how:

  1. Install the GitHub Copilot extension into your code editor.
  2. Describe the purpose of your project in a comment.
  3. Write a comment describing which libraries you may need.
  4. Start tabbing and let GitHub Copilot suggest lines of code to help you learn new techniques or methods.

From machine learning to data analysis, Python’s versatility allows it to continue its explosive growth with developers and non-developers alike. Experiment with Python through GitHub or on your local machine to be part of this growth and get started today!

Build a semantic search engine for tabular columns with Transformers and Amazon OpenSearch Service

Post Syndicated from Kachi Odoemene original https://aws.amazon.com/blogs/big-data/build-a-semantic-search-engine-for-tabular-columns-with-transformers-and-amazon-opensearch-service/

Finding similar columns in a data lake has important applications in data cleaning and annotation, schema matching, data discovery, and analytics across multiple data sources. The inability to accurately find and analyze data from disparate sources represents a potential efficiency killer for everyone from data scientists, medical researchers, and academics to financial and government analysts.

Conventional solutions involve lexical keyword search or regular expression matching, which are susceptible to data quality issues such as absent column names or different column naming conventions across diverse datasets (for example, zip_code, zcode, postalcode).

In this post, we demonstrate a solution for searching for similar columns based on column name, column content, or both. The solution uses approximate nearest neighbors algorithms available in Amazon OpenSearch Service to search for semantically similar columns. To facilitate the search, we create feature representations (embeddings) for individual columns in the data lake using pre-trained Transformer models from the sentence-transformers library in Amazon SageMaker. Finally, to interact with and visualize results from our solution, we build an interactive Streamlit web application running on AWS Fargate.

We include a code tutorial for you to deploy the resources to run the solution on sample data or your own data.

Solution overview

The following architecture diagram illustrates the two-stage workflow for finding semantically similar columns. The first stage runs an AWS Step Functions workflow that creates embeddings from tabular columns and builds the OpenSearch Service search index. The second stage, or the online inference stage, runs a Streamlit application through Fargate. The web application collects input search queries and retrieves from the OpenSearch Service index the approximate k-most-similar columns to the query.

Solution architecture

Figure 1. Solution architecture

The automated workflow proceeds in the following steps:

  1. The user uploads tabular datasets into an Amazon Simple Storage Service (Amazon S3) bucket, which invokes an AWS Lambda function that initiates the Step Functions workflow.
  2. The workflow begins with an AWS Glue job that converts the CSV files into Apache Parquet data format.
  3. A SageMaker Processing job creates embeddings for each column using pre-trained models or custom column embedding models. The SageMaker Processing job saves the column embeddings for each table in Amazon S3.
  4. A Lambda function creates the OpenSearch Service domain and cluster to index the column embeddings produced in the previous step.
  5. Finally, an interactive Streamlit web application is deployed with Fargate. The web application provides an interface for the user to input queries to search the OpenSearch Service domain for similar columns.

You can download the code tutorial from GitHub to try this solution on sample data or your own data. Instructions on how to deploy the required resources for this tutorial are available on GitHub.

Prerequisites

To implement this solution, you need the following:

  • An AWS account.
  • Basic familiarity with AWS services such as the AWS Cloud Development Kit (AWS CDK), Lambda, OpenSearch Service, and SageMaker Processing.
  • A tabular dataset to create the search index. You can bring your own tabular data or download the sample datasets on GitHub.

Build a search index

The first stage builds the column search engine index. The following figure illustrates the Step Functions workflow that runs this stage.

Step Functions workflow

Figure 2 – Step Functions workflow – multiple embedding models

Datasets

In this post, we build a search index to include over 400 columns from over 25 tabular datasets. The datasets originate from the following public sources:

For the full list of the tables included in the index, see the code tutorial on GitHub.

You can bring your own tabular dataset to augment the sample data or build your own search index. We include two Lambda functions that initiate the Step Functions workflow to build the search index for individual CSV files or a batch of CSV files, respectively.

Transform CSV to Parquet

Raw CSV files are converted to the Parquet data format with AWS Glue. Parquet is a column-oriented file format preferred in big data analytics that provides efficient compression and encoding. In our experiments, the Parquet data format offered a significant reduction in storage size compared to raw CSV files. We also used Parquet as a common data format to convert other data formats (for example, JSON and NDJSON) because it supports advanced nested data structures.

Create tabular column embeddings

To extract embeddings for individual table columns in the sample tabular datasets in this post, we use the following pre-trained models from the sentence-transformers library. For additional models, see Pretrained Models.

Model name                             Dimension   Size (MB)
all-MiniLM-L6-v2                       384         80
all-distilroberta-v1                   768         290
average_word_embeddings_glove.6B.300d  300         420

The SageMaker Processing job runs create_embeddings.py (code) for a single model. For extracting embeddings from multiple models, the workflow runs parallel SageMaker Processing jobs as shown in the Step Functions workflow. We use the model to create two sets of embeddings:

  • column_name_embeddings – Embeddings of column names (headers)
  • column_content_embeddings – Average embedding of all the rows in the column

For more information about the column embedding process, see the code tutorial on GitHub.
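To illustrate the embedding step, the following is a minimal sketch using the sentence-transformers library. The column name and sample values are hypothetical; the actual create_embeddings.py in the tutorial handles batching, multiple tables, and writing the output to Amazon S3.

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical column from a tabular dataset
column_name = "zip_code"
column_values = ["10001", "94105", "60601", "98101"]

# Embedding of the column name (header)
column_name_embedding = model.encode(column_name)

# Average embedding of all the rows in the column
column_content_embedding = np.mean(model.encode(column_values), axis=0)

print(column_name_embedding.shape, column_content_embedding.shape)  # (384,) and (384,)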

An alternative to the SageMaker Processing step is to create a SageMaker batch transform to get column embeddings on large datasets. This would require deploying the model to a SageMaker endpoint. For more information, see Use Batch Transform.

Index embeddings with OpenSearch Service

In the final step of this stage, a Lambda function adds the column embeddings to an OpenSearch Service approximate k-Nearest Neighbor (kNN) search index. Each model is assigned its own search index. For more information about the approximate kNN search index parameters, see k-NN.
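As a rough sketch of what that indexing step involves, the following creates a kNN index and mapping with the opensearch-py client. The endpoint, credentials, index name, and field names here are assumptions; the solution’s Lambda function performs the equivalent work for you.

from opensearchpy import OpenSearch

# Assumed endpoint and credentials; the solution creates and configures the domain for you
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    http_auth=("user", "password"),
    use_ssl=True,
)

index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "table_name": {"type": "keyword"},
            "column_name": {"type": "keyword"},
            # Dimension must match the embedding model (384 for all-MiniLM-L6-v2)
            "column_name_embedding": {"type": "knn_vector", "dimension": 384},
            "column_content_embedding": {"type": "knn_vector", "dimension": 384},
        }
    },
}

client.indices.create(index="columns-all-minilm-l6-v2", body=index_body)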

Online inference and semantic search with a web app

The second stage of the workflow runs a Streamlit web application where you can provide inputs and search for semantically similar columns indexed in OpenSearch Service. The application layer uses an Application Load Balancer, Fargate, and Lambda. The application infrastructure is automatically deployed as part of the solution.

The application allows you to provide an input and search for semantically similar column names, column content, or both. Additionally, you can select the embedding model and number of nearest neighbors to return from the search. The application receives inputs, embeds the input with the specified model, and uses kNN search in OpenSearch Service to search indexed column embeddings and find the most similar columns to the given input. The search results displayed include the table names, column names, and similarity scores for the columns identified, as well as the locations of the data in Amazon S3 for further exploration.

The following figure shows an example of the web application. In this example, we searched for columns in our data lake that have similar Column Names (payload type) to district (payload). The application used all-MiniLM-L6-v2 as the embedding model and returned 10 (k) nearest neighbors from our OpenSearch Service index.

The application returned transit_district, city, borough, and location as the four most similar columns based on the data indexed in OpenSearch Service. This example demonstrates the ability of the search approach to identify semantically similar columns across datasets.

Web application user interface

Figure 3: Web application user interface

Clean up

To delete the resources created by the AWS CDK in this tutorial, run the following command:

cdk destroy --all

Conclusion

In this post, we presented an end-to-end workflow for building a semantic search engine for tabular columns.

Get started today on your own data with our code tutorial available on GitHub. If you’d like help accelerating your use of ML in your products and processes, please contact the Amazon Machine Learning Solutions Lab.


About the Authors

Kachi Odoemene is an Applied Scientist at AWS AI. He builds AI/ML solutions to solve business problems for AWS customers.

Taylor McNally is a Deep Learning Architect at Amazon Machine Learning Solutions Lab. He helps customers from various industries build solutions leveraging AI/ML on AWS. He enjoys a good cup of coffee, the outdoors, and time with his family and energetic dog.

Austin Welch is a Data Scientist in the Amazon ML Solutions Lab. He develops custom deep learning models to help AWS public sector customers accelerate their AI and cloud adoption. In his spare time, he enjoys reading, traveling, and jiu-jitsu.

Develop a serverless application in Python using Amazon CodeWhisperer

Post Syndicated from Rafael Ramos original https://aws.amazon.com/blogs/devops/develop-a-serverless-application-in-python-using-amazon-codewhisperer/

While writing code to develop applications, developers must keep up with multiple programming languages, frameworks, software libraries, and popular cloud services from providers such as AWS. Even though developers can find code snippets on developer communities, to either learn from them or repurpose the code, manually searching for the snippets with an exact or even similar use case is a distracting and time-consuming process. They have to do all of this while making sure that they’re following the correct programming syntax and best coding practices.

Amazon CodeWhisperer, a machine learning (ML)-powered coding aid for developers, lets you overcome those challenges. Developers can simply write a comment that outlines a specific task in plain English, such as “upload a file to S3.” Based on this, CodeWhisperer automatically determines which cloud services and public libraries are best suited for the specified task, creates the specific code on the fly, and then recommends the generated code snippets directly in the IDE. And this isn’t about copy-pasting code from the web, but generating code based on the context of your file, such as which libraries and versions you have, as well as the existing code. Moreover, CodeWhisperer seamlessly integrates with your Visual Studio Code and JetBrains IDEs so that you can stay focused and never leave the development environment. At the time of this writing, CodeWhisperer supports Java, Python, JavaScript, C#, and TypeScript.

In this post, we’ll build a full-fledged, event-driven, serverless application for image recognition. With the aid of CodeWhisperer, you’ll write your own code that runs on top of AWS Lambda to interact with Amazon Rekognition, Amazon DynamoDB, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), Amazon Simple Storage Service (Amazon S3), and third-party HTTP APIs to perform image recognition. The users of the application can interact with it by either sending the URL of an image for processing, or by listing the images and the objects present on each image.

Solution overview

To make our application easier to digest, we’ll split it into three segments:

  1. Image download – The user provides an image URL to the first API. A Lambda function downloads the image from the URL and stores it on an S3 bucket. Amazon S3 automatically sends a notification to an Amazon SNS topic informing that a new image is ready for processing. Amazon SNS then delivers the message to an Amazon SQS queue.
  2. Image recognition – A second Lambda function handles the orchestration and processing of the image. It receives the message from the Amazon SQS queue, sends the image for Amazon Rekognition to process, stores the recognition results on a DynamoDB table, and sends a message with those results as JSON to a second Amazon SNS topic used in section three. A user can list the images and the objects present on each image by calling a second API which queries the DynamoDB table.
  3. 3rd-party integration – The last Lambda function reads the message from the second Amazon SQS queue. At this point, the Lambda function must deliver that message to a fictitious external e-mail server HTTP API that supports only XML payloads. Because of that, the Lambda function converts the JSON message to XML. Lastly, the function sends the XML object via HTTP POST to the e-mail server.

The following diagram depicts the architecture of our application:

Architecture diagram depicting the application architecture. It contains the service icons with the component explained on the text above

Figure 1. Architecture diagram depicting the application architecture. It contains the service icons with the component explained on the text above.

Prerequisites

Before getting started, you must have the following prerequisites:

Configure environment

We already created the scaffolding for the application that we’ll build, which you can find in this Git repository. This application is represented by a CDK app that describes the infrastructure according to the architecture diagram above. However, the actual business logic of the application isn’t provided; you’ll implement it using CodeWhisperer. This means that we have already declared the infrastructure components using AWS CDK, such as the API Gateway endpoints, the DynamoDB table, and the topics and queues. If you’re new to AWS CDK, then we encourage you to go through the CDK workshop later on.

Deploying AWS CDK apps into an AWS environment (a combination of an AWS account and region) requires that you provision resources that the AWS CDK needs to perform the deployment. These resources include an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments. The process of provisioning these initial resources is called bootstrapping. The required resources are defined in an AWS CloudFormation stack, called the bootstrap stack, which is usually named CDKToolkit. Like any CloudFormation stack, it appears in the CloudFormation console once it has been deployed.

After cloning the repository, let’s deploy the application (still without the business logic, which we’ll implement later on using CodeWhisperer). For this post, we’ll implement the application in Python. Therefore, make sure that you’re under the python directory. Then, use the cdk bootstrap command to bootstrap an AWS environment for AWS CDK. Replace {AWS_ACCOUNT_ID} and {AWS_REGION} with corresponding values first:

cdk bootstrap aws://{AWS_ACCOUNT_ID}/{AWS_REGION}

For more information about bootstrapping, refer to the documentation.

The last step to prepare your environment is to enable CodeWhisperer on your IDE. See Setting up CodeWhisperer for VS Code or Setting up Amazon CodeWhisperer for JetBrains to learn how to do that, depending on which IDE you’re using.

Image download

Let’s get started by implementing the first Lambda function, which is responsible for downloading an image from the provided URL and storing that image in an S3 bucket. Open the get_save_image.py file from the python/api/runtime/ directory. This file contains an empty Lambda function handler and the needed inputs parameters to integrate this Lambda function.

  • url is the URL of the input image provided by the user,
  • name is the name of the image provided by the user, and
  • S3_BUCKET is the S3 bucket name defined by our application infrastructure.

Write a comment in natural language that describes the required functionality, for example:

# Function to get a file from url

To trigger CodeWhisperer, hit the Enter key after entering the comment and wait for a code suggestion. If you want to manually trigger CodeWhisperer, then you can hit Option + C on macOS or Alt + C on Windows. You can browse through multiple suggestions (if available) with the arrow keys. Accept a code suggestion by pressing Tab. Discard a suggestion by pressing Esc or typing a character.

For more information on how to work with CodeWhisperer, see Working with CodeWhisperer in VS Code or Working with Amazon CodeWhisperer from JetBrains.

You should get a suggested implementation of a function that downloads a file using a specified URL. The following image shows an example of the code snippet that CodeWhisperer suggests:

Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called get_file_from_url with the implementation suggestion to download a file using the requests lib

Figure 2. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called get_file_from_url with the implementation suggestion to download a file using the requests lib.

Be aware that CodeWhisperer uses artificial intelligence (AI) to provide code recommendations, and that this is non-deterministic. The result you get in your IDE may be different from the one on the image above. If needed, fine-tune the code, as CodeWhisperer generates the core logic, but you might want to customize the details depending on your requirements.

Let’s try another action, this time to upload the image to an S3 bucket:

# Function to upload image to S3

As a result, CodeWhisperer generates a code snippet similar to the following one:

Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called upload_image with the implementation suggestion to download a file using the requests lib and upload it to S3 using the S3 client

Figure 3. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called upload_image with the implementation suggestion to download a file using the requests lib and upload it to S3 using the S3 client.

Now that you have the functions with the functionalities to download an image from the web and upload it to an S3 bucket, you can wire up both functions in the Lambda handler function by calling each function with the correct inputs.
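The exact suggestions you receive will differ, but a wired-up version of this Lambda function could look roughly like the following sketch. The helper names mirror the figures above, and the environment variable and event field names are assumptions based on the scaffolding, not the repository’s exact code.

import os

import boto3
import requests

s3_client = boto3.client("s3")
S3_BUCKET = os.environ.get("S3_BUCKET", "my-images-bucket")  # assumed environment variable


# Function to get a file from url
def get_file_from_url(url):
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.content


# Function to upload image to S3
def upload_image(file_content, file_name):
    return s3_client.put_object(Bucket=S3_BUCKET, Key=file_name, Body=file_content)


def handler(event, context):
    # Assumed event shape: query string parameters named url and name
    url = event["queryStringParameters"]["url"]
    name = event["queryStringParameters"]["name"]
    upload_image(get_file_from_url(url), name)
    return {"statusCode": 200, "body": f"Image {name} saved to {S3_BUCKET}"}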

Image recognition

Now let’s implement the Lambda function responsible for sending the image to Amazon Rekognition for processing, storing the results in a DynamoDB table, and sending a message with those results as JSON to a second Amazon SNS topic. Open the image_recognition.py file from the python/recognition/runtime/ directory. This file contains an empty Lambda and the needed inputs parameters to integrate this Lambda function.

  • queue_url is the URL of the Amazon SQS queue to which this Lambda function is subscribed,
  • table_name is the name of the DynamoDB table, and
  • topic_arn is the ARN of the Amazon SNS topic to which this Lambda function publishes.

Using CodeWhisperer, implement the business logic of the next Lambda function as you did in the previous section. For example, to detect the labels from an image using Amazon Rekognition, write the following comment:

# Detect labels from image with Rekognition

And as a result, CodeWhisperer should give you a code snippet similar to the one in the following image:

Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called detect_labels with the implementation suggestion to use the Rekognition SDK to detect labels on the given image

Figure 4. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called detect_labels with the implementation suggestion to use the Rekognition SDK to detect labels on the given image.
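Again, the generated code varies from session to session; a plausible shape for the suggestion, using the boto3 Rekognition client, is sketched below (the bucket and key parameters are assumptions).

import boto3

rekognition_client = boto3.client("rekognition")


# Detect labels from image with Rekognition
def detect_labels(bucket, key):
    response = rekognition_client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=80,
    )
    return [label["Name"] for label in response["Labels"]]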

You can continue generating the other functions that you need to fully implement the business logic of your Lambda function. Here are some examples that you can use:

  • # Save labels to DynamoDB
  • # Publish item to SNS
  • # Delete message from SQS

Following the same approach, open the list_images.py file from the python/recognition/runtime/ directory to implement the logic to list all of the labels from the DynamoDB table. As you did previously, type a comment in plain English:

# Function to list all items from a DynamoDB table

Other frequently used code

Interacting with AWS isn’t the only way that you can leverage CodeWhisperer. You can use it to implement repetitive tasks, such as creating unit tests and converting message formats, or to implement algorithms like sorting and string matching and parsing. The last Lambda function that we’ll implement as part of this post is to convert a JSON payload received from Amazon SQS to XML. Then, we’ll POST this XML to an HTTP endpoint.

Open the send_email.py file from the python/integration/runtime/ directory. This file contains an empty Lambda function handler. An event is a JSON-formatted document that contains data for a Lambda function to process. Type a comment with your intent to get the code snippet:

# Transform json to xml

As CodeWhisperer uses the context of your files to generate code, depending on the imports that you have on your file, you’ll get an implementation such as the one in the following image:

Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called json_to_xml with the implementation suggestion to transform JSON payload into XML payload

Figure 5. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called json_to_xml with the implementation suggestion to transform JSON payload into XML payload.
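Depending on your imports, the suggested conversion might use only the standard library, along the lines of this sketch (it assumes a flat JSON object such as the recognition results):

import json
import xml.etree.ElementTree as ET


# Transform json to xml
def json_to_xml(json_string):
    data = json.loads(json_string)
    root = ET.Element("message")
    for key, value in data.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")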

Repeat the same process with a comment such as # Send XML string with HTTP POST to get the last function implementation. Note that the email server isn’t part of this implementation. You can mock it, or simply ignore this HTTP POST step. Lastly, wire up both functions in the Lambda handler function by calling each function with the correct inputs.

Deploy and test the application

To deploy the application, run the command cdk deploy --all. You should get a confirmation message, and after a few minutes your application will be up and running on your AWS account. As outputs, the APIStack and RekognitionStack will print the API Gateway endpoint URLs. It will look similar to this example:

Outputs:
...
APIStack.RESTAPIEndpoint01234567 = https://examp1eid0.execute-api.{your-region}.amazonaws.com/prod/

  1. The first endpoint expects two string parameters: url (the image file URL to download) and name (the target file name that will be stored on the S3 bucket). Use any image URL you like, but remember that you must encode an image URL before passing it as a query string parameter to escape the special characters. Use an online URL encoder of your choice for that. Then, use the curl command to invoke the API Gateway endpoint:
curl -X GET 'https://examp1eid0.execute-api.eu-east-2.amazonaws.com/prod?url={encoded-image-URL}&name={file-name}'

Replace {encoded-image-URL} and {file-name} with the corresponding values. Also, make sure that you use the correct API endpoint that you’ve noted from the AWS CDK deploy command output as mentioned above.

  2. It will take a few seconds for the processing to happen in the background. Once it’s ready, see what has been stored in the DynamoDB table by invoking the List Images API (make sure that you use the correct URL from the output of your deployed AWS CDK stack):
curl -X GET 'https://examp1eid7.execute-api.eu-east-2.amazonaws.com/prod'

After you’re done, to avoid unexpected charges to your account, make sure that you clean up your AWS CDK stacks. Use the cdk destroy command to delete the stacks.

Conclusion

In this post, we’ve seen how to get a significant productivity boost with the help of ML. With that, as a developer, you can stay focused on your IDE and reduce the time that you spend searching online for code snippets that are relevant for your use case. Writing comments in natural language, you get context-based snippets to implement full-fledged applications. In addition, CodeWhisperer comes with a mechanism called reference tracker, which detects whether a code recommendation might be similar to particular CodeWhisperer training data. The reference tracker lets you easily find and review that reference code and see how it’s used in the context of another project. Lastly, CodeWhisperer provides the ability to run scans on your code (generated by CodeWhisperer as well as written by you) to detect security vulnerabilities.

During the preview period, CodeWhisperer is available to all developers across the world for free. Get started with the free preview on JetBrains, VS Code or AWS Cloud9.

About the author:

Rafael Ramos

Rafael is a Solutions Architect at AWS, where he helps ISVs on their journey to the cloud. He spent over 13 years working as a software developer, and is passionate about DevOps and serverless. Outside of work, he enjoys playing tabletop RPG, cooking and running marathons.

Caroline Gluck

Caroline is an AWS Cloud application architect based in New York City, where she helps customers design and build cloud native data science applications. Caroline is a builder at heart, with a passion for serverless architecture and machine learning. In her spare time, she enjoys traveling, cooking, and spending time with family and friends.

Jason Varghese

Jason is a Senior Solutions Architect at AWS guiding enterprise customers on their cloud migration and modernization journeys. He has served in multiple engineering leadership roles and has over 20 years of experience architecting, designing and building scalable software solutions. Jason holds a bachelor’s degree in computer engineering from the University of Oklahoma and an MBA from the University of Central Oklahoma.

Dmitry Balabanov

Dmitry is a Solutions Architect with AWS where he focuses on building reusable assets for customers across multiple industries. With over 15 years of experience in designing, building, and maintaining applications, he still loves learning new things. When not at work, he enjoys paragliding and mountain trekking.

Take part in the Hour of Code

Post Syndicated from Liz Smart original https://www.raspberrypi.org/blog/hour-of-code-activities/

Launched in 2013, Hour of Code is an initiative to introduce young people to computer science using fun one-hour tutorials. To date, over 100 million young people have completed an hour of code with it. 

A girl doing a physical computing project.

Although the Hour of Code website is accessible all year round, every December for Computer Science Education Week people worldwide run their own Hour of Code events. Each year we love seeing many Code Clubs, CoderDojos, and young people at home across the community complete their Hour of Code. You can register your 2022 Hour of Code event now to run between 5 and 11 December. 

To support your event, we have pulled together a bumper set of our free coding projects, which can each be completed in just one hour. You will find these activities on the Hour of Code website.

Two young digital makers using Raspberry Pi

There’s something for all ages and levels of experience, so put an hour aside and help young people make something fabulous with code:

Ages 7–11

Beginner

For younger creators new to coding, a Scratch project is a great place to start. 

alt=""

With our Space talk project, they can create a space scene with characters that ‘emote’ to share their thoughts or feelings using sounds, colours, and actions. Creators program the character emotes using Scratch blocks to control graphic effects, costume animation, and sound effects. 

Alternatively, our Stress ball project lets them code an onscreen stress ball that reacts to user clicks. Creators use the Paint and Sound editors in Scratch to personalise a clickable stress ball, and they add Scratch blocks to control graphic effects, costume animation, and sound effects. 

We love this fun stress ball example sent to us recently by young creator April from the United States:

Another great option is to use Code Club World, which is a free tool to help children who are new to coding.  

Creators can develop a character avatar, design a T-shirt, make some music, and more.

Comfortable

For 7- to 11-year-olds who are more comfortable with block-based coding, our project Broadcasting spells is ideal to choose. With the project, they connect Scratch blocks to code a wand that casts spells turning sprites into toads, and growing and shrinking them. Creators use broadcast blocks to transform multiple sprites at once, and they create sound effects with the Sound editor in Scratch. 

alt=""

Ages 11–14

Beginner

We have three exciting projects for trying text-based coding during Hour of Code in this category. The first, Anime expressions, is one of our brand-new ‘Introduction to web development’ projects. With this project, young people create a responsive webpage with text and images for an anime drawing tutorial. They write HTML to structure the webpage and CSS styles to apply layout, colour palettes, and fonts. 

For a great introduction to coding with Python, we have the project Hello world from our ‘Introduction to Python’ path. With this project, creators write Python text-based code to create an interactive program that shows text and emojis based on user input. They learn about variables as they use them to store text and numbers, and they learn about writing functions to organise code and do calculations, retrieve the current date and time, and make a customisable dice. 
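To give a sense of what creators build in that project, here is a small sketch in the same spirit; the exact project code on our site differs.

import random
from datetime import datetime


def roll_dice(sides=6):
    # A customisable dice: change the number of sides to suit your game
    return random.randint(1, sides)


name = input("What is your name? ")
print(f"Hello, {name}! 👋")
print("Today is", datetime.now().strftime("%A %d %B %Y"))
print("You rolled a", roll_dice(20), "on a 20-sided dice 🎲")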

alt=""

LED firefly is a fantastic physical making project in which young people use a Raspberry Pi Pico microcontroller and basic electronic components to create a blinking LED firefly. They program the LED’s light patterns with MicroPython code and activate it via a switch they make themselves using jumper wires.

A blinking LED with paper wings.
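As a taste of the MicroPython involved, here is a hedged sketch of a blinking firefly; the GPIO pin numbers are assumptions, and the project guide walks through the exact wiring.

from machine import Pin
from time import sleep

led = Pin(15, Pin.OUT)                   # LED wired to GPIO 15 (assumed pin)
switch = Pin(14, Pin.IN, Pin.PULL_DOWN)  # homemade jumper-wire switch on GPIO 14 (assumed pin)

while True:
    if switch.value():   # only glow while the switch is closed
        led.on()
        sleep(0.2)
        led.off()
        sleep(0.8)
    else:
        led.off()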

Comfortable

For 11- to 14-year-olds who are already comfortable with HTML, the Flip treat webcards project is a fun option. With this, they create a webpage showing a set of cards that flip when a visitor’s mouse pointer hovers over them. Creators use CSS styling and animations to add interactivity, then they customise the cards with fancy fonts and colour gradients.

Young people who have already done some Python coding can try out our project Target practice. With this project they create a game, using the p5 graphics library to draw a colourful target, and writing code so that the player scores points by hitting the target’s rings with arrows. While they create the project, they learn about RGB colours, shape positioning with x and y coordinates, and decisions using if, else-if, and else code statements. 

Ages 14+

Beginner

Our project Charting champions is a great introduction to data visualisation and analysis for coders aged 15 and older. With the project, they will discover the power of the Python programming language as they store Olympic medal data in lists and use the pygal library to create an interactive chart.
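For a taste of the pygal part of the project, here is a small sketch with made-up medal counts; the project itself stores real Olympic medal data in lists.

import pygal

# Illustrative medal counts only
gold_medals = {"USA": 39, "China": 38, "Japan": 27, "Great Britain": 22}

chart = pygal.Bar(title="Gold medals by team")
for team, medals in gold_medals.items():
    chart.add(team, medals)

# Writes an interactive SVG chart you can open in a web browser
chart.render_to_file("gold_medals.svg")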

alt=""

Comfortable

Teenage coders who feel comfortable with Python programming can use our project Solar system simulator to code an animated, interactive solar system model using the Python p5 graphics library. Their model will be interactive, as they’ll use dictionaries to store planet facts that display when a user clicks on an orbiting planet.

Coding for Hour of Code and beyond

Now is the time to register your Hour of Code event, then decide which project you’d like to support young people to create. You can download certificates for each of the creators from the Hour of Code certificates page.

And make sure to check out our project paths so you know which projects you can help the young people you support to create beyond this one hour of code.

We don’t just create activities so that other people can experience coding and digital making — we also get involved ourselves!

Two members of the Code Club working at computers.

Recently, our teams who support the Code Club and CoderDojo networks got together to make LED fireflies. We are excited to get coding again as part of Hour of Code and Computer Science Education Week.

The post Take part in the Hour of Code appeared first on Raspberry Pi.

Enrich VPC Flow Logs with resource tags and deliver data to Amazon S3 using Amazon Kinesis Data Firehose

Post Syndicated from Chaitanya Shah original https://aws.amazon.com/blogs/big-data/enrich-vpc-flow-logs-with-resource-tags-and-deliver-data-to-amazon-s3-using-amazon-kinesis-data-firehose/

VPC Flow Logs is an AWS feature that captures information about the network traffic flows going to and from network interfaces in Amazon Virtual Private Cloud (Amazon VPC). Visibility to the network traffic flows of your application can help you troubleshoot connectivity issues, architect your application and network for improved performance, and improve security of your application.

Each VPC flow log record contains the source and destination IP address fields for the traffic flows. The records also contain the Amazon Elastic Compute Cloud (Amazon EC2) instance ID that generated the traffic flow, which makes it easier to identify the EC2 instance and its associated VPC, subnet, and Availability Zone from where the traffic originated. However, when you have a large number of EC2 instances running in your environment, it may not be obvious where the traffic is coming from or going to simply based on the EC2 instance IDs or IP addresses contained in the VPC flow log records.

By enriching flow log records with additional metadata such as resource tags associated with the source and destination resources, you can more easily understand and analyze traffic patterns in your environment. For example, customers often tag their resources with resource names and project names. By enriching flow log records with resource tags, you can easily query and view flow log records based on an EC2 instance name, or identify all traffic for a certain project.

In addition, you can add resource context and metadata about the destination resource such as the destination EC2 instance ID and its associated VPC, subnet, and Availability Zone based on the destination IP in the flow logs. This way, you can easily query your flow logs to identify traffic crossing Availability Zones or VPCs.

In this post, you will learn how to enrich flow logs with tags associated with resources from VPC flow logs in a completely serverless model using Amazon Kinesis Data Firehose and the recently launched Amazon VPC IP Address Manager (IPAM), and also analyze and visualize the flow logs using Amazon Athena and Amazon QuickSight.

Solution overview

In this solution, you enable VPC flow logs and stream them to Kinesis Data Firehose. This solution enriches log records using an AWS Lambda function on Kinesis Data Firehose in a completely serverless manner. The Lambda function fetches resource tags for the instance ID. It also looks up the destination resource from the destination IP using the Amazon EC2 API and IPAM, and adds the associated VPC network context and metadata for the destination resource. It then stores the enriched log records in an Amazon Simple Storage Service (Amazon S3) bucket. After you have enriched your flow logs, you can query, view, and analyze them in a wide variety of services, such as AWS Glue, Athena, QuickSight, Amazon OpenSearch Service, as well as solutions from the AWS Partner Network such as Splunk and Datadog.

The following diagram illustrates the solution architecture.

Architecture

The workflow contains the following steps:

  1. Amazon VPC sends the VPC flow logs to the Kinesis Data Firehose delivery stream.
  2. The delivery stream uses a Lambda function to fetch resource tags for instance IDs from the flow log record and add it to the record. You can also fetch tags for the source and destination IP address and enrich the flow log record.
  3. When the Lambda function finishes processing all the records from the Kinesis Data Firehose buffer with enriched information like resource tags, Kinesis Data Firehose stores the result file in the destination S3 bucket. Any failed records that Kinesis Data Firehose couldn’t process are stored in the destination S3 bucket under the prefix you specify during delivery stream setup.
  4. All the logs for the delivery stream and Lambda function are stored in Amazon CloudWatch log groups.

Prerequisites

As a prerequisite, you need to create the target S3 bucket before creating the Kinesis Data Firehose delivery stream.

If using a Windows computer, you need PowerShell; if using a Mac, you need Terminal to run AWS Command Line Interface (AWS CLI) commands. To install the latest version of the AWS CLI, refer to Installing or updating the latest version of the AWS CLI.

Create a Lambda function

You can download the Lambda function code from the GitHub repo used in this solution. The example in this post assumes you are enabling all the available fields in the VPC flow logs. You can use it as is or customize per your needs. For example, if you intend to use the default fields when enabling the VPC flow logs, you need to modify the Lambda function with the respective fields. Creating this function creates an AWS Identity and Access Management (IAM) Lambda execution role.
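Before you paste the code in, it helps to know its rough shape. The following is a heavily simplified sketch of a Kinesis Data Firehose transformation Lambda that adds source instance tags to each record; it is not the full aws-vpc-flowlogs-enricher.py from the repository, and the field position and tag handling shown here are assumptions.

import base64
import json

import boto3

ec2_client = boto3.client("ec2")


def get_tags(resource_id, prefix):
    # Fetch tags for an EC2 resource referenced in the flow log record
    response = ec2_client.describe_tags(
        Filters=[{"Name": "resource-id", "Values": [resource_id]}]
    )
    return {f"{prefix}-tag-{t['Key']}": t["Value"] for t in response["Tags"]}


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        # Kinesis Data Firehose delivers each flow log record base64-encoded
        payload = base64.b64decode(record["data"]).decode("utf-8")
        fields = payload.split()              # space-separated flow log fields
        enriched = {"raw": payload}

        # Assumed position of the instance-id field; adjust to your log format
        instance_id = fields[8] if len(fields) > 8 else "-"
        if instance_id.startswith("i-"):
            enriched.update(get_tags(instance_id, "src"))

        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(json.dumps(enriched).encode("utf-8")).decode("utf-8"),
        })
    return {"records": output}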

To create your Lambda function, complete the following steps:

  1. On the Lambda console, choose Functions in the navigation pane.
  2. Choose Create function.
  3. Select Author from scratch.
  4. For Function name, enter a name.
  5. For Runtime, choose Python 3.8.
  6. For Architecture, select x86_64.
  7. For Execution role, select Create a new role with basic Lambda permissions.
  8. Choose Create function.

Create Lambda Function

You can then see code source page, as shown in the following screenshot, with the default code in the lambda_function.py file.

  1. Delete the default code and enter the code from the GitHub Lambda function aws-vpc-flowlogs-enricher.py.
  2. Choose Deploy.

VPC Flow Logs Enricher function

To enrich the flow logs with additional tag information, you need to create an additional IAM policy to give Lambda permission to describe tags on resources from the VPC flow logs.

  1. On the IAM console, choose Policies in the navigation pane.
  2. Choose Create policy.
  3. On the JSON tab, enter the JSON code as shown in the following screenshot.

This policy gives the Lambda function permission to retrieve tags for the source and destination IP and retrieve the VPC ID, subnet ID, and other relevant metadata for the destination IP from your VPC flow log record.
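The exact statement is shown in the screenshot from the original post. A representative read-only policy granting the describe permissions the function needs might look like the following; treat the action list as an assumption and tighten it to match your own function’s calls.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "ec2:DescribeTags",
        "ec2:DescribeInstances",
        "ec2:DescribeNetworkInterfaces",
        "ec2:DescribeSubnets",
        "ec2:DescribeVpcs"
      ],
      "Resource": "*"
    }
  ]
}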

  1. Choose Next: Tags.

Tags

  1. Add any tags and choose Next: Review.

  1. For Name, enter vpcfl-describe-tag-policy.
  2. For Description, enter a description.
  3. Choose Create policy.

Create IAM Policy

  1. Navigate to the previously created Lambda function and choose Permissions in the navigation pane.
  2. Choose the role that was created by Lambda function.

A page opens in a new tab.

  1. On the Add permissions menu, choose Attach policies.

Add Permissions

  1. Search for the vpcfl-describe-tag-policy you just created.
  2. Select the vpcfl-describe-tag-policy and choose Attach policies.

Create the Kinesis Data Firehose delivery stream

To create your delivery stream, complete the following steps:

  1. On the Kinesis Data Firehose console, choose Create delivery stream.
  2. For Source, choose Direct PUT.
  3. For Destination, choose Amazon S3.

Kinesis Firehose Stream Source and Destination

After you choose Amazon S3 for Destination, the Transform and convert records section appears.

  1. For Data transformation, select Enable.
  2. Browse and choose the Lambda function you created earlier.
  3. You can customize the buffer size as needed.

This impacts how many records the delivery stream will buffer before it flushes them to Amazon S3.

  1. You can also customize the buffer interval as needed.

This impacts how long (in seconds) the delivery stream will buffer the incoming records from the VPC.

  1. Optionally, you can enable Record format conversion.

If you want to query from Athena, it’s recommended to convert it to Apache Parquet or ORC and compress the files with available compression algorithms, such as gzip and snappy. For more performance tips, refer to Top 10 Performance Tuning Tips for Amazon Athena. In this post, record format conversion is disabled.

Transform and Conver records

  1. For S3 bucket, choose Browse and choose the S3 bucket you created as a prerequisite to store the flow logs.
  2. Optionally, you can specify the S3 bucket prefix. The following expression creates a Hive-style partition for year, month, and day:

AWSLogs/year=!{timestamp:YYYY}/month=!{timestamp:MM}/day=!{timestamp:dd}/

  1. Optionally, you can enable dynamic partitioning.

Dynamic partitioning enables you to create targeted datasets by partitioning streaming S3 data based on partitioning keys. The right partitioning can help you to save costs related to the amount of data that is scanned by analytics services like Athena. For more information, see Kinesis Data Firehose now supports dynamic partitioning to Amazon S3.

Note that you can enable dynamic partitioning only when you create a new delivery stream. You can’t enable dynamic partitioning for an existing delivery stream.

Destination Settings

  1. Expand Buffer hints, compression and encryption.
  2. Set the buffer size to 128 and buffer interval to 900 for best performance.
  3. For Compression for data records, select GZIP.

S3 Buffer settings

Create a VPC flow log subscription

Now you create a VPC flow log subscription for the Kinesis Data Firehose delivery stream you created.

Navigate to AWS CloudShell or Terminal/PowerShell for a Mac or Windows computer and run the following AWS CLI command to enable the subscription. Provide your VPC ID for the parameter --resource-ids and delivery stream ARN for the parameter --log-destination.

aws ec2 create-flow-logs \ 
--resource-type VPC \ 
--resource-ids vpc-0000012345f123400d \ 
--traffic-type ALL \ 
--log-destination-type kinesis-data-firehose \ 
--log-destination arn:aws:firehose:us-east-1:123456789101:deliverystream/PUT-Kinesis-Demo-Stream \ 
--max-aggregation-interval 60 \ 
--log-format '${account-id} ${action} ${az-id} ${bytes} ${dstaddr} ${dstport} ${end} ${flow-direction} ${instance-id} ${interface-id} ${log-status} ${packets} ${pkt-dst-aws-service} ${pkt-dstaddr} ${pkt-src-aws-service} ${pkt-srcaddr} ${protocol} ${region} ${srcaddr} ${srcport} ${start} ${sublocation-id} ${sublocation-type} ${subnet-id} ${tcp-flags} ${traffic-path} ${type} ${version} ${vpc-id}'

If you’re running CloudShell for the first time, it will take a few seconds to prepare the environment to run.

After you successfully enable the subscription for your VPC flow logs, it takes a few minutes depending on the intervals mentioned in the setup to create the log record files in the destination S3 folder.

To view those files, navigate to the Amazon S3 console and choose the bucket storing the flow logs. You should see the compressed interval logs, as shown in the following screenshot.

S3 destination bucket

You can download any file from the destination S3 bucket on your computer. Then extract the gzip file and view it in your favorite text editor.

The following is a sample enriched flow log record. The new fields (the src-tag-*, dst-tag-*, and dst-* fields at the end of the record) provide added context and metadata about the source and destination IP addresses:

{'account-id': '123456789101',
 'action': 'ACCEPT',
 'az-id': 'use1-az2',
 'bytes': '7251',
 'dstaddr': '10.10.10.10',
 'dstport': '52942',
 'end': '1661285182',
 'flow-direction': 'ingress',
 'instance-id': 'i-123456789',
 'interface-id': 'eni-0123a456b789d',
 'log-status': 'OK',
 'packets': '25',
 'pkt-dst-aws-service': '-',
 'pkt-dstaddr': '10.10.10.11',
 'pkt-src-aws-service': 'AMAZON',
 'pkt-srcaddr': '52.52.52.152',
 'protocol': '6',
 'region': 'us-east-1',
 'srcaddr': '52.52.52.152',
 'srcport': '443',
 'start': '1661285124',
 'sublocation-id': '-',
 'sublocation-type': '-',
 'subnet-id': 'subnet-01eb23eb4fe5c6bd7',
 'tcp-flags': '19',
 'traffic-path': '-',
 'type': 'IPv4',
 'version': '5',
 'vpc-id': 'vpc-0123a456b789d',
 'src-tag-Name': 'test-traffic-ec2-1',
 'src-tag-project': 'Log Analytics',
 'src-tag-team': 'Engineering',
 'dst-tag-Name': 'test-traffic-ec2-1',
 'dst-tag-project': 'Log Analytics',
 'dst-tag-team': 'Engineering',
 'dst-vpc-id': 'vpc-0bf974690f763100d',
 'dst-az-id': 'us-east-1a',
 'dst-subnet-id': 'subnet-01eb23eb4fe5c6bd7',
 'dst-interface-id': 'eni-01eb23eb4fe5c6bd7',
 'dst-instance-id': 'i-06be6f86af0353293'}

Create an Athena database and AWS Glue crawler

Now that you have enriched the VPC flow logs and stored them in Amazon S3, the next step is to create the Athena database and table to query the data. You first create an AWS Glue crawler to infer the schema from the log files in Amazon S3.

  1. On the AWS Glue console, choose Crawlers in the navigation pane.
  2. Choose Create crawler.

Glue Crawler

  1. For Name, enter a name for the crawler.
  2. For Description, enter an optional description.
  3. Choose Next.

Glue Crawler properties

  1. Choose Add a data source.
  2. For Data source, choose S3.
  3. For S3 path, provide the path of the flow logs bucket.
  4. Select Crawl all sub-folders.
  5. Choose Add an S3 data source.

Add Data source

  1. Choose Next.

Data source classifiers

  1. Choose Create new IAM role.
  2. Enter a role name.
  3. Choose Next.

Configure security settings

  1. Choose Add database.
  2. For Name, enter a database name.
  3. For Description, enter an optional description.
  4. Choose Create database.

Create Database

  1. On the previous tab for the AWS Glue crawler setup, for Target database, choose the newly created database.
  2. Choose Next.

Set output and scheduling

  1. Review the configuration and choose Create crawler.

Create crawler

  1. On the Crawlers page, select the crawler you created and choose Run.

Run crawler

You can rerun this crawler when new tags are added to your AWS resources, so that they’re available for you to query from the Athena database.
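If you prefer to rerun the crawler programmatically rather than from the console, a one-line Boto3 call is enough; the crawler name below is a placeholder for the name you chose:

import boto3

# Rerun the crawler so newly added tags become queryable in Athena
glue = boto3.client("glue")
glue.start_crawler(Name="vpc-flow-logs-crawler")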

Run Athena queries

Now you’re ready to query the enriched VPC flow logs from Athena.

  1. On the Athena console, open the query editor.
  2. For Database, choose the database you created.
  3. Enter the query as shown in the following screenshot and choose Run.

Athena query

The following code shows some of the sample queries you can run:

Select * from awslogs where "dst-az-id"='us-east-1a'
Select * from awslogs where "src-tag-project"='Log Analytics' or "dst-tag-team"='Engineering' 
Select "srcaddr", "srcport", "dstaddr", "dstport", "region", "az-id", "dst-az-id", "flow-direction" from awslogs where "az-id"='use1-az2' and "dst-az-id"='us-east-1a'

The following screenshot shows an example query result of the source Availability Zone to the destination Availability Zone traffic.

Athena query result
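Because the enrichment adds tag fields such as src-tag-team and dst-tag-team (shown in the sample record earlier), you can also aggregate traffic by tag. The following query is an illustrative example; it assumes the crawler inferred the bytes column as a string, hence the cast:

SELECT "src-tag-team", SUM(CAST("bytes" AS bigint)) AS total_bytes
FROM awslogs
GROUP BY "src-tag-team";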

You can also visualize various charts for the flow logs stored in the S3 bucket via QuickSight. For more information, refer to Analyzing VPC Flow Logs using Amazon Athena, and Amazon QuickSight.

Pricing

For pricing details, refer to Amazon Kinesis Data Firehose pricing.

Clean up

To clean up your resources, complete the following steps:

  1. Delete the Kinesis Data Firehose delivery stream and associated IAM role and policies.
  2. Delete the target S3 bucket.
  3. Delete the VPC flow log subscription.
  4. Delete the Lambda function and associated IAM role and policy.

Conclusion

This post provided a complete serverless solution architecture for enriching VPC flow log records with additional information, such as resource tags, using a Kinesis Data Firehose delivery stream and a Lambda function that processes the logs, enriches them with metadata, and stores them in a target S3 bucket. Because resource tags assigned to your resources appear in the logs, the solution helps you query, analyze, and visualize VPC flow logs with relevant application metadata. This meaningful information, attached to each log record wherever tags are available, makes it easier to relate log information to your applications.

We encourage you to follow the steps provided in this post to create a delivery stream, integrate with your VPC flow logs, and create a Lambda function to enrich the flow log records with additional metadata to more easily understand and analyze traffic patterns in your environment.


About the Authors

Chaitanya Shah is a Sr. Technical Account Manager with AWS, based out of New York. He has over 22 years of experience working with enterprise customers. He loves to code and actively contributes to AWS solutions labs to help customers solve complex problems. He provides guidance to AWS customers on best practices for their AWS Cloud migrations. He also specializes in AWS data transfer and in the data and analytics domain.

Vaibhav Katkade is a Senior Product Manager in the Amazon VPC team. He is interested in areas of network security and cloud networking operations. Outside of work, he enjoys cooking and the outdoors.

Learn to program in Python with our online courses

Post Syndicated from Rosa Brown original https://www.raspberrypi.org/blog/learn-to-program-in-python-online-courses-for-teachers/

If you’re new to teaching programming or looking to build or refresh your programming knowledge, we have a free resource that is perfect for you. Our ‘Learn to program in Python’ online course pathway is for educators who want to develop their understanding of the text-based language Python. Each course is packed with information and activities to help you apply what you learn in your classroom teaching.

A computing teacher and a learner do physical computing in the primary school classroom.

Why learn to program in Python?

Writing a program in Python is very similar to writing in English, which makes starting to program much easier. Python is also a general-purpose programming language, so once you’ve learned the basics, you can use Python for lots of different programming activities.

That’s why Python is a perfect choice for learning to program, and why many of our educational resources involve Python. Our seven online Python courses cover aspects from taking your first steps into programming, to writing a program to control an electronic circuit, to learning about object-oriented programming.

With time and practice, you will be able to use Python programming to create unique solutions to problems, build helpful tools, and make things that are important to you.

How does the Python course pathway work? 

The courses in the pathway have been written by our educators and include advice and activities to help you teach programming in your classroom. You can reuse the course activities to explain programming concepts to your learners and get them to write programs themselves. Because you will have first-hand experience of the activities, you’ll be able to anticipate your learners’ difficulties and adapt your lessons to suit them.

In a computing classroom, a smiling girl raises her hand.

All the courses are designed to take three or four weeks to complete, based on you spending two hours a week on participating. You can have free time-limited access to each course for the length of time it’s designed to take to complete. For example, if it’s a four-week course, like ‘Programming 101’, you can sign up for free to get four weeks of access.

The seven courses in the Python path can be completed in any order you like, and you can choose the courses that match your interests and needs.

A room of educators at desktop computers.

Each course involves activities that help you create a programming project using the concepts that you’re learning about. These activities are designed to be a fun and interactive way to reinforce what you’ve learned and can also be used with your learners in the classroom.

Course spotlight: Programming 101

If programming is completely new to you, our ‘Programming 101’ course is the best place to start. In ‘Programming 101’, we use the following definition to introduce the idea that programming is about telling a computer what to do:

“Programming is how you get computers to solve problems.” 

We see programming as a chance to think creatively about a problem and about all the different ways it could be solved. While you might be unfamiliar with terms like programming, algorithms, or selection, the ‘Programming 101’ course demonstrates how they touch on things that many of us know from other areas of our lives.

On the course, you will:

  • Learn about basic programming concepts such as sequencing and repetition
  • Start to write your own programs
  • Discover how to interpret error messages to find and fix mistakes in your programs

What will you make in the courses?

Through building an understanding of programming, you will see how you can write your own programs to make games, quizzes, physical computing projects, and more. Here’s a look at some of the things you could make in three of the seven courses:

  • Programming 101: Write your first program in Python to make a personal assistant bot. You’ll discover how to make the output of your program respond to the user’s input.  
alt=""
You’ll write a program to create a personal assistant bot in the ‘Programming 101’ course for beginners.
  • Programming with GUIs: Build a game where players compare two sets of emoji to find the emoji that matches. To make this game, you’ll use what you learn in the course to design the layout of a graphical user interface (GUI) and make sure only one emoji appears twice. 
alt=""
You’ll make an interactive graphic game in the ‘Programming with GUIs’ course.
  • Object-oriented Programming: Create a text-based adventure game with a character on a quest through different rooms! You’ll discover how to write a program that reacts to user input, and how to write your own code to create more challenges within the game based on your ideas.    

So check out our courses and start gaining Python programming skills today!

Python programming resources for young people

If you want to help your learners develop their understanding of programming in Python, you’ll be interested in these free resources we’ve created for young people: 

Introduction to Python: Our guided project path for learners who are new to text-based programming. We have created these projects with young people around the age of 9 to 13 in mind. Each project takes one hour to complete, and learners can make their own fun programs while learning about Python.

More Python: Our guided project path for learners who want to move beyond the ‘Intro to Python’ path to write programs that contain charts, artwork, and more. We’ve written these projects for young people around the age of 10 to 13.

Isaac Computer Science: This learning platform we’ve created for GCSE and A level students (age 14 to 18) uses Python and other text-based languages to teach the programming concepts within England’s computer science curriculum.   

The post Learn to program in Python with our online courses appeared first on Raspberry Pi.

How do I start my child coding?

Post Syndicated from Marc Scott original https://www.raspberrypi.org/blog/how-do-i-start-my-child-coding/

You may have heard a lot about coding and how important it is for children to start learning about coding as early as possible. Computers have become part of our lives, and we’re not just talking about the laptop or desktop computer you might have in your home or on your desk at work. Your phone, your microwave, and your car are all controlled by computers, and those computers need instructions to tell them what to do. Coding, or computer programming, involves writing those instructions.

A boy types code at a CoderDojo coding club.

If children discover a love for coding, they will have an avenue to make the things they want to make; to write programs and build projects that they find useful, fun, or interesting. So how do you give your child the opportunity to learn about coding? We’ve listed some free resources and suggested activities below.

Scratch Junior 

If you have a young child under about 7 years of age, then a great place to begin is with ScratchJr. This is an app available on Android and iOS phones and tablets that lets children learn the basics of programming without having to worry about making mistakes.

ScratchJr programming interface.

Code Club World

The Raspberry Pi Foundation has developed a series of activities for young learners, on their journey to developing their computing skills. Code Club World provides a platform for children to play with code to design their own avatar, make it dance, and play music. Plus they can share their creations with other learners. 

“You could have a go too and discover Scratch together. The platform is designed for complete beginners and it is great fun to play with.”

Carol Thornhill, Engineering Science MA, Mathematics teacher

Scratch

For 7- to 11-year-old children, Scratch is a good way to begin their journey in coding, or to progress from ScratchJr. Like ScratchJr, Scratch is a block-based language, allowing children to assemble code to produce games, animations, and stories, or even to use some of the add-ons to interact with electronic devices and explore physical computing.

A girl with her Scratch project
A girl with a Scratch project she has coded.

The Raspberry Pi Foundation has hundreds of Scratch projects that your child can try out, but the best place to begin is with our Introduction to Scratch path, which will provide your child with the basic skills they need, and then encourage them to build projects that are relevant to them, culminating in their creation of their own interactive ebook.

Your child may never tire of Scratch, and that is absolutely fine — it is a fully functioning programming language that is surprisingly powerful, when you learn to understand everything it can do. Another advantage of Scratch is that it provides easy access to graphics, sounds, and interactivity that can be trickier to achieve in other programming languages.

Python 

If you’re looking for more traditional programming languages for your child to progress on to, especially when they reach 12 years of age or beyond, then we like to direct our young learners to the Python programming language and to the languages that the World Wide Web is built on, particularly HTML, CSS, and JavaScript.

Animation coded in Python of an archery target disk.
An animation coded using Python.

Our Python resources cover the basics of using the language, and then progress from there. Python is one of the most widely used languages when it comes to the fields of artificial intelligence and data science, and we have resources to support your child in learning about these fascinating aspects of technology. Our projects can even introduce your child to the world of electronics and physical computing with activities that use the inexpensive Raspberry Pi Pico, and a handful of electronic components, enabling your kids to create a wide variety of art installations and useful gadgets.
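To give a flavour of what physical computing with the Pico looks like, here is the classic first program, written in MicroPython; pin 25 assumes the onboard LED of the original Pico (a Pico W uses Pin("LED") instead):

from machine import Pin
import time

led = Pin(25, Pin.OUT)  # GPIO 25 drives the onboard LED on the original Pico

while True:
    led.toggle()     # switch the LED on or off
    time.sleep(0.5)  # wait half a second before toggling again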

“Trying Python doesn’t mean you can’t go back to Scratch or switch between Scratch and Python for different purposes. I still use Scratch for some projects myself!”

Tracy Gardner, Computer Science PhD, former IBM Software Architect and currently a project writer at the Raspberry Pi Foundation

A young person codes at a Raspberry Pi computer.
Python is a great text-based programming language for young people to learn.

Coding projects

On our coding tutorials website we have many different projects to help your child learn coding and digital making. These range from beginner resources like the Introduction to Scratch path to more advanced activities such as the Introduction to Unity path, where children can learn how to make 3D worlds and games. 

“Our new project paths can be tackled by young creators on their own, without adult intervention. Paths are structured so that they build skills and confidence in the early stages, and then provide more open-ended tasks and inspirational ideas that creators can adapt or work from.”

Rik Cross, BSc (Hons), PGCE, former teacher and Director of Informal Learning at the Raspberry Pi Foundation

Web development 

The Web is integral to many of our lives, and we believe that it is important for children to have an understanding of the technology that drives it. That is why we have an Introduction to the Web path that allows children to develop their own web pages, focusing on the kinds of webpages that they want to build, be that sending a greeting card, telling a story, or creating a showcase of their projects.

A girl has fun learning to code at home on a tablet sitting on a sofa.
It’s empowering for children to learn how the websites they visit are created with code.

Coding clubs 

Coding clubs are a great place for children to have fun and become more confident with coding, where they can learn through making and share their creations with each other. The Raspberry Pi Foundation operates the world’s largest network of coding clubs: CoderDojo and Code Club.

“I have a new group of creators at my Code Club every year and my favourite part is when they realise they really can let their imagination run wild. You want to make an animation where a talking pineapple chases a snowman — absolutely. You want to make a piece of scalable art out of 1000 pixelated cartoon musical instruments — go right ahead. If you can code it, you can make it ”

Liz Smart, Code Club and CoderDojo mentor, former Solutions Architect and project writer for the Raspberry Pi Foundation

Three teenage girls at a laptop.
At Code Club and CoderDojo, many young people enjoy teaming up to code projects together.

Coding challenges 

Once your child has learnt some of the basics, they may enjoy entering a coding challenge! The European Astro Pi Challenge programme allows young people to write code and actually have it run on the International Space Station, and Coolest Projects gives children a chance to showcase their projects from across the globe.

A Coolest Projects participant
A girl with her coded creation at an in-person Coolest Projects showcase.

Free resources 

No matter what technology your child wants to engage with, there is a wealth of free resources and materials available from organisations such as the Raspberry Pi Foundation and Scratch Foundation, that prepare young people for 21st century life. Whether they want to become professional software engineers, tinker with some electronics, or just have a play around … encourage them to explore some coding projects, and see what they can learn, make, and do!


Author: Marc Scott, BSc (Hons) is a former Science, Computer Science, and Engineering teacher and the Content Lead for Projects at the Raspberry Pi Foundation.

The post How do I start my child coding? appeared first on Raspberry Pi.

Simplify and optimize Python package management for AWS Glue PySpark jobs with AWS CodeArtifact

Post Syndicated from Ashok Padmanabhan original https://aws.amazon.com/blogs/big-data/simplify-and-optimize-python-package-management-for-aws-glue-pyspark-jobs-with-aws-codeartifact/

Data engineers use various Python packages to meet their data processing requirements while building data pipelines with AWS Glue PySpark jobs. Languages like Python and Scala are commonly used in data pipeline development. Developers can take advantage of open-source packages, or even customize their own, to make it easier and faster to handle use cases such as data manipulation and analysis. However, managing standardized packages can be cumbersome when multiple teams use different versions of packages, install non-approved packages, and duplicate development effort due to the lack of visibility into what is available at the enterprise level. This can be especially challenging in large enterprises with multiple data engineering teams.

ETL developers often need to use additional packages for their AWS Glue ETL jobs. With security being job zero for customers, many restrict egress traffic from their VPC to the public internet, so they need a way to manage the packages used by their applications, including their data processing pipelines.

Our proposed solution enables teams with network egress restrictions to manage packages centrally with AWS CodeArtifact and use their favorite libraries in their AWS Glue ETL PySpark code. In this post, we describe how CodeArtifact can be used for managing packages and modules for AWS Glue ETL jobs, and we demonstrate a solution using Glue PySpark jobs that run within VPC subnets that have no internet access.

Solution overview

The solution uses CodeArtifact as a tool to make it easier for organizations of any size to securely store, publish, and share the software packages used in their ETL with AWS Glue. VPC endpoints are enabled for CodeArtifact and Glue to provide private connectivity over AWS PrivateLink. AWS Step Functions makes it easy to coordinate the orchestration of components used in the data processing pipeline. Native integrations with both CodeArtifact and AWS Glue enable the workflow to authenticate the request to CodeArtifact and start the AWS Glue ETL job.

The following architecture shows an implementation of a solution using AWS Glue, CodeArtifact, and Step Functions to use additional Python modules without egress internet access. The solution is deployed using AWS Cloud Development Kit (AWS CDK), an open-source software development framework to define your cloud application resources using familiar programming languages.

Solution Architecture for the blog post

Fig 1: Architecture Diagram for the Solution

To illustrate how to set up this architecture, we’ll walk you through the following steps:

  1. Deploying an AWS CDK stack to provision the following AWS Resources
    1. CodeArtifact
    2. An AWS Glue job
    3. Step Functions workflow
    4. Amazon Simple Storage Service (Amazon S3) bucket
    5. A VPC with a private Subnet and VPC Endpoints to Amazon S3 and CodeArtifact
  2. Validate the Deployment.
  3. Run a Sample Workflow – This workflow will run an AWS Glue PySpark job that uses a custom Python library, and an upgraded version of boto3.
  4. Cleaning up your resources.

Prerequisites

Make sure that you complete the following steps as prerequisites:

The solution

Launching your AWS CDK Stack

Step 1: Using your device’s command line, check out our Git repository to a local directory on your device:

git clone https://github.com/aws-samples/python-lib-management-without-internet-for-aws-glue-in-private-subnets.git

Step 2: Change directories to the Amazon S3 scripts location in the newly cloned repository:

cd python-lib-management-without-internet-for-aws-glue-in-private-subnets/scripts/s3

Step 3: Download the following CSV, which contains New York City Taxi and Limousine Commission (TLC) weekly trip records. This will serve as the input source for the AWS Glue job:

aws s3 cp s3://nyc-tlc/misc/FOIL_weekly_trips_apps.csv .

Step 4: Change directories to the path where the app.py file is located (relative to the previous step, run the following command):

cd ../..

Step 5: Create a virtual environment:

macOS/Linux:
python3 -m venv .env

Windows:
python -m venv .env

Step 6: Activate the virtual environment after the init process completes and the virtual environment is created:

macOS/Linux:
source .env/bin/activate

Windows:
.env\Scripts\activate.bat

Step 7: Install the required dependencies:

pip3 install -r requirements.txt

Step 8: Make sure that your AWS profile is set up, along with the Region that you want to deploy to, as mentioned in the prerequisites. Then synthesize the templates. AWS CDK apps use code to define the infrastructure, and when run they produce, or "synthesize", a CloudFormation template for each stack defined in the application:

cdk synthesize

Step 9: Bootstrap the CDK app using the following command:

cdk bootstrap aws://<AWS_ACCOUNTID>/<AWS_REGION>

Replace the placeholders AWS_ACCOUNTID and AWS_REGION with your AWS account ID and the Region to deploy to.

This step provisions the initial resources, including an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments.

Step 10: Deploy the solution. By default, some actions that could potentially make security changes require approval. In this deployment, you’re creating an IAM role. The following command overrides the approval prompts, but if you would like to manually accept the prompts, then omit the --require-approval never flag:

cdk deploy "*" --require-approval never

While the AWS CDK deploys the CloudFormation stacks, you can follow the deployment progress in your terminal:


Fig 2: AWS CDK Deployment progress in terminal

Once the deployment is successful, you’ll see the successful status as follows:


Fig 3: AWS CDK Deployment completion success

Step 11: Log in to the AWS Console, go to CloudFormation, and see the output of the ApplicationStack stack:


Fig 4: AWS CloudFormation stack output

Note the values of the DomainName and RepositoryName variables. We’ll use them in the next step to upload our artifacts.

Step 12: We will upload a custom library into the repo that we created. This will be used by our Glue ETL job.

  • Install twine using pip:

python3 -m pip install twine

  • The custom Python package glueutils-0.2.0.tar.gz can be found under this folder of the cloned repo. Change into it:

cd scripts/custom_glue_library

  • Configure twine with the login command (additional details here). Refer to Step 11 for the DomainName and RepositoryName values from the CloudFormation output:

aws codeartifact login --tool twine --domain <DomainName> --domain-owner <AWS_ACCOUNTID> --repository <RepositoryName>

  • Publish the Python package assets:

twine upload --repository codeartifact glueutils-0.2.0.tar.gz

Fig 5: Python package publishing using twine

Validate the Deployment

The AWS CDK stack will deploy the following AWS resources:

  1. Amazon Virtual Private Cloud (Amazon VPC)
    1. One Private Subnet
  2. AWS CodeArtifact
    1. CodeArtifact Repository
    2. CodeArtifact Domain
    3. CodeArtifact Upstream Repository
  3. AWS Glue
    1. AWS Glue Job
    2. AWS Glue Database
    3. AWS Glue Connection
  4. AWS Step Function
  5. Amazon S3 Bucket for AWS CDK and also for storing scripts and CSV file
  6. IAM Roles and Policies
  7. Amazon Elastic Compute Cloud (Amazon EC2) Security Group

Step 1: Browse to the AWS account and region via the AWS Console to which the resources are deployed.

Step 2: Browse to the Subnets page (https://<region>.console.aws.amazon.com/vpc/home?region=<region>#subnets:). (Replace <region> with the AWS Region to which your resources are deployed.)

Step 3: Select the subnet named ApplicationStack/enterprise-repo-vpc/Enterprise-Repo-Private-Subnet1.

Step 4: Select the route table and validate that there are no routes to the internet through an internet gateway or NAT gateway, and that it’s similar to the following image:


Fig 6: Route table validation

Step 5: Navigate to the CodeArtifact console and review the repositories created. The enterprise-repo is your local repository, and pypi-store is the upstream repository connected to PyPI, providing artifacts from pypi.org.


Fig 7: AWS CodeArtifact repositories created

Step 6: Navigate to enterprise-repo and search for glueutils. This is the custom python package that we published.


Fig 8: AWS CodeArtifact custom Python package published

Step 7: Navigate to Step Functions Console and review the enterprise-repo-step-function as follows:


Fig 9: AWS Step Functions workflow

The diagram shows how the Step Functions workflow will orchestrate the pattern.

  1. The first step CodeArtifactGetAuthorizationToken calls the getAuthorizationToken API to generate a temporary authorization token for accessing repositories in the domain (this token is valid for 15 mins.).
  2. The next step GenerateCodeArtifactURL takes the authorization token from the response and generates the CodeArtifact URL.
  3. Then, this will move into the GlueStartJobRun state, which makes a synchronous API call to run the AWS Glue job.
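The first two states correspond to straightforward SDK calls. The following is a minimal Python (Boto3) sketch of what they do; the domain, owner, Region, and repository values are placeholders taken from the CloudFormation output:

import boto3

codeartifact = boto3.client("codeartifact")

# Equivalent of CodeArtifactGetAuthorizationToken: request a temporary token (15 minutes)
token = codeartifact.get_authorization_token(
    domain="<DomainName>",
    domainOwner="<AWS_ACCOUNTID>",
    durationSeconds=900,
)["authorizationToken"]

# Equivalent of GenerateCodeArtifactURL: build the PyPI-style index URL for the Glue job
index_url = (
    f"https://aws:{token}@<DomainName>-<AWS_ACCOUNTID>"
    f".d.codeartifact.<AWS_REGION>.amazonaws.com/pypi/<RepositoryName>/simple/"
)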

Step 8: Navigate to the AWS Glue Console and select the Jobs tab, then select enterprise-repo-glue-job.

The AWS Glue job is created with the following script and AWS Glue Connection enterprise-repo-glue-connection. The AWS Glue connection is a Data Catalog object that enables the job to connect to sources and APIs from within the VPC. The network type connection runs the job from within the private subnet to make requests to Amazon S3 and CodeArtifact over the VPC endpoint connection. This enables the job to run without any traffic through the internet.

Note the connections section in the AWS Glue PySpark Job, which makes the Glue job run on the private subnet in the VPC provisioned.


Fig 10: AWS Glue network connections

The job takes an Amazon S3 bucket, Glue Database, Python Job Installer Option, and Additional Python Modules as job parameters. The parameters --additional-python-modules and --python-modules-installer-option are passed to install the selected Python module from a PyPI repository hosted in AWS CodeArtifact.
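To illustrate the shape these parameters take, a job run might receive values like the following; the endpoint is a placeholder, and the real token and URL are generated by the Step Functions workflow at run time:

--additional-python-modules boto3,glueutils==0.2.0
--python-modules-installer-option --index-url https://aws:<token>@<DomainName>-<AWS_ACCOUNTID>.d.codeartifact.<AWS_REGION>.amazonaws.com/pypi/<RepositoryName>/simple/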

The script itself first reads the Amazon S3 input path of the taxi data in CSV format. A light transformation to sum the total trips by year, week, and app is performed. Then the output is written to an Amazon S3 path as Parquet. A partitioned table in the AWS Glue Data Catalog is created, or updated if it already exists.

You can find the Glue PySpark script here.
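If you want a feel for the transformation before opening the repository, the following is a rough PySpark sketch of the logic just described; the column names and S3 paths are illustrative assumptions, not the repository’s actual code, and the real job also uses a GlueContext to update the Data Catalog:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("taxi-weekly-trips").getOrCreate()

# Read the FOIL weekly trips CSV from the input S3 path (placeholder path)
trips = spark.read.option("header", True).csv("s3://<input-bucket>/FOIL_weekly_trips_apps.csv")

# Light transformation: sum the total trips by year, week, and app
summary = (
    trips.groupBy("year", "week_number", "app")
         .agg(F.sum("trips").alias("total_trips"))
)

# Write the result to S3 as Parquet, partitioned by year (placeholder path)
summary.write.mode("overwrite").partitionBy("year").parquet("s3://<output-bucket>/taxidataparquet/")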

Run a sample workflow

The following steps will demonstrate how to run a sample workflow:

Step 1: Navigate to the Step Functions Console and select the enterprise-repo-step-function.

Step 2: Select Start execution and enter the following input. We’re including the glueutils library and the latest boto3 library as part of the job run. It is always recommended to pin your Python dependencies to avoid breaking changes caused by a future version of a dependency. In the example below, the latest available version of boto3 and version 0.2.0 of glueutils will be installed. To pin boto3 to a specific release, you can add boto3==1.24.2 (the latest release at the time of publishing this post).

{"pythonmodules": "boto3,glueutils==0.2.0"}

Step 3: Select Start execution and wait until Execution Status is Succeeded. This may take a few minutes.

Step 4: Navigate to the CodeArtifact console to review the enterprise-repo repository. You’ll see the cached PyPI packages and all of their dependencies pulled down from PyPI.

Step 5: In the Glue console, under the Runs section of the enterprise-repo-glue-job, you’ll see the parameters passed:

Fig 11: AWS Glue job execution history

Note the --index-url value that was passed as a parameter to the Glue ETL job. The token it contains is valid for only 15 minutes.

Step 6: Navigate to the Amazon CloudWatch Console and go to the /aws/glue-jobs log group to verify that the packages were installed from the local repo.

You will see that the two packages passed as parameters are installed with their corresponding versions.

Fig 12: Amazon CloudWatch logs details for the Glue job

Step 7: Navigate to the Amazon Athena console and select Query Editor.

Step 8: Run the following query to validate the output of the AWS Glue job:

SELECT year, app, SUM(total_trips) as sum_of_total_trips 
FROM 
"codeartifactblog_glue_db"."taxidataparquet" 
GROUP BY year, app;

Clean up

Make sure that you clean up all of the other AWS resources that you created in the AWS CDK Stack deployment. You can delete these resources via the AWS CDK Destroy command as follows or the CloudFormation console.

To destroy the resources using AWS CDK, follow these steps:

  1. Follow Steps 1-6 from the ‘Launching your AWS CDK Stack’ section.
  2. Destroy the app by executing the following command:
    cdk destroy

Conclusion

In this post, we demonstrated how CodeArtifact can be used for managing Python packages and modules for AWS Glue jobs that run within VPC Subnets that have no internet access. We also demonstrated how the versions of existing packages can be updated (i.e., boto3) and a custom Python library (glueutils) that is developed locally is also managed through CodeArtifact.

This post enables you to use your favorite Python packages with AWS Glue ETL PySpark jobs by modifying the input to the AWS Step Functions workflow (Step 2 in the ‘Run a sample workflow’ section).


About the Authors

Bret Pontillo is a Data & ML Engineer with AWS Professional Services. He works closely with enterprise customers building data lakes and analytical applications on the AWS platform. In his free time, Bret enjoys traveling, watching sports, and trying new restaurants.

Gaurav Gundal is a DevOps consultant with AWS Professional Services, helping customers build solutions on the customer platform. When not building, designing, or developing solutions, Gaurav spends time with his family, plays guitar, and enjoys traveling to different places.

Ashok Padmanabhan is a Sr. IOT Data Architect with AWS Professional Services, helping customers build data and analytics platform and solutions. When not helping customers build and design data lakes, Ashok enjoys spending time at the beach near his home in Florida.

Python coding for kids: Moving beyond the basics

Post Syndicated from Rebecca Franks original https://www.raspberrypi.org/blog/python-coding-for-kids-beyond-the-basics/

We are excited to announce our second new Python learning path, ‘More Python’, which shows young coders how to add real data to their programs while creating projects from a chart of Olympic medals to an interactive world map. The six guided Python projects in this free learning path are designed to enable young people to independently create their own Python projects about the topics that matter to them.

A girl points excitedly at a project on the Raspberry Pi Foundation's projects site.
Two kids are at a laptop with one of our coding projects.

In this post, we’ll show you how kids use the projects in the ‘More Python’ path, what they can make by following the path, and how the path structure helps them become confident and independent digital makers.

Python coding for kids: Our learning paths

Our ‘Introduction to Python’ learning path is the perfect place to start learning how to use Python, a text-based programming language. When we launched the Intro path in February, we explained why Python is such a popular, useful, and accessible programming language for young people.

Because Python has so much to offer, we have created a second Python path for young people who have learned the basics in the first path. In this new set of six projects, learners will discover new concepts and see how to add different types of real data to their programs.

Illustration of different graph types
By following the ‘More Python’ path, young people learn the skills to independently create a data visualisation for a topic they are passionate about in the final project.

Key questions answered

Who is this path for?

We have written the projects in this path with young people around the age of 10 to 13 in mind. To code in a text-based language, a young person needs to be familiar with using a keyboard, due to the typing involved. Learners should have already completed the ‘Introduction to Python’ project path, as they will build on the learning from that path.

Three young tech creators show off their tech project at Coolest Projects.

How do young people learn with the projects? 

Young people need access to a web browser to complete our project paths. Each project contains step-by-step instructions for learners to follow, and tick boxes to mark when they complete each step. On top of that, the projects have steps for learners to:

  • Reflect on what they have covered in the project
  • Share their projects with others
  • See suggestions to upgrade their projects

Young people also have the option to sign up for an account with us so they can save their progress at any time and collect badges.

A young person codes at a Raspberry Pi computer.

While learners follow the project instructions in this project path, they write their code into Trinket, a free web-based coding platform accessible in a browser. Each project contains a link to a starter Trinket, which includes everything to get started writing Python code — no need to install any additional software.

Screenshot of Python code in the online IDE Trinket.
This is what Python code on Trinket looks like.

If they prefer, however, young people also have the option of instead writing their code in a desktop-based programming environment, such as Thonny, as they work through the projects.

What will young people learn?  

To use data in their Python programs, the project instructions show learners how to:

  • Create and use lists
  • Create and use dictionaries
  • Read data from a data file

The projects support learners as they explore new concepts of digital visual media and: 

  • Create charts using the Python library Pygal
  • Plot pins on a map
  • Create randomised artwork
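To give a feel for the Pygal library mentioned above, here is a tiny, self-contained example with made-up medal counts; the real projects read their data from a file instead:

import pygal

# Build a simple bar chart from made-up Olympic medal counts
chart = pygal.Bar()
chart.title = "Olympic medals (example data)"
chart.add("Team A", [27, 29, 22])
chart.add("Team B", [46, 46, 39])
chart.render_to_file("medals.svg")  # open the SVG file in a web browser to view it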

In each project, learners reflect and answer questions about their work, which is important for connecting the project’s content to their pre-existing knowledge.

In a computing classroom, a girl laughs at what she sees on the screen.

As they work through the projects, learners see different ways to present data and then decide how they want to present their data in the final project in the path. You’ll find out what the projects are on the path page, or at the bottom of this blog post.

The project path helps learners become independent coders and digital makers, as each project contains slightly less support than the one before. You can read about how our project paths are designed to increase young people’s independence, and explore our other free learning paths for young coders.

How long will the path take to complete?

We’ve designed the path to be completed in around six one-hour sessions, with one hour per project, at home, in school, or at a coding club. The project instructions encourage learners to add code to upgrade their projects and go further if they wish. This means that young people might want to spend a little more time getting their projects exactly as they imagine them.

In a classroom, a teacher and a student look at a computer screen while the student types on the keyboard.

What can young people do next?

Use Unity to create a 3D world

Unity is a free development environment for creating 3D virtual environments, including games, visual novels, and animations, all with the text-based programming language C#. Our ‘Introduction to Unity’ project path for keen coders shows how to make 3D worlds and games with collectibles, timers, and non-player characters.

Take part in Coolest Projects Global

At the end of the ‘More Python’ path, learners are encouraged to register a project they’ve made using their new coding skills for Coolest Projects Global, our free and world-leading online technology showcase for young tech creators. The project they register will become part of the online gallery, where members of the Coolest Projects community can celebrate each other’s creations.

A young coder shows off her tech project for Coolest Projects to two other young tech creators.

We welcome projects from all young people, whether they are beginners or experienced coders and digital makers. Coolest Projects Global is a unique opportunity for young people to share their ingenuity with the world and with other young people who love coding and creating with digital technology.

Details about the projects in ‘More Python’

The ‘More Python’ path is structured according to our Digital Making Framework, with three Explore projects, two Design projects, and a final Invent project.

Explore project 1: Charting champions

Illustration of a fast-moving, smiling robot wearing a champion's rosette.

In this Explore project, learners discover the power of lists in Python by creating an interactive chart of Olympic medals. They learn how to read data from a text file and then present that data as a bar chart.

Explore project 2: Solar system

Illustration of our solar system.

In this Explore project, learners create a simulation of the solar system. They revisit the drawing and animation skills that they learned in the ‘Introduction to Python’ project path to produce animated planets orbiting the sun. The animation is based on real data taken from a data file to simulate the speed that the planets move at as they orbit. The simulation is also interactive, using dictionaries to display data about the planets that have been selected.

Explore project 3: Codebreaker

Illustration of a person thinking about codebreaking.

The final Explore project gets learners to build on their knowledge of lists and dictionaries by creating a program that encodes and decodes a message using an Atbash cipher. The Atbash cipher was originally developed in the Hebrew language. It takes the alphabet and matches it to its reverse order to create a secret message. They also create a script that checks how many times certain letters have been used in an encoded message, so that they can discover patterns.
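For readers curious about how that reverse-alphabet mapping works in Python, here is a tiny illustration (not the project’s own code):

import string

ALPHABET = string.ascii_lowercase
REVERSED = ALPHABET[::-1]
ATBASH = str.maketrans(ALPHABET + ALPHABET.upper(), REVERSED + REVERSED.upper())

def atbash(message):
    # The same mapping both encodes and decodes, because the cipher is its own inverse
    return message.translate(ATBASH)

print(atbash("Hello"))          # prints: Svool
print(atbash(atbash("Hello")))  # prints: Hello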

Design project 1: Encoded art

Illustration of a robot painting a portrait of another robot.

The first Design project allows learners to create fun pieces of artwork by encoding the letters of their name into images, patterns, or drawings. Learners can choose the images that will be produced for each letter, and whether these appear at random or in a geometric pattern.

Learners are encouraged to share their encoded artwork in the community library, where there are lots of fun projects to discover already. In this project, learners apply all of the coding skills and knowledge covered in the Explore projects, including working with dictionaries and lists.

Design project 2: Mapping data

Illustration of a map and a hand of someone marking it with a large pin.

In the next Design project, learners access data from a data file and use it to create location pins on a world map. They have six datasets to choose from, so they can use one that interests them. They can also choose from a variety of maps and design their own pin to truly personalise their projects.

Invent project: Persuasive data presentation

Illustration of different graph types

This project is designed to use all of the skills and knowledge covered in this path, and most of the skills from the ‘Introduction to Python’ path. Learners can choose from eight datasets to create data visualisations. They are also given instructions on how to access and prepare other datasets if they want to visualise data about a different topic.

Once learners have chosen their dataset, they can decide how they want it to be displayed. This could be a chart, a map with pins, or a unique data visualisation. There are lots of example projects to provide inspiration for learners. One of our favourites is the ISS Expedition project, which places flags on the ISS depending on the expedition number you enter.

The post Python coding for kids: Moving beyond the basics appeared first on Raspberry Pi.

New for Amazon CodeGuru Reviewer – Detector Library and Security Detectors for Log-Injection Flaws

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-amazon-codeguru-reviewer-detector-library-and-security-detectors-for-log-injection-flaws/

Amazon CodeGuru Reviewer is a developer tool that detects security vulnerabilities in your code and provides intelligent recommendations to improve code quality. For example, CodeGuru Reviewer introduced Security Detectors for Java and Python code to identify security risks from the top ten Open Web Application Security Project (OWASP) categories and follow security best practices for AWS APIs and common crypto libraries. At re:Invent, CodeGuru Reviewer introduced a secrets detector to identify hardcoded secrets and suggest remediation steps to secure your secrets with AWS Secrets Manager. These capabilities help you find and remediate security issues before you deploy.

Today, I am happy to share two new features of CodeGuru Reviewer:

  • A new Detector Library describes in detail the detectors that CodeGuru Reviewer uses when looking for possible defects and includes code samples for both Java and Python.
  • New security detectors have been introduced for detecting log-injection flaws in Java and Python code, similar to what happened with the recent Apache Log4j vulnerability we described in this blog post.

Let’s see these new features in more detail.

Using the Detector Library
To help you understand more clearly which detectors CodeGuru Reviewer uses to review your code, we are now sharing a Detector Library where you can find detailed information and code samples.

These detectors help you build secure and efficient applications on AWS. In the Detector Library, you can find detailed information about CodeGuru Reviewer’s security and code quality detectors, including descriptions, their severity and potential impact on your application, and additional information that helps you mitigate risks.

Note that each detector looks for a wide range of code defects. We include one noncompliant and one compliant code example for each detector. However, CodeGuru uses machine learning and automated reasoning to identify possible issues. For this reason, each detector can find a range of defects in addition to the explicit code example shown on the detector’s description page.

Let’s have a look at a few detectors. One detector is looking for insecure cross-origin resource sharing (CORS) policies that are too permissive and may lead to loading content from untrusted or malicious sources.

Detector Library screenshot.

Another detector checks for improper input validation that can enable attacks and lead to unwanted behavior.

Detector Library screenshot.

Specific detectors help you use the AWS SDK for Java and the AWS SDK for Python (Boto3) in your applications. For example, there are detectors that can detect hardcoded credentials, such as passwords and access keys, or inefficient polling of AWS resources.

New Detectors for Log-Injection Flaws
Following the recent Apache Log4j vulnerability, we introduced in CodeGuru Reviewer new detectors that check if you’re logging anything that is not sanitized and possibly executable. These detectors cover the issue described in CWE-117: Improper Output Neutralization for Logs.

These detectors work with Java and Python code and, for Java, are not limited to the Log4j library. They don’t work by looking at the version of the libraries you use, but check what you are actually logging. In this way, they can protect you if similar bugs happen in the future.

Detector Library screenshot.

To comply with these detectors, user-provided input must be sanitized before it is logged. This prevents an attacker from using the input to break the integrity of your logs, forge log entries, or bypass log monitors.
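As a minimal sketch of the kind of remediation these detectors point towards (not CodeGuru’s own suggested fix), you might strip newline characters from user input before logging it in Python:

import logging
import re

logger = logging.getLogger(__name__)

def log_login_attempt(username):
    # Remove carriage returns and line feeds so the input cannot forge extra log entries
    sanitized = re.sub(r"[\r\n]", "", username)
    logger.info("Login attempt for user: %s", sanitized)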

Availability and Pricing
These new features are available today in all AWS Regions where Amazon CodeGuru is offered. For more information, see the AWS Regional Services List.

The Detector Library is free to browse as part of the documentation. For the new detectors looking for log-injection flaws, standard pricing applies. See the CodeGuru pricing page for more information.

Start using Amazon CodeGuru Reviewer today to improve the security of your code.

Danilo

Coding for kids: Art, games, and animations with our new beginners’ Python path

Post Syndicated from Rebecca Franks original https://www.raspberrypi.org/blog/coding-for-kids-art-games-animations-beginners-python-programming/

Python is a programming language that’s popular with learners and educators in clubs and schools. It also is widely used by professional programmers, particularly in the data science field. Many educators and young people like how similar the Python syntax is to the English language.

Two girls code together at a computer.

That’s why Python is often the first text-based language that young people learn to program in. The familiar syntax can lower the barrier to taking the first steps away from a block-based programming environment, such as Scratch.

In 2021, Python ranked in first place in an industry-standard popularity index of a major software quality assessment company, confirming its favoured position in software engineering. Python is, for example, championed by Google and used in many of its applications.

Coding for kids in Python

Python’s popularity means there are many excellent resources for learning this language. These resources often focus on creating programs that produce text outputs. We wanted to do something different.

Two young people code at laptops.

Our new ‘Introduction to Python’ project path focuses on creating digital visuals using the Python p5 library. This library is like a set of tools that allows you to get creative by using Python code to draw shapes, edit images, and create frame-by-frame animations. That makes it the perfect choice for young learners: they can develop their knowledge and skills in Python programming while creating cool visuals that they’ll be proud of. 

What is in the ‘Introduction to Python’ path?

The ‘Introduction to Python’ project path is designed according to our Digital Making Framework, encouraging learners to become independent coders and digital makers by gently removing scaffolding as they progress along the projects in a path. Paths begin with three Explore projects, in which learners are guided through tasks that introduce them to new coding skills. Next, learners complete two Design projects. Here, they are encouraged to practise their skills and bring in their own interests to personalise their coding creations. Finally, learners complete one Invent project. This is where they put everything that they have learned together and create something unique that matters to them.

""
Emoji, archery, rockets, art, and movement are all part of this Python path.

The structure of our Digital Making Framework means that learners experience the structured development process of a coding project and learn how to turn their ideas into reality. The Framework also supports learners with finding errors in their code (debugging), showing them that errors are a part of computer programming and just temporary setbacks that they can overcome.

What coding skills and knowledge will young people learn?

The Explore projects are where the initial learning takes place. The key programming concepts covered in this path are:

  • Variables
  • Performing calculations with variables
  • Using functions
  • Using selection (if, elif and else)
  • Using repetition (for loops)
  • Using randomisation
  • Importing from libraries

Learners also explore aspects of digital visual media concepts:

  • Coordinates
  • RGB colours
  • Screen size
  • Layers
  • Frames and animation

Learners then develop these skills and knowledge by putting them into practice in the Design and Invent projects, where they add in their own ideas and creativity. 

Explore project 1: Hello world emoji

In the first Explore project of this path, learners create an interactive program that uses emoji characters as the visual element.

""

This is the first step into Python and gets learners used to the syntax for printing text, using variables, and defining functions.

Explore project 2: Target practice

In this Explore project, learners create an archery game. They are introduced to the p5 library, which they use to draw an archery board and create the arrows.

""

The new programming concept covered in this project is selection, where learners use if, elif and else to allocate points for the game.
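As a tiny illustration of that kind of selection logic (the distances and point values here are made up, not the project’s actual ones):

import random

# How far (in made-up units) the arrow lands from the centre of the target
distance = random.randint(0, 100)

if distance < 10:
    points = 50   # bullseye!
elif distance < 30:
    points = 20
elif distance < 60:
    points = 10
else:
    points = 0    # missed the target

print("You scored", points, "points!")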

Explore project 3: Rocket launch

The final Explore project gets learners to animate a rocket launching into space. They create an interactive animation where the user is asked to enter an amount of fuel for the rocket launch. The animation then shows if the fuel is enough to get the rocket into orbit.

""

The new programming concept covered here is repetition. Learners use for loops to animate smoke coming from the exhaust of the rocket.

Design project 1: Make a face

The first Design project allows learners to unleash their creativity by drawing a face using the Python coding skills that they have built in the Explore projects. They have full control of the design for their face and can explore three examples for inspiration.

""

Learners are also encouraged to share their drawings in the community library, where there are lots of fun projects to discover already. In this project, learners apply all of the coding skills and knowledge covered in the Explore projects, including selection, repetition, and variables.

Design project 2: Don’t collide!

In the second Design project, learners code a scrolling game called ‘Don’t collide’, where a character or vehicle moves down the screen while having to avoid obstacles.

""

Learners can choose their own theme for the game, and decide what will move down the screen and what the obstacles will look like. In this project, they also get to practice everything they learned in the Explore projects. 

Invent project: Powerful patterns

This project is the ultimate chance for learners to put all of their skills and knowledge into practice and get creative. They design their own unique patterns and create frame-by-frame animations.

""

The Invent project offers ingredients, which are short reminders of all the key skills that learners have gained while completing the previous projects in the path. The ingredients encourage them to be independent whilst also supporting them with code snippets to help them along.

Key questions answered

Who is the Introduction to Python path for?

We have written the projects in the path with young people around the age of 9 to 13 in mind. To code in a text-based language, a young person needs to be familiar with using a keyboard, due to the typing involved. A learner may have completed one of our Scratch paths prior to this one, but this isn’t essential, and we encourage beginner coders to take this path first if that is their choice.

A young person codes at a Raspberry Pi computer.

What software do learners need to code these projects?

A web browser. In every project, starter code is provided in a free web-based development environment called Trinket, where learners add their own code. The starter Trinkets include everything that learners need to use Python and access the p5 library.

If preferred, the projects also include instructions for using a desktop-based programming environment, such as Thonny.

How long will the path take to complete?

We’ve designed the path to be completed in around six one-hour sessions, with one hour per project. However, the project instructions encourage learners to upgrade their projects and go further if they wish. This means that young people might want to spend a little more time getting their projects exactly as they imagine them. 

What can young people do next after completing this path?

Taking part in Coolest Projects Global

At the end of the path, learners are encouraged to register a project they’re making with their new coding skills for Coolest Projects Global, our world-leading online technology showcase for young people.

Three young tech creators show off their tech project at Coolest Projects.

Taking part is free, all online, and beginners as well as more experienced young tech creators are welcome and invited. This is their unique opportunity to share their ingenuity in an online gallery for the world and the Coolest Projects community to celebrate.

Coding more Python projects with us

Coming very soon is our ‘More Python’ path. In this path, learners will move beyond the basics they learned in Introduction to Python. They will learn how to use lists, dictionaries, and files to create charts, models, and artwork. Keep your eye on our blog and social media for the release of ‘More Python’.

The post Coding for kids: Art, games, and animations with our new beginners’ Python path appeared first on Raspberry Pi.

Create a serverless event-driven workflow to ingest and process Microsoft data with AWS Glue and Amazon EventBridge

Post Syndicated from Venkata Sistla original https://aws.amazon.com/blogs/big-data/create-a-serverless-event-driven-workflow-to-ingest-and-process-microsoft-data-with-aws-glue-and-amazon-eventbridge/

Microsoft SharePoint is a document management system for storing files, organizing documents, and sharing and editing documents in collaboration with others. Your organization may want to ingest SharePoint data into your data lake, combine the SharePoint data with other data that’s available in the data lake, and use it for reporting and analytics purposes. AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. AWS Glue provides all the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes instead of months.

Organizations often manage their data on SharePoint in the form of files and lists, and you can use this data for easier discovery, better auditing, and compliance. SharePoint as a data source is not a typical relational database and the data is mostly semi-structured, which is why it’s often difficult to join SharePoint data with other relational data sources. This post shows how to ingest and process SharePoint lists and files with AWS Glue and Amazon EventBridge, so that you can join them with other data that is available in your data lake. We use SharePoint REST APIs with the standard Open Data Protocol (OData) syntax. OData advocates a standard way of implementing REST APIs that allows for SQL-like querying capabilities. OData helps you focus on your business logic while building RESTful APIs without having to worry about the various approaches to define request and response headers, query options, and so on.
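
To make the OData syntax concrete, the following is a minimal, hypothetical call to the SharePoint REST API using the Python requests library; the site URL, list title, column names, and token handling are placeholders rather than the exact code used in this solution.

    import requests

    # Placeholders: substitute your own site URL, list title, and OAuth2 access token.
    site_url = "https://<tenant>.sharepoint.com/sites/<site>"
    access_token = "<oauth2-access-token>"

    # OData query options ($select, $filter, $top) provide the SQL-like querying
    # capabilities described above.
    endpoint = (
        f"{site_url}/_api/web/lists/getbytitle('WineQuality')/items"
        "?$select=Title,Quality&$filter=Quality ge 7&$top=100"
    )

    response = requests.get(
        endpoint,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json;odata=verbose",
        },
    )
    response.raise_for_status()
    items = response.json()["d"]["results"]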

AWS Glue event-driven workflows

Unlike a traditional relational database, SharePoint data may or may not change frequently, and it’s difficult to predict the frequency at which your SharePoint server generates new data, which makes it difficult to plan and schedule data processing pipelines efficiently. Running data processing frequently can be expensive, whereas scheduling pipelines to run infrequently can deliver cold data. Similarly, triggering pipelines from an external process can increase complexity, cost, and job startup time.

AWS Glue supports event-driven workflows, a capability that lets developers start AWS Glue workflows based on events delivered by EventBridge. The main reason to choose EventBridge in this architecture is because it allows you to process events, update the target tables, and make information available to consume in near-real time. Because frequency of data change in SharePoint is unpredictable, using EventBridge to capture events as they arrive enables you to run the data processing pipeline only when new data is available.

To get started, you simply create a new AWS Glue trigger of type EVENT and place it as the first trigger in your workflow. You can optionally specify a batching condition. Without event batching, the AWS Glue workflow is triggered every time an EventBridge rule matches, which may result in multiple concurrent workflows running. AWS Glue protects you by setting default limits that restrict the number of concurrent runs of a workflow. You can increase the required limits by opening a support case. Event batching allows you to configure the number of events to buffer or the maximum elapsed time before firing the trigger. When the batching condition is met, a workflow run is started. For example, you can trigger your workflow when 100 files are uploaded in Amazon Simple Storage Service (Amazon S3) or 5 minutes after the first upload. We recommend configuring event batching to avoid too many concurrent workflows, and to optimize resource usage and cost.
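
For illustration, creating such a trigger with boto3 looks roughly like the sketch below; the trigger, workflow, and job names are placeholders, and the CloudFormation template used later in this post creates the real trigger for you.

    import boto3

    glue = boto3.client("glue")

    glue.create_trigger(
        Name="sharepoint-event-trigger",          # placeholder name
        WorkflowName="sharepoint-etl-workflow",   # placeholder workflow
        Type="EVENT",
        # Start a workflow run after 100 matched events, or 300 seconds after
        # the first event arrives, whichever happens first.
        EventBatchingCondition={"BatchSize": 100, "BatchWindow": 300},
        Actions=[{"JobName": "sharepoint-transform-job"}],  # placeholder job
    )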

To illustrate this solution better, consider the following use case for a wine manufacturing and distribution company that operates across multiple countries. They currently host all their transactional system’s data on a data lake in Amazon S3. They also use SharePoint lists to capture feedback and comments on wine quality and composition from their suppliers and other stakeholders. The supply chain team wants to join their transactional data with the wine quality comments in SharePoint data to improve their wine quality and manage their production issues better. They want to capture those comments from the SharePoint server within an hour and publish that data to a wine quality dashboard in Amazon QuickSight. With an event-driven approach to ingest and process their SharePoint data, the supply chain team can consume the data in less than an hour.

Overview of solution

In this post, we walk through a solution to set up an AWS Glue job to ingest SharePoint lists and files into an S3 bucket and an AWS Glue workflow that listens to S3 PutObject data events captured by AWS CloudTrail. This workflow is configured with an event-based trigger to run when an AWS Glue ingest job adds new files into the S3 bucket. The following diagram illustrates the architecture.

To make it simple to deploy, we captured the entire solution in an AWS CloudFormation template that enables you to automatically ingest SharePoint data into Amazon S3. SharePoint uses ClientID and TenantID credentials for authentication and uses OAuth2 for authorization.

The template helps you perform the following steps:

  1. Create an AWS Glue Python shell job to make the REST API call to the SharePoint server and ingest files or lists into Amazon S3.
  2. Create an AWS Glue workflow with a starting trigger of EVENT type.
  3. Configure CloudTrail to log S3 data events, such as PutObject API calls.
  4. Create a rule in EventBridge to forward the PutObject API events to AWS Glue when they’re emitted by CloudTrail.
  5. Add an AWS Glue event-driven workflow as a target to the EventBridge rule. The workflow gets triggered when the SharePoint ingest AWS Glue job adds new files to the S3 bucket.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Configure SharePoint server authentication details

Before launching the CloudFormation stack, you need to set up your SharePoint server authentication details, namely, TenantID, Tenant, ClientID, ClientSecret, and the SharePoint URL in AWS Systems Manager Parameter Store of the account you’re deploying in. This makes sure that no authentication details are stored in the code and they’re fetched in real time from Parameter Store when the solution is running.

To create your AWS Systems Manager parameters, complete the following steps:

  1. On the Systems Manager console, under Application Management in the navigation pane, choose Parameter Store.
    systems manager
  2. Choose Create Parameter.
  3. For Name, enter the parameter name /DataLake/GlueIngest/SharePoint/tenant.
  4. Leave the type as String.
  5. Enter your SharePoint tenant detail into the value field.
  6. Choose Create parameter.
  7. Repeat these steps to create the following parameters:
    1. /DataLake/GlueIngest/SharePoint/tenant
    2. /DataLake/GlueIngest/SharePoint/tenant_id
    3. /DataLake/GlueIngest/SharePoint/client_id/list
    4. /DataLake/GlueIngest/SharePoint/client_secret/list
    5. /DataLake/GlueIngest/SharePoint/client_id/file
    6. /DataLake/GlueIngest/SharePoint/client_secret/file
    7. /DataLake/GlueIngest/SharePoint/url/list
    8. /DataLake/GlueIngest/SharePoint/url/file
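
At run time, the ingest job can read these values with the Systems Manager API instead of hard-coding them. A minimal sketch with boto3, using the parameter names created above, might look like this.

    import boto3

    ssm = boto3.client("ssm")

    def get_param(name, decrypt=False):
        """Fetch a single parameter value from Parameter Store."""
        response = ssm.get_parameter(Name=name, WithDecryption=decrypt)
        return response["Parameter"]["Value"]

    tenant = get_param("/DataLake/GlueIngest/SharePoint/tenant")
    client_id = get_param("/DataLake/GlueIngest/SharePoint/client_id/list")
    client_secret = get_param("/DataLake/GlueIngest/SharePoint/client_secret/list")
    sharepoint_url = get_param("/DataLake/GlueIngest/SharePoint/url/list")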

Deploy the solution with AWS CloudFormation

For a quick start of this solution, you can deploy the provided CloudFormation stack. This creates all the required resources in your account.

The CloudFormation template generates the following resources:

  • S3 bucket – Stores data, CloudTrail logs, job scripts, and any temporary files generated during the AWS Glue extract, transform, and load (ETL) job run.
  • CloudTrail trail with S3 data events enabled – Enables EventBridge to receive PutObject API call data in a specific bucket.
  • AWS Glue Job – A Python shell job that fetches the data from the SharePoint server.
  • AWS Glue workflow – A data processing pipeline that is comprised of a crawler, jobs, and triggers. This workflow converts uploaded data files into Apache Parquet format.
  • AWS Glue database – The AWS Glue Data Catalog database that holds the tables created in this walkthrough.
  • AWS Glue table – The Data Catalog table representing the Parquet files being converted by the workflow.
  • AWS Lambda function – The AWS Lambda function is used as an AWS CloudFormation custom resource to copy job scripts from an AWS Glue-managed GitHub repository and an AWS Big Data blog S3 bucket to your S3 bucket.
  • IAM roles and policies – We use the following AWS Identity and Access Management (IAM) roles:
    • LambdaExecutionRole – Runs the Lambda function that has permission to upload the job scripts to the S3 bucket.
    • GlueServiceRole – Runs the AWS Glue job that has permission to download the script, read data from the source, and write data to the destination after conversion.
    • EventBridgeGlueExecutionRole – Has permissions to invoke the NotifyEvent API for an AWS Glue workflow.
    • IngestGlueRole – Runs the AWS Glue job that has permission to ingest data into the S3 bucket.

To launch the CloudFormation stack, complete the following steps:

  1. Sign in to the AWS CloudFormation console.
  2. Choose Launch Stack:
  3. Choose Next.
  4. For pS3BucketName, enter the unique name of your new S3 bucket.
  5. Leave pWorkflowName and pDatabaseName as the default.

cloud formation 1

  6. For pDatasetName, enter the SharePoint list name or file name you want to ingest.
  7. Choose Next.

cloud formation 2

  8. On the next page, choose Next.
  9. Review the details on the final page and select I acknowledge that AWS CloudFormation might create IAM resources.
  10. Choose Create.

It takes a few minutes for the stack creation to complete; you can follow the progress on the Events tab.

You can run the ingest AWS Glue job either on a schedule or on demand. As the job successfully finishes and ingests data into the raw prefix of the S3 bucket, the AWS Glue workflow runs and transforms the ingested raw CSV files into Parquet files and loads them into the transformed prefix.

Review the EventBridge rule

The CloudFormation template created an EventBridge rule to forward S3 PutObject API events to AWS Glue. Let’s review the configuration of the EventBridge rule:

  1. On the EventBridge console, under Events, choose Rules.
  2. Choose the rule s3_file_upload_trigger_rule-<CloudFormation-stack-name>.
  3. Review the information in the Event pattern section.

event bridge

The event pattern shows that this rule is triggered when an S3 object is uploaded to s3://<bucket_name>/data/SharePoint/tablename_raw/. CloudTrail captures the PutObject API calls made and relays them as events to EventBridge.

  4. In the Targets section, you can verify that this EventBridge rule is configured with an AWS Glue workflow as a target.

event bridge target section
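
For reference, an event pattern equivalent to the one shown in the rule can be expressed and registered with boto3 as in the sketch below; the bucket name, prefix, and rule name are placeholders, and the CloudFormation template has already created this rule for you.

    import json
    import boto3

    events = boto3.client("events")

    # Match CloudTrail-delivered PutObject calls for the raw SharePoint prefix.
    event_pattern = {
        "source": ["aws.s3"],
        "detail-type": ["AWS API Call via CloudTrail"],
        "detail": {
            "eventSource": ["s3.amazonaws.com"],
            "eventName": ["PutObject"],
            "requestParameters": {
                "bucketName": ["<bucket_name>"],
                "key": [{"prefix": "data/SharePoint/tablename_raw/"}],
            },
        },
    }

    events.put_rule(
        Name="s3_file_upload_trigger_rule-demo",   # placeholder rule name
        EventPattern=json.dumps(event_pattern),
        State="ENABLED",
    )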

Run the ingest AWS Glue job and verify the AWS Glue workflow is triggered successfully

To test the workflow, we run the ingest-glue-job-SharePoint-file job using the following steps:

  1. On the AWS Glue console, select the ingest-glue-job-SharePoint-file job.

glue job

  2. On the Action menu, choose Run job.

glue job action menu

  3. Choose the History tab and wait until the job succeeds.

glue job history tab

You can now see the CSV files in the raw prefix of your S3 bucket.

csv file s3 location

Now the workflow should be triggered.

  1. On the AWS Glue console, validate that your workflow is in the RUNNING state.

glue workflow running status

  2. Choose the workflow to view the run details.
  3. On the History tab of the workflow, choose the current or most recent workflow run.
  4. Choose View run details.

glue workflow visual

When the workflow run status changes to Completed, let’s check the converted files in your S3 bucket.

  1. Switch to the Amazon S3 console, and navigate to your bucket.

You can see the Parquet files under s3://<bucket_name>/data/SharePoint/tablename_transformed/.

parquet file s3 location

Congratulations! Your workflow ran successfully based on S3 events triggered by uploading files to your bucket. You can verify everything works as expected by running a query against the generated table using Amazon Athena.

Sample wine dataset

Let’s analyze a sample red wine dataset. The following screenshot shows a SharePoint list that contains various readings that relate to the characteristics of the wine and an associated wine category. This is populated by various wine tasters from multiple countries.

redwine dataset

The following screenshot shows a supplier dataset from the data lake with wine categories ordered per supplier.

supplier dataset

We process the red wine dataset using this solution and use Athena to query the red wine data and supplier data where wine quality is greater than or equal to 7.

athena query and results
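
As a rough illustration, a query like this could also be submitted programmatically with boto3; the database, table, and column names below are placeholders that depend on what the crawler created in your Data Catalog.

    import boto3

    athena = boto3.client("athena")

    # Join the transformed SharePoint wine readings with the supplier table and
    # keep only rows with a quality score of 7 or higher.
    query = """
    SELECT s.supplier_name, w.wine_category, w.quality
    FROM   sharepoint_db.red_wine_transformed AS w
    JOIN   sharepoint_db.supplier AS s
      ON   w.wine_category = s.wine_category
    WHERE  w.quality >= 7
    """

    athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": "sharepoint_db"},  # placeholder database
        ResultConfiguration={"OutputLocation": "s3://<bucket_name>/athena-results/"},
    )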

We can visualize the processed dataset using QuickSight.

Clean up

To avoid incurring unnecessary charges, you can use the AWS CloudFormation console to delete the stack that you deployed. This removes all the resources you created when deploying the solution.

Conclusion

Event-driven architectures provide access to near-real-time information and help you make business decisions on fresh data. In this post, we demonstrated how to ingest and process SharePoint data using AWS serverless services like AWS Glue and EventBridge. We saw how to configure a rule in EventBridge to forward events to AWS Glue. You can use this pattern for your analytical use cases, such as joining SharePoint data with other data in your lake to generate insights, or auditing SharePoint data for compliance requirements.


About the Author

Venkata Sistla is a Big Data & Analytics Consultant on the AWS Professional Services team. He specializes in building data processing capabilities and helping customers remove constraints that prevent them from leveraging their data to develop business insights.

Introducing Code Club World: a new way for young people to learn to code at home

Post Syndicated from Laura Kirsop original https://www.raspberrypi.org/blog/code-club-world-free-online-platform-young-people-children-learn-to-code-at-home/

Today we are introducing you to Code Club World — a free online platform where young people aged 9 to 13 can learn to make stuff with code.

Images from Code Club World, a free online platform for children who want to learn to code

In Code Club World, young people can:

  • Start out by creating their personal robot avatar
  • Make music, design a t-shirt, and teach their robot avatar to dance!
  • Learn to code on islands with structured activities
  • Discover block-based and text-based coding in Scratch and Python
  • Earn badges for their progress 
  • Share their coding creations with family, friends, and the Code Club World community

Learning to code at home with Code Club World: meaningful, fun, flexible

When we spoke to parents and children about learning at home during the pandemic, it became clear to us that they were looking for educational tools that children can enjoy and master independently, and that are as fun and social as the computer games and other apps they love.

A girl has fun learning to code at home, sitting with a laptop on a sofa, with a dog sleeping next to her and her father writing code too.
Code Club World is educational, and as fun as the games and apps young people love.

What’s more, a free tool for learning to code at home is particularly important for young people who are unable to attend coding clubs in person. We believe every child should have access to a high-quality coding and digital making education. And with this in mind, we set out to create Code Club World, an online environment as rich and engaging as a face-to-face extracurricular learning experience, where all young people can learn to code.

The Code Club World activities are mapped to our research-informed Digital Making Framework, a coding and digital making curriculum for non-formal settings. That means that when children are in the Code Club World environment, they are learning to code and to use digital making to independently bring their ideas to life and address challenges that matter to them.

Islands in the Code Club World online platform for children who want to learn to code for free.
Welcome to Code Club World — so many islands to explore!

By providing a structured pathway through the coding activities, a reward system of badges to engage and motivate learners, and a broad range of projects covering different topics, Code Club World supports learners at every stage, while making the activities meaningful, fun, and flexible.

A girl has fun learning to code at home on a tablet sitting on a sofa.
Code Club World’s home island works as well on mobile phones and tablets as on computers.

We’ve also designed Code Club World to be mobile-friendly, so if a young person uses a phone or tablet to visit the platform, they can still code cool things they will be proud of.

Created with the community

Since we started developing Code Club World, we have been working with a community of more than 1000 parents, educators, and children who are giving us valuable input to shape the direction of the platform. We’ve had some fantastic feedback from them:

“I’ve not coded before, but found this really fun! … I LOVED making the dance. It was so much fun and made me laugh!”

Learner, aged 11

“I love the concept of having islands to explore in making the journey through learning coding, it is fabulous and eye-catching.”

Parent

The platform is still in beta status — this means we’d love you to share it with young people in your family, school, or community so they can give their feedback and help make Code Club World even better.

Together, we will ensure every child has an equal opportunity to learn to code and make things that change their world.

The post Introducing Code Club World: a new way for young people to learn to code at home appeared first on Raspberry Pi.

Open-Sourcing a Monitoring GUI for Metaflow

Post Syndicated from Netflix Technology Blog original https://netflixtechblog.com/open-sourcing-a-monitoring-gui-for-metaflow-75ff465f0d60

Open-Sourcing a Monitoring GUI for Metaflow, Netflix’s ML Platform

tl;dr Today, we are open-sourcing a long-awaited GUI for Metaflow. The Metaflow GUI allows data scientists to monitor their workflows in real-time, track experiments, and see detailed logs and results for every executed task. The GUI can be extended with plugins, allowing the community to build integrations to other systems, custom visualizations, and embed upcoming features of Metaflow directly into its views.

Metaflow is a full-stack framework for data science that we started developing at Netflix over four years ago and which we open-sourced in 2019. It allows data scientists to define ML workflows, test them locally, scale out to the cloud, and deploy to production in idiomatic Python code. Since it was open-sourced, the Metaflow community has been growing quickly: it is now the 7th most starred active project on Netflix’s GitHub account with nearly 4800 stars. Outside Netflix, Metaflow is used to power machine learning in production by hundreds of companies across industries from bioinformatics to real estate.

Since its inception, Metaflow has been a command-line-centric tool. It makes it easy for data scientists to express even complex machine learning applications in idiomatic Python, test them locally, or scale them out in the cloud — all using their favorite IDEs and terminals. Following our culture of freedom and responsibility, Metaflow grants data scientists the freedom to choose the right modeling approach, handle data and features flexibly, and construct workflows easily while ensuring that the resulting project executes responsibly and robustly on the production infrastructure.
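
For readers who have not used Metaflow before, a flow is an ordinary Python class; the following minimal example (not a Netflix production flow) shows the general shape.

    from metaflow import FlowSpec, step

    class HelloFlow(FlowSpec):
        """A tiny two-step flow, run locally with: python hello_flow.py run"""

        @step
        def start(self):
            self.message = "hello from Metaflow"
            self.next(self.end)

        @step
        def end(self):
            print(self.message)

    if __name__ == "__main__":
        HelloFlow()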

As the number and criticality of projects running on Metaflow increased (some of which are very central to our business), our ML platform team started receiving an increasing number of support requests. Frequently, the questions were of the nature “can you help me understand why my flow takes so long to execute” or “how can I find the logs for a model that failed last night.” Technically, Metaflow provides a Python API that allows the user to inspect all these details, for example in a notebook, but writing code in a notebook to answer basic questions like this felt like overkill and unnecessarily tedious. After observing the situation for months, we started forming an understanding of the kind of new user interface that could address the growing needs of our users.
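
As a sketch of the kind of notebook inspection the Client API supports (the flow name below is made up), answering those questions typically meant writing snippets like this:

    from metaflow import Flow

    run = Flow("HelloFlow").latest_run      # the most recent run of a flow
    print(run.id, run.successful, run.finished_at)

    # Walk the run to find failed tasks and read their logs: exactly the kind of
    # digging the GUI is meant to make unnecessary.
    for flow_step in run:
        for task in flow_step:
            if not task.successful:
                print(task.pathspec)
                print(task.stderr)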

Requirements for a Metaflow GUI

Metaflow is a human-centered system by design. We consider our Python API and the CLI to be integral parts of the overall user interface and user experience, which singularly focuses on making it easier to build production-ready ML projects from scratch. In our approach, Python code provides a highly expressive and productive user interface for expressing complex business logic, such as ML models and workflows. At the same time, the CLI allows users to execute specific commands quickly and even automate common actions. When it comes to complex, real-life development work like this, it would be hard to achieve the same level of productivity on a graphical user interface.

However, textual UIs are quite lacking when it comes to discoverability and getting a holistic understanding of the system’s state. The questions we were hearing reflected this gap: we were lacking a user interface that would allow the users, quite simply, to figure out quickly what is happening in their Metaflow projects.

Netflix has a long history of developing innovative tools for observability, so when we began to specify requirements for the new GUI, we were able to leverage experiences from the previous GUIs built for other use cases, as well as real-life user stories from Metaflow users. We wanted to scope the GUI tightly, focusing on a specific gap in the Metaflow experience:

  1. The GUI should allow the users to see what flows and tasks are executing and what is happening inside them. Notably, we didn’t want to replace any of the functionality in the Metaflow APIs or CLI with the GUI — just to complement them. This meant that the GUI would be read-only: all actions like writing code and starting executions should happen on the users’ IDE and terminal as before. We also had no need to build a model-monitoring GUI yet, which is a wholly separate problem domain.
  2. The GUI would be targeted at professional data scientists. Instead of a fancy GUI for demos and presentations, we wanted a serious productivity tool with carefully thought-out user workflows that would fit seamlessly into our toolchain of data science. This requires attention to small details: for instance, users should be able to copy a link to any view in the GUI and share it e.g., on Slack, for easy collaboration and support (or to integrate with the Metaflow Slack bot). And, there should be natural affordances for navigating between the CLI, the GUI, and notebooks.
  3. The GUI should be scalable and snappy: it should handle our existing repository consisting of millions of runs, some of which contain tens of thousands of tasks without hiccups. Based on our experiences with other GUIs operating at Netflix-scale, this is not a trivial requirement: scalability needs to be baked into the design from the very beginning. Sluggish GUIs are hard to debug and fix afterwards, and they can have a significantly negative impact on productivity.
  4. The GUI should integrate well with other GUIs. A modern ML stack consists of many independent systems like data warehouses, compute layers, model serving systems, and, in particular, notebooks. It should be possible to find runs and tasks of interest in the Metaflow GUI and use a task-specific view to jump to other GUIs for further information. Our landscape of tools is constantly evolving, so we didn’t want to hardcode these links and views in the GUI itself. Instead, following the integration-friendly ethos of Metaflow, we want to embed relevant information in the GUI as plugins.
  5. Finally, we wanted to minimize the operational overhead of the GUI. In particular, under no circumstances should the GUI impact Metaflow executions. The GUI backend should be a simple service, optionally sitting alongside the existing Metaflow metadata service, providing a read-only, real-time view to the stored state. The frontend side should be easily extensible and maintainable, suggesting that we wanted a modern React app.

Monitoring GUI for Metaflow

As our ML Platform team had limited frontend resources, we reached out to Codemate to help with the implementation. As it often happens in software engineering projects, the project took longer than expected to finish, mostly because the problem of tracking and visualizing thousands of concurrent objects in real-time in a highly distributed environment is a surprisingly non-trivial problem (duh!). After countless iterations, we are finally very happy with the outcome, which we have now used in production for a few months.

When you open the GUI, you see an overview of all flows and runs, both current and historical, which you can group and filter in various ways:

Runs Grouped by flows

We can use this view for experiment tracking: Metaflow records every execution automatically, so data scientists can track all their work using this view. Naturally, the view can be grouped by user. They can also tag their runs and filter the view by tags, allowing them to focus on particular subsets of experiments.

After you click a specific run, you see all its tasks on a timeline:

Timeline view for a run

The timeline view is extremely useful in understanding performance bottlenecks, distribution of task runtimes, and finding failed tasks. At the top, you can see global attributes of the run, such as its status, start time, parameters etc. You can click a specific task to see more details:

Task view

This task view shows logs produced by a task, its results, and optionally links to other systems that are relevant to the task. For instance, if the task had deployed a model to a model serving platform, the view could include a link to a UI used for monitoring microservices.

As specified in our requirements, the GUI should work well with the Metaflow CLI. To facilitate this, the top bar includes a navigation component where the user can copy-paste any pathspec, i.e., a path to any object in the Metaflow universe; pathspecs are prominently shown in the CLI output. This way, the user can easily move from the CLI to the GUI to observe runs and tasks in detail.
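
Pathspecs work the same way in the Python API, which makes it easy to hop between the GUI, the CLI, and a notebook; for example (the pathspec below is made up):

    from metaflow import Task

    # "flow/run_id/step/task_id" copied from the GUI's navigation bar or the CLI output.
    task = Task("HelloFlow/42/start/1234")
    print(task.stdout)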

While the CLI is great, it is challenging to visualize flows with it. Each flow can be represented as a Directed Acyclic Graph (DAG), and the GUI provides a much better way to visualize a flow. The DAG view presents all the steps of a flow and how they are related. Each step may have developer comments, and steps are colored to indicate their current state. Split steps are grouped by shaded boxes, while steps that participated in a foreach are grouped by a double-shaded box. Clicking on a step takes you to the Task view.

DAG View

Users at different organizations will likely have some special use cases that are not directly supported. The Metaflow GUI is extensible through its plugin API. For example, Netflix has its own container orchestration platform, Titus. Users can configure tasks to utilize Titus to scale up or out. When failures happen, users may need to access their Titus containers for more information, and within the task view, a simple plugin provides a link for further troubleshooting.

Example task-level plugin

Try it at home!

We know that our user stories and requirements for a Metaflow GUI are not unique to Netflix. A number of companies in the Metaflow community have requested a GUI for Metaflow in the past. To support the thriving community and invite third-party contributions to the GUI, we are open-sourcing our Monitoring GUI for Metaflow today!

You can find detailed instructions for how to deploy the GUI here. If you want to see the GUI in action before deploying it, Outerbounds, a new startup founded by our ex-colleagues, has deployed a public demo instance of the GUI. Outerbounds also hosts an active Slack community of Metaflow users where you can find support for GUI-related issues and share feedback and ideas for improvement.

With the new GUI, data scientists don’t have to fly blind anymore. Instead of reaching out to a platform team for support, they can easily see the state of their workflows on their own. We hope that Metaflow users outside Netflix will find the GUI equally beneficial, and companies will find creative ways to improve the GUI with new plugins.

For more context on the development process and motivation for the GUI, you can watch this recording of the GUI launch meetup.


Open-Sourcing a Monitoring GUI for Metaflow was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.