Bringing Python to Workers using Pyodide and WebAssembly

2024-04-02 Hood Chatham

Post Syndicated from Hood Chatham original https://blog.cloudflare.com/python-workers

Starting today, in open beta, you can now write Cloudflare Workers in Python.

This new support for Python is different from how Workers have historically supported languages beyond JavaScript: in this case, we have directly integrated a Python implementation into workerd, the open-source Workers runtime. All bindings, including bindings to Vectorize, Workers AI, R2, Durable Objects, and more, are supported on day one. Python Workers can import a subset of popular Python packages, including FastAPI, LangChain, NumPy, and more. There are no extra build steps or external toolchains.

To do this, we’ve had to push the bounds of all of our systems, from the runtime itself, to our deployment system, to the contents of the Worker bundle that is published across our network. You can read the docs, and start using it today.

We want to use this post to pull back the curtain on the internal lifecycle of a Python Worker, share what we’ve learned in the process, and highlight where we’re going next.

Beyond “Just compile to WebAssembly”

Cloudflare Workers have supported WebAssembly since 2018 — each Worker is a V8 isolate, powered by the same JavaScript engine as the Chrome web browser. In principle, it’s been possible for years to write Workers in any language — including Python — so long as it first compiles to WebAssembly or to JavaScript.

In practice, just because something is possible doesn’t mean it’s simple. And just because “hello world” works doesn’t mean you can reliably build an application. Building full applications requires supporting an ecosystem of packages that developers are used to building with. For a platform to truly support a programming language, it’s necessary to go much further than showing how to compile code using external toolchains.

Python Workers are different from what we’ve done in the past. It’s early, and still in beta, but we think it shows what providing first-class support for programming languages beyond JavaScript can look like on Workers.

The lifecycle of a Python Worker

With Pyodide now built into workerd, you can write a Worker like this:

from js import Response

async def on_fetch(request, env):
    return Response.new("Hello world!")

…with a wrangler.toml file that points to a .py file:

name = "hello-world-python-worker"
main = "src/entry.py"
compatibility_date = "2024-03-18"

…and when you run npx wrangler@latest dev, the Workers runtime will:

Determine which version of Pyodide is required, based on your compatibility date
Create an isolate for your Worker, and automatically inject Pyodide
Serve your Python code using Pyodide

This all happens under the hood — no extra toolchain or precompilation steps needed. The Python execution environment is provided for you, mirroring how Workers written in JavaScript already work.

A Python interpreter built into the Workers runtime

Just as JavaScript has many engines, Python has many implementations that can execute Python code. CPython is the reference implementation of Python. If you’ve used Python before, this is almost certainly what you’ve used, and is commonly referred to as just “Python”.

Pyodide is a port of CPython to WebAssembly. It interprets Python code, without any need to precompile the Python code itself to any other format. It runs in a web browser — check out this REPL. It is true to the CPython that Python developers know and expect, providing most of the Python Standard Library. It provides a foreign function interface (FFI) to JavaScript, allowing you to call JavaScript APIs directly from Python — more on this below. It provides popular open-source packages, and can import pure Python packages directly from PyPI.

Pyodide struck us as the perfect fit for Workers. It is designed to allow the core interpreter and each native Python module to be built as separate WebAssembly modules, dynamically linked at runtime. This allows the code footprint for these modules to be shared among all Workers running on the same machine, rather than requiring each Worker to bring its own copy. This is essential to making WebAssembly work well in the Workers environment, where we often run thousands of Workers per machine — we need Workers using the same programming language to share their runtime code footprint. Running thousands of Workers on every machine is what makes it possible for us to deploy every application in every location at a reasonable price.

Just like with JavaScript Workers, with Python Workers we provide the runtime for you.

Pyodide is currently the exception — most languages that target WebAssembly do not yet support dynamic linking, so each application ends up bringing its own copy of its language runtime. We hope to see more languages support dynamic linking in the future, so that we can more effectively bring them to Workers.

How Pyodide works

Pyodide executes Python code in WebAssembly, which is a sandboxed environment, separated from the host runtime. Unlike running native code, all operations outside of pure computation (such as file reads) must be provided by a runtime environment, then imported by the WebAssembly module.

LLVM provides three target triples for WebAssembly:

wasm32-unknown-unknown – this backend provides no C standard library or system call interface; to support this backend, we would need to manually rewrite every system or library call to make use of imports we would define ourselves in the runtime.
wasm32-wasi – WASI is a standardized system interface, and defines a standard set of imports that are implemented in WASI runtimes such as wasmtime.
wasm32-unknown-emscripten – Like WASI, Emscripten defines the imports that a WebAssembly program needs to execute, but also outputs an accompanying JavaScript library that implements these imported functions.

Pyodide uses Emscripten, and provides three things:

A distribution of the CPython interpreter, compiled using Emscripten
A foreign function interface (FFI) between Python and JavaScript
A set of third-party Python packages, compiled using Emscripten’s compiler to WebAssembly.

Of these targets, only Emscripten currently supports dynamic linking, which, as we noted above, is essential to providing a language runtime for Python that is shared across isolates. Emscripten does this by providing implementations of dlopen and dlsym, which use the accompanying JavaScript library to modify the WebAssembly program’s table to link additional WebAssembly-compiled modules at runtime. WASI does not yet support the dlopen/dlsym dynamic linking abstractions used by CPython.

Pyodide and the magic of foreign function interfaces (FFI)

You might have noticed that in our Hello World Python Worker, we import Response from the js module:

from js import Response

async def on_fetch(request, env):
    return Response.new("Hello world!")

Why is that?

Most Workers are written in JavaScript, and most of our engineering effort on the Workers runtime goes into improving JavaScript Workers. Adding a second language carries the risk that it never reaches feature parity with the first and always remains a second-class citizen. Pyodide’s foreign function interface (FFI) is critical to avoiding this, because it provides access to all JavaScript functionality from Python. This can be used by the Worker author directly, and it is also used to make packages like FastAPI and LangChain work out of the box, as we’ll show later in this post.

An FFI is a system for calling functions in one language that are implemented in another language. In most cases, an FFI is defined by a “higher-level” language in order to call functions implemented in a systems language, often C. Python’s ctypes module is such a system. These sorts of foreign function interfaces are often difficult to use because of the nature of C APIs.
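As a small illustration of a C FFI, the sketch below uses ctypes to call Py_GetVersion, a C function exported by the CPython runtime itself. It assumes a standard CPython build where ctypes.pythonapi is available; the helper name c_python_version is ours, and nothing here is specific to Pyodide or Workers.

```python
import ctypes
import platform


def c_python_version() -> str:
    """Call the C function Py_GetVersion() through ctypes and decode the result."""
    get_version = ctypes.pythonapi.Py_GetVersion
    # ctypes assumes an int return type unless told otherwise, so the C
    # signature (const char *Py_GetVersion(void)) has to be declared by hand.
    get_version.restype = ctypes.c_char_p
    get_version.argtypes = []
    return get_version().decode("utf-8")


if __name__ == "__main__":
    # The C-level string starts with the same version number Python reports.
    print(c_python_version())
    print(platform.python_version())
```

Having to declare return and argument types manually is a good example of why C FFIs tend to be awkward to use.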

Pyodide’s foreign function interface is an interface between Python and JavaScript, which are two high level object-oriented languages with a lot of design similarities. When passed from one language to another, immutable types such as strings and numbers are transparently translated. All mutable objects are wrapped in an appropriate proxy.

When a JavaScript object is passed into Python, Pyodide determines which JavaScript protocols the object supports and dynamically constructs an appropriate Python class that implements the corresponding Python protocols. For example, if the JavaScript object supports the JavaScript iteration protocol then the proxy will support the Python iteration protocol. If the JavaScript object is a Promise or other thenable, the Python object will be an awaitable.

from js import JSON

js_array = JSON.parse("[1,2,3]")

for entry in js_array:
    print(entry)
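The idea of assembling a proxy class from the protocols an object supports can be illustrated in plain Python. The sketch below is an analogy only and is not how Pyodide is implemented: make_proxy, ProxyBase, IterableMixin and SizedMixin are hypothetical names, and ordinary Python objects stand in for JavaScript values.

```python
class ProxyBase:
    """Holds a reference to the wrapped object."""

    def __init__(self, target):
        self._target = target


class IterableMixin:
    def __iter__(self):
        return iter(self._target)


class SizedMixin:
    def __len__(self):
        return len(self._target)


def make_proxy(target):
    """Wrap target in a class assembled from the protocols it supports."""
    bases = [ProxyBase]
    if hasattr(target, "__iter__"):
        bases.append(IterableMixin)
    if hasattr(target, "__len__"):
        bases.append(SizedMixin)
    # type() builds a new class at runtime from the selected base classes.
    proxy_cls = type("Proxy", tuple(bases), {})
    return proxy_cls(target)


if __name__ == "__main__":
    proxy = make_proxy([1, 2, 3])
    print(list(proxy), len(proxy))
```

A generator wrapped this way gets iteration but no len(), because only the protocols the target actually supports are mixed in.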

The lifecycle of a request to a Python Worker makes use of Pyodide’s FFI, wrapping the incoming JavaScript Request object in a JsProxy object that is accessible in your Python code. It then converts the value returned by the Python Worker’s handler into a JavaScript Response object that can be delivered back to the client.

Why dynamic linking is essential, and static linking isn’t enough

Python comes with a C FFI, and many Python packages use this FFI to import native libraries. These libraries are typically written in C, so they must first be compiled down to WebAssembly in order to work on the Workers runtime. As we noted above, Pyodide is built with Emscripten, which overrides Python’s C FFI — any time a package tries to load a native library, it is instead loaded from a WebAssembly module that is provided by the Workers runtime. Dynamic linking is what makes this possible — it is what lets us override Python’s C FFI, allowing Pyodide to support many Python packages that have native library dependencies.

Dynamic linking is “pay as you go”, while static linking is “pay upfront” — if code is statically linked into your binary, it must be loaded upfront in order for the binary to run, even if this code is never used.

Dynamic linking enables the Workers runtime to share the underlying WebAssembly modules of packages across different Workers that are running on the same machine.

We won’t go too much into detail on how dynamic linking works in Emscripten, but the main takeaway is that the Emscripten runtime fetches WebAssembly modules from a filesystem abstraction provided in JavaScript. For each Worker, we generate a filesystem at runtime, whose structure mimics a Python distribution that has the Worker’s dependencies installed, but whose underlying files are shared between Workers. This makes it possible to share Python and WebAssembly files between multiple Workers that import the same dependency. Today, we’re able to share these files across Workers, but copy them into each new isolate. We think we can go even further, by employing copy-on-write techniques to share the underlying resource across many Workers.
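A toy model of that per-Worker filesystem is sketched below, assuming an in-memory store. SHARED_FILES, build_worker_fs, the file names and the /site-packages prefix are all hypothetical; the point is only that each Worker gets its own path mapping while the file contents are the same objects.

```python
# Hypothetical shared store: one bytes object per packaged file.
SHARED_FILES = {
    "numpy/__init__.py": b"# numpy sources",
    "numpy/core.wasm": b"\x00asm",
    "httpx/__init__.py": b"# httpx sources",
}


def build_worker_fs(requirements):
    """Return a per-Worker mapping of install paths to shared file contents."""
    fs = {}
    for path, blob in SHARED_FILES.items():
        package = path.split("/", 1)[0]
        if package in requirements:
            # The mapping is per-Worker; the blob is the same object for everyone.
            fs[f"/site-packages/{path}"] = blob
    return fs


if __name__ == "__main__":
    worker_a = build_worker_fs({"numpy"})
    worker_b = build_worker_fs({"numpy", "httpx"})
    key = "/site-packages/numpy/core.wasm"
    print(worker_a[key] is worker_b[key])
```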

Supporting Server and Client libraries

Python has a wide variety of popular HTTP client libraries, including httpx, urllib3, requests and more. Unfortunately, none of them work out of the box in Pyodide. Adding support for these has been one of the longest running user requests for the Pyodide project. The Python HTTP client libraries all work with raw sockets, and the browser security model and CORS do not allow this, so we needed another way to make them work in the Workers runtime.

Async Client libraries

For libraries that can make requests asynchronously, including aiohttp and httpx, we can use the Fetch API to make requests. We do this by patching the library, instructing it to use the Fetch API from JavaScript via Pyodide’s FFI. The httpx patch ends up quite simple: fewer than 100 lines of code. Simplified even further, it looks like this:

from js import Headers, Request, fetch

def py_request_to_js_request(py_request):
    js_headers = Headers.new(py_request.headers)
    return Request.new(py_request.url, method=py_request.method, headers=js_headers)

def js_response_to_py_response(js_response):
    ...  # omitted

async def do_request(py_request):
    js_request = py_request_to_js_request(py_request)
    js_response = await fetch(js_request)
    py_response = js_response_to_py_response(js_response)
    return py_response

Synchronous Client libraries

Another challenge in supporting Python HTTP client libraries is that many Python APIs are synchronous. For these libraries, we cannot use the fetch API directly because it is asynchronous.

Thankfully, Joe Marshall recently landed a contribution to urllib3 that adds Pyodide support in web browsers by:

1. Checking if blocking with Atomics.wait() is possible
   a. If so, start a fetch worker thread
   b. Delegate the fetch operation to the worker thread and serialize the response into a SharedArrayBuffer
   c. In the Python thread, use Atomics.wait to block for the response in the SharedArrayBuffer
2. If Atomics.wait() doesn’t work, fall back to a synchronous XMLHttpRequest

Despite this, today Cloudflare Workers do not support worker threads or synchronous XMLHttpRequest, so neither of these two approaches will work in Python Workers. We do not support synchronous requests today, but there is a way forward…

WebAssembly Stack Switching

There is an approach which will allow us to support synchronous requests. WebAssembly has a stage 3 proposal that adds support for stack switching, and V8 already has an implementation of it. Pyodide contributors have been working on adding support for stack switching to Pyodide since September of 2022, and it is almost ready.

With this support, Pyodide exposes a function called run_sync which can block for completion of an awaitable:

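from js import fetch
from pyodide.ffi import run_sync

# py_request_to_js_request and js_response_to_py_response are the helpers
# from the async example above.
def sync_fetch(py_request):
    js_request = py_request_to_js_request(py_request)
    js_response = run_sync(fetch(js_request))
    return js_response_to_py_response(js_response)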

FastAPI and Python’s Asynchronous Server Gateway Interface

FastAPI is one of the most popular libraries for defining Python servers. FastAPI applications use a protocol called the Asynchronous Server Gateway Interface (ASGI). This means that FastAPI never reads from or writes to a socket itself. An ASGI application expects to be hooked up to an ASGI server, typically uvicorn. The ASGI server handles all of the raw sockets on the application’s behalf.

Conveniently for us, this means that FastAPI works in Cloudflare Workers without any patches or changes to FastAPI itself. We simply need to replace uvicorn with an appropriate ASGI server that can run within a Worker. Our initial implementation lives here, in the fork of Pyodide that we maintain. We hope to add a more comprehensive feature set, add test coverage, and then upstream this implementation into Pyodide.
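To make the division of labor concrete, here is a minimal sketch of an ASGI application driven by a toy in-process "server" with no sockets involved. It is not the implementation in our Pyodide fork: call_asgi is a hypothetical helper, and the scope it builds is deliberately incomplete compared with what the ASGI specification and a framework like FastAPI expect.

```python
import asyncio


async def app(scope, receive, send):
    """A minimal ASGI application: it only exchanges messages, never opens a socket."""
    assert scope["type"] == "http"
    body = f"Hello from {scope['path']}".encode()
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain")],
    })
    await send({"type": "http.response.body", "body": body})


async def call_asgi(asgi_app, method, path):
    """Toy server side: build the scope, run the app, collect what it sends."""
    scope = {"type": "http", "method": method, "path": path, "headers": []}
    sent = []

    async def receive():
        # This sketch never carries a request body.
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    await asgi_app(scope, receive, send)
    status = next(m["status"] for m in sent if m["type"] == "http.response.start")
    body = b"".join(m.get("body", b"") for m in sent if m["type"] == "http.response.body")
    return status, body


if __name__ == "__main__":
    print(asyncio.run(call_asgi(app, "GET", "/hi")))
```

In a Worker, the role played by call_asgi would be filled by code that translates the incoming Request into a scope and the collected messages into a Response.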

You can try this yourself by cloning cloudflare/python-workers-examples, and running npx wrangler@latest dev in the directory of the FastAPI example.

Importing Python Packages

Python Workers support a subset of Python packages, which are provided directly by Pyodide, including numpy, httpx, FastAPI, Langchain, and more. This ensures compatibility with the Pyodide runtime by pinning package versions to Pyodide versions, and allows Pyodide to patch internal implementations, as we showed above in the case of httpx.

To import a package, simply add it to your requirements.txt file, without adding a version number. A specific version of the package is provided directly by Pyodide. Today, you can use packages in local development, and in the coming weeks, you will be able to deploy Workers that define dependencies in a requirements.txt file. Later in this post, we’ll show how we’re thinking about managing new versions of Pyodide and packages.

We maintain our own fork of Pyodide, which allows us to provide patches specific to the Workers runtime, and to quickly expand our support for packages in Python Workers, while also committing to upstreaming our changes back to Pyodide, so that the whole ecosystem of developers can benefit.

Python packages are often big and memory hungry though, and they can do a lot of work at import time. How can we ensure that you can bring in the packages you need, while mitigating long cold start times?

Making cold starts faster with memory snapshots

In the example at the start of this post, in local development, we mentioned injecting Pyodide into your Worker. Pyodide itself is 6.4MB — and Python packages can also be quite large.

If we simply shoved Pyodide into your Worker and uploaded it to Cloudflare, that’d be quite a large Worker to load into a new isolate, and cold starts would be slow. On a fast computer with a good network connection, Pyodide takes about two seconds to initialize in a web browser: one second of network time and one second of CPU time. It wouldn’t be acceptable to initialize it every time you update your code for every isolate your Worker runs in across Cloudflare’s network.

Instead, when you run npx wrangler@latest deploy, the following happens:

Wrangler uploads your Python code and your requirements.txt file to the Workers API
We send your Python code, and your requirements.txt file to the Workers runtime to be validated
We create a new isolate for your Worker, and automatically inject Pyodide plus any packages you’ve specified in your requirements.txt file.
We scan the Worker’s code for import statements, execute them, and then take a snapshot of the Worker’s WebAssembly linear memory. Effectively, we perform the expensive work of importing packages at deploy time, rather than at runtime.
We deploy this snapshot alongside your Worker’s Python code to Cloudflare’s network.
Just like a JavaScript Worker, we execute the Worker’s top-level scope.
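The import-scanning step in the list above can be approximated with Python’s ast module. This is a sketch of one possible approach, not the code the Workers runtime actually uses; top_level_imports is a hypothetical helper that only looks at absolute imports in the module’s top-level scope.

```python
import ast


def top_level_imports(source: str) -> list[str]:
    """Return the module names imported at the top level of source, in order."""
    modules = []
    for node in ast.parse(source).body:  # top-level statements only
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            modules.append(node.module)
    return modules


if __name__ == "__main__":
    worker_source = (
        "import numpy as np\n"
        "from fastapi import FastAPI\n"
        "\n"
        "def handler():\n"
        "    import json\n"
    )
    print(top_level_imports(worker_source))
```

Imports nested inside functions are skipped here, since they would not run when the top-level scope is executed.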

When a request comes in to your Worker, we load this snapshot and use it to bootstrap your Worker in an isolate, avoiding expensive initialization time.

This takes cold starts for a basic Python Worker down to below 1 second. We’re not yet satisfied with this though. We’re confident that we can drive this down much, much further. How? By reusing memory snapshots.

Reusing Memory Snapshots

When you upload a Python Worker, we generate a single memory snapshot of the Worker’s top-level imports, including both Pyodide and any dependencies. This snapshot is specific to your Worker. It can’t be shared, even though most of its contents are the same as other Python Workers.

Instead, we can create a single, shared snapshot ahead of time, and preload it into a pool of “pre-warmed” isolates. These isolates would already have the Pyodide runtime loaded and ready — making a Python Worker work just like a JavaScript Worker. In both cases, the underlying interpreter and execution environment is provided by the Workers runtime, and available on-demand without delay. The only difference is that with Python, the interpreter runs in WebAssembly, within the Worker.

Snapshots are a common pattern across runtimes and execution environments. Node.js uses V8 snapshots to speed up startup time. You can take snapshots of Firecracker microVMs and resume execution in a different process. There’s lots more we can do here — not just for Python Workers, but for Workers written in JavaScript as well, caching snapshots of compiled code from top-level scope and the state of the isolate itself. Workers are so fast and efficient that to-date we haven’t had to take snapshots in this way, but we think there are still big performance gains to be had.

This is our biggest lever towards driving cold start times down over the rest of 2024.

Future proofing compatibility with Pyodide versions and Compatibility Dates

When you deploy a Worker to Cloudflare, you expect it to keep running indefinitely, even if you never update it again. There are Workers deployed in 2018 that are still running just fine in production.

We achieve this using Compatibility Dates and Compatibility Flags, which provide explicit opt-in mechanisms for new behavior and potentially backwards-incompatible changes, without impacting existing Workers.

This works in part because it mirrors how the Internet and web browsers work. You publish a web page with some JavaScript, and rightly expect it to work forever. Web browsers and Cloudflare Workers have the same type of commitment of stability to developers.

There is a challenge with Python though — both Pyodide and CPython are versioned. Updated versions are published regularly and can contain breaking changes. And Pyodide provides a set of built-in packages, each with a pinned version number. This presents a question — how should we allow you to update your Worker to a newer version of Pyodide?

The answer is Compatibility Dates and Compatibility Flags.

A new version of Python is released every year in October, and a new version of Pyodide is released six (6) months later. When this new version of Pyodide is published, we will add it to Workers by gating it behind a Compatibility Flag, which is only enabled after a specified Compatibility Date. This lets us continually provide updates, without risk of breaking changes, extending the commitment we’ve made for JavaScript to Python.

Each Python release has a five (5) year support window. Once this support window has passed for a given version of Python, security patches are no longer applied, making this version unsafe to rely on. To mitigate this risk, while still trying to hold as true as possible to our commitment of stability and long-term support, after five years any Python Worker still on a Python release that is outside of the support window will be automatically moved forward to the next oldest Python release. Python is a mature and stable language, so we expect that in most cases, your Python Worker will continue running without issue. But we recommend updating the compatibility date of your Worker regularly, to stay within the support window.

In between Python releases, we also expect to update and add additional Python packages, using the same opt-in mechanism. A Compatibility Flag will be a combination of the Python version and the release date of a set of packages. For example, python_3.17_packages_2025_03_01.
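A flag name of that shape is easy to take apart and compare against a compatibility date. The sketch below assumes, purely for illustration, that a flag becomes active on the package date embedded in its name; parse_flag and select_flag are hypothetical helpers, and the python_3.16 flag in the usage example is invented.

```python
from datetime import date


def parse_flag(flag: str):
    """Split e.g. 'python_3.17_packages_2025_03_01' into ('3.17', date(2025, 3, 1))."""
    prefix, version, marker, year, month, day = flag.split("_")
    if prefix != "python" or marker != "packages":
        raise ValueError(f"unrecognized flag: {flag}")
    return version, date(int(year), int(month), int(day))


def select_flag(flags, compatibility_date: date):
    """Pick the newest flag whose package date is not after the compatibility date."""
    eligible = [f for f in flags if parse_flag(f)[1] <= compatibility_date]
    return max(eligible, key=lambda f: parse_flag(f)[1], default=None)


if __name__ == "__main__":
    known_flags = [
        "python_3.16_packages_2024_09_01",  # invented for this example
        "python_3.17_packages_2025_03_01",
    ]
    print(select_flag(known_flags, date(2024, 12, 1)))
    print(select_flag(known_flags, date(2025, 6, 1)))
```

A Worker whose compatibility date predates every flag gets None here, which stands in for "keep the version it was deployed with".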

How bindings work in Python Workers

We mentioned earlier that Pyodide provides a foreign function interface (FFI) to JavaScript — meaning that you can directly use JavaScript objects, methods, functions and more, directly from Python.

This means that from day one, all binding APIs to other Cloudflare resources are supported in Python Workers. The env object that is passed to handlers in Python Workers is a JavaScript object wrapped in a Pyodide proxy, which handles type translations across languages automatically.

For example, to write to and read from a KV namespace from a Python Worker, you would write:

from js import Response

async def on_fetch(request, env):
    await env.FOO.put("bar", "baz")
    bar = await env.FOO.get("bar")
    return Response.new(bar) # returns "baz"

This works for Web APIs too — see how Response is imported from the js module? You can import any global from JavaScript this way.
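Because the handler only relies on env.FOO exposing awaitable put and get methods, the same logic can be exercised outside the Workers runtime with a stand-in object. The sketch below is a local test double under that assumption: FakeKV, FakeEnv and write_then_read are hypothetical names, not part of any Cloudflare API.

```python
import asyncio


class FakeKV:
    """In-memory stand-in for a KV namespace binding."""

    def __init__(self):
        self._data = {}

    async def put(self, key, value):
        self._data[key] = value

    async def get(self, key):
        # Missing keys yield None in this stand-in.
        return self._data.get(key)


class FakeEnv:
    """Mimics the env object, exposing a single binding named FOO."""

    def __init__(self):
        self.FOO = FakeKV()


async def write_then_read(env):
    """Same binding calls as the handler above, minus the Response wrapper."""
    await env.FOO.put("bar", "baz")
    return await env.FOO.get("bar")


if __name__ == "__main__":
    print(asyncio.run(write_then_read(FakeEnv())))
```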

Get this JavaScript out of my Python!

You’re probably reading this post because you want to write Python instead of JavaScript. from js import Response just isn’t Pythonic. We know — and we have actually tackled this challenge before for another language (Rust). And we think we can do this even better for Python.

We launched workers-rs in 2021 to make it possible to write Workers in Rust. For each JavaScript API in Workers, we, alongside open-source contributors, have written bindings that expose a more idiomatic Rust API.

We plan to do the same for Python Workers — starting with the bindings to Workers AI and Vectorize. But while workers-rs requires that you use and update an external dependency, the APIs we provide with Python Workers will be built into the Workers runtime directly. Just update your compatibility date, and get the latest, most Pythonic APIs.

This is about more than just making bindings to resources on Cloudflare more Pythonic though — it’s about compatibility with the ecosystem.

Similar to how we recently converted workers-rs to use types from the http crate, which makes it easy to use the axum crate for routing, we aim to do the same for Python Workers. For example, the Python standard library provides a raw socket API, which many Python packages depend on. Workers already provides connect(), a JavaScript API for working with raw sockets. We see ways to provide at least a subset of the Python standard library’s socket API in Workers, enabling a broader set of Python packages to work on Workers, with less of a need for patches.

But ultimately, we hope to kick start an effort to create a standardized serverless API for Python. One that is easy to use for any Python developer and offers the same capabilities as JavaScript.

We’re just getting started with Python Workers

Providing true support for a new programming language is a big investment that goes far beyond making “hello world” work. We chose Python very intentionally — it’s the second most popular programming language after JavaScript — and we are committed to continuing to improve performance and widen our support for Python packages.

We’re grateful to the Pyodide maintainers and the broader Python community, and we’d love to hear from you. Drop into the Python Workers channel in the Cloudflare Developers Discord, or start a discussion on GitHub about what you’d like to see next and which Python packages you’d like us to support.

Best practices for managing Terraform State files in AWS CI/CD Pipeline

2024-02-19 Arun Kumar Selvaraj

Post Syndicated from Arun Kumar Selvaraj original https://aws.amazon.com/blogs/devops/best-practices-for-managing-terraform-state-files-in-aws-ci-cd-pipeline/

Introduction

Today customers want to reduce manual operations for deploying and maintaining their infrastructure. The recommended method to deploy and manage infrastructure on AWS is to follow the Infrastructure-as-Code (IaC) model using tools like AWS CloudFormation, AWS Cloud Development Kit (AWS CDK) or Terraform.

One of the critical tasks in terraform is managing the state file, which keeps track of your configuration and resources. When you run terraform in an AWS CI/CD pipeline, the state file has to be stored in a secured, common path to which the pipeline has access. You also need a mechanism to lock it when multiple developers in the team want to access it at the same time.

In this blog post, we will explain how to manage terraform state files in AWS, best practices on configuring them in AWS and an example of how you can manage it efficiently in your Continuous Integration pipeline in AWS when used with AWS Developer Tools such as AWS CodeCommit and AWS CodeBuild. This blog post assumes you have a basic knowledge of terraform, AWS Developer Tools and AWS CI/CD pipeline. Let’s dive in!

Challenges with handling state files

By default, the state file is stored locally where terraform runs, which is not a problem if you are a single developer working on the deployment. Otherwise, storing state files locally is not ideal, as you may run into the following problems:

When working in teams or collaborative environments, multiple people need access to the state file
Data in the state file is stored in plain text which may contain secrets or sensitive information
Local files can get lost, corrupted, or deleted

Best practices for handling state files

The recommended practice for managing state files is to use terraform’s built-in support for remote backends. These are:

Remote backend on Amazon Simple Storage Service (Amazon S3): You can configure terraform to store state files in an Amazon S3 bucket which provides a durable and scalable storage solution. Storing on Amazon S3 also enables collaboration that allows you to share state file with others.

Remote backend on Amazon S3 with Amazon DynamoDB: In addition to using an Amazon S3 bucket for managing the files, you can use an Amazon DynamoDB table to lock the state file. This will allow only one person to modify a particular state file at any given time. It will help to avoid conflicts and enable safe concurrent access to the state file.

There are other options available as well, such as a remote backend on Terraform Cloud and third-party backends. Ultimately, the best method for managing terraform state files on AWS will depend on your specific requirements.

When deploying terraform on AWS, the preferred choice of managing state is using Amazon S3 with Amazon DynamoDB.

AWS configurations for managing state files

Create an Amazon S3 bucket using terraform. Implement security measures for the Amazon S3 bucket by creating an AWS Identity and Access Management (AWS IAM) policy or Amazon S3 Bucket Policy. This lets you restrict access, configure object versioning for data protection and recovery, and enable server-side encryption, either SSE-S3 (AES256) or SSE-KMS when you need control over the encryption keys.

Next create an Amazon DynamoDB table using terraform with Primary key set to LockID. You can also set any additional configuration options such as read/write capacity units. Once the table is created, you will configure the terraform backend to use it for state locking by specifying the table name in the terraform block of your configuration.
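The locking behavior that a table keyed by LockID provides can be pictured as a conditional write: a lock item is created only if none exists yet. The sketch below simulates that in memory and makes no AWS calls; LockTable and its methods are hypothetical, and in practice terraform’s backend performs the locking for you.

```python
class LockTable:
    """In-memory stand-in for a lock table keyed by LockID (simulation only)."""

    def __init__(self):
        self._items = {}

    def acquire(self, lock_id: str, owner: str) -> bool:
        # Mirrors a conditional write: succeed only if no item has this LockID.
        if lock_id in self._items:
            return False
        self._items[lock_id] = owner
        return True

    def release(self, lock_id: str, owner: str) -> bool:
        # Only the current holder may delete the lock item.
        if self._items.get(lock_id) != owner:
            return False
        del self._items[lock_id]
        return True


if __name__ == "__main__":
    table = LockTable()
    lock_id = "tfbackend-bucket/terraform.tfstate"
    print(table.acquire(lock_id, "alice"))
    print(table.acquire(lock_id, "bob"))
    print(table.release(lock_id, "alice"))
    print(table.acquire(lock_id, "bob"))
```

The second acquire fails while the first holder still owns the lock, which is what prevents two pipeline runs from modifying the same state file at once.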

For a single AWS account with multiple environments and projects, you can use a single Amazon S3 bucket. If you have multiple applications in multiple environments across multiple AWS accounts, you can create one Amazon S3 bucket for each account. In that Amazon S3 bucket, you can create appropriate folders for each environment, storing project state files with specific prefixes.
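One way to keep such a layout consistent is to derive every state file key from the environment and project names. The helper below is a sketch of a possible naming convention; state_key and the environment/project/terraform.tfstate pattern are assumptions for illustration, not something the sample repository prescribes.

```python
def state_key(environment: str, project: str) -> str:
    """Build an object key such as 'dev/billing/terraform.tfstate'."""
    for part in (environment, project):
        # Reject empty names and embedded slashes so the folder depth stays fixed.
        if not part or "/" in part:
            raise ValueError(f"invalid path component: {part!r}")
    return f"{environment}/{project}/terraform.tfstate"


if __name__ == "__main__":
    print(state_key("dev", "billing"))
    print(state_key("prod", "billing"))
```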

Now that you know how to handle terraform state files on AWS, let’s look at an example of how you can configure them in a Continuous Integration pipeline in AWS.

Architecture

Figure 1: Example architecture on how to use terraform in an AWS CI pipeline

This diagram outlines the workflow implemented in this blog:

The AWS CodeCommit repository contains the application code
The AWS CodeBuild job contains the buildspec files and references the source code in AWS CodeCommit
The AWS Lambda function contains the application code created after running terraform apply
Amazon S3 contains the state file created after running terraform apply. Amazon DynamoDB locks the state file present in Amazon S3

Implementation

Pre-requisites

Before you begin, you must complete the following prerequisites:

Install the latest version of AWS Command Line Interface (AWS CLI)
Install the latest version of terraform
Install the latest version of Git and set up git-remote-codecommit
Use an existing AWS account or create a new one
Use an AWS IAM role with a role profile, role permissions, a role trust relationship and user permissions to access your AWS account via your local terminal

Setting up the environment

You need an AWS access key ID and secret access key to configure AWS CLI. To learn more about configuring the AWS CLI, follow these instructions.
Clone the repo for complete example: git clone https://github.com/aws-samples/manage-terraform-statefiles-in-aws-pipeline
After cloning, you will see the following folder structure:

Figure 2: AWS CodeCommit repository structure

Let’s break down the terraform code into 2 parts – one for preparing the infrastructure and another for preparing the application.

Preparing the Infrastructure

The main.tf file is the core component and does the following:
  - It creates an Amazon S3 bucket to store the state file. We configure bucket ACL, bucket versioning and encryption so that the state file is secure.
  - It creates an Amazon DynamoDB table which will be used to lock the state file.
  - It creates two AWS CodeBuild projects, one for ‘terraform plan’ and another for ‘terraform apply’.
Note: it also has a code block (commented out by default) to create the AWS Lambda function, which you will use at a later stage.

The AWS CodeBuild projects must be able to access Amazon S3, Amazon DynamoDB, AWS CodeCommit, and AWS Lambda. The AWS IAM role with the permissions required to access these resources is therefore created via the iam.tf file.

Next you will find two buildspec files named buildspec-plan.yaml and buildspec-apply.yaml that will execute terraform commands – terraform plan and terraform apply respectively.

Modify the AWS Region in the provider.tf file.

Set the Amazon S3 bucket name, Amazon DynamoDB table name, AWS CodeBuild compute types, and AWS Lambda role and policy names to the required values in the variable.tf file. You can also use this file to customize parameters for different environments.

With this, the infrastructure setup is complete.

From your local terminal, run the commands below in the order shown to deploy the above-mentioned resources in your AWS account.

terraform init
terraform validate
terraform plan
terraform apply
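
If you would rather script these four steps than type them, the sequence can be wrapped as shown below. This is only a sketch: run_terraform_steps and its runner parameter are hypothetical names introduced here (they are not part of the sample repository), and the code assumes the terraform binary is on your PATH.

```python
import subprocess

# The four commands from the walkthrough, in the order they must run.
TERRAFORM_STEPS = (
    ("terraform", "init"),
    ("terraform", "validate"),
    ("terraform", "plan"),
    ("terraform", "apply"),
)


def run_terraform_steps(steps=TERRAFORM_STEPS, runner=subprocess.run):
    """Run each step in order and stop at the first non-zero exit code."""
    completed = []
    for step in steps:
        result = runner(list(step))
        if result.returncode != 0:
            raise RuntimeError(f"{' '.join(step)} exited with {result.returncode}")
        completed.append(" ".join(step))
    return completed
```

Stopping at the first failure keeps terraform apply from running against a configuration that did not validate or plan cleanly.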

Once the apply is successful and all the above resources have been successfully deployed in your AWS account, proceed with deploying your application.

Preparing the Application

In the cloned repository, use the backend.tf file to create your own Amazon S3 backend to store the state file. By default, it has the values below. You can override them with your own values.

bucket = "tfbackend-bucket" 
key    = "terraform.tfstate" 
region = "eu-central-1"

The repository has sample python code stored in main.py that returns a simple message when invoked.

In the main.tf file, you can find the below block of code to create and deploy the Lambda function that uses the main.py code (uncomment these code blocks).

data "archive_file" "lambda_archive_file" {
    ……
}

resource "aws_lambda_function" "lambda" {
    ……
}

Now you can deploy the application using AWS CodeBuild instead of running terraform commands locally, which is the main advantage of using AWS CodeBuild.

Run the two AWS CodeBuild projects to execute terraform plan and terraform apply again.

Once successful, you can verify your deployment by testing the code in AWS Lambda. To test a lambda function (console):

- Open AWS Lambda console and select your function “tf-codebuild”
- In the navigation pane, in Code section, click Test to create a test event
- Provide your required name, for example “test-lambda”
- Accept default values and click Save
- Click Test again to trigger your test event “test-lambda”

It should return the sample message you provided in your main.py file. In the default case, it will display “Hello from AWS Lambda !” message as shown below.

Figure 3: Sample AWS Lambda function response

To verify your state file, go to Amazon S3 console and select the backend bucket created (tfbackend-bucket). It will contain your state file.

Figure 4: Amazon S3 bucket with terraform state file

Open the Amazon DynamoDB console and check the tfstate-lock table; it contains an entry with a LockID.

Figure 5: Amazon DynamoDB table with LockID

Thus, you have securely stored and locked your terraform state file using terraform backend in a Continuous Integration pipeline.

Cleanup

To delete all the resources created as part of the repository, run the below command from your terminal.

terraform destroy

Conclusion

In this blog post, we explored the fundamentals of terraform state files, discussed best practices for storing them securely in AWS environments, and covered mechanisms for locking them so that concurrent runs cannot overwrite each other's changes. Finally, we showed an example of how you can manage them in a Continuous Integration pipeline in AWS.

You can apply the same methodology to manage state files in a Continuous Delivery pipeline in AWS. For more information, see CI/CD pipeline on AWS, Terraform backends types, Purpose of terraform state.

Introducing zabbix_utils – the official Python library for Zabbix API

2024-02-01 Aleksandr Iantsen

Post Syndicated from Aleksandr Iantsen original https://blog.zabbix.com/python-zabbix-utils/27056/

Zabbix is a flexible and universal monitoring solution that integrates with a wide variety of different systems right out of the box. Despite actively expanding the list of natively supported systems for integration (via templates or webhook integrations), there may still be a need to integrate with custom systems and services that are not yet supported. In such cases, a library taking care of implementing interaction protocols with the Zabbix API, Zabbix server/proxy, or Agent/Agent2 becomes extremely useful. Given that Python is widely adopted among DevOps and SRE engineers as well as server administrators, we decided to release a library for this programming language first.

We are pleased to introduce zabbix_utils – a Python library for seamless interaction with Zabbix API, Zabbix server/proxy, and Zabbix Agent/Agent2. Of course, there are popular community solutions for working with these Zabbix components in Python. Keeping this fact in mind, we have tried to consolidate popular issues and cases along with our experience to develop as convenient a tool as possible. Furthermore, we made sure that transitioning to the tool is as straightforward and clear as possible. Thanks to official support, you can be confident that the current version of the library is compatible with the latest Zabbix release.

In this article, we will introduce you to the main capabilities of the library and provide examples of how to use it with Zabbix components.

Usage Scenarios

The zabbix_utils library can be used in the following scenarios, but is not limited to them:

Zabbix automation
Integration with third-party systems
Custom monitoring solutions
Data export (hosts, templates, problems, etc.)
Integration into your Python application for Zabbix monitoring support
Anything else that comes to mind

You can use zabbix_utils for automating Zabbix tasks, such as scripting the automatic monitoring setup of your IT infrastructure objects. This can involve using ZabbixAPI for the direct management of Zabbix objects, Sender for sending values to hosts, and Getter for gathering data from Agents. We will discuss Sender and Getter in more detail later in this article.

For example, let’s imagine you have an infrastructure consisting of different branches. Each server or workstation is deployed from an image with an automatically configured Zabbix Agent and each branch is monitored by a Zabbix proxy since it has an isolated network. Your custom service or script can fetch a list of this equipment from your CMDB system, along with any additional information. It can then use this data to create hosts in Zabbix and link the necessary templates using ZabbixAPI based on the received information. If the information from CMDB is insufficient, you can request data directly from the configured Zabbix Agent using Getter and then use this information for further configuration and decision-making during setup. Another part of your script can access AD to get a list of branch users to update the list of users in Zabbix through the API and assign them the appropriate permissions and roles based on information from AD or CMDB (e.g., editing rights for server owners).

Another use case is regularly exporting templates from Zabbix for subsequent import into a version control system. You can also establish a mechanism for loading changes and rolling back to previous versions of templates. Many other use cases can be implemented as well; it all depends on your requirements and on creative use of the library.

Of course, if you are a developer and there is a requirement to implement Zabbix monitoring support for your custom system or tool, you can implement sending data describing any events generated by your custom system/tool to Zabbix using Sender.

Installation and Configuration

To begin with, you need to install the zabbix_utils library. You can do this in two main ways:

By using pip:

~$ pip install zabbix_utils

By cloning from GitHub:

~$ git clone https://github.com/zabbix/python-zabbix-utils
~$ cd python-zabbix-utils/
~$ python setup.py install

No additional configuration is required. If needed, you can specify values for the following environment variables: ZABBIX_URL, ZABBIX_TOKEN, ZABBIX_USER, ZABBIX_PASSWORD. These use cases are described in more detail below.

Working with Zabbix API

To work with Zabbix API, it is necessary to import the ZabbixAPI class from the zabbix_utils library:

from zabbix_utils import ZabbixAPI

If you are using one of the existing popular community libraries, in most cases, it will be sufficient to simply replace the ZabbixAPI import statement with an import from our library.

Next, you need to create an instance of the ZabbixAPI class. There are several usage scenarios:

Use preset values of environment variables, i.e., not pass any parameters to ZabbixAPI:

~$ export ZABBIX_URL="https://zabbix.example.local"
~$ export ZABBIX_USER="Admin"
~$ export ZABBIX_PASSWORD="zabbix"

from zabbix_utils import ZabbixAPI


api = ZabbixAPI()

Pass only the Zabbix API address. It can be given as the server's IP address, FQDN, or DNS name (in which case the HTTP protocol is used) or as a full URL. The authentication data must still be supplied as values of environment variables:

~$ export ZABBIX_USER="Admin"
~$ export ZABBIX_PASSWORD="zabbix"

from zabbix_utils import ZabbixAPI

api = ZabbixAPI(url="127.0.0.1")

Pass only the Zabbix API address to ZabbixAPI, as in the example above, and pass the authentication data later using the login() method:

from zabbix_utils import ZabbixAPI

api = ZabbixAPI(url="127.0.0.1")
api.login(user="Admin", password="zabbix")

Pass all parameters at once when creating an instance of ZabbixAPI; in this case, there is no need to subsequently call login():

from zabbix_utils import ZabbixAPI

api = ZabbixAPI(
    url="127.0.0.1",
    user="Admin",
    password="zabbix"
)
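
To make the interplay between constructor arguments and environment variables in the four scenarios concrete, here is a minimal sketch of one possible resolution order. It is not the library's implementation: resolve_connection is a hypothetical helper, and both the "explicit arguments win" rule and the error raised for a missing URL are assumptions made for illustration.

```python
import os


def resolve_connection(url=None, user=None, password=None, token=None, env=None):
    """Merge explicit arguments with ZABBIX_* environment variables."""
    env = os.environ if env is None else env
    url = url or env.get("ZABBIX_URL")
    if not url:
        raise ValueError("no URL given and ZABBIX_URL is not set")
    # A bare IP/FQDN without a scheme falls back to plain HTTP.
    if "://" not in url:
        url = "http://" + url
    return {
        "url": url,
        "token": token or env.get("ZABBIX_TOKEN"),
        "user": user or env.get("ZABBIX_USER"),
        "password": password or env.get("ZABBIX_PASSWORD"),
    }
```

Passing env explicitly (instead of always reading os.environ) keeps the helper easy to exercise in tests.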

The ZabbixAPI class supports working with various Zabbix versions, automatically checking the API version during initialization. You can also work with the Zabbix API version as an object as follows:

from zabbix_utils import ZabbixAPI

api = ZabbixAPI()

# ZabbixAPI version field
ver = api.version
print(type(ver).__name__, ver) # APIVersion 6.0.24

# Method to get ZabbixAPI version
ver = api.api_version()
print(type(ver).__name__, ver) # APIVersion 6.0.24

# Additional methods
print(ver.major)    # 6.0
print(ver.minor)    # 24
print(ver.is_lts()) # True

As a result, you will get an APIVersion object whose major and minor fields return the respective major and minor parts of the current version, and whose is_lts() method returns True if the current version is an LTS (Long Term Support) release and False otherwise. The APIVersion object can also be compared to a version represented as a string or a float number:

# Version comparison
print(ver < 6.4)      # True
print(ver != 6.0)     # False
print(ver != "6.0.5") # True
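
The comparison results above follow from two rules: a float is compared against the major part only, while a string is compared component by component. The sketch below reproduces that behavior with a hypothetical VersionSketch class; it is an illustration written for this article, not the library's APIVersion code.

```python
from functools import total_ordering


@total_ordering
class VersionSketch:
    """Illustrative comparable version object (not zabbix_utils.APIVersion)."""

    def __init__(self, raw):
        self.raw = raw
        self._parts = self._parse(raw)
        # As in the article: "major" is the first two components, "minor" the third.
        self.major = float(f"{self._parts[0]}.{self._parts[1]}")
        self.minor = self._parts[2]

    @staticmethod
    def _parse(raw):
        numbers = [int(part) for part in raw.split(".")]
        numbers += [0] * (3 - len(numbers))  # pad "6.0" to (6, 0, 0)
        return tuple(numbers[:3])

    def _operands(self, other):
        if isinstance(other, float):
            return self.major, other  # floats are compared on the major part only
        if isinstance(other, str):
            return self._parts, self._parse(other)
        if isinstance(other, VersionSketch):
            return self._parts, other._parts
        return None

    def __eq__(self, other):
        operands = self._operands(other)
        if operands is None:
            return NotImplemented
        return operands[0] == operands[1]

    def __lt__(self, other):
        operands = self._operands(other)
        if operands is None:
            return NotImplemented
        return operands[0] < operands[1]


ver = VersionSketch("6.0.24")
print(ver < 6.4, ver != 6.0, ver != "6.0.5")  # True False True
```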

If the account and password (or starting from Zabbix 5.4 – token instead of login/password) are not set as environment variable values or during the initialization of ZabbixAPI, then it is necessary to call the login() method for authentication:

from zabbix_utils import ZabbixAPI

api = ZabbixAPI(url="127.0.0.1")
api.login(token="xxxxxxxx")

After authentication, you can make any API requests described for all supported versions in the Zabbix documentation.

The format for calling API methods looks like this:

api_instance.zabbix_object.method(parameters)

For example:

api.host.get()
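
The Zabbix API is a JSON-RPC 2.0 interface, so a call written as object.method(parameters) ultimately becomes a request whose method field is "object.method". The following sketch shows one way attribute access can be mapped onto such a payload. JsonRpcSketch and its helper class are hypothetical names, the sketch only builds the payload (nothing is sent), and authentication is left out.

```python
import itertools


class JsonRpcSketch:
    """Turns sketch.host.get(...) into a JSON-RPC 2.0 payload dict."""

    def __init__(self):
        self._ids = itertools.count(1)

    def __getattr__(self, zabbix_object):
        # Called only for unknown attributes, e.g. "host" in sketch.host.
        return _ObjectProxy(self, zabbix_object)

    def build(self, method, params):
        return {
            "jsonrpc": "2.0",
            "method": method,
            "params": params,
            "id": next(self._ids),
        }


class _ObjectProxy:
    """Represents one API object and resolves its methods lazily."""

    def __init__(self, client, zabbix_object):
        self._client = client
        self._object = zabbix_object

    def __getattr__(self, method):
        def call(**params):
            return self._client.build(f"{self._object}.{method}", params)

        return call
```

For example, JsonRpcSketch().host.get(output=["hostid"]) yields a payload whose method is "host.get" and whose params contain the output filter.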

After completing all the necessary API requests, it’s necessary to execute logout() if authentication was done using login and password:

api.logout()

More examples of usage can be found here.

Sending Values to Zabbix Server/Proxy

There is often a need to send values to Zabbix Trapper. For this purpose, the zabbix_sender utility is provided. However, if your service or script sending this data is written in Python, calling an external utility may not be very convenient. Therefore, we have developed the Sender, which will help you send values to Zabbix server or proxy one by one or in groups. To work with Sender, you need to import it as follows:

from zabbix_utils import Sender

After that, you can send a single value:

from zabbix_utils import Sender

sender = Sender(server='127.0.0.1', port=10051)
resp = sender.send_value('example_host', 'example.key', 50, 1702511920)
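
For readers curious about what travels over the wire, the sketch below frames a single value the way the Zabbix sender protocol documents it: a "ZBXD" signature, a flags byte, a little-endian length, and a JSON body with a "sender data" request. build_sender_packet is a hypothetical helper written for this illustration, not a function of zabbix_utils, and it ignores optional features such as compression.

```python
import json
import struct


def build_sender_packet(host, key, value, clock=None):
    """Frame one item value as a Zabbix trapper request (illustrative only)."""
    item = {"host": host, "key": key, "value": str(value)}
    if clock is not None:
        item["clock"] = clock
    body = json.dumps({"request": "sender data", "data": [item]}).encode("utf-8")
    # Header: "ZBXD", flags 0x01, 4-byte little-endian body length, 4 reserved bytes.
    header = struct.pack("<4sBII", b"ZBXD", 0x01, len(body), 0)
    return header + body


packet = build_sender_packet("example_host", "example.key", 50, 1702511920)
```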

Alternatively, you can put them into a group for simultaneous sending, for which you need to additionally import ItemValue:

from zabbix_utils import ItemValue, Sender


items = [
    ItemValue('host1', 'item.key1', 10),
    ItemValue('host1', 'item.key2', 'Test value'),
    ItemValue('host2', 'item.key1', -1, 1702511920),
    ItemValue('host3', 'item.key1', '{"msg":"Test value"}'),
    ItemValue('host2', 'item.key1', 0, 1702511920, 100)
]

sender = Sender('127.0.0.1', 10051)
response = sender.send(items)

When you need to send more values than Zabbix Trapper can accept at one time, you can use fragmented sending, i.e. sequential sending in separate fragments (chunks). By default, the chunk size is 250 values. For example, if 400 values are passed to the send() method, they are sent in two stages: the first 250 values are sent, and the remaining 150 follow once a response has been received. To change the chunk size, specify the chunk_size parameter when initializing Sender:

from zabbix_utils import ItemValue, Sender


items = [
    ItemValue('host1', 'item.key1', 10),
    ItemValue('host1', 'item.key2', 'Test value'),
    ItemValue('host2', 'item.key1', -1, 1702511920),
    ItemValue('host3', 'item.key1', '{"msg":"Test value"}'),
    ItemValue('host2', 'item.key1', 0, 1702511920, 100)
]

sender = Sender('127.0.0.1', 10051, chunk_size=2)
response = sender.send(items)

In the example above, the chunk size is set to 2. So, 5 values passed will be sent in three requests of two, two, and one value, respectively.
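
The arithmetic behind chunking is plain list slicing. The helper below, split_into_chunks, is a hypothetical name used only for this sketch (the library does the splitting internally); it reproduces both splits mentioned in the text.

```python
def split_into_chunks(values, chunk_size=250):
    """Split a list into consecutive slices of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [values[i:i + chunk_size] for i in range(0, len(values), chunk_size)]


print([len(chunk) for chunk in split_into_chunks(list(range(400)))])   # [250, 150]
print([len(chunk) for chunk in split_into_chunks(list(range(5)), 2)])  # [2, 2, 1]
```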

If your server has multiple network interfaces, and values need to be sent from a specific one, the Sender provides the option to specify a source_ip for the sent values:

from zabbix_utils import Sender

sender = Sender(
    server='zabbix.example.local',
    port=10051,
    source_ip='10.10.7.1'
)
resp = sender.send_value('example_host', 'example.key', 50, 1702511920)

It also supports reading connection parameters from the Zabbix Agent/Agent2 configuration file. To do this, set the use_config flag, after which it is not necessary to pass connection parameters when creating an instance of Sender:

from zabbix_utils import Sender

sender = Sender(
    use_config=True,
    config_path='/etc/zabbix/zabbix_agent2.conf'
)
response = sender.send_value('example_host', 'example.key', 50, 1702511920)

Since the Zabbix Agent/Agent2 configuration file can specify one or even several Zabbix clusters consisting of multiple Zabbix server instances, Sender will send data to the first available server of each cluster specified in the ServerActive parameter. If ServerActive is not specified in the configuration file, Sender takes the server address from the Server parameter and uses the standard Zabbix Trapper port, 10051.
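
As a rough illustration of that lookup, the sketch below reads ServerActive from configuration text, treating commas as separators between clusters and semicolons as separators between nodes of one cluster, and falls back to Server with port 10051. clusters_from_config and parse_node are hypothetical helpers, not library functions; the sketch assumes hostnames or IPv4 addresses (no bracketed IPv6) and, as a simplification, uses only the first entry of Server.

```python
DEFAULT_TRAPPER_PORT = 10051


def parse_node(text):
    """Split 'host' or 'host:port' into a (host, port) tuple."""
    text = text.strip()
    host, sep, port = text.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return text, DEFAULT_TRAPPER_PORT


def clusters_from_config(config_text):
    """Return a list of clusters, each a list of (host, port) nodes."""
    params = {}
    for line in config_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        params[key.strip()] = value.strip()

    if params.get("ServerActive"):
        return [
            [parse_node(node) for node in cluster.split(";") if node.strip()]
            for cluster in params["ServerActive"].split(",")
            if cluster.strip()
        ]
    if params.get("Server"):
        # Assumption for this sketch: use only the first address listed in Server.
        first = params["Server"].split(",")[0].strip()
        return [[(first, DEFAULT_TRAPPER_PORT)]]
    return []
```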

By default, Sender returns the aggregated result of sending across all clusters. But it is possible to get more detailed information about the results of sending for each chunk and each cluster:

print(response)
# {"processed": 2, "failed": 0, "total": 2, "time": "0.000108", "chunk": 2}

if response.failed == 0:
    print(f"Value sent successfully in {response.time}")
else:
    print(response.details)
    # {
    #     127.0.0.1:10051: [
    #         {
    #             "processed": 1,
    #             "failed": 0,
    #             "total": 1,
    #             "time": "0.000051",
    #             "chunk": 1
    #         }
    #     ],
    #     zabbix.example.local:10051: [
    #         {
    #             "processed": 1,
    #             "failed": 0,
    #             "total": 1,
    #             "time": "0.000057",
    #             "chunk": 1
    #         }
    #     ]
    # }
    for node, chunks in response.details.items():
        for resp in chunks:
            print(f"processed {resp.processed} of {resp.total} at {node.address}:{node.port}")
            # processed 1 of 1 at 127.0.0.1:10051
            # processed 1 of 1 at zabbix.example.local:10051
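
The relationship between the aggregated result and the per-node details can be pictured as a simple sum. The sketch below works on plain dictionaries shaped like the printed output above; aggregate_details is a hypothetical helper, and treating the aggregated "chunk" value as a count of chunk results is an assumption that merely matches the example numbers.

```python
def aggregate_details(details):
    """Sum per-chunk results from every node into one summary dict."""
    summary = {"processed": 0, "failed": 0, "total": 0, "chunk": 0}
    elapsed = 0.0
    for chunks in details.values():
        for chunk in chunks:
            summary["processed"] += chunk["processed"]
            summary["failed"] += chunk["failed"]
            summary["total"] += chunk["total"]
            summary["chunk"] += 1  # assumption: count of chunk results
            elapsed += float(chunk["time"])
    summary["time"] = f"{elapsed:.6f}"
    return summary
```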

More usage examples can be found here.

Getting Values from Zabbix Agent/Agent2 by Item Key

Sometimes it can also be useful to directly retrieve values from the Zabbix Agent. To assist with this task, zabbix_utils provides the Getter. It performs the same function as the zabbix_get utility, allowing you to work natively within Python code. Getter is straightforward to use; just import it, create an instance by passing the Zabbix Agent’s address and port, and then call the get() method, providing the data item key for the value you want to retrieve:

from zabbix_utils import Getter

agent = Getter('10.8.54.32', 10050)
resp = agent.get('system.uname')

In cases where your server has multiple network interfaces, and requests need to be sent from a specific one, you can specify the source_ip for the Agent connection:

from zabbix_utils import Getter

agent = Getter(
    host='zabbix.example.local',
    port=10050,
    source_ip='10.10.7.1'
)
resp = agent.get('system.uname')

The response from the Zabbix Agent will be processed by the library and returned as an object of the AgentResponse class:

print(resp)
# {
#     "error": null,
#     "raw": "Linux zabbix_server 5.15.0-3.60.5.1.el9uek.x86_64",
#     "value": "Linux zabbix_server 5.15.0-3.60.5.1.el9uek.x86_64"
# }

print(resp.error)
# None

print(resp.value)
# Linux zabbix_server 5.15.0-3.60.5.1.el9uek.x86_64
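
To show how a raw agent reply could map onto the error, raw, and value fields, here is a sketch that decodes a framed passive-check response. It relies on the documented Zabbix framing ("ZBXD" signature, flags byte, little-endian length) and on the convention that an unsupported item is reported as ZBX_NOTSUPPORTED followed by a NUL byte and an error message. parse_agent_reply is a hypothetical helper, not the library's AgentResponse, and it does not handle compressed packets.

```python
import struct

# "ZBXD", flags byte, 4-byte little-endian data length, 4 reserved bytes.
HEADER = struct.Struct("<4sBII")


def parse_agent_reply(packet):
    """Decode a framed agent reply into error/raw/value fields."""
    magic, _flags, length, _reserved = HEADER.unpack_from(packet)
    if magic != b"ZBXD":
        raise ValueError("not a Zabbix packet")
    raw = packet[HEADER.size:HEADER.size + length].decode("utf-8")
    if raw.startswith("ZBX_NOTSUPPORTED"):
        # The error text, if any, follows a NUL separator.
        _, _, message = raw.partition("\0")
        return {"error": message or raw, "raw": raw, "value": None}
    return {"error": None, "raw": raw, "value": raw}
```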

More usage examples can be found here.

Conclusions

The zabbix_utils library for Python allows you to take full advantage of monitoring using Zabbix, without limiting yourself to the integrations available out of the box. It can be valuable for both DevOps and SRE engineers, as well as Python developers looking to implement monitoring support for their system using Zabbix.

In the next article, we will thoroughly explore integration with an external service using this library to demonstrate the capabilities of zabbix_utils more comprehensively.

Questions

Q: Which Agent versions are supported for Getter?

A: Supported versions of Zabbix Agents are the same as Zabbix API versions, as specified in the readme file. Our goal is to create a library with full support for all Zabbix components of the same version.

Q: Does Getter support Agent encryption?

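A: Encryption support is not yet built into Sender and Getter, but you can create your own wrapper for both using third-party libraries.

from zabbix_utils import Sender

def psk_wrapper(sock, tls):
    # ...
    # Implementation of TLS PSK wrapper for the socket
    # ...
    # Placeholder so the stub is valid Python; a real wrapper returns the wrapped socket.
    raise NotImplementedError("TLS PSK wrapping is not implemented in this stub")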

sender = Sender(
    server='zabbix.example.local',
    port=10051,
    socket_wrapper=psk_wrapper
)

More examples can be found here.

Q: Is it possible to set a timeout value for Getter?

A: The response timeout value can be set for the Getter, as well as for ZabbixAPI and Sender. In all cases, the timeout is set for waiting for any responses to requests.

from zabbix_utils import Getter, Sender

# Example of setting a timeout for Sender
sender = Sender(server='127.0.0.1', port=10051, timeout=30)

# Example of setting a timeout for Getter
agent = Getter(host='127.0.0.1', port=10050, timeout=30)

Q: Is parallel (asynchronous) mode supported?

A: Currently, the library does not include asynchronous classes and methods, but we plan to develop asynchronous versions of ZabbixAPI and Sender.

Q: Is it possible to specify multiple servers when sending through Sender without specifying a configuration file (for working with an HA cluster)?

A: Yes, you can pass the clusters directly as a list of node lists:

from zabbix_utils import Sender


zabbix_clusters = [
    [
        'zabbix.cluster1.node1',
        'zabbix.cluster1.node2:10051'
    ],
    [
        'zabbix.cluster2.node1:10051',
        'zabbix.cluster2.node2:20051',
        'zabbix.cluster2.node3'
    ]
]

sender = Sender(clusters=zabbix_clusters)
response = sender.send_value('example_host', 'example.key', 10, 1702511922)

print(response)
# {"processed": 2, "failed": 0, "total": 2, "time": "0.000103", "chunk": 2}

print(response.details)
# {
#     "zabbix.cluster1.node1:10051": [
#         {
#             "processed": 1,
#             "failed": 0,
#             "total": 1,
#             "time": "0.000050",
#             "chunk": 1
#         }
#     ],
#     "zabbix.cluster2.node2:20051": [
#         {
#             "processed": 1,
#             "failed": 0,
#             "total": 1,
#             "time": "0.000053",
#             "chunk": 1
#         }
#     ]
# }
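
The details above contain one entry per cluster because only the first available node of each cluster receives the data. That selection can be sketched as follows. pick_targets and split_node are hypothetical helpers, and the availability check is passed in as a callable so that the sketch stays independent of any real network code.

```python
DEFAULT_PORT = 10051


def split_node(node):
    """Split 'host' or 'host:port' into (host, port), defaulting to 10051."""
    host, sep, port = node.rpartition(":")
    return (host, int(port)) if sep else (node, DEFAULT_PORT)


def pick_targets(clusters, is_available):
    """Return the first available (host, port) of every cluster."""
    targets = []
    for cluster in clusters:
        for node in cluster:
            address = split_node(node)
            if is_available(address):
                targets.append(address)
                break  # one node per cluster is enough
    return targets
```

With the cluster list from the example and a check that reports zabbix.cluster2.node1 as down, the result is zabbix.cluster1.node1:10051 and zabbix.cluster2.node2:20051, the same two nodes shown in the details output.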

The post Introducing zabbix_utils – the official Python library for Zabbix API appeared first on Zabbix Blog.

Building resilient serverless applications using chaos engineering

2023-09-18 Marcia Villalba

Post Syndicated from Marcia Villalba original https://aws.amazon.com/blogs/compute/building-resilient-serverless-applications-using-chaos-engineering/

This post is written by Suranjan Choudhury (Head of TME and ITeS SA) and Anil Sharma (Sr PSA, Migration)

Chaos engineering is the process of stressing an application in testing or production environments by creating disruptive events, such as outages, observing how the system responds, and implementing improvements. Chaos engineering helps you create the real-world conditions needed to uncover hidden issues and performance bottlenecks that are challenging to find in distributed applications.

You can build resilient distributed serverless applications using AWS Lambda and test Lambda functions in real world operating conditions using chaos engineering. This blog shows an approach to inject chaos in Lambda functions, making no change to the Lambda function code. This blog uses the AWS Fault Injection Simulator (FIS) service to create experiments that inject disruptions for Lambda based serverless applications.

AWS FIS is a managed service that performs fault injection experiments on your AWS workloads. AWS FIS is used to set up and run fault experiments that simulate real-world conditions to discover application issues that are difficult to find otherwise. You can improve application resilience and performance using results from FIS experiments.

The sample code in this blog introduces random faults to existing Lambda functions, like an increase in response times (latency) or random failures. You can observe application behavior under introduced chaos and make improvements to the application.

Approaches to inject chaos in Lambda functions

AWS FIS currently does not support injecting faults in Lambda functions. However, there are two main approaches to inject chaos in Lambda functions: using external libraries or using Lambda layers.

Developers have created libraries to introduce failure conditions to Lambda functions, such as chaos_lambda and failure-Lambda. These libraries allow developers to inject elements of chaos into Python and Node.js Lambda functions. To inject chaos using these libraries, developers must decorate the existing Lambda function’s code. Decorator functions wrap the existing Lambda function, adding chaos at runtime. This approach requires developers to change the existing Lambda functions.

You can also use Lambda layers to inject chaos, requiring no change to the function code, as the fault injection is separated. Since the Lambda layer is deployed separately, you can independently change the element of chaos, like latency in response or failure of the Lambda function. This blog post discusses this approach.

Injecting chaos in Lambda functions using Lambda layers

A Lambda layer is a .zip file archive that contains supplementary code or data. Layers usually contain library dependencies, a custom runtime, or configuration files. This blog creates an FIS experiment that uses Lambda layers to inject disruptions in existing Lambda functions for Java, Node.js, and Python runtimes.

The Lambda layer contains the fault injection code. It is invoked prior to invocation of the Lambda function and injects random latency or errors. Injecting random latency simulates real world unpredictable conditions. The Java, Node.js, and Python chaos injection layers provided are generic and reusable. You can use them to inject chaos in your Lambda functions.

The Chaos Injection Lambda Layers

Java Lambda Layer for Chaos Injection

The chaos injection layer for Java Lambda functions uses the JAVA_TOOL_OPTIONS environment variable. This environment variable allows specifying the initialization of tools, specifically the launching of native or Java programming language agents. The JAVA_TOOL_OPTIONS has a javaagent parameter that points to the chaos injection layer. This layer uses Java’s premain method and the Byte Buddy library for modifying the Lambda function’s Java class during runtime.

When the Lambda function is invoked, the JVM uses the class specified with the javaagent parameter and invokes its premain method before the Lambda function’s handler invocation. The Java premain method injects chaos before Lambda runs.

The FIS experiment adds the layer association and the JAVA_TOOL_OPTIONS environment variable to the Lambda function.

Python and Node.js Lambda Layer for Chaos Injection

When injecting chaos in Python and Node.js functions, the Lambda function’s handler is replaced with a function in the respective layers by the FIS aws:ssm:start-automation-execution action. The automation, which is an SSM document, saves the original Lambda function’s handler in AWS Systems Manager Parameter Store, so that the changes can be rolled back once the experiment is finished.

The layer function contains the logic to inject chaos. At runtime, the layer function is invoked, injecting chaos in the Lambda function. The layer function in turn invokes the Lambda function’s original handler, so that the functionality is fulfilled.
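
The wrapping idea can be sketched in a few lines of Python. This is not the code shipped in the sample layers: with_chaos, failure_rate, and max_latency_ms are hypothetical names and values chosen for illustration, and the random source and sleep function are injectable only to keep the sketch testable.

```python
import random
import time


def with_chaos(handler, failure_rate=0.2, max_latency_ms=2000,
               rng=random, sleep=time.sleep):
    """Return a handler that injects a failure or a delay, then delegates."""

    def chaotic_handler(event, context):
        # Fail some invocations outright.
        if rng.random() < failure_rate:
            raise RuntimeError("chaos: injected failure")
        # Otherwise delay the invocation by a random amount.
        delay_ms = rng.randint(0, max_latency_ms)
        sleep(delay_ms / 1000)
        # Finally call the original handler so the function still does its job.
        return handler(event, context)

    return chaotic_handler
```

Because the wrapper receives the original handler as an argument, the business logic stays untouched, which mirrors the separation between chaos injection and function code described above.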

In all runtimes (Java, Python, or Node.js), the result is an invocation of the original Lambda function with random latency or a failure injected by the layer.

Once the experiment is completed, you can run the provided SSM document to roll back the layer’s association with the Lambda function and, in the case of the Java runtime, remove the environment variable.
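
The save-then-restore bookkeeping for the Python and Node.js case can be modeled with plain dictionaries. In the toy model below, a dict stands in for Parameter Store and another for the function configuration; inject_layer and roll_back are hypothetical names, and no AWS API is called.

```python
def inject_layer(function_config, layer_handler, layer_arn, store):
    """Point the function at the layer's handler and remember the original."""
    store[function_config["FunctionName"]] = function_config["Handler"]
    updated = dict(function_config)
    updated["Handler"] = layer_handler
    updated["Layers"] = list(function_config.get("Layers", [])) + [layer_arn]
    return updated


def roll_back(function_config, layer_arn, store):
    """Restore the saved handler and detach the chaos layer."""
    restored = dict(function_config)
    restored["Handler"] = store.pop(function_config["FunctionName"])
    restored["Layers"] = [
        arn for arn in function_config.get("Layers", []) if arn != layer_arn
    ]
    return restored
```

Keeping the original handler outside the function configuration is what makes the rollback possible after the handler has been overwritten.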

Sample FIS experiments using SSM and Lambda layers

In the sample code provided, Lambda layers are provided for Python, Node.js and Java runtimes along with sample Lambda functions for each runtime.

The sample deploys the Lambda layers and the Lambda functions, the FIS experiment template, the AWS Identity and Access Management (IAM) roles needed to run the experiment, and the AWS Systems Manager (SSM) documents. An AWS CloudFormation template is provided for deployment.

Step 1: Complete the prerequisites

To deploy the sample code, clone the repository locally:
git clone https://github.com/aws-samples/chaosinjection-lambda-samples.git
Complete the prerequisites documented here.

Step 2: Deploy using AWS CloudFormation

The CloudFormation template provided along with this blog deploys sample code. Execute runCfn.sh.

When this is complete, it returns the StackId that CloudFormation created:

Step 3: Run the chaos injection experiment

By default, the experiment is configured to inject chaos in the Java sample Lambda function. To change it to Python or Node.js Lambda functions, edit the experiment template and configure it to inject chaos using steps from here.

Step 4: Start the experiment

From the FIS Console, choose Start experiment.

Wait until the experiment state changes to “Completed”.

Step 5: Run your test

At this stage, you can inject chaos into your Lambda function. Run the Lambda functions and observe their behavior.

1. Invoke the Lambda function using the command below:

aws lambda invoke --function-name NodeChaosInjectionExampleFn out --log-type Tail --query 'LogResult' --output text | base64 -d

2. The CLI command’s output displays the logs created by the Lambda layers, showing the latency introduced in this invocation.

In this example, the output shows that the Lambda layer injected 1799ms of random latency to the function.
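
The base64 -d step in the command is needed because Lambda returns the log tail base64-encoded in the LogResult field of the invoke response. If you process the response in Python instead of the shell, the decoding is a one-liner. decode_log_tail is a hypothetical helper, and the response is assumed to be the dictionary returned by an invoke call made with LogType set to Tail.

```python
import base64


def decode_log_tail(invoke_response):
    """Decode the base64-encoded log tail from a Lambda invoke response."""
    return base64.b64decode(invoke_response["LogResult"]).decode("utf-8")
```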

The experiment injects random latency or failure in the Lambda function. Running the Lambda function again results in a different latency or failure. At this stage, you can test the application, and observe its behavior under conditions that may occur in the real world, like an increase in latency or Lambda function’s failure.

Step 6: Roll back the experiment

To roll back the experiment, run the SSM document for rollback. This rolls back the Lambda function to the state before chaos injection. Run this command:

aws ssm start-automation-execution \
--document-name "InjectLambdaChaos-Rollback" \
--document-version "\$DEFAULT" \
--parameters \
'{"FunctionName":["FunctionName"],"LayerArn":["LayerArn"],"assumeRole":["RoleARN"]}' \
--region eu-west-2

Cleaning up

To avoid incurring future charges, clean up the resources created by the CloudFormation template by running the following CLI command. Update the stack name to the one you provided when creating the stack.

aws cloudformation delete-stack --stack-name myChaosStack

Using FIS Experiments results

You can use FIS experiment results to validate expected system behavior. An example of expected behavior is: “If application latency increases by 10%, there is less than a 1% increase in sign in failures.” After the experiment is completed, evaluate whether the application resiliency aligns with your business and technical expectations.

Conclusion

This blog explains an approach for testing reliability and resilience in Lambda functions using chaos engineering. This approach allows you to inject chaos in Lambda functions without changing the Lambda function code, with clear segregation of chaos injection and business logic. It provides a way for developers to focus on building business functionality using Lambda functions.

The Lambda layers that inject chaos can be developed and managed separately. This approach uses AWS FIS to run experiments that inject chaos using Lambda layers and test serverless application’s performance and resiliency. Using the insights from the FIS experiment, you can find, fix, or document risks that surface in the application while testing.

For more serverless learning resources, visit Serverless Land.

How to add notifications and manual approval to an AWS CDK Pipeline

2023-08-17 Jehu Gray

Post Syndicated from Jehu Gray original https://aws.amazon.com/blogs/devops/how-to-add-notifications-and-manual-approval-to-an-aws-cdk-pipeline/

A deployment pipeline typically comprises several stages such as dev, test, and prod, which ensure that changes undergo testing before reaching the production environment. To improve the reliability and stability of release processes, DevOps teams must review Infrastructure as Code (IaC) changes before applying them in production. As a result, implementing a mechanism for notification and manual approval that grants stakeholders improved access to changes in their release pipelines has become a popular practice for DevOps teams.

Notifications keep development teams and stakeholders informed in real-time about updates and changes to deployment status within release pipelines. Manual approvals establish thresholds for transitioning a change from one stage to the next in the pipeline. They also act as a guardrail to mitigate risks arising from errors and rework because of faulty deployments.

Please note that manual approvals, as described in this post, are not a replacement for the use of automation. Instead, they complement automated checks within the release pipeline.

In this blog post, we describe how to set up notifications and add a manual approval stage to an AWS Cloud Development Kit (AWS CDK) pipeline.

Concepts

CDK Pipeline

CDK Pipelines is a construct library for painless continuous delivery of CDK applications. CDK Pipelines can automatically build, test, and deploy changes to CDK resources. CDK Pipelines are self-mutating, which means that as application stages or stacks are added, the pipeline automatically reconfigures itself to deploy those new stages or stacks. A pipeline needs to be deployed manually only once; afterwards, it keeps itself up to date by pulling the changes pushed to the source code repository.

Notifications

Adding notifications to a pipeline with the NotificationRule construct provides visibility into changes made to the environment. You can also use this rule to notify pipeline users of important events, such as when a pipeline starts execution. A notification rule specifies both the events and the targets, such as an Amazon Simple Notification Service (Amazon SNS) topic or AWS Chatbot clients configured for Slack, which represent the nominated recipients of the notifications. An SNS topic is a logical access point that acts as a communication channel, while Chatbot is an AWS service that enables DevOps and software development teams to use messaging program chat rooms to monitor and respond to operational events.

Manual Approval

In a CDK pipeline, you can incorporate an approval action at a specific stage, where the pipeline should pause, allowing a team member or designated reviewer to manually approve or reject the action. When an approval action is ready for review, a notification is sent out to alert the relevant parties. This combination of notifications and approvals ensures timely and efficient decision-making regarding crucial actions within the pipeline.

Solution Overview

The solution is a simple web service composed of an AWS Lambda function that returns a static web page served by Amazon API Gateway. Since Continuous Integration and Continuous Deployment (CI/CD) are important components of most web projects, the team implements a CDK Pipeline for their web project.

There are two important stages in this CDK pipeline: the Pre-production stage for testing, and the Production stage, which contains the end product for users.

The flow of the CI/CD process to update the website starts when a developer pushes a change to the repository using their Integrated Development Environment (IDE). An Amazon CloudWatch event triggers the CDK Pipeline. Once the changes reach the pre-production stage for testing, the CI/CD process halts. This is because a manual approval gate is between the pre-production and production stages. So, it becomes a stakeholder’s responsibility to review the changes in the pre-production stage before approving them for production. The pipeline includes an SNS notification that notifies the stakeholder whenever the pipeline requires manual approval.

After approving the changes, the CI/CD process proceeds to the production stage and the updated version of the website becomes available to the end user. If the approver rejects the changes, the process ends at the pre-production stage with no impact to the end user.

The following diagram illustrates the solution architecture.

Figure 1. This image shows the CDK pipeline process in our solution and how applications or updates are deployed using AWS Lambda Function to end users.

Prerequisites

For this walkthrough, you should have the following prerequisites:

An AWS account
Install Python version 3.6 or later
A basic understanding of CDK and CDK Pipelines. Please go through the Python Workshop on cdkworkshop.com to follow along with the code examples and get hands-on learning about CDK and related concepts.
Install AWS CDK version 2.73.0 or later
Set up a CDK pipeline, and have a basic understanding of how SNS works.
Since the pipeline stack is being modified, there may be a need to run cdk deploy locally again.
NOTE: The CDK Pipeline code structure used in the CDK workshop can be found here: Pipeline stack Code.

Add notification to the pipeline

In this tutorial, perform the following steps:

Add the import statements for AWS CodeStar notifications and SNS to the import section of pipeline_stack.py.

import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.pipelines as pipelines
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs

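Ensure the pipeline is built by calling the build_pipeline() method. The underlying pipeline.pipeline object used in the following steps is available only after the pipeline has been built.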

pipeline.build_pipeline()

Create an SNS topic.

topic = sns.Topic(self, "MyTopic1")

Add a subscription to the topic. This specifies where the notifications are sent (Add the stakeholders’ email here).

topic.add_subscription(subs.EmailSubscription("[email protected]"))

Define a rule. This contains the source for notifications, the event trigger, and the target.

rule = notifications.NotificationRule(self, "NotificationRule", )

Assign pipeline.pipeline to source. The first pipeline is the variable that holds the CDK pipeline, and .pipeline is the property that exposes the underlying AWS CodePipeline pipeline.

source=pipeline.pipeline,

Define the events to be monitored. Specify notifications for when the pipeline starts, when it fails, when the execution succeeds, and finally when manual approval is needed.

events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed","codepipeline-pipeline-pipeline-execution-succeeded", 
"codepipeline-pipeline-manual-approval-needed"],

For the complete list of supported event types for pipelines, see the AWS documentation for notification rules.

Finally, add the target. The target here is the topic created previously.

targets=[topic]

The combination of all the steps becomes:

pipeline.build_pipeline()
topic = sns.Topic(self, "MyTopic1")
topic.add_subscription(subs.EmailSubscription("[email protected]"))
rule = notifications.NotificationRule(self, "NotificationRule",
source=pipeline.pipeline,
events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed","codepipeline-pipeline-pipeline-execution-succeeded", 
"codepipeline-pipeline-manual-approval-needed"],
targets=[topic]
)
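
Because the event IDs are long strings that are easy to mistype, one option is to keep them in a single mapping and build the events list from short names. This is a minimal sketch: the PIPELINE_EVENTS mapping and the event_ids helper are hypothetical names introduced for illustration, and the mapping contains only the event IDs that appear in this post.

```python
from typing import List

# Event type IDs used in this post, keyed by a short illustrative name.
PIPELINE_EVENTS = {
    "started": "codepipeline-pipeline-pipeline-execution-started",
    "failed": "codepipeline-pipeline-pipeline-execution-failed",
    "succeeded": "codepipeline-pipeline-pipeline-execution-succeeded",
    "approval_needed": "codepipeline-pipeline-manual-approval-needed",
    "approval_succeeded": "codepipeline-pipeline-manual-approval-succeeded",
}


def event_ids(*names: str) -> List[str]:
    """Translate short names into event type IDs, failing fast on typos."""
    unknown = [name for name in names if name not in PIPELINE_EVENTS]
    if unknown:
        raise KeyError(f"Unknown event name(s): {unknown}")
    return [PIPELINE_EVENTS[name] for name in names]


if __name__ == "__main__":
    # The resulting list could be passed as events=... to NotificationRule.
    for event_id in event_ids("started", "failed", "succeeded", "approval_needed"):
        print(event_id)
```

A misspelled short name then raises an error when the stack is synthesized, rather than silently producing a rule that never fires.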

Adding Manual Approval

Add the ManualApprovalStep import to the aws_cdk.pipelines import statement.

from aws_cdk.pipelines import (
CodePipeline,
CodePipelineSource,
ShellStep,
ManualApprovalStep
)

Add the ManualApprovalStep to the production stage by passing it in the pre argument of the add_stage() call.

prod = WorkshopPipelineStage(self, "Prod")
prod_stage = pipeline.add_stage(prod,
    pre=[ManualApprovalStep('PromoteToProduction')])

When a stage is added to a pipeline, you can specify pre and post steps, which are arbitrary steps that run before or after the contents of the stage. You can use them to add validations, such as manual or automated gates, to the pipeline. It is recommended to put manual approval gates in the set of pre steps, and automated approval gates in the set of post steps. Here, the manual approval action is added as a pre step of the production stage, so it runs after the pre-production stage and before the production deployment.
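
To make the ordering concrete, the following plain-Python sketch models the order in which the groups of steps run across the two stages. It does not use the CDK: the Stage dataclass and the execution_order function are hypothetical and exist only to illustrate that a stage's pre steps run before its deployment and its post steps run after. In a real pipeline, steps within the same group may run in parallel; the list shows only the relative order between groups.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Stage:
    """Illustrative stand-in for a pipeline stage with pre and post steps."""
    name: str
    pre: List[str] = field(default_factory=list)
    post: List[str] = field(default_factory=list)


def execution_order(stages: List[Stage]) -> List[str]:
    """Flatten stages into the order: pre steps, deployment, post steps."""
    order: List[str] = []
    for stage in stages:
        order.extend(stage.pre)
        order.append(f"Deploy {stage.name}")
        order.extend(stage.post)
    return order


# Step names mirror the ones used in this post's pipeline stack.
stages = [
    Stage("Pre-Prod", post=["TestViewerEndpoint", "TestAPIGatewayEndpoint"]),
    Stage("Prod", pre=["PromoteToProduction"],
          post=["ViewerEndpoint", "APIGatewayEndpoint"]),
]

if __name__ == "__main__":
    for step in execution_order(stages):
        print(step)
```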

The final version of the pipeline_stack.py becomes:

from constructs import Construct
import aws_cdk as cdk
import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs
from aws_cdk import (
    Stack,
    aws_codecommit as codecommit,
    aws_codepipeline as codepipeline,
    pipelines as pipelines,
    aws_codepipeline_actions as cpactions,
    
)
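from aws_cdk.pipelines import (
    CodePipeline,
    CodePipelineSource,
    ShellStep,
    ManualApprovalStep
)
# Stage class from the CDK workshop; adjust the module path to match your project layout
from cdk_workshop.pipeline_stage import WorkshopPipelineStage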


class WorkshopPipelineStack(cdk.Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)
        
        # Creates a CodeCommit repository called 'WorkshopRepo'
        repo = codecommit.Repository(
            self, "WorkshopRepo", repository_name="WorkshopRepo",
            
        )
        
        # Create the CDK pipeline
        pipeline = pipelines.CodePipeline(
            self,
            "Pipeline",
            
            synth=pipelines.ShellStep(
                "Synth",
                input=pipelines.CodePipelineSource.code_commit(repo, "main"),
                commands=[
                    "npm install -g aws-cdk",  # Installs the cdk cli on Codebuild
                    "pip install -r requirements.txt",  # Instructs Codebuild to install required packages
                    "npx cdk synth",
                ]
                
            ),
        )

        
        # Create the Pre-Prod Stage and its API endpoint
        deploy = WorkshopPipelineStage(self, "Pre-Prod")
        deploy_stage = pipeline.add_stage(deploy)
    
        deploy_stage.add_post(
            
            pipelines.ShellStep(
                "TestViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
            )
    
        
        )
        deploy_stage.add_post(
            pipelines.ShellStep(
                "TestAPIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create the Prod Stage with the Manual Approval Step
        prod = WorkshopPipelineStage(self, "Prod")
        prod_stage = pipeline.add_stage(prod,
            pre = [ManualApprovalStep('PromoteToProduction')])
        
        prod_stage.add_post(
            
            pipelines.ShellStep(
                "ViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
                
            )
            
        )
        prod_stage.add_post(
            pipelines.ShellStep(
                "APIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
            
        )
        
        # Create The SNS Notification for the Pipeline
        
        pipeline.build_pipeline()
        
        topic = sns.Topic(self, "MyTopic")
        topic.add_subscription(subs.EmailSubscription("[email protected]"))
        rule = notifications.NotificationRule(self, "NotificationRule",
            source = pipeline.pipeline,
            events = ["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed", "codepipeline-pipeline-manual-approval-needed", "codepipeline-pipeline-manual-approval-succeeded"],
            targets=[topic]
            )

When a commit is made with git commit -am "Add manual Approval" and changes are pushed with git push, the pipeline automatically self-mutates to add the new approval stage.

Now when the developer pushes changes to update the build environment or the end user application, the pipeline execution stops at the point where the approval action was added. The pipeline won’t resume unless a manual approval action is taken.

Figure 2. This image shows the pipeline with the added Manual Approval action.

Since there is a notification rule that includes the approval action, an email notification is sent with the pipeline information and approval status to the stakeholder(s) subscribed to the SNS topic.

Figure 3. This image shows the SNS email notification sent when the pipeline starts.

After the updates are pushed to the pipeline, the reviewer or stakeholder can use the AWS Management Console to access the pipeline and approve or reject the changes based on their assessment. This review helps catch potential issues or errors before they reach production and ensures that only changes deemed relevant are made.

Figure 4. This image shows the review action that gives the stakeholder the ability to approve or reject any changes.

If a reviewer rejects the action, or if no approval response is received within seven days of the pipeline stopping for the review action, the pipeline status is “Failed.”
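
The outcome rules described above can be summarized in a few lines of plain Python. This is a sketch of the decision logic only, not a call to any AWS API: the approval_outcome function, its parameters, and the returned labels are hypothetical, and the seven-day timeout is the value stated in this post.

```python
from datetime import datetime, timedelta
from typing import Optional

APPROVAL_TIMEOUT = timedelta(days=7)  # timeout stated in this post


def approval_outcome(
    requested_at: datetime,
    now: datetime,
    decision: Optional[str] = None,
) -> str:
    """Return what happens to the pipeline for a given review state.

    decision is "Approved", "Rejected", or None when nobody has responded.
    """
    if decision == "Approved":
        return "Continue"
    if decision == "Rejected":
        return "Failed"
    if now - requested_at >= APPROVAL_TIMEOUT:
        # No response within the timeout is treated like a rejection.
        return "Failed"
    return "Waiting"


if __name__ == "__main__":
    requested = datetime(2023, 8, 1, 9, 0)
    print(approval_outcome(requested, datetime(2023, 8, 3, 9, 0)))
    print(approval_outcome(requested, datetime(2023, 8, 8, 9, 0)))
```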

Figure 5. This image depicts when a stakeholder rejects the action.

If a reviewer approves the changes, the pipeline continues its execution.

Figure 6. This image depicts when a stakeholder approves the action.

Considerations

It is important to consider any potential drawbacks before integrating a manual approval process into a CDK pipeline. One such consideration is that it may delay the delivery of updates to end users. For example, the approval process might be constrained by the availability of stakeholders during business hours, so changes made outside regular working hours can wait until an approver is available.

Clean up

To avoid incurring future charges, delete the resources. Use cdk destroy via the command line to delete the created stack.

Conclusion

Adding notifications and manual approval to CDK Pipelines provides better visibility and control over the changes made to the pipeline environment. These features ideally complement the existing automated checks to ensure that all updates are reviewed before deployment. This reduces the risk of potential issues arising from bugs or errors. The ability to approve or deny changes through the AWS Management Console makes the review process simple and straightforward. Additionally, SNS notifications keep stakeholders updated on the status of the pipeline, ensuring a smooth and seamless deployment process.

Optimizing data with automated intelligent document processing solutions

2023-04-28 Deependra Shekhawat

Post Syndicated from Deependra Shekhawat original https://aws.amazon.com/blogs/architecture/optimizing-data-with-automated-intelligent-document-processing-solutions/

Many organizations struggle to effectively manage and derive insights from the large amount of unstructured data locked in emails, PDFs, images, scanned documents, and more. The variety of formats, document layouts, and text makes it difficult for any standard Optical Character Recognition (OCR) to extract key insights from these data sources.

To help organizations overcome these document management and information extraction challenges, AWS offers connected, pre-trained artificial intelligence (AI) service APIs that help drive business outcomes from these document-based rich data sources.

This blog post describes a cost-effective, scalable automated intelligent document processing solution that leverages a Natural Language Processing (NLP) engine using Amazon Textract and Amazon Comprehend. This solution helps customers take advantage of industry-leading machine learning (ML) technology in their document workflows without the need for in-house ML expertise.

Customer document management challenges

Customers across industry verticals experience the following document management challenges:

Extraction process accuracy varies significantly when applied to diverse sources; specifically handwritten text, images, and scanned documents.
Existing scripting and rule-based solutions cannot provide customer domain or problem-specific classifiers.
Traditional document management systems cannot consider feedback from domain experts to improve the learning process.
Personally Identifiable Information (PII) data handling is not robust or customizable, causing data privacy leakage concerns.
Many manual interventions are required to complete the entire process.

Automated intelligent document processing solution

We introduced an automated intelligent document processing implementation to address key document management challenges. At the heart of the solution is an NLP engine that combines:

Amazon Textract
Amazon Comprehend
Amazon SageMaker
Custom regular expression-based Python parser

The full solution also leverages other AWS services as described in the following diagram (Figure 1) and steps to develop and operate a cost-effective and scalable architecture for document processing. It effectively extracts text from document types including PDFs, images, scanned documents, Microsoft Excel workbooks, and more.

Figure 1: AI-based intelligent document processing engine

Solution overview

Let’s explore the automated intelligent document processing solution step by step.

The document upload engine or business users upload the respective files or documents through a custom web application to the designated Amazon Simple Storage Service (Amazon S3) bucket.
The event-based architecture signals an Amazon S3 push event to invoke the respective AWS Lambda function to start document pre-processing.
The Lambda function evaluates the document payload, leverages Amazon Simple Queue Service (Amazon SQS) for async processing, prepares document metadata, stores it in Amazon DynamoDB, and calls the NLP engine to perform the information extraction process.
The NLP engine leverages Amazon Textract for text extraction from a variety of sources and leverages document metadata to optimize the appropriate API calls (for example, form, tabular, or PDF).
- Amazon Textract output is fed into Amazon Comprehend which consumes the extracted text and performs entity parsing, line/paragraph-based sentiment analysis, and document/paragraph classification. For better accuracy, we leverage a custom classifier within Amazon Comprehend.
- Amazon Comprehend also provides key APIs to mask PII data before it is used for any further consumption. The solution offers the ability to configure masking rules for each PII entity per masking requirements.
- To ensure the solution can handle data from Microsoft Excel workbooks, we developed a custom parser using Python running inside an AWS Lambda function. This function is invoked depending on the document metadata.
Output of Amazon Comprehend is then fed to ML models deployed using Amazon SageMaker depending on additional use cases configured by the customer to complement the overall process with ML-based recommendations, predictions, and personalization.
Once the NLP engine completes its processing, the job completion notification event signals another AWS Lambda function and updates the status in the respective Amazon SQS queue.
The Lambda post-processing function parses the resultant content generated by the NLP engine and stores it in Amazon DynamoDB and an Amazon S3 bucket. This step is responsible for the required data augmentation, key entity validation, and default value assignment to create a data structure that can be consumed by the presentation/visualization layer.
Users get the flexibility to see the extracted information and compare it with the original document extract in the custom user interface (UI). They can provide their feedback on extraction and entity parsing accuracy. From a user access management perspective, Amazon Cognito provides authorization and authentication.
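
As an illustration of the configurable PII masking mentioned in the steps above, the sketch below applies per-entity masking rules to text using entity offsets. It assumes entities shaped like the items in the Amazon Comprehend DetectPiiEntities response (Type, BeginOffset, EndOffset). The MASKING_RULES table, the rule names, and the sample entities are hypothetical and hard-coded so that the example runs without calling AWS; this is not the solution's actual implementation.

```python
from typing import Dict, List

# Hypothetical masking configuration: one rule per PII entity type.
MASKING_RULES: Dict[str, str] = {
    "EMAIL": "redact",  # replace the value with its entity type label
    "PHONE": "last4",   # keep only the last four characters
}
DEFAULT_RULE = "keep"   # entity types without a rule are left untouched


def mask_value(value: str, entity_type: str, rule: str) -> str:
    """Apply a single masking rule to one detected value."""
    if rule == "redact":
        return f"[{entity_type}]"
    if rule == "last4":
        return "*" * max(len(value) - 4, 0) + value[-4:]
    return value


def mask_pii(text: str, entities: List[dict]) -> str:
    """Mask every entity in text according to MASKING_RULES."""
    masked = text
    # Work from the end of the text so earlier offsets stay valid.
    for entity in sorted(entities, key=lambda e: e["BeginOffset"], reverse=True):
        begin, end = entity["BeginOffset"], entity["EndOffset"]
        rule = MASKING_RULES.get(entity["Type"], DEFAULT_RULE)
        replacement = mask_value(masked[begin:end], entity["Type"], rule)
        masked = masked[:begin] + replacement + masked[end:]
    return masked


sample_text = "Contact jane@example.com or 555-0100."
sample_entities = [
    {"Type": "EMAIL", "BeginOffset": 8, "EndOffset": 24},
    {"Type": "PHONE", "BeginOffset": 28, "EndOffset": 36},
]

if __name__ == "__main__":
    print(mask_pii(sample_text, sample_entities))
```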

Customer benefits

The automated intelligent document processing solution helps customers:

Increase overall document management efficiency by 50-60%, leveraging automation and nullifying manual interventions
Reduce in-house team involvement in administrative activities by up to 70% using integrated and connected processing workflows
Gain better visibility into key contractual obligations with features such as Document Classification (helps properly route documents to the respective process/team) and Obligation Extraction
Utilize a UI-based feedback mechanism for in-house domain experts/reviewers to see and validate the extracted information and offer feedback to inform further model training

From a cost-optimization perspective, depending on document type and required information, only the respective Amazon Textract API calls are submitted. (For example, it is not worth using form/table-based Textract API calls for a Know Your Customer (KYC) document such as a driver’s license or passport when the AnalyzeID API is the most efficient solution.)
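
The routing idea can be sketched as a small lookup from document metadata to a Textract operation. AnalyzeID, AnalyzeDocument (with the FORMS and TABLES feature types), and DetectDocumentText are Textract operations; the document type labels, the choose_textract_call function, and the choice of DetectDocumentText as the fallback are assumptions made for this example rather than details of the solution.

```python
from typing import Dict, List, Optional, Tuple

# (operation, feature types). None means the document bypasses Textract.
Route = Tuple[Optional[str], List[str]]

# Hypothetical document type labels mapped to Textract operations.
ROUTES: Dict[str, Route] = {
    "identity": ("AnalyzeID", []),              # driver's license, passport
    "form": ("AnalyzeDocument", ["FORMS"]),
    "table": ("AnalyzeDocument", ["TABLES"]),
    "plain_text": ("DetectDocumentText", []),
    "excel": (None, []),                        # handled by the custom parser
}


def choose_textract_call(document_type: str) -> Route:
    """Pick the operation for a document type, defaulting to plain text detection."""
    operation, features = ROUTES.get(document_type, ("DetectDocumentText", []))
    return operation, list(features)  # copy so callers cannot mutate ROUTES


if __name__ == "__main__":
    for doc_type in ("identity", "form", "excel", "unknown"):
        print(doc_type, choose_textract_call(doc_type))
```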

To maximize solution benefits, customers should invest time in building well-defined taxonomies ahead of using the document processing solution to accommodate their own use cases or industry domain-specific requirements. Their taxonomy input highlights only relevant keys and determines the actions to take when required keys are not extracted.
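
A taxonomy check of this kind might look like the following minimal sketch. The taxonomy format (a key name mapped to whether the key is required), the sample keys, and the check_against_taxonomy function are hypothetical and meant only to show the idea of keeping relevant keys and flagging missing required ones.

```python
from typing import Dict, List, Tuple


def check_against_taxonomy(
    extracted: Dict[str, str],
    taxonomy: Dict[str, bool],
) -> Tuple[Dict[str, str], List[str]]:
    """Keep only keys named in the taxonomy and list required keys that are missing."""
    relevant = {key: value for key, value in extracted.items() if key in taxonomy}
    missing = [key for key, required in taxonomy.items()
               if required and key not in relevant]
    return relevant, missing


# Hypothetical taxonomy for an insurance claim: key name -> is it required?
claim_taxonomy = {"claim_id": True, "policy_number": True, "incident_date": False}

extracted_keys = {
    "claim_id": "C-1001",
    "incident_date": "2023-01-15",
    "page_footer": "Page 1 of 3",  # not in the taxonomy, so it is dropped
}

if __name__ == "__main__":
    relevant, missing = check_against_taxonomy(extracted_keys, claim_taxonomy)
    print(relevant)
    print(missing)
```

The list of missing keys is what a downstream step could use to trigger a follow-up action, such as routing the document to a reviewer.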

Vertical industry use cases

As mentioned, this document processing solution can be used across industry segments. Let’s explore some practical use cases. For example, it can help insurance industry professionals to accelerate claim processing and customer KYC-related processes. By extracting the key entities from the claim documents, mapping them against the customer defined taxonomy, and integrating with Amazon SageMaker models for anomaly detection (anomalous claims), insurance providers can improve claim management and customer satisfaction.

In the healthcare industry, the solution can help with medical records and report processing, key medical entity extraction, and customer data masking.

The document processing solution can help the banking industry by automating check processing and delivering the ability to extract key entities like payer, payee, date, and amount from the checks.

Conclusion

Manual document processing is resource-intensive, time-consuming, and costly. Customers need to allocate resources to process large volumes of documents, lowering business agility. Their employees are performing manual “stare and compare” tasks, potentially reducing worker morale and preventing them from focusing where their efforts are better placed.

Intelligent document processing helps businesses overcome these challenges by automating the classification, extraction, and analysis of data. This expedites decision cycles, allocates resources to high-value tasks, and reduces costs.

Pre-trained APIs of AWS AI services allow for quick classification, extraction, and analysis of data from scores of documents. This solution also has industry-specific features that can quickly process specialized documents. This blog post discussed the foundational architecture that helps accelerate the implementation of any specific document processing use case.

Kids’ coding languages

2023-04-28 Marc Scott

Post Syndicated from Marc Scott original https://www.raspberrypi.org/blog/kids-coding-languages/

Programming is becoming an increasingly useful skill in today’s society. As we continue to rely more and more on software and digital technology, knowing how to code is also more and more valuable. That’s why many parents are looking for ways to introduce their children to programming. You might find it difficult to know where to begin, with so many different kids’ coding languages and platforms available. In this blog post, we explore how children can progress through different programming languages to realise their potential as proficient coders and creators of digital technology.

Two kids share their Scratch coding project on a laptop.

ScratchJr

Everyone needs to start somewhere, and one great option for children aged 5–7 is ScratchJr (Scratch Junior), a visual programming language with drag-and-drop blocks for creating simple programs. ScratchJr is available for free on Android and iOS mobile devices. It’s great for introducing young children to the basics of programming, and they can use it to create interactive stories and games.

Scratch

Moving on from ScratchJr, there’s its web-based sibling Scratch. Scratch offers drag-and-drop blocks for creating programs and comes with an assortment of graphics, sounds, and music for your child to bring their programs to life. This visual programming language is designed specifically for children to learn programming fundamentals. Scratch is available in multiple spoken languages and is perfect for beginners. It allows kids to create interactive stories, animations, and games with ease.

The Raspberry Pi Foundation has a wealth of free Scratch resources we have created specifically for young people who are beginners, such as the ‘Introduction to Scratch’ project path. And if your child is interested in physical computing to interact with the real world using code, they can also learn how to use electronic components, such as buzzers and LEDs, with Scratch and a Raspberry Pi computer.

Young person using a laptop to code in Scratch, our favourite of all kids' coding languages.

MakeCode

Another fun option for children who want to explore coding and physical computing is the micro:bit. This is a small programmable device with an LED display, buttons, and sensors, and it can be used to create games, animations, interactive projects, and lots more. To control a micro:bit, a visual programming language called MakeCode can be used. The micro:bit can also be programmed using Scratch or text-based languages such as Python, offering an easy transition for children as their coding skills progress. Have a look at our free collection of micro:bit resources to learn more.

HTML

Everyone is familiar with websites, but fewer people know how they are coded. HTML is a markup language that is used to create the webpages we use every day. It’s a great language for children to learn because they can see the results of their code in real time, in their web browser. They can use HTML and CSS to create simple webpages that include links, videos, pictures, and interactive elements, all the while learning how websites are structured and designed. We have many free web design resources for your child, including a basic ‘Introduction to web development’ project path.

Python

If your child is becoming confident with Scratch and HTML, then using Python is the recommended next stage in their learning. Python is a high-level text-based programming language that is easy to read and learn. It is a popular choice for beginners as it has a simple syntax that often reads like plain English. Many free Python projects for young people are available on our website, including the ‘Introduction to Python’ path.
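
As a small illustration of that readability, here is the kind of short program a beginner might write. The list of animals and the function name are made up for this example.

```python
def describe(animals):
    """Build one sentence for each animal in the list."""
    sentences = []
    for animal in animals:
        if animal == "cat":
            sentences.append("The cat says meow.")
        else:
            sentences.append("The " + animal + " is quiet.")
    return sentences


for sentence in describe(["cat", "rabbit"]):
    print(sentence)
```

Even without knowing Python, most readers can follow what the loop and the if statement are doing.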

The Python community is also really welcoming and has produced a myriad of online tutorials and videos to help learners explore this language. Python can be used to do some very powerful things with ease, which is why it is so popular. For example, it is relatively simple to create Python programs to engage in machine learning and data analysis. If you wanted to explore large language models such as GPT, on which the ChatGPT chatbot is based, then Python would be the language of choice.

JavaScript

JavaScript is the language of the web, and if your child has become proficient in HTML, then this is the next language for them. JavaScript is used to create interactive websites and web applications. As young people become more comfortable with programming, JavaScript is a useful language to progress to, given how ubiquitous the web is today. It can be tricky to learn, but like Python, it has a vast number of libraries of functions that people have already created for it to achieve things more quickly. These libraries make JavaScript a very powerful language to use.

Try out kids’ coding languages

There are many different programming languages, and each one has its own strengths and weaknesses. Some are easy to learn and use, some are really fast, and some are very secure.

Two kids coding together on Code Club World.

Starting with visual languages such as Scratch or MakeCode allows your child to begin to understand the basic concepts of programming without needing any developed reading and keyboard skills. Once their understanding and skills have improved, they can try out text-based languages, find the one that they are comfortable with, and then continue to learn. It’s fairly common for people who are proficient in one programming language to learn other languages quite quickly, so don’t worry about which programming language your child starts with.

Whether your child is interested in working in software development or just wants to learn a valuable — and creative — skill, helping them learn to code and try out different kids’ coding languages is a great way for you to open up new opportunities for them.

The post Kids’ coding languages appeared first on Raspberry Pi Foundation.

Test our new Code Editor for young people

2023-04-05 Phil Howell

Post Syndicated from Phil Howell original https://www.raspberrypi.org/blog/code-editor-beta-testing/

We are building a new online text-based Code Editor to help young people aged 7 and older learn to write code. It’s free and designed for young people who attend Code Clubs and CoderDojos, students in schools, and learners at home.

The interface of the beta version of the Raspberry Pi Foundation's Code Editor. — The Code Editor interface

At this stage of development, the Code Editor enables learners to:

Write and run Python code right in their browser, with no setup required. The interface is simple and intuitive, which makes getting started with text-based coding easier.
Save their code using their Raspberry Pi Foundation account. We want learners to be able to continue at home with projects they start in the classroom, or bring a project they’ve started at home to their coding club.

A young person at a CoderDojo uses the Raspberry Pi Foundation's Code Editor.

We’ve chosen Python as the first programming language our Code Editor supports because it is popular in schools, CoderDojos, and Code Clubs. Many educators and young people like Python because they see it as similar to the English language. It is often the text-based language young people learn when they take their first steps away from a block-based programming environment, such as Scratch.

Python is also widely used by professional programmers and usually tops at least one of the industry-standard indexes that ranks programming languages.

We will be adding support for web development languages (HTML/CSS/JavaScript) to the Editor in the near future.

We’re also planning to add features such as project sharing and collaboration, which we know young people will love. We want the Editor to be safe, accessible, and age-appropriate. As safeguarding is always at the core of what we do, we’ll only make new features available once we’ve ensured they comply with the ICO’s age-appropriate design code and our safeguarding policies.

Test the Code Editor and tell us what you think

We are inviting you to test the Code Editor as part of what we call the beta phase of development. As the Editor is still in development, some things might not look or work as well as we’d like — and this is why we need your help.

A text output in the beta version of the Raspberry Pi Foundation's Code Editor. — Text output in the Code Editor

We’d love you to try the Editor out and let us know what worked well for you, what didn’t work well, and what you’d like to see next.

You can now try out the Code Editor in the first two projects of our ‘Intro to Python’ path. We’ve included a feedback form for you to let us know which project you tried, and what you think of the Editor. We’d love to hear from you.

I want to try the Code Editor

Your feedback helps us decide what to do next. Based on what learners, educators, volunteers, teachers, and parents tell us, we will make the improvements to the Editor that matter most to the young people we aim to support.

Where next for the Code Editor?

One of our long-term goals is to engage millions of young people in learning about computing and how to create with digital technologies. We’re developing the Code Editor with three main aims in mind.

1. Supporting young people’s learning journeys

We aim to build the Code Editor so it:

Suits beginners and also supports them as their confidence and independence grows, so they can take on their own coding projects in a familiar environment
Helps learners to transition from block-based to text-based, informed by our deep understanding of pedagogy and computing education
Brings together project instructions and code editing in a single interface so that young people do not have to switch screens, which makes coding easier

2. Removing barriers to accessing computing education

Our work on the Code Editor will:

Ensure it works well on mobile and tablet devices, and low-cost computers including the Raspberry Pi 4 2GB
Support localisation and translation, so we can tailor the Editor for the needs of young people all over the world

3. Making learning to program engaging for more young people

We want to offer a Code Editor that:

Enables young people to build a wide variety of projects because it supports graphical user interface output and supplies images and sprites for use in multimedia projects

We’re also planning on making the Editor available as an open source project so that other projects and organisations focussed on helping people learn to code can benefit. More on this soon.

Our work on the Code Editor has been funded by the Algorand Foundation and Endless, and we thank them for their generous support. If you are interested in partnering with us to fund this key work, please reach out to us via email.

The post Test our new Code Editor for young people appeared first on Raspberry Pi Foundation.

Serverless ICYMI Q1 2023

2023-04-03 Julian Wood

Post Syndicated from Julian Wood original https://aws.amazon.com/blogs/compute/serverless-icymi-q1-2023/

Welcome to the 21st edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!

In case you missed our last ICYMI, check out what happened last quarter here.

Artificial intelligence (AI) technologies, ChatGPT, and DALL-E are creating significant interest in the industry at the moment. Find out how to integrate serverless services with ChatGPT and DALL-E to generate unique bedtime stories for children.

Example notification of a story hosted with Next.js and App Runner

Serverless Land is a website maintained by the Serverless Developer Advocate team to help you build serverless applications and includes workshops, code examples, blogs, and videos. There is now enhanced search functionality so you can search across resources, patterns, and video content.

ServerlessLand search

AWS Lambda

AWS Lambda has improved how concurrency works with Amazon SQS. You can now control the maximum number of concurrent Lambda functions invoked.

The launch blog post explains the scaling behavior of Lambda using this architectural pattern, challenges this feature helps address, and a demo of maximum concurrency in action.

Maximum concurrency is set to 10 for the SQS queue.
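
As a rough illustration of the setting shown above, here is a minimal sketch of how maximum concurrency might be applied with boto3. The queue ARN, function name, and helper names are hypothetical. `ScalingConfig` with `MaximumConcurrency` is the event source mapping parameter for this feature, and the 2 to 1,000 range is the limit as we understand it at launch, so check the current Lambda documentation before relying on it.

```python
def build_sqs_mapping_params(queue_arn, function_name, max_concurrency=10):
    """Build keyword arguments for Lambda's create_event_source_mapping call."""
    # Assumed limits: values from 2 to 1000 for SQS event sources.
    if not 2 <= max_concurrency <= 1000:
        raise ValueError("max_concurrency must be between 2 and 1000")
    return {
        "EventSourceArn": queue_arn,
        "FunctionName": function_name,
        "ScalingConfig": {"MaximumConcurrency": max_concurrency},
    }


def create_mapping(params):
    """Send the request. Needs boto3 and AWS credentials, so it is not called here."""
    import boto3  # imported lazily so the sketch loads without the AWS SDK

    return boto3.client("lambda").create_event_source_mapping(**params)


# Hypothetical ARN and function name, for illustration only.
params = build_sqs_mapping_params(
    "arn:aws:sqs:us-east-1:123456789012:example-queue", "example-function"
)
print(params["ScalingConfig"])
```

Keeping the parameter construction separate from the AWS call makes the limit check easy to exercise without touching a real account.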

AWS Lambda Powertools is an open-source library to help you discover and incorporate serverless best practices more easily. Lambda Powertools for .NET is now generally available and currently focused on three observability features: distributed tracing (Tracer), structured logging (Logger), and asynchronous business and application metrics (Metrics). Powertools is also available for Python, Java, and Typescript/Node.js programming languages.

To learn more:

Watch AWS Lambda Powertools for .NET on Serverless office hours

Lambda announced a new feature, runtime management controls, which provide more visibility and control over when Lambda applies runtime updates to your functions. The runtime controls are optional capabilities for advanced customers that require more control over their runtime changes. You can now specify a runtime management configuration for each function with one of three settings: Automatic (default), Function update, or Manual.

There are three new Amazon CloudWatch metrics for asynchronous Lambda function invocations: AsyncEventsReceived, AsyncEventAge, and AsyncEventsDropped. You can track the asynchronous invocation requests sent to Lambda functions to monitor any delays in processing and take corrective actions if required. The launch blog post explains the new metrics and how to use them to troubleshoot issues.

Lambda now supports Amazon DocumentDB change streams as an event source. You can use Lambda functions to process new documents, track updates to existing documents, or log deleted documents. You can use any programming language that is supported by Lambda to write your functions.

There is a helpful blog post suggesting best practices for developing portable Lambda functions that allow you to port your code to containers if you later choose to.

AWS Step Functions

AWS Step Functions has expanded its AWS SDK integrations with support for 35 additional AWS services including Amazon EMR Serverless, AWS Clean Rooms, AWS IoT FleetWise, AWS IoT RoboRunner and 31 other AWS services. In addition, Step Functions also added support for 1000+ new API actions from new and existing AWS services such as Amazon DynamoDB and Amazon Athena. For the full list of added services, visit AWS SDK service integrations.

Amazon EventBridge

Amazon EventBridge has launched the AWS Controllers for Kubernetes (ACK) for EventBridge and Pipes. This allows you to manage EventBridge resources, such as event buses, rules, and pipes, using the Kubernetes API and resource model (custom resource definitions).

EventBridge event buses now also support enhanced integration with Service Quotas. Your quota increase requests for limits such as PutEvents transactions-per-second, number of rules, and invocations per second among others will be processed within one business day or faster, enabling you to respond quickly to changes in usage.

AWS SAM

The AWS Serverless Application Model (SAM) Command Line Interface (CLI) has added the sam list command. You can now show resources defined in your application, including the endpoints, methods, and stack outputs required to test your deployed application.

AWS SAM has a preview of sam build support for building and packaging serverless applications developed in Rust. You can use cargo-lambda in the AWS SAM CLI build workflow and AWS SAM Accelerate to iterate on your code changes rapidly in the cloud.

You can now use AWS SAM connectors as a source resource parameter. Previously, you could only define AWS SAM connectors as an AWS::Serverless::Connector resource. Now you can add the resource attribute on a connector’s source resource, which makes templates more readable and easier to update over time.

AWS SAM connectors now also support multiple destinations to simplify your permissions. You can now use a single connector between a single source resource and multiple destination resources.

In October 2022, AWS released OpenID Connect (OIDC) support for AWS SAM Pipelines. This improves your security posture by creating integrations that use short-lived credentials from your CI/CD provider. There is a new blog post on how to implement it.

Find out how best to build serverless Java applications with the AWS SAM CLI.

AWS App Runner

AWS App Runner now supports retrieving secrets and configuration data stored in AWS Secrets Manager and AWS Systems Manager (SSM) Parameter Store in an App Runner service as runtime environment variables.

App Runner also now supports incoming requests based on the HTTP/1.0 protocol, and has added service-level concurrency, CPU, and memory utilization metrics.

Amazon S3

Amazon S3 now automatically applies default encryption to all new objects added to S3, at no additional cost and with no impact on performance.

You can now use an S3 Object Lambda Access Point alias as an origin for your Amazon CloudFront distribution to tailor or customize data to end users. For example, you can resize an image depending on the device that an end user is visiting from.

S3 has introduced Mountpoint for S3, a high performance open source file client that translates local file system API calls to S3 object API calls like GET and LIST.

S3 Multi-Region Access Points now support datasets that are replicated across multiple AWS accounts. They provide a single global endpoint for your multi-region applications, and dynamically route S3 requests based on policies that you define. This helps you to more easily implement multi-Region resilience, latency-based routing, and active-passive failover, even when data is stored in multiple accounts.

Amazon Kinesis

Amazon Kinesis Data Firehose now supports streaming data delivery to Elastic. This is an easier way to ingest streaming data to Elastic and consume the Elastic Stack (ELK Stack) solutions for enterprise search, observability, and security without having to manage applications or write code.

Amazon DynamoDB

Amazon DynamoDB now supports table deletion protection to protect your tables from accidental deletion when performing regular table management operations. You can set the deletion protection property for each table, which is set to disabled by default.
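
A minimal sketch of turning the property on for an existing table, assuming the boto3 `update_table` call with its `DeletionProtectionEnabled` parameter. The table name is hypothetical, and `RecordingClient` is a stand-in for `boto3.client("dynamodb")` so the example runs without AWS credentials.

```python
def enable_deletion_protection(dynamodb_client, table_name):
    """Turn on deletion protection for one table."""
    return dynamodb_client.update_table(
        TableName=table_name,
        DeletionProtectionEnabled=True,
    )


class RecordingClient:
    """Stand-in for a real DynamoDB client: records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def update_table(self, **kwargs):
        self.calls.append(kwargs)
        return {"TableDescription": {"DeletionProtectionEnabled": True}}


client = RecordingClient()
enable_deletion_protection(client, "example-table")  # hypothetical table name
print(client.calls)
```

In a real script you would pass `boto3.client("dynamodb")` in place of the recording client.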

Amazon SNS

Amazon SNS now supports AWS X-Ray active tracing to visualize, analyze, and debug application performance. You can now view traces that flow through Amazon SNS topics to destination services, such as Amazon Simple Queue Service, Lambda, and Kinesis Data Firehose, in addition to traversing the application topology in Amazon CloudWatch ServiceLens.

SNS also now supports setting content-type request headers for HTTPS notifications so applications can receive their notifications in a more predictable format. Topic subscribers can create a DeliveryPolicy that specifies the content-type value that SNS assigns to their HTTPS notifications, such as application/json, application/xml, or text/plain.

EDA Visuals collection added to Serverless Land

The Serverless Developer Advocate team has extended Serverless Land and introduced EDA visuals. These are small, bite-sized visuals to help you understand concepts and patterns in event-driven architectures. Find out about batch processing vs. event streaming, commands vs. events, message queues vs. event brokers, and point-to-point messaging. Discover bounded contexts, migrations, idempotency, claims, enrichment and more!

EDA Visuals

To learn more:

Watch EDA visually explained on Serverless office hours

Serverless Repos Collection on Serverless Land

There is also a new section on Serverless Land containing helpful code repositories. You can search for code repos to use for examples, learning or building serverless applications. You can also filter by use-case, runtime, and level.

Serverless Repos Collection

Serverless Blog Posts

Videos

Serverless Office Hours – Tues 10AM PT

Weekly office hours live stream. In each session we talk about a specific topic or technology related to serverless and open it up to helping you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.

YouTube: https://youtube.com/serverlessland
Twitch: https://twitch.tv/aws
LinkedIn: https://linkedin.com/company/serverlessland

January

Jan 10 – Building .NET 7 high performance Lambda functions

Jan 17 – Amazon Managed Workflows for Apache Airflow at Scale

Jan 24 – Using Terraform with AWS SAM

Jan 31 – Preparing your serverless architectures for the big day

February

Feb 07 – Visually design and build serverless applications

Feb 14 – Multi-tenant serverless SaaS

Feb 21 – Refactoring to Serverless

Feb 28 – EDA visually explained

March

Mar 07 – Lambda cookbook with Python

Mar 14 – Succeeding with serverless

Mar 21 – Lambda Powertools .NET

Mar 28 – Server-side rendering micro-frontends

FooBar Serverless YouTube channel

Marcia Villalba frequently publishes new videos on her popular serverless YouTube channel. You can view all of Marcia’s videos at https://www.youtube.com/c/FooBar_codes.

January

Jan 12 – Serverless Badge – A new certification to validate your Serverless Knowledge

Jan 19 – Step functions Distributed map – Run 10k parallel serverless executions!

Jan 26 – Step Functions Intrinsic Functions – Do simple data processing directly from the state machines!

February

Feb 02 – Unlock the Power of EventBridge Pipes: Integrate Across Platforms with Ease!

Feb 09 – Amazon EventBridge Pipes: Enrichment and filter of events Demo with AWS SAM

Feb 16 – AWS App Runner – Deploy your apps from GitHub to Cloud in Record Time

Feb 23 – AWS App Runner – Demo hosting a Node.js app in the cloud directly from GitHub (AWS CDK)

March

Mar 02 – What is Amazon DynamoDB? What are the most important concepts? What are the indexes?

Mar 09 – Choreography vs Orchestration: Which is Best for Your Distributed Application?

Mar 16 – DynamoDB Single Table Design: Simplify Your Code and Boost Performance with Table Design Strategies

Mar 23 – 8 Reasons You Should Choose DynamoDB for Your Next Project and How to Get Started

Sessions with SAM & Friends

AWS SAM & Friends

Eric Johnson is exploring how developers are building serverless applications. We spend time talking about AWS SAM as well as others like AWS CDK, Terraform, Wing, and AMPT.

Feb 16 – What’s new with AWS SAM

Feb 23 – AWS SAM with AWS CDK

Mar 02 – AWS SAM and Terraform

Mar 10 – Live from ServerlessDays ANZ

Mar 16 – All about AMPT

Mar 23 – All about Wing

Mar 30 – SAM Accelerate deep dive

Still looking for more?

The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.

You can also follow the Serverless Developer Advocacy team on Twitter to see the latest news, follow conversations, and interact with the team.

Eric Johnson: @edjgeek
James Beswick: @jbesw
Ben Smith: @benjamin_l_s
Julian Wood: @julian_wood
Marcia Villalba: @mavi888uy
David Boyne: @boyney123

Why Python keeps growing, explained

2023-03-02 Rizel Scarlett

Post Syndicated from Rizel Scarlett original https://github.blog/2023-03-02-why-python-keeps-growing-explained/

Which programming language has been around for more than three decades and continues to grow in popularity each year?

If you guessed Python, you nailed it. In the 2022 Octoverse report, we found that Python remains the second most-used programming language on GitHub. Interestingly, Python’s use grew more than 22 percent year over year with more than four million developers on GitHub using it at some point in 2022.

In this article, we’ll dive into a brief history of Python, its benefits, and its use cases, and seek to answer why a programming language conceived in the 1980s continues to dominate development. And, since this is GitHub, we’ll also offer a few useful tips and tricks for developers both new to and experienced in Python.

So, what is Python?

Python is a high-level, interpreted programming language with a simple syntax, which makes it easily readable and extremely user- and beginner-friendly. Originally built to satisfy Guido van Rossum’s desire for a programming language that was simple to use and beautiful to look at, Python was first released to the world in 1991.

Fun fact: Python was named after the BBC TV show, “Monty Python’s Flying Circus.”

Since its development, it has grown to have widespread applicability for developers, data scientists, researchers, and more. But how, you may ask, can a coding language be simple and beautiful to look at? Here’s some proof:

Python

print("Hello world.")

vs.

Java

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello world.");
    }
}

Since Python is a general-purpose language, it can be used in a variety of applications, and its uncomplicated nature makes it an excellent language for automating tasks, building websites or software, and analyzing data.

Python also has several other characteristics that make it popular amongst developers and engineers. These include:

It’s easy to read. Python code uses English keywords rather than punctuation, and its line breaks help define the code blocks. In practice, this means you can identify what the code is designed to do simply by looking at it.
It’s open source. You can download the source code, modify it, and use it however you want.
It’s portable. Some languages require you to modify code to run on different platforms, but Python is a cross-platform language, which means you can run the same code on any operating system with a Python interpreter.
It’s extendable. Python can be extended with modules written in other languages (such as C or C++), so users can add low-level modules to the Python interpreter to customize and optimize their tools.
It has a broad standard library. This library is available for anyone to access and means that users don’t have to write code for every single function—they can access built-in modules that help with issues in everyday programming and more.

What is Python commonly used for?

Python can be used for just about anything, from web and software development to machine learning and artificial intelligence (AI). Let’s take a look at some of its most common use cases.

# Importing the module is the whole joke: it opens an xkcd comic in your browser.
import antigravity

Run this one-line script to check out an inside joke among Python developers.

Using Python for web and software development

Python is a popular language for web and software development because you can create complex, multi-protocol applications while maintaining concise, readable syntax. In fact, some of the most popular applications were built with Python. Plus, Python’s open source community provides developers with an extensive amount of reusable code, frameworks, and support. Case in point: Django, one of the most-used Python frameworks, was designed by experienced developers to help others accelerate their application build times and avoid issues that might block their progress.

Using Python for task automation

One of Python’s key benefits is its ability to automate manual, repetitive tasks. With Python, you can learn how to automate just about anything by using either built-in modules or pre-written code from its robust library. Or you can write your own custom scripts to perform specific actions. For example, you can easily automate emails with the “smtplib” module or copy files with the “shutil” module. Python also has a robust set of testing frameworks, which makes it an excellent language for test automation. Frameworks such as Pytest, Behave, and Robot allow developers to write simple yet effective tests to ensure the quality of their builds.
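
To make the file-copying case concrete, here is a small sketch using only the standard library. The function name, folder names, and file names are made up for illustration, and the demo runs inside a temporary directory so nothing permanent is written to disk.

```python
import shutil
import tempfile
from pathlib import Path


def back_up_text_files(source_dir, backup_dir):
    """Copy every .txt file from source_dir into backup_dir and return the copied names."""
    source, backup = Path(source_dir), Path(backup_dir)
    backup.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.glob("*.txt")):
        shutil.copy2(path, backup / path.name)  # copy2 also preserves timestamps
        copied.append(path.name)
    return copied


# Demonstrate on throwaway directories that are deleted when the block exits.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "reports"
    src.mkdir()
    (src / "jan.txt").write_text("January report")
    (src / "feb.txt").write_text("February report")
    (src / "notes.md").write_text("not a text report")
    copied_names = back_up_text_files(src, Path(tmp) / "backup")
    print(copied_names)
```

Pointing the function at a real folder and scheduling the script is one simple way to replace a manual, repetitive copy task.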

Using Python for machine learning and data science

Here’s a fun fact: Python is the top language of choice for data science and research. Since its syntax is easily understandable and adaptable, people with little-to-no development experience can easily learn Python and use it to manipulate data for research, reporting, predictive or regression analyses, and more. Python is also one of the top languages for training machine learning (ML) models. Through specific algorithms, these models can analyze and identify patterns in data to make predictions or decisions based on that data. They also constantly evolve based on outputs of previous datasets to confront new variables. Collecting and parsing data can be a time-consuming task for data scientists, so those who train ML models often use libraries such as NumPy, Pandas, and Matplotlib to automate functions like cleaning, data transformation, and visualization.

Using Python for financial analysis

Similar to how Python can assist data scientists with the heavy lift of large data sets, Python is widely used in the financial industry to quickly perform complex computations. Stock markets generate huge amounts of data, and Python can be used to import data on stock prices and generate strategies through algorithms to identify trading opportunities. The language can also be used for portfolio optimization, risk management, financial modeling and visualization, cryptocurrency analysis, and even fraud detection.

Using Python for artificial intelligence

Python can also be found in some of the most complex artificial intelligence (AI) technologies, and it’s actually one of the preferred languages for AI. Python’s concise and readable code allows developers to create consistent, reliable systems, and its vast library provides a number of frameworks like PyBrain, which offers developers powerful algorithms for machine learning tasks. Plus, Python’s visualization capabilities can help convert the large datasets used for AI or ML into comprehensible graphs or reports. Interestingly enough, OpenAI, the artificial intelligence research lab, uses the Python framework PyTorch as its standard framework for deep learning to train its AI systems.

Why is Python so popular?

In addition to its relative simplicity to learn, there are a few other reasons why Python continues to consistently grow in popularity. These include:

It’s more productive. Compared to more complex programming languages like C++, Python’s syntax allows users to do more with less, cutting down on the time and effort needed to write equivalent code.
It has an expansive, supportive community of users. Even the best developers run into problems, and this is where user communities can become an invaluable resource. Python has a huge community with documentation, tutorials, tips, and tricks to master the language. The Python community on GitHub, for example, offers everything from information on the latest version of the language to bug reports and update notes.
It’s academic. Python has become the go-to language in academia with some students even encountering Python as early as elementary school. (Believe it or not, there are children’s picture books dedicated to Python.) While computer science students are often taught Python, its use extends beyond that discipline into other areas of STEM and academic research. For example, Python can be used to solve differential equations, perform statistical analyses, simulate and track particle diffusion, and more.
It has high corporate demand. Because of its wide-scale applicability in development and data analysis work, knowing Python is often considered a top skill among job seekers. According to Statista, Python was the third most demanded language in 2022 by recruiters worldwide.

The bottom line

Python is everywhere—and it’s been used to build a significant number of the technologies, websites, and even systems most people encounter on a daily basis. It powers everything from your favorite video streaming service to the ML algorithms that can help you make your next cryptocurrency trade. And for an even broader scope example (pun absolutely intended), NASA uses Python to power data analysis with its sophisticated James Webb Space Telescope, which makes it one of the few programming languages that is, quite literally, out of this world.

How to get started with Python

A quick Google search will yield hundreds of resources out there to jumpstart your Python journey—and that can quickly get a little overwhelming. To simplify things, here are a few helpful GitHub repositories to help you get started with Python:

Explore pre-built Python algorithms: From networking flows to physics and neural networks, this repository is a great guide to building algorithms in Python.
Learn Python in 30 days: This step-by-step guide will walk you through the basics of Python in 30 days.
Grab some tips from this cheatsheet: Check out this collection of Python scripts with code examples and explanations for learning the language.
Sharpen your Python skills: Pick up some tips from this study guide for both beginner and seasoned users built by a Python superfan!

To get started, download the latest version of Python.

Start building on GitHub today

GitHub offers two easy ways to start working with Python: GitHub Codespaces and GitHub Copilot.

You can start building today for free with GitHub Codespaces: every developer on GitHub gets 60 free hours of use per month to quickly spin up a development environment in the cloud from any device. Check out the Django quick start template to begin coding right in your browser!

You can also use GitHub Copilot, GitHub’s AI pair programmer, to write your first lines of Python. Here’s how:

Install the GitHub Copilot extension into your code editor.
Describe the purpose of your project in a comment.
Write a comment describing which libraries you may need.
Start tabbing and let GitHub Copilot suggest lines of code to help you learn new techniques or methods.

From machine learning to data analysis, Python’s versatility allows it to continue its explosive growth with developers and non-developers alike. Experiment with Python through GitHub or on your local machine to be part of this growth and get started today!

Build a semantic search engine for tabular columns with Transformers and Amazon OpenSearch Service

2023-03-01 Kachi Odoemene

Post Syndicated from Kachi Odoemene original https://aws.amazon.com/blogs/big-data/build-a-semantic-search-engine-for-tabular-columns-with-transformers-and-amazon-opensearch-service/

Finding similar columns in a data lake has important applications in data cleaning and annotation, schema matching, data discovery, and analytics across multiple data sources. The inability to accurately find and analyze data from disparate sources represents a potential efficiency killer for everyone from data scientists, medical researchers, and academics to financial and government analysts.

Conventional solutions involve lexical keyword search or regular expression matching, which are susceptible to data quality issues such as absent column names or different column naming conventions across diverse datasets (for example, zip_code, zcode, postalcode).
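
The weakness of lexical matching is easy to reproduce. The sketch below uses the three column names from the example above plus an unrelated one, and a pattern that someone who only knows the "zip" convention might write. The list and pattern are illustrative, not taken from the solution's code.

```python
import re

column_names = ["zip_code", "zcode", "postalcode", "city"]

# A keyword-style pattern that only covers the "zip" naming convention.
pattern = re.compile(r"zip", re.IGNORECASE)

lexical_matches = [name for name in column_names if pattern.search(name)]
print(lexical_matches)  # ['zip_code']
```

Only one of the three equivalent columns is found. Widening the pattern helps until the next dataset introduces yet another convention, which is the gap that embedding-based search is meant to close.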

In this post, we demonstrate a solution for searching for similar columns based on column name, column content, or both. The solution uses approximate nearest neighbors algorithms available in Amazon OpenSearch Service to search for semantically similar columns. To facilitate the search, we create feature representations (embeddings) for individual columns in the data lake using pre-trained Transformer models from the sentence-transformers library in Amazon SageMaker. Finally, to interact with and visualize results from our solution, we build an interactive Streamlit web application running on AWS Fargate.

We include a code tutorial for you to deploy the resources to run the solution on sample data or your own data.

Solution overview

The following architecture diagram illustrates the two-stage workflow for finding semantically similar columns. The first stage runs an AWS Step Functions workflow that creates embeddings from tabular columns and builds the OpenSearch Service search index. The second stage, or the online inference stage, runs a Streamlit application through Fargate. The web application collects input search queries and retrieves from the OpenSearch Service index the approximate k-most-similar columns to the query.

Figure 1. Solution architecture

The automated workflow proceeds in the following steps:

The user uploads tabular datasets into an Amazon Simple Storage Service (Amazon S3) bucket, which invokes an AWS Lambda function that initiates the Step Functions workflow.
The workflow begins with an AWS Glue job that converts the CSV files into Apache Parquet data format.
A SageMaker Processing job creates embeddings for each column using pre-trained models or custom column embedding models. The SageMaker Processing job saves the column embeddings for each table in Amazon S3.
A Lambda function creates the OpenSearch Service domain and cluster to index the column embeddings produced in the previous step.
Finally, an interactive Streamlit web application is deployed with Fargate. The web application provides an interface for the user to input queries to search the OpenSearch Service domain for similar columns.
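
The first step of the workflow above can be sketched as a Lambda handler that reads the S3 notification and starts a Step Functions execution per uploaded object. This is an illustrative sketch, not the function from the code tutorial: the state machine ARN, the payload shape, and the `sfn_client` parameter (added so the handler can run against a test double) are all assumptions.

```python
import json
from urllib.parse import unquote_plus

# Hypothetical ARN, for illustration only.
STATE_MACHINE_ARN = "arn:aws:states:us-east-1:123456789012:stateMachine:example"


def handler(event, context, sfn_client=None):
    """Start one Step Functions execution per object in an S3 notification event."""
    if sfn_client is None:
        import boto3  # imported lazily so the sketch loads without the AWS SDK

        sfn_client = boto3.client("stepfunctions")
    started = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        payload = {"bucket": bucket, "key": key}
        sfn_client.start_execution(
            stateMachineArn=STATE_MACHINE_ARN,
            input=json.dumps(payload),
        )
        started.append(payload)
    return started


class FakeStepFunctions:
    """Test double that records start_execution calls instead of calling AWS."""

    def __init__(self):
        self.calls = []

    def start_execution(self, **kwargs):
        self.calls.append(kwargs)
        return {"executionArn": "arn:aws:states:example"}


sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "example-bucket"}, "object": {"key": "raw/my+table.csv"}}}
    ]
}
fake = FakeStepFunctions()
started = handler(sample_event, None, sfn_client=fake)
print(started)
```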

You can download the code tutorial from GitHub to try this solution on sample data or your own data. Instructions on how to deploy the required resources for this tutorial are available on GitHub.

Prerequisites

To implement this solution, you need the following:

An AWS account.
Basic familiarity with AWS services such as the AWS Cloud Development Kit (AWS CDK), Lambda, OpenSearch Service, and SageMaker Processing.
A tabular dataset to create the search index. You can bring your own tabular data or download the sample datasets on GitHub.

Build a search index

The first stage builds the column search engine index. The following figure illustrates the Step Functions workflow that runs this stage.

Figure 2. Step Functions workflow with multiple embedding models

Datasets

In this post, we build a search index to include over 400 columns from over 25 tabular datasets. The datasets originate from the following public sources:

s3://sagemaker-sample-files/datasets/tabular/
NYC Open Data
Chicago Data Portal

For the full list of the tables included in the index, see the code tutorial on GitHub.

You can bring your own tabular dataset to augment the sample data or build your own search index. We include two Lambda functions that initiate the Step Functions workflow to build the search index for individual CSV files or a batch of CSV files, respectively.

Transform CSV to Parquet

Raw CSV files are converted to Parquet data format with AWS Glue. Parquet is a column-oriented file format preferred in big data analytics that provides efficient compression and encoding. In our experiments, the Parquet data format offered a significant reduction in storage size compared to raw CSV files. We also used Parquet as a common data format to convert other data formats (for example, JSON and NDJSON) because it supports advanced nested data structures.

Create tabular column embeddings

To extract embeddings for individual table columns in the sample tabular datasets in this post, we use the following pre-trained models from the sentence-transformers library. For additional models, see Pretrained Models.

Model name	Dimension	Size (MB)
all-MiniLM-L6-v2	384	80
all-distilroberta-v1	768	290
average_word_embeddings_glove.6B.300d	300	420

The SageMaker Processing job runs create_embeddings.py (code) for a single model. For extracting embeddings from multiple models, the workflow runs parallel SageMaker Processing jobs as shown in the Step Functions workflow. We use the model to create two sets of embeddings:

column_name_embeddings – Embeddings of column names (headers)
column_content_embeddings – Average embedding of all the rows in the column
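
The two embedding sets can be sketched in a few lines. This is not the tutorial's create_embeddings.py: `toy_embed` is a hypothetical, deterministic stand-in for a sentence-transformers model so that the example runs without downloading anything, and the output key names are illustrative. Only the averaging step mirrors the description above.

```python
def toy_embed(text, dim=4):
    """Hypothetical stand-in for a real model: counts characters into dim buckets."""
    vector = [0.0] * dim
    for ch in text.lower():
        vector[ord(ch) % dim] += 1.0
    return vector


def mean_vector(vectors):
    """Element-wise average of equal-length vectors."""
    count = len(vectors)
    return [sum(values) / count for values in zip(*vectors)]


def embed_column(name, rows, embed=toy_embed):
    """Return one embedding for the header and one averaged over the column's rows."""
    return {
        "column_name_embedding": embed(name),
        "column_content_embedding": mean_vector([embed(str(row)) for row in rows]),
    }


result = embed_column("zip_code", ["10001", "60614"])
print(result)
```

With a real model, `embed` would be replaced by the model's encoding function, and the averaged vector would have the dimension listed in the table above (for example, 384 for all-MiniLM-L6-v2).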

For more information about the column embedding process, see the code tutorial on GitHub.
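As a rough illustration of the two embedding sets described above, the following sketch computes a name embedding and an averaged content embedding for one column. It is not the code from create_embeddings.py. The toy_embed function is a hypothetical placeholder that keeps the sketch dependency-free; in practice you would pass in the encode method of a sentence-transformers model instead.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def toy_embed(text: str, dim: int = 4) -> Vector:
    """Placeholder embedder (NOT a real model): sums character codes into buckets."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    return vec


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise average of equally sized vectors."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def embed_column(name: str, values: Sequence[str],
                 embed: Callable[[str], Vector] = toy_embed) -> Dict[str, Vector]:
    """Return the two embedding sets for a single column."""
    return {
        # Embedding of the column header.
        "column_name_embeddings": embed(name),
        # Average of the embeddings of every row value in the column.
        "column_content_embeddings": mean_vector([embed(str(v)) for v in values]),
    }
```

Passing the embedder as a parameter makes it straightforward to run the same function once per model, which mirrors the parallel jobs in the workflow.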

An alternative to the SageMaker Processing step is to create a SageMaker batch transform to get column embeddings on large datasets. This would require deploying the model to a SageMaker endpoint. For more information, see Use Batch Transform.

Index embeddings with OpenSearch Service

In the final step of this stage, a Lambda function adds the column embeddings to an OpenSearch Service approximate k-Nearest-Neighbor (kNN) search index. Each model is assigned its own search index. For more information about the approximate kNN search index parameters, see k-NN.
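To make the index step more concrete, here is a minimal sketch of an index definition that such a Lambda function might send to OpenSearch Service when creating a kNN index for one model. The dimensions come from the model table above. The field names (table_name, column_name, and the two embedding fields) are assumptions for illustration and may differ from the tutorial code.

```python
from typing import Any, Dict

# Embedding dimensions taken from the model table in this post.
MODEL_DIMENSIONS = {
    "all-MiniLM-L6-v2": 384,
    "all-distilroberta-v1": 768,
    "average_word_embeddings_glove.6B.300d": 300,
}


def knn_index_body(model_name: str) -> Dict[str, Any]:
    """Build an index definition with one knn_vector field per embedding set."""
    dim = MODEL_DIMENSIONS[model_name]
    vector_field = {"type": "knn_vector", "dimension": dim}
    return {
        # Enables approximate k-NN search on this index.
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                # Field names are hypothetical, chosen for this sketch.
                "table_name": {"type": "keyword"},
                "column_name": {"type": "keyword"},
                "column_name_embedding": dict(vector_field),
                "column_content_embedding": dict(vector_field),
            }
        },
    }
```

Because each model has its own vector dimension, building one index per model keeps every knn_vector field consistent with the embeddings stored in it.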

Online inference and semantic search with a web app

The second stage of the workflow runs a Streamlit web application where you can provide inputs and search for semantically similar columns indexed in OpenSearch Service. The application layer uses an Application Load Balancer, Fargate, and Lambda. The application infrastructure is automatically deployed as part of the solution.

The application allows you to provide an input and search for semantically similar column names, column content, or both. Additionally, you can select the embedding model and number of nearest neighbors to return from the search. The application receives inputs, embeds the input with the specified model, and uses kNN search in OpenSearch Service to search indexed column embeddings and find the most similar columns to the given input. The search results displayed include the table names, column names, and similarity scores for the columns identified, as well as the locations of the data in Amazon S3 for further exploration.
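The search step described above could be sketched as follows: after the input is embedded, the application builds one approximate kNN query per embedding field it wants to search. The field names and the SEARCH_FIELDS mapping are assumptions for illustration, not the application's actual schema.

```python
from typing import Any, Dict, List, Sequence

# Hypothetical mapping from the selected search type to index field names.
SEARCH_FIELDS = {
    "column_name": ["column_name_embedding"],
    "column_content": ["column_content_embedding"],
    "both": ["column_name_embedding", "column_content_embedding"],
}


def knn_queries(search_type: str, query_vector: Sequence[float],
                k: int = 10) -> List[Dict[str, Any]]:
    """Build one approximate k-NN query body per embedding field to search."""
    return [
        {
            "size": k,
            "query": {"knn": {field: {"vector": list(query_vector), "k": k}}},
        }
        for field in SEARCH_FIELDS[search_type]
    ]
```

Each returned body can be sent as a search request against the index that belongs to the selected embedding model.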

The following figure shows an example of the web application. In this example, we searched for columns in our data lake that have similar Column Names (payload type) to district (payload). The application used all-MiniLM-L6-v2 as the embedding model and returned 10 (k) nearest neighbors from our OpenSearch Service index.

The application returned transit_district, city, borough, and location as the four most similar columns based on the data indexed in OpenSearch Service. This example demonstrates the ability of the search approach to identify semantically similar columns across datasets.

Figure 3: Web application user interface

Clean up

To delete the resources created by the AWS CDK in this tutorial, run the following command:

cdk destroy --all

Conclusion

In this post, we presented an end-to-end workflow for building a semantic search engine for tabular columns.

Get started today on your own data with our code tutorial available on GitHub. If you’d like help accelerating your use of ML in your products and processes, please contact the Amazon Machine Learning Solutions Lab.

About the Authors

Kachi Odoemene is an Applied Scientist at AWS AI. He builds AI/ML solutions to solve business problems for AWS customers.

Taylor McNally is a Deep Learning Architect at Amazon Machine Learning Solutions Lab. He helps customers from various industries build solutions leveraging AI/ML on AWS. He enjoys a good cup of coffee, the outdoors, and time with his family and energetic dog.

Austin Welch is a Data Scientist in the Amazon ML Solutions Lab. He develops custom deep learning models to help AWS public sector customers accelerate their AI and cloud adoption. In his spare time, he enjoys reading, traveling, and jiu-jitsu.

Develop a serverless application in Python using Amazon CodeWhisperer

2023-01-03 Rafael Ramos

Post Syndicated from Rafael Ramos original https://aws.amazon.com/blogs/devops/develop-a-serverless-application-in-python-using-amazon-codewhisperer/

While writing code to develop applications, developers must keep up with multiple programming languages, frameworks, software libraries, and popular cloud services from providers such as AWS. Even though developers can find code snippets on developer communities, to either learn from them or repurpose the code, manually searching for the snippets with an exact or even similar use case is a distracting and time-consuming process. They have to do all of this while making sure that they’re following the correct programming syntax and best coding practices.

Amazon CodeWhisperer, a machine learning (ML) powered coding aide for developers, lets you overcome those challenges. Developers can simply write a comment that outlines a specific task in plain English, such as “upload a file to S3.” Based on this, CodeWhisperer automatically determines which cloud services and public libraries are best-suited for the specified task, it creates the specific code on the fly, and then it recommends the generated code snippets directly in the IDE. And this isn’t about copy-pasting code from the web, but generating code based on the context of your file, such as which libraries and versions you have, as well as the existing code. Moreover, CodeWhisperer seamlessly integrates with your Visual Studio Code and JetBrains IDEs so that you can stay focused and never leave the development environment. At the time of this writing, CodeWhisperer supports Java, Python, JavaScript, C#, and TypeScript.

In this post, we’ll build a full-fledged, event-driven, serverless application for image recognition. With the aid of CodeWhisperer, you’ll write your own code that runs on top of AWS Lambda to interact with Amazon Rekognition, Amazon DynamoDB, Amazon Simple Notification Service (Amazon SNS), Amazon Simple Queue Service (Amazon SQS), Amazon Simple Storage Service (Amazon S3), and third-party HTTP APIs to perform image recognition. The users of the application can interact with it by either sending the URL of an image for processing, or by listing the images and the objects present on each image.

Solution overview

To make our application easier to digest, we’ll split it into three segments:

Image download – The user provides an image URL to the first API. A Lambda function downloads the image from the URL and stores it on an S3 bucket. Amazon S3 automatically sends a notification to an Amazon SNS topic informing that a new image is ready for processing. Amazon SNS then delivers the message to an Amazon SQS queue.
Image recognition – A second Lambda function handles the orchestration and processing of the image. It receives the message from the Amazon SQS queue, sends the image for Amazon Rekognition to process, stores the recognition results on a DynamoDB table, and sends a message with those results as JSON to a second Amazon SNS topic used in section three. A user can list the images and the objects present on each image by calling a second API which queries the DynamoDB table.
3rd-party integration – The last Lambda function reads the message from the second Amazon SQS queue. At this point, the Lambda function must deliver that message to a fictitious external e-mail server HTTP API that supports only XML payloads. Because of that, the Lambda function converts the JSON message to XML. Lastly, the function sends the XML object via HTTP POST to the e-mail server.

The following diagram depicts the architecture of our application:

Figure 1. Architecture diagram depicting the application architecture. It contains the service icons with the component explained on the text above.

Prerequisites

Before getting started, you must have the following prerequisites:

An AWS account and an Administrator user
Install and authenticate the AWS CLI. You can authenticate with an AWS Identity and Access Management (IAM) user or an AWS Security Token Service (AWS STS) token.
Install Python 3.7 or later.
Install Node Package Manager (npm).
Install the AWS CDK Toolkit.
Install the AWS Toolkit for VS Code or for JetBrains.
Install Git.

Configure environment

We already created the scaffolding for the application that we’ll build, which you can find on this Git repository. This application is represented by a CDK app that describes the infrastructure according to the architecture diagram above. However, the actual business logic of the application isn’t provided. You’ll implement it using CodeWhisperer. This means that we already declared the infrastructure components using AWS CDK, such as the API Gateway endpoints, DynamoDB table, and topics and queues. If you’re new to AWS CDK, then we encourage you to go through the CDK workshop later on.

Deploying AWS CDK apps into an AWS environment (a combination of an AWS account and region) requires that you provision resources that the AWS CDK needs to perform the deployment. These resources include an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments. The process of provisioning these initial resources is called bootstrapping. The required resources are defined in an AWS CloudFormation stack, called the bootstrap stack, which is usually named CDKToolkit. Like any CloudFormation stack, it appears in the CloudFormation console once it has been deployed.

After cloning the repository, let’s deploy the application (still without the business logic, which we’ll implement later on using CodeWhisperer). For this post, we’ll implement the application in Python. Therefore, make sure that you’re under the python directory. Then, use the cdk bootstrap command to bootstrap an AWS environment for AWS CDK. Replace {AWS_ACCOUNT_ID} and {AWS_REGION} with corresponding values first:

cdk bootstrap aws://{AWS_ACCOUNT_ID}/{AWS_REGION}

For more information about bootstrapping, refer to the documentation.

The last step to prepare your environment is to enable CodeWhisperer on your IDE. See Setting up CodeWhisperer for VS Code or Setting up Amazon CodeWhisperer for JetBrains to learn how to do that, depending on which IDE you’re using.

Image download

Let’s get started by implementing the first Lambda function, which is responsible for downloading an image from the provided URL and storing that image in an S3 bucket. Open the get_save_image.py file from the python/api/runtime/ directory. This file contains an empty Lambda function handler and the needed input parameters to integrate this Lambda function.

url is the URL of the input image provided by the user,
name is the name of the image provided by the user, and
S3_BUCKET is the S3 bucket name defined by our application infrastructure.

Write a comment in natural language that describes the required functionality, for example:

# Function to get a file from url

To trigger CodeWhisperer, hit the Enter key after entering the comment and wait for a code suggestion. If you want to manually trigger CodeWhisperer, then you can hit Option + C on macOS or Alt + C on Windows. You can browse through multiple suggestions (if available) with the arrow keys. Accept a code suggestion by pressing Tab. Discard a suggestion by pressing Esc or typing a character.

For more information on how to work with CodeWhisperer, see Working with CodeWhisperer in VS Code or Working with Amazon CodeWhisperer from JetBrains.

You should get a suggested implementation of a function that downloads a file using a specified URL. The following image shows an example of the code snippet that CodeWhisperer suggests:

Figure 2. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called get_file_from_url with the implementation suggestion to download a file using the requests lib.

Be aware that CodeWhisperer uses artificial intelligence (AI) to provide code recommendations, and that this is non-deterministic. The result you get in your IDE may be different from the one on the image above. If needed, fine-tune the code, as CodeWhisperer generates the core logic, but you might want to customize the details depending on your requirements.
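For orientation, a function of this kind might look like the following minimal sketch. It is not the CodeWhisperer output shown in the figure: the screenshot uses the requests library, whereas this version uses urllib from the Python standard library so that it has no third-party dependency. The timeout parameter is an assumption added for illustration.

```python
import urllib.request


def get_file_from_url(url: str, timeout: float = 10.0) -> bytes:
    """Download the resource at `url` and return its raw bytes."""
    # urlopen raises an exception (for example URLError) if the request fails.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

Since the URL comes from the user, you may want to check that it uses the http or https scheme before calling the function in a real Lambda handler.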

Let’s try another action, this time to upload the image to an S3 bucket:

# Function to upload image to S3

As a result, CodeWhisperer generates a code snippet similar to the following one:

Figure 3. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called upload_image with the implementation suggestion to download a file using the requests lib and upload it to S3 using the S3 client.

Now that you have the functions with the functionalities to download an image from the web and upload it to an S3 bucket, you can wire up both functions in the Lambda handler function by calling each function with the correct inputs.
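The wiring step could be sketched roughly as follows. This is one possible arrangement, not the code from the repository. The S3 client is passed in (in a real function it would typically be boto3.client("s3")), the download function is the one you generated earlier, and the event shape assumes an API Gateway proxy event that carries url and name as query string parameters. The make_handler helper is hypothetical.

```python
import json
from typing import Any, Callable, Dict


def upload_image(s3_client: Any, bucket: str, name: str, data: bytes) -> None:
    """Store the image bytes in the bucket under the key `name`."""
    s3_client.put_object(Bucket=bucket, Key=name, Body=data)


def make_handler(s3_client: Any, bucket: str,
                 download: Callable[[str], bytes]) -> Callable[[Dict, Any], Dict]:
    """Wire the download and upload functions into a Lambda-style handler."""

    def handler(event: Dict, context: Any) -> Dict:
        # Assumption: API Gateway proxy event with url and name query parameters.
        params = event.get("queryStringParameters") or {}
        url, name = params["url"], params["name"]
        upload_image(s3_client, bucket, name, download(url))
        return {"statusCode": 200, "body": json.dumps({"stored": name})}

    return handler
```

Injecting the client and the download function keeps the handler easy to exercise locally with simple stand-ins before you deploy it.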

Image recognition

Now let’s implement the Lambda function responsible for sending the image to Amazon Rekognition for processing, storing the results in a DynamoDB table, and sending a message with those results as JSON to a second Amazon SNS topic. Open the image_recognition.py file from the python/recognition/runtime/ directory. This file contains an empty Lambda function handler and the needed input parameters to integrate this Lambda function.

queue_url is the URL of the Amazon SQS queue to which this Lambda function is subscribed,
table_name is the name of the DynamoDB table, and
topic_arn is the ARN of the Amazon SNS topic to which this Lambda function publishes.

Using CodeWhisperer, implement the business logic of the next Lambda function as you did in the previous section. For example, to detect the labels from an image using Amazon Rekognition, write the following comment:

# Detect labels from image with Rekognition

And as a result, CodeWhisperer should give you a code snippet similar to the one in the following image:

Figure 4. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called detect_labels with the implementation suggestion to use the Rekognition SDK to detect labels on the given image.
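As a point of comparison for whatever suggestion you receive, a label detection function might look like this minimal sketch. The Rekognition client is passed in as a parameter (for example boto3.client("rekognition")), and the max_labels and min_confidence defaults are assumptions chosen for illustration.

```python
from typing import Any, List


def detect_labels(rekognition_client: Any, bucket: str, key: str,
                  max_labels: int = 10, min_confidence: float = 80.0) -> List[str]:
    """Return the names of the labels Rekognition detects in an S3 image."""
    response = rekognition_client.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=max_labels,
        MinConfidence=min_confidence,
    )
    # Each entry in "Labels" carries a Name and a Confidence score.
    return [label["Name"] for label in response["Labels"]]
```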

You can continue generating the other functions that you need to fully implement the business logic of your Lambda function. Here are some examples that you can use:

# Save labels to DynamoDB

# Publish item to SNS

# Delete message from SQS
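The three comments above might lead to functions along these lines. This is a hedged sketch rather than the suggestions you will actually receive. The clients and the table are passed in (for example boto3.resource("dynamodb").Table(table_name), boto3.client("sns"), and boto3.client("sqs")), and the item attribute names image and labels are assumptions.

```python
import json
from typing import Any, Dict, List


def save_labels(table: Any, image: str, labels: List[str]) -> Dict[str, Any]:
    """Save the detected labels for an image to the DynamoDB table."""
    item = {"image": image, "labels": labels}  # attribute names are assumed
    table.put_item(Item=item)
    return item


def publish_item(sns_client: Any, topic_arn: str, item: Dict[str, Any]) -> None:
    """Publish the item as a JSON message to the SNS topic."""
    sns_client.publish(TopicArn=topic_arn, Message=json.dumps(item))


def delete_message(sqs_client: Any, queue_url: str, receipt_handle: str) -> None:
    """Delete a processed message from the SQS queue."""
    sqs_client.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)
```

Deleting the message last means that a failure in an earlier step leaves it on the queue so that it can be retried.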

Following the same approach, open the list_images.py file from the python/recognition/runtime/ directory to implement the logic to list all of the labels from the DynamoDB table. As you did previously, type a comment in plain English:

# Function to list all items from a DynamoDB table
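One detail worth checking in the generated code is pagination, because a DynamoDB scan returns results in pages. A minimal sketch that follows LastEvaluatedKey until the table is exhausted could look like this. The table argument is assumed to be a boto3 DynamoDB Table resource or any object with a compatible scan method.

```python
from typing import Any, Dict, List


def list_all_items(table: Any) -> List[Dict[str, Any]]:
    """Return every item in the table, following scan pagination."""
    items: List[Dict[str, Any]] = []
    kwargs: Dict[str, Any] = {}
    while True:
        page = table.scan(**kwargs)
        items.extend(page.get("Items", []))
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return items
        # Continue the scan from where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```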

Other frequently used code

Interacting with AWS isn’t the only way that you can leverage CodeWhisperer. You can use it to implement repetitive tasks, such as creating unit tests and converting message formats, or to implement algorithms like sorting and string matching and parsing. The last Lambda function that we’ll implement as part of this post is to convert a JSON payload received from Amazon SQS to XML. Then, we’ll POST this XML to an HTTP endpoint.

Open the send_email.py file from the python/integration/runtime/ directory. This file contains an empty Lambda function handler. An event is a JSON-formatted document that contains data for a Lambda function to process. Type a comment with your intent to get the code snippet:

# Transform json to xml

As CodeWhisperer uses the context of your files to generate code, depending on the imports that you have on your file, you’ll get an implementation such as the one in the following image:

Figure 5. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called json_to_xml with the implementation suggestion to transform JSON payload into XML payload.
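If you want a dependency-free reference point, a JSON to XML conversion can be sketched with the standard library alone. This is one possible implementation, not the CodeWhisperer suggestion from the figure. The root tag name root and the item tag used for list entries are assumptions, and the sketch assumes that the JSON keys are valid XML element names.

```python
import json
import xml.etree.ElementTree as ET
from typing import Any


def _build(parent: ET.Element, value: Any) -> None:
    """Recursively append `value` to `parent` as child elements or text."""
    if isinstance(value, dict):
        for key, child in value.items():
            _build(ET.SubElement(parent, str(key)), child)
    elif isinstance(value, list):
        for child in value:
            _build(ET.SubElement(parent, "item"), child)
    else:
        parent.text = "" if value is None else str(value)


def json_to_xml(payload: str, root_tag: str = "root") -> str:
    """Convert a JSON document (as a string) into an XML string."""
    root = ET.Element(root_tag)
    _build(root, json.loads(payload))
    return ET.tostring(root, encoding="unicode")
```

For example, json_to_xml('{"image": "cat.png", "labels": ["Cat", "Pet"]}') produces a root element that contains an image element and a labels element with one item child per label.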

Repeat the same process with a comment such as # Send XML string with HTTP POST to get the last function implementation. Note that the email server isn’t part of this implementation. You can mock it, or simply ignore this HTTP POST step. Lastly, wire up both functions in the Lambda handler function by calling each function with the correct inputs.
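A possible shape for that last function, again using only the standard library, is sketched below. The helper names build_xml_post and send_xml and the application/xml content type are assumptions, and the endpoint is whatever mock server you choose to run, since the e-mail server itself is fictitious.

```python
import urllib.request


def build_xml_post(url: str, xml_payload: str) -> urllib.request.Request:
    """Prepare an HTTP POST request that carries the XML string as its body."""
    return urllib.request.Request(
        url,
        data=xml_payload.encode("utf-8"),
        headers={"Content-Type": "application/xml"},
        method="POST",
    )


def send_xml(url: str, xml_payload: str, timeout: float = 10.0) -> int:
    """Send the XML payload and return the HTTP status code of the response."""
    request = build_xml_post(url, xml_payload)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Separating request construction from sending makes it possible to inspect the request without a running server.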

Deploy and test the application

To deploy the application, run the command cdk deploy --all. You should get a confirmation message, and after a few minutes your application will be up and running on your AWS account. As outputs, the APIStack and RekognitionStack will print the API Gateway endpoint URLs. They will look similar to this example:

Outputs:
...
APIStack.RESTAPIEndpoint01234567 = https://examp1eid0.execute-api.{your-region}.amazonaws.com/prod/

The first endpoint expects two string parameters: url (the image file URL to download) and name (the target file name that will be stored on the S3 bucket). Use any image URL you like, but remember that you must encode an image URL before passing it as a query string parameter to escape the special characters. Use an online URL encoder of your choice for that. Then, use the curl command to invoke the API Gateway endpoint:

curl -X GET 'https://examp1eid0.execute-api.eu-east-2.amazonaws.com/prod?url={encoded-image-URL}&name={file-name}'

Replace {encoded-image-URL} and {file-name} with the corresponding values. Also, make sure that you use the correct API endpoint that you’ve noted from the AWS CDK deploy command output as mentioned above.
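If you prefer not to use an online encoder, the encoding step can also be done locally. The following sketch builds the full request URL with urllib.parse from the standard library. The build_request_url helper and the endpoint used in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def build_request_url(endpoint: str, image_url: str, file_name: str) -> str:
    """Return the API URL with url and name encoded as query parameters."""
    # urlencode percent-encodes special characters such as ':', '/', '?' and '='.
    query = urlencode({"url": image_url, "name": file_name})
    return f"{endpoint.rstrip('/')}?{query}"
```

For instance, an image URL of https://example.com/cat.png is encoded as https%3A%2F%2Fexample.com%2Fcat.png inside the query string.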

It will take a few seconds for the processing to happen in the background. Once it’s ready, see what has been stored in the DynamoDB table by invoking the List Images API (make sure that you use the correct URL from the output of your deployed AWS CDK stack):

curl -X GET 'https://examp1eid7.execute-api.eu-east-2.amazonaws.com/prod'

After you’re done, to avoid unexpected charges to your account, make sure that you clean up your AWS CDK stacks. Use the cdk destroy command to delete the stacks.

Conclusion

In this post, we’ve seen how to get a significant productivity boost with the help of ML. With that, as a developer, you can stay focused on your IDE and reduce the time that you spend searching online for code snippets that are relevant for your use case. Writing comments in natural language, you get context-based snippets to implement full-fledged applications. In addition, CodeWhisperer comes with a mechanism called reference tracker, which detects whether a code recommendation might be similar to particular CodeWhisperer training data. The reference tracker lets you easily find and review that reference code and see how it’s used in the context of another project. Lastly, CodeWhisperer provides the ability to run scans on your code (generated by CodeWhisperer as well as written by you) to detect security vulnerabilities.

During the preview period, CodeWhisperer is available to all developers across the world for free. Get started with the free preview on JetBrains, VS Code or AWS Cloud9.

Take part in the Hour of Code

2022-11-15 Liz Smart

Post Syndicated from Liz Smart original https://www.raspberrypi.org/blog/hour-of-code-activities/

Launched in 2013, Hour of Code is an initiative to introduce young people to computer science using fun one-hour tutorials. To date, over 100 million young people have completed an hour of code with it.

A girl doing a physical computing project.

Although the Hour of Code website is accessible all year round, every December for Computer Science Education Week people worldwide run their own Hour of Code events. Each year we love seeing many Code Clubs, CoderDojos, and young people at home across the community complete their Hour of Code. You can register your 2022 Hour of Code event now to run between 5 and 11 December.

To support your event, we have pulled together a bumper set of our free coding projects, which can each be completed in just one hour. You will find these activities on the Hour of Code website.

Two young digital makers using Raspberry Pi

There’s something for all ages and levels of experience, so put an hour aside and help young people make something fabulous with code:

Ages 7–11

Beginner

For younger creators new to coding, a Scratch project is a great place to start.

With our Space talk project, they can create a space scene with characters that ‘emote’ to share their thoughts or feelings using sounds, colours, and actions. Creators program the character emotes using Scratch blocks to control graphic effects, costume animation, and sound effects.

Alternatively, our Stress ball project lets them code an onscreen stress ball that reacts to user clicks. Creators use the Paint and Sound editors in Scratch to personalise a clickable stress ball, and they add Scratch blocks to control graphic effects, costume animation, and sound effects.

We love this fun stress ball example sent to us recently by young creator April from the United States:

Another great option is to use Code Club World, which is a free tool to help children who are new to coding.

Creators can develop a character avatar, design a T-shirt, make some music, and more.

Comfortable

For 7- to 11-year-olds who are more comfortable with block-based coding, our project Broadcasting spells is an ideal choice. With the project, they connect Scratch blocks to code a wand that casts spells turning sprites into toads, and growing and shrinking them. Creators use broadcast blocks to transform multiple sprites at once, and they create sound effects with the Sound editor in Scratch.

Ages 11–14

Beginner

We have three exciting projects for trying text-based coding during Hour of Code in this category. The first, Anime expressions, is one of our brand-new ‘Introduction to web development’ projects. With this project, young people create a responsive webpage with text and images for an anime drawing tutorial. They write HTML to structure the webpage and CSS styles to apply layout, colour palettes, and fonts.

For a great introduction to coding with Python, we have the project Hello world from our ‘Introduction to Python’ path. With this project, creators write Python text-based code to create an interactive program that shows text and emojis based on user input. They learn about variables as they use them to store text and numbers, and they learn about writing functions to organise code and do calculations, retrieve the current date and time, and make a customisable dice.

LED firefly is a fantastic physical making project in which young people use a Raspberry Pi Pico microcontroller and basic electronic components to create a blinking LED firefly. They program the LED’s light patterns with MicroPython code and activate it via a switch they make themselves using jumper wires.

Comfortable

For 11- to 14-year-olds who are already comfortable with HTML, the Flip treat webcards project is a fun option. With this, they create a webpage showing a set of cards that flip when a visitor’s mouse pointer hovers over them. Creators use CSS styling and animations to add interactivity, then they customise the cards with fancy fonts and colour gradients.

Young people who have already done some Python coding can try out our project Target practice. With this project they create a game, using the p5 graphics library to draw a colourful target, and writing code so that the player scores points by hitting the target’s rings with arrows. While they create the project, they learn about RGB colours, shape positioning with x and y coordinates, and decisions using if, else-if, and else code statements.

Ages 14+

Beginner

Our project Charting champions is a great introduction to data visualisation and analysis for coders aged 15 and older. With the project, they will discover the power of the Python programming language as they store Olympic medal data in lists and use the pygal library to create an interactive chart.

Comfortable

Teenage coders who feel comfortable with Python programming can use our project Solar system simulator to code an animated, interactive solar system model using the Python p5 graphics library. Their model will be interactive, as they’ll use dictionaries to store planet facts that display when a user clicks on an orbiting planet.

Coding for Hour of Code and beyond

Now is the time to register your Hour of Code event, then decide which project you’d like to support young people to create. You can download certificates for each of the creators from the Hour of Code certificates page.

And make sure to check out our project paths so you know what projects you can help the young people you support to code beyond this one hour of code.

We don’t just create activities so that other people can experience coding and digital making — we also get involved ourselves!

Two members of the Code Club working at computers.

Recently, our teams who support the Code Club and CoderDojo networks got together to make LED fireflies. We are excited to get coding again as part of Hour of Code and Computer Science Education Week.

The post Take part in the Hour of Code appeared first on Raspberry Pi.

Enrich VPC Flow Logs with resource tags and deliver data to Amazon S3 using Amazon Kinesis Data Firehose

2022-11-09 Chaitanya Shah

Post Syndicated from Chaitanya Shah original https://aws.amazon.com/blogs/big-data/enrich-vpc-flow-logs-with-resource-tags-and-deliver-data-to-amazon-s3-using-amazon-kinesis-data-firehose/

VPC Flow Logs is an AWS feature that captures information about the network traffic flows going to and from network interfaces in Amazon Virtual Private Cloud (Amazon VPC). Visibility into the network traffic flows of your application can help you troubleshoot connectivity issues, architect your application and network for improved performance, and improve the security of your application.

Each VPC flow log record contains the source and destination IP address fields for the traffic flows. The records also contain the Amazon Elastic Compute Cloud (Amazon EC2) instance ID that generated the traffic flow, which makes it easier to identify the EC2 instance and its associated VPC, subnet, and Availability Zone from where the traffic originated. However, when you have a large number of EC2 instances running in your environment, it may not be obvious where the traffic is coming from or going to simply based on the EC2 instance IDs or IP addresses contained in the VPC flow log records.

By enriching flow log records with additional metadata such as resource tags associated with the source and destination resources, you can more easily understand and analyze traffic patterns in your environment. For example, customers often tag their resources with resource names and project names. By enriching flow log records with resource tags, you can easily query and view flow log records based on an EC2 instance name, or identify all traffic for a certain project.

In addition, you can add resource context and metadata about the destination resource such as the destination EC2 instance ID and its associated VPC, subnet, and Availability Zone based on the destination IP in the flow logs. This way, you can easily query your flow logs to identify traffic crossing Availability Zones or VPCs.

In this post, you will learn how to enrich flow logs with tags associated with resources from VPC flow logs in a completely serverless model using Amazon Kinesis Data Firehose and the recently launched Amazon VPC IP Address Manager (IPAM), and also analyze and visualize the flow logs using Amazon Athena and Amazon QuickSight.

Solution overview

In this solution, you enable VPC flow logs and stream them to Kinesis Data Firehose. This solution enriches log records using an AWS Lambda function on Kinesis Data Firehose in a completely serverless manner. The Lambda function fetches resource tags for the instance ID. It also looks up the destination resource from the destination IP using the Amazon EC2 API and IPAM, and adds the associated VPC network context and metadata for the destination resource. It then stores the enriched log records in an Amazon Simple Storage Service (Amazon S3) bucket. After you have enriched your flow logs, you can query, view, and analyze them in a wide variety of services, such as AWS Glue, Athena, QuickSight, Amazon OpenSearch Service, as well as solutions from the AWS Partner Network such as Splunk and Datadog.

The following diagram illustrates the solution architecture.

The workflow contains the following steps:

Amazon VPC sends the VPC flow logs to the Kinesis Data Firehose delivery stream.
The delivery stream uses a Lambda function to fetch resource tags for instance IDs from the flow log record and add them to the record. You can also fetch tags for the source and destination IP address and enrich the flow log record.
When the Lambda function finishes processing all the records from the Kinesis Data Firehose buffer with enriched information like resource tags, Kinesis Data Firehose stores the result file in the destination S3 bucket. Any failed records that Kinesis Data Firehose couldn’t process are stored in the destination S3 bucket under the prefix you specify during delivery stream setup.
All the logs for the delivery stream and Lambda function are stored in Amazon CloudWatch log groups.
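The transformation step in this workflow can be sketched as follows. This is not the code from the GitHub repo. It only illustrates the Kinesis Data Firehose data transformation contract, in which each incoming record has a recordId and base64-encoded data, and each returned record carries the same recordId, a result, and re-encoded data. The FIELDS order, the assumption that each record is one space-separated flow log line, and the get_tags callable (a stand-in for a tag lookup through the Amazon EC2 API) are all assumptions for illustration.

```python
import base64
import json
from typing import Callable, Dict, List

# Assumed field order; the real order depends on the flow log format you configure.
FIELDS = ["version", "account_id", "interface_id", "srcaddr", "dstaddr", "instance_id"]


def enrich_records(event: Dict, get_tags: Callable[[str], Dict[str, str]]) -> Dict:
    """Add resource tags to each flow log record in a Firehose transformation event."""
    output: List[Dict] = []
    for record in event["records"]:
        # Firehose delivers the payload base64-encoded.
        line = base64.b64decode(record["data"]).decode("utf-8").strip()
        parsed = dict(zip(FIELDS, line.split()))
        # get_tags is a hypothetical lookup keyed by instance ID.
        parsed["tags"] = get_tags(parsed.get("instance_id", "-"))
        data = base64.b64encode((json.dumps(parsed) + "\n").encode("utf-8"))
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": data.decode("utf-8"),
        })
    return {"records": output}
```

Emitting one JSON object per line keeps the enriched output easy to query later with services such as Athena.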

Prerequisites

As a prerequisite, you need to create the target S3 bucket before creating the Kinesis Data Firehose delivery stream.

If using a Windows computer, you need PowerShell; if using a Mac, you need Terminal to run AWS Command Line Interface (AWS CLI) commands. To install the latest version of the AWS CLI, refer to Installing or updating the latest version of the AWS CLI.

Create a Lambda function

You can download the Lambda function code from the GitHub repo used in this solution. The example in this post assumes you are enabling all the available fields in the VPC flow logs. You can use it as is or customize per your needs. For example, if you intend to use the default fields when enabling the VPC flow logs, you need to modify the Lambda function with the respective fields. Creating this function creates an AWS Identity and Access Management (IAM) Lambda execution role.

To create your Lambda function, complete the following steps:

On the Lambda console, choose Functions in the navigation pane.
Choose Create function.
Select Author from scratch.
For Function name, enter a name.
For Runtime, choose Python 3.8.
For Architecture, select x86_64.
For Execution role, select Create a new role with basic Lambda permissions.
Choose Create function.

You can then see the code source page, as shown in the following screenshot, with the default code in the lambda_function.py file.

Delete the default code and enter the code from the GitHub Lambda function aws-vpc-flowlogs-enricher.py.
Choose Deploy.

To enrich the flow logs with additional tag information, you need to create an additional IAM policy to give Lambda permission to describe tags on resources from the VPC flow logs.

On the IAM console, choose Policies in the navigation pane.
Choose Create policy.
On the JSON tab, enter the JSON code as shown in the following screenshot.

This policy gives the Lambda function permission to retrieve tags for the source and destination IP and retrieve the VPC ID, subnet ID, and other relevant metadata for the destination IP from your VPC flow log record.

Choose Next: Tags.

Add any tags and choose Next: Review.

For Name, enter vpcfl-describe-tag-policy.
For Description, enter a description.
Choose Create policy.

Navigate to the previously created Lambda function and choose Permissions in the navigation pane.
Choose the role that was created by the Lambda function.

A page opens in a new tab.

On the Add permissions menu, choose Attach policies.

Search for the vpcfl-describe-tag-policy you just created.
Select the vpcfl-describe-tag-policy and choose Attach policies.
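The policy JSON itself appears only in the screenshot. As a hedged illustration of what a read-only policy for describing tags and network interfaces could look like, the following sketch builds one in Python. The two actions listed are assumptions based on the description above, so confirm them against the policy in the GitHub repo before use.

```python
import json


def build_describe_policy():
    """Return an IAM policy document that allows read-only EC2 describe calls."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed actions; the policy in the screenshot may differ.
                "Action": ["ec2:DescribeTags", "ec2:DescribeNetworkInterfaces"],
                # EC2 describe actions are granted on all resources.
                "Resource": "*",
            }
        ],
    }


# Render the document as JSON, ready to paste on the JSON tab.
policy_json = json.dumps(build_describe_policy(), indent=2)
print(policy_json)
```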

Create the Kinesis Data Firehose delivery stream

To create your delivery stream, complete the following steps:

On the Kinesis Data Firehose console, choose Create delivery stream.
For Source, choose Direct PUT.
For Destination, choose Amazon S3.

After you choose Amazon S3 for Destination, the Transform and convert records section appears.

For Data transformation, select Enable.
Browse and choose the Lambda function you created earlier.
You can customize the buffer size as needed.

This affects how much data the delivery stream buffers before it flushes the records to Amazon S3.

You can also customize the buffer interval as needed.

This impacts how long (in seconds) the delivery stream will buffer the incoming records from the VPC.

Optionally, you can enable Record format conversion.

If you want to query the data from Athena, we recommend converting the records to Apache Parquet or ORC and compressing the files with an available compression algorithm, such as gzip or snappy. For more performance tips, refer to Top 10 Performance Tuning Tips for Amazon Athena. In this post, record format conversion is disabled.

For S3 bucket, choose Browse and choose the S3 bucket you created as a prerequisite to store the flow logs.
Optionally, you can specify the S3 bucket prefix. The following expression creates a Hive-style partition for year, month, and day:

AWSLogs/year=!{timestamp:YYYY}/month=!{timestamp:MM}/day=!{timestamp:dd}/

Optionally, you can enable dynamic partitioning.

Dynamic partitioning enables you to create targeted datasets by partitioning streaming S3 data based on partitioning keys. The right partitioning can help you to save costs related to the amount of data that is scanned by analytics services like Athena. For more information, see Kinesis Data Firehose now supports dynamic partitioning to Amazon S3.

Note that you can enable dynamic partitioning only when you create a new delivery stream. You can’t enable dynamic partitioning for an existing delivery stream.

Expand Buffer hints, compression and encryption.
Set the buffer size to 128 (MiB) and the buffer interval to 900 (seconds) for best performance.
For Compression for data records, select GZIP.
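To see what the Hive-style S3 bucket prefix from the earlier step produces, the following sketch renders the same layout for a given timestamp. It assumes the expression resolves to the calendar year, a two-digit month, and a two-digit day in UTC; the hive_prefix function is a name introduced here for illustration.

```python
from datetime import datetime, timezone


def hive_prefix(arrival_time):
    """Render AWSLogs/year=.../month=.../day=.../ for a timezone-aware datetime."""
    utc = arrival_time.astimezone(timezone.utc)
    return f"AWSLogs/year={utc:%Y}/month={utc:%m}/day={utc:%d}/"


# Epoch value taken from the 'end' field of the sample record in this post.
example = datetime.fromtimestamp(1661285182, tz=timezone.utc)
print(hive_prefix(example))  # AWSLogs/year=2022/month=08/day=23/
```

Because each path segment has the form key=value, a crawler or query engine can treat year, month, and day as partition columns.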

Create a VPC flow log subscription

Now you create a VPC flow log subscription for the Kinesis Data Firehose delivery stream you created.

Navigate to AWS CloudShell or Terminal/PowerShell for a Mac or Windows computer and run the following AWS CLI command to enable the subscription. Provide your VPC ID for the parameter --resource-ids and delivery stream ARN for the parameter --log-destination.

aws ec2 create-flow-logs \
--resource-type VPC \
--resource-ids vpc-0000012345f123400d \
--traffic-type ALL \
--log-destination-type kinesis-data-firehose \
--log-destination arn:aws:firehose:us-east-1:123456789101:deliverystream/PUT-Kinesis-Demo-Stream \
--max-aggregation-interval 60 \
--log-format '${account-id} ${action} ${az-id} ${bytes} ${dstaddr} ${dstport} ${end} ${flow-direction} ${instance-id} ${interface-id} ${log-status} ${packets} ${pkt-dst-aws-service} ${pkt-dstaddr} ${pkt-src-aws-service} ${pkt-srcaddr} ${protocol} ${region} ${srcaddr} ${srcport} ${start} ${sublocation-id} ${sublocation-type} ${subnet-id} ${tcp-flags} ${traffic-path} ${type} ${version} ${vpc-id}'
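If you prefer Python to the AWS CLI, the same subscription could be sketched with boto3 as follows. This is a minimal sketch that assumes boto3 is installed and credentials are configured; build_flow_log_request and create_flow_log are hypothetical helper names, and the request is built separately so you can inspect it before making the call.

```python
# Field names in the same order as the --log-format string in the CLI command.
FIELDS = [
    "account-id", "action", "az-id", "bytes", "dstaddr", "dstport", "end",
    "flow-direction", "instance-id", "interface-id", "log-status", "packets",
    "pkt-dst-aws-service", "pkt-dstaddr", "pkt-src-aws-service", "pkt-srcaddr",
    "protocol", "region", "srcaddr", "srcport", "start", "sublocation-id",
    "sublocation-type", "subnet-id", "tcp-flags", "traffic-path", "type",
    "version", "vpc-id",
]


def build_flow_log_request(vpc_id, delivery_stream_arn):
    """Build keyword arguments for the EC2 create_flow_logs call."""
    log_format = " ".join("${" + name + "}" for name in FIELDS)
    return {
        "ResourceType": "VPC",
        "ResourceIds": [vpc_id],
        "TrafficType": "ALL",
        "LogDestinationType": "kinesis-data-firehose",
        "LogDestination": delivery_stream_arn,
        "MaxAggregationInterval": 60,
        "LogFormat": log_format,
    }


def create_flow_log(vpc_id, delivery_stream_arn):
    """Create the flow log subscription (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    ec2 = boto3.client("ec2")
    return ec2.create_flow_logs(**build_flow_log_request(vpc_id, delivery_stream_arn))
```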

If you’re running CloudShell for the first time, it will take a few seconds to prepare the environment to run.

After you successfully enable the subscription for your VPC flow logs, it takes a few minutes depending on the intervals mentioned in the setup to create the log record files in the destination S3 folder.

To view those files, navigate to the Amazon S3 console and choose the bucket storing the flow logs. You should see the compressed interval logs, as shown in the following screenshot.

You can download any file from the destination S3 bucket on your computer. Then extract the gzip file and view it in your favorite text editor.

The following is a sample enriched flow log record. The new fields at the end of the record, which begin with src-tag-, dst-tag-, or dst-, provide added context and metadata about the source and destination IP addresses:

{'account-id': '123456789101',
 'action': 'ACCEPT',
 'az-id': 'use1-az2',
 'bytes': '7251',
 'dstaddr': '10.10.10.10',
 'dstport': '52942',
 'end': '1661285182',
 'flow-direction': 'ingress',
 'instance-id': 'i-123456789',
 'interface-id': 'eni-0123a456b789d',
 'log-status': 'OK',
 'packets': '25',
 'pkt-dst-aws-service': '-',
 'pkt-dstaddr': '10.10.10.11',
 'pkt-src-aws-service': 'AMAZON',
 'pkt-srcaddr': '52.52.52.152',
 'protocol': '6',
 'region': 'us-east-1',
 'srcaddr': '52.52.52.152',
 'srcport': '443',
 'start': '1661285124',
 'sublocation-id': '-',
 'sublocation-type': '-',
 'subnet-id': 'subnet-01eb23eb4fe5c6bd7',
 'tcp-flags': '19',
 'traffic-path': '-',
 'type': 'IPv4',
 'version': '5',
 'vpc-id': 'vpc-0123a456b789d',
 'src-tag-Name': 'test-traffic-ec2-1',
 'src-tag-project': 'Log Analytics',
 'src-tag-team': 'Engineering',
 'dst-tag-Name': 'test-traffic-ec2-1',
 'dst-tag-project': 'Log Analytics',
 'dst-tag-team': 'Engineering',
 'dst-vpc-id': 'vpc-0bf974690f763100d',
 'dst-az-id': 'us-east-1a',
 'dst-subnet-id': 'subnet-01eb23eb4fe5c6bd7',
 'dst-interface-id': 'eni-01eb23eb4fe5c6bd7',
 'dst-instance-id': 'i-06be6f86af0353293'}
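Several fields in a record like this are encoded values. As a small illustration, the following sketch derives the flow duration from the start and end epoch seconds, decodes the tcp-flags bitmask, and computes the average packet size, using values copied from the sample record. The summarize and decode_tcp_flags helpers are names introduced here for illustration.

```python
# Standard TCP flag bit values; a flow log record ORs the flags seen in the window.
TCP_FLAG_BITS = {1: "FIN", 2: "SYN", 4: "RST", 8: "PSH", 16: "ACK", 32: "URG"}


def decode_tcp_flags(value):
    """Return the names of the TCP flags set in a tcp-flags bitmask value."""
    bits = int(value)
    return [name for bit, name in TCP_FLAG_BITS.items() if bits & bit]


def summarize(record):
    """Derive a few readable values from the string fields of a record."""
    return {
        "duration_seconds": int(record["end"]) - int(record["start"]),
        "tcp_flags": decode_tcp_flags(record["tcp-flags"]),
        "bytes_per_packet": int(record["bytes"]) / int(record["packets"]),
    }


# Values copied from the sample record above.
sample = {
    "start": "1661285124",
    "end": "1661285182",
    "tcp-flags": "19",
    "bytes": "7251",
    "packets": "25",
}
print(summarize(sample))
```

For the sample record, the window covers 58 seconds and the tcp-flags value 19 breaks down into FIN (1), SYN (2), and ACK (16).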

Create an Athena database and AWS Glue crawler

Now that you have enriched the VPC flow logs and stored them in Amazon S3, the next step is to create the Athena database and table to query the data. You first create an AWS Glue crawler to infer the schema from the log files in Amazon S3.

On the AWS Glue console, choose Crawlers in the navigation pane.
Choose Create crawler.

For Name, enter a name for the crawler.
For Description, enter an optional description.
Choose Next.

Choose Add a data source.
For Data source, choose S3.
For S3 path, provide the path of the flow logs bucket.
Select Crawl all sub-folders.
Choose Add an S3 data source.

Choose Next.

Choose Create new IAM role.
Enter a role name.
Choose Next.

Choose Add database.
For Name, enter a database name.
For Description, enter an optional description.
Choose Create database.

On the previous tab for the AWS Glue crawler setup, for Target database, choose the newly created database.
Choose Next.

Review the configuration and choose Create crawler.

On the Crawlers page, select the crawler you created and choose Run.

You can rerun this crawler when new tags are added to your AWS resources, so that they’re available for you to query from the Athena database.
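The console steps above could also be scripted. The following is a minimal boto3 sketch, assuming the IAM role and database already exist and boto3 is installed with credentials configured; all names passed in are placeholders, and the two helper functions are hypothetical.

```python
def build_crawler_request(name, role_arn, database, s3_path):
    """Build keyword arguments for the AWS Glue create_crawler call."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        # A single S3 target; the crawler walks the objects under this path.
        "Targets": {"S3Targets": [{"Path": s3_path}]},
    }


def create_and_run_crawler(request):
    """Create the crawler, then start it (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    glue = boto3.client("glue")
    glue.create_crawler(**request)
    glue.start_crawler(Name=request["Name"])
```

To pick up newly added tags, calling start_crawler again with the same name reruns the existing crawler.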

Run Athena queries

Now you’re ready to query the enriched VPC flow logs from Athena.

On the Athena console, open the query editor.
For Database, choose the database you created.
Enter the query as shown in the following screenshot and choose Run.

The following code shows some of the sample queries you can run:

Select * from awslogs where "dst-az-id"='us-east-1a'
Select * from awslogs where "src-tag-project"='Log Analytics' or "dst-tag-team"='Engineering' 
Select "srcaddr", "srcport", "dstaddr", "dstport", "region", "az-id", "dst-az-id", "flow-direction" from awslogs where "az-id"='use1-az2' and "dst-az-id"='us-east-1a'
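Because the enriched column names contain hyphens, they must be wrapped in double quotes in these queries, while values use single quotes. If you generate such queries from code, a small helper along these lines can keep the quoting consistent. It is an illustrative sketch only, the function names are hypothetical, and it combines conditions with "and" only.

```python
def quote_identifier(name):
    """Double-quote a column name so hyphens are not parsed as minus signs."""
    return '"' + name.replace('"', '""') + '"'


def quote_literal(value):
    """Single-quote a string value, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_query(table, filters, columns=None):
    """Build a select statement with equality filters joined by 'and'."""
    select = ", ".join(quote_identifier(c) for c in columns) if columns else "*"
    query = f"select {select} from {table}"
    if filters:
        conditions = [
            f"{quote_identifier(column)}={quote_literal(value)}"
            for column, value in filters.items()
        ]
        query += " where " + " and ".join(conditions)
    return query


print(build_query("awslogs", {"az-id": "use1-az2", "dst-az-id": "us-east-1a"},
                  columns=["srcaddr", "dstaddr", "flow-direction"]))
```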

The following screenshot shows an example query result of the source Availability Zone to the destination Availability Zone traffic.

You can also visualize various charts for the flow logs stored in the S3 bucket via QuickSight. For more information, refer to Analyzing VPC Flow Logs using Amazon Athena, and Amazon QuickSight.

Pricing

For pricing details, refer to Amazon Kinesis Data Firehose pricing.

Clean up

To clean up your resources, complete the following steps:

Delete the Kinesis Data Firehose delivery stream and associated IAM role and policies.
Delete the target S3 bucket.
Delete the VPC flow log subscription.
Delete the Lambda function and associated IAM role and policy.

Conclusion

This post presented a serverless architecture for enriching VPC flow log records with additional information such as resource tags. A Kinesis Data Firehose delivery stream invokes a Lambda function that adds the metadata to each record before the records are stored in a target S3 bucket. Because the tags assigned to your resources now appear alongside each log record, wherever tags are available, you can query, analyze, and visualize VPC flow logs in terms of your own applications.

We encourage you to follow the steps provided in this post to create a delivery stream, integrate with your VPC flow logs, and create a Lambda function to enrich the flow log records with additional metadata to more easily understand and analyze traffic patterns in your environment.

About the Authors

Chaitanya Shah is a Sr. Technical Account Manager with AWS, based out of New York. He has over 22 years of experience working with enterprise customers. He loves to code and actively contributes to AWS solutions labs to help customers solve complex problems. He provides guidance to AWS customers on best practices for their AWS Cloud migrations. He is also specialized in AWS data transfer and in the data and analytics domain.

Vaibhav Katkade is a Senior Product Manager in the Amazon VPC team. He is interested in areas of network security and cloud networking operations. Outside of work, he enjoys cooking and the outdoors.

Learn to program in Python with our online courses

2022-10-27 Rosa Brown

Post Syndicated from Rosa Brown original https://www.raspberrypi.org/blog/learn-to-program-in-python-online-courses-for-teachers/

If you’re new to teaching programming or looking to build or refresh your programming knowledge, we have a free resource that is perfect for you. Our ‘Learn to program in Python’ online course pathway is for educators who want to develop their understanding of the text-based language Python. Each course is packed with information and activities to help you apply what you learn in your classroom teaching.

A computing teacher and a learner do physical computing in the primary school classroom.

Why learn to program in Python?

Writing a program in Python is very similar to writing in English, which makes starting to program much easier. Python is also a general-purpose programming language, so once you’ve learned the basics, you can use Python for lots of different programming activities.

That’s why Python is a perfect choice for learning to program, and why many of our educational resources involve Python. Our seven online Python courses cover aspects from taking your first steps into programming, to writing a program to control an electronic circuit, to learning about object-oriented programming.

Start learning to program in Python

With time and practice, you will be able to use Python programming to create unique solutions to problems, build helpful tools, and make things that are important to you.

How does the Python course pathway work?

The courses in the pathway have been written by our educators and include advice and activities to help you teach programming in your classroom. You can reuse the course activities to explain programming concepts to your learners and get them to write programs themselves. Because you will have first-hand experience of the activities, you’ll be able to anticipate your learners’ difficulties and adapt your lessons to suit them.

In a computing classroom, a smiling girl raises her hand.

All the courses are designed to take three or four weeks to complete, based on you spending two hours a week on participating. You can have free time-limited access to each course for the length of time it’s designed to take to complete. For example, if it’s a four-week course, like ‘Programming 101’, you can sign up for free to get four weeks of access.

The seven courses in the Python path can be completed in any order you like, and you can choose the courses that match your interests and needs.

A room of educators at desktop computers.

Each course involves activities that help you create a programming project using the concepts that you’re learning about. These activities are designed to be a fun and interactive way to reinforce what you’ve learned and can also be used with your learners in the classroom.

Course spotlight: Programming 101

If programming is completely new to you, our ‘Programming 101’ course is the best place to start. In ‘Programming 101’, we use this definition of programming to start with the idea that programming is about you telling a computer what to do:

“Programming is how you get computers to solve problems.”

We see programming as a chance to think creatively about a problem and about all the different ways it could be solved. While you might be unfamiliar with terms like programming, algorithms, or selection, the ‘Programming 101’ course demonstrates how they touch on things that many of us know from other areas of our lives.

On the course, you will:

Learn about basic programming concepts such as sequencing and repetition
Start to write your own programs
Discover how to interpret error messages to find and fix mistakes in your programs

What will you make in the courses?

Through building an understanding of programming, you will see how you can write your own programs to make games, quizzes, physical computing projects, and more. Here’s a look at some of the things you could make in three of the seven courses:

Programming 101: Write your first program in Python to make a personal assistant bot. You’ll discover how to make the output of your program respond to the user’s input.

You’ll write a program to create a personal assistant bot in the ‘Programming 101’ course for beginners.

Programming with GUIs: Build a game where players compare two sets of emoji to find the emoji that matches. To make this game, you’ll use what you learn in the course to design the layout of a graphic user interface (GUI) and make sure only one emoji appears twice.

You’ll make an interactive graphic game in the ‘Programming with GUIs’ course.

Object-oriented Programming: Create a text-based adventure game with a character on a quest through different rooms! You’ll discover how to write a program that reacts to user input, and how to write your own code to create more challenges within the game based on your ideas.
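To give a flavour of the first idea in the list above, here is a tiny sketch of a program whose output responds to the user’s input. It is not taken from the course materials; the reply function and its wording are made up for illustration.

```python
def reply(name, mood):
    """Return the bot's response to a user's name and stated mood."""
    greeting = f"Hello, {name}!"
    # Selection: choose a different response depending on the input.
    if mood.strip().lower() in ("good", "great", "happy"):
        return greeting + " Glad to hear you are doing well."
    return greeting + " I hope your day gets better."


def main():
    """Ask two questions, then print a response based on the answers."""
    name = input("What is your name? ")
    mood = input("How are you feeling today? ")
    print(reply(name, mood))
```

Calling main() runs the conversation in the terminal; keeping the decision in reply makes it easy to check the bot’s answers without typing anything.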

So check out our courses and start gaining Python programming skills today!

Python programming resources for young people

If you want to help your learners develop their understanding of programming in Python, you’ll be interested in these free resources we’ve created for young people:

Introduction to Python: Our guided project path for learners who are new to text-based programming. We have created these projects with young people around the age of 9 to 13 in mind. Each project takes one hour to complete, and learners can make their own fun programs while learning about Python.

More Python: Our guided project path for learners who want to move beyond the ‘Intro to Python’ path to write programs that contain charts, artwork, and more. We’ve written these projects for young people around the age of 10 to 13.

Isaac Computer Science: This learning platform we’ve created for GCSE and A level students (age 14 to 18) uses Python and other text-based languages to teach the programming concepts within England’s computer science curriculum.

The post Learn to program in Python with our online courses appeared first on Raspberry Pi.

How do I start my child coding?

2022-07-14 Marc Scott

Post Syndicated from Marc Scott original https://www.raspberrypi.org/blog/how-do-i-start-my-child-coding/

You may have heard a lot about coding and how important it is for children to start learning about coding as early as possible. Computers have become part of our lives, and we’re not just talking about the laptop or desktop computer you might have in your home or on your desk at work. Your phone, your microwave, and your car are all controlled by computers, and those computers need instructions to tell them what to do. Coding, or computer programming, involves writing those instructions.

A boy types code at a CoderDojo coding club.

If children discover a love for coding, they will have an avenue to make the things they want to make; to write programs and build projects that they find useful, fun, or interesting. So how do you give your child the opportunity to learn about coding? We’ve listed some free resources and suggested activities below.

Scratch Junior

If you have a young child under about 7 years of age, then a great place to begin is with ScratchJr. This is an app available on Android and iOS phones and tablets, that lets children learn the basics of programming, without having to worry about making mistakes.

Code Club World

The Raspberry Pi Foundation has developed a series of activities for young learners, on their journey to developing their computing skills. Code Club World provides a platform for children to play with code to design their own avatar, make it dance, and play music. Plus they can share their creations with other learners.

“You could have a go too and discover Scratch together. The platform is designed for complete beginners and it is great fun to play with.”

Carol Thornhill, Engineering Science MA, Mathematics teacher

Scratch

For 7- to 11-year-old children, Scratch is a good way to begin their journey in coding, or to progress from ScratchJr. Like ScratchJr, Scratch is a block-based language, allowing children to assemble code to produce games, animations, stories, or even use some of the add-ons to interact with electronic devices and explore physical computing.

The Raspberry Pi Foundation has hundreds of Scratch projects that your child can try out, but the best place to begin is with our Introduction to Scratch path, which will provide your child with the basic skills they need, and then encourage them to build projects that are relevant to them, culminating in their creation of their own interactive ebook.

Try out an interactive animation coded in Scratch

Your child may never tire of Scratch, and that is absolutely fine — it is a fully functioning programming language that is surprisingly powerful, when you learn to understand everything it can do. Another advantage of Scratch is that it provides easy access to graphics, sounds, and interactivity that can be trickier to achieve in other programming languages.

Python

If you’re looking for more traditional programming languages for your child to progress on to, especially when they reach 12 years of age or beyond, then we like to direct our young learners to the Python programming language and to the languages that the World Wide Web is built on, particularly HTML, CSS, and JavaScript.

Our Python resources cover the basics of using the language, and then progress from there. Python is one of the most widely used languages when it comes to the fields of artificial intelligence and data science, and we have resources to support your child in learning about these fascinating aspects of technology. Our projects can even introduce your child to the world of electronics and physical computing with activities that use the inexpensive Raspberry Pi Pico, and a handful of electronic components, enabling your kids to create a wide variety of art installations and useful gadgets.

“Trying Python doesn’t mean you can’t go back to Scratch or switch between Scratch and Python for different purposes. I still use Scratch for some projects myself!”

Tracy Gardner, Computer Science PhD, former IBM Software Architect and currently a project writer at the Raspberry Pi Foundation

A young person codes at a Raspberry Pi computer. — Python is a great text-based programming language for young people to learn.

Coding projects

On our coding tutorials website we have many different projects to help your child learn coding and digital making. These range from beginner resources like the Introduction to Scratch path to more advanced activities such as the Introduction to Unity path, where children can learn how to make 3D worlds and games.

“Our new project paths can be tackled by young creators on their own, without adult intervention. Paths are structured so that they build skills and confidence in the early stages, and then provide more open-ended tasks and inspirational ideas that creators can adapt or work from.”

Rik Cross, BSc (Hons), PGCE, former teacher and Director of Informal Learning at the Raspberry Pi Foundation

Web development

The Web is integral to many of our lives, and we believe that it is important for children to have an understanding of the technology that drives it. That is why we have an Introduction to the Web path that allows children to develop their own web pages, focusing on the kinds of webpages that they want to build, be that sending a greeting card, telling a story, or creating a showcase of their projects.

A girl has fun learning to code at home on a tablet sitting on a sofa. It’s empowering for children to learn how the websites they visit are created with code.

Coding clubs

Coding clubs are a great place for children to have fun and become more confident with coding, where they can learn through making and share their creations with each other. The Raspberry Pi Foundation operates the world’s largest network of coding clubs — CoderDojo and Code Club.

“I have a new group of creators at my Code Club every year and my favourite part is when they realise they really can let their imagination run wild. You want to make an animation where a talking pineapple chases a snowman — absolutely. You want to make a piece of scalable art out of 1000 pixelated cartoon musical instruments — go right ahead. If you can code it, you can make it ”

Liz Smart, Code Club and CoderDojo mentor, former Solutions Architect and project writer for the Raspberry Pi Foundation

Three teenage girls at a laptop. — At Code Club and CoderDojo, many young people enjoy teaming up to code projects together.

Coding challenges

Once your child has learnt some of the basics, they may enjoy entering a coding challenge! The European Astro Pi Challenge programme allows young people to write code and actually have it run on the International Space Station, and Coolest Projects gives children a chance to showcase their projects from across the globe.

A Coolest Projects participant — A girl with her coded creation at an in-person Coolest Projects showcase.

Free resources

No matter what technology your child wants to engage with, there is a wealth of free resources and materials available from organisations such as the Raspberry Pi Foundation and Scratch Foundation, that prepare young people for 21st century life. Whether they want to become professional software engineers, tinker with some electronics, or just have a play around … encourage them to explore some coding projects, and see what they can learn, make, and do!

Author: Marc Scott, BSc (Hons) is a former Science, Computer Science, and Engineering teacher and the Content Lead for Projects at the Raspberry Pi Foundation.

The post How do I start my child coding? appeared first on Raspberry Pi.

Simplify and optimize Python package management for AWS Glue PySpark jobs with AWS CodeArtifact

2022-06-10 Ashok Padmanabhan

Post Syndicated from Ashok Padmanabhan original https://aws.amazon.com/blogs/big-data/simplify-and-optimize-python-package-management-for-aws-glue-pyspark-jobs-with-aws-codeartifact/

Data engineers use various Python packages to meet their data processing requirements while building data pipelines with AWS Glue PySpark Jobs. Languages like Python and Scala are commonly used in data pipeline development. Developers can take advantage of their open-source packages or even customize their own to make it easier and faster to perform use cases, such as data manipulation and analysis. However, managing standardized packages can be cumbersome with multiple teams using different versions of packages, installing non-approved packages, and causing duplicate development effort due to the lack of visibility of what is available at the enterprise level. This can be especially challenging in large enterprises with multiple data engineering teams.

ETL Developers have requirements to use additional packages for their AWS Glue ETL jobs. With security being job zero for customers, many will restrict egress traffic from their VPC to the public internet, and they need a way to manage the packages used by applications including their data processing pipelines.

Our proposed solution enables customers with network egress restrictions to manage packages centrally with AWS CodeArtifact and use their favorite libraries in their AWS Glue ETL PySpark code. In this post, we’ll describe how CodeArtifact can be used for managing packages and modules for AWS Glue ETL jobs, and we’ll demo a solution using Glue PySpark jobs that run within VPC Subnets that have no internet access.

Solution overview

The solution uses CodeArtifact as a tool to make it easier for organizations of any size to securely store, publish, and share software packages used in their ETL with AWS Glue. VPC Endpoints will be enabled for CodeArtifact and Glue to enable private link connections. AWS Step Functions makes it easy to coordinate the orchestration of components used in the data processing pipeline. Native integrations with both CodeArtifact and AWS Glue enable the workflow to both authenticate the request to CodeArtifact and start the AWS Glue ETL job.

The following architecture shows an implementation of a solution using AWS Glue, CodeArtifact, and Step Functions to use additional Python modules without egress internet access. The solution is deployed using AWS Cloud Development Kit (AWS CDK), an open-source software development framework to define your cloud application resources using familiar programming languages.

Fig 1: Architecture Diagram for the Solution

To illustrate how to set up this architecture, we’ll walk you through the following steps:

Deploying an AWS CDK stack to provision the following AWS Resources
1. CodeArtifact
2. An AWS Glue job
3. Step Functions workflow
4. Amazon Simple Storage Service (Amazon S3) bucket
5. A VPC with a private Subnet and VPC Endpoints to Amazon S3 and CodeArtifact
Validate the Deployment.
Run a Sample Workflow – This workflow will run an AWS Glue PySpark job that uses a custom Python library, and an upgraded version of boto3.
Cleaning up your resources.

Prerequisites

Make sure that you complete the following steps as prerequisites:

Have an AWS account. For this post, you configure the required AWS resources using AWS CloudFormation. If you haven’t signed up, complete the following tasks:
- Create an account. For instructions, see Sign Up for AWS
- Create an AWS Identity and Access Management (IAM) user. For instructions, see Create IAM User.
Have the following installed and configured on your machine:

The solution

Launching your AWS CDK Stack

Step 1: Using your device’s command line, check out our Git repository to a local directory on your device:

git clone https://github.com/aws-samples/python-lib-management-without-internet-for-aws-glue-in-private-subnets.git

Step 2: Change to the directory in the cloned repository that holds the Amazon S3 scripts:

cd python-lib-management-without-internet-for-aws-glue-in-private-subnets/scripts/s3

Step 3: Download the following CSV, which contains New York City Taxi and Limousine Commission (TLC) weekly trip data. This will serve as the input source for the AWS Glue Job:

aws s3 cp s3://nyc-tlc/misc/FOIL_weekly_trips_apps.csv .

Step 4: Change to the directory where the app.py file is located. From the directory used in the previous step, run the following command:

cd ../..

Step 5: Create a virtual environment:

macOS/Linux:
python3 -m venv .env

Windows:
python -m venv .env

Step 6: Activate the virtual environment after the init process completes and the virtual environment is created:

macOS/Linux:
source .env/bin/activate

Windows:
.env\Scripts\activate.bat

Step 7: Install the required dependencies:

pip3 install -r requirements.txt

Step 8: Make sure that your AWS profile is set up with the Region that you want to deploy to, as mentioned in the prerequisites. Synthesize the templates. AWS CDK apps use code to define the infrastructure, and when run they produce or “synthesize” a CloudFormation template for each stack defined in the application:

cdk synthesize

Step 9: Bootstrap the AWS CDK app using the following command:

cdk bootstrap aws://<AWS_ACCOUNTID>/<AWS_REGION>

Replace the placeholders AWS_ACCOUNTID and AWS_REGION with your AWS account ID and the Region to deploy to.

This step provisions the initial resources, including an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments.

Step 10: Deploy the solution. By default, some actions that could potentially make security changes require approval. In this deployment, you’re creating an IAM role. The following command overrides the approval prompts, but if you would like to manually accept the prompts, then omit the --require-approval never flag:

cdk deploy "*" --require-approval never

While the AWS CDK deploys the CloudFormation stacks, you can follow the deployment progress in your terminal:

Fig 2: AWS CDK Deployment progress in terminal

Once the deployment is successful, you’ll see the successful status as follows:

Fig 3: AWS CDK Deployment completion success

Step 11: Log in to the AWS Console, go to CloudFormation, and see the output of the ApplicationStack stack:

Fig 4: AWS CloudFormation stack output

Note the values of the DomainName and RepositoryName variables. We’ll use them in the next step to upload our artifacts.

Step 12: We will upload a custom library into the repo that we created. This will be used by our Glue ETL job.

Install twine using pip:

python3 -m pip install twine

The custom python package glueutils-0.2.0.tar.gz can be found under this folder of the cloned repo:

cd scripts/custom_glue_library

Configure twine with the login command (additional details here). Refer to step 11 for the DomainName and RepositoryName from the CloudFormation output:

aws codeartifact login --tool twine --domain <DomainName> --domain-owner <AWS_ACCOUNTID> --repository <RepositoryName>

Publish Python package assets:

twine upload --repository codeartifact glueutils-0.2.0.tar.gz

Fig 5: Python package publishing using twine

Validate the Deployment

The AWS CDK stack will deploy the following AWS resources:

Amazon Virtual Private Cloud (Amazon VPC)
1. One Private Subnet
AWS CodeArtifact
1. CodeArtifact Repository
2. CodeArtifact Domain
3. CodeArtifact Upstream Repository
AWS Glue
1. AWS Glue Job
2. AWS Glue Database
3. AWS Glue Connection
AWS Step Functions workflow
Amazon S3 Bucket for AWS CDK and also for storing scripts and CSV file
IAM Roles and Policies
Amazon Elastic Compute Cloud (Amazon EC2) Security Group

Step 1: Browse to the AWS account and region via the AWS Console to which the resources are deployed.

Step 2: Browse to the Subnets page (https://<region>.console.aws.amazon.com/vpc/home?region=<region>#subnets:). Replace <region> with the AWS Region to which your resources are deployed.

Step 3: Select the Subnet with name as ApplicationStack/enterprise-repo-vpc/Enterprise-Repo-Private-Subnet1

Step 4: Select the Route Table and validate that it has no route to the internet through an Internet Gateway or a NAT Gateway, and that it’s similar to the following image:

Fig 6: Route table validation

Step 5: Navigate to the CodeArtifact console and review the repositories created. The enterprise-repo is your local repository, and pypi-store is the upstream repository connected to PyPI, providing artifacts from pypi.org.

Fig 7: AWS CodeArtifact repositories created

Step 6: Navigate to enterprise-repo and search for glueutils. This is the custom Python package that we published.

Fig 8: AWS CodeArtifact custom Python package published

Step 7: Navigate to Step Functions Console and review the enterprise-repo-step-function as follows:

Fig 9: AWS Step Functions workflow

The diagram shows how the Step Functions workflow will orchestrate the pattern.

The first step, CodeArtifactGetAuthorizationToken, calls the getAuthorizationToken API to generate a temporary authorization token for accessing repositories in the domain (this token is valid for 15 minutes).
The next step GenerateCodeArtifactURL takes the authorization token from the response and generates the CodeArtifact URL.
Then, this will move into the GlueStartJobRun state, which makes a synchronous API call to run the AWS Glue job.
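The first two states can be approximated locally with boto3. The following is a minimal sketch, not the state machine definition used in this post: the function names are hypothetical, and the URL layout assumes the pip index format that CodeArtifact uses for PyPI repositories (user name aws, the token as the password, and a simple/ path under the repository endpoint).

```python
from urllib.parse import urlsplit, urlunsplit


def build_index_url(repository_endpoint: str, token: str) -> str:
    """Embed the temporary token in the repository endpoint as basic-auth credentials.

    Hypothetical helper: assumes the user name "aws" and the "simple/" index path.
    """
    parts = urlsplit(repository_endpoint)
    netloc = f"aws:{token}@{parts.netloc}"
    path = parts.path.rstrip("/") + "/simple/"
    return urlunsplit((parts.scheme, netloc, path, "", ""))


def fetch_index_url(domain: str, owner: str, repository: str) -> str:
    """Request a 15-minute token and the PyPI endpoint (requires AWS credentials)."""
    import boto3  # imported lazily so build_index_url works without boto3 installed

    client = boto3.client("codeartifact")
    token = client.get_authorization_token(
        domain=domain, domainOwner=owner, durationSeconds=900
    )["authorizationToken"]
    endpoint = client.get_repository_endpoint(
        domain=domain, domainOwner=owner, repository=repository, format="pypi"
    )["repositoryEndpoint"]
    return build_index_url(endpoint, token)
```

Keeping the URL construction separate from the API calls makes it possible to check the format without network access. Because the token is embedded in the URL, the result should be treated as a secret.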

Step 8: Navigate to the AWS Glue Console and select the Jobs tab, then select enterprise-repo-glue-job.

The AWS Glue job is created with the following script and AWS Glue Connection enterprise-repo-glue-connection. The AWS Glue connection is a Data Catalog object that enables the job to connect to sources and APIs from within the VPC. The network type connection runs the job from within the private subnet to make requests to Amazon S3 and CodeArtifact over the VPC endpoint connection. This enables the job to run without any traffic through the internet.

Note the connections section in the AWS Glue PySpark Job, which makes the Glue job run on the private subnet in the VPC provisioned.

Fig 10: AWS Glue network connections

The job takes an Amazon S3 bucket, Glue Database, Python Job Installer Option, and Additional Python Modules as job parameters. The parameters --additional-python-modules and --python-modules-installer-option are passed to install the selected Python module from a PyPI repository hosted in AWS CodeArtifact.
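To illustrate how these two parameters fit together, here is a small sketch of the arguments map that a caller might pass when starting the job. The helper names are hypothetical and the other job parameters (bucket, database) are omitted. It assumes the installer option is handed to pip as an --index-url flag, which matches the parameter visible in the job run later in this post.

```python
def build_glue_arguments(modules: str, index_url: str) -> dict:
    """Return the two module-related job arguments (hypothetical helper).

    modules   -- comma-separated pip requirement specifiers
    index_url -- CodeArtifact index URL that already contains the auth token
    """
    return {
        "--additional-python-modules": modules,
        "--python-modules-installer-option": f"--index-url={index_url}",
    }


def start_job(job_name: str, arguments: dict) -> str:
    """Start the Glue job with the given arguments (requires AWS credentials)."""
    import boto3  # imported lazily; only needed when actually calling AWS

    response = boto3.client("glue").start_job_run(JobName=job_name, Arguments=arguments)
    return response["JobRunId"]
```

For example, start_job("enterprise-repo-glue-job", build_glue_arguments("boto3,glueutils==0.2.0", index_url)) would mirror what the GlueStartJobRun state does, under the assumptions above.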

The script itself first reads the Amazon S3 input path of the taxi data in CSV format. A light transformation to sum the total trips by year, week, and app is performed. Then the output is written to an Amazon S3 path as Parquet. A partitioned table in the AWS Glue Data Catalog will either be created or updated if it already exists.
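The shape of that transformation can be shown without Spark. The sketch below is a plain-Python stand-in for the group-and-sum step only, not the Glue PySpark script: the column names (year, week, app, trips) and the sample rows are assumptions made for illustration, and it does not read from Amazon S3 or write Parquet.

```python
import csv
import io
from collections import defaultdict


def sum_trips(csv_text: str) -> dict:
    """Sum trips per (year, week, app) from CSV text with assumed column names."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (int(row["year"]), int(row["week"]), row["app"])
        totals[key] += int(row["trips"])
    return dict(totals)


# Made-up sample rows, not data from the post
sample = """year,week,app,trips
2021,1,app_a,10
2021,1,app_a,5
2021,2,app_b,7
"""

print(sum_trips(sample))
```

In the actual job the same grouping is expressed with Spark so that it scales to the full dataset and can write partitioned output.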

You can find the Glue PySpark script here.

Run a sample workflow

The following steps will demonstrate how to run a sample workflow:

Step 1: Navigate to the Step Functions Console and select the enterprise-repo-step-function.

Step 2: Select Start execution and enter the input shown below. We're including the glueutils and boto3 libraries as part of the job run. We recommend pinning your Python dependencies to avoid breaking changes from a future version of a dependency. In the example below, the latest available version of boto3 and version 0.2.0 of glueutils will be installed. To pin boto3 to a specific release, you can specify boto3==1.24.2 (the latest release at the time this post was published).

{"pythonmodules": "boto3,glueutils==0.2.0"}
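If you start executions from a script rather than the console, the input can be generated from a mapping of package names to versions, which makes pinning explicit. This is a minimal sketch with a hypothetical helper name; only the pythonmodules key comes from the post.

```python
import json


def build_execution_input(modules: dict) -> str:
    """Build the execution input; a version of None means "latest available"."""
    specs = [
        f"{name}=={version}" if version else name
        for name, version in modules.items()
    ]
    return json.dumps({"pythonmodules": ",".join(specs)})


# Fully pinned variant of the example input
print(build_execution_input({"boto3": "1.24.2", "glueutils": "0.2.0"}))
```

The resulting string could then be passed as the input argument of the Step Functions start_execution call in boto3, together with the ARN of your state machine.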

Step 3: Select Start execution and wait until Execution Status is Succeeded. This may take a few minutes.

Step 4: Navigate to the CodeArtifact Console to review the enterprise-repo repository. You'll see the cached PyPI packages and all of their dependencies pulled down from PyPI.

Step 5: In the Glue Console under the Runs section of the enterprise-repo-glue-job, you'll see the parameters passed:

Fig 11: AWS Glue job execution history

Note the --index-url, which was passed as a parameter to the Glue ETL job. The token is valid for only 15 minutes.

Step 6: Navigate to the Amazon CloudWatch Console and go to the /aws/glue-jobs log group to verify that the packages were installed from the local repo.

You will see that the two packages passed as parameters are installed with the corresponding versions.

Fig 12: Amazon CloudWatch logs details for the Glue job

Step 7: Navigate to the Amazon Athena console and select Query Editor.

Step 8: Run the following query to validate the output of the AWS Glue job:

SELECT year, app, SUM(total_trips) as sum_of_total_trips 
FROM 
"codeartifactblog_glue_db"."taxidataparquet" 
GROUP BY year, app;

Clean up

Make sure that you clean up all of the AWS resources that you created in the AWS CDK stack deployment. You can delete these resources with the cdk destroy command, as follows, or from the CloudFormation console.

To destroy the resources using AWS CDK, follow these steps:

Follow Steps 1-6 from the ‘Launching your CDK Stack’ section.
Destroy the app by executing the following command:
```
cdk destroy
```

Conclusion

In this post, we demonstrated how CodeArtifact can be used for managing Python packages and modules for AWS Glue jobs that run within VPC subnets that have no internet access. We also demonstrated how the versions of existing packages (e.g., boto3) can be updated, and how a custom Python library (glueutils) that is developed locally can also be managed through CodeArtifact.

This post enables you to use your favorite Python packages with AWS Glue ETL PySpark jobs by modifying the input to the AWS Step Functions workflow (Step 2 in the Run a sample workflow section).

About the Authors

Bret Pontillo is a Data & ML Engineer with AWS Professional Services. He works closely with enterprise customers building data lakes and analytical applications on the AWS platform. In his free time, Bret enjoys traveling, watching sports, and trying new restaurants.

Gaurav Gundal is a DevOps consultant with AWS Professional Services, helping customers build solutions on the customer platform. When not building, designing, or developing solutions, Gaurav spends time with his family, plays guitar, and enjoys traveling to different places.

Ashok Padmanabhan is a Sr. IoT Data Architect with AWS Professional Services, helping customers build data and analytics platforms and solutions. When not helping customers build and design data lakes, Ashok enjoys spending time at the beach near his home in Florida.

Python coding for kids: Moving beyond the basics

2022-04-14 Rebecca Franks

Post Syndicated from Rebecca Franks original https://www.raspberrypi.org/blog/python-coding-for-kids-beyond-the-basics/

We are excited to announce our second new Python learning path, ‘More Python’, which shows young coders how to add real data to their programs while creating projects from a chart of Olympic medals to an interactive world map. The six guided Python projects in this free learning path are designed to enable young people to independently create their own Python projects about the topics that matter to them.

A girl points excitedly at a project on the Raspberry Pi Foundation's projects site. — Two kids are at a laptop with one of our coding projects.

In this post, we’ll show you how kids use the projects in the ‘More Python’ path, what they can make by following the path, and how the path structure helps them become confident and independent digital makers.

Python coding for kids: Our learning paths

Our ‘Introduction to Python’ learning path is the perfect place to start learning how to use Python, a text-based programming language. When we launched the Intro path in February, we explained why Python is such a popular, useful, and accessible programming language for young people.

Start the ‘More Python’ path

Because Python has so much to offer, we have created a second Python path for young people who have learned the basics in the first path. In this new set of six projects, learners will discover new concepts and see how to add different types of real data to their programs.

Key questions answered

Who is this path for?

We have written the projects in this path with young people around the age of 10 to 13 in mind. To code in a text-based language, a young person needs to be familiar with using a keyboard, due to the typing involved. Learners should have already completed the ‘Introduction to Python’ project path, as they will build on the learning from that path.

Three young tech creators show off their tech project at Coolest Projects.

How do young people learn with the projects?

Young people need access to a web browser to complete our project paths. Each project contains step-by-step instructions for learners to follow, and tick boxes to mark when they complete each step. On top of that, the projects have steps for learners to:

Reflect on what they have covered in the project
Share their projects with others
See suggestions to upgrade their projects

Young people also have the option to sign up for an account with us so they can save their progress at any time and collect badges.

While learners follow the project instructions in this project path, they write their code into Trinket, a free web-based coding platform accessible in a browser. Each project contains a link to a starter Trinket, which includes everything to get started writing Python code — no need to install any additional software.

Screenshot of Python code in the online IDE Trinket. — This is what Python code on Trinket looks like.

If they prefer, however, young people also have the option of instead writing their code in a desktop-based programming environment, such as Thonny, as they work through the projects.

What will young people learn?

To use data in their Python programs, the project instructions show learners how to:

Create and use lists
Create and use dictionaries
Read data from a data file

The projects support learners as they explore new concepts of digital visual media and:

Create charts using the Python library Pygal
Plot pins on a map
Create randomised artwork

In each project, learners reflect and answer questions about their work, which is important for connecting the project’s content to their pre-existing knowledge.

In a computing classroom, a girl laughs at what she sees on the screen.

As they work through the projects, learners see different ways to present data and then decide how they want to present their data in the final project in the path. You’ll find out what the projects are on the path page, or at the bottom of this blog post.

The project path helps learners become independent coders and digital makers, as each project contains slightly less support than the one before. You can read about how our project paths are designed to increase young people’s independence, and explore our other free learning paths for young coders.

How long will the path take to complete?

We’ve designed the path to be completed in around six one-hour sessions, with one hour per project, at home, in school, or at a coding club. The project instructions encourage learners to add code to upgrade their projects and go further if they wish. This means that young people might want to spend a little more time getting their projects exactly as they imagine them.

In a classroom, a teacher and a student look at a computer screen while the student types on the keyboard.

What can young people do next?

Use Unity to create a 3D world

Unity is a free development environment for creating 3D virtual environments, including games, visual novels, and animations, all with the text-based programming language C#. Our ‘Introduction to Unity’ project path for keen coders shows how to make 3D worlds and games with collectibles, timers, and non-player characters.

Take part in Coolest Projects Global

At the end of the ‘More Python’ path, learners are encouraged to register a project they’ve made using their new coding skills for Coolest Projects Global, our free and world-leading online technology showcase for young tech creators. The project they register will become part of the online gallery, where members of the Coolest Projects community can celebrate each other’s creations.

A young coder shows off her tech project for Coolest Projects to two other young tech creators.

We welcome projects from all young people, whether they are beginners or experienced coders and digital makers. Coolest Projects Global is a unique opportunity for young people to share their ingenuity with the world and with other young people who love coding and creating with digital technology.

Learn more about Coolest Projects Global

Details about the projects in ‘More Python’

The ‘More Python’ path is structured according to our Digital Making Framework, with three Explore projects, two Design projects, and a final Invent project.

Explore project 1: Charting champions

Illustration of a fast-moving, smiling robot wearing a champion's rosette.

In this Explore project, learners discover the power of lists in Python by creating an interactive chart of Olympic medals. They learn how to read data from a text file and then present that data as a bar chart.

Explore project 2: Solar system

Illustration of our solar system.

In this Explore project, learners create a simulation of the solar system. They revisit the drawing and animation skills that they learned in the ‘Introduction to Python’ project path to produce animated planets orbiting the sun. The animation is based on real data taken from a data file to simulate the speed that the planets move at as they orbit. The simulation is also interactive, using dictionaries to display data about the planets that have been selected.

Explore project 3: Codebreaker

Illustration of a person thinking about codebreaking.

The final Explore project gets learners to build on their knowledge of lists and dictionaries by creating a program that encodes and decodes a message using an Atbash cipher. The Atbash cipher was originally developed in the Hebrew language. It takes the alphabet and matches it to its reverse order to create a secret message. They also create a script that checks how many times certain letters have been used in an encoded message, so that they can discover patterns.
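To give a flavour of what this looks like in code, here is a short sketch of an Atbash cipher and a letter counter built with a list and dictionaries. It is our own illustration rather than the code learners write in the project, and the function names are made up for this example.

```python
alphabet = list("abcdefghijklmnopqrstuvwxyz")

# Match each letter to the letter at the same position in the reversed alphabet
atbash_key = dict(zip(alphabet, reversed(alphabet)))


def atbash(message):
    """Encode or decode a message; characters that are not letters stay the same."""
    return "".join(atbash_key.get(letter, letter) for letter in message.lower())


def count_letters(message):
    """Count how many times each letter appears in a message."""
    counts = {}
    for letter in message:
        if letter in atbash_key:
            counts[letter] = counts.get(letter, 0) + 1
    return counts


secret = atbash("hello world")
print(secret)                 # svool dliow
print(atbash(secret))         # hello world
print(count_letters(secret))
```

Because the alphabet is simply reversed, running the same function on the secret message turns it back into the original.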

Design project 1: Encoded art

Illustration of a robot painting a portrait of another robot.

The first Design project allows learners to create fun pieces of artwork by encoding the letters of their name into images, patterns, or drawings. Learners can choose the images that will be produced for each letter, and whether these appear at random or in a geometric pattern.

Learners are encouraged to share their encoded artwork in the community library, where there are lots of fun projects to discover already. In this project, learners apply all of the coding skills and knowledge covered in the Explore projects, including working with dictionaries and lists.

Design project 2: Mapping data

Illustration of a map and a hand of someone marking it with a large pin.

In the next Design project, learners access data from a data file and use it to create location pins on a world map. They have six datasets to choose from, so they can use one that interests them. They can also choose from a variety of maps and design their own pin to truly personalise their projects.

Invent project: Persuasive data presentation

Illustration of different graph types

This project is designed to use all of the skills and knowledge covered in this path, and most of the skills from the ‘Introduction to Python’ path. Learners can choose from eight datasets to create data visualisations. They are also given instructions on how to access and prepare other datasets if they want to visualise data about a different topic.

Once learners have chosen their dataset, they can decide how they want it to be displayed. This could be a chart, a map with pins, or a unique data visualisation. There are lots of example projects to provide inspiration for learners. One of our favourites is the ISS Expedition project, which places flags on the ISS depending on the expedition number you enter.

The post Python coding for kids: Moving beyond the basics appeared first on Raspberry Pi.

New for Amazon CodeGuru Reviewer – Detector Library and Security Detectors for Log-Injection Flaws

2022-02-15 Danilo Poccia

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/new-for-amazon-codeguru-reviewer-detector-library-and-security-detectors-for-log-injection-flaws/

Amazon CodeGuru Reviewer is a developer tool that detects security vulnerabilities in your code and provides intelligent recommendations to improve code quality. For example, CodeGuru Reviewer introduced Security Detectors for Java and Python code to identify security risks from the top ten Open Web Application Security Project (OWASP) categories and follow security best practices for AWS APIs and common crypto libraries. At re:Invent, CodeGuru Reviewer introduced a secrets detector to identify hardcoded secrets and suggest remediation steps to secure your secrets with AWS Secrets Manager. These capabilities help you find and remediate security issues before you deploy.

Today, I am happy to share two new features of CodeGuru Reviewer:

A new Detector Library describes in detail the detectors that CodeGuru Reviewer uses when looking for possible defects and includes code samples for both Java and Python.
New security detectors have been introduced for detecting log-injection flaws in Java and Python code, similar to what happened with the recent Apache Log4j vulnerability we described in this blog post.

Let’s see these new features in more detail.

Using the Detector Library
To help you understand more clearly which detectors CodeGuru Reviewer uses to review your code, we are now sharing a Detector Library where you can find detailed information and code samples.

These detectors help you build secure and efficient applications on AWS. In the Detector Library, you can find detailed information about CodeGuru Reviewer’s security and code quality detectors, including descriptions, their severity and potential impact on your application, and additional information that helps you mitigate risks.

Note that each detector looks for a wide range of code defects. We include one noncompliant and one compliant code example for each detector. However, CodeGuru uses machine learning and automated reasoning to identify possible issues. For this reason, each detector can find a range of defects in addition to the explicit code example shown on the detector’s description page.

Let’s have a look at a few detectors. One detector is looking for insecure cross-origin resource sharing (CORS) policies that are too permissive and may lead to loading content from untrusted or malicious sources.

Another detector checks for improper input validation that can enable attacks and lead to unwanted behavior.

Specific detectors help you use the AWS SDK for Java and the AWS SDK for Python (Boto3) in your applications. For example, there are detectors that can detect hardcoded credentials, such as passwords and access keys, or inefficient polling of AWS resources.

New Detectors for Log-Injection Flaws
Following the recent Apache Log4j vulnerability, we introduced in CodeGuru Reviewer new detectors that check if you’re logging anything that is not sanitized and possibly executable. These detectors cover the issue described in CWE-117: Improper Output Neutralization for Logs.

These detectors work with Java and Python code and, for Java, are not limited to the Log4j library. They don’t work by looking at the version of the libraries you use, but check what you are actually logging. In this way, they can protect you if similar bugs happen in the future.

According to these detectors, user-provided inputs must be sanitized before they are logged. This prevents an attacker from using this input to break the integrity of your logs, forge log entries, or bypass log monitors.
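As a simple illustration of this kind of sanitization in Python, the sketch below escapes carriage returns and line feeds before a value reaches the logger. It is an assumption-laden example, not a sample from the Detector Library: the helper name is hypothetical, and it addresses only forged log lines (CWE-117), so your own code may need stricter validation.

```python
import logging

logger = logging.getLogger("app")


def sanitize_for_log(value: str) -> str:
    """Escape CR and LF so user input cannot start a new, forged log entry."""
    return value.replace("\r", "\\r").replace("\n", "\\n")


# Hypothetical user input that tries to inject a fake log line
user_name = "alice\nINFO admin logged in"

# Noncompliant: logger.warning("Login failed for %s", user_name)
# Compliant: the injected line break is written as the two characters \n
logger.warning("Login failed for %s", sanitize_for_log(user_name))
```

With the sanitized value, the whole message stays on a single log line, so the injected text cannot be mistaken for a separate entry.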

Availability and Pricing
These new features are available today in all AWS Regions where Amazon CodeGuru is offered. For more information, see the AWS Regional Services List.

The Detector Library is free to browse as part of the documentation. For the new detectors looking for log-injection flaws, standard pricing applies. See the CodeGuru pricing page for more information.

Start using Amazon CodeGuru Reviewer today to improve the security of your code.

— Danilo
