Amazon Q Developer is a generative artificial intelligence (AI) powered conversational assistant that can help you understand, build, extend, and operate AWS applications. You can ask questions about AWS architecture, your AWS resources, best practices, documentation, support, and more.
With Amazon Q Developer in your IDE, you can write a comment in natural language that outlines a specific task, such as, “Upload a file with server-side encryption.” Based on this information, Amazon Q Developer recommends one or more code snippets directly in the IDE that can accomplish the task. You can quickly and easily accept the top suggestions (tab key), view more suggestions (arrow keys), or continue writing your own code.
However, Amazon Q Developer in the IDE is more than just a code completion plugin. Amazon Q Developer is a generative AI (GenAI) powered assistant for software development that can be used to have a conversation about your code, get code suggestions, or ask questions about building software. This provides the benefits of collaborative pair programming, powered by GenAI models that have been trained on billions of lines of code, from the Amazon internal code base and publicly available sources.
The challenge
At the 2024 AWS Summit in Sydney, an exhilarating code challenge took center stage, pitting a Blue Team against a Red Team, with approximately 10 to 15 challengers in each team, in a battle of coding prowess. The challenge consisted of 20 tasks, starting with basic math and string manipulation, and progressively escalating in difficulty to include complex algorithms and intricate ciphers.
The Blue Team had a distinct advantage, leveraging the powerful capabilities of Amazon Q Developer, the most capable generative AI-powered assistant for software development. With Q Developer’s guidance, the Blue Team navigated increasingly complex tasks with ease, tapping into Q Developer’s vast knowledge base and problem-solving abilities. In contrast, the Red Team competed without assistance, relying solely on their own coding expertise and problem-solving skills to tackle daunting challenges.
As the competition unfolded, the two teams battled it out, each striving to outperform the other. The Blue Team’s efficient use of Amazon Q Developer proved to be a game-changer, allowing them to tackle the most challenging tasks with remarkable speed and accuracy. However, the Red Team’s sheer determination and technical prowess kept them in the running, showcasing their ability to think outside the box and devise innovative solutions.
The culmination of the code challenge was a thrilling finale, with both teams pushing the boundaries of their skills and ultimately leaving the audience in a state of admiration for their remarkable achievements.
The graph of average completion times shows that Team Blue (“Q Developer”) completed more questions across the board, and in less time, than Team Red (“Solo Coder”). Within the 1-hour time limit, Team Blue got all the way to Question 19, whereas Team Red only reached Question 16.
A few assumptions and caveats apply. People who considered themselves very experienced programmers were encouraged to choose team Red and forgo AI, to test themselves against team Blue, who used AI. The code challenges were designed to test the output of applying logic, and they were specifically designed to be solvable without Amazon Q Developer, so the event measured how much Amazon Q Developer optimizes the writing of logical code. As a result, the code tasks worked well with Amazon Q Developer, given the nature of its underlying model training. Many people who attended the event were not Python programmers (we constrained the challenge to Python only), and they walked away impressed at how much of the challenge they could complete.
One of the more complex questions competitors were given to solve was:
Implement the rail fence cipher.
In the Rail Fence cipher, the message is written downwards on successive "rails" of an imaginary fence, moving back up when the bottom rail is reached (like a zig-zag). Finally, the message is read off in rows.
For example, using three "rails" and the message "WE ARE DISCOVERED FLEE AT ONCE", the cipherer writes out:
W . . . E . . . C . . . R . . . L . . . T . . . E
. E . R . D . S . O . E . E . F . E . A . O . C .
. . A . . . I . . . V . . . D . . . E . . . N . .
Then reads off: WECRLTEERDSOEEFEAOCAIVDEN
Given variable a. Use a three-rail fence cipher so that result is equal to the decoded message of variable a.
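For illustration, here is a minimal Python sketch of one way to decode a three-rail fence cipher. This is one possible approach, not the official event solution; the sample ciphertext is the one from the example above.

def decode_rail_fence(ciphertext, rails=3):
    # Rebuild the zig-zag pattern: record which rail each character position falls on.
    pattern = []
    rail, step = 0, 1
    for _ in ciphertext:
        pattern.append(rail)
        if rail == 0:
            step = 1
        elif rail == rails - 1:
            step = -1
        rail += step
    # Slice the ciphertext into rows: row r gets as many characters as positions on rail r.
    rows, index = [], 0
    for r in range(rails):
        count = pattern.count(r)
        rows.append(list(ciphertext[index:index + count]))
        index += count
    # Walk the zig-zag again, taking the next character from each row in turn.
    return "".join(rows[r].pop(0) for r in pattern)

a = "WECRLTEERDSOEEFEAOCAIVDEN"
result = decode_rail_fence(a)
print(result)  # WEAREDISCOVEREDFLEEATONCE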
The questions were both algorithmic and logical in nature, which made them well suited to testing how effectively they could be solved either conversationally, in natural language, with Amazon Q Developer, or by applying one’s own logic to write the code.
Top scoring individual per team:

Team                            Total questions complete    Individual time (min)
With Q Developer (Blue Team)    19                          30.46
Solo Coder (Red Team)           16                          58.06
By comparing the top two competitors, and considering the solo coder was a highly experienced programmer versus the top Q Developer coder, who was a relatively new programmer not familiar with Python, you can see the efficiency gain when using Q Developer as an AI peer programmer. It took the entire 60 minutes for the solo coder to complete 16 questions, whereas the Q Developer coder got to the final question (Question 20, incomplete) in half of the time.
Summary
Integrating advanced IDE features and adopting pair programming have significantly improved coding efficiency and quality. However, the introduction of Amazon Q Developer has taken this evolution to new heights. By tapping into Q Developer’s vast knowledge base and problem-solving capabilities, the Blue Team was able to navigate complex coding challenges with remarkable speed and accuracy, outperforming the unassisted Red Team. This highlights the transformative impact of leveraging generative AI as a collaborative pair programmer in modern software development, delivering greater efficiency, problem-solving, and, ultimately, higher-quality code. Get started with Amazon Q Developer for your IDE by installing the plugin and enabling your Builder ID today.
We have developed an innovative activity to support young people as they transition from visual programming languages like Scratch to text-based programming languages like Python.
This activity introduces a unique interface that empowers learners to easily interact with Python while they create a customised painting app.
“The kids liked the self-paced learning, it allowed them to work at their own rate. They liked using RGB tables to find their specific colours.” – Code Club mentor
Why learn to code Python?
We’ve long been championing Python as an ideal tool for young people who want to start text-based programming. Python has simple syntax and needs very few lines of code to get started, and there is a vibrant community of supportive programmers surrounding it.
However, we know that starting with Python can be challenging for young people who have never done any text-based coding. They can face obstacles such as software installation issues, getting used to a new syntax, and the need for appropriate typing skills.
How ‘Paint with Python’ helps learners get started
‘Paint with Python’ is an online educational activity that addresses many of these challenges and helps young people learn to code Python for the first time. It’s entirely web-based, requiring no software installation beyond a web browser. Instructions are displayed in a side panel, allowing learners to read and code without needing to switch tabs.
To help young people with creating their painting app, much of the initial code is pre-written behind the scenes, which enables learners to focus on experimenting with Python and observing the outcomes. They engage with the code by clicking on suggested options or, in some cases, by typing small snippets of Python. For example, they can select colours from a range of options or, as they grow more confident, type RGB values to create custom colours.
The activity is fully responsive for mobile and tablet devices and provides a final view of the full program on the last page, together with suggested routes to continue learning text-based programming.
An accessible introduction to text-based programming
We believe this activity offers an accessible way for young learners to begin their journey with text-based programming and learning to code Python. The code they write is straightforward and the activity is designed to minimise errors. When mistakes do occur, the interface provides clear, constructive feedback, guiding learners to make corrections.
This activity was developed with support from the Cisco Foundation. Through our funding partnership with them, we’ve been able to provide thousands of young people with the inspiration and opportunity to progress their coding skills anywhere, and on any device.
In this article, we will explore the capabilities of the new asynchronous modules of the zabbix_utils library. Thanks to asynchronous execution, users can expect improved efficiency, reduced latency, and increased flexibility in interacting with Zabbix components, ultimately enabling them to create efficient and reliable monitoring solutions that meet their specific requirements.
There is a high demand for the Python library zabbix_utils. Since its release and up to the moment of writing this article, zabbix_utils has been downloaded from PyPI more than 15,000 times. Over the past week, the library has been downloaded more than 2,700 times. The first article about the zabbix_utils library has already gathered around 3,000 views. Among the array of tools available, the library has emerged as a popular choice, offering developers and administrators a comprehensive set of functions for interacting with Zabbix components such as Zabbix server, proxy, and agents.
Considering the demand from users, as well as the potential of asynchronous programming to optimize interaction with Zabbix, we are pleased to present a new version of the library with new asynchronous modules in addition to the existing synchronous ones. The new zabbix_utils modules are designed to provide a significant performance boost by taking advantage of the inherent benefits of asynchronous programming to speed up communication between Zabbix and your service or script.
From expedited data retrieval and real-time event monitoring to enhanced scalability, asynchronous programming empowers you to build highly efficient, flexible, and reliable monitoring solutions adapted to meet your specific needs and challenges.
The new version of zabbix_utils and its asynchronous components may be useful in the following scenarios:
Mass data gathering from multiple hosts: When it’s necessary to retrieve data from a large number of hosts simultaneously, asynchronous programming allows requests to be executed in parallel, significantly speeding up the data collection process (see the sketch after this list);
Mass resource exporting: When templates, hosts or problems need to be exported in parallel. This parallel execution reduces the overall export time, especially when dealing with a large number of resources;
Sending alerts from or to your system: When certain actions need to be performed based on monitoring conditions, such as sending alerts or running scripts, asynchronous programming provides rapid condition processing and execution of corresponding actions;
Scaling the monitoring system: With an increase in the number of monitored resources or the volume of collected data, asynchronous programming provides better scalability and efficiency for the monitoring system.
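As a rough illustration of the parallel data gathering idea, the sketch below combines asyncio.gather with the AsyncGetter class described later in this post. The host addresses and item key are placeholders, and the response object is assumed to expose the returned value via .value.

import asyncio
from zabbix_utils import AsyncGetter

async def poll(host):
    # Query a single Zabbix agent asynchronously.
    agent = AsyncGetter(host, 10050)
    response = await agent.get('system.uname')
    return host, response.value

async def main():
    hosts = ['10.0.0.1', '10.0.0.2', '10.0.0.3']
    # All agents are queried concurrently instead of one after another.
    results = await asyncio.gather(*(poll(h) for h in hosts))
    for host, value in results:
        print(host, value)

asyncio.run(main())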
Installation and Configuration
If you already use the zabbix_utils library, simply updating the library to the latest version and installing all necessary dependencies for asynchronous operation is sufficient. Otherwise, you can install the library with asynchronous support using the following methods:
By using pip:
~$ pip install zabbix_utils[async]
Using [async] allows you to install additional dependencies (extras) needed for the operation of asynchronous modules.
The process of working with the asynchronous version of the zabbix_utils library is similar to the synchronous one, except for some syntactic differences of asynchronous code in Python.
Working with Zabbix API
To work with the Zabbix API in asynchronous mode, you need to import the AsyncZabbixAPI class from the zabbix_utils library:
from zabbix_utils import AsyncZabbixAPI
Similar to the synchronous ZabbixAPI, the new AsyncZabbixAPI can use the following environment variables: ZABBIX_URL, ZABBIX_TOKEN, ZABBIX_USER, ZABBIX_PASSWORD. However, when creating an instance of the AsyncZabbixAPI class you cannot specify a token or a username and password, unlike the synchronous version. They can only be passed when calling the login() method. The following usage scenarios are available here:
Use preset values of environment variables, i.e., not pass any parameters to AsyncZabbixAPI:
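api = AsyncZabbixAPI()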
Pass only the Zabbix API address as input, which can be specified as either the server IP/FQDN address or DNS name (in this case, the HTTP protocol will be used) or as a URL of the Zabbix API:
api = AsyncZabbixAPI(url="127.0.0.1")
After declaring an instance of the AsyncZabbixAPI class, you need to call the login() method to authenticate with the Zabbix API. There are two ways to do this:
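You can authenticate either with an API token (Zabbix 5.4 and later) or with a username and password. A brief sketch with placeholder credentials, assuming login() is awaited in the asynchronous version:

# Using an API token:
await api.login(token="xxxxxxxx")

# Or using a username and password:
await api.login(user="Admin", password="zabbix")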
After completing all needed API requests, it is necessary to call logout() to close the API session if authentication was done using username and password, and also close the asynchronous sessions:
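await api.logout()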
The asynchronous class AsyncSender has been added, which also helps to send values to the Zabbix server or proxy for items of the Zabbix Trapper data type.
AsyncSender can be imported as follows:
from zabbix_utils import AsyncSender
Values can be sent in a group, for this it is necessary to import ItemValue:
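(Host names, item keys, values, and timestamps in this sketch are illustrative; the chunk size is set to 2 here to match the explanation that follows.)

from zabbix_utils import ItemValue, AsyncSender

items = [
    ItemValue('host1', 'item.key1', 10),
    ItemValue('host1', 'item.key2', 'Test value'),
    ItemValue('host2', 'item.key1', -1, 1702511920),     # optional clock (timestamp)
    ItemValue('host3', 'item.key1', '{"msg":"Test value"}'),
    ItemValue('host2', 'item.key1', 0, 1702511920, 100)  # optional clock and ns
]

sender = AsyncSender('127.0.0.1', 10051, chunk_size=2)
response = await sender.send(items)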
In the example, the chunk size is set to 2. So, 5 values passed in the code above will be sent in three requests of two, two, and one value, respectively.
If your server has multiple network interfaces, and values need to be sent from a specific one, the AsyncSender provides the option to specify a source_ip for sent values:
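(The server name and source IP below are placeholders.)

sender = AsyncSender(
    server='zabbix.example.local',
    port=10051,
    source_ip='10.10.7.1'
)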
AsyncSender also supports reading connection parameters from the Zabbix agent/agent2 configuration file. To do this, you need to set the use_config flag and specify the path to the configuration file if it differs from the default /etc/zabbix/zabbix_agentd.conf:
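(This sketch assumes Agent2 with a non-default configuration path.)

sender = AsyncSender(
    use_config=True,
    config_path='/etc/zabbix/zabbix_agent2.conf'
)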
Getting values from Zabbix Agent/Agent2 by item key.
In cases where you need the functionality of our standard zabbix_get utility but native to your Python project and working asynchronously, consider using the AsyncGetter class. A simple example of its usage looks like this:
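(The agent address and item key below are placeholders; the response object is assumed to expose the returned value via .value.)

import asyncio
from zabbix_utils import AsyncGetter

async def main():
    agent = AsyncGetter('10.8.54.32', 10050)
    response = await agent.get('system.uname')
    print(response.value)

asyncio.run(main())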
The new version of the zabbix_utils library provides users with the ability to implement efficient and scalable monitoring solutions, ensuring fast and reliable communication with the Zabbix components. The asynchronous mode of interaction leaves a lot of room for performance improvements and flexible task management when handling a large volume of requests to Zabbix components such as the Zabbix API and others.
We have no doubt that the new version of zabbix_utils will become an indispensable tool for developers and administrators, helping them create more efficient, flexible, and reliable monitoring solutions that best meet their requirements and expectations.
This new support for Python is different from how Workers have historically supported languages beyond JavaScript — in this case, we have directly integrated a Python implementation into workerd, the open-source Workers runtime. All bindings, including bindings to Vectorize, Workers AI, R2, Durable Objects, and more are supported on day one. Python Workers can import a subset of popular Python packages including FastAPI, Langchain, Numpy and more. There are no extra build steps or external toolchains.
To do this, we’ve had to push the bounds of all of our systems, from the runtime itself, to our deployment system, to the contents of the Worker bundle that is published across our network. You can read the docs, and start using it today.
We want to use this post to pull back the curtain on the internal lifecycle of a Python Worker, share what we’ve learned in the process, and highlight where we’re going next.
Beyond “Just compile to WebAssembly”
Cloudflare Workers have supported WebAssembly since 2018 — each Worker is a V8 isolate, powered by the same JavaScript engine as the Chrome web browser. In principle, it’s been possible for years to write Workers in any language — including Python — so long as it first compiles to WebAssembly or to JavaScript.
In practice, just because something is possible doesn’t mean it’s simple. And just because “hello world” works doesn’t mean you can reliably build an application. Building full applications requires supporting an ecosystem of packages that developers are used to building with. For a platform to truly support a programming language, it’s necessary to go much further than showing how to compile code using external toolchains.
Python Workers are different from what we’ve done in the past. It’s early, and still in beta, but we think it shows what providing first-class support for programming languages beyond JavaScript can look like on Workers.
When you run a Python Worker in local development, we:
Create an isolate for your Worker, and automatically inject Pyodide
Serve your Python code using Pyodide
This all happens under the hood — no extra toolchain or precompilation steps needed. The Python execution environment is provided for you, mirroring how Workers written in JavaScript already work.
A Python interpreter built into the Workers runtime
Just as JavaScript has many engines, Python has many implementations that can execute Python code. CPython is the reference implementation of Python. If you’ve used Python before, this is almost certainly what you’ve used, and is commonly referred to as just “Python”.
Pyodide is a port of CPython to WebAssembly. It interprets Python code, without any need to precompile the Python code itself to any other format. It runs in a web browser — check out this REPL. It is true to the CPython that Python developers know and expect, providing most of the Python Standard Library. It provides a foreign function interface (FFI) to JavaScript, allowing you to call JavaScript APIs directly from Python — more on this below. It provides popular open-source packages, and can import pure Python packages directly from PyPI.
Pyodide struck us as the perfect fit for Workers. It is designed to allow the core interpreter and each native Python module to be built as separate WebAssembly modules, dynamically linked at runtime. This allows the code footprint for these modules to be shared among all Workers running on the same machine, rather than requiring each Worker to bring its own copy. This is essential to making WebAssembly work well in the Workers environment, where we often run thousands of Workers per machine — we need Workers using the same programming language to share their runtime code footprint. Running thousands of Workers on every machine is what makes it possible for us to deploy every application in every location at a reasonable price.
Just like with JavaScript Workers, with Python Workers we provide the runtime for you.
Pyodide is currently the exception — most languages that target WebAssembly do not yet support dynamic linking, so each application ends up bringing its own copy of its language runtime. We hope to see more languages support dynamic linking in the future, so that we can more effectively bring them to Workers.
How Pyodide works
Pyodide executes Python code in WebAssembly, which is a sandboxed environment, separated from the host runtime. Unlike running native code, all operations outside of pure computation (such as file reads) must be provided by a runtime environment, then imported by the WebAssembly module.
LLVM provides three target triples for WebAssembly:
wasm32-unknown-unknown – this backend provides no C standard library or system call interface; to support this backend, we would need to manually rewrite every system or library call to make use of imports we would define ourselves in the runtime.
wasm32-wasi – WASI is a standardized system interface, and defines a standard set of imports that are implemented in WASI runtimes such as wasmtime.
wasm32-unknown-emscripten – Like WASI, Emscripten defines the imports that a WebAssembly program needs to execute, but also outputs an accompanying JavaScript library that implements these imported functions.
Pyodide uses Emscripten, and provides three things:
A distribution of the CPython interpreter, compiled using Emscripten
A foreign function interface (FFI) between Python and JavaScript
A set of third-party Python packages, compiled using Emscripten’s compiler to WebAssembly.
Of these targets, only Emscripten currently supports dynamic linking, which, as we noted above, is essential to providing a shared language runtime for Python that is shared across isolates. Emscripten does this by providing implementations of dlopen and dlsym, which use the accompanying JavaScript library to modify the WebAssembly program’s table to link additional WebAssembly-compiled modules at runtime. WASI does not yet support the dlopen/dlsym dynamic linking abstractions used by CPython.
Pyodide and the magic of foreign function interfaces (FFI)
You might have noticed that in our Hello World Python Worker, we import Response from the js module:
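(A minimal sketch of the Hello World Worker; the handler may also accept an env parameter.)

from js import Response

async def on_fetch(request):
    # Build a JavaScript Response object directly from Python via Pyodide's FFI.
    return Response.new("Hello World!")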
Most Workers are written in JavaScript, and most of our engineering effort on the Workers runtime goes into improving JavaScript Workers. There is a risk in adding a second language that it might never reach feature parity with the first language and always be a second class citizen. Pyodide’s foreign function interface (FFI) is critical to avoiding this by providing access to all JavaScript functionality from Python. This can be used by the Worker author directly, and it is also used to make packages like FastAPI and Langchain work out-of-the-box, as we’ll show later in this post.
An FFI is a system for calling functions in one language that are implemented in another language. In most cases, an FFI is defined by a “higher-level” language in order to call functions implemented in a systems language, often C. Python’s ctypes module is such a system. These sorts of foreign function interfaces are often difficult to use because of the nature of C APIs.
Pyodide’s foreign function interface is an interface between Python and JavaScript, which are two high level object-oriented languages with a lot of design similarities. When passed from one language to another, immutable types such as strings and numbers are transparently translated. All mutable objects are wrapped in an appropriate proxy.
When a JavaScript object is passed into Python, Pyodide determines which JavaScript protocols the object supports and dynamically constructs an appropriate Python class that implements the corresponding Python protocols. For example, if the JavaScript object supports the JavaScript iteration protocol then the proxy will support the Python iteration protocol. If the JavaScript object is a Promise or other thenable, the Python object will be an awaitable.
from js import JSON

js_array = JSON.parse("[1,2,3]")
for entry in js_array:
    print(entry)
The lifecycle of a request to a Python Worker makes use of Pyodide’s FFI, wrapping the incoming JavaScript Request object in a JsProxy object that is accessible in your Python code. It then converts the value returned by the Python Worker’s handler into a JavaScript Response object that can be delivered back to the client.
Why dynamic linking is essential, and static linking isn’t enough
Python comes with a C FFI, and many Python packages use this FFI to import native libraries. These libraries are typically written in C, so they must first be compiled down to WebAssembly in order to work on the Workers runtime. As we noted above, Pyodide is built with Emscripten, which overrides Python’s C FFI — any time a package tries to load a native library, it is instead loaded from a WebAssembly module that is provided by the Workers runtime. Dynamic linking is what makes this possible — it is what lets us override Python’s C FFI, allowing Pyodide to support many Python packages that have native library dependencies.
Dynamic linking is “pay as you go”, while static linking is “pay upfront” — if code is statically linked into your binary, it must be loaded upfront in order for the binary to run, even if this code is never used.
Dynamic linking enables the Workers runtime to share the underlying WebAssembly modules of packages across different Workers that are running on the same machine.
We won’t go too much into detail on how dynamic linking works in Emscripten, but the main takeaway is that the Emscripten runtime fetches WebAssembly modules from a filesystem abstraction provided in JavaScript. For each Worker, we generate a filesystem at runtime, whose structure mimics a Python distribution that has the Worker’s dependencies installed, but whose underlying files are shared between Workers. This makes it possible to share Python and WebAssembly files between multiple Workers that import the same dependency. Today, we’re able to share these files across Workers, but copy them into each new isolate. We think we can go even further, by employing copy-on-write techniques to share the underlying resource across many Workers.
Supporting Server and Client libraries
Python has a wide variety of popular HTTP client libraries, including httpx, urllib3, requests and more. Unfortunately, none of them work out of the box in Pyodide. Adding support for these has been one of the longest running user requests for the Pyodide project. The Python HTTP client libraries all work with raw sockets, and the browser security model and CORS do not allow this, so we needed another way to make them work in the Workers runtime.
Async Client libraries
For libraries that can make requests asynchronously, including aiohttp and httpx, we can use the Fetch API to make requests. We do this by patching the library, instructing it to use the Fetch API from JavaScript, taking advantage of Pyodide’s FFI. The httpx patch ends up quite simple: fewer than 100 lines of code. Simplified even further, it looks like this:
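(A hand-written sketch of the idea, not the actual patch; it only shows how a request can be handed to the JavaScript Fetch API through Pyodide's FFI.)

from js import fetch, Object
from pyodide.ffi import to_js

async def js_fetch(url, method="GET", headers=None, body=None):
    # Convert the Python dict of options into a real JavaScript object.
    options = to_js(
        {"method": method, "headers": headers or {}, "body": body},
        dict_converter=Object.fromEntries,
    )
    # Await the JavaScript Promise returned by fetch() directly from Python.
    js_response = await fetch(url, options)
    return js_response.status, await js_response.text()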
Another challenge in supporting Python HTTP client libraries is that many Python APIs are synchronous. For these libraries, we cannot use the fetch API directly because it is asynchronous.
Thankfully, Joe Marshall recently landed a contribution to urllib3 that adds Pyodide support in web browsers by:
1. Checking if blocking with Atomics.wait() is possible
   a. If so, start a fetch worker thread
   b. Delegate the fetch operation to the worker thread and serialize the response into a SharedArrayBuffer
   c. In the Python thread, use Atomics.wait to block for the response in the SharedArrayBuffer
2. If Atomics.wait() doesn’t work, fall back to a synchronous XMLHttpRequest
However, Cloudflare Workers do not currently support worker threads or synchronous XMLHttpRequest, so neither of these two approaches will work in Python Workers. We do not support synchronous requests today, but there is a way forward…
FastAPI and Python’s Asynchronous Server Gateway Interface
FastAPI is one of the most popular libraries for defining Python servers. FastAPI applications use a protocol called the Asynchronous Server Gateway Interface (ASGI). This means that FastAPI never reads from or writes to a socket itself. An ASGI application expects to be hooked up to an ASGI server, typically uvicorn. The ASGI server handles all of the raw sockets on the application’s behalf.
Conveniently for us, this means that FastAPI works in Cloudflare Workers without any patches or changes to FastAPI itself. We simply need to replace uvicorn with an appropriate ASGI server that can run within a Worker. Our initial implementation lives here, in the fork of Pyodide that we maintain. We hope to add a more comprehensive feature set, add test coverage, and then upstream this implementation into Pyodide.
You can try this yourself by cloning cloudflare/python-workers-examples, and running npx wrangler@latest dev in the directory of the FastAPI example.
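For reference, a FastAPI Worker looks roughly like the sketch below, based on the examples repository; the asgi helper module and its fetch() signature are assumptions here, not a documented API.

from fastapi import FastAPI

app = FastAPI()

@app.get("/")
async def root():
    return {"message": "Hello from FastAPI on Workers"}

async def on_fetch(request, env):
    # The asgi adapter module is assumed to be provided by the Workers runtime.
    import asgi
    return await asgi.fetch(app, request, env)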
Importing Python Packages
Python Workers support a subset of Python packages, which are provided directly by Pyodide, including numpy, httpx, FastAPI, Langchain, and more. This ensures compatibility with the Pyodide runtime by pinning package versions to Pyodide versions, and allows Pyodide to patch internal implementations, as we showed above in the case of httpx.
To import a package, simply add it to your requirements.txt file, without adding a version number. A specific version of the package is provided directly by Pyodide. Today, you can use packages in local development, and in the coming weeks, you will be able to deploy Workers that define dependencies in a requirements.txt file. Later in this post, we’ll show how we’re thinking about managing new versions of Pyodide and packages.
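As noted above, the file contains only package names, without version numbers. For example, a requirements.txt for a Worker that uses FastAPI and Langchain lists just:

fastapi
langchain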
We maintain our own fork of Pyodide, which allows us to provide patches specific to the Workers runtime, and to quickly expand our support for packages in Python Workers, while also committing to upstreaming our changes back to Pyodide, so that the whole ecosystem of developers can benefit.
Python packages are often big and memory hungry though, and they can do a lot of work at import time. How can we ensure that you can bring in the packages you need, while mitigating long cold start times?
Making cold starts faster with memory snapshots
In the example at the start of this post, in local development, we mentioned injecting Pyodide into your Worker. Pyodide itself is 6.4MB — and Python packages can also be quite large.
If we simply shoved Pyodide into your Worker and uploaded it to Cloudflare, that’d be quite a large Worker to load into a new isolate — cold starts would be slow. On a fast computer with a good network connection, Pyodide takes about two seconds to initialize in a web browser: one second of network time and one second of CPU time. It wouldn’t be acceptable to initialize it every time you update your code, for every isolate your Worker runs in across Cloudflare’s network. Instead, when you deploy a Python Worker, the following happens:
Wrangler uploads your Python code and your requirements.txt file to the Workers API
We send your Python code, and your requirements.txt file to the Workers runtime to be validated
We create a new isolate for your Worker, and automatically inject Pyodide plus any packages you’ve specified in your requirements.txt file.
We scan the Worker’s code for import statements, execute them, and then take a snapshot of the Worker’s WebAssembly linear memory. Effectively, we perform the expensive work of importing packages at deploy time, rather than at runtime.
We deploy this snapshot alongside your Worker’s Python code to Cloudflare’s network.
Just like a JavaScript Worker, we execute the Worker’s top-level scope.
When a request comes in to your Worker, we load this snapshot and use it to bootstrap your Worker in an isolate, avoiding expensive initialization time:
This takes cold starts for a basic Python Worker down to below 1 second. We’re not yet satisfied with this though. We’re confident that we can drive this down much, much further. How? By reusing memory snapshots.
Reusing Memory Snapshots
When you upload a Python Worker, we generate a single memory snapshot of the Worker’s top-level imports, including both Pyodide and any dependencies. This snapshot is specific to your Worker. It can’t be shared, even though most of its contents are the same as other Python Workers.
Instead, we can create a single, shared snapshot ahead of time, and preload it into a pool of “pre-warmed” isolates. These isolates would already have the Pyodide runtime loaded and ready — making a Python Worker work just like a JavaScript Worker. In both cases, the underlying interpreter and execution environment is provided by the Workers runtime, and available on-demand without delay. The only difference is that with Python, the interpreter runs in WebAssembly, within the Worker.
Snapshots are a common pattern across runtimes and execution environments. Node.js uses V8 snapshots to speed up startup time. You can take snapshots of Firecracker microVMs and resume execution in a different process. There’s lots more we can do here — not just for Python Workers, but for Workers written in JavaScript as well, caching snapshots of compiled code from top-level scope and the state of the isolate itself. Workers are so fast and efficient that to-date we haven’t had to take snapshots in this way, but we think there are still big performance gains to be had.
This is our biggest lever towards driving cold start times down over the rest of 2024.
Future proofing compatibility with Pyodide versions and Compatibility Dates
When you deploy a Worker to Cloudflare, you expect it to keep running indefinitely, even if you never update it again. There are Workers deployed in 2018 that are still running just fine in production.
We achieve this using Compatibility Dates and Compatibility Flags, which provide explicit opt-in mechanisms for new behavior and potentially backwards-incompatible changes, without impacting existing Workers.
This works in part because it mirrors how the Internet and web browsers work. You publish a web page with some JavaScript, and rightly expect it to work forever. Web browsers and Cloudflare Workers have the same type of commitment of stability to developers.
There is a challenge with Python though — both Pyodide and CPython are versioned. Updated versions are published regularly and can contain breaking changes. And Pyodide provides a set of built-in packages, each with a pinned version number. This presents a question — how should we allow you to update your Worker to a newer version of Pyodide?
A new version of Python is released every year in August, and a new version of Pyodide is released six (6) months later. When this new version of Pyodide is published, we will add it to Workers by gating it behind a Compatibility Flag, which is only enabled after a specified Compatibility Date. This lets us continually provide updates, without risk of breaking changes, extending the commitment we’ve made for JavaScript to Python.
Each Python release has a five (5) year support window. Once this support window has passed for a given version of Python, security patches are no longer applied, making this version unsafe to rely on. To mitigate this risk, while still trying to hold as true as possible to our commitment of stability and long-term support, after five years any Python Worker still on a Python release that is outside of the support window will be automatically moved forward to the next oldest Python release. Python is a mature and stable language, so we expect that in most cases, your Python Worker will continue running without issue. But we recommend updating the compatibility date of your Worker regularly, to stay within the support window.
In between Python releases, we also expect to update and add additional Python packages, using the same opt-in mechanism. A Compatibility Flag will be a combination of the Python version and the release date of a set of packages. For example, python_3.17_packages_2025_03_01.
How bindings work in Python Workers
We mentioned earlier that Pyodide provides a foreign function interface (FFI) to JavaScript — meaning that you can directly use JavaScript objects, methods, functions and more, directly from Python.
This means that from day one, all binding APIs to other Cloudflare resources are supported in Python Workers. The env object that is provided to handlers in Python Workers is a JavaScript object that Pyodide provides a proxy API to, handling type translations across languages automatically.
For example, to write to and read from a KV namespace from a Python Worker, you would write:
from js import Response

async def on_fetch(request, env):
    await env.FOO.put("bar", "baz")
    bar = await env.FOO.get("bar")
    return Response.new(bar)  # returns "baz"
This works for Web APIs too — see how Response is imported from the js module? You can import any global from JavaScript this way.
Get this JavaScript out of my Python!
You’re probably reading this post because you want to write Python instead of JavaScript. from js import Response just isn’t Pythonic. We know — and we have actually tackled this challenge before for another language (Rust). And we think we can do this even better for Python.
We launched workers-rs in 2021 to make it possible to write Workers in Rust. For each JavaScript API in Workers, we, alongside open-source contributors, have written bindings that expose a more idiomatic Rust API.
We plan to do the same for Python Workers — starting with the bindings to Workers AI and Vectorize. But while workers-rs requires that you use and update an external dependency, the APIs we provide with Python Workers will be built into the Workers runtime directly. Just update your compatibility date, and get the latest, most Pythonic APIs.
This is about more than just making bindings to resources on Cloudflare more Pythonic though — it’s about compatibility with the ecosystem.
Similar to how we recently converted workers-rs to use types from the http crate, which makes it easy to use the axum crate for routing, we aim to do the same for Python Workers. For example, the Python standard library provides a raw socket API, which many Python packages depend on. Workers already provides connect(), a JavaScript API for working with raw sockets. We see ways to provide at least a subset of the Python standard library’s socket API in Workers, enabling a broader set of Python packages to work on Workers, with less of a need for patches.
But ultimately, we hope to kick start an effort to create a standardized serverless API for Python. One that is easy to use for any Python developer and offers the same capabilities as JavaScript.
We’re just getting started with Python Workers
Providing true support for a new programming language is a big investment that goes far beyond making “hello world” work. We chose Python very intentionally — it’s the second most popular programming language after JavaScript — and we are committed to continuing to improve performance and widen our support for Python packages.
We’re grateful to the Pyodide maintainers and the broader Python community — and we’d love to hear from you. Drop into the Python Workers channel in the Cloudflare Developers Discord, or start a discussion on Github about what you’d like to see next and which Python packages you’d like us to support.
2023 was a rollercoaster year in tech, and we at the AWS Architecture Blog feel so fortunate to have shared in the excitement. As we move into 2024 and all of the new technologies we could see, we want to take a moment to highlight the brightest stars from 2023.
As always, thanks to our readers and to the many talented and hardworking Solutions Architects and other contributors to our blog.
I give you our 2023 cream of the crop!
#10: Build a serverless retail solution for endless aisle on AWS
In this post, Sandeep and Shashank help retailers and their customers alike in this guided approach to finding inventory that doesn’t live on shelves.
Figure 1. Building endless aisle architecture for order processing
#9: Optimizing data with automated intelligent document processing solutions
Who else dreads wading through large amounts of data in multiple formats? Just me? I didn’t think so. Using Amazon AI/ML and content-reading services, Deependra, Anirudha, Bhajandeep, and Senaka have created a solution that is scalable and cost-effective to help you extract the data you need and store it in a format that works for you.
#8: Disaster Recovery Solutions with AWS managed services, Part 3: Multi-Site Active/Passive
Disaster recovery posts are always popular, and this post by Brent and Dhruv is no exception. Their creative approach in part 3 of this series is most helpful for customers who have business-critical workloads with higher availability requirements.
#7: Simulating Kubernetes-workload AZ failures with AWS Fault Injection Simulator
Continuing with the theme of “when bad things happen,” we have Siva, Elamaran, and Re’s post about preparing for workload failures. If resiliency is a concern (and it really should be), the secret is test, test, TEST.
Figure 4. Architecture flow for Microservices to simulate a realistic failure scenario
Luca, Laura, Vittorio, and Zamira weren’t content with their four top-10 spots last year – they’re back with some things you definitely need to know about event-driven architectures.
#5: Use a reusable ETL framework in your AWS lake house architecture
As your lake house increases in size and complexity, you could find yourself facing maintenance challenges, and Ashutosh and Prantik have a solution: frameworks! The reusable ETL template with AWS Glue templates might just save you a headache or three.
#4: Invoking asynchronous external APIs with AWS Step Functions
It’s possible that AWS’ menagerie of services doesn’t have everything you need to run your organization. (Possible, but not likely; we have a lot of amazing services.) If you are using third-party APIs, then Jorge, Hossam, and Shirisha’s architecture can help you maintain a secure, reliable, and cost-effective relationship among all involved.
#3: Announcing updates to the AWS Well-Architected Framework
The Well-Architected Framework continues to help AWS customers evaluate their architectures against its six pillars. They are constantly striving for improvement, and Haleh’s diligence in keeping us up to date has not gone unnoticed. Thank you, Haleh!
#2: Let’s Architect! Designing architectures for multi-tenancy
The practically award-winning Let’s Architect! series strikes again! This time, Luca, Laura, Vittorio, and Zamira were joined by Federica to discuss multi-tenancy and why that concept is so crucial for SaaS providers.
#1: Understand resiliency patterns and trade-offs to architect efficiently in the cloud
Haresh, Lewis, and Bonnie revamped this 2022 post into a masterpiece that completely stole our readers’ hearts and is among the top posts we’ve ever made!
Today, customers want to reduce manual operations for deploying and maintaining their infrastructure. The recommended method to deploy and manage infrastructure on AWS is to follow the Infrastructure as Code (IaC) model using tools like AWS CloudFormation, AWS Cloud Development Kit (AWS CDK), or Terraform.
One of the critical components of Terraform is managing the state file, which keeps track of your configuration and resources. When you run Terraform in an AWS CI/CD pipeline, the state file has to be stored in a secured, common path that the pipeline has access to, and you need a mechanism to lock it when multiple developers on the team want to access it at the same time.
In this blog post, we will explain how to manage Terraform state files in AWS, cover best practices for configuring them, and walk through an example of how you can manage them efficiently in a Continuous Integration pipeline in AWS with AWS Developer Tools such as AWS CodeCommit and AWS CodeBuild. This blog post assumes you have a basic knowledge of Terraform, AWS Developer Tools, and AWS CI/CD pipelines. Let’s dive in!
Challenges with handling state files
By default, the state file is stored locally where Terraform runs, which is not a problem if you are a single developer working on the deployment. Otherwise, storing state files locally is not ideal, as you may run into the following problems:
When working in teams or collaborative environments, multiple people need access to the state file
Data in the state file is stored in plain text which may contain secrets or sensitive information
Local files can get lost, corrupted, or deleted
Best practices for handling state files
The recommended practice for managing state files is to use terraform’s built-in support for remote backends. These are:
Remote backend on Amazon Simple Storage Service (Amazon S3): You can configure terraform to store state files in an Amazon S3 bucket which provides a durable and scalable storage solution. Storing on Amazon S3 also enables collaboration that allows you to share state file with others.
Remote backend on Amazon S3 with Amazon DynamoDB: In addition to using an Amazon S3 bucket for managing the files, you can use an Amazon DynamoDB table to lock the state file. This will allow only one person to modify a particular state file at any given time. It will help to avoid conflicts and enable safe concurrent access to the state file.
There are other options available as well such as remote backend on terraform cloud and third party backends. Ultimately, the best method for managing terraform state files on AWS will depend on your specific requirements.
When deploying terraform on AWS, the preferred choice of managing state is using Amazon S3 with Amazon DynamoDB.
AWS configurations for managing state files
Create an Amazon S3 bucket using Terraform. Implement security measures for the bucket by creating an AWS Identity and Access Management (AWS IAM) policy or an Amazon S3 bucket policy. This lets you restrict access, configure object versioning for data protection and recovery, and enable server-side encryption (SSE-S3 AES-256 or SSE-KMS) for encryption control.
Next, create an Amazon DynamoDB table using Terraform with the primary key set to LockID. You can also set additional configuration options such as read/write capacity units. Once the table is created, configure the Terraform backend to use it for state locking by specifying the table name in the terraform block of your configuration.
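For illustration, the backend configuration in your terraform block might look like the following sketch; the bucket and table names here mirror the defaults used later in this post.

terraform {
  backend "s3" {
    bucket         = "tfbackend-bucket"   # Amazon S3 bucket that stores the state file
    key            = "terraform.tfstate"  # path of the state file within the bucket
    region         = "eu-central-1"
    dynamodb_table = "tfstate-lock"       # Amazon DynamoDB table used for state locking
    encrypt        = true
  }
}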
For a single AWS account with multiple environments and projects, you can use a single Amazon S3 bucket. If you have multiple applications in multiple environments across multiple AWS accounts, you can create one Amazon S3 bucket for each account. In that Amazon S3 bucket, you can create appropriate folders for each environment, storing project state files with specific prefixes.
Now that you know how to handle terraform state files on AWS, let’s look at an example of how you can configure them in a Continuous Integration pipeline in AWS.
Architecture
Figure 1: Example architecture on how to use terraform in an AWS CI pipeline
This diagram outlines the workflow implemented in this blog:
The AWS CodeCommit repository contains the application code
The AWS CodeBuild job contains the buildspec files and references the source code in AWS CodeCommit
The AWS Lambda function contains the application code created after running terraform apply
Amazon S3 contains the state file created after running terraform apply. Amazon DynamoDB locks the state file present in Amazon S3
Implementation
Pre-requisites
Before you begin, you must complete the following prerequisites:
Install the latest version of AWS Command Line Interface (AWS CLI)
After cloning the repository, you will see the following folder structure:
Figure 2: AWS CodeCommit repository structure
Let’s break down the terraform code into 2 parts – one for preparing the infrastructure and another for preparing the application.
Preparing the Infrastructure
The main.tf file is the core component and does the following:
It creates an Amazon S3 bucket to store the state file. We configure bucket ACL, bucket versioning and encryption so that the state file is secure.
It creates an Amazon DynamoDB table which will be used to lock the state file.
It creates two AWS CodeBuild projects, one for ‘terraform plan’ and another for ‘terraform apply’.
Note – It also has the code block (commented out by default) to create AWS Lambda which you will use at a later stage.
The AWS CodeBuild projects need to be able to access Amazon S3, Amazon DynamoDB, AWS CodeCommit, and AWS Lambda, so an AWS IAM role with the appropriate permissions to access these resources is created via the iam.tf file.
Next you will find two buildspec files named buildspec-plan.yaml and buildspec-apply.yaml that will execute terraform commands – terraform plan and terraform apply respectively.
Modify AWS region in the provider.tf file.
Update Amazon S3 bucket name, Amazon DynamoDB table name, AWS CodeBuild compute types, AWS Lambda role and policy names to required values using variable.tf file. You can also use this file to easily customize parameters for different environments.
With this, the infrastructure setup is complete.
You can use your local terminal and execute the commands below, in the same order, to deploy the above-mentioned resources in your AWS account.
terraform init
terraform validate
terraform plan
terraform apply
Once the apply is successful and all the above resources have been successfully deployed in your AWS account, proceed with deploying your application.
Preparing the Application
In the cloned repository, use the backend.tf file to create your own Amazon S3 backend to store the state file. By default, it has the values below; you can override them with your required values.
bucket = "tfbackend-bucket"
key = "terraform.tfstate"
region = "eu-central-1"
The repository has sample python code stored in main.py that returns a simple message when invoked.
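As a rough sketch (the actual main.py in the repository may differ), such a handler could look like this:

def lambda_handler(event, context):
    # Return the sample message referenced later in this post.
    return "Hello from AWS Lambda !"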
In the main.tf file, you can find the below block of code to create and deploy the Lambda function that uses the main.py code (uncomment these code blocks).
Now you can deploy the application using AWS CodeBuild instead of running terraform commands locally which is the whole point and advantage of using AWS CodeBuild.
Run the two AWS CodeBuild projects to execute terraform plan and terraform apply again.
Once successful, you can verify your deployment by testing the code in AWS Lambda. To test a lambda function (console):
Open AWS Lambda console and select your function “tf-codebuild”
In the navigation pane, in Code section, click Test to create a test event
Provide your required name, for example “test-lambda”
Accept default values and click Save
Click Test again to trigger your test event “test-lambda”
It should return the sample message you provided in your main.py file. In the default case, it will display “Hello from AWS Lambda !” message as shown below.
Figure 3: Sample AWS Lambda function response
To verify your state file, go to Amazon S3 console and select the backend bucket created (tfbackend-bucket). It will contain your state file.
Figure 4: Amazon S3 bucket with terraform state file
Open the Amazon DynamoDB console and check your table tfstate-lock; it will have an entry with a LockID.
Figure 5: Amazon DynamoDB table with LockID
Thus, you have securely stored and locked your terraform state file using terraform backend in a Continuous Integration pipeline.
Cleanup
To delete all the resources created as part of the repository, run the below command from your terminal.
terraform destroy
Conclusion
In this blog post, we explored the fundamentals of Terraform state files, discussed best practices for storing them securely within AWS environments, and covered mechanisms for locking these files to avoid conflicts when multiple team members access them. Finally, we showed an example of how you can manage them efficiently in a Continuous Integration pipeline in AWS.
Zabbix is a flexible and universal monitoring solution that integrates with a wide variety of different systems right out of the box. Despite actively expanding the list of natively supported systems for integration (via templates or webhook integrations), there may still be a need to integrate with custom systems and services that are not yet supported. In such cases, a library taking care of implementing interaction protocols with the Zabbix API, Zabbix server/proxy, or Agent/Agent2 becomes extremely useful. Given that Python is widely adopted among DevOps and SRE engineers as well as server administrators, we decided to release a library for this programming language first.
We are pleased to introduce zabbix_utils – a Python library for seamless interaction with Zabbix API, Zabbix server/proxy, and Zabbix Agent/Agent2. Of course, there are popular community solutions for working with these Zabbix components in Python. Keeping this fact in mind, we have tried to consolidate popular issues and cases along with our experience to develop as convenient a tool as possible. Furthermore, we made sure that transitioning to the tool is as straightforward and clear as possible. Thanks to official support, you can be confident that the current version of the library is compatible with the latest Zabbix release.
In this article, we will introduce you to the main capabilities of the library and provide examples of how to use it with Zabbix components.
Usage Scenarios
The zabbix_utils library can be used in the following scenarios, but is not limited to them:
Zabbix automation
Integration with third-party systems
Custom monitoring solutions
Data export (hosts, templates, problems, etc.)
Integration into your Python application for Zabbix monitoring support
Anything else that comes to mind
You can use zabbix_utils for automating Zabbix tasks, such as scripting the automatic monitoring setup of your IT infrastructure objects. This can involve using ZabbixAPI for the direct management of Zabbix objects, Sender for sending values to hosts, and Getter for gathering data from Agents. We will discuss Sender and Getter in more detail later in this article.
For example, let’s imagine you have an infrastructure consisting of different branches. Each server or workstation is deployed from an image with an automatically configured Zabbix Agent and each branch is monitored by a Zabbix proxy since it has an isolated network. Your custom service or script can fetch a list of this equipment from your CMDB system, along with any additional information. It can then use this data to create hosts in Zabbix and link the necessary templates using ZabbixAPI based on the received information. If the information from CMDB is insufficient, you can request data directly from the configured Zabbix Agent using Getter and then use this information for further configuration and decision-making during setup. Another part of your script can access AD to get a list of branch users to update the list of users in Zabbix through the API and assign them the appropriate permissions and roles based on information from AD or CMDB (e.g., editing rights for server owners).
Another use case of the library may be when you regularly export templates from Zabbix for subsequent import into a version control system. You can also establish a mechanism for loading changes and rolling back to previous versions of templates. Here a variety of other use cases can also be implemented – it’s all up to your requirements and the creative usage of the library.
Of course, if you are a developer and there is a requirement to implement Zabbix monitoring support for your custom system or tool, you can implement sending data describing any events generated by your custom system/tool to Zabbix using Sender.
Installation and Configuration
To begin with, you need to install the zabbix_utils library. You can do this in two main ways:
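For example (the exact commands below are illustrative):

By using pip:

~$ pip install zabbix_utils

By cloning the sources from GitHub and installing them locally:

~$ git clone https://github.com/zabbix/python-zabbix-utils
~$ cd python-zabbix-utils
~$ pip install .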
No additional configuration is required, but if needed, you can specify values for the following environment variables: ZABBIX_URL, ZABBIX_TOKEN, ZABBIX_USER, ZABBIX_PASSWORD. These use cases are described in more detail below.
Working with Zabbix API
To work with Zabbix API, it is necessary to import the ZabbixAPI class from the zabbix_utils library:
from zabbix_utils import ZabbixAPI
If you are using one of the existing popular community libraries, in most cases, it will be sufficient to simply replace the ZabbixAPI import statement with an import from our library.
At that point you need to create an instance of the ZabbixAPI class. There are several usage scenarios:
Use preset values of environment variables, i.e., not pass any parameters to ZabbixAPI:
from zabbix_utils import ZabbixAPI
api = ZabbixAPI()
Pass only the Zabbix API address as input, which can be specified as either the server IP/FQDN address or DNS name (in this case, the HTTP protocol will be used) or as a URL, and the authentication data should still be specified as values for environment variables:
from zabbix_utils import ZabbixAPI
api = ZabbixAPI(url="127.0.0.1")
Pass only the Zabbix API address to ZabbixAPI, as in the example above, and pass the authentication data later using the login() method:
from zabbix_utils import ZabbixAPI
api = ZabbixAPI(url="127.0.0.1")
api.login(user="Admin", password="zabbix")
Pass all parameters at once when creating an instance of ZabbixAPI; in this case, there is no need to subsequently call login():
from zabbix_utils import ZabbixAPI
api = ZabbixAPI(
url="127.0.0.1",
user="Admin",
password="zabbix"
)
The ZabbixAPI class supports working with various Zabbix versions, automatically checking the API version during initialization. You can also work with the Zabbix API version as an object as follows:
from zabbix_utils import ZabbixAPI
api = ZabbixAPI()
# ZabbixAPI version field
ver = api.version
print(type(ver).__name__, ver) # APIVersion 6.0.24
# Method to get ZabbixAPI version
ver = api.api_version()
print(type(ver).__name__, ver) # APIVersion 6.0.24
# Additional methods
print(ver.major) # 6.0
print(ver.minor) # 24
print(ver.is_lts()) # True
As a result, you will get an APIVersion object that has major and minor fields returning the respective major and minor parts of the current version, as well as the is_lts() method, which returns True if the current version is LTS (Long Term Support) and False otherwise. The APIVersion object can also be compared to a version represented as a string or a float number:
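For example, a comparison might look like this (a minimal sketch; the version values in the comparisons are illustrative, and the results depend on the version of your Zabbix API):
from zabbix_utils import ZabbixAPI
api = ZabbixAPI()
# Comparison with a float (major version) and with a string (full version)
print(api.version > 6.0)
print(api.version == "6.0.24")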
If the account and password (or starting from Zabbix 5.4 – token instead of login/password) are not set as environment variable values or during the initialization of ZabbixAPI, then it is necessary to call the login() method for authentication:
from zabbix_utils import ZabbixAPI
api = ZabbixAPI(url="127.0.0.1")
api.login(token="xxxxxxxx")
After authentication, you can make any API requests described for all supported versions in the Zabbix documentation.
The format for calling API methods looks like this:
api_instance.zabbix_object.method(parameters)
For example:
api.host.get()
After completing all the necessary API requests, call logout() if authentication was done using a login and password:
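A minimal sketch, using the same example credentials as above:
from zabbix_utils import ZabbixAPI
api = ZabbixAPI(url="127.0.0.1")
api.login(user="Admin", password="zabbix")
# ... your API requests, for example api.host.get() ...
api.logout()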
There is often a need to send values to Zabbix Trapper. For this purpose, the zabbix_sender utility is provided. However, if your service or script sending this data is written in Python, calling an external utility may not be very convenient. Therefore, we have developed the Sender, which will help you send values to Zabbix server or proxy one by one or in groups. To work with Sender, you need to import it as follows:
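For example, a minimal sketch (the host name host1 and the item keys here are placeholders for your own trapper items):
from zabbix_utils import ItemValue, Sender
# Values to send: host, item key, value
items = [
    ItemValue('host1', 'item.key1', 10),
    ItemValue('host1', 'item.key2', 'test message')
]
sender = Sender(server='127.0.0.1', port=10051)
response = sender.send(items)
print(response)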
For cases when you need to send more values than Zabbix Trapper can accept at one time, there is an option for fragmented sending, that is, sequential sending in separate fragments (chunks). By default, the chunk size is set to 250 values. In other words, when sending values in bulk, 400 values passed to the send() method will be sent in two stages: 250 values first, and the remaining 150 values after a response is received. To change the chunk size, specify your own value for the chunk_size parameter when initializing Sender:
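A sketch of what such an example might look like (the hosts and item keys are placeholders):
from zabbix_utils import ItemValue, Sender
items = [
    ItemValue('host1', 'item.key1', 10),
    ItemValue('host1', 'item.key2', 'test message'),
    ItemValue('host2', 'item.key1', -1),
    ItemValue('host3', 'item.key1', '{"msg":"test message"}'),
    ItemValue('host2', 'item.key1', 0)
]
# Send the five values in chunks of two
sender = Sender(server='127.0.0.1', port=10051, chunk_size=2)
response = sender.send(items)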
In the example above, the chunk size is set to 2. So, 5 values passed will be sent in three requests of two, two, and one value, respectively.
If your server has multiple network interfaces, and values need to be sent from a specific one, the Sender provides the option to specify a source_ip for the sent values:
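For example (a sketch; 10.10.7.1 stands in for the address of the interface you want to send from):
from zabbix_utils import Sender
sender = Sender(
    server='zabbix.example.local',
    port=10051,
    source_ip='10.10.7.1'
)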
It also supports reading connection parameters from the Zabbix Agent/Agent2 configuration file. To do this, set the use_config flag, after which it is not necessary to pass connection parameters when creating an instance of Sender:
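For example (a sketch; the config file path shown is the default Zabbix Agent location and may differ on your system):
from zabbix_utils import Sender
sender = Sender(
    use_config=True,
    config_path='/etc/zabbix/zabbix_agentd.conf'
)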
Since the Zabbix Agent/Agent2 configuration file can specify one or several Zabbix clusters, each consisting of multiple Zabbix server instances, Sender sends data to the first available server of each cluster listed in the ServerActive parameter of the configuration file. If ServerActive is not specified, the server address from the Server parameter is used with the standard Zabbix Trapper port, 10051.
By default, Sender returns the aggregated result of sending across all clusters. But it is possible to get more detailed information about the results of sending for each chunk and each cluster:
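A minimal sketch that produces a response object like the one inspected below. The clusters parameter and the two single-node clusters shown here are assumptions based on the ServerActive-style configuration described above; the hosts and item keys are placeholders:
from zabbix_utils import ItemValue, Sender
# Assumed: two single-node "clusters", mirroring ServerActive entries
sender = Sender(clusters=[['127.0.0.1:10051'], ['zabbix.example.local:10051']])
items = [
    ItemValue('host1', 'item.key1', 10),
    ItemValue('host2', 'item.key1', 'test message')
]
response = sender.send(items)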
print(response)
# {"processed": 2, "failed": 0, "total": 2, "time": "0.000108", "chunk": 2}
if response.failed == 0:
print(f"Value sent successfully in {response.time}")
else:
print(response.details)
# {
# 127.0.0.1:10051: [
# {
# "processed": 1,
# "failed": 0,
# "total": 1,
# "time": "0.000051",
# "chunk": 1
# }
# ],
# zabbix.example.local:10051: [
# {
# "processed": 1,
# "failed": 0,
# "total": 1,
# "time": "0.000057",
# "chunk": 1
# }
# ]
# }
for node, chunks in response.details.items():
for resp in chunks:
print(f"processed {resp.processed} of {resp.total} at {node.address}:{node.port}")
# processed 1 of 1 at 127.0.0.1:10051
# processed 1 of 1 at zabbix.example.local:10051
Getting values from Zabbix Agent/Agent2 by item key
Sometimes it can also be useful to directly retrieve values from the Zabbix Agent. To assist with this task, zabbix_utils provides the Getter. It performs the same function as the zabbix_get utility, allowing you to work natively within Python code. Getter is straightforward to use; just import it, create an instance by passing the Zabbix Agent’s address and port, and then call the get() method, providing the data item key for the value you want to retrieve:
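For example, a minimal sketch (system.uname is a standard Agent item key; the address and port should point to your Zabbix Agent):
from zabbix_utils import Getter
agent = Getter(host='127.0.0.1', port=10050)
resp = agent.get('system.uname')  # any Agent item key can be requested
print(resp)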
In cases where your server has multiple network interfaces, and requests need to be sent from a specific one, you can specify the source_ip for the Agent connection:
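For example (a sketch; 10.10.7.1 stands in for the address of the local interface you want to use):
from zabbix_utils import Getter
agent = Getter(
    host='zabbix.example.local',
    port=10050,
    source_ip='10.10.7.1'  # assumed address of the desired local interface
)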
The zabbix_utils library for Python allows you to take full advantage of monitoring using Zabbix, without limiting yourself to the integrations available out of the box. It can be valuable for both DevOps and SRE engineers, as well as Python developers looking to implement monitoring support for their system using Zabbix.
In the next article, we will thoroughly explore integration with an external service using this library to demonstrate the capabilities of zabbix_utils more comprehensively.
Questions
Q: Which Agent versions are supported for Getter?
A: Supported versions of Zabbix Agents are the same as Zabbix API versions, as specified in the readme file. Our goal is to create a library with full support for all Zabbix components of the same version.
Q: Does Getter support Agent encryption?
A: Encryption support is not yet built into Sender and Getter, but you can create your wrapper using third-party libraries for both.
from zabbix_utils import Sender
def psk_wrapper(sock, tls):
# ...
# Implementation of TLS PSK wrapper for the socket
# ...
sender = Sender(
server='zabbix.example.local',
port=10051,
socket_wrapper=psk_wrapper
)
Q: Is it possible to set a timeout value for Getter?
A: The response timeout value can be set for the Getter, as well as for ZabbixAPI and Sender. In all cases, the timeout is set for waiting for any responses to requests.
# Example of setting a timeout for Sender
sender = Sender(server='127.0.0.1', port=10051, timeout=30)
# Example of setting a timeout for Getter
agent = Getter(host='127.0.0.1', port=10050, timeout=30)
Q: Is parallel (asynchronous) mode supported?
A: Currently, the library does not include asynchronous classes and methods, but we plan to develop asynchronous versions of ZabbixAPI and Sender.
Q: Is it possible to specify multiple servers when sending through Sender without specifying a configuration file (for working with an HA cluster)?
This post is written by Suranjan Choudhury (Head of TME and ITeS SA) and Anil Sharma (Sr PSA, Migration)
Chaos engineering is the process of stressing an application in testing or production environments by creating disruptive events, such as outages, observing how the system responds, and implementing improvements. Chaos engineering helps you create the real-world conditions needed to uncover hidden issues and performance bottlenecks that are challenging to find in distributed applications.
You can build resilient distributed serverless applications using AWS Lambda and test Lambda functions in real world operating conditions using chaos engineering. This blog shows an approach to inject chaos in Lambda functions, making no change to the Lambda function code. This blog uses the AWS Fault Injection Simulator (FIS) service to create experiments that inject disruptions for Lambda based serverless applications.
AWS FIS is a managed service that performs fault injection experiments on your AWS workloads. AWS FIS is used to set up and run fault experiments that simulate real-world conditions to discover application issues that are difficult to find otherwise. You can improve application resilience and performance using results from FIS experiments.
The sample code in this blog introduces random faults to existing Lambda functions, like an increase in response times (latency) or random failures. You can observe application behavior under introduced chaos and make improvements to the application.
Approaches to inject chaos in Lambda functions
AWS FIS currently does not support injecting faults in Lambda functions. However, there are two main approaches to inject chaos in Lambda functions: using external libraries or using Lambda layers.
Developers have created libraries to introduce failure conditions to Lambda functions, such as chaos_lambda and failure-Lambda. These libraries allow developers to inject elements of chaos into Python and Node.js Lambda functions. To inject chaos using these libraries, developers must decorate the existing Lambda function’s code. Decorator functions wrap the existing Lambda function, adding chaos at runtime. This approach requires developers to change the existing Lambda functions.
You can also use Lambda layers to inject chaos, requiring no change to the function code, as the fault injection is separated. Since the Lambda layer is deployed separately, you can independently change the element of chaos, like latency in response or failure of the Lambda function. This blog post discusses this approach.
Injecting chaos in Lambda functions using Lambda layers
A Lambda layer is a .zip file archive that contains supplementary code or data. Layers usually contain library dependencies, a custom runtime, or configuration files. This blog creates an FIS experiment that uses Lambda layers to inject disruptions in existing Lambda functions for Java, Node.js, and Python runtimes.
The Lambda layer contains the fault injection code. It is invoked prior to invocation of the Lambda function and injects random latency or errors. Injecting random latency simulates real world unpredictable conditions. The Java, Node.js, and Python chaos injection layers provided are generic and reusable. You can use them to inject chaos in your Lambda functions.
The Chaos Injection Lambda Layers
Java Lambda Layer for Chaos Injection
The chaos injection layer for Java Lambda functions uses the JAVA_TOOL_OPTIONS environment variable. This environment variable allows specifying the initialization of tools, specifically the launching of native or Java programming language agents. The JAVA_TOOL_OPTIONS has a javaagent parameter that points to the chaos injection layer. This layer uses Java’s premain method and the Byte Buddy library for modifying the Lambda function’s Java class during runtime.
When the Lambda function is invoked, the JVM uses the class specified with the javaagent parameter and invokes its premain method before the Lambda function’s handler invocation. The Java premain method injects chaos before Lambda runs.
The FIS experiment adds the layer association and the JAVA_TOOL_OPTIONS environment variable to the Lambda function.
Python and Node.js Lambda Layer for Chaos Injection
When injecting chaos in Python and Node.js functions, the Lambda function’s handler is replaced with a function in the respective layers by the FIS aws:ssm:start-automation-execution action. The automation, which is an SSM document, saves the original Lambda function’s handler in AWS Systems Manager Parameter Store, so that the changes can be rolled back once the experiment is finished.
The layer function contains the logic to inject chaos. At runtime, the layer function is invoked, injecting chaos in the Lambda function. The layer function in turn invokes the Lambda function’s original handler, so that the functionality is fulfilled.
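As an illustration only (not the actual layer code from the sample repository), a simplified Python layer handler might look like the following sketch. It assumes the original handler name, for example app.handler, was saved by the automation in a hypothetical ORIGINAL_HANDLER environment variable:
import importlib
import os
import random
import time
def handler(event, context):
    # Inject a random delay (up to 2 seconds) before calling the real handler
    delay = random.uniform(0, 2)
    time.sleep(delay)
    print(f"Chaos layer injected {int(delay * 1000)}ms of latency")
    # ORIGINAL_HANDLER (e.g. "app.handler") is assumed to be set by the experiment automation
    module_name, func_name = os.environ["ORIGINAL_HANDLER"].rsplit(".", 1)
    original = getattr(importlib.import_module(module_name), func_name)
    return original(event, context)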
The result in all runtimes (Java, Python, or Node.js) is invocation of the original Lambda function with latency or failure injected. The observed changes are the random latency or failure injected by the layer.
Once the experiment is completed, a provided SSM document rolls back the layer’s association with the Lambda function and, in the case of the Java runtime, removes the environment variable.
Sample FIS experiments using SSM and Lambda layers
In the sample code provided, Lambda layers are provided for Python, Node.js and Java runtimes along with sample Lambda functions for each runtime.
The CloudFormation template provided along with this blog deploys sample code. Execute runCfn.sh.
When this is complete, it returns the StackId that CloudFormation created:
Step 3: Run the chaos injection experiment
By default, the experiment is configured to inject chaos in the Java sample Lambda function. To change it to Python or Node.js Lambda functions, edit the experiment template and configure it to inject chaos using steps from here.
Step 4: Start the experiment
From the FIS Console, choose Start experiment.
Wait until the experiment state changes to “Completed”.
Step 5: Run your test
At this stage, you can inject chaos into your Lambda function. Run the Lambda functions and observe their behavior.
1. Invoke the Lambda function using the command below:
aws lambda invoke --function-name NodeChaosInjectionExampleFn out --log-type Tail --query 'LogResult' --output text | base64 -d
2. The CLI command’s output displays the logs created by the Lambda layer, showing the latency introduced in this invocation.
In this example, the output shows that the Lambda layer injected 1799ms of random latency to the function.
The experiment injects random latency or failure in the Lambda function. Running the Lambda function again results in a different latency or failure. At this stage, you can test the application, and observe its behavior under conditions that may occur in the real world, like an increase in latency or Lambda function’s failure.
Step 6: Roll back the experiment
To roll back the experiment, run the SSM document for rollback. This rolls back the Lambda function to the state before chaos injection. Run this command:
To avoid incurring future charges, clean up the resources created by the CloudFormation template by running the following CLI command. Update the stack name to the one you provided when creating the stack.
You can use FIS experiment results to validate expected system behavior. An example of expected behavior is: “If application latency increases by 10%, there is less than a 1% increase in sign in failures.” After the experiment is completed, evaluate whether the application resiliency aligns with your business and technical expectations.
Conclusion
This blog explains an approach for testing reliability and resilience in Lambda functions using chaos engineering. This approach allows you to inject chaos in Lambda functions without changing the Lambda function code, with clear segregation of chaos injection and business logic. It provides a way for developers to focus on building business functionality using Lambda functions.
The Lambda layers that inject chaos can be developed and managed separately. This approach uses AWS FIS to run experiments that inject chaos using Lambda layers and test serverless application’s performance and resiliency. Using the insights from the FIS experiment, you can find, fix, or document risks that surface in the application while testing.
For more serverless learning resources, visit Serverless Land.
A deployment pipeline typically comprises several stages such as dev, test, and prod, which ensure that changes undergo testing before reaching the production environment. To improve the reliability and stability of release processes, DevOps teams must review Infrastructure as Code (IaC) changes before applying them in production. As a result, implementing a mechanism for notification and manual approval that grants stakeholders improved access to changes in their release pipelines has become a popular practice for DevOps teams.
Notifications keep development teams and stakeholders informed in real-time about updates and changes to deployment status within release pipelines. Manual approvals establish thresholds for transitioning a change from one stage to the next in the pipeline. They also act as a guardrail to mitigate risks arising from errors and rework because of faulty deployments.
Please note that manual approvals, as described in this post, are not a replacement for the use of automation. Instead, they complement automated checks within the release pipeline.
In this blog post, we describe how to set up notifications and add a manual approval stage to AWS Cloud Development Kit (AWS CDK) Pipeline.
Concepts
CDK Pipeline
CDK Pipelines is a construct library for painless continuous delivery of CDK applications. CDK Pipelines can automatically build, test, and deploy changes to CDK resources. CDK Pipelines are self-mutating, which means that as application stages or stacks are added, the pipeline automatically reconfigures itself to deploy those new stages or stacks. Pipelines need only be deployed manually once; afterwards, the pipeline keeps itself up to date from the source code repository by pulling the changes pushed to the repository.
Notifications
Adding notifications to a pipeline provides visibility into changes made to the environment by utilizing the NotificationRule construct. You can also use this rule to notify pipeline users of important changes, such as when a pipeline starts execution. Notification rules specify both the events and the targets, such as an Amazon Simple Notification Service (Amazon SNS) topic or AWS Chatbot clients configured for Slack, which represent the nominated recipients of the notifications. An SNS topic is a logical access point that acts as a communication channel, while AWS Chatbot is a service that enables DevOps and software development teams to use messaging program chat rooms to monitor and respond to operational events.
Manual Approval
In a CDK pipeline, you can incorporate an approval action at a specific stage, where the pipeline should pause, allowing a team member or designated reviewer to manually approve or reject the action. When an approval action is ready for review, a notification is sent out to alert the relevant parties. This combination of notifications and approvals ensures timely and efficient decision-making regarding crucial actions within the pipeline.
Solution Overview
The solution walks through a simple web service composed of an AWS Lambda function that returns a static web page served by Amazon API Gateway. Since Continuous Integration and Continuous Deployment (CI/CD) are important components of most web projects, the team implements a CDK Pipeline for their web project.
There are two important stages in this CDK pipeline; the Pre-production stage for testing and the Production stage, which contains the end product for users.
The flow of the CI/CD process to update the website starts when a developer pushes a change to the repository using their Integrated Development Environment (IDE). An Amazon CloudWatch event triggers the CDK Pipeline. Once the changes reach the pre-production stage for testing, the CI/CD process halts. This is because a manual approval gate is between the pre-production and production stages. So, it becomes a stakeholder’s responsibility to review the changes in the pre-production stage before approving them for production. The pipeline includes an SNS notification that notifies the stakeholder whenever the pipeline requires manual approval.
After approving the changes, the CI/CD process proceeds to the production stage and the updated version of the website becomes available to the end user. If the approver rejects the changes, the process ends at the pre-production stage with no impact to the end user.
The following diagram illustrates the solution architecture.
Figure 1. This image shows the CDK pipeline process in our solution and how applications or updates are deployed using AWS Lambda Function to end users.
Prerequisites
For this walkthrough, you should have the following prerequisites:
A basic understanding of CDK and CDK Pipelines. Please go through the Python Workshop on cdkworkshop.com to follow along with the code examples and get hands-on learning about CDK and related concepts.
Since the pipeline stack is being modified, there may be a need to run cdk deploy locally again.
NOTE: The CDK Pipeline code structure used in the CDK workshop can be found here: Pipeline stack Code.
Add notification to the pipeline
In this tutorial, perform the following steps:
Add the import statements for AWS CodeStar notifications and SNS to the import section of the pipeline_stack.py file:
import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.pipelines as pipelines
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs
Ensure the pipeline is built by calling the build_pipeline() function.
pipeline.build_pipeline()
Create an SNS topic.
topic = sns.Topic(self, "MyTopic1")
Add a subscription to the topic. This specifies where the notifications are sent (Add the stakeholders’ email here).
Assign the source the value pipeline.pipeline. The first pipeline is the CDK pipeline variable, and the .pipeline property exposes the underlying pipeline that the notification rule attaches to.
source=pipeline.pipeline,
Define the events to be monitored. Specify notifications for when the pipeline starts, when it fails, when the execution succeeds, and finally when manual approval is needed.
Add the ManualApprovalStep import to the aws_cdk.pipelines import statement.
from aws_cdk.pipelines import (
CodePipeline,
CodePipelineSource,
ShellStep,
ManualApprovalStep
)
Add the ManualApprovalStep to the production stage. The code must be added to the add_stage() function.
prod = WorkshopPipelineStage(self, "Prod")
prod_stage = pipeline.add_stage(prod,
pre = [ManualApprovalStep('PromoteToProduction')])
When a stage is added to a pipeline, you can specify pre and post steps, which are arbitrary steps that run before or after the contents of the stage. You can use them to add validations like manual or automated gates to the pipeline. It is recommended to put manual approval gates in the set of pre steps, and automated approval gates in the set of post steps. So, the manual approval action is added as a pre step that runs after the pre-production stage and before the production stage.
The final version of the pipeline_stack.py becomes:
from constructs import Construct
import aws_cdk as cdk
import aws_cdk.aws_codestarnotifications as notifications
import aws_cdk.aws_sns as sns
import aws_cdk.aws_sns_subscriptions as subs
from aws_cdk import (
Stack,
aws_codecommit as codecommit,
aws_codepipeline as codepipeline,
pipelines as pipelines,
aws_codepipeline_actions as cpactions,
)
from aws_cdk.pipelines import (
CodePipeline,
CodePipelineSource,
ShellStep,
ManualApprovalStep
)
class WorkshopPipelineStack(cdk.Stack):
    def __init__(self, scope: Construct, id: str, **kwargs) -> None:
        super().__init__(scope, id, **kwargs)

        # Creates a CodeCommit repository called 'WorkshopRepo'
        repo = codecommit.Repository(
            self, "WorkshopRepo", repository_name="WorkshopRepo",
        )

        # Create the CDK pipeline
        pipeline = pipelines.CodePipeline(
            self,
            "Pipeline",
            synth=pipelines.ShellStep(
                "Synth",
                input=pipelines.CodePipelineSource.code_commit(repo, "main"),
                commands=[
                    "npm install -g aws-cdk",  # Installs the cdk cli on CodeBuild
                    "pip install -r requirements.txt",  # Instructs CodeBuild to install required packages
                    "npx cdk synth",
                ]
            ),
        )

        # Create the Pre-Prod Stage and its API endpoint
        deploy = WorkshopPipelineStage(self, "Pre-Prod")
        deploy_stage = pipeline.add_stage(deploy)
        deploy_stage.add_post(
            pipelines.ShellStep(
                "TestViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
            )
        )
        deploy_stage.add_post(
            pipelines.ShellStep(
                "TestAPIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": deploy.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
        )

        # Create the Prod Stage with the Manual Approval Step
        prod = WorkshopPipelineStage(self, "Prod")
        prod_stage = pipeline.add_stage(prod,
            pre=[ManualApprovalStep('PromoteToProduction')])
        prod_stage.add_post(
            pipelines.ShellStep(
                "ViewerEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_viewer_url
                },
                commands=["curl -Ssf $ENDPOINT_URL"],
            )
        )
        prod_stage.add_post(
            pipelines.ShellStep(
                "APIGatewayEndpoint",
                env_from_cfn_outputs={
                    "ENDPOINT_URL": prod.hc_endpoint
                },
                commands=[
                    "curl -Ssf $ENDPOINT_URL",
                    "curl -Ssf $ENDPOINT_URL/hello",
                    "curl -Ssf $ENDPOINT_URL/test",
                ],
            )
        )

        # Create the SNS notification for the pipeline
        pipeline.build_pipeline()
        topic = sns.Topic(self, "MyTopic")
        topic.add_subscription(subs.EmailSubscription("[email protected]"))
        rule = notifications.NotificationRule(self, "NotificationRule",
            source=pipeline.pipeline,
            events=["codepipeline-pipeline-pipeline-execution-started", "codepipeline-pipeline-pipeline-execution-failed", "codepipeline-pipeline-manual-approval-needed", "codepipeline-pipeline-manual-approval-succeeded"],
            targets=[topic]
        )
When a commit is made with git commit -am "Add manual Approval" and changes are pushed with git push, the pipeline automatically self-mutates to add the new approval stage.
Now when the developer pushes changes to update the build environment or the end user application, the pipeline execution stops at the point where the approval action was added. The pipeline won’t resume unless a manual approval action is taken.
Figure 2. This image shows the pipeline with the added Manual Approval action.
Since there is a notification rule that includes the approval action, an email notification is sent with the pipeline information and approval status to the stakeholder(s) subscribed to the SNS topic.
Figure 3. This image shows the SNS email notification sent when the pipeline starts.
After pushing the updates to the pipeline, the reviewer or stakeholder can use the AWS Management Console to access the pipeline to approve or deny changes based on their assessment of these changes. This process helps eliminate any potential issues or errors and ensures only changes deemed relevant are made.
Figure 4. This image shows the review action that gives the stakeholder the ability to approve or reject any changes.
If a reviewer rejects the action, or if no approval response is received within seven days of the pipeline stopping for the review action, the pipeline status is “Failed.”
Figure 5. This image depicts when a stakeholder rejects the action.
If a reviewer approves the changes, the pipeline continues its execution.
Figure 6. This image depicts when a stakeholder approves the action.
Considerations
It is important to consider potential drawbacks before integrating a manual approval process into a CDK pipeline. One such consideration is that it may delay the delivery of updates to end users. An example of this is the business-hours limitation: the pipeline process might be constrained by the availability of stakeholders during business hours. This can result in delays if changes are made outside regular working hours and require approval when stakeholders are not immediately accessible.
Clean up
To avoid incurring future charges, delete the resources. Use cdk destroy via the command line to delete the created stack.
Conclusion
Adding notifications and manual approval to CDK Pipelines provides better visibility and control over the changes made to the pipeline environment. These features ideally complement the existing automated checks to ensure that all updates are reviewed before deployment. This reduces the risk of potential issues arising from bugs or errors. The ability to approve or deny changes through the AWS Management Console makes the review process simple and straightforward. Additionally, SNS notifications keep stakeholders updated on the status of the pipeline, ensuring a smooth and seamless deployment process.
Many organizations struggle to effectively manage and derive insights from the large amount of unstructured data locked in emails, PDFs, images, scanned documents, and more. The variety of formats, document layouts, and text makes it difficult for any standard Optical Character Recognition (OCR) to extract key insights from these data sources.
To help organizations overcome these document management and information extraction challenges, AWS offers connected, pre-trained artificial intelligence (AI) service APIs that help drive business outcomes from these document-based rich data sources.
This blog post describes a cost-effective, scalable automated intelligent document processing solution that leverages a Natural Language Processing (NLP) engine using Amazon Textract and Amazon Comprehend. This solution helps customers take advantage of industry-leading machine learning (ML) technology in their document workflows without the need for in-house ML expertise.
Customer document management challenges
Customers across industry verticals experience the following document management challenges:
Extraction process accuracy varies significantly when applied to diverse sources; specifically handwritten text, images, and scanned documents.
Existing scripting and rule-based solutions cannot provide customer domain or problem-specific classifiers.
Traditional document management systems cannot consider feedback from domain experts to improve the learning process.
Personally Identifiable Information (PII) data handling is not robust or customizable, causing data privacy leakage concerns.
Many manual interventions are required to complete the entire process.
We introduced an automated intelligent document processing implementation to address key document management challenges. At the heart of the solution is an NLP engine that combines Amazon Textract and Amazon Comprehend.
The full solution also leverages other AWS services as described in the following diagram (Figure 1) and steps to develop and operate a cost-effective and scalable architecture for document processing. It effectively extracts text from document types including PDFs, images, scanned documents, Microsoft Excel workbooks, and more.
Let’s explore the automated intelligent document processing solution step by step.
The document upload engine or business users upload the respective files or documents through a custom web application to the designated Amazon Simple Storage Service (Amazon S3) bucket.
The event-based architecture signals an Amazon S3 push event to invoke the respective AWS Lambda function to start document pre-processing.
The Lambda function evaluates the document payload, leverages Amazon Simple Queue Service (Amazon SQS) for async processing, prepares document metadata, stores it in Amazon DynamoDB, and calls the NLP engine to perform the information extraction process.
The NLP engine leverages Amazon Textract for text extraction from a variety of sources and leverages document metadata to optimize the appropriate API calls (for example, form, tabular, or PDF).
Amazon Textract output is fed into Amazon Comprehend which consumes the extracted text and performs entity parsing, line/paragraph-based sentiment analysis, and document/paragraph classification. For better accuracy, we leverage a custom classifier within Amazon Comprehend.
Amazon Comprehend also provides key APIs to mask PII data before it is used for any further consumption. The solution offers the ability to configure masking rules for each PII entity per masking requirements.
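To make the flow concrete, here is a simplified sketch of the Textract and Comprehend calls the NLP engine relies on. The bucket and document names are hypothetical, and the real solution adds metadata-driven API selection, custom classification, and masking rules on top of this:
import boto3

textract = boto3.client("textract")
comprehend = boto3.client("comprehend")

# 1. Extract text from a document stored in S3 (hypothetical bucket and key)
extraction = textract.detect_document_text(
    Document={"S3Object": {"Bucket": "my-document-bucket", "Name": "claims/claim-001.png"}}
)
text = " ".join(
    block["Text"] for block in extraction["Blocks"] if block["BlockType"] == "LINE"
)

# 2. Parse entities and detect PII so it can be masked before further consumption
entities = comprehend.detect_entities(Text=text, LanguageCode="en")
pii = comprehend.detect_pii_entities(Text=text, LanguageCode="en")
for entity in pii["Entities"]:
    print(entity["Type"], entity["Score"])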
To ensure the solution has capability to handle data from Microsoft Excel workbooks, we developed a custom parser using Python running inside an AWS Lambda function. Depending on the document metadata, this function can be invoked.
Output of Amazon Comprehend is then fed to ML models deployed using Amazon SageMaker depending on additional use cases configured by the customer to complement the overall process with ML-based recommendations, predictions, and personalization.
Once the NLP engine completes its processing, the job completion notification event signals another AWS Lambda function and updates the status in the respective Amazon SQS queue.
The Lambda post-processing function parses the resultant content generated by the NLP engine and stores it in the Amazon DynamoDB and Amazon S3 bucket. This step is responsible for the required data augmentation, key entities validation, and default value assignment to create a data structure that could be consumed by the presentation/visualization layer.
Users get the flexibility to see the extracted information and compare it with the original document extract in the custom user interface (UI). They can provide their feedback on extraction and entity parsing accuracy. From a user access management perspective, Amazon Cognito provides authorization and authentication.
Customer benefits
The automated intelligent document processing solution helps customers:
Increase overall document management efficiency by 50-60%, leveraging automation and nullifying manual interventions
Reduce in-house team involvement in administrative activities by up to 70% using integrated and connected processing workflows
Gain better visibility into key contractual obligations with features such as Document Classification (helps properly route documents to the respective process/team) and Obligation Extraction
Utilize a UI-based feedback mechanism for in-house domain experts/reviewers to see and validate the extracted information and offer feedback to inform further model training
From a cost-optimization perspective, depending on document type and required information, only the respective Amazon Textract APIs calls are submitted. (For example, it is not worth using form/table-based Textract API calls for a Know Your Customer (KYC) document such as a driver’s license or passport when the AnalyzeID API is the most efficient solution.)
To maximize solution benefits, customers should invest time in building well-defined taxonomies before using the document processing solution, to accommodate their own use cases or industry domain-specific requirements. The taxonomy input highlights only the relevant keys and defines the respective actions to take in case the required keys are not extracted.
Vertical industry use cases
As mentioned, this document processing solution can be used across industry segments. Let’s explore some practical use cases. For example, it can help insurance industry professionals to accelerate claim processing and customer KYC-related processes. By extracting the key entities from the claim documents, mapping them against the customer defined taxonomy, and integrating with Amazon SageMaker models for anomaly detection (anomalous claims), insurance providers can improve claim management and customer satisfaction.
In the healthcare industry, the solution can help with medical records and report processing, key medical entity extraction, and customer data masking.
The document processing solution can help the banking industry by automating check processing and delivering the ability to extract key entities like payer, payee, date, and amount from the checks.
Conclusion
Manual document processing is resource-intensive, time consuming, and costly. Customers need to allocate resources to process large volume documents, lowering business agility. Their employees are performing manual “stare and compare” tasks, potentially reducing worker morale and preventing them from focusing where their efforts are better placed.
Intelligent document processing helps businesses overcome these challenges by automating the classification, extraction, and analysis of data. This expedites decision cycles, allocates resources to high-value tasks, and reduces costs.
Pre-trained APIs of AWS AI services allow for quick classification, extraction, and analysis of data from scores of documents. This solution also has industry-specific features that can quickly process specialized documents. This blog discussed the foundational architecture that helps accelerate implementation of any specific document processing use case.
Programming is becoming an increasingly useful skill in today’s society. As we continue to rely more and more on software and digital technology, knowing how to code is also more and more valuable. That’s why many parents are looking for ways to introduce their children to programming. You might find it difficult to know where to begin, with so many different kids’ coding languages and platforms available. In this blog post, we explore how children can progress through different programming languages to realise their potential as proficient coders and creators of digital technology.
ScratchJr
Everyone needs to start somewhere, and one great option for children aged 5–7 is ScratchJr (Scratch Junior), a visual programming language with drag-and-drop blocks for creating simple programs. ScratchJr is available for free on Android and iOS mobile devices. It’s great for introducing young children to the basics of programming, and they can use it to create interactive stories and games.
Scratch
Moving on from ScratchJr, there’s its web-based sibling Scratch. Scratch offers drag-and-drop blocks for creating programs and comes with an assortment of graphics, sounds, and music for your child to bring their programs to life. This visual programming language is designed specifically for children to learn programming fundamentals. Scratch is available in multiple spoken languages and is perfect for beginners. It allows kids to create interactive stories, animations, and games with ease.
The Raspberry Pi Foundation has a wealth of free Scratch resources we have created specifically for young people who are beginners, such as the ‘Introduction to Scratch’ project path. And if your child is interested in physical computing to interact with the real world using code, they can also learn how to use electronic components, such as buzzers and LEDs, with Scratch and a Raspberry Pi computer.
MakeCode
Another fun option for children who want to explore coding and physical computing is the micro:bit. This is a small programmable device with an LED display, buttons, and sensors, and it can be used to create games, animations, interactive projects, and lots more. To control a micro:bit, a visual programming language called MakeCode can be used. The micro:bit can also be programmed using Scratch or text-based languages such as Python, offering an easy transition for children as their coding skills progress. Have a look at our free collection of micro:bit resources to learn more.
HTML
Everyone is familiar with websites, but fewer people know how they are coded. HTML is a markup language that is used to create the webpages we use every day. It’s a great language for children to learn because they can see the results of their code in real time, in their web browser. They can use HTML and CSS to create simple webpages that include links, videos, pictures, and interactive elements, all the while learning how websites are structured and designed. We have many free web design resources for your child, including a basic ‘Introduction to web development’ project path.
Python
If your child is becoming confident with Scratch and HTML, then using Python is the recommended next stage in their learning. Python is a high-level text-based programming language that is easy to read and learn. It is a popular choice for beginners as it has a simple syntax that often reads like plain English. Many free Python projects for young people are available on our website, including the ‘Introduction to Python’ path.
The Python community is also really welcoming and has produced a myriad of online tutorials and videos to help learners explore this language. Python can be used to do some very powerful things with ease, which is why it is so popular. For example, it is relatively simple to create Python programs to engage in machine learning and data analysis. If you wanted to explore large language models such as GPT, on which the ChatGPT chatbot is based, then Python would be the language of choice.
JavaScript
JavaScript is the language of the web, and if your child has become proficient in HTML, then this is the next language for them. JavaScript is used to create interactive websites and web applications. As young people become more comfortable with programming, JavaScript is a useful language to progress to, given how ubiquitous the web is today. It can be tricky to learn, but like Python, it has a vast number of libraries of functions that people have already created for it to achieve things more quickly. These libraries make JavaScript a very powerful language to use.
Try out kids’ coding languages
There are many different programming languages, and each one has its own strengths and weaknesses. Some are easy to learn and use, some are really fast, and some are very secure.
Starting with visual languages such as Scratch or MakeCode allows your child to begin to understand the basic concepts of programming without needing any developed reading and keyboard skills. Once their understanding and skills have improved, they can try out text-based languages, find the one that they are comfortable with, and then continue to learn. It’s fairly common for people who are proficient in one programming language to learn other languages quite quickly, so don’t worry about which programming language your child starts with.
Whether your child is interested in working in software development or just wants to learn a valuable — and creative — skill, helping them learn to code and try out different kids’ coding languages is a great way for you to open up new opportunities for them.
We are building a new online text-based Code Editor to help young people aged 7 and older learn to write code. It’s free and designed for young people who attend Code Clubs and CoderDojos, students in schools, and learners at home.
At this stage of development, the Code Editor enables learners to:
Write and run Python code right in their browser, with no setup required. The interface is simple and intuitive, which makes getting started with text-based coding easier.
Save their code using their Raspberry Pi Foundation account. We want learners to easily build on projects they start in the classroom at home, or bring a project they’ve started at home to their coding club.
We’ve chosen Python as the first programming language our Code Editor supports because it is popular in schools, CoderDojos, and Code Clubs. Many educators and young people like Python because they see it as similar to the English language. It is often the text-based language young people learn when they take their first steps away from a block-based programming environment, such as Scratch.
Python is also widely used by professional programmers and usually tops at least one of the industry-standard indexes that ranks programming languages.
We will be adding support for web development languages (HTML/CSS/JavaScript) to the Editor in the near future.
We’re also planning to add features such as project sharing and collaboration, which we know young people will love. We want the Editor to be safe, accessible, and age-appropriate. As safeguarding is always at the core of what we do, we’ll only make new features available once we’ve ensured they comply with the ICO’s age-appropriate design code and our safeguarding policies.
Test the Code Editor and tell us what you think
We are inviting you to test the Code Editor as part of what we call the beta phase of development. As the Editor is still in development, some things might not look or work as well as we’d like — and this is why we need your help.
We’d love you to try the Editor out and let us know what worked well for you, what didn’t work well, and what you’d like to see next.
You can now try out the Code Editor in the first two projects of our ‘Intro to Python’ path. We’ve included a feedback form for you to let us know which project you tried, and what you think of the Editor. We’d love to hear from you.
Your feedback helps us decide what to do next. Based on what learners, educators, volunteers, teachers, and parents tell us, we will make the improvements to the Editor that matter most to the young people we aim to support.
Where next for the Code Editor?
One of our long-term goals is to engage millions of young people in learning about computing and how to create with digital technologies. We’re developing the Code Editor with three main aims in mind.
1. Supporting young people’s learning journeys
We aim to build the Code Editor so it:
Suits beginners and also supports them as their confidence and independence grows, so they can take on their own coding projects in a familiar environment
Helps learners to transition from block-based to text-based, informed by our deep understanding of pedagogy and computing education
Brings together project instructions and code editing in a single interface so that young people do not have to switch screens, which makes coding easier
2. Removing barriers to accessing computing education
Our work on the Code Editor will:
Ensure it works well on mobile and tablet devices, and low-cost computers including the Raspberry Pi 4 2GB
Support localisation and translation, so we can tailor the Editor for the needs of young people all over the world
3. Making learning to program engaging for more young people
We want to offer a Code Editor that:
Enables young people to build a vast variety of projects because it supports graphic user interface output and supplies images and sprites for use in multimedia projects
We’re also planning on making the Editor available as an open source project so that other projects and organisations focussed on helping people learn to code can benefit. More on this soon.
Our work on the Code Editor has been generously funded by the Algorand Foundation and Endless, and we thank them for their support. If you are interested in partnering with us to fund this key work, please reach out to us via email.
Welcome to the 21st edition of the AWS Serverless ICYMI (in case you missed it) quarterly recap. Every quarter, we share all the most recent product launches, feature enhancements, blog posts, webinars, live streams, and other interesting things that you might have missed!
In case you missed our last ICYMI, check out what happened last quarter here.
Example notification of a story hosted with Next.js and App Runner
Serverless Land is a website maintained by the Serverless Developer Advocate team to help you build serverless applications and includes workshops, code examples, blogs, and videos. There is now enhanced search functionality so you can search across resources, patterns, and video content.
ServerlessLand search
AWS Lambda
AWS Lambda has improved how concurrency works with Amazon SQS. You can now control the maximum number of concurrent Lambda functions invoked.
The launch blog post explains the scaling behavior of Lambda using this architectural pattern, challenges this feature helps address, and a demo of maximum concurrency in action.
Maximum concurrency is set to 10 for the SQS queue.
AWS Lambda Powertools is an open-source library to help you discover and incorporate serverless best practices more easily. Lambda Powertools for .NET is now generally available and currently focused on three observability features: distributed tracing (Tracer), structured logging (Logger), and asynchronous business and application metrics (Metrics). Powertools is also available for Python, Java, and Typescript/Node.js programming languages.
Lambda announced a new feature, runtime management controls, which provide more visibility and control over when Lambda applies runtime updates to your functions. The runtime controls are optional capabilities for advanced customers that require more control over their runtime changes. You can now specify a runtime management configuration for each function with three settings: Automatic (default), Function update, or Manual.
There are three new Amazon CloudWatch metrics for asynchronous Lambda function invocations: AsyncEventsReceived, AsyncEventAge, and AsyncEventsDropped. You can track the asynchronous invocation requests sent to Lambda functions to monitor any delays in processing and take corrective actions if required. The launch blog post explains the new metrics and how to use them to troubleshoot issues.
Lambda now supports Amazon DocumentDB change streams as an event source. You can use Lambda functions to process new documents, track updates to existing documents, or log deleted documents. You can use any programming language that is supported by Lambda to write your functions.
There is a helpful blog post suggesting best practices for developing portable Lambda functions that allow you to port your code to containers if you later choose to.
AWS Step Functions
AWS Step Functions has expanded its AWS SDK integrations with support for 35 additional AWS services including Amazon EMR Serverless, AWS Clean Rooms, AWS IoT FleetWise, AWS IoT RoboRunner and 31 other AWS services. In addition, Step Functions also added support for 1000+ new API actions from new and existing AWS services such as Amazon DynamoDB and Amazon Athena. For the full list of added services, visit AWS SDK service integrations.
EventBridge event buses now also support enhanced integration with Service Quotas. Your quota increase requests for limits such as PutEvents transactions-per-second, number of rules, and invocations per second among others will be processed within one business day or faster, enabling you to respond quickly to changes in usage.
AWS SAM
The AWS Serverless Application Model (SAM) Command Line Interface (CLI) has added the sam list command. You can now show resources defined in your application, including the endpoints, methods, and stack outputs required to test your deployed application.
AWS SAM has a preview of sam build support for building and packaging serverless applications developed in Rust. You can use cargo-lambda in the AWS SAM CLI build workflow and AWS SAM Accelerate to iterate on your code changes rapidly in the cloud.
You can now use AWS SAM connectors as a source resource parameter. Previously, you could only define AWS SAM connectors as an AWS::Serverless::Connector resource. Now you can add the resource attribute on a connector’s source resource, which makes templates more readable and easier to update over time.
AWS SAM connectors now also support multiple destinations to simplify your permissions. You can now use a single connector between a single source resource and multiple destination resources.
In October 2022, AWS released OpenID Connect (OIDC) support for AWS SAM Pipelines. This improves your security posture by creating integrations that use short-lived credentials from your CI/CD provider. There is a new blog post on how to implement it.
Amazon S3 now automatically applies default encryption to all new objects added to S3, at no additional cost and with no impact on performance.
You can now use an S3 Object Lambda Access Point alias as an origin for your Amazon CloudFront distribution to tailor or customize data to end users. For example, you can resize an image depending on the device that an end user is visiting from.
S3 has introduced Mountpoint for S3, a high performance open source file client that translates local file system API calls to S3 object API calls like GET and LIST.
S3 Multi-Region Access Points now support datasets that are replicated across multiple AWS accounts. They provide a single global endpoint for your multi-region applications, and dynamically route S3 requests based on policies that you define. This helps you to more easily implement multi-Region resilience, latency-based routing, and active-passive failover, even when data is stored in multiple accounts.
Amazon Kinesis
Amazon Kinesis Data Firehose now supports streaming data delivery to Elastic. This is an easier way to ingest streaming data to Elastic and consume the Elastic Stack (ELK Stack) solutions for enterprise search, observability, and security without having to manage applications or write code.
Amazon DynamoDB
Amazon DynamoDB now supports table deletion protection to protect your tables from accidental deletion when performing regular table management operations. You can set the deletion protection property for each table, which is set to disabled by default.
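For example, a minimal sketch of enabling deletion protection on an existing table with boto3 (the table name is hypothetical):
import boto3

dynamodb = boto3.client("dynamodb")

# Enable deletion protection on an existing table (it is disabled by default)
dynamodb.update_table(
    TableName="orders",
    DeletionProtectionEnabled=True
)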
Amazon SNS
Amazon SNS now supports AWS X-Ray active tracing to visualize, analyze, and debug application performance. You can now view traces that flow through Amazon SNS topics to destination services, such as Amazon Simple Queue Service, Lambda, and Kinesis Data Firehose, in addition to traversing the application topology in Amazon CloudWatch ServiceLens.
SNS also now supports setting content-type request headers for HTTPS notifications so applications can receive their notifications in a more predictable format. Topic subscribers can create a DeliveryPolicy that specifies the content-type value that SNS assigns to their HTTPS notifications, such as application/json, application/xml, or text/plain.
EDA Visuals collection added to Serverless Land
The Serverless Developer Advocate team has extended Serverless Land and introduced EDA Visuals. These are small, bite-sized visuals to help you understand concepts and patterns of event-driven architectures. Find out about batch processing vs. event streaming, commands vs. events, message queues vs. event brokers, and point-to-point messaging. Discover bounded contexts, migrations, idempotency, claims, enrichment, and more!
There is also a new section on Serverless Land containing helpful code repositories. You can search for code repos to use for examples, learning or building serverless applications. You can also filter by use-case, runtime, and level.
There is a weekly office hours live stream. In each session, we talk about a specific topic or technology related to serverless and then open it up to help you with your real serverless challenges and issues. Ask us anything you want about serverless technologies and applications.
Marcia Villalba frequently publishes new videos on her popular serverless YouTube channel. You can view all of Marcia’s videos at https://www.youtube.com/c/FooBar_codes.
Eric Johnson is exploring how developers are building serverless applications. We spend time talking about AWS SAM as well as others like AWS CDK, Terraform, Wing, and AMPT.
The Serverless landing page has more information. The Lambda resources page contains case studies, webinars, whitepapers, customer stories, reference architectures, and even more Getting Started tutorials.
Which programming language has been around for more than three decades and continues to grow in popularity each year?
If you guessed Python, you nailed it. In the 2022 Octoverse report, we found that Python remains the second most-used programming language on GitHub. Interestingly, Python’s use grew more than 22 percent year over year with more than four million developers on GitHub using it at some point in 2022.
In this article, we’ll dive into a brief history of Python, its benefits, its use cases, and seek to answer why a programming language conceived in the 1980s continues to dominate development. And, since this is GitHub, we’ll also offer a few useful tips and tricks for developers new to—and experienced in—Python.
So, what is Python?
Python is a high-level, interpreted programming language with a simple syntax, which makes it easily readable and extremely user- and beginner-friendly. Originally built to satisfy Guido van Rossum’s desire for a programming language that was simple to use and beautiful to look at, Python was first released to the world in 1991.
Since its development, it has grown to have widespread applicability for developers, data scientists, researchers, and more. But how, you may ask, can a coding language be simple and beautiful to look at? Here’s some proof:
Python
print("Hello world.")
vs.
Java
public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello world");
    }
}
Since Python is a general-purpose language, it can be used in a variety of applications, and its uncomplicated nature makes it an excellent language for automating tasks, building websites or software, and analyzing data.
Python also has several other characteristics that make it popular amongst developers and engineers. These include:
It’s easy to read. Python code uses English keywords rather than punctuation, and its line breaks help define the code blocks. In practice, this means you can identify what the code is designed to do simply by looking at it.
It’s portable. Some languages require you to modify code to run on different platforms, but Python is a cross-platform language, which means you can run the same code on any operating system with a Python interpreter.
It’s extendable. Python code can be written in other languages (such as C++), and users can add low-level modules to the Python interpreter to customize and optimize their tools.
It has a broad standard library. This library is available for anyone to access and means that users don’t have to write code for every single function—they can access built-in modules that help with issues in everyday programming and more.
What is Python commonly used for?
Python can be used for just about anything, from web and software development to machine learning and artificial intelligence (AI). Let’s take a look at some of its most common use cases.
import antigravity

def main():
    antigravity.fly()

if __name__ == '__main__':
    main()
Run this script to check out an inside joke among Python developers.
Using Python for web and software development
Python is a popular language for web and software development because you can create complex, multi-protocol applications while maintaining concise, readable syntax. In fact, some of the most popular applications were built with Python. Plus, Python’s open source community provides developers with an extensive amount of reusable code, frameworks, and support. Case in point: Django is one of the most-used Python frameworks designed by experienced developers to help others accelerate their application build times and avoid issues that might stall their progress.
Using Python for task automation
One of Python’s key benefits is its ability to automate manual, repetitive tasks. With Python, you can learn how to automate just about anything by using either built-in modules or pre-written code from its robust library. Or you can write your own custom scripts to perform specific actions. For example, you can easily automate emails with the “smtplib” module or copy files with the “shutil” module. Python also has a robust set of testing frameworks, which makes it an excellent language for test automation. Frameworks such as Pytest, Behave, and Robot allow developers to write simple yet effective tests to ensure the quality of their builds.
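For instance, a couple of lines of standard-library Python are enough to copy a file and send a notification email. This is only a minimal sketch: the file paths, SMTP host, port, and credentials below are placeholders.
import shutil
import smtplib
from email.message import EmailMessage

# Copy a file to a backup location with the standard-library shutil module
shutil.copy("report.csv", "backups/report.csv")

# Send a short notification email with smtplib
# (the SMTP host, port, addresses, and password here are placeholders)
msg = EmailMessage()
msg["Subject"] = "Backup finished"
msg["From"] = "bot@example.com"
msg["To"] = "team@example.com"
msg.set_content("The nightly backup completed successfully.")

with smtplib.SMTP("smtp.example.com", 587) as server:
    server.starttls()
    server.login("bot@example.com", "app-password")
    server.send_message(msg)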
Using Python for machine learning and data science
Here’s a fun fact: Python is the top preferred language for data science and research. Since its syntax is easily understandable and adaptable, people with little-to-no development experience can easily learn Python and use it to manipulate data for research, reporting, predictive or regression analyses, and more. Collecting and parsing data can be a time-consuming task for data scientists, and Python is also one of the top languages for training machine learning (ML) models. Through specific algorithms, these models can analyze and identify patterns in data to make predictions or decisions based on that data, and they constantly evolve based on the outputs of previous datasets to handle new variables. Data scientists and developers training ML models often use libraries such as NumPy, Pandas, and Matplotlib to automate functions like data cleaning, transformation, and visualization.
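As a small illustration, a typical cleaning-and-plotting snippet with Pandas and Matplotlib might look like this; the file name and column names are placeholders.
import pandas as pd
import matplotlib.pyplot as plt

# Load a CSV of observations (file and column names are placeholders)
df = pd.read_csv("measurements.csv")

# Basic cleaning: drop incomplete rows and standardize a numeric column
df = df.dropna()
df["value_scaled"] = (df["value"] - df["value"].mean()) / df["value"].std()

# Quick visualization of the cleaned data
df["value_scaled"].hist(bins=30)
plt.title("Distribution of scaled values")
plt.show()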
Using Python for financial analysis
Similar to how Python can assist data scientists with the heavy lift of large data sets, Python is widely used in the financial industry to quickly perform complex computations. Stock markets generate huge amounts of data, and Python can be used to import data on stock prices and generate strategies through algorithms to identify trading opportunities. The language can also be used for portfolio optimization, risk management, financial modeling and visualization, cryptocurrency analysis, and even fraud detection.
Using Python for artificial intelligence
Python can also be found in some of the most complex artificial intelligence (AI) technologies—and it’s actually one of the preferred languages for AI. Python’s concise and readable code allows developers to create consistent, reliable systems, and its vast library provides a number of frameworks like PyBrain, which offers developers powerful algorithms for machine learning tasks. Plus, Python’s visualization capabilities can help convert these large datasets for AI or ML into comprehensible graphs or reports. Interestingly enough, OpenAI, the artificial intelligence research lab, uses the Python framework PyTorch as its standard framework for the deep learning that trains its AI systems.
Why is Python so popular?
In addition to its relative simplicity to learn, there are a few other reasons why Python continues to consistently grow in popularity. These include:
It’s more productive. Compared to some other more complex programming languages like C++, Python’s syntax allows users to do more with less and cut down on time and effort to write the same lines of code.
It has an expansive, supportive community of users. Even the best developers run into problems, and this is where user communities can become an invaluable resource. Python has a huge community with documentation, tutorials, tips, and tricks to master the language. The Python community on GitHub, for example, offers everything from information on the latest version of the language to bug reports and update notes.
It’s academic. Python has become the go-to language in academia with some students even encountering Python as early as elementary school. (Believe it or not, there are children’s picture books dedicated to Python.) While computer science students are often taught Python, its use extends beyond that discipline into other areas of STEM and academic research. For example, Python can be used to solve differential equations, perform statistical analyses, simulate and track particle diffusion, and more.
It has high corporate demand. Because of its wide scale applicability in development and data analysis work, learning and knowing Python is often considered a top-skill among job seekers. According to Statista, Python was the third most demanded language in 2022 by recruiters worldwide.
The bottom line
Python is everywhere—and it’s been used to build a significant number of the technologies, websites, and even systems most people encounter on a daily basis. It powers everything from your favorite video streaming service to the ML algorithms that can help you make your next cryptocurrency trade. And for an even broader scope example (pun absolutely intended), NASA uses Python to power data analysis with its sophisticated James Webb Space Telescope, which makes it one of the few programming languages that is, quite literally, out of this world.
How to get started with Python
A quick Google search will yield hundreds of resources out there to jumpstart your Python journey—and that can quickly get a little overwhelming. To simplify things, here are a few helpful GitHub repositories to help you get started with Python:
Explore pre-built Python algorithms: From networking flows to physics and neural networks, this repository is a great guide to building algorithms in Python.
Learn Python in 30 days: This step-by-step guide will walk you through the basics of Python in 30 days.
Grab some tips from this cheatsheet: Check out this collection of Python scripts with code examples and explanations for learning the language.
Sharpen your Python skills: Pick up some tips from this study guide for both beginner and seasoned users built by a Python superfan!
GitHub offers two easier ways to start working with Python: GitHub Codespaces and GitHub Copilot.
You can start building today for free with GitHub Codespaces: every developer on GitHub gets 60 free hours of use per month to spin up a development environment in the cloud from any device. Check out the Django quick start template to begin coding right in your browser!
You can also use GitHub Copilot, GitHub’s AI pair programmer, to write your first lines of Python. Here’s how:
Install the GitHub Copilot extension into your code editor.
Describe the purpose of your project in a comment.
Write a comment describing which libraries you may need.
Start tabbing and let GitHub Copilot suggest lines of code to help you learn new techniques or methods.
From machine learning to data analysis, Python’s versatility allows it to continue its explosive growth with developers and non-developers alike. Experiment with Python through GitHub or on your local machine to be part of this growth and get started today!
Finding similar columns in a data lake has important applications in data cleaning and annotation, schema matching, data discovery, and analytics across multiple data sources. The inability to accurately find and analyze data from disparate sources represents a potential efficiency killer for everyone from data scientists, medical researchers, and academics to financial and government analysts.
Conventional solutions involve lexical keyword search or regular expression matching, which are susceptible to data quality issues such as absent column names or different column naming conventions across diverse datasets (for example, zip_code, zcode, postalcode).
In this post, we demonstrate a solution for searching for similar columns based on column name, column content, or both. The solution uses approximate nearest neighbors algorithms available in Amazon OpenSearch Service to search for semantically similar columns. To facilitate the search, we create feature representations (embeddings) for individual columns in the data lake using pre-trained Transformer models from the sentence-transformers library in Amazon SageMaker. Finally, to interact with and visualize results from our solution, we build an interactive Streamlit web application running on AWS Fargate.
We include a code tutorial for you to deploy the resources to run the solution on sample data or your own data.
Solution overview
The following architecture diagram illustrates the two-stage workflow for finding semantically similar columns. The first stage runs an AWS Step Functions workflow that creates embeddings from tabular columns and builds the OpenSearch Service search index. The second stage, or the online inference stage, runs a Streamlit application through Fargate. The web application collects input search queries and retrieves from the OpenSearch Service index the approximate k-most-similar columns to the query.
Figure 1. Solution architecture
The automated workflow proceeds in the following steps:
The user uploads tabular datasets into an Amazon Simple Storage Service (Amazon S3) bucket, which invokes an AWS Lambda function that initiates the Step Functions workflow.
The workflow begins with an AWS Glue job that converts the CSV files into Apache Parquet data format.
A SageMaker Processing job creates embeddings for each column using pre-trained models or custom column embedding models. The SageMaker Processing job saves the column embeddings for each table in Amazon S3.
A Lambda function creates the OpenSearch Service domain and cluster to index the column embeddings produced in the previous step.
Finally, an interactive Streamlit web application is deployed with Fargate. The web application provides an interface for the user to input queries to search the OpenSearch Service domain for similar columns.
You can download the code tutorial from GitHub to try this solution on sample data or your own data. Instructions on how to deploy the required resources for this tutorial are available on GitHub.
Prerequisites
To implement this solution, you need the following:
In this post, we build a search index to include over 400 columns from over 25 tabular datasets. The datasets originate from the following public sources:
For the full list of the tables included in the index, see the code tutorial on GitHub.
You can bring your own tabular dataset to augment the sample data or build your own search index. We include two Lambda functions that initiate the Step Functions workflow to build the search index for individual CSV files or a batch of CSV files, respectively.
Transform CSV to Parquet
Raw CSV files are converted to Parquet data format with AWS Glue. Parquet is a column-oriented file format preferred in big data analytics that provides efficient compression and encoding. In our experiments, the Parquet data format offered a significant reduction in storage size compared to raw CSV files. We also used Parquet as a common data format to convert other data formats (for example, JSON and NDJSON) because it supports advanced nested data structures.
Create tabular column embeddings
To extract embeddings for individual table columns in the sample tabular datasets in this post, we use the following pre-trained models from the sentence-transformers library. For additional models, see Pretrained Models.
The SageMaker Processing job runs create_embeddings.py (code) for a single model. For extracting embeddings from multiple models, the workflow runs parallel SageMaker Processing jobs, as shown in the Step Functions workflow. We use each model to create two sets of embeddings:
column_name_embeddings – Embeddings of column names (headers)
column_content_embeddings – Average embedding of all the rows in the column
For more information about the column embedding process, see the code tutorial on GitHub.
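The full implementation lives in create_embeddings.py in the code tutorial; the following is only a minimal sketch of the two embedding types, assuming the all-MiniLM-L6-v2 model and one pandas DataFrame per table.
import numpy as np
import pandas as pd
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def embed_columns(df: pd.DataFrame):
    """Return (column_name_embeddings, column_content_embeddings) for one table."""
    # Embeddings of the column names (headers)
    column_name_embeddings = model.encode(list(df.columns))

    # Average embedding of all the row values in each column
    column_content_embeddings = []
    for col in df.columns:
        row_embeddings = model.encode(df[col].astype(str).tolist())
        column_content_embeddings.append(np.mean(row_embeddings, axis=0))

    return column_name_embeddings, np.array(column_content_embeddings)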
An alternative to the SageMaker Processing step is to create a SageMaker batch transform to get column embeddings on large datasets. This would require deploying the model to a SageMaker endpoint. For more information, see Use Batch Transform.
Index embeddings with OpenSearch Service
In the final step of this stage, a Lambda function adds the column embeddings to an OpenSearch Service approximate k-Nearest-Neighbor (kNN) search index. Each model is assigned its own search index. For more information about the approximate kNN search index parameters, see k-NN.
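The exact index settings live in the solution’s Lambda code; a hedged sketch of creating and populating such an index with the opensearch-py client might look like the following. The domain endpoint, index name, and field names are illustrative, and 384 is the embedding size of all-MiniLM-L6-v2.
from opensearchpy import OpenSearch

# Connection details, index name, and field names are illustrative placeholders
client = OpenSearch(
    hosts=[{"host": "my-domain.us-east-1.es.amazonaws.com", "port": 443}],
    use_ssl=True,
)

# Create an approximate kNN index; 384 matches all-MiniLM-L6-v2 embeddings
client.indices.create(
    index="column-embeddings-minilm",
    body={
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                "embedding": {"type": "knn_vector", "dimension": 384},
                "table_name": {"type": "keyword"},
                "column_name": {"type": "keyword"},
            }
        },
    },
)

def index_column(embedding, table_name, column_name):
    """Add one column embedding document to the index."""
    client.index(
        index="column-embeddings-minilm",
        body={
            "embedding": embedding.tolist(),
            "table_name": table_name,
            "column_name": column_name,
        },
    )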
Online inference and semantic search with a web app
The second stage of the workflow runs a Streamlit web application where you can provide inputs and search for semantically similar columns indexed in OpenSearch Service. The application layer uses an Application Load Balancer, Fargate, and Lambda. The application infrastructure is automatically deployed as part of the solution.
The application allows you to provide an input and search for semantically similar column names, column content, or both. Additionally, you can select the embedding model and number of nearest neighbors to return from the search. The application receives inputs, embeds the input with the specified model, and uses kNN search in OpenSearch Service to search indexed column embeddings and find the most similar columns to the given input. The search results displayed include the table names, column names, and similarity scores for the columns identified, as well as the locations of the data in Amazon S3 for further exploration.
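Under the hood, the search boils down to an approximate kNN query against the index. Here is a minimal sketch that reuses the model and client objects from the earlier snippets; the index and field names are again illustrative.
def search_similar_columns(query_text, k=10):
    """Embed the query text and retrieve the k most similar indexed columns."""
    query_embedding = model.encode(query_text).tolist()
    response = client.search(
        index="column-embeddings-minilm",
        body={
            "size": k,
            "query": {"knn": {"embedding": {"vector": query_embedding, "k": k}}},
        },
    )
    return [
        (hit["_source"]["table_name"], hit["_source"]["column_name"], hit["_score"])
        for hit in response["hits"]["hits"]
    ]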
The following figure shows an example of the web application. In this example, we searched for columns in our data lake that have similar Column Names (payload type) to district (payload). The application used all-MiniLM-L6-v2 as the embedding model and returned 10 (k) nearest neighbors from our OpenSearch Service index.
The application returned transit_district, city, borough, and location as the four most similar columns based on the data indexed in OpenSearch Service. This example demonstrates the ability of the search approach to identify semantically similar columns across datasets.
Figure 3: Web application user interface
Clean up
To delete the resources created by the AWS CDK in this tutorial, run the following command:
cdk destroy --all
Conclusion
In this post, we presented an end-to-end workflow for building a semantic search engine for tabular columns.
Get started today on your own data with our code tutorial available on GitHub. If you’d like help accelerating your use of ML in your products and processes, please contact the Amazon Machine Learning Solutions Lab.
About the Authors
Kachi Odoemene is an Applied Scientist at AWS AI. He builds AI/ML solutions to solve business problems for AWS customers.
Taylor McNally is a Deep Learning Architect at Amazon Machine Learning Solutions Lab. He helps customers from various industries build solutions leveraging AI/ML on AWS. He enjoys a good cup of coffee, the outdoors, and time with his family and energetic dog.
Austin Welch is a Data Scientist in the Amazon ML Solutions Lab. He develops custom deep learning models to help AWS public sector customers accelerate their AI and cloud adoption. In his spare time, he enjoys reading, traveling, and jiu-jitsu.
While writing code to develop applications, developers must keep up with multiple programming languages, frameworks, software libraries, and popular cloud services from providers such as AWS. Even though developers can find code snippets on developer communities, to either learn from them or repurpose the code, manually searching for the snippets with an exact or even similar use case is a distracting and time-consuming process. They have to do all of this while making sure that they’re following the correct programming syntax and best coding practices.
Amazon CodeWhisperer, a machine learning (ML) powered coding aid for developers, lets you overcome those challenges. Developers can simply write a comment that outlines a specific task in plain English, such as “upload a file to S3.” Based on this, CodeWhisperer automatically determines which cloud services and public libraries are best suited for the specified task, creates the specific code on the fly, and then recommends the generated code snippets directly in the IDE. And this isn’t about copy-pasting code from the web, but generating code based on the context of your file, such as which libraries and versions you have, as well as the existing code. Moreover, CodeWhisperer seamlessly integrates with your Visual Studio Code and JetBrains IDEs so that you can stay focused and never leave the development environment. At the time of this writing, CodeWhisperer supports Java, Python, JavaScript, C#, and TypeScript.
To make our application easier to digest, we’ll split it into three segments:
Image download – The user provides an image URL to the first API. A Lambda function downloads the image from the URL and stores it on an S3 bucket. Amazon S3 automatically sends a notification to an Amazon SNS topic informing that a new image is ready for processing. Amazon SNS then delivers the message to an Amazon SQS queue.
Image recognition – A second Lambda function handles the orchestration and processing of the image. It receives the message from the Amazon SQS queue, sends the image for Amazon Rekognition to process, stores the recognition results on a DynamoDB table, and sends a message with those results as JSON to a second Amazon SNS topic used in section three. A user can list the images and the objects present on each image by calling a second API which queries the DynamoDB table.
3rd-party integration – The last Lambda function reads the message from the second Amazon SQS queue. At this point, the Lambda function must deliver that message to a fictitious external e-mail server HTTP API that supports only XML payloads. Because of that, the Lambda function converts the JSON message to XML. Lastly, the function sends the XML object via HTTP POST to the e-mail server.
The following diagram depicts the architecture of our application:
Figure 1. Architecture diagram depicting the application architecture. It contains the service icons with the component explained on the text above.
Prerequisites
Before getting started, you must have the following prerequisites:
We already created the scaffolding for the application that we’ll build, which you can find in this Git repository. This application is represented by a CDK app that describes the infrastructure according to the architecture diagram above. However, the actual business logic of the application isn’t provided; you’ll implement it using CodeWhisperer. This means that we already declared, using AWS CDK components, the API Gateway endpoints, the DynamoDB table, and the topics and queues. If you’re new to AWS CDK, then we encourage you to go through the CDK workshop later on.
Deploying AWS CDK apps into an AWS environment (a combination of an AWS account and region) requires that you provision resources that the AWS CDK needs to perform the deployment. These resources include an Amazon S3 bucket for storing files and IAM roles that grant permissions needed to perform deployments. The process of provisioning these initial resources is called bootstrapping. The required resources are defined in an AWS CloudFormation stack, called the bootstrap stack, which is usually named CDKToolkit. Like any CloudFormation stack, it appears in the CloudFormation console once it has been deployed.
After cloning the repository, let’s deploy the application (still without the business logic, which we’ll implement later on using CodeWhisperer). For this post, we’ll implement the application in Python. Therefore, make sure that you’re under the python directory. Then, use the cdk bootstrap command to bootstrap an AWS environment for AWS CDK. Replace {AWS_ACCOUNT_ID} and {AWS_REGION} with corresponding values first:
cdk bootstrap aws://{AWS_ACCOUNT_ID}/{AWS_REGION}
For more information about bootstrapping, refer to the documentation.
Let’s get started by implementing the first Lambda function, which is responsible for downloading an image from the provided URL and storing that image in an S3 bucket. Open the get_save_image.py file from the python/api/runtime/ directory. This file contains an empty Lambda function handler and the input parameters needed to integrate this Lambda function:
url is the URL of the input image provided by the user,
name is the name of the image provided by the user, and
S3_BUCKET is the S3 bucket name defined by our application infrastructure.
Write a comment in natural language that describes the required functionality, for example:
# Function to get a file from url
To trigger CodeWhisperer, hit the Enter key after entering the comment and wait for a code suggestion. If you want to manually trigger CodeWhisperer, then you can hit Option + C on macOS or Alt + C on Windows. You can browse through multiple suggestions (if available) with the arrow keys. Accept a code suggestion by pressing Tab. Discard a suggestion by pressing Esc or typing a character.
You should get a suggested implementation of a function that downloads a file using a specified URL. The following image shows an example of the code snippet that CodeWhisperer suggests:
Figure 2. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called get_file_from_url with the implementation suggestion to download a file using the requests lib.
Be aware that CodeWhisperer uses artificial intelligence (AI) to provide code recommendations, and that this is non-deterministic. The result you get in your IDE may be different from the one on the image above. If needed, fine-tune the code, as CodeWhisperer generates the core logic, but you might want to customize the details depending on your requirements.
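For reference, a suggestion for this comment typically resembles the following sketch. The function name and use of the requests library match the screenshot above, but the exact body you receive may differ.
import requests

def get_file_from_url(url):
    """Download a file from the given URL and return its raw content."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.content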
Let’s try another action, this time to upload the image to an S3 bucket:
# Function to upload image to S3
As a result, CodeWhisperer generates a code snippet similar to the following one:
Figure 3. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called upload_image with the implementation suggestion to download a file using the requests lib and upload it to S3 using the S3 client.
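Again for reference only, the suggested upload_image function tends to be a thin wrapper around the S3 client; the parameter names here are illustrative.
import boto3

def upload_image(content, name, bucket):
    """Upload the downloaded image bytes to the given S3 bucket under the given key."""
    s3 = boto3.client("s3")
    s3.put_object(Bucket=bucket, Key=name, Body=content)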
Now that you have functions to download an image from the web and upload it to an S3 bucket, you can wire them up in the Lambda handler function by calling each one with the correct inputs.
Image recognition
Now let’s implement the Lambda function responsible for sending the image to Amazon Rekognition for processing, storing the results in a DynamoDB table, and sending a message with those results as JSON to a second Amazon SNS topic. Open the image_recognition.py file from the python/recognition/runtime/ directory. This file contains an empty Lambda function handler and the input parameters needed to integrate this Lambda function:
queue_url is the URL of the Amazon SQS queue to which this Lambda function is subscribed,
table_name is the name of the DynamoDB table, and
topic_arn is the ARN of the Amazon SNS topic to which this Lambda function publishes.
Using CodeWhisperer, implement the business logic of the next Lambda function as you did in the previous section. For example, to detect the labels from an image using Amazon Rekognition, write the following comment:
# Detect labels from image with Rekognition
And as a result, CodeWhisperer should give you a code snippet similar to the one in the following image:
Figure 4. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called detect_labels with the implementation suggestion to use the Rekognition SDK to detect labels on the given image.
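A typical suggestion for this comment wraps the Rekognition detect_labels API; the MaxLabels and MinConfidence values below are illustrative, and your suggestion may differ.
import boto3

def detect_labels(bucket, key):
    """Call Amazon Rekognition to detect labels for an image stored in S3."""
    rekognition = boto3.client("rekognition")
    response = rekognition.detect_labels(
        Image={"S3Object": {"Bucket": bucket, "Name": key}},
        MaxLabels=10,
        MinConfidence=80,
    )
    return [label["Name"] for label in response["Labels"]]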
You can continue generating the other functions that you need to fully implement the business logic of your Lambda function. Here are some examples that you can use:
# Save labels to DynamoDB
# Publish item to SNS
# Delete message from SQS
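The suggestions for these comments vary, but they generally build on the boto3 clients already imported in the file. The following is a hedged sketch of the kind of helpers you can expect; the function names and item attributes are illustrative.
import json
import boto3

dynamodb = boto3.resource("dynamodb")
sns = boto3.client("sns")
sqs = boto3.client("sqs")

def save_labels(table_name, image_name, labels):
    """Save the detected labels for an image to the DynamoDB table."""
    table = dynamodb.Table(table_name)
    table.put_item(Item={"image": image_name, "labels": labels})

def publish_item(topic_arn, item):
    """Publish the recognition results as JSON to the SNS topic."""
    sns.publish(TopicArn=topic_arn, Message=json.dumps(item))

def delete_message(queue_url, receipt_handle):
    """Delete the processed message from the SQS queue."""
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)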
Following the same approach, open the list_images.py file from the python/recognition/runtime/ directory to implement the logic to list all of the labels from the DynamoDB table. As you did previously, type a comment in plain English:
# Function to list all items from a DynamoDB table
Other frequently used code
Interacting with AWS isn’t the only way that you can leverage CodeWhisperer. You can use it to implement repetitive tasks, such as creating unit tests and converting message formats, or to implement algorithms such as sorting, string matching, and parsing. The last Lambda function that we’ll implement as part of this post converts a JSON payload received from Amazon SQS to XML. Then, we’ll POST this XML to an HTTP endpoint.
Open the send_email.py file from the python/integration/runtime/ directory. This file contains an empty Lambda function handler. An event is a JSON-formatted document that contains data for a Lambda function to process. Type a comment with your intent to get the code snippet:
# Transform json to xml
As CodeWhisperer uses the context of your files to generate code, depending on the imports that you have on your file, you’ll get an implementation such as the one in the following image:
Figure 5. Screenshot of the code generated by CodeWhisperer on VS Code. It has a function called json_to_xml with the implementation suggestion to transform JSON payload into XML payload.
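Here is a hedged sketch of what this comment, together with the HTTP POST step described next, can produce using only the standard library and requests. The function name json_to_xml matches the screenshot; the root tag and endpoint URL are placeholders.
import json
import xml.etree.ElementTree as ET
import requests

def json_to_xml(payload, root_tag="message"):
    """Convert a flat JSON object (dict or JSON string) into an XML string."""
    if isinstance(payload, str):
        payload = json.loads(payload)
    root = ET.Element(root_tag)
    for key, value in payload.items():
        child = ET.SubElement(root, key)
        child.text = str(value)
    return ET.tostring(root, encoding="unicode")

def send_xml(xml_string, endpoint_url):
    """POST the XML string to the (fictitious) e-mail server endpoint."""
    return requests.post(
        endpoint_url,
        data=xml_string,
        headers={"Content-Type": "application/xml"},
        timeout=10,
    )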
Repeat the same process with a comment such as # Send XML string with HTTP POST to get the last function implementation. Note that the email server isn’t part of this implementation. You can mock it, or simply ignore this HTTP POST step. Lastly, wire up both functions in the Lambda handler function by calling each function with the correct inputs.
Deploy and test the application
To deploy the application, run the command cdk deploy --all. You should get a confirmation message, and after a few minutes your application will be up and running on your AWS account. As outputs, the APIStack and RekognitionStack will print the API Gateway endpoint URLs, which will look similar to this example:
The first endpoint expects two string parameters: url (the image file URL to download) and name (the target file name that will be stored on the S3 bucket). Use any image URL you like, but remember that you must encode an image URL before passing it as a query string parameter to escape the special characters. Use an online URL encoder of your choice for that. Then, use the curl command to invoke the API Gateway endpoint:
curl -X GET 'https://examp1eid0.execute-api.eu-east-2.amazonaws.com/prod?url={encoded-image-URL}&name={file-name}'
Replace {encoded-image-URL} and {file-name} with the corresponding values. Also, make sure that you use the correct API endpoint that you’ve noted from the AWS CDK deploy command output as mentioned above.
It will take a few seconds for the processing to happen in the background. Once it’s ready, see what has been stored in the DynamoDB table by invoking the List Images API (make sure that you use the correct URL from the output of your deployed AWS CDK stack):
curl -X GET 'https://examp1eid7.execute-api.eu-east-2.amazonaws.com/prod'
After you’re done, to avoid unexpected charges to your account, make sure that you clean up your AWS CDK stacks. Use the cdk destroy command to delete the stacks.
Conclusion
In this post, we’ve seen how to get a significant productivity boost with the help of ML. With that, as a developer, you can stay focused on your IDE and reduce the time that you spend searching online for code snippets that are relevant for your use case. Writing comments in natural language, you get context-based snippets to implement full-fledged applications. In addition, CodeWhisperer comes with a mechanism called reference tracker, which detects whether a code recommendation might be similar to particular CodeWhisperer training data. The reference tracker lets you easily find and review that reference code and see how it’s used in the context of another project. Lastly, CodeWhisperer provides the ability to run scans on your code (generated by CodeWhisperer as well as written by you) to detect security vulnerabilities.
During the preview period, CodeWhisperer is available to all developers across the world for free. Get started with the free preview on JetBrains, VS Code or AWS Cloud9.
Launched in 2013, Hour of Code is an initiative to introduce young people to computer science using fun one-hour tutorials. To date, over 100 million young people have completed an hour of code with it.
Although the Hour of Code website is accessible all year round, every December for Computer Science Education Week people worldwide run their own Hour of Code events. Each year we love seeing many Code Clubs, CoderDojos, and young people at home across the community complete their Hour of Code. You can register your 2022 Hour of Code event now to run between 5 and 11 December.
To support your event, we have pulled together a bumper set of our free coding projects, which can each be completed in just one hour. You will find these activities on the Hour of Code website.
There’s something for all ages and levels of experience, so put an hour aside and help young people make something fabulous with code:
Ages 7–11
Beginner
For younger creators new to coding, a Scratch project is a great place to start.
With our Space talk project, they can create a space scene with characters that ‘emote’ to share their thoughts or feelings using sounds, colours, and actions. Creators program the character emotes using Scratch blocks to control graphic effects, costume animation, and sound effects.
Alternatively, our Stress ball project lets them code an onscreen stress ball that reacts to user clicks. Creators use the Paint and Sound editors in Scratch to personalise a clickable stress ball, and they add Scratch blocks to control graphic effects, costume animation, and sound effects.
We love this fun stress ball example sent to us recently by young creator April from the United States:
Another great option is to use Code Club World, which is a free tool to help children who are new to coding.
Comfortable
For 7- to 11-year-olds who are more comfortable with block-based coding, our project Broadcasting spells is ideal to choose. With the project, they connect Scratch blocks to code a wand that casts spells turning sprites into toads, and growing and shrinking them. Creators use broadcast blocks to transform multiple sprites at once, and they create sound effects with the Sound editor in Scratch.
Ages 11–14
Beginner
We have three exciting projects for trying text-based coding during Hour of Code in this category. The first, Anime expressions, is one of our brand-new ‘Introduction to web development’ projects. With this project, young people create a responsive webpage with text and images for an anime drawing tutorial. They write HTML to structure the webpage and CSS styles to apply layout, colour palettes, and fonts.
For a great introduction to coding with Python, we have the project Hello world from our ‘Introduction to Python’ path. With this project, creators write Python text-based code to create an interactive program that shows text and emojis based on user input. They learn about variables as they use them to store text and numbers, and they learn about writing functions to organise code and do calculations, retrieve the current date and time, and make a customisable dice.
LED firefly is a fantastic physical making project in which young people use a Raspberry Pi Pico microcontroller and basic electronic components to create a blinking LED firefly. They program the LED’s light patterns with MicroPython code and activate it via a switch they make themselves using jumper wires.
Comfortable
For 11- to 14-year-olds who are already comfortable with HTML, the Flip treat webcards project is a fun option. With this, they create a webpage showing a set of cards that flip when a visitor’s mouse pointer hovers over them. Creators use CSS styling and animations to add interactivity, then they customise the cards with fancy fonts and colour gradients.
Young people who have already done some Python coding can try out our project Target practice. With this project they create a game, using the p5 graphics library to draw a colourful target, and writing code so that the player scores points by hitting the target’s rings with arrows. While they create the project, they learn about RGB colours, shape positioning with x and y coordinates, and decisions using if, else-if, and else code statements.
Ages 14+
Beginner
Our project Charting champions is a great introduction to data visualisation and analysis for coders aged 15 and older. With the project, they will discover the power of the Python programming language as they store Olympic medal data in lists and use the pygal library to create an interactive chart.
Comfortable
Teenage coders who feel comfortable with Python programming can use our project Solar system simulator to code an animated, interactive solar system model using the Python p5 graphics library. Their model will be interactive, as they’ll use dictionaries to store planet facts that display when a user clicks on an orbiting planet.
Coding for Hour of Code and beyond
Now is the time to register your Hour of Code event, then decide which project you’d like to support young people to create. You can download certificates for each of the creators from the Hour of Code certificates page.
And make sure to check out our project paths so you know what projects you can help the young people you support to code beyond this one hour of code.
We don’t just create activities so that other people can experience coding and digital making — we also get involved ourselves!
Recently, our teams who support the Code Club and CoderDojo networks got together to make LED fireflies. We are excited to get coding again as part of Hour of Code and Computer Science Education Week.
VPC Flow Logs is an AWS feature that captures information about the network traffic flows going to and from network interfaces in Amazon Virtual Private Cloud (Amazon VPC). Visibility to the network traffic flows of your application can help you troubleshoot connectivity issues, architect your application and network for improved performance, and improve security of your application.
Each VPC flow log record contains the source and destination IP address fields for the traffic flows. The records also contain the Amazon Elastic Compute Cloud (Amazon EC2) instance ID that generated the traffic flow, which makes it easier to identify the EC2 instance and its associated VPC, subnet, and Availability Zone from where the traffic originated. However, when you have a large number of EC2 instances running in your environment, it may not be obvious where the traffic is coming from or going to simply based on the EC2 instance IDs or IP addresses contained in the VPC flow log records.
By enriching flow log records with additional metadata such as resource tags associated with the source and destination resources, you can more easily understand and analyze traffic patterns in your environment. For example, customers often tag their resources with resource names and project names. By enriching flow log records with resource tags, you can easily query and view flow log records based on an EC2 instance name, or identify all traffic for a certain project.
In addition, you can add resource context and metadata about the destination resource such as the destination EC2 instance ID and its associated VPC, subnet, and Availability Zone based on the destination IP in the flow logs. This way, you can easily query your flow logs to identify traffic crossing Availability Zones or VPCs.
In this solution, you enable VPC flow logs and stream them to Kinesis Data Firehose. This solution enriches log records using an AWS Lambda function on Kinesis Data Firehose in a completely serverless manner. The Lambda function fetches resource tags for the instance ID. It also looks up the destination resource from the destination IP using the Amazon EC2 API and IPAM, and adds the associated VPC network context and metadata for the destination resource. It then stores the enriched log records in an Amazon Simple Storage Service (Amazon S3) bucket. After you have enriched your flow logs, you can query, view, and analyze them in a wide variety of services, such as AWS Glue, Athena, QuickSight, Amazon OpenSearch Service, as well as solutions from the AWS Partner Network such as Splunk and Datadog.
The following diagram illustrates the solution architecture.
The workflow contains the following steps:
Amazon VPC sends the VPC flow logs to the Kinesis Data Firehose delivery stream.
The delivery stream uses a Lambda function to fetch resource tags for the instance IDs in the flow log record and add them to the record. You can also fetch tags for the source and destination IP addresses and enrich the flow log record.
When the Lambda function finishes processing all the records from the Kinesis Data Firehose buffer with enriched information like resource tags, Kinesis Data Firehose stores the result file in the destination S3 bucket. Any failed records that Kinesis Data Firehose couldn’t process are stored in the destination S3 bucket under the prefix you specify during delivery stream setup.
All the logs for the delivery stream and Lambda function are stored in Amazon CloudWatch log groups.
Prerequisites
As a prerequisite, you need to create the target S3 bucket before creating the Kinesis Data Firehose delivery stream.
You can download the Lambda function code used in this solution from the GitHub repo. The example in this post assumes you are enabling all the available fields in the VPC flow logs. You can use it as is or customize it to your needs. For example, if you intend to use the default fields when enabling the VPC flow logs, you need to modify the Lambda function with the respective fields. Creating this function creates an AWS Identity and Access Management (IAM) Lambda execution role.
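The actual enrichment logic is in aws-vpc-flowlogs-enricher.py in the repo. The following is only a minimal sketch of the Kinesis Data Firehose transformation contract that such a function follows, assuming JSON-formatted flow log records; enrich_record stands in for the tag and destination-metadata lookups done by the real code.
import base64
import json

def enrich_record(record):
    """Placeholder for the tag and destination-metadata lookups done by the real function."""
    # e.g. call ec2.describe_tags(...) here for the instance ID and destination IP,
    # then add fields such as resource tags, destination VPC, subnet, and AZ
    return record

def lambda_handler(event, context):
    output = []
    for rec in event["records"]:
        payload = json.loads(base64.b64decode(rec["data"]))
        enriched = enrich_record(payload)
        output.append({
            "recordId": rec["recordId"],
            "result": "Ok",
            "data": base64.b64encode(
                (json.dumps(enriched) + "\n").encode("utf-8")
            ).decode("utf-8"),
        })
    return {"records": output}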
To create your Lambda function, complete the following steps:
On the Lambda console, choose Functions in the navigation pane.
Choose Create function.
Select Author from scratch.
For Function name, enter a name.
For Runtime, choose Python 3.8.
For Architecture, select x86_64.
For Execution role, select Create a new role with basic Lambda permissions.
Choose Create function.
You can then see the code source page, as shown in the following screenshot, with the default code in the lambda_function.py file.
Delete the default code and enter the code from the GitHub Lambda function aws-vpc-flowlogs-enricher.py.
Choose Deploy.
To enrich the flow logs with additional tag information, you need to create an additional IAM policy to give Lambda permission to describe tags on resources from the VPC flow logs.
On the IAM console, choose Policies in the navigation pane.
Choose Create policy.
On the JSON tab, enter the JSON code as shown in the following screenshot.
This policy gives the Lambda function permission to retrieve tags for the source and destination IP and retrieve the VPC ID, subnet ID, and other relevant metadata for the destination IP from your VPC flow log record.
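The screenshot isn’t reproduced here; a hedged sketch of what such a policy can look like follows. The exact set of Describe actions depends on the metadata you want to look up.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:DescribeTags",
                "ec2:DescribeInstances",
                "ec2:DescribeNetworkInterfaces",
                "ec2:DescribeSubnets",
                "ec2:DescribeVpcs"
            ],
            "Resource": "*"
        }
    ]
}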
Choose Next: Tags.
Add any tags and choose Next: Review.
For Name, enter vpcfl-describe-tag-policy.
For Description, enter a description.
Choose Create policy.
Navigate to the previously created Lambda function and choose Permissions in the navigation pane.
Choose the role that was created by the Lambda function.
A page opens in a new tab.
On the Add permissions menu, choose Attach policies.
Search for the vpcfl-describe-tag-policy you just created.
Select the vpcfl-describe-tag-policy and choose Attach policies.
Create the Kinesis Data Firehose delivery stream
To create your delivery stream, complete the following steps:
On the Kinesis Data Firehose console, choose Create delivery stream.
For Source, choose Direct PUT.
For Destination, choose Amazon S3.
After you choose Amazon S3 for Destination, the Transform and convert records section appears.
For Data transformation, select Enable.
Browse and choose the Lambda function you created earlier.
This impacts how long (in seconds) the delivery stream will buffer the incoming records from the VPC.
Optionally, you can enable Record format conversion.
If you want to query from Athena, it’s recommended to convert it to Apache Parquet or ORC and compress the files with available compression algorithms, such as gzip and snappy. For more performance tips, refer to Top 10 Performance Tuning Tips for Amazon Athena. In this post, record format conversion is disabled.
For S3 bucket, choose Browse and choose the S3 bucket you created as a prerequisite to store the flow logs.
Optionally, you can specify the S3 bucket prefix. The following expression creates a Hive-style partition for year, month, and day:
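(The original expression isn’t reproduced here; a typical Hive-style prefix using the Kinesis Data Firehose timestamp namespace looks like the following, where the leading AWSLogs/ folder is just an example.)
AWSLogs/year=!{timestamp:yyyy}/month=!{timestamp:MM}/day=!{timestamp:dd}/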
Dynamic partitioning enables you to create targeted datasets by partitioning streaming S3 data based on partitioning keys. The right partitioning can help you to save costs related to the amount of data that is scanned by analytics services like Athena. For more information, see Kinesis Data Firehose now supports dynamic partitioning to Amazon S3.
Note that you can enable dynamic partitioning only when you create a new delivery stream. You can’t enable dynamic partitioning for an existing delivery stream.
Expand Buffer hints, compression and encryption.
Set the buffer size to 128 MiB and the buffer interval to 900 seconds for best performance.
For Compression for data records, select GZIP.
Create a VPC flow log subscription
Now you create a VPC flow log subscription for the Kinesis Data Firehose delivery stream you created.
Navigate to AWS CloudShell or Terminal/PowerShell for a Mac or Windows computer and run the following AWS CLI command to enable the subscription. Provide your VPC ID for the parameter --resource-ids and delivery stream ARN for the parameter --log-destination.
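The exact command is available in the GitHub repo; a hedged sketch of its shape is shown below, with placeholder account values. Because this solution enables all the available flow log fields, the real command also passes a custom --log-format listing those fields, which is omitted here.
aws ec2 create-flow-logs \
    --resource-type VPC \
    --resource-ids vpc-0123456789abcdef0 \
    --traffic-type ALL \
    --log-destination-type kinesis-data-firehose \
    --log-destination arn:aws:firehose:us-east-1:123456789012:deliverystream/vpc-flowlogs-enriched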
If you’re running CloudShell for the first time, it will take a few seconds to prepare the environment to run.
After you successfully enable the subscription for your VPC flow logs, it takes a few minutes depending on the intervals mentioned in the setup to create the log record files in the destination S3 folder.
To view those files, navigate to the Amazon S3 console and choose the bucket storing the flow logs. You should see the compressed interval logs, as shown in the following screenshot.
You can download any file from the destination S3 bucket on your computer. Then extract the gzip file and view it in your favorite text editor.
The following is a sample enriched flow log record, with the new fields in bold providing added context and metadata of the source and destination IP addresses:
Now that you have enriched the VPC flow logs and stored them in Amazon S3, the next step is to create the Athena database and table to query the data. You first create an AWS Glue crawler to infer the schema from the log files in Amazon S3.
On the AWS Glue console, choose Crawlers in the navigation pane.
Choose Create crawler.
For Name, enter a name for the crawler.
For Description, enter an optional description.
Choose Next.
Choose Add a data source.
For Data source, choose S3.
For S3 path, provide the path of the flow logs bucket.
Select Crawl all sub-folders.
Choose Add an S3 data source.
Choose Next.
Choose Create new IAM role.
Enter a role name.
Choose Next.
Choose Add database.
For Name, enter a database name.
For Description, enter an optional description.
Choose Create database.
On the previous tab for the AWS Glue crawler setup, for Target database, choose the newly created database.
Choose Next.
Review the configuration and choose Create crawler.
On the Crawlers page, select the crawler you created and choose Run.
You can rerun this crawler when new tags are added to your AWS resources, so that they’re available for you to query from the Athena database.
Run Athena queries
Now you’re ready to query the enriched VPC flow logs from Athena.
On the Athena console, open the query editor.
For Database, choose the database you created.
Enter the query as shown in the following screenshot and choose Run.
The following code shows some of the sample queries you can run:
SELECT * FROM awslogs WHERE "dst-az-id"='us-east-1a'
SELECT * FROM awslogs WHERE "src-tag-project"='Log Analytics' OR "dst-tag-team"='Engineering'
SELECT "srcaddr", "srcport", "dstaddr", "dstport", "region", "az-id", "dst-az-id", "flow-direction" FROM awslogs WHERE "az-id"='use1-az2' AND "dst-az-id"='us-east-1a'
The following screenshot shows an example query result of the source Availability Zone to the destination Availability Zone traffic.
This post provided a complete serverless solution architecture for enriching VPC flow log records with additional information such as resource tags, using a Kinesis Data Firehose delivery stream and a Lambda function that processes the logs, enriches them with metadata, and stores them in a target S3 bucket. This solution can help you query, analyze, and visualize VPC flow logs with relevant application metadata, because resource tags have been assigned to the resources that appear in the logs. This meaningful information associated with each log record, wherever tags are available, makes it easy to associate log information with your application.
We encourage you to follow the steps provided in this post to create a delivery stream, integrate with your VPC flow logs, and create a Lambda function to enrich the flow log records with additional metadata to more easily understand and analyze traffic patterns in your environment.
About the Authors
Chaitanya Shah is a Sr. Technical Account Manager with AWS, based out of New York. He has over 22 years of experience working with enterprise customers. He loves to code and actively contributes to AWS solutions labs to help customers solve complex problems. He provides guidance to AWS customers on best practices for their AWS Cloud migrations. He also specializes in AWS data transfer and in the data and analytics domain.
Vaibhav Katkade is a Senior Product Manager in the Amazon VPC team. He is interested in areas of network security and cloud networking operations. Outside of work, he enjoys cooking and the outdoors.
If you’re new to teaching programming or looking to build or refresh your programming knowledge, we have a free resource that is perfect for you. Our ‘Learn to program in Python’ online course pathway is for educators who want to develop their understanding of the text-based language Python. Each course is packed with information and activities to help you apply what you learn in your classroom teaching.
Why learn to program in Python?
Writing a program in Python is very similar to writing in English, which makes starting to program much easier. Python is also a general-purpose programming language, so once you’ve learned the basics, you can use Python for lots of different programming activities.
That’s why Python is a perfect choice for learning to program, and why many of our educational resources involve Python. Our seven online Python courses cover aspects from taking your first steps into programming, to writing a program to control an electronic circuit, to learning about object-oriented programming.
With time and practice, you will be able to use Python programming to create unique solutions to problems, build helpful tools, and make things that are important to you.
How does the Python course pathway work?
The courses in the pathway have been written by our educators and include advice and activities to help you teach programming in your classroom. You can reuse the course activities to explain programming concepts to your learners and get them to write programs themselves. Because you will have first-hand experience of the activities, you’ll be able to anticipate your learners’ difficulties and adapt your lessons to suit them.
All the courses are designed to take three or four weeks to complete, based on spending two hours a week participating. You can have free time-limited access to each course for the length of time it’s designed to take to complete. For example, if it’s a four-week course, like ‘Programming 101’, you can sign up for free to get four weeks of access.
The seven courses in the Python path can be completed in any order you like, and you can choose the courses that match your interests and needs.
Each course involves activities that help you create a programming project using the concepts that you’re learning about. These activities are designed to be a fun and interactive way to reinforce what you’ve learned and can also be used with your learners in the classroom.
Course spotlight: Programming 101
If programming is completely new to you, our ‘Programming 101’ course is the best place to start. In ‘Programming 101’, we use this definition of programming to start with the idea that programming is about you telling a computer what to do:
“Programming is how you get computers to solve problems.”
We see programming as a chance to think creatively about a problem and about all the different ways it could be solved. While you might be unfamiliar with terms like programming, algorithms, or selection, the ‘Programming 101’ course demonstrates how they touch on things that many of us know from other areas of our lives.
On the course, you will:
Learn about basic programming concepts such as sequencing and repetition
Start to write your own programs
Discover how to interpret error messages to find and fix mistakes in your programs
What will you make in the courses?
Through building an understanding of programming, you will see how you can write your own programs to make games, quizzes, physical computing projects, and more. Here’s a look at some of the things you could make in three of the seven courses:
Programming 101: Write your first program in Python to make a personal assistant bot. You’ll discover how to make the output of your program respond to the user’s input.
Programming with GUIs: Build a game where players compare two sets of emoji to find the emoji that matches. To make this game, you’ll use what you learn in the course to design the layout of a graphic user interface (GUI) and make sure only one emoji appears twice.
Object-oriented Programming: Create a text-based adventure game with a character on a quest through different rooms! You’ll discover how to write a program that reacts to user input, and how to write your own code to create more challenges within the game based on your ideas.
If you want to help your learners develop their understanding of programming in Python, you’ll be interested in these free resources we’ve created for young people:
Introduction to Python: Our guided project path for learners who are new to text-based programming. We have created these projects with young people around the age of 9 to 13 in mind. Each project takes one hour to complete, and learners can make their own fun programs while learning about Python.
More Python: Our guided project path for learners who want to move beyond the ‘Intro to Python’ path to write programs that contain charts, artwork, and more. We’ve written these projects for young people around the age of 10 to 13.
Isaac Computer Science: This learning platform we’ve created for GCSE and A level students (age 14 to 18) uses Python and other text-based languages to teach the programming concepts within England’s computer science curriculum.
You may have heard a lot about coding and how important it is for children to start learning about coding as early as possible. Computers have become part of our lives, and we’re not just talking about the laptop or desktop computer you might have in your home or on your desk at work. Your phone, your microwave, and your car are all controlled by computers, and those computers need instructions to tell them what to do. Coding, or computer programming, involves writing those instructions.
If children discover a love for coding, they will have an avenue to make the things they want to make; to write programs and build projects that they find useful, fun, or interesting. So how do you give your child the opportunity to learn about coding? We’ve listed some free resources and suggested activities below.
Scratch Junior
If you have a young child under about 7 years of age, then a great place to begin is with ScratchJr. This is an app available on Android and iOS phones and tablets that lets children learn the basics of programming without having to worry about making mistakes.
Code Club World
The Raspberry Pi Foundation has developed a series of activities for young learners, on their journey to developing their computing skills. Code Club World provides a platform for children to play with code to design their own avatar, make it dance, and play music. Plus they can share their creations with other learners.
“You could have a go too and discover Scratch together. The platform is designed for complete beginners and it is great fun to play with.”
Carol Thornhill, Engineering Science MA, Mathematics teacher
Scratch
For 7- to 11-year-old children, Scratch is a good way to begin their journey in coding, or to progress from ScratchJr. Like ScratchJr, Scratch is a block-based language, allowing children to assemble code to produce games, animations, stories, or even use some of the add-ons to interact with electronic devices and explore physical computing.
The Raspberry Pi Foundation has hundreds of Scratch projects that your child can try out, but the best place to begin is with our Introduction to Scratch path, which will provide your child with the basic skills they need, and then encourage them to build projects that are relevant to them, culminating in their creation of their own interactive ebook.
Your child may never tire of Scratch, and that is absolutely fine — it is a fully functioning programming language that is surprisingly powerful, when you learn to understand everything it can do. Another advantage of Scratch is that it provides easy access to graphics, sounds, and interactivity that can be trickier to achieve in other programming languages.
Python
If you’re looking for more traditional programming languages for your child to progress on to, especially when they reach 12 years of age or beyond, then we like to direct our young learners to the Python programming language and to the languages that the World Wide Web is built on, particularly HTML, CSS, and JavaScript.
Our Python resources cover the basics of using the language, and then progress from there. Python is one of the most widely used languages when it comes to the fields of artificial intelligence and data science, and we have resources to support your child in learning about these fascinating aspects of technology. Our projects can even introduce your child to the world of electronics and physical computing with activities that use the inexpensive Raspberry Pi Pico, and a handful of electronic components, enabling your kids to create a wide variety of art installations and useful gadgets.
“Trying Python doesn’t mean you can’t go back to Scratch or switch between Scratch and Python for different purposes. I still use Scratch for some projects myself!”
Tracy Gardner, Computer Science PhD, former IBM Software Architect and currently a project writer at the Raspberry Pi Foundation
Coding projects
On our coding tutorials website we have many different projects to help your child learn coding and digital making. These range from beginner resources like the Introduction to Scratch path to more advanced activities such as the Introduction to Unity path, where children can learn how to make 3D worlds and games.
“Our new project paths can be tackled by young creators on their own, without adult intervention. Paths are structured so that they build skills and confidence in the early stages, and then provide more open-ended tasks and inspirational ideas that creators can adapt or work from.”
Rik Cross, BSc (Hons), PGCE, former teacher and Director of Informal Learning at the Raspberry Pi Foundation
Web development
The Web is integral to many of our lives, and we believe that it is important for children to have an understanding of the technology that drives it. That is why we have an Introduction to the Web path that allows children to develop their own web pages, focusing on the kinds of webpages that they want to build, be that sending a greeting card, telling a story, or creating a showcase of their projects.
Coding clubs
Coding clubs are a great place for children to have fun and become more confident with coding, where they can learn through making and share their creations with each other. The Raspberry Pi Foundation operates the world’s largest network of coding clubs — CoderDojo and Code Club.
“I have a new group of creators at my Code Club every year and my favourite part is when they realise they really can let their imagination run wild. You want to make an animation where a talking pineapple chases a snowman — absolutely. You want to make a piece of scalable art out of 1000 pixelated cartoon musical instruments — go right ahead. If you can code it, you can make it.”
Liz Smart, Code Club and CoderDojo mentor, former Solutions Architect and project writer for the Raspberry Pi Foundation
Coding challenges
Once your child has learnt some of the basics, they may enjoy entering a coding challenge! The European Astro Pi Challenge programme allows young people to write code and actually have it run on the International Space Station, and Coolest Projects gives children a chance to showcase their projects from across the globe.
Free resources
No matter what technology your child wants to engage with, there is a wealth of free resources and materials available from organisations such as the Raspberry Pi Foundation and Scratch Foundation, that prepare young people for 21st century life. Whether they want to become professional software engineers, tinker with some electronics, or just have a play around … encourage them to explore some coding projects, and see what they can learn, make, and do!
Author: Marc Scott, BSc (Hons) is a former Science, Computer Science, and Engineering teacher and the Content Lead for Projects at the Raspberry Pi Foundation.