[$] Progress in wrangling the Python C API

Post Syndicated from jake original https://lwn.net/Articles/950457/

There has been a lot of action for the Python C API in the last month or
so—much of it organizational in nature. As predicted in our late September article on using the “limited”
C API in the standard library, the core developer sprint in October was the
scene of some discussions about the API and the plans for it. Out
of those discussions have come two PEPs, one of which describes the API,
its purposes, strengths, and weaknesses, while the other would establish a C
API working group to coordinate and oversee the development and maintenance
of it.

AWS KMS is now FIPS 140-2 Security Level 3. What does this mean for you?

Post Syndicated from Rushir Patel original https://aws.amazon.com/blogs/security/aws-kms-now-fips-140-2-level-3-what-does-this-mean-for-you/

AWS Key Management Service (AWS KMS) recently announced that its hardware security modules (HSMs) were given Federal Information Processing Standards (FIPS) 140-2 Security Level 3 certification from the U.S. National Institute of Standards and Technology (NIST). For organizations that rely on AWS cryptographic services, this higher security level validation has several benefits, including simpler setup and operation. In this post, we will share more details about the recent change in FIPS validation status for AWS KMS and explain the benefits to customers using AWS cryptographic services as a result of this change.

Background on NIST FIPS 140

The FIPS 140 framework provides guidelines and requirements for cryptographic modules that protect sensitive information. FIPS 140 is the industry standard in the US and Canada and is recognized around the world as providing authoritative certification and validation for the way that cryptographic modules are designed, implemented, and tested against NIST cryptographic security guidelines.

Organizations follow FIPS 140 to help ensure that their cryptographic security is aligned with government standards. FIPS 140 validation is also required in certain fields such as manufacturing, healthcare, and finance and is included in several industry and regulatory compliance frameworks, such as the Payment Card Industry Data Security Standard (PCI DSS), the Federal Risk and Authorization Management Program (FedRAMP), and the Health Information Trust Alliance (HITRUST) framework. FIPS 140 validation is recognized in many jurisdictions around the world, so organizations that operate globally can use FIPS 140 certification internationally.

For more information on FIPS Security Levels and requirements, see FIPS Pub 140-2: Security Requirements for Cryptographic Modules.

What FIPS 140-2 Security Level 3 means for AWS KMS and you

Until recently, AWS KMS had been validated at Security Level 2 overall and at Security Level 3 in the following four sub-categories:

  • Cryptographic module specification
  • Roles, services, and authentication
  • Physical security
  • Design assurance

The latest certification from NIST means that AWS KMS is now validated at Security Level 3 overall in each sub-category. As a result, AWS assumes more of the shared responsibility model, which will benefit customers for certain use cases. Security Level 3 certification can assist organizations seeking compliance with several industry and regulatory standards. Even though FIPS 140 validation is not expressly required in a number of regulatory regimes, maintaining stronger, easier-to-use encryption can be a powerful tool for complying with FedRAMP, U.S. Department of Defense (DOD) Approved Product List (APL), HIPAA, PCI, the European Union’s General Data Protection Regulation (GDPR), and the ISO 27001 standard for security management best practices and comprehensive security controls.

Customers who previously needed to meet compliance requirements for FIPS 140-2 Level 3 on AWS were required to use AWS CloudHSM, a single-tenant HSM solution that provides dedicated HSMs instead of managed service HSMs. Now, customers who were using CloudHSM to help meet their compliance obligations for Level 3 validation can use AWS KMS by itself for key generation and usage. Compared to CloudHSM, AWS KMS is typically lower cost and easier to set up and operate as a managed service, and using AWS KMS shifts the responsibility for creating and controlling encryption keys and operating HSMs from the customer to AWS. This allows you to focus resources on your core business instead of on undifferentiated HSM infrastructure management tasks.

AWS KMS uses FIPS 140-2 Level 3 validated HSMs to help protect your keys when you request the service to create keys on your behalf or when you import them. The HSMs in AWS KMS are designed so that no one, not even AWS employees, can retrieve your plaintext keys. Your plaintext keys are never written to disk and are only used in volatile memory of the HSMs while performing your requested cryptographic operation.

The FIPS 140-2 Level 3 certified HSMs in AWS KMS are deployed in all AWS Regions, including the AWS GovCloud (US) Regions. The China (Beijing) and China (Ningxia) Regions do not support the FIPS 140-2 Cryptographic Module Validation Program. AWS KMS uses Office of the State Commercial Cryptography Administration (OSCCA) certified HSMs to protect KMS keys in China Regions. The certificate for the AWS KMS FIPS 140-2 Security Level 3 validation is available on the NIST Cryptographic Module Validation Program website.

As with many industry and regulatory frameworks, FIPS 140 is evolving. NIST approved and published a new updated version of the 140 standard, FIPS 140-3, which supersedes FIPS 140-2. The U.S. government has begun transitioning to the FIPS 140-3 cryptography standard, with NIST announcing that they will retire all FIPS 140-2 certificates on September 22, 2026. NIST recently validated AWS-LC under FIPS 140-3 and is currently in the process of evaluating AWS KMS and certain instance types of AWS CloudHSM under the FIPS 140-3 standard. To check the status of these evaluations, see the NIST Modules In Process List.

For more information on FIPS 140-3, see FIPS Pub 140-3: Security Requirements for Cryptographic Modules.

Legal Disclaimer

This document is provided for the purposes of information only; it is not legal advice, and should not be relied on as legal advice. Customers are responsible for making their own independent assessment of the information in this document. This document: (a) is for informational purposes only, (b) represents current AWS product offerings and practices, which are subject to change without notice, and (c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided “as is” without warranties, representations, or conditions of any kind, whether express or implied. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers.

AWS encourages its customers to obtain appropriate advice on their implementation of privacy and data protection environments, and more generally, applicable laws and other obligations relevant to their business.

If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.

Want more AWS Security news? Follow us on Twitter.

Rushir Patel

Rushir is a Senior Security Specialist at AWS, focused on data protection and cryptography services. His goal is to make complex topics simple for customers and help them adopt better security practices. Before joining AWS, he worked in security product management at IBM and Bank of America.

Rohit Panjala

Rohit is a Worldwide Security GTM Specialist at AWS, focused on data protection and cryptography services. He is responsible for developing and implementing go-to-market (GTM) strategies and sales plays and driving customer and partner engagements for AWS data protection services on a global scale. Before joining AWS, Rohit worked in security product management and electrical engineering roles.

Anything as a Service: All the “as a Service” Acronyms You Didn’t Know You Needed

Post Syndicated from Molly Clancy original https://www.backblaze.com/blog/anything-as-a-service-all-the-as-a-service-acronyms-you-didnt-know-you-needed/

A decorative image showing different "as a service" acronyms.

Have you ever felt like you need a dictionary just to understand what tech-savvy folks are talking about? Well, you’re in luck, because we’re about to decode some of the most common jargon of the digital age, one acronym at a time. Welcome to the world of “as a Service” acronyms, where we take the humble alphabet and turn it into a digital buffet. 

So, whether you’re SaaS-savvy or PaaS-puzzled, or just someone desperately searching for a little HaaS (Humor as a Service …yeah, we made that one up), you’ve come to the right place. Let’s take a big slurp from this alphabet soup of tech terms.

The One That Started It All: SaaS

SaaS stands for software as a service, and it’s the founding member of the “as a service” nomenclature. (Though, very confusingly, there’s also Sales as a Service—it’s just not shortened to SaaS. Usually.)

Imagine your software as a pizza delivery service. You don’t need to buy all the ingredients, knead the dough, and bake it yourself. Instead, you simply order a slice, and it magically appears on your table (a.k.a. screen). SaaS products are like that, but instead of pizza they serve up everything from messaging to video conferencing to email marketing to …well, really you name it. Which brings us to…

The Kind of Ironic One: XaaS

XaaS stands for, variously, “everything” or “anything” as a service. No one is really sure about the term’s provenance, but it’s a fair guess to say it came into existence when, well, everything started to become a service, probably sometime around the mid-2010s. The thinking is: if it exists in the digital realm, you can probably get it “as a service.” 

The Hardware Related Ones: HaaS, IaaS, and PaaS

HaaS (Hardware as a Service): Instead of purchasing hardware yourself, like computers, servers, networking equipment, and other physical infrastructure components, with HaaS, you can lease or rent the equipment for a given period. It would be like renting a pizza kitchen to make your specialty pies specifically for your sister’s wedding or your grandma’s birthday.

IaaS (Infrastructure as a Service): Infrastructure as a service is kind of like hardware as a service, but it comes with some additional goodies thrown in. Instead of renting just the kitchen, you rent the whole restaurant, chairs, tables, and servers (no pun intended) included. IaaS delivers virtualized computing resources, like virtual machines, storage (that’s us!), and networking, over the internet.

PaaS (Platform as a Service): Think of PaaS as a step even further than IaaS—you’re not just renting a pizza restaurant, you’re renting a test kitchen where you can develop your award-winning pie. PaaS provides developers the ability to build, manage, and deploy applications with services like development frameworks, databases, and infrastructure management. It’s the ultimate DIY platform for tech enthusiasts.

The Bad One: RaaS

RaaS stands for Ransomware as a Service, and this is one “as a service” variant you don’t want to mess with. Basically, cybercriminals can purchase ransomware just as easily as you would purchase any app on the app store (it’s probably more complicated than that, but you get the general gist). This makes it easy for even the least savvy cybercriminal to get into the ransomware game. Not great. 

The Ones That Help With the Last One: BaaS and DRaaS

BaaS (Backup as a Service): Backup as a Service is a cloud-based data protection solution that allows individuals and organizations to back up their data to a remote cloud. (Hey! That’s us too!) Instead of managing on-premises backup infrastructure, users can securely store their data off-site, often on highly redundant and geographically distributed servers.

DRaaS (Disaster Recovery as a Service): DRaaS stands for disaster recovery as a service, and it’s the antidote to RaaS. Of course, you need good backups to begin with, but adding DRaaS allows businesses to meet specific recovery time objectives (RTOs, FYI) so they can get back up and running if they’re attacked by ransomware or a natural disaster strikes their primary storage location. DRaaS solutions used to be made almost exclusively with the large enterprise in mind, but today, it’s possible to architect a DRaaS solution for your business affordably and easily.

The Analytical One: DaaS

DaaS stands for data as a service, and it’s your data’s personal chauffeur. It fetches the information you need and serves it up on a silver platter. DaaS offers data on-demand, making structured data accessible to users over the internet. It simplifies data sharing and access, often in real-time, without the need for complex data management.

The Development-Focused Ones: CaaS, BaaS (again), and FaaS

CaaS (Containers as a Service): CaaS simplifies the deployment, scaling, and orchestration of containerized applications. It’s the tech version of a literal container ship. The individual containers “ship” individual pieces of software, and a CaaS tool helps carry all of those individual containers. Check out container management software Docker’s logo for a visualization:

It looks more like a whale carrying containers, which is far more adorable, in our opinion.

BaaS (Backend as a Service): It wouldn’t be the first time an acronym has two meanings. BaaS, in this context, provides a backend infrastructure for mobile and web app developers, offering services like databases, user authentication, and APIs. Imagine your own team of digital butlers tending to the back end of your apps. They handle all the behind-the-scenes stuff, so you can focus on making your app shine. 

FaaS (Function as a Service): FaaS is a serverless computing model where developers focus on writing and deploying individual functions or code snippets. These functions run in response to specific events, promoting scalability and efficiency in application development. It’s like having a team of tiny, code-savvy robots doing your bidding.

Go Forth and Abbreviate

Now that you’ve sampled all of the flavors the vast “as a service” world has to offer, we hope you’ve gained a clearer understanding of these sometimes confounding terms. So whether you’re a business professional navigating the cloud or just curious about the tech world, you can wield these acronyms with confidence. 

Did we miss any? I’m sure. Let us know in the comments.

The post Anything as a Service: All the “as a Service” Acronyms You Didn’t Know You Needed appeared first on Backblaze Blog | Cloud Storage & Cloud Backup.

Sponsorship for the Openwall lists

Post Syndicated from corbet original https://lwn.net/Articles/950538/

Alexander “Solar Designer” Peslyak, the longtime maintainer of the
oss-security and linux-distros mailing lists, has announced
that this work has gained a sponsor:

After 15+ years of being a 100% volunteer effort, Openwall’s
maintenance of oss-security and (linux-)distros is finally
sponsored by the OpenSSF, a project of the Linux Foundation. This
sponsorship does not provide the Linux Foundation with the ability
to set policies for community resources managed by Openwall. I am
grateful for the support, which will help ensure continued
operation of these resources on a new level while retaining
independence.

As part of this arrangement, Peslyak is now producing statistics on
vulnerability handling; the first
set for 2023
has been posted.

Fedora 39 released

Post Syndicated from corbet original https://lwn.net/Articles/950524/

Fedora
39
has been released, one day after the Fedora project’s 20th
anniversary. See the list of
approved changes
and this Fedora
Magazine article
for more information.

As always, we’ve updated many, many other packages as we work to
bring you the best of everything the free and open source software
world has to offer. Fedora Linux 39 includes gcc 13.2, binutils
2.40, glibc 2.38, gdb 13.2, and rpm 4.19. It also has updates to
popular programming language stacks, including Python 3.12 and Rust
1.73.

Security updates for Tuesday

Post Syndicated from corbet original https://lwn.net/Articles/950523/

Security updates have been issued by Debian (trapperkeeper-webserver-jetty9-clojure), Mageia (libsndfile, packages, thunderbird, and x11-server), Oracle (.NET 6.0), SUSE (kernel, kubevirt, virt-api-container, virt-controller-container, virt-handler-container, virt-launcher-container, virt-libguestfs-tools-container, virt-operator-container, redis, and squid), and Ubuntu (gsl).

New micro:bit coding projects for kids

Post Syndicated from Author original https://www.raspberrypi.org/blog/microbit-coding-projects/

Young people can now learn to code and create with our brand-new path of micro:bit coding projects. The ‘Intro to micro:bit’ path is free and kids can follow it to code projects that focus on wellbeing, including topics like mental health, relaxation, and exercise.

As you might know, a micro:bit (pronounced “microbit”) is a small, programmable device designed for education. You can program it using any computer. It’s easy to use and learn with, and suitable for beginners, especially young people in and out of school.

The theme of the new project path: Wellbeing

Our aim for this new micro:bit project path is to help young people explore how they can create their own tech tools that help them look after themselves and others. By designing the micro:bit coding projects around wellbeing, we want to not only help kids develop programming and digital literacy skills, but also promote open conversations about the important topic of mental health.

Kids coding a microbit project.
Credit: David Bird

The six micro:bit coding projects in our new path all cover different aspects of wellbeing in a fun, creative way:

  1. Good sleep patterns
  2. Relaxation
  3. Self-confidence
  4. Happiness
  5. Health 
  6. Entertainment

We hope that following the path and making projects helps encourage learners to ask questions, share their experiences, and feel like they can ask parents, teachers, or mentors for support, and help support their friends and peers.

What is in the ‘Intro to micro:bit’ project path?

The ‘Intro to micro:bit’ path is designed according to our Digital Making Framework. Its aim is to encourage young people to become independent coders and tech creators as they progress along the projects in a path by gently removing scaffolding.

  • Our project paths begin with three Explore projects, in which learners are guided through tasks that introduce them to new coding skills.
  • Next, learners complete two Design projects. Here, they are encouraged to practise their skills and bring in their own interests to personalise their coding creations.
  • Finally, learners complete one Invent project. This is where they put everything that they have learned together and create something unique that matters to them.

The structure of the path means that learners are led through the development process of a coding project and learn how to turn their ideas into reality. The path structure also supports them with fixing programming errors (debugging), showing them that errors are a normal part of computer programming and just temporary setbacks that they can overcome.

Because community is important for learning, the path also offers young people the chance to share the projects they make with peers around the world.

What coding skills and knowledge will young people learn?

The Explore projects at the start of the path are where the initial learning takes place. Learners then develop their new skills and knowledge by putting them into practice in the Design and Invent projects, where they add in their own ideas and creativity.

The key programming concepts covered in this path are:

  • Variables
  • Using selection (if, else if, and else)
  • Using repetition (for loops)
  • Using randomisation
  • Using functions

There are two versions of the micro:bit (V1 and V2) and learners can use either version to create the micro:bit coding projects in the path, using the micro:bit’s input and output features:

Input features:

  • Buttons
  • Accelerometer
  • Sound sensor/microphone (micro:bit V2 only)
  • Capacitive touch sensor
  • Light sensor

Output features:

  • LED display
  • Speaker
  • Headphones connected via GPIO (micro:bit V1 only)

Explore project 1: Music player

In this Explore project, kids create a music player on the micro:bit to explore how listening to music can improve their mood. While creating their music player, young people get to choose melodies that they enjoy or that make them feel more relaxed. They also add a range of functions such as pausing, skipping, and shuffling tracks.

Explore project 2: Sound level meter

Noise levels can affect people’s wellbeing, so in this project, kids create a program to use the micro:bit to display how noisy their environment is. They will also learn how to save the noise data the micro:bit measures so they can identify the noisiest times in their day.

Explore project 3: Sleep tracker

Sleep is an important factor that contributes towards wellbeing. With this third Explore project, kids create a program to track their sleep movements using the micro:bit. This teaches them about variables, about using the micro:bit’s accelerometer, and about using its LEDs to display data.

Design project 1: How’s your day?

The first Design project of the path gets young people to build a mood checker program using the question ‘How’s your day?’. Kids get creative design control over the mood checker’s outputs according to the user’s replies, including displaying an animation or positive messages, or playing music. Kids can also make use of sensors to measure the various factors in the environment that could be affecting the user’s mood.

In this project, young people apply all of the coding skills and knowledge covered in the Explore projects, including selection, repetition, variables, functions, and randomisation.
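For readers who want to see these five concepts side by side, here is a minimal sketch of a mood checker. It is written in plain Python rather than in the MakeCode editor the path uses, and every name in it (the messages, `reply_for`, the list of moods) is made up for illustration rather than taken from the project itself.

```python
import random

# Variable: a list of encouraging messages (illustrative, not from the project)
POSITIVE_MESSAGES = ["You've got this!", "Take a deep breath.", "Time for a stretch?"]


def reply_for(mood, rng):
    """Function + selection: choose a reply based on the user's answer."""
    if mood == "great":
        return "Brilliant!"
    elif mood == "ok":
        return "Thanks for sharing."
    else:
        # Randomisation: pick any one of the encouraging messages
        return rng.choice(POSITIVE_MESSAGES)


rng = random.Random(7)  # seeded so the sketch behaves the same on every run
replies = []            # Variable: collects one reply per answer

# Repetition: a for loop over several answers to 'How's your day?'
for mood in ["great", "ok", "sad"]:
    replies.append(reply_for(mood, rng))

print(replies)
```

On a micro:bit, the answers would come from inputs such as the buttons, and the replies would go to outputs such as the LED display.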

Design project 2: Active assistant

In the second Design project, young people create an assistant that helps them get active. The project provides examples, a structure, and brief summaries of what kids have learned to do on the path so far to inspire and motivate them. This means young people can work independently to produce their own outcomes, and the functionality of their assistant is up to each young tech creator.

Invent project: Party game

The final project, Party game, encourages learners to independently replicate their favourite party game for entertainment and relaxation. Learners will combine all of the knowledge and skills they’ve gained throughout the path to make something of their own around the theme of wellbeing. This is a chance for them to unleash their creativity and reflect on real-life games they enjoy. The outcome will be unique, and fun for them to share with their friends and family.

Key questions answered

Who is this path for?

We have written these micro:bit coding projects with young people around the age of 6 to 13 in mind. Building the projects on the path does not require any previous coding experience, although complete beginners may want to try our free ‘Intro to Scratch’ path first.

What software do learners need to code these projects?

A web browser on a computer. In every project, starter code is provided in the MakeCode online code editor. Learners can either download their project code to a physical micro:bit (recommended) or use the micro:bit simulator in MakeCode.

Young people who live where there isn’t constant internet connectivity can also download the offline version of the MakeCode editor. There are also free micro:bit coding apps for smartphones and tablets.

How long will the path take to complete?

We’ve designed the ‘Intro to micro:bit’ path to be completed in six one-hour sessions, with one hour per project. However, the project instructions invite learners to take additional time to upgrade their projects if they wish.

What can learners do next?

Take part in Coolest Projects

At the end of the micro:bit path, learners are encouraged to register a project they’re making with their new coding skills for Coolest Projects, our annual online technology showcase for young people around the world.

Taking part is free, and beginners as well as more experienced young tech creators are invited. This is their opportunity to share their ingenuity in an online gallery for the world and the Coolest Projects community to celebrate.

The post New micro:bit coding projects for kids appeared first on Raspberry Pi Foundation.

Introducing Amazon MWAA support for Apache Airflow version 2.7.2 and deferrable operators

Post Syndicated from Manasi Bhutada original https://aws.amazon.com/blogs/big-data/introducing-amazon-mwaa-support-for-apache-airflow-version-2-7-2-and-deferrable-operators/

Amazon Managed Workflows for Apache Airflow (Amazon MWAA) is a managed service that allows you to use a familiar Apache Airflow environment with improved scalability, availability, and security to enhance and scale your business workflows without the operational burden of managing the underlying infrastructure.

Today, we are announcing the availability of Apache Airflow version 2.7.2 environments and support for deferrable operators on Amazon MWAA. In this post, we provide an overview of deferrable operators and triggers, including a walkthrough of an example showcasing how to use them. We also delve into some of the new features and capabilities of Apache Airflow, and how you can set up or upgrade your Amazon MWAA environment to version 2.7.2.

Deferrable operators and triggers

Standard operators and sensors continuously occupy an Airflow worker slot, regardless of whether they are active or idle. For example, even while waiting for an external system to complete a job, a worker slot is consumed. The Gantt chart below, representing a Directed Acyclic Graph (DAG), showcases this scenario through multiple Amazon Redshift operations.

Gantt chart representing DAG idle time

You can see the time each task spends idling while waiting for the Redshift cluster to be created, snapshotted, and paused. With the introduction of deferrable operators in Apache Airflow 2.2, the polling process can be offloaded to ensure efficient utilization of the worker slot. A deferrable operator can suspend itself and resume once the external job is complete, instead of continuously occupying a worker slot. This minimizes queued tasks and leads to a more efficient utilization of resources within your Amazon MWAA environment. The following figure shows a simplified diagram describing the process flow.

After a task has deferred its run, it frees up the worker slot and assigns the check of completion to a small piece of asynchronous code called a trigger. The trigger runs in a parent process called a triggerer, a service that runs an asyncio event loop. The triggerer has the capability to run triggers in parallel at scale, and to signal tasks to resume when a condition is met.
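To make the idea concrete, the following is a minimal sketch of how many waits can share one asyncio event loop. It uses only the Python standard library and does not use Airflow's trigger classes. The names `trigger` and `triggerer`, the job names, and the poll counts are hypothetical stand-ins chosen for illustration.

```python
import asyncio
from typing import Dict, List


async def trigger(name: str, polls_needed: int, poll_interval: float = 0.01) -> str:
    """Hypothetical trigger: poll an external job until it reports completion."""
    for _ in range(polls_needed):
        # Awaiting hands control back to the event loop, so other
        # triggers can make progress while this one is idle.
        await asyncio.sleep(poll_interval)
    return f"{name}: resume"


async def triggerer(jobs: Dict[str, int]) -> List[str]:
    """Stand-in for the triggerer: run all triggers concurrently in one event loop."""
    return await asyncio.gather(*(trigger(name, polls) for name, polls in jobs.items()))


# Three waits share a single loop instead of each holding a worker slot.
events = asyncio.run(triggerer({"create_cluster": 3, "snapshot": 2, "pause": 1}))
print(events)
```

Because each trigger spends nearly all of its time awaiting, a single process can watch many external jobs at once. The tasks themselves only need a worker slot again once their trigger signals them to resume.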

The Amazon provider package for Apache Airflow has added triggers for popular AWS services like AWS Glue and Amazon EMR. In Amazon MWAA environments running Apache Airflow v2.7.2, the management and operation of the triggerer service is taken care of for you. If you prefer not to use the triggerer service, you can change the configuration mwaa.triggerer_enabled. Additionally, you can define how many triggers each triggerer can run in parallel using the configuration parameter triggerer.default_capacity. This parameter defaults to values based on your Amazon MWAA environment class. Refer to the Configuration reference in the User Guide for detailed configuration values.
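As a rough illustration, the two settings named above could be assembled as Airflow configuration options like this. The option keys come from this post. The helper name `triggerer_options`, the string encoding of the values, and the capacity of 60 are assumptions made for the sketch, so check the Configuration reference for the values that apply to your environment class.

```python
from typing import Dict


def triggerer_options(enabled: bool, capacity: int) -> Dict[str, str]:
    """Build the triggerer-related configuration options (hypothetical helper).

    Assumption: option values are passed as strings.
    """
    if capacity < 1:
        raise ValueError("capacity must be a positive integer")
    return {
        "mwaa.triggerer_enabled": str(enabled).lower(),
        "triggerer.default_capacity": str(capacity),
    }


# Illustrative values only; the real default depends on the environment class.
options = triggerer_options(enabled=True, capacity=60)
print(options)
```

A mapping like this is what you would supply as the environment's Airflow configuration options when creating or updating it.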

When to use deferrable operators

Deferrable operators are particularly useful for tasks that submit jobs to systems external to an Amazon MWAA environment, such as Amazon EMR, AWS Glue, and Amazon SageMaker, or other sensors waiting for a specific event to occur. These tasks can take minutes to hours to complete and are primarily idle operators, making them good candidates to be replaced by their deferrable versions. Some additional use cases include:

  • File system-based operations.
  • Database operations with long running queries.

Using deferrable operators in Amazon MWAA

To use deferrable operators in Amazon MWAA, ensure you’re running Apache Airflow version 2.7 or greater in your Amazon MWAA environment, and the operators or sensors in your DAGs support deferring. Operators in the Amazon provider package expose a deferrable parameter which you can set to True to run the operator in asynchronous mode. For example, you can use S3KeySensor in asynchronous mode as follows:

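from airflow.providers.amazon.aws.sensors.s3 import S3KeySensor

# Wait for the source object; deferrable=True frees the worker slot while idle
wait_for_source_data = S3KeySensor(
    task_id="WaitForSourceData",
    bucket_name="source_bucket_name",
    bucket_key="object_key",
    aws_conn_id="aws_default",
    deferrable=True,
)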

You can also utilize various pre-built deferrable operators available in other provider packages, such as Snowflake and Databricks.

Follow the complete sample code in the GitHub repository to understand how deferrable operators work together. You will be building and orchestrating the data pipeline illustrated in the following figure.

The pipeline consists of three stages:

  • An S3KeySensor that waits for a dataset to be uploaded to Amazon Simple Storage Service (Amazon S3)
  • An AWS Glue crawler to classify objects in the dataset and save schemas into the AWS Glue Data Catalog
  • An AWS Glue job that uses the metadata in the Data Catalog to denormalize the source dataset, create Data Catalog tables based on filtered data, and write the resulting data back to Amazon S3 in separate Apache Parquet files.

Setup and Teardown tasks

It’s common to build workflows that require ephemeral resources, for example an S3 bucket to temporarily store data, databases and corresponding datasets to run quality checks, or a compute cluster to train a model in a machine learning (ML) orchestration pipeline. You need to have these resources properly configured before running work tasks, and after their run, ensure they are torn down. Doing this manually is complex. It may lead to poor readability and maintainability of your DAGs, and leave resources running constantly, thereby increasing costs. With Amazon MWAA support for Apache Airflow version 2.7.2, you can use two new types of tasks to support this scenario: setup and teardown tasks.

Setup and teardown tasks ensure that the resources needed for a work task are set up before the task starts its run and then are taken down after it has finished, even if the work task fails. Any task can be configured as a setup or teardown task. Once configured, they have special visibility in the Airflow UI and also special behavior. The following graph describes a simple data quality check pipeline using setup and teardown tasks.

One option to mark setup_db_instance and teardown_db_instance as setup and teardown tasks is to use the as_teardown() method in the teardown task in the dependencies chain declaration. Note that the method receives the setup task as a parameter:

setup_db_instance >> column_quality_check >> row_count_quality_check >> teardown_db_instance.as_teardown(setups=setup_db_instance)

Another option is to use @setup and @teardown decorators:

from airflow.decorators import setup

@setup
def setup_db_instance():
    ...
    return "Resources fully setup"

setup_db_instance()

After you configure the tasks, the graph view shows your setup tasks with an upward arrow and your teardown tasks with a downward arrow. They’re connected by a dotted line depicting the setup/teardown workflow. Any task between the setup and teardown tasks (such as column_quality_check and row_count_quality_check) is in the scope of the workflow. This arrangement involves the following behavior:

  • If you clear column_quality_check or row_count_quality_check, both setup_db_instance and teardown_db_instance will be cleared
  • If setup_db_instance runs successfully, and column_quality_check and row_count_quality_check have completed, regardless of whether they were successful or not, teardown_db_instance will run
  • If setup_db_instance fails or is skipped, then teardown_db_instance will fail or skip
  • If teardown_db_instance fails, by default Airflow ignores its status to evaluate whether the pipeline run was successful
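The rules above can be sketched as a small toy model in plain Python. This is not Airflow code: the function names, the state strings, and the teardown_counts flag are hypothetical and exist only to restate the bullets, and mapping a failed setup to "fail" and a skipped setup to "skip" is one reading of the third bullet.

```python
from typing import Iterable

# State names used by this toy model (assumed for illustration).
SUCCESS, FAILED, SKIPPED = "success", "failed", "skipped"
TERMINAL_STATES = {SUCCESS, FAILED, SKIPPED}


def teardown_action(setup_state: str, work_states: Iterable[str]) -> str:
    """Return what the teardown task does: 'wait', 'run', 'fail', or 'skip'."""
    if setup_state == FAILED:
        return "fail"  # setup failed, so teardown does not run normally
    if setup_state == SKIPPED:
        return "skip"  # setup was skipped, so teardown is skipped too
    if setup_state != SUCCESS:
        return "wait"  # setup has not finished yet
    if any(state not in TERMINAL_STATES for state in work_states):
        return "wait"  # some work task is still running
    # Setup succeeded and every work task finished, successfully or not.
    return "run"


def run_succeeded(work_states: Iterable[str], teardown_state: str,
                  teardown_counts: bool = False) -> bool:
    """By default the teardown result is ignored when judging the run."""
    ok = all(state == SUCCESS for state in work_states)
    if teardown_counts:
        ok = ok and teardown_state == SUCCESS
    return ok
```

In this model, teardown_action("success", ["failed", "success"]) returns "run", which mirrors the point that resources are still cleaned up when a quality check fails.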

Note that when creating setup and teardown workflows, there can be more than one set of setup and teardown tasks, and they can be parallel and nested. Neither setup nor teardown tasks are limited in number, nor are the work tasks you can include in the scope of the workflow.

Follow the complete sample code in the GitHub repository to understand how setup and teardown tasks work.

When to use setup and teardown tasks

Setup and teardown tasks are useful to improve the reliability and cost-effectiveness of DAGs, ensuring that required resources are created and deleted at the right time. They can also help simplify complex DAGs by breaking them down into smaller, more manageable tasks, improving maintainability. Some use cases include:

  • Data processing based on ephemeral compute, like Amazon Elastic Compute Cloud (Amazon EC2) instance fleets or EMR clusters
  • ML model training or tuning pipelines
  • Extract, transform, and load (ETL) jobs using external ephemeral data stores to share data among Airflow tasks

With Amazon MWAA support for Apache Airflow version 2.7.2, you can start using setup and teardown tasks to improve your pipelines as of today. To learn more about Setup and Teardown tasks, refer to the Apache Airflow documentation.

Secrets cache

To reflect changes to your DAGs and tasks, the Apache Airflow scheduler parses your DAG files continuously, every 30 seconds by default. If you have variables or connections as top-level code (code outside the operator’s execute methods), a request is generated every time the DAG file is parsed, impacting parsing speed and leading to sub-optimal performance in the DAG file processing. If you are running at scale, it has the potential to affect Airflow performance and scalability as the amount of network communication and load on the metastore database increase. If you’re using an alternative secrets backend, such as AWS Secrets Manager, every DAG parse is a new request to that service, increasing costs.

With Amazon MWAA support for Apache Airflow version 2.7.2, you can use a secrets cache for variables and connections. Airflow will cache variables and connections locally so that they can be accessed faster during DAG parsing, without having to fetch them from the secrets backend, environment variables, or metadata database. The following diagram describes the process.

Enabling caching will help lower the DAG parsing time, especially if variables and connections are used in top-level code (which is not a best practice). With the introduction of a secrets cache, the frequency of API calls to the backend is reduced, which in turn lowers the overall cost associated with backend access. However, similar to other caching implementations, a secrets cache may serve outdated values until the time to live (TTL) expires.
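To make the trade-off concrete, here is a minimal sketch of a TTL cache in plain Python. It is not Airflow's implementation: the class name TTLSecretsCache, the fetch callback, and the injectable clock are assumptions made for illustration. It shows the two effects described above: fewer calls to the backend, and possibly stale values until the TTL expires.

```python
import time
from typing import Callable, Dict, Tuple


class TTLSecretsCache:
    """Toy cache: serves a stored value until it is older than ttl_seconds."""

    def __init__(self, fetch: Callable[[str], str], ttl_seconds: float = 900.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch          # hypothetical call to the secrets backend
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so the behavior can be tested
        self._entries: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self._ttl:
            return entry[0]          # cache hit: no backend request
        value = self._fetch(key)     # cache miss or expired: one backend request
        self._entries[key] = (value, now)
        return value
```

If the backend value changes while an entry is still fresh, get() keeps returning the old value until the entry expires, which is why a shorter TTL trades some of the cost savings for fresher values.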

When to use the secrets cache feature

You should consider using the secrets cache feature to improve performance and reliability, and to reduce the operating costs of your Airflow tasks. This is particularly useful if your DAG frequently retrieves variables or connections in the top-level Python code.

How to use the secrets cache feature on Amazon MWAA

To enable the secrets cache, you can set the secrets.use_cache environment configuration parameter to True. Once enabled, Airflow will automatically cache secrets when they are accessed. The cache is used only during DAG file parsing, not during DAG runtime.

You can also control how long cached values are considered valid (the TTL) using the environment configuration parameter secrets.cache_ttl_seconds, which defaults to 900 seconds (15 minutes).
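As a minimal sketch, the two parameters can be collected into a dictionary of string key-value pairs. The helper name secrets_cache_options is hypothetical, and passing the result as the Airflow configuration options of your environment (in the console or through the API) is an assumption about your setup rather than a prescribed step.

```python
from typing import Dict


def secrets_cache_options(enabled: bool = True, ttl_seconds: int = 900) -> Dict[str, str]:
    """Build the secrets cache settings as string key-value pairs."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be a positive number of seconds")
    return {
        # Parameter names come from the text above; values are sent as strings.
        "secrets.use_cache": "True" if enabled else "False",
        "secrets.cache_ttl_seconds": str(ttl_seconds),
    }


# Example: keep cached values for 5 minutes instead of the default 15.
options = secrets_cache_options(ttl_seconds=300)
```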

Running or failed filters and Cluster Activity page

Identifying DAGs in a failed state can be challenging for large Airflow instances. You typically find yourself scrolling through pages searching for failures to address. With Apache Airflow version 2.7.2 environments in Amazon MWAA, you can now filter for DAGs that are currently running and DAGs with failed DAG runs. As you can see in the following screenshot, two status tabs, Running and Failed, were added to the UI.

Another advantage of Amazon MWAA environments using Apache Airflow version 2.7.2 is the new Cluster Activity page for environment-level monitoring.

The Cluster Activity page gathers useful data to monitor your cluster’s live and historical metrics. In the top section of the page, you get live metrics on the number of DAGs ready to be scheduled, the top 5 longest running DAGs, slots used in different pools, and component health (meta database, scheduler, and triggerer). The following screenshot shows an example of this page.

The bottom section of the Cluster Activity page includes historical metrics of DAG run and task instance states.

Set up a new Apache Airflow v2.7.2 environment in Amazon MWAA

Setting up a new Apache Airflow version 2.7.2 environment in Amazon MWAA not only provides new features, but also leverages Python 3.11 and the Amazon Linux 2023 (AL2023) base image, offering enhanced security, modern tooling, and support for the latest Python libraries and features. You can initiate the setup in your account and preferred Region using the AWS Management Console, API, or AWS Command Line Interface (AWS CLI). If you’re adopting infrastructure as code (IaC), you can automate the setup using AWS CloudFormation, the AWS Cloud Development Kit (AWS CDK), or Terraform scripts.

Upon successful creation of an Apache Airflow version 2.7.2 environment in Amazon MWAA, certain packages are automatically installed on the scheduler and worker nodes. For a complete list of installed packages and their versions, refer to this MWAA documentation. You can install additional packages using a requirements file. Beginning with Apache Airflow version 2.7.2, your requirements file is expected to include a --constraint statement. If you do not provide a constraint, Amazon MWAA will specify one for you to ensure the packages listed in your requirements are compatible with the version of Apache Airflow you are using.
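The following sketch generates the text of such a requirements file. It assumes the constraints file published in the Apache Airflow GitHub repository for the matching Airflow and Python versions; the helper names and the package in the example are illustrative only, so check the MWAA documentation for the constraint recommended for your environment.

```python
from typing import Iterable


def constraints_url(airflow_version: str, python_version: str) -> str:
    """URL pattern of the constraints files published by the Airflow project."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def requirements_text(packages: Iterable[str], airflow_version: str = "2.7.2",
                      python_version: str = "3.11") -> str:
    """Return requirements file content with the constraint on the first line."""
    lines = [f'--constraint "{constraints_url(airflow_version, python_version)}"']
    lines.extend(packages)
    return "\n".join(lines) + "\n"


# Example with an illustrative package name.
print(requirements_text(["apache-airflow-providers-snowflake"]))
```

Leaving the packages unpinned lets the constraints file choose versions that are known to work with the selected Airflow release.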

Upgrade from older versions of Apache Airflow to Apache Airflow v2.7.2

Take advantage of these latest capabilities by upgrading your older Apache Airflow v2.x-based environments to version 2.7.2 using in-place version upgrades. To learn more about in-place version upgrades, refer to Upgrading the Apache Airflow version or Introducing in-place version upgrades with Amazon MWAA.

Conclusion

In this post, we discussed deferrable operators along with some significant changes introduced in Apache Airflow version 2.7.2, such as the Cluster Activity page in the UI, the cache for variables and connections, and how you can get started using them in Amazon MWAA.

For additional details and code examples on Amazon MWAA, visit the Amazon MWAA User Guide and the Amazon MWAA examples GitHub repo.

Apache, Apache Airflow, and Airflow are either registered trademarks or trademarks of the Apache Software Foundation in the United States and/or other countries.


About the Authors

Manasi Bhutada is an ISV Solutions Architect based in the Netherlands. She helps customers design and implement well architected solutions in AWS that address their business problems. She is passionate about data analytics and networking. Beyond work she enjoys experimenting with food, playing pickleball, and diving into fun board games.

Hernan Garcia is a Senior Solutions Architect at AWS based in the Netherlands. He works in the Financial Services Industry supporting enterprises in their cloud adoption. He is passionate about serverless technologies, security, and compliance. He enjoys spending time with family and friends, and trying out new dishes from different cuisines.
