Tag Archives: Sustainability

Optimize your modern data architecture for sustainability: Part 2 – unified data governance, data movement, and purpose-built analytics

Post Syndicated from Sam Mokhtari original https://aws.amazon.com/blogs/architecture/optimize-your-modern-data-architecture-for-sustainability-part-2-unified-data-governance-data-movement-and-purpose-built-analytics/

In the first part of this blog series, Optimize your modern data architecture for sustainability: Part 1 – data ingestion and data lake, we focused on the 1) data ingestion, and 2) data lake pillars of the modern data architecture. In this blog post, we will provide guidance and best practices to optimize the components within the 3) unified data governance, 4) data movement, and 5) purpose-built analytics pillars.
Figure 1 shows the different pillars of the modern data architecture. It includes data ingestion, data lake, unified data governance, data movement, and purpose-built analytics pillars.

Figure 1. Modern Data Analytics Reference Architecture on AWS

3. Unified data governance

A centralized Data Catalog is responsible for storing business and technical metadata about datasets in the storage layer. Administrators apply permissions in this layer and track events for security audits.

Data discovery

To increase data sharing and reduce data movement and duplication, enable data discovery and well-defined access controls for different user personas. This reduces redundant data processing activities. Separate teams within an organization can rely on this central catalog. It provides first-party data (such as sales data) or third-party data (such as stock prices or climate change datasets). You'll only need to access the data once, rather than having to pull it from the source repeatedly.

AWS Glue Data Catalog can simplify the process for adding and searching metadata. Use AWS Glue crawlers to update the existing schemas and discover new datasets. Carefully plan schedules to reduce unnecessary crawling.
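As a minimal sketch (the crawler name, role ARN, and S3 path below are placeholders, not from this post), you can schedule a crawler to run once a day instead of crawling continuously:

import boto3

glue = boto3.client("glue")

glue.create_crawler(
    Name="sales-data-crawler",
    Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",
    DatabaseName="analytics_catalog",
    Targets={"S3Targets": [{"Path": "s3://example-data-lake/sales/"}]},
    Schedule="cron(0 3 * * ? *)",  # refresh the catalog once daily at 03:00 UTC
)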

Data sharing

Establish well-defined access control mechanisms for different data consumers using services such as AWS Lake Formation. This will enable datasets to be shared between organizational units with fine-grained access control, which reduces redundant copying and movement. Use Amazon Redshift data sharing to avoid copying the data across data warehouses.
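As an illustrative sketch (the account, role, database, and table names are hypothetical), Lake Formation can grant read-only access on a single table to another team instead of that team copying the data:

import boto3

lakeformation = boto3.client("lakeformation")

lakeformation.grant_permissions(
    Principal={"DataLakePrincipalIdentifier": "arn:aws:iam::210987654321:role/AnalystRole"},
    Resource={"Table": {"DatabaseName": "analytics_catalog", "Name": "sales"}},
    Permissions=["SELECT"],  # fine-grained, read-only access instead of a copy
)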

Well-defined datasets

Create well-defined datasets and associated metadata to avoid unnecessary data wrangling and manipulation. This will reduce resource usage that might result from additional data manipulation.

4. Data movement

AWS Glue provides a serverless, pay-per-use data movement capability, so you don't have to stand up and manage servers or clusters. Use it to set up ETL pipelines that can process tens of terabytes of data.

To minimize idle resources without sacrificing performance, use auto scaling for AWS Glue.

You can create and share AWS Glue workflows for similar use cases by using AWS Glue blueprints, rather than creating an AWS Glue workflow for each use case. AWS Glue job bookmark can track previously processed data.

Consider using Glue Flex Jobs for non-urgent or non-time sensitive data integration workloads such as pre-production jobs, testing, and one-time data loads. With Flex, AWS Glue jobs run on spare compute capacity instead of dedicated hardware.
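The Auto Scaling, job bookmark, and Flex options above can be combined in a single job definition. Here is a minimal sketch, assuming the standard AWS Glue job parameters for these features (the job name, role, and script location are placeholders):

import boto3

glue = boto3.client("glue")

glue.create_job(
    Name="nightly-orders-etl",
    Role="arn:aws:iam::123456789012:role/GlueJobRole",
    Command={"Name": "glueetl", "ScriptLocation": "s3://example-bucket/scripts/orders_etl.py"},
    GlueVersion="4.0",
    WorkerType="G.1X",
    NumberOfWorkers=10,      # upper bound; Auto Scaling releases idle workers
    ExecutionClass="FLEX",   # run on spare capacity for non-urgent workloads
    DefaultArguments={
        "--enable-auto-scaling": "true",                 # scale executors with the workload
        "--job-bookmark-option": "job-bookmark-enable",  # skip data processed in earlier runs
    },
)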

Joins between several dataframes are a common operation in Spark jobs. To reduce shuffling of data between nodes, use broadcast joins when one of the merged dataframes is small enough to be duplicated on all the executor nodes.
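As a short PySpark sketch (dataset paths and column names are hypothetical):

from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("broadcast-join-example").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders/")    # large fact table
regions = spark.read.parquet("s3://example-bucket/regions/")  # small dimension table

# The broadcast hint copies the small dataframe to every executor,
# avoiding a full shuffle of the large dataframe across the cluster.
enriched = orders.join(broadcast(regions), on="region_id", how="left")
enriched.write.mode("overwrite").parquet("s3://example-bucket/orders_enriched/")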

Use the latest AWS Glue version, which provides new and more efficient features for your workload.

5. Purpose-built analytics

Data Processing modes

Real-time data processing options need continuously running compute resources and consume more energy. For the most favorable sustainability impact, evaluate the trade-offs and choose batch data processing where it meets your requirements.

Identify the batch and interactive workload requirements and design transient clusters in Amazon EMR. Using Spot Instances and configuring instance fleets can maximize utilization.
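A minimal sketch of a transient cluster that uses instance fleets with Spot capacity and terminates when its steps finish (names, buckets, and the step script are placeholders):

import boto3

emr = boto3.client("emr")

emr.run_job_flow(
    Name="nightly-batch-cluster",
    ReleaseLabel="emr-6.9.0",
    Applications=[{"Name": "Spark"}],
    ServiceRole="EMR_DefaultRole",
    JobFlowRole="EMR_EC2_DefaultRole",
    LogUri="s3://example-bucket/emr-logs/",
    Instances={
        "KeepJobFlowAliveWhenNoSteps": False,  # shut down when the batch work is done
        "InstanceFleets": [
            {"InstanceFleetType": "MASTER", "TargetOnDemandCapacity": 1,
             "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"}]},
            {"InstanceFleetType": "CORE", "TargetSpotCapacity": 4,  # Spot for higher utilization
             "InstanceTypeConfigs": [{"InstanceType": "m5.xlarge"},
                                     {"InstanceType": "m5a.xlarge"}]},
        ],
    },
    Steps=[{
        "Name": "spark-batch-step",
        "ActionOnFailure": "TERMINATE_CLUSTER",
        "HadoopJarStep": {"Jar": "command-runner.jar",
                          "Args": ["spark-submit", "s3://example-bucket/scripts/batch_job.py"]},
    }],
)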

To improve energy efficiency, Amazon EMR Serverless can help you avoid over- or under-provisioning resources for your data processing jobs. Amazon EMR Serverless automatically determines the resources that the application needs, gathers these resources to process your jobs, and releases the resources when the jobs finish.

Amazon Redshift RA3 nodes can improve compute efficiency. With RA3 nodes, you can scale compute up and down without having to scale storage. You can choose Amazon Redshift Serverless to intelligently scale data warehouse capacity. This will deliver faster performance for the most demanding and unpredictable workloads.

Energy efficient transformation and data model design

Data processing and data modeling best practices can reduce your organization’s environmental impact.

To avoid unnecessary data movement between nodes in an Amazon Redshift cluster, follow best practices for table design.

You can also use automatic table optimization (ATO) for Amazon Redshift to self-tune tables based on usage patterns.

Use the EXPLAIN feature in Amazon Athena or Amazon Redshift to tune and optimize the queries.

The Amazon Redshift Advisor provides specific, tailored recommendations to optimize the data warehouse based on performance statistics and operations data.

Consider migrating Amazon EMR or Amazon OpenSearch Service workloads to a more power-efficient processor such as AWS Graviton. AWS Graviton3 delivers 2.5–3 times better performance than other CPUs. Graviton3-based instances use up to 60% less energy for the same performance as comparable EC2 instances.

Minimize idle resources

Use auto scaling features in Amazon EMR clusters, or employ Amazon Kinesis Data Streams On-Demand, to minimize idle resources without sacrificing performance.

AWS Trusted Advisor can help you identify underutilized Amazon Redshift Clusters. Pause Amazon Redshift clusters when not in use and resume when needed.
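For example, a minimal sketch of pausing and resuming a cluster with the SDK (the cluster identifier is a placeholder; in practice you would trigger these calls on a schedule):

import boto3

redshift = boto3.client("redshift")

redshift.pause_cluster(ClusterIdentifier="analytics-cluster")   # outside business hours
# ...
redshift.resume_cluster(ClusterIdentifier="analytics-cluster")  # when the team needs it again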

Energy efficient consumption patterns

Consider querying the data in place with Amazon Athena or Amazon Redshift Spectrum for one-off analysis, rather than copying the data to Amazon Redshift.

Enable a caching layer for frequent queries as needed. This is in addition to the result caching that comes built-in with services such as Amazon Redshift. Also, use Amazon Athena Query Result Reuse for every query where the source data doesn’t change frequently.
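As a sketch, assuming the result reuse configuration available in recent SDK versions (the database, query, and output location are placeholders):

import boto3

athena = boto3.client("athena")

athena.start_query_execution(
    QueryString="SELECT region, SUM(amount) FROM sales GROUP BY region",
    QueryExecutionContext={"Database": "analytics_catalog"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
    ResultReuseConfiguration={
        # serve cached results for up to 60 minutes instead of rescanning the source data
        "ResultReuseByAgeConfiguration": {"Enabled": True, "MaxAgeInMinutes": 60}
    },
)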

Use the materialized views capabilities available in Amazon Redshift or Amazon Aurora PostgreSQL to avoid unnecessary computation.

Use federated queries across data stores, powered by Amazon Athena federated query or Amazon Redshift federated query, to reduce data movement. For querying across separate Amazon Redshift clusters, consider using the Amazon Redshift data sharing feature, which decreases data movement between these clusters.

Track and assess improvement for environmental sustainability

The optimal way to evaluate success in optimizing your workloads for sustainability is to use proxy measures and unit-of-work KPIs, such as GB per transaction for storage or vCPU-minutes per transaction for compute.

In Table 1, we list certain metrics you could collect on analytics services as proxies to measure improvement. These fall under each pillar of the modern data architecture covered in this post.

Pillar Metrics
Unified data governance
Data movement
Purpose-built Analytics

Table 1. Metrics for the Modern data architecture pillars

Conclusion

In this blog post, we provided best practices to optimize processes under the unified data governance, data movement, and purpose-built analytics pillars of the modern data architecture.

If you want to learn more, check out the Sustainability Pillar of the AWS Well-Architected Framework and other blog posts on architecting for sustainability.

If you are looking for more architecture content, refer to the AWS Architecture Center for reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more.

How to select a Region for your workload based on sustainability goals

Post Syndicated from Sam Mokhtari original https://aws.amazon.com/blogs/architecture/how-to-select-a-region-for-your-workload-based-on-sustainability-goals/

The Amazon Web Services (AWS) Cloud is a constantly expanding network of Regions and points of presence (PoP), with a global network infrastructure linking them together. The choice of Regions for your workload significantly affects your workload KPIs, including performance, cost, and carbon footprint.

The Well-Architected Framework’s sustainability pillar offers design principles and best practices that you can use to meet sustainability goals for your AWS workloads. It recommends choosing Regions for your workload based on both your business requirements and sustainability goals. In this blog, we explain how to select an appropriate AWS Region for your workload. This process includes two key steps:

  • Assess and shortlist potential Regions for your workload based on your business requirements.
  • Choose Regions near Amazon renewable energy projects and Region(s) where the grid has a lower published carbon intensity.

To demonstrate this two-step process, let’s assume we have a web application that must be deployed in the AWS Cloud to support end users in the UK and Sweden. Also, let’s assume there is no local regulation that binds the data residency to a specific location. Let’s select a Region for this workload based on guidance in the sustainability pillar of AWS Well-Architected Framework.

Shortlist potential Regions for your workload

Let’s follow the best practice on Region selection in the sustainability pillar of AWS Well-Architected Framework. The first step is to assess and shortlist potential Regions for your workload based on your business requirements.

In What to Consider when Selecting a Region for your Workloads, there are four key business factors to consider when evaluating and shortlisting each AWS Region for a workload:

  • Latency
  • Cost
  • Services and features
  • Compliance

To shortlist your potential Regions:

  • Confirm that these Regions are compliant, based on your local regulations.
  • Use the AWS Regional Services Lists to check if the Regions have the services and features you need to run your workload.
  • Calculate the cost of the workload in each Region using the AWS Pricing Calculator.
  • Test the network latency between your end user locations and each AWS Region (a rough sketch of one way to do this follows this list).
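As a rough, unscientific sketch of a latency probe (for a real decision, rely on dedicated measurement tooling or data from your own users), you could time HTTPS round trips to regional service endpoints from your users' locations:

import time
import urllib.request

regions = ["eu-west-2", "eu-north-1"]  # Europe (London), Europe (Stockholm)

for region in regions:
    url = f"https://ec2.{region}.amazonaws.com/"
    start = time.perf_counter()
    try:
        urllib.request.urlopen(url, timeout=5)
    except Exception:
        pass  # unauthenticated requests may return errors; we only time the round trip
    print(f"{region}: {(time.perf_counter() - start) * 1000:.0f} ms")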

At this point, you should have a list of AWS Regions. For this sample workload, let’s assume only Europe (London) and Europe (Stockholm) Regions are shortlisted. They can address the requirements for latency, cost, and features for our use case.

Choose Regions for your workload

After shortlisting the potential Regions, the next step is to choose Regions for your workload. Choose Regions near Amazon renewable energy projects or Regions where the grid has a lower published carbon intensity. To understand this step, you need to first understand the Greenhouse Gas (GHG) Protocol to track emissions.

Based on the GHG Protocol, there are two methods to track emissions from electricity production: market-based and location-based. Companies may choose one of these methods based on their relevant sustainability guidelines to track and compare their year-to-year emissions. Amazon uses the market-based model to report our emissions.

AWS Region selection based on the market-based method

With the market-based method, emissions are calculated based on the electricity that businesses have chosen to purchase. For example, the business could decide to contract and purchase electricity produced by renewable energy sources like solar and wind.

Amazon's goal is to power our operations with 100% renewable energy by 2025 – five years ahead of our original 2030 target. We contract for renewable power from utility-scale wind and solar projects that add clean energy to the grid. These new renewable projects support hundreds of jobs and hundreds of millions of dollars in investment in local communities. Find more details about our work around the globe. We support these grids through the purchase of environmental attributes, like Renewable Energy Certificates (RECs) and Guarantees of Origin (GoO), in line with our renewable energy methodology. As a result, a number of Regions listed on the Amazon sustainability website are powered by more than 95% renewable energy.

Choose one of these Regions to help you power your workload with more renewable energy and reduce your carbon footprint. For the sample workload we’re using as our example, both the Europe (London) and Europe (Stockholm) Regions are in this list. They are powered by over 95% renewable energy based on the market-based emission method.

AWS Region selection based on the location-based method

The location-based method considers the average emissions intensity of the energy grids where consumption takes place. As a result, wherever the organization conducts business, it assesses emissions from the local electricity system. You can use the emissions intensity of the energy grids through a trusted data source to assess Regions for your workload.

Let's look at how we can use Electricity Maps data to select a Region for our sample workload:

1. Go to Electricity Maps (see Figure 1)

2. Search for the South Central Sweden zone to get the carbon intensity of electricity consumed for the Europe (Stockholm) Region (display aggregated data on a yearly basis)

Figure 1. Carbon intensity of electricity for South Central Sweden

3. Search for Great Britain to get the carbon intensity of electricity consumed for the Europe (London) Region (display aggregated data on a yearly basis)

Figure 2. Carbon intensity of electricity for Great Britain

As you can determine by comparing Figures 1 and 2, the Europe (Stockholm) Region has a lower carbon intensity of electricity consumed compared to the Europe (London) Region.

For our sample workload, we have selected the Europe (Stockholm) Region due to latency, cost, features, and compliance. It is also powered by over 95% renewable energy based on the market-based method and has a lower grid carbon intensity based on the location-based method.

Conclusion

In this blog, we explained the process for selecting an appropriate AWS Region for your workload based on both business requirements and sustainability goals.

Reducing Your Organization’s Carbon Footprint with Amazon CodeGuru Profiler

Post Syndicated from Isha Dua original https://aws.amazon.com/blogs/devops/reducing-your-organizations-carbon-footprint-with-codeguru-profiler/

It is crucial to examine every functional area when firms reorient their operations toward sustainable practices. Making informed decisions is necessary to reduce the environmental effect of an IT stack when creating, deploying, and maintaining it. To build a sustainable business for our customers and for the world we all share, we have deployed data centers that provide the efficient, resilient service our customers expect while minimizing our environmental footprint, and theirs. While we work to improve the energy efficiency of our data centers, we also work to help our customers improve their operations on the AWS Cloud. This two-pronged approach is based on the concept of shared responsibility between AWS and AWS customers. As shown in the diagram below, AWS focuses on optimizing the sustainability of the cloud, while customers are responsible for sustainability in the cloud, meaning that AWS customers must optimize the workloads they run on the AWS Cloud.

Figure 1. Shared responsibility model for sustainability

Just by migrating to the cloud, AWS customers become significantly more sustainable in their technology operations. On average, AWS customers use 77% fewer servers, 84% less power, and a 28% cleaner power mix, ultimately reducing their carbon emissions by 88% compared to when they ran workloads in their own data centers. These improvements are attributable to the technological advancements and economies of scale that AWS data centers bring. However, there are still significant opportunities for AWS customers to make their cloud operations more sustainable. To uncover these opportunities, we must first understand how emissions are categorized.

The Greenhouse Gas Protocol organizes carbon emissions into the following scopes, along with relevant emission examples within each scope for a cloud provider such as AWS:

  • Scope 1: All direct emissions from the activities of an organization or under its control. For example, fuel combustion by data center backup generators.
  • Scope 2: Indirect emissions from electricity purchased and used to power data centers and other facilities. For example, emissions from commercial power generation.
  • Scope 3: All other indirect emissions from activities of an organization from sources it doesn’t control. AWS examples include emissions related to data center construction, and the manufacture and transportation of IT hardware deployed in data centers.

From an AWS customer perspective, emissions from customer workloads running on AWS are accounted for as indirect emissions, and part of the customer’s Scope 3 emissions. Each workload deployed generates a fraction of the total AWS emissions from each of the previous scopes. The actual amount varies per workload and depends on several factors including the AWS services used, the energy consumed by those services, the carbon intensity of the electric grids serving the AWS data centers where they run, and the AWS procurement of renewable energy.

At a high level, AWS customers approach optimization initiatives at three levels:

  • Application (Architecture and Design): Using efficient software designs and architectures to minimize the average resources required per unit of work.
  • Resource (Provisioning and Utilization): Monitoring workload activity and modifying the capacity of individual resources to prevent idling due to over-provisioning or under-utilization.
  • Code (Code Optimization): Using code profilers and other tools to identify the areas of code that use up the most time or resources as targets for optimization.

In this blog post, we will concentrate on code-level sustainability improvements and how they can be realized using Amazon CodeGuru Profiler.

How CodeGuru Profiler improves code sustainability

Amazon CodeGuru Profiler collects runtime performance data from your live applications and provides recommendations that can help you fine-tune your application performance. Using machine learning algorithms, CodeGuru Profiler can help you find your most CPU-intensive lines of code, which contribute the most to your Scope 3 emissions. CodeGuru Profiler then suggests ways to improve the code to make it less CPU demanding. CodeGuru Profiler provides different visualizations of profiling data to help you identify what code is running on the CPU and see how much time is consumed, and it suggests ways to reduce CPU utilization. Optimizing your code with CodeGuru Profiler leads to the following:

  • Improvements in application performance
  • Reduction in cloud cost, and
  • Reduction in the carbon emissions attributable to your cloud workload.

When your code performs the same task with less CPU, your applications run faster, customer experience improves, and your costs decrease alongside your cloud emissions. CodeGuru Profiler generates the recommendations that help you make your code faster by using an agent that continuously samples stack traces from your application. The stack traces indicate how much time the CPU spends on each function or method in your code. This information is then transformed into CPU and latency data that is used to detect anomalies. When anomalies are detected, CodeGuru Profiler generates recommendations that clearly outline what you should do to remediate the situation. Although CodeGuru Profiler has several visualizations that help you visualize your code, in many cases, customers can implement these recommendations without reviewing the visualizations. Let's demonstrate this with a simple example.

Demonstration: Using CodeGuru Profiler to optimize a Lambda function

In this demonstration, the inefficiencies in an AWS Lambda function will be identified by CodeGuru Profiler.

Building our Lambda Function (10mins)

To keep this demonstration quick and simple, let's create a simple Lambda function that displays 'Hello World'. Before writing the code for this function, let's review two important concepts. First, when writing Python code that runs on AWS and calls AWS services, two critical steps are required:

  • Import the AWS SDK for Python (Boto3).
  • Create an SDK service client for the AWS service you will call.

The Python code lines (that will be part of our function) that execute the steps listed above are shown below:

import boto3 #this will import AWS SDK library for Python
VariableName = boto3.client('dynamodb') #this will create the AWS SDK service client

Second, AWS Lambda functions functionally consist of two sections:

  • Initialization code
  • Handler code

The first time a function is invoked (i.e., a cold start), Lambda downloads the function code, creates the required runtime environment, runs the initialization code, and then runs the handler code. During subsequent invocations (warm starts), to keep execution time low, Lambda bypasses the initialization code and goes straight to the handler code. AWS Lambda is designed such that the SDK service client created during initialization persists into the handler code execution. For this reason, AWS SDK service clients should be created in the initialization code. If the code lines for creating the AWS SDK service client are placed in the handler code, the AWS SDK service client will be recreated every time the Lambda function is invoked, needlessly increasing the duration of the Lambda function during cold and warm starts. This inadvertently increases CPU demand (and cost), which in turn increases the carbon emissions attributable to the customer’s code. Below, you can see the green and brown versions of the same Lambda function.
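The following is a minimal, hypothetical sketch of that comparison (not the exact code from the original images):

# "Brown" (less efficient) version: the SDK client is recreated on every invocation.
import boto3

def lambda_handler(event, context):
    client = boto3.client('dynamodb')  # re-created on each invocation, including warm starts
    return 'Hello World'

# "Green" (more efficient) version: the client is created once in the initialization code
# and reused across warm invocations.
import boto3

client = boto3.client('dynamodb')  # created once per execution environment

def lambda_handler(event, context):
    return 'Hello World'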

Now that we understand the importance of structuring our Lambda function code for efficient execution, let’s create a Lambda function that recreates the SDK service client. We will then watch CodeGuru Profiler flag this issue and generate a recommendation.

  1. Open AWS Lambda from the AWS Console and click on Create function.
  2. Select Author from scratch, name the function ‘demo-function’, select Python 3.9 under runtime, select x86_64 under Architecture.
  3. Expand Permissions, then choose whether to create a new execution role or use an existing one.
  4. Expand Advanced settings, and then select Function URL.
  5. For Auth type, choose AWS_IAM or NONE.
  6. Select Configure cross-origin resource sharing (CORS). By selecting this option during function creation, your function URL allows requests from all origins by default. You can edit the CORS settings for your function URL after creating the function.
  7. Choose Create function.
  8. In the code editor tab of the code source window, copy and paste the code below:
#initialization code
import json
import boto3

#handler code
def lambda_handler(event, context):
  client = boto3.client('dynamodb') #create AWS SDK service client (deliberately inside the handler for this demo)
  #simple code block for demonstration purposes
  output = 'Hello World'
  print(output)
  #handler function return

  return output

Ensure that the handler code is properly indented.

  9. Save the code, Deploy, and then Test.
  10. For the first execution of this Lambda function, a test event configuration dialog will appear. On the Configure test event dialog window, leave the selection as the default (Create new event), enter 'demo-event' as the Event name, and leave the hello-world template as the Event template.
  11. When you run the code by clicking on Test, the console should return 'Hello World'.
  12. To simulate actual traffic, let's run a curl script that invokes the Lambda function every 0.06 seconds. On a bash terminal, run the following command:
while true; do curl {Lambda Function URL}; sleep 0.06; done

If you do not have Git Bash installed, you can use AWS Cloud9, which supports curl commands.

Enabling CodeGuru Profiler for our Lambda function

We will now set up CodeGuru Profiler to monitor our Lambda function. For Lambda functions running on Java 8 (Amazon Corretto), Java 11, and Python 3.8 or 3.9 runtimes, CodeGuru Profiler can be enabled through a single click in the configuration tab in the AWS Lambda console. Other runtimes can be enabled by following a series of steps that can be found in the CodeGuru Profiler documentation for Java and Python.

Our demo code is written in Python 3.9, so we will enable Profiler from the configuration tab in the AWS Lambda console.

  1. On the AWS Lambda console, select the demo-function that we created.
  2. Navigate to Configuration > Monitoring and operations tools, and click Edit on the right side of the page.

  3. Scroll down to Amazon CodeGuru Profiler and click the button next to Code profiling to turn it on. After enabling Code profiling, click Save.

Note: CodeGuru Profiler requires 5 minutes of Lambda runtime data to generate results. After your Lambda function provides this runtime data, which may need multiple runs if your Lambda function has a short runtime, it will display within the Profiling group page in the CodeGuru Profiler console. The profiling group will be given a default name (i.e., aws-lambda-<lambda-function-name>), and it will take approximately 15 minutes after CodeGuru Profiler receives the runtime data for this profiling group to appear. Be patient. Although our function duration is ~33ms, our curl script invokes the application once every 0.06 seconds. This should give Profiler sufficient information to profile our function in a couple of hours. After 5 minutes, our profiling group should appear in the list of active profiling groups as shown below.

Depending on how frequently your Lambda function is invoked, it can take up to 15 minutes to aggregate profiles, after which you can see your first visualization in the CodeGuru Profiler console. The granularity of the first visualization depends on how active your function was during those first 5 minutes of profiling—an application that is idle most of the time doesn’t have many data points to plot in the default visualization. However, you can remedy this by looking at a wider time period of profiled data, for example, a day or even up to a week, if your application has very low CPU utilization. For our demo function, a recommendation should appear after about an hour. By this time, the profiling groups list should show that our profiling group now has one recommendation.

Profiler has now flagged the repeated creation of the SDK service client with every invocation.

From the information provided, we can see that our CPU is spending 5x more computing time than expected on the recreation of the SDK service client. The estimated cost impact of this inefficiency is also provided. In production environments, the cost impact of seemingly minor inefficiencies can scale very quickly to several kilograms of CO2 and hundreds of dollars as invocation frequency and the number of Lambda functions increase.

CodeGuru Profiler integrates with Amazon DevOps Guru, a fully managed service that makes it easy for developers and operators to improve the performance and availability of their applications. Amazon DevOps Guru analyzes operational data and application metrics to identify behaviors that deviate from normal operating patterns. Once these operational anomalies are detected, DevOps Guru presents intelligent recommendations that address current and predicted future operational issues. By integrating with CodeGuru Profiler, customers can now view operational anomalies and code optimization recommendations on the DevOps Guru console. The integration, which is enabled by default, is only applicable to Lambda resources that are supported by CodeGuru Profiler and monitored by both DevOps Guru and CodeGuru.

We can now stop the curl loop (Control+C) so that the Lambda function stops running. Next, we delete the profiling group that was created when we enabled profiling in Lambda, and then delete the Lambda function or repurpose it as needed.

Conclusion

Cloud sustainability is a shared responsibility between AWS and our customers. While we work to make our data centers more sustainable, customers also have to work to make their code, resources, and applications more sustainable, and CodeGuru Profiler can help you improve code sustainability, as demonstrated above. To start profiling your code today, visit the CodeGuru Profiler documentation page. To start monitoring your applications, head over to the Amazon DevOps Guru documentation page.

About the authors:

Isha Dua

Isha Dua is a Senior Solutions Architect based in San Francisco Bay Area. She helps AWS Enterprise customers grow by understanding their goals and challenges, and guiding them on how they can architect their applications in a cloud native manner while making sure they are resilient and scalable. She’s passionate about machine learning technologies and Environmental Sustainability.

Christian Tomeldan

Christian Tomeldan is a DevOps Engineer turned Solutions Architect. Operating out of San Francisco, he is passionate about technology and conveys that passion to customers ensuring they grow with the right support and best practices. He focuses his technical depth mostly around Containers, Security, and Environmental Sustainability.

Ifeanyi Okafor

Ifeanyi Okafor is a Product Manager with AWS. He enjoys building products that solve customer problems at scale.

Optimize your modern data architecture for sustainability: Part 1 – data ingestion and data lake

Post Syndicated from Sam Mokhtari original https://aws.amazon.com/blogs/architecture/optimize-your-modern-data-architecture-for-sustainability-part-1-data-ingestion-and-data-lake/

The modern data architecture on AWS focuses on integrating a data lake and purpose-built data services to efficiently build analytics workloads, which provide speed and agility at scale. Using the right service for the right purpose not only provides performance gains, but also facilitates the right utilization of resources. Review the Modern Data Analytics Reference Architecture on AWS in Figure 1.

In this series of two blog posts, we will cover guidance from the Sustainability Pillar of the AWS Well-Architected Framework on optimizing your modern data architecture for sustainability. Sustainability in the cloud is an ongoing effort focused primarily on energy reduction and efficiency across all components of a workload. This helps you achieve the maximum benefit from the resources you provision and minimize the total resources required.

Modern data architecture includes five pillars or capabilities: 1) data ingestion, 2) data lake, 3) unified data governance, 4) data movement, and 5) purpose-built analytics. In the first part of this blog series, we will focus on the data ingestion and data lake pillars of modern data architecture. We’ll discuss tips and best practices that can help you minimize resources and improve utilization.

Figure 1. Modern Data Analytics Reference Architecture on AWS

1. Data ingestion

The data ingestion process in modern data architecture can be broadly divided into two main categories: batch and real-time ingestion modes.

To improve the data ingestion process, see the following best practices:

Avoid unnecessary data ingestion

Work backwards from your business needs and establish the right datasets you’ll need. Evaluate if you can avoid ingesting data from source systems by using existing publicly available datasets in AWS Data Exchange or Open Data on AWS. Using these cleaned and curated datasets will help you to avoid duplicating the compute and storage resources needed to ingest this data.

Reduce the size of data before ingestion

When you design your data ingestion pipelines, use strategies such as compression, filtering, and aggregation to reduce the size of ingested data. This permits smaller data sizes to be transferred over the network and stored in the data lake.

To extract and ingest data from data sources such as databases, use change data capture (CDC) or date range strategies instead of full-extract ingestion. Use AWS Database Migration Service (DMS) transformation rules to selectively include and exclude the tables (from schema) and columns (from wide tables, for example) for ingestion.
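As an illustrative sketch (the schema, table, and column names are hypothetical), DMS table mappings can include only the tables and columns you actually need:

import json

table_mappings = {
    "rules": [
        {   # replicate only the "orders" table from the "sales" schema
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-orders",
            "object-locator": {"schema-name": "sales", "table-name": "orders"},
            "rule-action": "include",
        },
        {   # drop a wide free-text column that downstream jobs never read
            "rule-type": "transformation",
            "rule-id": "2",
            "rule-name": "drop-notes-column",
            "rule-target": "column",
            "object-locator": {"schema-name": "sales", "table-name": "orders", "column-name": "notes"},
            "rule-action": "remove-column",
        },
    ]
}

# Pass json.dumps(table_mappings) as the TableMappings parameter when creating the
# DMS replication task (for example, with boto3's create_replication_task).
print(json.dumps(table_mappings, indent=2))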

Consider event-driven serverless data ingestion

Adopt an event-driven serverless architecture for your data ingestion so it only provisions resources when work needs to be done. For example, when you use AWS Glue jobs and AWS Step Functions for data ingestion and pre-processing, you pass the responsibility and work of infrastructure optimization to AWS.

2. Data lake

Amazon Simple Storage Service (Amazon S3) is an object storage service that customers use as the foundation for a data lake, storing any type of data for different use cases. To optimize data lakes on Amazon S3, follow these best practices:

Understand data characteristics

Understand the characteristics, requirements, and access patterns of your workload data in order to optimally choose the right storage tier. You can classify your data into categories shown in Figure 2, based on their key characteristics.

Figure 2. Data Characteristics

Adopt sustainable storage options

Based on your workload data characteristics, use the appropriate storage tier to reduce the environmental impact of your workload, as shown in Figure 3.

Figure 3. Storage tiering on Amazon S3

Implement data lifecycle policies aligned with your sustainability goals

Based on your data classification information, you can move data to more energy-efficient storage or safely delete it. Manage the lifecycle of all your data automatically using Amazon S3 Lifecycle policies.
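A minimal sketch of such a policy (the bucket name, prefix, and retention periods are placeholders, and your own retention requirements take precedence):

import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="example-data-lake",
    LifecycleConfiguration={
        "Rules": [{
            "ID": "archive-then-expire-raw-data",
            "Filter": {"Prefix": "raw/"},
            "Status": "Enabled",
            "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],  # colder storage after 90 days
            "Expiration": {"Days": 1095},                              # delete after 3 years
        }]
    },
)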

Amazon S3 Storage Lens delivers visibility into storage usage, activity trends, and even makes recommendations for improvements. This information can be used to lower the environmental impact of storing information on S3.

Select efficient file formats and compression algorithms

Use efficient file formats such as Parquet, where a columnar format provides opportunities for flexible compression options and encoding schemes. Parquet also enables more efficient aggregation queries, as you can skip over non-relevant data. Storing and accessing data efficiently translates into higher performance with fewer resources.

Compress your data to reduce the storage size. Remember, you will need to trade off compression level (storage saved on disk) against the compute effort required to compress and decompress. Choosing the right compression algorithm can be beneficial as well. For instance, ZStandard (zstd) provides a better compression ratio compared with LZ4 or GZip.
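As a small sketch using PyArrow (the file paths are hypothetical):

import pyarrow.csv as pv
import pyarrow.parquet as pq

# Convert a raw CSV export into Parquet compressed with Zstandard.
table = pv.read_csv("raw/orders.csv")
pq.write_table(table, "curated/orders.parquet", compression="zstd")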

Use data partitioning and bucketing

Partitioning and bucketing divide your data and keep related data together. This can help reduce the amount of data scanned per query, which means fewer compute resources are needed to service the workload.
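A minimal PySpark sketch (paths and partition columns are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("partitioning-example").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/raw/orders/")

# Partition by columns that queries commonly filter on, so engines such as Athena
# can prune partitions and scan far less data.
(orders.write
    .partitionBy("year", "month")
    .mode("overwrite")
    .parquet("s3://example-bucket/curated/orders/"))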

Track and assess the improvement for environmental sustainability

The best way for customers to evaluate success in optimizing their workloads for sustainability is to use proxy measures and unit of work KPIs. For storage, this is GB per transaction, and for compute, it would be vCPU minutes per transaction. To use proxy measures to optimize workloads for energy efficiency, read Sustainability Well-Architected Lab on Turning the Cost and Usage Report into Efficiency Reports.

In Table 1, we have listed certain metrics to use as proxy metrics to measure specific improvements. These fall under each pillar of the modern data architecture covered in this post. This is not an exhaustive list; you could use numerous other metrics to spot inefficiencies. Remember, tracking just one metric may not explain the impact on sustainability, so combine each metric with information about the data, attribute types, workload type, and other characteristics.

Pillar Metrics
Data ingestion
Data lake

Table 1. Metrics for the Modern data architecture pillars

Conclusion

In this post, we have provided guidance and best practices to help reduce the environmental impact of the data ingestion and data lake pillars of modern data architecture.

In the next post, we will cover best practices for sustainability for the unified data governance, data movement, and purpose-built analytics pillars.

Repair cafés in computing education | Hello World #19

Post Syndicated from Katharine Childs original https://www.raspberrypi.org/blog/repair-cafes-computing-education-hello-world-19/

Many technology items are disposed of each year, either because they are broken, are no longer needed, or have been upgraded. Researchers from Germany have identified this as an opportunity to develop a scheme of work for Computing, while at the same time highlighting the importance of sustainability in hardware and software use. They hypothesised that by repairing defective devices, students would come to understand better how these devices work, and therefore meet some of the goals of their curriculum.

A smartphone with the back cover taken off so it can be repaired.

The research team visited three schools in Germany to deliver Computing lessons based around the concept of a repair café, where defective items are repaired or restored rather than thrown away. This idea was translated into a series of lessons about using and repairing smartphones. Learners first of all explored the materials used in smartphones and reflected on their personal use of these devices. They then spent time moving around three repair workstations, examining broken smartphones and looking at how they could be repaired or repurposed. Finally, learners reflected on their own ecological footprint and what they had learnt about digital hardware and software.

An educational repair café

In the classroom, repair workstations were set up for three different categories of activity: fixing cable breaks, fixing display breaks, and tinkering to upcycle devices. Each workstation had a mentor to support learners in investigating faults themselves by using the question prompt, “Why isn’t this feature or device working?” At the display breaks and cable breaks workstations, a mentor was on hand to provide guidance with further questions about the hardware and software used to make the smartphone work. On the other hand, the tinkering workstation offered a more open-ended approach, asking learners to think about how a smartphone could be upcycled to be used for a different purpose, such as a bicycle computer. It was interesting to note that students visited each of the three workstations equally.

Two girls solder physical computing components in a workshop.
Getting hands-on with hardware through physical computing activities can be very engaging for learners.

The feedback from the participants showed there had been a positive impact in prompting learners to think about the sustainability of their smartphone use. Working with items that were already broken also gave them confidence to explore how to repair the technology. This is a different type of experience from other Computing lessons, in which devices such as laptops or tablets are provided and are expected to be carefully looked after. The researchers also asked learners to complete a questionnaire two weeks after the lessons, and this showed that 10 of the 67 participants had gone on to repair another smartphone after taking part in the lessons.

Links to computing education

The project drew on a theory called duality reconstruction that has been developed by a researcher called Carsten Schulte. This theory argues that in computing education, it is equally important to teach learners about the function of a digital device as about the structure. For example, in the repair café lessons, learners discovered more about the role that smartphones play in society, as well as experimenting with broken smartphones to find out how they work. This brought a socio-technical perspective to the lessons that helped make the interaction between the technology and society more visible.

A young girl solders something at a worktop while a man looks over her shoulder.
It’s important to make sure young people know how to work safely with electronic and physical computing components.

Using this approach in the Computing classroom may seem counter-intuitive when compared to the approach of splitting the curriculum into topics and teaching each topic sequentially. However, the findings from this project suggest that learners understand better how smartphones work when they also think about how they are manufactured and used. Including societal implications of computing can provide learners with useful contexts about how computing is used in real-world problem-solving, and can also help to increase learners’ motivation for studying the subject.

Working together

The final aspect of this research project looked at collaborative problem-solving. The lessons were structured to include time for group work and group discussion, to acknowledge and leverage the range of experiences among learners. At the workstations, learners formed small groups to carry out repairs. The paper doesn’t mention whether these groups were self-selecting or assigned, but the researchers did carry out observations of group behaviours in order to evaluate whether the collaboration was effective. In the findings, the ideal group size for the repair workstation activity was either two or three learners working together. The researchers noticed that in groups of four or more learners, at least one learner would become disinterested and disengaged. Some groups were also observed taking part in work that wasn’t related to the task, and although no further details are given about the nature of this, it is possible that the groups became distracted.

The findings from this project suggest that learners understand better how smartphones work when they also think about how they are manufactured and used.

Further investigation into effective pedagogies to set group size expectations and maintain task focus would be helpful to make sure the lessons met their learning objectives. This research was conducted as a case study in a small number of schools, and the results indicate that this approach may be more widely helpful. Details about the study can be found in the researchers’ paper (in German).

Repair café start-up tips

If you’re thinking about setting up a repair café in your school to promote sustainable computing, either as a formal or informal learning activity, here are ideas on where to begin:

  • Connect with a network of repair cafés in your region; a great place to start is repaircafe.org
  • Ask for volunteers from your local community to act as mentors
  • Use video tutorials to learn about common faults and how to fix them
  • Value upcycling as much as repair — both lead to more sustainable uses of digital devices
  • Look for opportunities to solve problems in groups and promote teamwork

Discover more in Hello World

This article is from our free computing education magazine Hello World. Every issue is written by educators for educators and packed with resources, ideas, and insights to inspire your learners and your own classroom practice.

Cover of issue 19 of Hello World magazine.

For more about computing education in the context of sustainability, climate change, and environmental impact, download issue 19 of Hello World, which focuses on these topics.

You can subscribe to Hello World for free to never miss a digital issue, and if you’re an educator in the UK, a print subscription will get you free print copies in the post.

PS If you’re interested in facilitating productive classroom discussions with your learners about ethical, legal, cultural, and environmental concerns surrounding computer science, take a look at our free online course ‘Impacts of Technology: How To Lead Classroom Discussions’.

The post Repair cafés in computing education | Hello World #19 appeared first on Raspberry Pi.

Optimizing your AWS Infrastructure for Sustainability, Part IV: Databases

Post Syndicated from Otis Antoniou original https://aws.amazon.com/blogs/architecture/optimizing-your-aws-infrastructure-for-sustainability-part-iv-databases/

In Part I: Compute, Part II: Storage, and Part III: Networking of this series, we introduced strategies to optimize the compute, storage, and networking layers of your AWS architecture for sustainability.

This post, Part IV, focuses on the database layer and proposes recommendations to optimize your databases' utilization, performance, and queries. These recommendations are based on the design principles of the AWS Well-Architected Framework's Sustainability Pillar.

Optimizing the database layer of your AWS infrastructure

Figure 1. AWS database services

As your application serves more customers, the volume of data stored within your databases will increase. Implementing the recommendations in the following sections will help you use database resources more efficiently and save costs.

Use managed databases

Usually, customers overestimate the capacity they need to absorb peak traffic, wasting resources and money on unused infrastructure. AWS fully managed database services provide continuous monitoring, which allows you to increase and decrease your database capacity as needed. Additionally, most AWS managed databases use a pay-as-you-go model based on the instance size and storage used.

Managed services shift responsibility to AWS for maintaining high average utilization and sustainability optimization of the deployed hardware. Amazon Relational Database Service (Amazon RDS) reduces your individual contribution compared to maintaining your own databases on Amazon Elastic Compute Cloud (Amazon EC2). In a managed database, AWS continuously monitors your clusters to keep your workloads running with self-healing storage and automated scaling.

AWS offers 15+ purpose-built engines to support diverse data models. For example, if an Internet of Things (IoT) application needs to process large amounts of time series data, Amazon Timestream is designed and optimized for this exact use case.

Rightsize, reduce waste, and choose the right hardware

To see metrics, thresholds, and actions you can take to identify underutilized instances and rightsizing opportunities, Optimizing costs in Amazon RDS provides great guidance. The following table provides additional tools and metrics for you to find unused resources:

Service | Metric | Source
Amazon RDS | DatabaseConnections | Amazon CloudWatch
Amazon RDS | Idle DB Instances | AWS Trusted Advisor
Amazon DynamoDB | AccountProvisionedReadCapacityUtilization, AccountProvisionedWriteCapacityUtilization, ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits | CloudWatch
Amazon Redshift | Underutilized Amazon Redshift Clusters | AWS Trusted Advisor
Amazon DocumentDB | DatabaseConnections, CPUUtilization, FreeableMemory | CloudWatch
Amazon Neptune | CPUUtilization, VolumeWriteIOPs, MainRequestQueuePendingRequests | CloudWatch
Amazon Keyspaces | ProvisionedReadCapacityUnits, ProvisionedWriteCapacityUnits, ConsumedReadCapacityUnits, ConsumedWriteCapacityUnits | CloudWatch

These tools will help you identify rightsizing opportunities. However, rightsizing databases can affect your SLAs for query times, so consider this before making changes.

We also suggest:

  • Evaluating if your existing SLAs meet your business needs or if they could be relaxed as an acceptable trade-off to optimize your environment for sustainability.
  • If any of your RDS instances only need to run during business hours, consider shutting them down outside business hours either manually or with Instance Scheduler.
  • Consider using a more power-efficient processor like AWS Graviton-based instances for your databases. Graviton2 delivers 2-3.5 times better CPU performance per watt than any other processor in AWS.

Make sure to choose the right RDS instance type for the type of workload you have. For example, burstable performance instances can deal with spikes that exceed the baseline without the need to overprovision capacity. In terms of storage, Amazon RDS provides three storage types that differ in performance characteristics and price, so you can tailor the storage layer of your database according to your needs.

Use serverless databases

Production databases that experience intermittent, unpredictable, or spiky traffic may be underutilized. To improve efficiency and eliminate excess capacity, scale your infrastructure according to its load.

AWS offers relational and non-relational serverless databases that shut off when not in use, quickly restart, and automatically scale database capacity based on your application's needs. This reduces your environmental impact because capacity management is automatically optimized. By selecting the best purpose-built database for your workload, you'll benefit from the scalability and fully managed experience of serverless database services, as shown in the lists below.

 

Serverless relational databases:

  • Amazon Aurora Serverless for an on-demand, autoscaling configuration
  • Amazon Redshift Serverless, which runs and scales data warehouse capacity so you don't need to set up and manage data warehouse infrastructure

Serverless non-relational databases:

  • Amazon DynamoDB (in On-Demand mode) for a fully managed, serverless, key-value NoSQL database
  • Amazon Timestream for a time series database service for IoT and operational applications
  • Amazon Keyspaces for a scalable, highly available, and managed Apache Cassandra–compatible database service
  • Amazon Quantum Ledger Database for a fully managed ledger database that provides a transparent, immutable, and cryptographically verifiable transaction log owned by a central trusted authority

Use automated database backups and remove redundant data

Manual Amazon RDS backups, unlike automated backups, take a manual snapshot of your database and do not have a retention period set by default. This means that unless you delete a manual snapshot, it will not be removed automatically. Removing manual snapshots you don’t need will use fewer resources, which will reduce your costs. If you want manual snapshots of RDS, you can set an “expiration” with AWS Backup. To keep long-term snapshots of MariaDB, MySQL, and PostgreSQL data, we recommend exporting snapshot data to Amazon Simple Storage Service (Amazon S3). You can also export specific tables or databases. This way, you can move data to “colder” longer-term archival storage instead of keeping it within your database.
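As a sketch (the identifiers, ARNs, bucket, and KMS key are placeholders), a snapshot export to Amazon S3 looks like this:

import boto3

rds = boto3.client("rds")

rds.start_export_task(
    ExportTaskIdentifier="orders-db-2022-archive",
    SourceArn="arn:aws:rds:eu-west-1:123456789012:snapshot:orders-db-manual-2022",
    S3BucketName="example-archive-bucket",
    IamRoleArn="arn:aws:iam::123456789012:role/RdsSnapshotExportRole",
    KmsKeyId="arn:aws:kms:eu-west-1:123456789012:key/11111111-2222-3333-4444-555555555555",
    ExportOnly=["sales.orders"],  # optionally limit the export to specific tables or databases
)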

Optimize long running queries

Identify and optimize queries that are resource intensive because they can affect the overall performance of your application. By using the Performance Insights dashboard, specifically the Top Dimensions table, which displays the Top SQL, waits, and hosts, you’ll be able to view and download SQL queries to diagnose and investigate further.

Tuning Amazon RDS for MySQL with Performance Insights and this knowledge center article will help you optimize and tune queries in Amazon RDS for MySQL. The Optimizing and tuning queries in Amazon RDS PostgreSQL based on native and external tools and Improve query performance with parallel queries in Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition blog posts outline how to use native and external tools to optimize and tune Amazon RDS PostgreSQL queries, as well as improve query performance using the parallel query feature.

Improve database performance

You can improve your database performance by monitoring, identifying, and remediating anomalous performance issues. Instead of relying on a database administrator (DBA), AWS offers native tools to continuously monitor and analyze database telemetry, as shown in the following table.

Service | CloudWatch Metric | Source
Amazon RDS | CPUUtilization, FreeStorageSpace | CloudWatch
Amazon Redshift | CPUUtilization, PercentageDiskSpaceUsed | CloudWatch
Amazon Aurora | CPUUtilization, FreeLocalStorage | Amazon RDS
Amazon DynamoDB | AccountProvisionedReadCapacityUtilization, AccountProvisionedWriteCapacityUtilization | CloudWatch
Amazon ElastiCache | CPUUtilization | CloudWatch

CloudWatch displays instance-level and account-level usage metrics for Amazon RDS. Create CloudWatch alarms to activate and notify you based on metric value thresholds you specify or when anomalous metric behavior is detected. Enable Enhanced Monitoring real-time metrics for the operating system the DB instance runs on.
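For example, a minimal sketch of an alarm on RDS CPU utilization (the instance identifier and SNS topic are placeholders):

import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="orders-db-high-cpu",
    Namespace="AWS/RDS",
    MetricName="CPUUtilization",
    Dimensions=[{"Name": "DBInstanceIdentifier", "Value": "orders-db"}],
    Statistic="Average",
    Period=300,
    EvaluationPeriods=3,          # 3 consecutive 5-minute periods
    Threshold=80.0,
    ComparisonOperator="GreaterThanThreshold",
    AlarmActions=["arn:aws:sns:eu-west-1:123456789012:db-alerts"],
)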

Amazon RDS Performance Insights collects performance metrics, such as database load, from each RDS DB instance. This data gives you a granular view of the databases’ activity every second. You can enable Performance Insights without causing downtime, reboot, or failover.

Amazon DevOps Guru for RDS uses the data from Performance Insights, Enhanced Monitoring, and CloudWatch to identify operational issues. It uses machine learning to detect and notify of database-related issues, including resource overutilization or misbehavior of certain SQL queries.

Conclusion

In this blog post, we discussed technology choices, design principles, and recommended actions to optimize and increase efficiency of your databases. As your data grows, it is important to scale your database capacity in line with your user load, remove redundant data, optimize database queries, and optimize database performance. Figure 2 shows an overview of the tools you can use to optimize your databases.

Figure 2. Tools you can use on AWS for optimization

Young people’s projects for a sustainable future

Post Syndicated from Rosa Brown original https://www.raspberrypi.org/blog/young-peoples-projects-for-a-sustainable-future/

This post has been adapted from issue 19 of Hello World magazine, which explored the interaction between technology and sustainability.

We may have had the Coolest Projects livestream, but we are still in awe of the 2092 projects that young people sent in for this year’s online technology showcase! To continue the Coolest Projects Global 2022 celebrations, we’re shining a light on some of the participants and the topics that inspired their projects.    

Coolest Projects team and participants at an in-person event.

In this year’s showcase, the themes of sustainability and the environment were extremely popular. We received over 300 projects related to the environment from young people all over the world. Games, apps, websites, hardware — we’ve seen so many creative projects that demonstrate how important the environment is to young people. 

Here are some of these projects and a glimpse into how kids and teens across the world are using technology to look after their environment.      

Using tech to make one simple change 

Has anyone ever told you that a small change can lead to a big impact? Check out these two Coolest Projects entries that put this idea into practice with clever inventions to make positive changes to the environment.

Arik (15) from the UK wanted to make something to reduce the waste he noticed at home. Whenever lots of people visited Arik’s house, getting the right drink for everyone was a challenge and often resulted in wasted, spilled drinks. This problem was the inspiration behind Arik’s ‘Liquid Dispenser’ project, which can hold two litres of any desired liquid and has an outer body made from reused cardboard. As Arik says, “You don’t need a plastic bottle, you just need a cup!”

A young person's home-made project to help people get a drink at the press of a button.
Arik’s project helps you easily select a drink with the press of a button

Amrit (13), Kingston (12), and Henry (12) from Canada were also inspired to make a project to reduce waste. ‘Eco Light’ is a light that automatically turns off when someone leaves their house to avoid wasted electricity. For the project, the team used a micro:bit to detect the signal strength and decide whether the LED should be on (if someone is in the house) or off (if the house is empty).

“We wanted to create something that hopefully would create a meaningful impact on the world.”

Amrit, Kingston, and Henry
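
For readers curious how an idea like this might look in code, here is a purely hypothetical MicroPython sketch of the approach described above (measuring radio signal strength on a micro:bit and switching an LED). It is not the team's actual program; the assumption of a second, beacon-broadcasting micro:bit, the radio group, the RSSI threshold, and the output pin are all illustrative.

```python
# Hypothetical sketch of the Eco Light idea (not the team's actual code):
# another micro:bit in the house broadcasts a beacon; this one measures the
# received signal strength (RSSI) and drives an LED accordingly.
from microbit import pin0, sleep
import radio

radio.on()
radio.config(group=7)                  # radio group is an assumption

while True:
    packet = radio.receive_full()      # (message, rssi, timestamp) or None
    if packet and packet[1] > -75:     # -75 dBm threshold is an assumption
        pin0.write_digital(1)          # strong signal: someone is home, LED on
    else:
        pin0.write_digital(0)          # weak or no signal: house empty, LED off
    sleep(1000)
```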

Projects for local and global positive change 

We love to see young people invent things that create positive change in their communities, on both a local and a global level.

This year, Sashrika (11) from the US shared her ‘Gas Leak Detector’ project, which she designed to help people who heat their homes with diesel. On the east coast of America, many people store their gas tanks in the basement. This means they may not realise if the gas is leaking. To solve this problem, Sashrika has combined programming with physical computing to make a device that can detect if there is a gas leak and send a notification to your phone. 

A young person and their home-made gas leak detector.
Sashrika and her gas leak detector

Sashrika’s project has the power to help lots of people and she has even thought about how she would make more changes to her project in the name of sustainability: 

“I would probably add a solar panel because there are lots of houses that have outdoor oil tanks. Solar panel[s] will reduce electricity consumption and reduce CO2 emission[s].”

Sashrika

Amr in Syria was also thinking about renewable energy sources when he created his own ‘Smart Wind Turbine’.  

The ‘Smart Wind Turbine’ is connected to a micro:bit to measure the electricity generated by a fan. Amr's tests showed that more electricity was generated when the turbine faced into the wind. So Amr made a wind vane to determine the wind's direction and added another micro:bit to communicate the results to the turbine.

Creating projects for the future  

We’ve also seen projects created by young people to make the world a better place for future generations. 

Naira and Rhythm from India have designed houses that are suited for people and the planet. They carried out a survey and from their results they created the ‘Net Zero Home’. Naira and Rhythm’s project offers an idea for homes that are comfortable for people of all abilities and ages, while also being sustainable.

“Our future cities will require a lot of homes, this means we will require a lot of materials, energy, water and we will also produce a lot of waste. So we have designed this net zero home as a solution.”

Naira and Rhythm

Andrea (9) and Yuliana (10) from the US have also made something to benefit future generations. The ‘Bee Counter’ project uses sensors and a micro:bit to record bees’ activity around a hive. Through monitoring the bees, the team hope they can see (and then fix) any problems with the hive. Andrea and Yuliana want to maintain the bees’ home to help them continue to have a positive influence on our environment.

Knowledge is power: projects to educate and inspire 

Some young creators use Coolest Projects as an opportunity to educate and inspire people to make environmental changes in their own lives.

Sabrina (13) from the UK created her own website, ‘A Guide to Climate Change’. It includes images, text, graphics of the Earth’s temperature change, and suggestions for people to minimise their waste.  Sabrina also received the Broadcom Coding with Commitment award for using her skills to provide vital information about the effects of climate change.

alt=""
Sabrina’s project

Kushal (12) from India wanted to use tech to encourage people to help save the environment. Kushal had no experience of app development before making his ‘Green Steps’ app. He says, “I have created a mobile app to connect like-minded people who want to do something about [the] environment.” 

A young person's app to help people connect over a shared interest in the environment.
Kushal’s app helps people to upload and save pictures, like content from other users, and access helpful resources

These projects are just some of the incredible ideas we’ve seen young people enter for Coolest Projects this year. It’s clear from the projects submitted that the environment and protecting our planet resonate with so many students, as Sabrina summarises: “Some of us don’t understand how important the earth is to us. And I hope we don’t have to wait until it is gone to realise.”

Check out the Coolest Projects showcase for even more projects about the environment, alongside other topics that have inspired young creators.

The post Young people’s projects for a sustainable future appeared first on Raspberry Pi.

35,000 new trees in Nova Scotia

Post Syndicated from Patrick Day original https://blog.cloudflare.com/35-000-new-trees-in-nova-scotia/

Cloudflare is proud to announce the first 35,000 trees from our commitment to help clean up bad bots (and the climate) have been planted.

Working with our partners at One Tree Planted (OTP), Cloudflare was able to support the restoration of 20 hectares of land at Victoria Park in Nova Scotia, Canada. The 130-year-old natural woodland park is located in the heart of Truro, NS, and includes over 3,000 acres of hiking and biking trails through natural gorges, rivers, and waterfalls, as well as an old-growth eastern hemlock forest.

The planting projects added red spruce, black spruce, eastern white pine, eastern larch, northern red oak, sugar maple, yellow birch, and jack pine to two areas of the park. The first area was a section of the park that recently lost a number of old conifers due to insect attacks. The second was an area previously used as a municipal dump, which has since been covered by a clay cap and topsoil.

Our tree commitment began far from the Canadian woodlands. In 2019, we launched an ambitious tool called Bot Fight Mode, which for the first time fought back against bots, targeting scrapers and other automated actors.

Our idea was simple: preoccupy bad bots with nonsense tasks, so they cannot attack real sites. Even better, make these tasks computationally expensive to engage with. This approach is effective, but it forces bad actors to consume more energy and likely emit more greenhouse gasses (GHG). So in addition to launching Bot Fight Mode, we also committed to supporting tree planting projects to account for any potential environmental impact.

What is Bot Fight Mode?

As soon as Bot Fight Mode is enabled, it immediately starts challenging bots that visit your site. It is available to all Cloudflare customers for free, regardless of plan.

When Bot Fight Mode identifies a bot, it issues a computationally expensive challenge to exhaust it (also called “tarpitting”). Our aim is to disincentivize attackers, so they have to find a new hobby altogether. When we tarpit a bot, we require a significant amount of compute time that will stall its progress and result in a hefty server bill. Sorry not sorry.

We do this because bots are leeches. They draw resources, slow down sites, and abuse online platforms. They also hack into accounts and steal personal data. Of course, we allowlist a small number of bots that are well-behaved, like Slack and Google. And Bot Fight Mode only acts on traffic from cloud and hosting providers (because that is where bots usually originate from).

Over 550,000 sites use Bot Fight Mode today! We believe this makes it the most widely deployed bot management solution in the world (though this is impossible to validate). Free customers can enable the tool from the dashboard and paid customers can use a special version, known as Super Bot Fight Mode.

How many trees? Let’s do the math 🚀

Now, the hard part: how can we translate bot challenges into a specific number of trees that should be planted? Fortunately, we can use a series of unit conversions, similar to those we use to calculate Cloudflare’s total GHG emissions.

We started with the following assumptions.

Table 1.

| Measure | Quantity | Scaled | Source |
|---|---|---|---|
| Energy used by a standard server | 1,760.3 kWh / year | To hours (0.2 kWh / hour) | Go Climate |
| Emissions factor | 0.33852 kgCO2e / kWh | To grams (338.52 gCO2e / kWh) | Go Climate |
| CO2 absorbed by a mature tree | 48 lbsCO2e / year | To kilograms (21 kgCO2e / year) | One Tree Planted |

Next, we selected a high-traffic day to model the rate and duration of bot challenges on our network. On May 23, 2021, Bot Fight Mode issued 2,878,622 challenges, which lasted an average of 50 seconds each. In total, bots spent 39,981 hours engaging with our network defenses, or more than four years of challenges in a single day!

We then converted that time value into kilowatt-hours (kWh) of energy based on the rate of power consumed by our generic server listed in Table 1 above.

39,981 (hours) x 0.2 (kWh/hour) = 7,996 (kWh)

Once we knew the total amount of energy consumed by bad bot servers, we used an emissions factor (the amount of greenhouse gasses emitted per unit of energy consumed) to determine total emissions.

7,996 (kWh) x 338.52 (gCO2e/kWh) = 2,706,805 (gCO2e)
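
If you want to reproduce the arithmetic, the chain of conversions can be expressed in a few lines of Python. This is only a restatement of the figures already shown (Table 1 plus the May 23, 2021 sample day); the per-server and per-core tree counts quoted below involve additional scaling that is not reproduced here.

```python
# Reproducing the unit conversions above with the Table 1 figures.
challenge_hours = 39_981        # total bot time spent in challenges that day
server_kwh_per_hour = 0.2       # energy used per server-hour (Go Climate)
emissions_factor = 338.52       # gCO2e per kWh (Go Climate)
tree_absorption_kg = 21         # kgCO2e absorbed per mature tree per year (OTP)

energy_kwh = round(challenge_hours * server_kwh_per_hour)   # 7,996 kWh (rounded as above)
emissions_g = energy_kwh * emissions_factor                 # ~2,706,806 gCO2e
tree_years = (emissions_g / 1000) / tree_absorption_kg      # naive mature-tree-years
                                                             # for that single day

print(f"{energy_kwh:,} kWh -> {emissions_g:,.0f} gCO2e "
      f"-> about {tree_years:,.0f} mature-tree-years of absorption")
```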

If you have made it this far, clearly you like to geek out like we do. So, for the sake of completeness: the unit commonly used in emissions calculations is carbon dioxide equivalent (CO2e), a composite unit for all six GHGs listed in the Kyoto Protocol, weighted by Global Warming Potential.

The last conversion we needed was from emissions to trees. Our partners at OTP found that a mature tree absorbs roughly 21 kgCO2e per year. Based on our total emissions, that translates to roughly 47,000 trees per server, or 840 trees per CPU core. However, in our original post, we also noted that given the time it takes for a newly planted tree to reach maturity, we would multiply our donation by a factor of 25.

In the end, over the first two years of the program, we calculated that we would need approximately 42,000 trees to account for all the individual CPU cores engaged in Bot Fight Mode. For good measure, we rounded up to an even 50,000.

We are proud that most of these trees are already in the ground, and we look forward to providing an update when the final 15,000 are planted.

A piece of the puzzle

“Planting trees will benefit species diversity of the existing forest, animal habitat, greening of reclamation areas as well as community recreation areas, and visual benefits along popular hiking/biking trail networks.”  
Stephanie Clement, One Tree Planted, Project Manager North America

Reforestation is an important part of protecting healthy ecosystems and promoting biodiversity. Trees and forests are also a fundamental part of helping to slow the growth of global GHG emissions.

However, we recognize there is no single solution to the climate crisis. As part of our mission to help build a better, more sustainable Internet, Cloudflare is investing in renewable energy, tools that help our customers understand and mitigate their own carbon footprints on our network, and projects that will help offset or remove historical emissions associated with powering our network by 2025.

Want to be part of our bots & trees effort? Enable Bot Fight Mode today! It’s available on our free plan and takes only a few seconds. By the time we made our first donation to OTP in 2021, Bot Fight Mode had already spent more than 3,000 years distracting bots.

Help us defeat bad bots and improve our planet today!

For more information on Victoria Park, please visit https://www.victoriaparktruro.ca
For more information on One Tree Planted, please visit https://onetreeplanted.org
For more information on sustainability at Cloudflare, please visit www.cloudflare.com/impact

Computing and sustainability in your classroom | Hello World #19

Post Syndicated from Gemma Coleman original https://www.raspberrypi.org/blog/computing-sustainability-classroom-hello-world-19/

Issue 19 of our free magazine Hello World, written by and for the computing education community, focuses on the interaction between sustainability and computing, from how we can interact with technology responsibly, to its potential to mitigate climate change.

Cover of issue 19 of Hello World magazine.

To give you a taste of this brand-new issue, here is primary school teacher Peter Gaynord’s article about his experience of using an environmental case study to develop a cross-curricular physical computing unit that gives his learners a real-life context.

Peter Gaynord.

Real-life problem solving

The prospect of developing your own unit of work from scratch can feel very daunting. With the number of free resources available, it raises the question: why do it? Firstly, it gives you the opportunity to deliver computing that is interwoven with the rest of your curriculum. It also naturally lends itself to a constructionist approach to learning through meaningful engagement with real-world problem-solving. In this article, I am going to share my experience of developing a ten-lesson unit of physical computing for students aged nine to ten that is linked to the more general topic of the environment.

To engage children in the process of problem-solving, it is important that the problem is presented as a real and meaningful one. To introduce the topic of the environment, we showed pupils a video of the Panama Canal, including information about the staggering amount of CO2 that is saved by ships taking this route instead of the alternative, longer routes that use more fuel. However, we explained that because of the special geographical features, a moving bridge needed to be constructed over the canal. The students’ challenge was first to design a solution to the problem, and then to make a working model.

A model of a bridge.
One bridge model from Peter’s class.

The model would use physical computing as part of the solution to the problem. The children would program a single-geared motor using a Crumble microcontroller to slowly lift and lower the bridge by the desired amount. We decided to issue a warning to drivers that the road bridge was about to close using a Sparkle, a programmable LED. Ultimately, the raising and lowering of the bridge would happen automatically when a ship approached. For this purpose, we would use an ultrasonic sensor to detect the presence of the ship.

Building the required skills

To develop the skills required to use the Crumble microcontroller, we led some discrete computing lessons based largely on the Teach Computing Curriculum’s ‘Programming A — Selection in physical computing’ unit. In these lessons, the children developed the skill of sensing and responding differently to conditions using the selection programming construct. They learnt this key concept alongside controlling and connecting the motor, the Sparkle, and the ultrasonic sensor.

A learner does physical computing in the primary school classroom.
Physical computing allows learners to get hands-on.

For students to succeed, we also had to teach them skills from other subjects, and consider at what stage it would be most useful to introduce them. For example, before asking children to document their designs, we first needed to teach the design technology (DT) objectives for communicating ideas through sketches. Most other DT objectives that covered the practical skills to make a model were interwoven as the project progressed. At the end of the project, we guided the children through how to evaluate their design ideas and reflect on the process of model making. Before pupils designed their solutions, we also had to introduce some science for them to apply to their designs. We covered increasing forces using levers, pulleys, and gears, as well as the greenhouse effect and how burning fossil fuels contributes to global warming.

An end pivot model of a bridge.
Another bridge model made in Peter’s class.

It is very important not to specify a solution for students at the beginning, otherwise the whole project becomes craft instead of problem-solving. However, it is important to spend some time thinking about any practical aspects of the model building that may need extra scaffolding. Experience suggested that it was important to limit the scale of the children’s models. We did this by showing them a completed central bridge span and later, guiding the building of this component so that all bridges had the same scale. It also turned out to be very important that the children were limited in their model building to using one single-geared motor. This would ensure that all children engaged with actively thinking about how to utilise the lever and pulley system to increase force, instead of relying on using more motors to lift the bridge.

If you want to finish reading Peter’s article and see his unit outline, download Hello World issue 19 as a free PDF.

Discover more in Hello World 19 — for free

As always, you’ll find this new issue of Hello World packed with resources, ideas, and insights to inspire your learners and your own classroom practice:

  • Portraits of scientists who apply artificial intelligence models to sustainability research
  • Research behind device-repair cafés
  • A deep dive into the question of technology obsolescence
  • And much more

All issues of Hello World are available as free PDF downloads. Subscribe to never miss a digital issue — and if you’re an educator in the UK, you can subscribe to receive free print copies in the post.

PS: US-based educators, if you’re at CSTA Annual Conference in Chicago this month, come meet us at booth 521 and join us at our sessions about writing for Hello World, the Big Book of Computing Pedagogy, and more. We look forward to seeing you there!

The post Computing and sustainability in your classroom | Hello World #19 appeared first on Raspberry Pi.

AWS Week in Review – June 27, 2022

Post Syndicated from Danilo Poccia original https://aws.amazon.com/blogs/aws/aws-week-in-review-june-27-2022/

This post is part of our Week in Review series. Check back each week for a quick roundup of interesting news and announcements from AWS!

It’s the beginning of a new week, and I’d like to start with a recap of the most significant AWS news from the previous 7 days. Last week was special because I had the privilege of being at the very first EMEA AWS Heroes Summit in Milan, Italy. It was a great opportunity for mutual learning, as this community of experts shared their thoughts with AWS developer advocates, product managers, and technologists on topics such as containers, serverless, and machine learning.

Participants at the EMEA AWS Heroes Summit 2022

Last Week’s Launches
Here are the launches that got my attention last week:

Amazon Connect Cases (available in preview) – This new capability of Amazon Connect provides built-in case management for your contact center agents to create, collaborate on, and resolve customer issues. Learn more in this blog post that shows how to simplify case management in your contact center.

Many updates for Amazon RDS and Amazon Aurora – Amazon RDS Custom for Oracle now supports Oracle Database 12.2 and 18c, and the Amazon RDS Multi-AZ deployment option with one primary and two readable standby database instances now supports M5d and R5d instances and is available in more Regions. There is also a Regional expansion for RDS Custom. Finally, PostgreSQL 14, a new major version, is now supported by Amazon Aurora PostgreSQL-Compatible Edition.

AWS WAF Captcha is now generally available – You can use AWS WAF Captcha to block unwanted bot traffic by requiring users to successfully complete challenges before their web requests are allowed to reach resources.

Private IP VPNs with AWS Site-to-Site VPN – You can now deploy AWS Site-to-Site VPN connections over AWS Direct Connect using private IP addresses. This way, you can encrypt traffic between on-premises networks and AWS via Direct Connect connections without the need for public IP addresses.

AWS Center for Quantum Networking – Research and development of quantum computers have the potential to revolutionize science and technology. To address fundamental scientific and engineering challenges and develop new hardware, software, and applications for quantum networks, we announced the AWS Center for Quantum Networking.

Simpler access to sustainability data, plus a global hackathon – The Amazon Sustainability Data Initiative catalog of datasets is now searchable and discoverable through AWS Data Exchange. As part of a new collaboration with the International Research Centre in Artificial Intelligence, under the auspices of UNESCO, you can use the power of the cloud to help the world become sustainable by participating in the Amazon Sustainability Data Initiative Global Hackathon.

For a full list of AWS announcements, be sure to keep an eye on the What’s New at AWS page.

Other AWS News
A couple of takeaways from the Amazon re:MARS conference:

Amazon CodeWhisperer (preview) – Amazon CodeWhisperer is a coding companion powered by machine learning with support for multiple IDEs and languages.

Synthetic data generation with Amazon SageMaker Ground Truth – Generate labeled synthetic image data that you can combine with real-world data to create more complete training datasets for your ML models.

Some other updates you might have missed:

AstraZeneca’s drug design program built using AWS wins innovation award – AstraZeneca received the BioIT World Innovative Practice Award at the 20th anniversary of the Bio-IT World Conference for its novel augmented drug design platform built on AWS. More in this blog post.

Large object storage strategies for Amazon DynamoDB – A blog post showing different options for handling large objects within DynamoDB and the benefits and disadvantages of each approach.

Amazon DevOps Guru for RDS under the hood – Some details of how DevOps Guru for RDS works, with a specific focus on its scalability, security, and availability.

AWS open-source news and updates – A newsletter curated by my colleague Ricardo to bring you the latest open-source projects, posts, events, and more.

Upcoming AWS Events
It’s AWS Summits season and here are some virtual and in-person events that might be close to you:

On June 30, the AWS User Group Ukraine is running an AWS Tech Conference to discuss digital transformation with AWS. Join to learn from many sessions including a fireside chat with Dr. Werner Vogels, CTO at Amazon.com.

That’s all from me for this week. Come back next Monday for another Week in Review!

Danilo

Author Spotlight: Margaret O’Toole, WW Tech Leader in Sustainability at AWS

Post Syndicated from Elise Chahine original https://aws.amazon.com/blogs/architecture/author-spotlight-margaret-otoole-ww-tech-leader-in-sustainability-at-aws/

The Author Spotlight series pulls back the curtain on some of AWS’s most prolific authors. Read on to find out more about our very own Margaret O’Toole’s journey, in her own words!


My favorite part of working at AWS is collaborating with my peers. Over the last five years, I’ve had the pleasure of working with a wide range of smart, passionate, curious people, many of whom have been co-authors of the blogs we’ve written together.

When I joined AWS in 2017, it was as a Cloud Support Associate in Northern Virginia. My team focused on DevOps, and while I focused mostly on containers, I was also pushed to expand my knowledge. With that, I started to invest time in AWS OpsWorks (in large part thanks to Darko Mesaroš (Sr Developer Advocate), who many of you may know and love!). Although I was really excited about the agile, scalable nature of containers, I knew that it was important to understand how to manage configuration changes more broadly.

In 2019, I decided to leave AWS Support and become a Solutions Architect (SA) in Berlin. I’ve always been really interested in systems and how different components of a system worked together—in fact, this is exactly why I studied computer science and biology at university. My role as an SA pushed me to look at customer challenges from completely new perspectives. Now, instead of helping customers with a single component of their workload, I worked with customers on massive, multi-component workloads. I was exposed to new technology that I hadn’t worked with in my Support role, like AWS Control Tower, Amazon SageMaker, and AWS IoT. In many ways, I was learning alongside my customers, and we were bouncing ideas around together to make sure we were architecting the best possible solutions for their needs.

However, I always had a passion for sustainability. When I was in university, I was mostly interested in the intersection between natural systems and synthetic systems—I really wanted to understand how I could combine my passion for sustainability with the power of the AWS Cloud. And, as it turned out, so did many others at AWS! We spent most of 2020 working on experiments and writing narratives (the Amazonian version of a pitch), to better understand if customers wanted to work with AWS on sustainability related challenges, and if so, on what topics. My work today comes directly from the results of those initial customer interactions.

These events also marked a big change in my career! In 2020, I transitioned to a full-time sustainability role, becoming a Sustainability Solutions Architect—a novel function at the time. Today, I’m based in Copenhagen, and my focus is to help customers globally build the most energy-efficient and sustainable workloads on AWS. Every day, I find ways for customers to leverage AWS to solve challenges within their sustainability strategy.

Favorite blog posts

Perform Continuous cookbook integration testing and delivery for AWS OpsWorks for Chef Automate

My very first blog at AWS was on how to do Continuous Integration / Continuous Delivery with AWS OpsWorks. A group of us in AWS Support were asked to build out a lab that could be used at ChefConf, which we turned into a blog post.

Many customers are using tools like Chef Automate and Puppet to manage large sets of infrastructure, but finding cloud-native approaches to these tools was not always super obvious. My favorite part of writing this blog post was combining cloud-native ideas with traditional infrastructure management tools.

How to setup and use AWS OpsWorks for Chef Automate or Puppet Enterprise in an isolated subnet

We also saw that customers wanted to understand how to leverage cloud network security in their OpsWorks environment, and so we decided to build a walkthrough on how to use OpsWorks for Chef Automate or Puppet Enterprise in an isolated subnet.

How to set up automatic failover for AWS OpsWorks for Chef Automate

In addition to wanting to move fast and be secure, our customers also wanted to have reliability baked into their workloads. For many customers, their Chef Automate Server is a critical component, and they cannot afford downtime.

Sustainability content

Ask an Expert – Sustainability

In August 2021, Joe Beer, WW Technology Leader, and I worked on Architecture Monthly discussing the overlap between sustainability and technology.

Sustainability is a really broad topic, so in order to help scope conversations, we broke the topic down into three main categories: sustainability of, in, and through the Cloud:

  • Sustainability of the Cloud is AWS’s responsibility, and it covers things like our renewable energy projects, continuous work to increase the energy efficiency of our data centers, and all work that supports our goal of operating as environmentally friendly as possible.
  • Sustainability in the Cloud is the customers’ responsibility. AWS is committed to sustainable operations, but builders also need to consider sustainability as a core non-functional requirement. To make this clearer, a set of best practices has been published in the Well Architected Sustainability Pillar.
  • Sustainability through the Cloud covers all of the ways that the cloud solutions support sustainability strategies. Smart EV charging, for example, uses the AWS Cloud and AI/ML to lessen the aggregate impact to the grid that may occur because of EV charging peaks and ramp ups.

Throughout 2021, we worked with customers in almost all industries on both sustainability in and through the cloud, putting together a lineup of various sustainability talks at re:Invent 2021.

A highlight for me, personally, was seeing the AWS Well-Architected Framework Sustainability Pillar whitepaper released. After spending most of my AWS writing career on internal documentation or blog posts, leading the development of a full whitepaper was a completely new experience. I’m proud of the result and excited to continue to work on improving the content. Today, you can find the pillar in the Well-Architected tool and also explore some labs.

Architecting for sustainability: a re:Invent 2021 recap

We also did a deep dive into one of the sessions to highlight some of the key themes from the Well-Architected Pillar and the unique actions Starbucks took to reduce their footprint.

Architecting for sustainability: a re:Invent 2021 recap

Post Syndicated from Margaret O'Toole original https://aws.amazon.com/blogs/architecture/architecting-for-sustainability-a-reinvent-2021-recap/

At AWS re:Invent 2021, we announced the AWS Well-Architected Sustainability Pillar, which marks sustainability as a key pillar of building workloads to best deliver on business need. In session ARC325 – Architecting for Sustainability, Adrian Cockcroft, Steffen Grunwald, and Drew Engelson (Director of Engineering at Starbucks) gave a detailed explanation of what to expect from the Sustainability Pillar whitepaper and how Starbucks has applied AWS best practices to their workloads.

AWS is committed to building a sustainable business for our customers and the planet. Amazon co-founded The Climate Pledge in 2019, committing to reach net-zero carbon by the year 2040 — 10 years ahead of the Paris Agreement. As part of reaching our commitment, Amazon is on a path to powering our operations with 100% renewable energy by 2025 — 5 years ahead of our original target of 2030.

In 2020, we became the world’s largest corporate purchaser of renewable energy, reaching 65% renewable energy across our business. Amazon also continues to invest in renewable energy projects paired with energy storage: the energy storage systems allow Amazon to store clean energy produced by its solar projects and deploy it when solar energy is unavailable, such as in the evening hours, or during periods of high demand. This strengthens the climate impact of Amazon’s clean energy portfolio by enabling carbon-free electricity throughout more parts of the day.

As customers architect solutions with AWS services to fulfill their business needs, they also influence the sustainability of a workload. AWS had published five pillars in the AWS Well-Architected Framework that capture best practices for operational excellence, security, reliability, performance, and cost of workloads. These pillars have supported the business need for faster time-to-value for features and the delivery of products.

In light of the climate crisis, we have a new challenge to help businesses optimize their application architectures and workloads for sustainability. With sustainability as the sixth pillar in the AWS Well-Architected Framework, builders have guidance to optimize their workloads for energy reduction and efficiency improvement across all components of a workload.

How Starbucks is reducing their footprint

Starbucks has set very ambitious sustainability goals to reduce the environmental impact of their business. As Drew Engelson mentioned in his presentation, there was a gap between these major goals and how members of the technology teams could participate in this mission. Drew decided to evaluate the systems his team was responsible for and initiated an internal framework for understanding the environmental impact of the systems.

Sustainability proxy metrics based on service usage, as demonstrated in the AWS Well-Architected Lab for Sustainability, complemented AWS customer carbon footprint tool data to help the team drive reductions. The goal was to snap a baseline and identify areas for further optimization. The Starbucks team applied the Well-Architected Sustainability Pillar best practices after performing an AWS Well-Architected review early in the process.

Through working with AWS, Starbucks saw that from 2019 to 2020, the actual carbon footprint of their AWS workloads was reduced by approximately 32%, despite tremendous business growth during that same period. The customer carbon footprint tool indicates that these systems’ carbon footprint was further reduced by 50% in subsequent quarters.

Optimizing beyond cost

The Starbucks team already optimized their workloads for cost efficiency, leading to high energy and resource efficiency. Starbucks relies heavily on Kubernetes, which allows them to densely pack their services onto the underlying infrastructure, yielding very high utilization. Binary protocols, like gRPC, are used for efficient communication between microservices. This also cuts down on the data transfer between different networks. Many of their services are written in efficient Scala code, which adds another layer of optimization to the workload.

By taking a data-driven approach, Drew and his team at Starbucks also were able to review scaling thresholds. Based on data, they were able to adjust the auto scaling curve much more closely and smoothly, matching the actual demand curve and reducing the resources they needed to provision.
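
As an illustration only (this is not Starbucks' configuration), tightening an autoscaling curve often comes down to a target tracking policy whose target value is informed by observed demand. The sketch below uses boto3 against a hypothetical EC2 Auto Scaling group; the group name and target value are assumptions.

```python
# Generic sketch: track average CPU so capacity follows the demand curve
# more closely, keeping instances well utilized.
import boto3

autoscaling = boto3.client("autoscaling")

autoscaling.put_scaling_policy(
    AutoScalingGroupName="orders-service-asg",   # hypothetical group name
    PolicyName="track-cpu-70",
    PolicyType="TargetTrackingScaling",
    TargetTrackingConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ASGAverageCPUUtilization"
        },
        # A higher target keeps instances busier (better utilization) while
        # leaving headroom; tune it against your own latency objectives.
        "TargetValue": 70.0,
    },
)
```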

Drew’s team went beyond the initial optimization cost-benefits to sustainability of the whole stack. They identified workloads suitable for Amazon EC2 Spot Instances to leverage unused, on-demand capacity and increase the overall resource utilization of the cloud. The Starbucks team is starting to examine the impact of their mobile application to end-user devices, and they are taking steps to reduce the downstream impacts: for example, considering the size of client-side scripts, devices’ CPU usage, and color scheme (dark mode reduces the energy required for certain display types).

Starbucks takes optimization for sustainability seriously—so seriously that they coined the term “TCO2”, which highlights the importance of climate impact when measuring Total Cost of Ownership (Figure). Drew raised an important question for application teams and architects that frequently make tradeoffs for cost benefits:

If carbon was the thing we’re optimizing for, would we make different choices?

Figure. Drew Engelson, Director of Engineering at Starbucks, discussing “TCO2”

Get started, the sustainable way

At AWS, we encourage architects to build solutions with sustainability in mind. If your team wants to get started with the concepts from the AWS Well-Architected Sustainability Pillar, conduct a review of your workload in the Well-Architected Tool, or check out the AWS Well-Architected Framework and AWS Well-Architected Labs to learn more!

Optimize AI/ML workloads for sustainability: Part 3, deployment and monitoring

Post Syndicated from Benoit de Chateauvieux original https://aws.amazon.com/blogs/architecture/optimize-ai-ml-workloads-for-sustainability-part-3-deployment-and-monitoring/

We’re celebrating Earth Day 2022 from 4/22 through 4/29 with posts that highlight how to build, maintain, and refine your workloads for sustainability.


AWS estimates that inference (the process of using a trained machine learning [ML] algorithm to make a prediction) makes up 90 percent of the cost of an ML model. Because with AWS you pay for what you use, we estimate that inference also generally accounts for most of the resource usage within an ML lifecycle.

In this series, we’re following the phases of the Well-Architected machine learning lifecycle (Figure 1) to optimize your artificial intelligence (AI)/ML workloads. In Part 3, our final piece in the series, we show you how to reduce the environmental impact of your ML workload once your model is in production.


If you missed the first parts of this series, in Part 1, we showed you how to examine your workload to help you 1) evaluate the impact of your workload, 2) identify alternatives to training your own model, and 3) optimize data processing. In Part 2, we identified ways to reduce the environmental impact of developing, training, and tuning ML models.


Figure 1. ML lifecycle

Deployment

Select sustainable AWS Regions

As mentioned in Part 1, select an AWS Region with sustainable energy sources. When regulations and legal aspects allow, choose Regions near Amazon renewable energy projects and Regions where the grid has low published carbon intensity to deploy your model.

Align SLAs with sustainability goals

Define SLAs that support your sustainability goals while meeting your business requirements:

Use efficient silicon

For CPU-based ML inference, use AWS Graviton3. These processors offer the best performance per watt in Amazon Elastic Compute Cloud (Amazon EC2). They use up to 60% less energy than comparable EC2 instances. Graviton3 processors deliver up to three times better performance compared to Graviton2 processors for ML workloads, and they support bfloat16.

For deep learning workloads, the Amazon EC2 Inf1 instances (based on custom designed AWS Inferentia chips) deliver 2.3 times higher throughput and 80% lower cost compared to g4dn instances. Inf1 has 50% higher performance per watt than g4dn, which makes it the most sustainable ML accelerator Amazon EC2 offers.

Make efficient use of GPU

Use Amazon Elastic Inference to attach just the right amount of GPU-powered inference acceleration to any EC2 or SageMaker instance type or Amazon Elastic Container Service (Amazon ECS) task.

While training jobs batch process hundreds of data samples in parallel, inference jobs usually process a single input in real time, and thus consume a small amount of GPU compute. Elastic Inference allows you to reduce the cost and environmental impact of your inference by using GPU resources more efficiently.
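
As a rough sketch of what this looks like with the SageMaker Python SDK, the accelerator is attached at deployment time via the accelerator_type parameter; the image, model artifact, role, and instance sizes below are placeholders.

```python
# Hedged sketch: attach Elastic Inference acceleration to a CPU-hosted endpoint.
from sagemaker.model import Model

model = Model(
    image_uri="<inference-image-uri>",           # placeholder container image
    model_data="s3://my-bucket/model.tar.gz",    # placeholder model artifact
    role="<execution-role-arn>",                 # placeholder IAM role
)

predictor = model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.large",        # CPU host instance
    accelerator_type="ml.eia2.medium",  # right-sized GPU acceleration attached
)
```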

Optimize models for inference

Improve efficiency of your models by compiling them into optimized forms with the following:

  • Various open-source libraries (like Treelite for decision tree ensembles)
  • Third-party tools like Hugging Face Infinity, which allows you to speed up transformer models and run inference not only on GPU but also on CPU.
  • SageMaker Neo, whose runtime consumes as little as one-tenth the footprint of a deep learning framework and optimizes models to perform up to 25 times faster with no loss in accuracy (example with XGBoost); a compilation sketch follows below.

Deploying more efficient models means you need fewer resources for inference.
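
A minimal sketch of a Neo compilation job with boto3 is shown below; the bucket paths, role, framework, input shape, and target device are assumptions for illustration.

```python
# Hedged sketch: compile a trained model with SageMaker Neo.
import boto3

sm = boto3.client("sagemaker")

sm.create_compilation_job(
    CompilationJobName="xgboost-neo-demo",
    RoleArn="<execution-role-arn>",                    # placeholder
    InputConfig={
        "S3Uri": "s3://my-bucket/model/model.tar.gz",  # trained model artifact
        "DataInputConfig": '{"data": [1, 30]}',        # assumed input shape
        "Framework": "XGBOOST",
    },
    OutputConfig={
        "S3OutputLocation": "s3://my-bucket/compiled/",
        "TargetDevice": "ml_c5",                       # target instance family
    },
    StoppingCondition={"MaxRuntimeInSeconds": 900},
)
```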

Deploy multiple models behind a single endpoint

SageMaker provides three methods to deploy multiple models to a single endpoint to improve endpoint utilization:

  1. Host multiple models in one container behind one endpoint. Multi-model endpoints are served using a single container. This can help you cut up to 90 percent of your inference costs and carbon emissions.
  2. Host multiple models that use different containers behind one endpoint.
  3. Host a linear sequence of containers in an inference pipeline behind a single endpoint.

Sharing endpoint resources is more sustainable and less expensive than deploying a single model behind one endpoint.
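
A hedged sketch of the first option is shown below: all model artifacts live under one Amazon S3 prefix, a single container is created in MultiModel mode, and the caller names the artifact to use at invocation time. Names are placeholders, and endpoint-config and endpoint creation are omitted for brevity.

```python
# Hedged sketch: one multi-model endpoint serving many artifacts from one prefix.
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

sm.create_model(
    ModelName="shared-mme",
    ExecutionRoleArn="<execution-role-arn>",         # placeholder
    PrimaryContainer={
        "Image": "<inference-image-uri>",            # placeholder
        "Mode": "MultiModel",                        # the key difference
        "ModelDataUrl": "s3://my-bucket/models/",    # prefix of .tar.gz artifacts
    },
)

# At invocation time, name the specific artifact to load and run:
runtime.invoke_endpoint(
    EndpointName="shared-mme-endpoint",              # placeholder endpoint
    TargetModel="customer-a/model.tar.gz",
    ContentType="text/csv",
    Body=b"0.5,1.2,3.4",
)
```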

Right-size your inference environment

Right-size your endpoints by using metrics from Amazon CloudWatch or by using the Amazon SageMaker Inference Recommender. This tool can run load testing jobs and recommend the proper instance type to host your model. When you use the appropriate instance type, you limit the carbon emission associated with over-provisioning.

If your workload has intermittent or unpredictable traffic, configure autoscaling inference endpoints in SageMaker to optimize your endpoints. Autoscaling monitors your endpoints and dynamically adjusts their capacity to maintain steady and predictable performance using as few resources as possible. You can also try Serverless Inference (in preview), which automatically launches compute resources and scales them in and out depending on traffic, which eliminates idle resources.
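
For reference, registering an endpoint variant with Application Auto Scaling and attaching a target tracking policy might look like the sketch below; the endpoint name, capacity bounds, and target value are assumptions.

```python
# Hedged sketch: autoscale a SageMaker endpoint variant on invocations per instance.
import boto3

aas = boto3.client("application-autoscaling")
resource_id = "endpoint/my-endpoint/variant/AllTraffic"   # hypothetical

aas.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=1,
    MaxCapacity=4,
)

aas.put_scaling_policy(
    PolicyName="invocations-target-tracking",
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance"
        },
        "TargetValue": 1000.0,   # invocations per instance per minute (assumption)
    },
)
```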

Consider inference at the edge

When working on Internet of Things (IoT) use cases, evaluate if ML inference at the edge can reduce the carbon footprint of your workload. To do this, consider factors like the compute capacity of your devices, their energy consumption, or the emissions related to data transfer to the cloud. When deploying ML models to edge devices, consider using SageMaker Edge Manager, which integrates with SageMaker Neo and AWS IoT Greengrass (Figure 2).

Run inference at the edge with SageMaker Edge

Figure 2. Run inference at the edge with SageMaker Edge

Device manufacturing represents 32-57 percent of the global information and communication technology (ICT) carbon footprint. An optimized ML model requires fewer compute resources, so you can perform inference on lower-specification machines, which minimizes the environmental impact of device manufacturing and uses less energy.

The following techniques compress the size of models for deployment, which speeds up inference and saves energy without significant loss of accuracy:

  • Pruning removes weights (learnable parameters) that don’t contribute much to the model.
  • Quantization represents numbers with low-bit integers without incurring significant loss in accuracy. Specifically, you can reduce resource usage by replacing the parameters in an inference model with half-precision (16 bit), bfloat16 (16 bit, but the same dynamic range as 32 bit), or 8-bit integers instead of the usual single-precision floating-point (32 bit) values, as shown in the sketch below.
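
As a small illustration of quantization (not specific to any model in this post), PyTorch's post-training dynamic quantization replaces linear-layer weights with 8-bit integers in a couple of lines; the model here is a stand-in.

```python
# Hedged sketch: post-training dynamic quantization with PyTorch.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2)).eval()

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8   # replace Linear weights with 8-bit ints
)

# The quantized model is smaller and typically faster on CPU for inference.
torch.save(quantized.state_dict(), "model_int8.pt")
```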

Archive or delete unnecessary artifacts

Compress and reduce the volume of logs you keep during the inference phase. By default, CloudWatch retains logs indefinitely. By setting limited retention time for your inference logs, you’ll avoid the carbon footprint of unnecessary log storage. Also delete unused versions of your models and custom container images from your repositories.
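
For example, a retention policy can be set on an endpoint's CloudWatch log group with one call; the log group name and retention period below are assumptions.

```python
# Hedged sketch: cap retention on an inference endpoint's log group.
import boto3

logs = boto3.client("logs")

logs.put_retention_policy(
    logGroupName="/aws/sagemaker/Endpoints/my-endpoint",  # hypothetical log group
    retentionInDays=30,   # stop storing inference logs indefinitely
)
```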

Monitoring

Retrain only when necessary

Monitor your ML model in production and retrain only when required. Models usually need retraining because of model drift, robustness issues, or new ground truth data becoming available. Instead of retraining on an arbitrary schedule, automate model drift detection and retrain only when your model’s predictive performance has fallen below defined KPIs.

Consider SageMaker Pipelines, the AWS Step Functions Data Science SDK for Amazon SageMaker, or third-party tools to automate your retraining pipelines.
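
One hedged pattern for "retrain only when needed" is to alarm on a model-quality metric and let that alarm start the retraining pipeline. The sketch below only creates the alarm; the namespace, threshold, and SNS topic are placeholders, and wiring the alarm to SageMaker Pipelines or Step Functions is not shown.

```python
# Hedged sketch: gate retraining on a drift/quality KPI via a CloudWatch alarm.
import boto3

cloudwatch = boto3.client("cloudwatch")

cloudwatch.put_metric_alarm(
    AlarmName="model-accuracy-below-kpi",
    Namespace="MyModel/Quality",      # custom namespace published by your monitoring job
    MetricName="accuracy",
    Statistic="Average",
    Period=3600,
    EvaluationPeriods=24,             # a full day under the KPI before acting
    Threshold=0.90,                   # business-defined KPI (assumption)
    ComparisonOperator="LessThanThreshold",
    AlarmActions=["arn:aws:sns:us-east-1:123456789012:retrain-topic"],  # placeholder
)
```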

Measure results and improve

To monitor and quantify improvements during the inference phase, track the following metrics:

For storage:

Conclusion

AI/ML workloads can be energy intensive, but as called out by the UN and mentioned in the latest IPCC report, AI can contribute to the mitigation of climate change and the achievement of several Sustainable Development Goals. As technology builders, it’s our responsibility to make sustainable use of AI and ML.

In this blog post series, we presented best practices you can use to make sustainability-conscious architectural decisions and reduce the environmental impact for your AI/ML workloads.

Other posts in this series

About the Well-Architected Framework

These practices are part of the Sustainability Pillar of the AWS Well-Architected Framework. AWS Well-Architected is a set of guiding design principles developed by AWS to help organizations build secure, high-performing, resilient, and efficient infrastructure for a variety of applications and workloads. Use the AWS Well-Architected Tool to review your workloads periodically to address important design considerations and ensure that they follow the best practices and guidance of the AWS Well-Architected Framework. For follow up questions or comments, join our growing community on AWS re:Post.

Ask an Expert – Sustainability

Post Syndicated from Margaret O'Toole original https://aws.amazon.com/blogs/architecture/ask-an-expert-sustainability/

In this first edition of Ask an Expert, we chat with Margaret O’Toole, Worldwide Tech Leader – Environmental Sustainability and Joseph Beer, Worldwide Tech Leader – Power and Utilities about sustainability solutions and tools to implement sustainability practices into IT design.

When putting together an AWS architecture to solve business problems specifically for sustainability-focused customers, what are some of the considerations?

A core idea of sustainability comes down to efficiency: how can we do the most work with the fewest number of resources? In this case, you want efficiency when you design and build the solution and also when you apply and operate it.

In broad strokes, there are two main things to consider. First, you want to optimize technology usage to reduce impact. Second, you want to find and use the best mix of technology to support sustainability. These objectives must also delight your customers, constituents, and stakeholders as you meet your business objectives in the most cost effective and expeditious way possible.

However, to be successful in combining technology and sustainability, you must consider the culture change of the sustainability transformation. Sustainability must become part of each person’s job function. When it comes to responsibility around sustainability at AWS, we think about it through two lenses.

First, we have the sustainability OF the AWS Cloud, which is our responsibility at AWS. This covers the work we do around purchasing renewable energy, operating efficiently, reducing water consumption in the data centers, and so on. There is more information on sustainability of the AWS Cloud on our sustainability page.

Then, there’s sustainability IN the cloud, which focuses on customers and their AWS usage. This is again focused on efficiency, mostly how to optimize existing patterns of user consumption, data access, software and development patterns, and hardware utilization.

In a related but slightly different vein, we also talk about sustainability THROUGH the cloud. This is how our customers use AWS to work on sustainability projects that help them meet their sustainability goals. This can include anything from carbon tracking or accounting to route optimization for fleets to using machine learning (ML) to reduce packaging and anything in between.

What are the general architecture pattern trends for sustainability in the cloud?

Solutions designed with sustainability in mind aim to be highly efficient. An architect wanting to optimize for sustainability looks for opportunities within user patterns, software patterns, development/test patterns, hardware patterns, and data patterns.

There is no one-size-fits-all way to optimize for sustainability, but the core themes are maximizing utilization and reducing waste or duplication. Most customers start with relatively easy things to accomplish. These typically include things like using the AWS Instance Scheduler to turn off compute when it will not be used or comparing cost and utilization reports to find hot spots to reduce utilization.
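
To make the first idea concrete, the snippet below is a minimal illustration of scheduled shutdown logic (it is not the AWS Instance Scheduler solution itself): stop running instances that carry an assumed Schedule tag, for example from a Lambda function triggered outside office hours.

```python
# Illustrative sketch only: stop tagged non-production instances on a schedule.
import boto3

ec2 = boto3.client("ec2")

reservations = ec2.describe_instances(
    Filters=[
        {"Name": "tag:Schedule", "Values": ["office-hours"]},   # assumed tag
        {"Name": "instance-state-name", "Values": ["running"]},
    ]
)["Reservations"]

instance_ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]
if instance_ids:
    ec2.stop_instances(InstanceIds=instance_ids)   # switch off idle capacity
```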

Another way to optimize for sustainability is to incorporate AWS Managed Services (AMS) as much as possible (many of these are also serverless). AMS not only increases the speed and efficiency of your design and build time and lowers your overhead to run, but it also includes automatic scaling as part of the service, which increases compute efficiency. Where AMS is not applicable, you can often configure automatic scaling into the solutions themselves. Automate everything, including your continuous integration and continuous delivery/deployment (CI/CD) code pipeline, data analytics and artificial intelligence (AI)/ML pipelines, and infrastructure builds where you are not using AMS.

And finally, include ongoing AWS Well-Architected reviews and continuously review and optimize your usage of AMS and the size and mix of your compute and storage in your standard operating procedures.

What are the key AWS-based sustainability solutions you are seeing customers ask for across industries and unique to specific industries?

Almost all industries have a set of shared challenges. This generally includes things like facilities or building management, design or optimization, and carbon tracking/footprinting. To help with this, customers must first understand the impact of their facilities, operations, or supply chain. Many customers use AWS services for ingestion, aggregation, and transformation of their real-world data. Once the data is collected and customers understand their relative impact, this data can be used to form models, which act as the basis for optimization. Technologies such as AWS IoT Core, Amazon Managed Blockchain, and AWS Lake Formation are crucial here.

For industries like power and utilities, there are more targeted solutions. Many of these are aimed at supporting the transition to electric vehicles (EVs). Smart EV charging, for example, uses the AWS Cloud and AI/ML to lessen the aggregate impact to the grid that may occur because of EV charging peaks and ramp ups. This helps avoid requiring natural gas at peak times. Amazon Forecast, a fully managed service that delivers highly accurate forecasts, can be useful in the case of short-term electric load forecasting. Grid voltage optimization is another solution that allows utilities to forecast usage requests and more accurately provide the desired voltage to their customers.

Within supply chains, customers use AWS to support traceability and carbon dashboarding to nudge suppliers toward greener energy. Customers commonly look for ways to track and trace throughout their supply chains, either to measure and reduce scope 3 emissions or to optimize their logistics network.

What’s your outlook for sustainability, and what role will the cloud play in future development efforts?

The cloud is critical to solving the sustainability challenges that businesses and governments face right now. It gives you the flexibility to use resources only when you need them, coupled with immense computing power. Thus, the cloud will be an essential tool in solving many data challenges, like reporting and measuring, and predicting and analyzing trends.

Migration to the cloud is essential to optimizing workloads and handling massive amounts of data. We can see this directly in how Boom used AWS HPC to support the creation of the world’s fastest and most sustainable aircraft. Additionally, FLSmidth is pursuing sustainable, technology-driven productivity under MissionZero. This initiative is working to achieve zero emissions and zero waste in cement production and mining by 2030 with the help of AWS high performance computing (HPC).

Do you see different trends in sustainability in the cloud versus on-premises?

The usage pattern is different. With the cloud you can use what you want, whenever you want, which allows customers to drive up utilization. This type of efficiency is critical. It’s why 451 Research found that the same task can be completed on AWS with an 88% lower carbon footprint compared to the average surveyed US enterprise data center.

The cloud offers technology that wouldn’t be available on premises, such as large GPU-backed instances capable of processing huge amounts of data in hours that would take weeks on premises. It can also ingest massive streams of data from energy- and resource-consuming and producing assets to optimize their performance and environmental impact in near-real-time.

With the cloud, you have the flexibility and the power to move quickly through research and development to solve sustainability challenges. You can accelerate the development process of new ideas and solutions, which will be essential for the transformation to a carbon neutral, climate positive economy.

A collection of posts to help you design and build sustainable cloud architecture

Post Syndicated from Bonnie McClure original https://aws.amazon.com/blogs/architecture/a-collection-of-posts-to-help-you-design-and-build-sustainable-cloud-architecture/

We’re celebrating Earth Day 2022 from 4/22 through 4/29 with posts that highlight how to build, maintain, and refine your workloads for sustainability.


A blog can be a great starting point for you in finding and implementing a particular solution; learning about new features, services, and products; keeping up with the latest trends and ideas; or even understanding and resolving a tricky problem. Today, as part of our Earth Day celebration, we’re showcasing blog posts that do just that and more.

Optimize AI/ML workloads for sustainability series

Training artificial intelligence (AI) services and machine learning (ML) workloads uses a lot of energy, but they are also one of the best tools we have to fight the effects of climate change. For example, we’ve used ML to help deliver food and pharmaceuticals safely and with much less waste, reduce the cost and risk involved in maintaining wind farms, restore at-risk ecosystems, and predict and understand extreme weather.

In this series of three blog posts, Benoit, Eddie, Dan, and Brendan provide guidance from the Sustainability Pillar of the AWS Well-Architected Framework to reduce the carbon footprint of your AI/ML workloads.

Machine learning lifecycle

Improve workload sustainability with services and features from re:Invent 2021

Creating a well-architected, sustainable workload is a continuous process. This blog post highlights services and features from re:Invent 2021 that will help you design and optimize your AWS workloads from a sustainability perspective.

Optimizing Your IoT Devices for Environmental Sustainability

To become more environmentally sustainable, customers commonly introduce Internet of Things (IoT) devices. These connected devices collect and analyze data from commercial buildings, factories, homes, cars, and other locations to measure, understand, and improve operational efficiency. However, you must consider their environmental impact when using these devices. They must be manufactured, shipped, and installed; they consume energy during operations; and they must eventually be disposed of. They are also a challenge to maintain—an expert may need physical access to the device to diagnose issues and update it. This post considers device properties that influence an IoT device’s footprint throughout its lifecycle and shows you how Amazon Web Services (AWS) IoT services can help.

Let’s Architect! Architecting for Sustainability

This first post in the Let’s Architect! series gathers content to help software architects and tech leaders explore new ideas, case studies, and technical approaches. Luca, Laura, Vittori, and Zamira provide materials to help you address sustainability challenges and design sustainable architectures.

Optimizing your AWS Infrastructure for Sustainability Series

As organizations align their business with sustainable practices, it is important to review every functional area. If you’re building, deploying, and maintaining an IT stack, improving its environmental impact requires informed decision making.

This three-part blog series provides strategies to optimize your AWS architecture within compute, storage, and networking.

The shared responsibility model for sustainability shows how it is a shared responsibility between AWS and customers

IBM Hackathon Produces Innovative Sustainability Solutions on AWS

Our consulting partner, IBM, organized the “Sustainability Applied 2021” hackathon in September 2021. This three-day event aimed to generate new ideas, create reference architecture patterns using AWS Cloud services, and produce sustainable solutions.

This post highlights four reference architectures that were developed during the hackathon. These architectures show you how IBM hack teams used AWS services to resolve real-world sustainability challenges. We hope these architectures inspire you to help address other sustainability challenges with your own solutions.

Improve workload sustainability with services and features from re:Invent 2021

Post Syndicated from Ernesto Bethencourt Plaz original https://aws.amazon.com/blogs/architecture/improve-workload-sustainability-with-services-and-features-from-reinvent-2021/

At our recent annual AWS re:Invent 2021 conference, we had important announcements regarding sustainability, including the new Sustainability Pillar for AWS Well-Architected Framework and the AWS Customer Carbon Footprint Tool.

In this blog post, I highlight services and features from these announcements to help you design and optimize your AWS workloads from a sustainability perspective.

Architecting for sustainability basics

Environmental sustainability is a shared responsibility between customers and AWS. We maintain sustainability of the cloud by delivering efficient, shared infrastructure. As a customer, you maintain sustainability in the cloud. This means you optimize your workload to efficiently use resources.

Shared responsibility model for sustainability

Figure 1. Shared responsibility model for sustainability

The Sustainability Pillar of the Well-Architected Framework provides the following design principles that you can use to optimize your workload for sustainability in the cloud:

  • Understand your impact.
  • Establish sustainability goals.
  • Maximize utilization.
  • Anticipate and adopt new, more efficient hardware and software offerings.
  • Use managed services.
  • Reduce the downstream impact of your cloud workloads.

In the next sections, I share services and features announced at re:Invent 2021 that relate to these design principles, along with recommendations on how they can help you design and operate your workloads more sustainably.

Understand your impact

Measure the environmental impact of your workloads and propose optimizations to meet your sustainability goals.

Maximize utilization

Reduce the total energy required to power your workloads by right-sizing them for high utilization and by eliminating or minimizing idle resources.

Optimize your storage

Optimize data movement

Moving data across networks consumes energy and adds to your overall costs. Optimize the way you access and move data across networks:

  • Use AWS Direct Connect SiteLink to connect an on-premises network directly through the AWS global network. This helps your data travel more efficiently (across the shortest path) rather than over the public internet.
  • Migrate tape backups to the cloud with AWS Snow Family offline tape data migration. This helps you eliminate physical tapes and store your virtual tapes in cold storage in the cloud.
  • Automakers that build connected cars generate large volumes of data. Use AWS IoT FleetWise to reduce the amount of unnecessary data transferred to the cloud.

Optimize your processing

Minimize the resources you use, and increase the utilization of those you do use to run your workloads.

Anticipate and adopt new, more efficient hardware and software offerings

Rapid adoption of new, more efficient technologies helps you reduce the impact of your workloads.

Adopt more efficient instances

Use more efficient software offerings

Use managed services

We launched several managed services that shift the responsibility of sustainability optimization to AWS.

Reduce the downstream impact of your cloud workloads

  • Use the AWS SDK for Rust to benefit from the native energy efficiency of the Rust programming language.
  • Use Amazon CloudWatch RUM to understand the performance of your application and use that information to make it more efficient.
  • Review your carbon emissions with the new AWS Customer Carbon Footprint Tool. This helps you define your sustainability key performance indicators (KPIs), optimize your workloads for sustainability, and improve your KPIs.

Conclusion

Having a well-architected, sustainable workload is a continuous process. This blog post brings you AWS announcements from a sustainability perspective. I encourage you to review your workloads and identify which of these announcements you can adopt.

Ready to get started? I encourage you to check our What’s New blog for announcements and review them from a sustainability point of view to identify which ones can help you improve and meet your sustainability goals.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Related information

Optimize AI/ML workloads for sustainability: Part 2, model development

Post Syndicated from Benoit de Chateauvieux original https://aws.amazon.com/blogs/architecture/optimize-ai-ml-workloads-for-sustainability-part-2-model-development/

More complexity often means using more energy, and machine learning (ML) models are becoming bigger and more complex. And though ML hardware is getting more efficient, the energy required to train these ML models is increasing sharply.

In this series, we’re following the phases of the Well-Architected machine learning lifecycle (Figure 1) to optimize your artificial intelligence (AI)/ML workloads. In Part 2, we examine the model development phase and show you how to train, tune, and evaluate your ML model to help you reduce your carbon footprint.


If you missed the first part of this series, we showed you how to examine your workload to help you 1) evaluate the impact of your workload, 2) identify alternatives to training your own model, and 3) optimize data processing.


ML lifecycle

Figure 1. ML lifecycle

Model building

Define acceptable performance criteria

When you build an ML model, you’ll likely need to make trade-offs between your model’s accuracy and its carbon footprint. When we focus only on the model’s accuracy, we “ignore the economic, environmental, or social cost of reaching the reported accuracy.” Because the relationship between model accuracy and complexity is at best logarithmic, training a model longer or looking for better hyperparameters only leads to a small increase in performance.

Establish performance criteria that support your sustainability goals while meeting your business requirements, not exceeding them.

Select energy-efficient algorithms

Begin with a simple algorithm to establish a baseline. Then, test different algorithms with increasing complexity to observe whether performance has improved. If so, compare the performance gain against the difference in resources required.

Try to find simplified versions of algorithms. This will help you use fewer resources to achieve a similar outcome. For example, DistilBERT, a distilled version of BERT, has 40% fewer parameters, runs 60% faster, and preserves 97% of BERT’s performance.

Use pre-trained or partially pre-trained models

Consider techniques to avoid training a model from scratch:

  • Transfer Learning: Use a pre-trained source model and reuse it as the starting point for a second task. For example, a model trained on ImageNet (14 million images) can generalize to other datasets.
  • Incremental Training: Use artifacts from an existing model on an expanded dataset to train a new model.

Optimize your deep learning models to accelerate training

Compile your DL models from their high-level language representation to hardware-optimized instructions to reduce training time. You can achieve this with open-source compilers or Amazon SageMaker Training Compiler, which can speed up training of DL models by up to 50% by more efficiently using SageMaker GPU instances.
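As an illustration, the following sketch enables the Training Compiler through the SageMaker Python SDK. The training script, bucket, and framework versions are placeholders; check the Training Compiler documentation for the versions it currently supports.

```python
import sagemaker
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

role = sagemaker.get_execution_role()

# Hypothetical Hugging Face training job with the Training Compiler enabled
estimator = HuggingFace(
    entry_point="train.py",                    # placeholder training script
    role=role,
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    transformers_version="4.11",               # example versions; verify current support
    pytorch_version="1.9",
    py_version="py38",
    compiler_config=TrainingCompilerConfig(),  # compile the model for the target GPU
)
estimator.fit({"train": "s3://my-bucket/train/"})  # placeholder dataset location
```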

Start with small experiments, datasets, and compute resources

Experiment with smaller datasets in your development notebook. This allows you to iterate quickly with limited carbon emissions.

Automate the ML environment

When building your model, use Lifecycle Configuration Scripts to automatically stop idle SageMaker Notebook instances. If you are using SageMaker Studio, install the auto-shutdown Jupyter extension to detect and stop idle resources.

Use the fully managed training process provided by SageMaker to automatically launch training instances and shut them down as soon as the training job is complete. This minimizes idle compute resources and thus limits the environmental impact of your training job.
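For example, a minimal sketch with the SageMaker Python SDK (bucket names are placeholders) launches a managed training job; the training instance exists only for the duration of the job.

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.image_uris import retrieve

session = sagemaker.Session()
role = sagemaker.get_execution_role()

# Built-in XGBoost container image for the current Region
xgboost_image = retrieve("xgboost", region=session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=xgboost_image,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://my-bucket/model-artifacts/",   # placeholder output bucket
    sagemaker_session=session,
)

# The training instance is provisioned when fit() starts and terminated
# automatically when the job completes
estimator.fit({"train": "s3://my-bucket/train/"})    # placeholder dataset location
```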

Adopt a serverless architecture for your MLOps pipelines. For example, orchestration tools like AWS Step Functions or SageMaker Pipelines only provision resources when work needs to be done. This way, you’re not maintaining compute infrastructure 24/7.

Model training

Select sustainable AWS Regions

As mentioned in Part 1, select an AWS Region with sustainable energy sources. When regulations and legal aspects allow, choose Regions near Amazon renewable energy projects and Regions where the grid has low published carbon intensity to train your model.

Use a debugger

A debugger like SageMaker Debugger can identify training problems like system bottlenecks, overfitting, saturated activation functions, and under-utilization of system resources. It also provides built-in rules like LowGPUUtilization or Overfit. These rules monitor your workload and will automatically stop a training job as soon as it detects a bug (Figure 2), which helps you avoid unnecessary carbon emissions.

Automatically stop buggy training jobs with SageMaker Debugger

Figure 2. Automatically stop buggy training jobs with SageMaker Debugger
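A minimal sketch of this setup with the SageMaker Python SDK follows; it assumes the image and role placeholders from the earlier sketch and attaches built-in Debugger rules with a stop-training action.

```python
from sagemaker.debugger import Rule, rule_configs
from sagemaker.estimator import Estimator

# Built-in rules that stop the training job automatically when they trigger
stop_on_issue = rule_configs.ActionList(rule_configs.StopTraining())
rules = [
    Rule.sagemaker(rule_configs.low_gpu_utilization(), actions=stop_on_issue),
    Rule.sagemaker(rule_configs.overfit(), actions=stop_on_issue),
]

estimator = Estimator(
    image_uri=xgboost_image,         # image and role as in the earlier sketch
    role=role,
    instance_count=1,
    instance_type="ml.p3.2xlarge",
    rules=rules,                     # Debugger evaluates the rules during training
)
estimator.fit({"train": "s3://my-bucket/train/"})
```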

Optimize the resources of your training environment

Reference the recommended instance types for the algorithm you’ve selected in the SageMaker documentation. For example, for DeepAR, you should start with a single CPU instance and only switch to GPU and multiple instances when necessary.

Right size your training jobs with Amazon CloudWatch metrics that monitor the utilization of resources like CPU, GPU, memory, and disk.

Consider Managed Spot Training, which takes advantage of unused Amazon Elastic Compute Cloud (Amazon EC2) capacity and can save you up to 90% in cost compared to On-Demand instances. By shaping your demand for the existing supply of EC2 instance capacity, you will improve your overall resource efficiency and reduce idle capacity of the overall AWS Cloud.
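As a sketch (reusing the image and role placeholders from the earlier examples), Managed Spot Training is enabled with a few estimator parameters; checkpointing lets the job resume after Spot interruptions.

```python
from sagemaker.estimator import Estimator

estimator = Estimator(
    image_uri=xgboost_image,            # as in the earlier sketches
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    use_spot_instances=True,            # run on spare EC2 capacity
    max_run=3600,                       # maximum training time, in seconds
    max_wait=7200,                      # maximum wait for Spot capacity; must be >= max_run
    checkpoint_s3_uri="s3://my-bucket/checkpoints/",  # placeholder; enables resume after interruption
)
estimator.fit({"train": "s3://my-bucket/train/"})
```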

Use efficient silicon

Use AWS Trainium, which is optimized for DL training workloads. It is expected to be the most energy-efficient AWS processor for this purpose.

Archive or delete unnecessary training artifacts

Organize your ML experiments with SageMaker Experiments to clean up training resources you no longer need.

Reduce the volume of logs you keep. By default, CloudWatch retains logs indefinitely. By setting limited retention time for your notebooks and training logs, you’ll avoid the carbon footprint of unnecessary log storage.
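For example, you might set a 30-day retention policy on your SageMaker log groups with a short boto3 script; the log group names below are common defaults, so verify the names used in your account.

```python
import boto3

logs = boto3.client("logs")

# Apply a 30-day retention policy instead of keeping logs forever
for log_group in ["/aws/sagemaker/NotebookInstances", "/aws/sagemaker/TrainingJobs"]:
    logs.put_retention_policy(logGroupName=log_group, retentionInDays=30)
```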

Model tuning and evaluation

Use efficient cross-validation techniques for hyperparameter optimization

Prefer Bayesian search over random search (and avoid grid search). Bayesian search makes intelligent guesses about the next set of parameters to pick based on the prior set of trials. It typically requires 10 times fewer jobs than random search, and thus 10 times less compute resources, to find the best hyperparameters.

Limit the maximum number of concurrent training jobs. Running hyperparameter tuning jobs concurrently gets more work done quickly. However, a tuning job improves only through successive rounds of experiments. Typically, running one training job at a time achieves the best results with the least amount of compute resources.

Carefully choose the number of hyperparameters and their ranges. You get better results and use less compute resources by limiting your search to a few parameters and small ranges of values. If you know that a hyperparameter is log-scaled, convert it to further improve the optimization.
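The sketch below pulls these recommendations together with the SageMaker Python SDK: a Bayesian tuner, one training job at a time, a small search space, and a log-scaled range. The estimator, metric name, and buckets are placeholders carried over from the earlier sketches.

```python
from sagemaker.tuner import ContinuousParameter, HyperparameterTuner, IntegerParameter

# Few hyperparameters, narrow ranges, and log scaling for values spanning orders of magnitude
hyperparameter_ranges = {
    "eta": ContinuousParameter(0.01, 0.3, scaling_type="Logarithmic"),
    "num_round": IntegerParameter(50, 200),
}

tuner = HyperparameterTuner(
    estimator=estimator,                      # estimator as in the earlier sketches
    objective_metric_name="validation:auc",   # example metric; adjust for your algorithm
    hyperparameter_ranges=hyperparameter_ranges,
    strategy="Bayesian",                      # the default strategy; avoids exhaustive grid search
    max_jobs=20,
    max_parallel_jobs=1,                      # sequential rounds let the search learn from prior trials
)
tuner.fit({
    "train": "s3://my-bucket/train/",
    "validation": "s3://my-bucket/validation/",
})
```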

Use warm-start hyperparameter tuning

Use warm-start to leverage the learning gathered in previous tuning jobs to inform which combinations of hyperparameters to search over in the new tuning job. This technique avoids restarting hyperparameter optimization jobs from scratch and thus reduces the compute resources needed.
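A brief sketch of a warm-started tuning job follows, assuming the estimator and hyperparameter ranges from the previous sketch and a hypothetical parent tuning job name.

```python
from sagemaker.tuner import HyperparameterTuner, WarmStartConfig, WarmStartTypes

warm_start_config = WarmStartConfig(
    warm_start_type=WarmStartTypes.IDENTICAL_DATA_AND_ALGORITHM,
    parents={"my-previous-tuning-job"},        # hypothetical parent tuning job
)

tuner = HyperparameterTuner(
    estimator=estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges=hyperparameter_ranges,   # as in the earlier sketch
    warm_start_config=warm_start_config,           # reuse learning from the parent job
    max_jobs=10,
    max_parallel_jobs=1,
)
```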

Measure results and improve

To monitor and quantify improvements of your training jobs, track the following metrics:

For storage:

Conclusion

In this blog post, we discussed techniques and best practices to reduce the energy required to build, train, and evaluate your ML models.

We also provided recommendations for the tuning process as it makes up a large part of the carbon impact of building an ML model. During hyperparameter and neural design search, hundreds of versions of a given model are created, trained, and evaluated before identifying an optimal design.

In the next post, we’ll continue our sustainability journey through the ML lifecycle and discuss the best practices you can follow when deploying and monitoring your model in production.

Want to learn more? Check out the Sustainability Pillar of the AWS Well-Architected Framework, the Architecting for sustainability session at re:Invent 2021, and other blog posts on architecting for sustainability.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Other posts in this series

New – Customer Carbon Footprint Tool

Post Syndicated from Jeff Barr original https://aws.amazon.com/blogs/aws/new-customer-carbon-footprint-tool/

Carbon is the fourth-most abundant element in the universe, and is also a primary component of all known life on Earth. When combined with oxygen it creates carbon dioxide (CO2). Many industrial activities, including the burning of fossil fuels such as coal and oil, release CO2 into the atmosphere and cause climate change.

As part of Amazon’s efforts to increase sustainability and reduce carbon emissions, we co-founded The Climate Pledge in 2019. Along with the 216 other signatories to the Pledge, we are committed to reaching net-zero carbon by 2040, 10 years ahead of the Paris Agreement. We are driving carbon out of our business in a multitude of ways, as detailed on our Carbon Footprint page. When I share this information with AWS customers, they respond positively. They now understand that running their applications in the AWS Cloud can help them to lower their carbon footprint by 88% (when compared to the enterprise data centers that were surveyed), as detailed in The Carbon Reduction Opportunity of Moving to Amazon Web Services, published by 451 Research.

In addition to our efforts, organizations in many industries are working to set sustainability goals and to make commitments to reach them. To measure progress toward these goals, they are implementing systems and building applications to measure and monitor their carbon emissions data.

Customer Carbon Footprint Tool
After I share information about our efforts to decarbonize with our customers, they tell me that their organization is on a similar path, and that they need to know more about the carbon footprint of their cloud infrastructure. Today I am happy to announce the new Customer Carbon Footprint Tool. This tool will help you to meet your own sustainability goals, and is available to all AWS customers at no cost. To access the calculator, I open the AWS Billing Console and click Cost & Usage Reports:

Then I scroll down to Customer Carbon Footprint Tool and review the report:

Let’s review each section. The first one allows me to select a time period with month-level granularity, and shows my carbon emissions in summary, geographic, and per-service form. In all cases, emissions are in Metric Tons of Carbon Dioxide Equivalent, abbreviated as MTCO2e:

All of the values in this section reflect the selected time period. In this example (all of which is sample data), my AWS resources emitted an estimated 0.3 MTCO2e from June to August of 2021. If I had run the same application in my own facilities instead of in the AWS Cloud, I would have used an additional 0.9 MTCO2e. Of this value, 0.7 MTCO2e was saved due to renewable energy purchases made by AWS, and an additional 0.2 MTCO2e was saved because AWS uses resources more efficiently.

I can also see my emissions by geography (all in America for this time period), and by AWS service in this section.

The second section shows my carbon emission statistics on a monthly, quarterly, or annual basis:

The third and final section projects how the AWS path to 100% renewable energy for our data centers will have a positive effect on my carbon emissions over time:

If you are an AWS customer, then you are already benefiting from our efforts to decarbonize and to reach 100% renewable energy usage by 2025, five years ahead of our original target.

You should also take advantage of the new Sustainability Pillar of AWS Well-Architected. This pillar contains six design principles for sustainability in the cloud, and will show you how to understand impact and to get the best utilization from the minimal number of necessary resources, while also reducing downstream impacts.

Things to Know
Here are a couple of important facts to keep in mind:

Regions – The emissions displayed reflect your AWS usage in all commercial AWS regions.

Timing – Emissions are calculated monthly. However, there is a three month delay due to the underlying billing cycle of the electric utilities that supply us with power.

Scope – The calculator shows Scope 1 and Scope 2 emissions, as defined here.

Jeff;

Optimize AI/ML workloads for sustainability: Part 1, identify business goals, validate ML use, and process data

Post Syndicated from Benoit de Chateauvieux original https://aws.amazon.com/blogs/architecture/optimize-ai-ml-workloads-for-sustainability-part-1-identify-business-goals-validate-ml-use-and-process-data/

Training artificial intelligence (AI) and machine learning (ML) workloads uses a lot of energy—and these workloads are becoming bigger and more complex. As an example, the Carbontracker: Tracking and Predicting the Carbon Footprint of Training Deep Learning Models study estimates that a single training session for a language model like GPT-3 can have a carbon footprint similar to traveling 703,808 kilometers by car.

Although ML uses a lot of energy, it is also one of the best tools we have to fight the effects of climate change. For example, we’ve used ML to help deliver food and pharmaceuticals safely and with much less waste, reduce the cost and risk involved in maintaining wind farms, restore at-risk ecosystems, and predict and understand extreme weather.

In this series of three blog posts, we’ll provide guidance from the Sustainability Pillar of the AWS Well-Architected Framework to reduce the carbon footprint of your AI/ML workloads.

This first post follows the first three phases provided in the Well-Architected machine learning lifecycle (Figure 1):

  • Business goal identification
  • ML problem framing
  • Data processing (data collection, data preprocessing, feature engineering)

You’ll learn best practices for each phase to help you review and refine your workloads to maximize utilization and minimize both waste and the total resources deployed and powered to support your workload.

ML lifecycle

Figure 1. ML lifecycle

Business goal identification

Define the overall environmental impact or benefit

Measure your workload’s impact and its contribution to the overall sustainability goals of the organization. Questions you should ask:

  • How does this workload support our overall sustainability mission?
  • How much data will we have to store and process? What is the impact of training the model? How often will we have to re-train?
  • What are the impacts resulting from customer use of this workload?
  • What will be the productive output compared with this total impact?

Asking these questions will help you establish specific sustainability objectives and success criteria to measure against in the future.

ML problem framing

Identify if ML is the right solution

Always ask if AI/ML is right for your workload. There is no need to use computationally intensive AI when a simpler, more sustainable approach might succeed just as well.

For example, using ML to route Internet of Things (IoT) messages may be unwarranted; you can express the logic with a Rules Engine.

Consider AI services and pre-trained models 

Once you decide if AI/ML is the right tool, consider whether the workload needs to be developed as a custom model.

Many workloads can use the managed AWS AI services shown in Figure 2. Using these services means that you won’t need the associated resources to collect/store/process data and to prepare/train/tune/deploy an ML model.

Managed AWS AI services

Figure 2. Managed AWS AI services

If adopting a fully managed AI service is not appropriate, evaluate if you can use pre-existing datasets, algorithms, or models. AWS Marketplace offers over 1,400 ML-related assets that customers can subscribe to. You can also fine-tune an existing model starting from a pre-trained model, like those available on Hugging Face. Using pre-trained models from third parties can reduce the resources you need for data preparation and model training.

Select sustainable Regions

Select an AWS Region with sustainable energy sources. When regulations and legal aspects allow, choose Regions near Amazon renewable energy projects and Regions where the grid has low published carbon intensity to host your data and workloads.

Data processing (data collection, data preprocessing, feature engineering)

Avoid datasets and processing duplication

Evaluate if you can avoid data processing by using existing publicly available datasets from sources such as AWS Data Exchange and Open Data on AWS (which includes the Amazon Sustainability Data Initiative). They offer weather and climate datasets, satellite imagery, air quality or energy data, among others. Using these curated datasets avoids duplicating the compute and storage resources needed to download the data from the providers, store it in the cloud, and organize and clean it.

For internal data, you can also reduce duplication and reruns of feature engineering code across teams and projects by using a feature store, such as Amazon SageMaker Feature Store.

Once your data is ready for training, use pipe input mode to stream it from Amazon Simple Storage Service (Amazon S3) instead of copying it to Amazon Elastic Block Store (Amazon EBS). This way, you can reduce the size of your EBS volumes.
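For example, with the SageMaker Python SDK you can request Pipe mode on a training channel; the estimator, bucket, and content type below are placeholders.

```python
from sagemaker.inputs import TrainingInput

train_input = TrainingInput(
    s3_data="s3://my-bucket/train/",   # placeholder dataset location
    input_mode="Pipe",                 # stream from S3 instead of copying to an EBS volume
    content_type="text/csv",
)
estimator.fit({"train": train_input})  # estimator stands in for your own SageMaker estimator
```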

Minimize idle resources with serverless data pipelines

Adopt a serverless architecture for your data pipeline so it only provisions resources when work needs to be done. For example, when you use AWS Glue and AWS Step Functions for data ingestion and preprocessing, you are not maintaining compute infrastructure 24/7. As shown in Figure 3, Step Functions can orchestrate AWS Glue jobs to create event-based serverless ETL/ELT pipelines.

Orchestrating data preparation with AWS Glue and Step Functions

Figure 3. Orchestrating data preparation with AWS Glue and Step Functions
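A minimal sketch of such a pipeline follows, using boto3 to register a Step Functions state machine that runs two hypothetical AWS Glue jobs synchronously; the job names, role, and account ID are placeholders.

```python
import json
import boto3

sfn = boto3.client("stepfunctions")

# Two Glue jobs chained with the synchronous startJobRun integration;
# no compute runs between pipeline executions
definition = {
    "StartAt": "PreprocessData",
    "States": {
        "PreprocessData": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "preprocess-job"},           # placeholder Glue job
            "Next": "FeatureEngineering",
        },
        "FeatureEngineering": {
            "Type": "Task",
            "Resource": "arn:aws:states:::glue:startJobRun.sync",
            "Parameters": {"JobName": "feature-engineering-job"},  # placeholder Glue job
            "End": True,
        },
    },
}

sfn.create_state_machine(
    name="serverless-data-prep",
    definition=json.dumps(definition),
    roleArn="arn:aws:iam::123456789012:role/StepFunctionsGlueRole",  # placeholder role
)
```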

Implement data lifecycle policies aligned with your sustainability goals

Classify data to understand its significance to your workload and your business outcomes. Use this information to determine when you can move data to more energy-efficient storage or safely delete it.

Manage the lifecycle of all your data and automatically enforce deletion timelines to minimize the total storage requirements of your workload using Amazon S3 Lifecycle policies. The Amazon S3 Intelligent-Tiering storage class will automatically move your data to the most sustainable access tier when access patterns change.

Define data retention periods that support your sustainability goals while meeting your business requirements, not exceeding them.
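As an illustration, a boto3 sketch (bucket, prefix, and timelines are placeholders) could transition raw data to more energy-efficient tiers and expire it when the retention period ends.

```python
import boto3

s3 = boto3.client("s3")

s3.put_bucket_lifecycle_configuration(
    Bucket="my-training-data-bucket",               # placeholder bucket
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "tier-then-expire-raw-data",
                "Filter": {"Prefix": "raw/"},       # placeholder prefix
                "Status": "Enabled",
                "Transitions": [
                    {"Days": 30, "StorageClass": "INTELLIGENT_TIERING"},
                    {"Days": 180, "StorageClass": "GLACIER"},
                ],
                "Expiration": {"Days": 365},        # align with your retention requirements
            }
        ]
    },
)
```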

Adopt sustainable storage options

Use the appropriate storage tier to reduce the carbon impact of your workload. On Amazon S3, for example, you can use energy-efficient, archival-class storage for infrequently accessed data, as shown in Figure 4. And if you can easily recreate an infrequently accessed dataset, use the Amazon S3 One Zone-IA class to reduce its carbon footprint by 3x or more.

Data access patterns for Amazon S3

Figure 4. Data access patterns for Amazon S3

Don’t over-provision block storage for notebooks and use object storage services like Amazon S3 for common datasets.

Tip: You can check the free disk space on your SageMaker Notebooks using !df -h.

Select efficient file formats and compression algorithms 

Use efficient file formats such as Parquet or ORC to train your models. Compared to CSV, they can help you reduce your storage by up to 87%.

Migrating to a more efficient compression algorithm can also greatly contribute to your storage reduction efforts. For example, Zstandard produces 10–15% smaller files than Gzip at the same compression speed. Some SageMaker built-in algorithms accept x-recordio-protobuf input, which can be streamed directly from Amazon S3 instead of being copied to a notebook instance.
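As a small example (assuming pandas with pyarrow installed and a placeholder CSV file), converting a dataset to Parquet with Zstandard compression takes a single call.

```python
import pandas as pd  # requires pyarrow for Parquet and Zstandard support

df = pd.read_csv("training_data.csv")   # placeholder CSV source

# Columnar storage plus a modern codec typically shrinks files considerably
# compared with plain or gzipped CSV
df.to_parquet("training_data.parquet", compression="zstd", index=False)
```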

Minimize data movement across networks

Compress your data before moving it over the network.

Minimize data movement across networks when selecting a Region; store your data close to your producers and train your models close to your data.

Measure results and improve

To monitor and quantify improvements, track the following metrics:

  • Total size of your S3 buckets and storage class distribution, using Amazon S3 Storage Lens
  • DiskUtilization metric of your SageMaker processing jobs
  • StorageBytes metric of your SageMaker Studio shared storage volume

Conclusion

In this blog post, we discussed the importance of defining the overall environmental impact or benefit of your ML workload and why managed AI services or pre-trained ML models are sustainable alternatives to custom models. You also learned best practices to reduce the carbon footprint of your ML workload in the data processing phase.

In the next post, we will continue our sustainability journey through the ML lifecycle and discuss the best practices you can follow in the model development phase.

Want to learn more? Check out the Sustainability Pillar of the AWS Well-Architected Framework, the Architecting for sustainability session at re:Invent 2021, and other blog posts on architecting for sustainability.

Looking for more architecture content? AWS Architecture Center provides reference architecture diagrams, vetted architecture solutions, Well-Architected best practices, patterns, icons, and more!

Optimizing Your IoT Devices for Environmental Sustainability

Post Syndicated from Jonas Bürkel original https://aws.amazon.com/blogs/architecture/optimizing-your-iot-devices-for-environmental-sustainability/

To become more environmentally sustainable, customers commonly introduce Internet of Things (IoT) devices. These connected devices collect and analyze data from commercial buildings, factories, homes, cars, and other locations to measure, understand, and improve operational efficiency. (There will be an estimated 24.1 billion active IoT devices by 2030 according to Transforma Insights.)

IoT devices offer several efficiencies. However, you must consider their environmental impact when using them. Devices must be manufactured, shipped, and installed; they consume energy during operations; and they must eventually be disposed of. They are also a challenge to maintain—an expert may need physical access to the device to diagnose issues and update it. This is especially true for smaller and cheaper devices, because extended device support and ongoing enhancements are often not economically feasible, which results in more frequent device replacements.

When architecting a solution to tackle operational efficiency challenges with IoT, consider the devices’ impact on environmental sustainability. Think critically about the impact of the devices you deploy and work to minimize their overall carbon footprint. This post considers device properties that influence an IoT device’s footprint throughout its lifecycle and shows you how Amazon Web Services (AWS) IoT services can help.

Architect for lean, efficient, and durable devices

So which device properties contribute towards minimizing environmental impact?

  • Lean devices use just the right amount of resources to do their job. They are designed, equipped, and built to use fewer resources, which reduces the impact of manufacturing and disposing of them as well as their energy consumption. For example, electronic devices like smartphones use rare-earth metals in many of their components. These materials impact the environment when mined and disposed of. By reducing the amount of these materials used in your design, you can move towards being more sustainable.
  • Efficient devices lower their operational impact by using up-to-date and secure software and enhancements to code and data handling.
  • Durable devices remain in the field for a long time and still provide their intended function and value. They can adapt to changing business requirements and are able to recover from operational failure. The longer a device functions, the lower its carbon footprint will be, because the impact of manufacturing, shipping, installing, and disposing of it is spread over a longer period of use.

In summary, deploy devices that efficiently use resources to bring business value for as long as possible. Finding the right tradeoff for your requirements allows you to improve operational efficiency while also maximizing your benefit to environmental sustainability.

High-level sustainable IoT architecture

Figure 1 shows building blocks that support sustainable device properties. Their main capabilities are:

  • Enabling remote device management
  • Allowing over-the-air (OTA) updates
  • Integrating with cloud services to access further processing capabilities while ensuring security of devices and data, at rest and in transit

Generic architecture for sustainable IoT devices

Figure 1. Generic architecture for sustainable IoT devices

Introducing AWS IoT Core and AWS IoT Greengrass to your architecture

Assuming you have an at least partially connected environment, the capabilities outlined in Figure 1 can be achieved by using mainly two AWS IoT services:

  • AWS IoT Core is a managed cloud platform that lets connected devices easily and securely interact with cloud applications and other devices.
  • AWS IoT Greengrass is an IoT open-source edge runtime and cloud service that helps you build, deploy, and manage device software.

Figure 2 shows how the building blocks introduced in Figure 1 translate to AWS IoT services.

AWS architecture for sustainable IoT devices

Figure 2. AWS architecture for sustainable IoT devices

Optimize your IoT devices for leanness and efficiency with AWS IoT Core

AWS IoT Core securely integrates IoT devices with other devices and the cloud. It allows devices to publish and subscribe to data in the cloud using device communication protocols. You can use this functionality to create event-driven data processing flows that can be integrated with additional services. For example, you can run machine learning inference, perform analytics, or interact with applications running on AWS.
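For illustration, the sketch below publishes a small, aggregated telemetry message with the AWS IoT Device SDK v2 for Python; the endpoint, certificate paths, client ID, and topic are placeholders.

```python
import json
from awscrt import mqtt
from awsiot import mqtt_connection_builder

# Mutual TLS connection to AWS IoT Core (all identifiers are placeholders)
connection = mqtt_connection_builder.mtls_from_path(
    endpoint="xxxxxxxxxxxx-ats.iot.eu-west-1.amazonaws.com",
    cert_filepath="device.pem.crt",
    pri_key_filepath="private.pem.key",
    ca_filepath="AmazonRootCA1.pem",
    client_id="building-sensor-01",
    clean_session=False,
    keep_alive_secs=30,
)
connection.connect().result()

# Publish an aggregated reading rather than raw high-frequency data
connection.publish(
    topic="buildings/floor1/energy",
    payload=json.dumps({"kwh": 1.42, "period_min": 15}),
    qos=mqtt.QoS.AT_LEAST_ONCE,
)
```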

According to a 451 Research report published in 2019, AWS can perform the same compute task with an 88% lower carbon footprint compared to the median of surveyed US enterprise data centers. More than two-thirds of this carbon reduction is attributable to more efficient servers and a higher server utilization. In 2021, 451 Research published similar reports for data centers in Asia Pacific and Europe.

AWS IoT Core offers this higher utilization and efficiency to edge devices in the following ways:

  • Non-latency-critical, resource-intensive tasks can be run in the cloud, where they can use managed services and be decommissioned when not in use.
  • Having less code on IoT devices also reduces maintenance efforts and the attack surface, while making it simpler to architect their software components for efficiency.
  • From a security perspective, AWS IoT Core protects and governs data exchange with the cloud in a central place. Each connected device must be credentialed to interact with AWS IoT. All traffic to and from AWS IoT is sent securely using Transport Layer Security (TLS) mutual authentication protocols. Services like AWS IoT Device Defender are available to analyze, audit, and monitor connected fleets of devices and cloud resources in AWS IoT at scale to detect abnormal behavior and mitigate security risks.

Customer Application:
Tibber, a Nordic energy startup, uses AWS IoT Core to securely exchange billions of messages per month about their clients’ real-time energy usage, and to aggregate data and perform analytics centrally. This allows them to keep their smart appliances lean and efficient while gaining access to scalable and more sustainable data processing capabilities.


Ensure device durability and longevity with AWS IoT Greengrass

Tasks like interacting with sensors or latency-critical computation must remain local. AWS IoT Greengrass, an edge runtime and cloud service, securely manages devices and device software, thereby enabling remote maintenance and secure OTA updates. It builds upon and extends the capabilities of AWS IoT Core and AWS IoT Device Management, which securely registers, organizes, monitors, and manages IoT devices.

AWS IoT Greengrass brings offline capabilities and simplifies the definition and distribution of business logic across Greengrass core devices. This allows for OTA updates of this business logic as well as the AWS IoT Greengrass Core software itself.

This is a distinctly different approach from what device manufacturers did in the past. Devices no longer need to be designed to run all code for one immutable purpose. Instead, they can be built to be flexible for potential future use cases, which ensures that business logic can be dynamically tweaked, maintained, and troubleshot remotely when needed.

AWS IoT Greengrass does this using components. Components can represent applications, runtime installers, libraries, or any other code that you would run on a device; they are distributed and managed through AWS IoT. Multiple AWS-provided components as well as the recently launched Greengrass Software Catalog extend the edge runtime’s default capabilities. The secure tunneling component, for example, establishes secure bidirectional communication with a Greengrass core device that is behind restricted firewalls, which can then be used for remote assistance and troubleshooting over SSH.
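As a rough sketch, an over-the-air rollout of a new component version to a group of devices can be triggered with boto3; the thing group, component name, and version are placeholders.

```python
import boto3

greengrass = boto3.client("greengrassv2")

# Deploy a new version of a custom component to a thing group over the air
greengrass.create_deployment(
    targetArn="arn:aws:iot:eu-west-1:123456789012:thinggroup/building-sensors",  # placeholder thing group
    deploymentName="update-telemetry-app",
    components={
        "com.example.TelemetryApp": {"componentVersion": "1.1.0"},  # placeholder component
    },
)
```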

Conclusion

Historically, IoT devices were designed to stably and reliably serve one predefined purpose and were equipped for peak resource usage. However, as discussed in this post, to be sustainable, devices must now be lean, efficient, and durable. They must be manufactured, shipped, and installed once. From there, they should be able to be used flexibly for a long time. This way, they will consume less energy. Their smaller resource footprint and more efficient software allow organizations to improve operational efficiency and fully realize their positive impact on emissions by minimizing devices’ carbon footprints throughout their lifecycle.

Ready to get started? Familiarize yourself with the topics of environmental sustainability and AWS IoT. Our AWS re:Invent 2021 Sustainability Attendee Guide covers this. When designing your IoT-based solution, keep these device properties in mind. Follow the sustainability best practices described in the Sustainability Pillar of the AWS Well-Architected Framework.

Related information