All posts by Katja Philipp

Optimizing your AWS Infrastructure for Sustainability, Part III: Networking

2021-10-25 Katja Philipp

Post Syndicated from Katja Philipp original https://aws.amazon.com/blogs/architecture/optimizing-your-aws-infrastructure-for-sustainability-part-iii-networking/

In Part I: Compute and Part II: Storage of this series, we introduced strategies to optimize the compute and storage layer of your AWS architecture for sustainability.

This blog post focuses on the network layer of your AWS infrastructure and proposes concepts to optimize your network utilization.

Optimizing the networking layer of your AWS infrastructure

When you make your applications available to more customers, the packets that travel across the network will increase. Similarly, the larger the size of data, as well as the more distance a packet has to travel, the more resources are required to transmit it. With growing number of application users, optimizing network traffic can ensure that network resource consumption is not growing linearly.

The recommendations in the following sections will help you use your resources more efficiently for the network layer of your workload.

Reducing the network traveled per request

Reducing the data sent over the network and optimizing the path a packet takes will result in a more efficient data transfer. The following table provides metrics related to some AWS services that can help you find potential network optimization opportunities.

Service	Metric/Check	Source
Amazon CloudFront	Cache hit rate	Viewing CloudFront and Lambda@Edge metrics
Amazon CloudFront	Cache hit rate	AWS Trusted Advisor check reference
Amazon Simple Storage Service (Amazon S3)	Data transferred in/out of a bucket	Metrics and dimensions
Amazon Simple Storage Service (Amazon S3)	Data transferred in/out of a bucket	AWS Trusted Advisor check reference
Amazon Elastic Compute Cloud (Amazon EC2)	NetworkPacketsIn/NetworkPacketsOut	List the available CloudWatch metrics for your instances
AWS Trusted Advisor	CloudFront Content Delivery Optimization	AWS Trusted Advisor check reference

We recommend the following concepts to optimize your network utilization.

Read local, write global

The following strategies allow users to read the data from the source closest to them; thus, fewer requests travel longer distances.

If you are operating within a single AWS Region, you should choose a Region that is near the majority of your users. The further your users are away from the Region, the further data needs to travel through the global network.
If your users are spread over multiple Regions, set up multiple copies of the data to reside in each Region. Amazon Relational Database Service (Amazon RDS) and Amazon Aurora let you set up cross-Region read replicas. Amazon DynamoDB global tables allow for fast performance and alleviate network load.

Use a content delivery network

Content delivery networks (CDNs) bring your data closer to the end user. When requested, they cache static content from the original server and deliver it to the user. This shortens the distance each packet has to travel.

CloudFront optimizes network utilization and delivers traffic over CloudFront’s globally distributed edge network. Figure 1 shows a global user base that accesses an S3 bucket directly versus serving cached data from edge locations.
Trusted Advisor includes a check that recommends whether you should use a CDN for your S3 buckets. It analyzes the data transferred out of your S3 bucket and flags the buckets that could benefit from a CloudFront distribution.

Figure 1. Comparison of accessing an S3 bucket directly versus via a CloudFront distribution/edge locations

Optimize CloudFront cache hit ratio

CloudFront caches different versions of an object depending upon the request headers (for example, language, date, or user-agent). You can further optimize your CDN distribution’s cache hit ratio (the number of times an object is served from the CDN versus from the origin) with a Trusted Advisor check. It automatically checks for headers that do not affect the object and then recommends a configuration to ignore those headers and not forward the request to the origin.

Use edge-oriented services

Edge computing brings data storage and computation closer to users. By implementing this approach, you can perform data preprocessing or run machine learning algorithms on the edge.

Edge-oriented services applied on gateways or directly onto user devices reduce network traffic because data does not need to be sent back to the cloud server.
One-time, low-latency tasks are a good fit for edge use cases, like when an autonomous vehicle needs to detect objects nearby. You should generally archive data that needs to be accessed by multiple parties in the cloud, but consider factors such as device hardware and privacy regulations first.
CloudFront Functions can run compute on edge locations and Lambda@Edge can generate Regional edge caches. AWS IoT Greengrass provides edge computing for Internet of Things (IoT) devices.

Reducing the size of data transmitted

Serve compressed files

In addition to caching static assets, you can further optimize network utilization by serving compressed files to your users. You can configure CloudFront to automatically compress objects, which results in faster downloads, leading to faster rendering of webpages.

Enhance Amazon EC2 network performance

Network packets consist of data that you are sending (frame) and the processing overhead information. If you use larger packets, you can pass more data in a single packet and decrease processing overhead.

Jumbo frames use the largest permissible packet that can be passed over the connection. Keep in mind that outside a single virtual private cloud (VPC), over virtual private network (VPN) or internet gateway, traffic is limited to a lower frame regardless of using jumbo frames.

Optimize APIs

If your payloads are large, consider reducing their size to reduce network traffic by compressing your messages for your REST API payloads. Use the right endpoint for your use case. Edge-optimized API endpoints are best suited for geographically distributed clients. Regional API endpoints are best suited for when you have a few clients with higher demands, because they can help reduce connection overhead. Caching your API responses will reduce network traffic and enhance responsiveness.

Conclusion

As your organization’s cloud adoption grows, knowing how efficient your resources are is crucial when optimizing your AWS infrastructure for environmental sustainability. Using the fewest number of resources possible and using them to their fullest will have the lowest impact on the environment.

Throughout this three-part blog post series, we introduced you to the following architectural concepts and metrics for the compute, storage, and network layers of your AWS infrastructure.

Reducing idle resources and maximizing utilization
Shaping demand to existing supply
Managing your data’s lifecycle
Using different storage tiers
Optimizing the path data travels through a network
Reducing the size of data transmitted

This is not an exhaustive list. We hope it is a starting point for you to consider the environmental impact of your resources and how you can build your AWS infrastructure to be more efficient and sustainable. Figure 2 shows an overview of how you can monitor related metrics with CloudWatch and Trusted Advisor.

Figure 2. Overview of services that integrate with CloudWatch and Trusted Advisor for monitoring metrics

Ready to get started? Check out the AWS Sustainability page to find out more about our commitment to sustainability. It provides information about renewable energy usage, case studies on sustainability through the cloud, and more.

Related information

AWS Architecture Monthly magazine, Sustainability issue, August 2021

Optimizing your AWS Infrastructure for Sustainability, Part II: Storage

2021-09-22 Katja Philipp

Post Syndicated from Katja Philipp original https://aws.amazon.com/blogs/architecture/optimizing-your-aws-infrastructure-for-sustainability-part-ii-storage/

In Part I of this series, we introduced you to strategies to optimize the compute layer of your AWS architecture for sustainability. We provided you with success criteria, metrics, and architectural patterns to help you improve resource and energy efficiency of your AWS workloads.

This blog post focuses on the storage layer of your AWS infrastructure and provides recommendations that you can use to store your data sustainably.

Optimizing the storage layer of your AWS infrastructure

Managing your data lifecycle and using different storage tiers are key components to optimizing storage for sustainability. When you consider different storage mechanisms, remember that you’re introducing a trade-off between resource efficiency, access latency, and reliability. This means you’ll need to select your management pattern accordingly.

Reducing idle resources and maximizing utilization

Storing and accessing data efficiently, in addition to reducing idle storage resources results in a more efficient and sustainable architecture. Amazon CloudWatch offers storage metrics that can be used to assess storage improvements, as listed in the following table.

Service	Metric	Source
Amazon Simple Storage Service (Amazon S3)	BucketSizeBytes	Metrics and dimensions
Amazon Simple Storage Service (Amazon S3)	S3 Object Access	Logging requests using server access logging
Amazon Elastic Block Store (Amazon EBS)	VolumeIdleTime	Amazon EBS metrics
Amazon Elastic File System (Amazon EFS)	StorageBytes	Amazon CloudWatch metrics for Amazon EFS
Amazon FSx for Lustre	FreeDataStorageCapacity	Monitoring Amazon FSx for Lustre
Amazon FSx for Windows File Server	FreeStorageCapacity	Monitoring with Amazon CloudWatch

You can monitor these metrics with the architecture shown in Figure 1. CloudWatch provides a unified view of your resource metrics.

Figure 1. CloudWatch for monitoring your storage resources

In the following sections, we present four concepts to reduce idle resources and maximize utilization for your AWS storage layer.

Analyze data access patterns and use storage tiers

Choosing the right storage tier after analyzing data access patterns gives you more sustainable storage options in the cloud.

By storing less volatile data on technologies designed for efficient long-term storage, you will optimize your storage footprint. More specifically, you’ll reduce the impact you have on the lifetime of storage resources by storing slow-changing or unchanging data on magnetic storage, as opposed to solid state memory. For archiving data or storing slow-changing data, consider using Amazon EFS Infrequent Access, Amazon EBS Cold HDD volumes, and Amazon S3 Glacier.
To store your data efficiently throughout its lifetime, create an Amazon S3 Lifecycle configuration that automatically transfers objects to a different storage class based on your pre-defined rules. The Expiring Amazon S3 Objects Based on Last Accessed Date to Decrease Costs blog post shows you how to create custom object expiry rules for Amazon S3 based on the last accessed date of the object.
For data with unknown or changing access patterns, use Amazon S3 Intelligent-Tiering to monitor access patterns and move objects among tiers automatically. In general, you have to make a trade-off between resource efficiency, access latency, and reliability when considering these storage mechanisms. Figure 2 shows an overview of data access patterns for Amazon S3 and the resulting storage tier. For example, in S3 One Zone-IA, energy and server capacity are reduced, because data is stored only within one Availability Zone.

Figure 2. Data access patterns for Amazon S3

Use columnar data formats and compression

Columnar data formats like Parquet and ORC require less storage capacity compared to row-based formats like CSV and JSON.

Parquet consumes up to six times less storage in Amazon S3 compared to text formats. This is because of features such as column-wise compression, different encodings, or compression based on data type, as shown in the Top 10 Performance Tuning Tips for Amazon Athena blog post.
You can improve performance and reduce query costs of Amazon Athena by 30–90 percent by compressing, partitioning, and converting your data into columnar formats. Using columnar data formats and compressions reduces the amount of data scanned.

Reduce unused storage resources

Right size or delete unused storage volumes

As shown in the Cost Optimization on AWS video, right-sizing storage by data type and usage reduces your associated costs by up to 50 percent.

A straightforward way to reduce unused storage resources is to delete unattached EBS volumes. If the volume needs to be quickly restored later on, you can store an Amazon EBS snapshot before deletion.
You can also use Amazon Data Lifecycle Manager to retain and delete EBS snapshots and Amazon EBS-backed Amazon Machine Images (AMIs) automatically. This further reduces the storage footprint of stale resources.
To avoid over-provisioning volumes, see the Automating Amazon EBS Volume-resizing blog post. It demonstrates an automated workflow that can expand a volume every time it reaches a capacity threshold. These Amazon EBS elastic volumes extend a volume when needed, as shown in the Amazon EBS Update blog post.
Another way to optimize block storage is to identify volumes that are underutilized and downsize them. Or you can change the volume type, as shown in the AWS Storage Optimization whitepaper.

Modify the retention period of CloudWatch Logs

By default, CloudWatch Logs are kept indefinitely and never expire. You can adjust the retention policy for each log group to be between one day and 10 years. For compliance reasons, export log data to Amazon S3 and use archival storage such as Amazon S3 Glacier.

Deduplicate data

Large datasets often have redundant data, which increases your storage footprint.

By turning on data deduplication for your Amazon FSx for Windows File Server, you will optimize data storage. For general-purpose file shares, storage space can be reduced by 50–60 percent through deduplication.
If you have datasets residing in Amazon S3, you can automatically get rid of duplicates by using the FindMatches transform provided by AWS Lake Formation. See the Integrate and deduplicate datasets using AWS Lake Formation FindMatches blog post for more information on how to set it up.

Conclusion

In this blog post, we discussed data storing techniques to increase your storage efficiency. These include right-sizing storage volumes; choosing storage tiers depending on different data access patterns; and compressing and converting data.

These techniques allow you to optimize your AWS infrastructure for environmental sustainability.

This blog post is the second post in the series, you can find the first part of the series linked in the following section. In the next part of this blog post series, we will show you how you can optimize the networking part of your IT infrastructure for sustainability in the cloud!

Noise

All posts by Katja Philipp

Optimizing your AWS Infrastructure for Sustainability, Part III: Networking

Optimizing the networking layer of your AWS infrastructure

Reducing the network traveled per request

Read local, write global

Use a content delivery network

Optimize CloudFront cache hit ratio

Use edge-oriented services

Reducing the size of data transmitted

Serve compressed files

Enhance Amazon EC2 network performance

Optimize APIs

Conclusion

Other blog posts in this series

Related information

Optimizing your AWS Infrastructure for Sustainability, Part II: Storage

Optimizing the storage layer of your AWS infrastructure

Reducing idle resources and maximizing utilization

Analyze data access patterns and use storage tiers

Use columnar data formats and compression

Reduce unused storage resources

Right size or delete unused storage volumes

Modify the retention period of CloudWatch Logs

Deduplicate data

Conclusion

Related information

The collective thoughts of the interwebz