Tag Archives: Software as a Service

Let’s Architect! Learn About Machine Learning on AWS

2024-05-30 Luca Mezzalira

Post Syndicated from Luca Mezzalira original https://aws.amazon.com/blogs/architecture/lets-architect-learn-about-machine-learning-on-aws/

A data-driven approach empowers businesses to make informed decisions based on accurate predictions and forecasts, leading to improved operational efficiency and resource optimization. Machine learning (ML) systems have the remarkable ability to continuously learn and adapt, improving their performance over time as they are exposed to more data. This self-learning capability ensures that organizations can stay ahead of the curve, responding dynamically to changing market conditions and customer preferences, ultimately driving innovation and enhancing competitiveness.

By leveraging the power of machine learning on AWS, businesses can unlock benefits that enhance efficiency, improve decision-making, and foster growth.

AWS re:Invent 2023 – Zero to machine learning: Jump-start your data-driven journey

In this session, see how organizations with constrained resources (budgets, skill gaps, time) can jump start their data-driven journey with advanced analytics and ML capabilities. Learn AWS Working Backwards best practices to drive forward data-related projects that address tangible business value. Then dive into AWS analytics and AI/ML capabilities that simplify and expedite data pipeline delivery and business value from ML workloads. Hear about low-code no-code (LCNC) AWS services within the context of a complete data pipeline architecture.

Take me to this video

Figure 1. See an architecture to analyze customer churn using AWS services

Introduction to MLOps engineering on AWS

As artificial intelligence (AI) continues to revolutionize industries, the ability to operationalize and scale ML models has become a critical challenge. This session introduces the concept of MLOps, a discipline that builds upon and extends the widely adopted DevOps practices prevalent in software development. By applying MLOps principles, organizations can streamline the process of building, training, and deploying ML models, ensuring efficient and reliable model lifecycle management. By mastering MLOps, organizations can bridge the gap between AI development and operations, enabling them to unlock the full potential of their ML initiatives.

Take me to this video

Figure 2. MLOps maturity level will help to assess your organization and understand how to reach the next level.

Behind-the-scenes look at generative AI infrastructure at Amazon

To power generative AI applications while keeping costs under control, AWS designs and builds machine learning accelerators like AWS Trainium and AWS Inferentia. This session introduces purpose-built ML hardware for model training and inference, and shows how Amazon and AWS customers take advantage of those solutions to optimize costs and reduce latency.

You can learn from practical examples showing the impact of those solutions and explanations about how these chips work. ML accelerators are not only beneficial for generative AI workloads; they can also be applied to other use cases, including representation learning, recommender systems, or any scenario with deep neural network models.

Take me to this video

Figure 3. Discover the technology that powers our AI services

How our customers are implementing machine learning on AWS

The following resources drill down into the ML infrastructure that’s used to train large models at Pinterest and the experimentation framework built by Booking.com.

The Pinterest video discusses the strategy to create an ML development environment, orchestrate training jobs, ingest data into the training loop, and accelerate the training speed. You can also learn about the advantages derived from containers in the context of ML and how Pinterest decided to set up the entire ML lifecycle, including distributed model training.

The second resource covers how Booking.com accelerated the experimentation process by leveraging Amazon SageMaker for data analysis, model training, and online experimentation. This resulted in shorter development times for their ranking models and increased speed for the data science teams.

Take me to Pinterest video

Take me to Booking.com blog post

Figure 4. Let’s discover how Pinterest is using AWS services for machine learning workloads

SageMaker Immersion Day

Amazon SageMaker Immersion Day helps customers and partners provide end-to-end understanding of building ML use cases. From feature engineering to understanding various built-in algorithms, with a focus on training, tuning, and deploying the ML model in a production-like scenario, this workshop guides you to bring your own model to perform lift-and-shift from on-premises to the Amazon SageMaker platform. It further demonstrates more advanced concepts like model debugging, model monitoring, and AutoML.

Take me to the workshop

Figure 5. Train, tune and deploy your workload using Amazon SageMaker

See you next time!

Thanks for reading! With this post, introduced you to the art of possibility on using AWS machine learning services. In the next blog, we will talk about cloud migrations.

To revisit any of our previous posts or explore the entire series, visit the Let’s Architect! page.

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 2

2022-05-09 Nick Choi

Post Syndicated from Nick Choi original https://aws.amazon.com/blogs/architecture/throttling-a-tiered-multi-tenant-rest-api-at-scale-using-api-gateway-part-2/

In Part 1 of this blog series, we demonstrated why tiering and throttling become necessary at scale for multi-tenant REST APIs, and explored tiering strategy and throttling with Amazon API Gateway.

In this post, Part 2, we will examine tenant isolation strategies at scale with API Gateway and extend the sample code from Part 1.

Enhancing the sample code

To enable this functionality in the sample code (Figure 1), we will make manual changes. First, create one API key for the Free Tier and five API keys for the Basic Tier. Currently, these API keys are private keys for your Amazon Cognito login, but we will make a further change in the backend business logic that will promote them to pooled resources. Note that all of these modifications are specific to this sample code’s implementation; the implementation and deployment of a production code may be completely different (Figure 1).

Figure 1. Cloud architecture of the sample code

Next, in the business logic for thecreateKey(), find the AWS Lambda function in lambda/create_key.js. It appears like this:

function createKey(tableName, key, plansTable, jwt, rand, callback) {
  const pool = getPoolForPlanId( key.planId ) 
  if (!pool) {
    createSiloedKey(tableName, key, plansTable, jwt, rand, callback);
  } else {
    createPooledKey(pool, tableName, key, jwt, callback);
  }
}

The getPoolForPlanId() function does a search for a pool of keys associated with the usage plan. If there is a pool, we “create” a kind of reference to the pooled resource, rather than a completely new key that is created by the API Gateway service directly. The lambda/api_key_pools.js should be empty.

exports.apiKeyPools = [];

In effect, all usage plans were considered as siloed keys up to now. To change that, populate the data structure with values from the six API keys that were created manually. You will have to look up the IDs of the API keys and usage plans that were created in API Gateway (Figures 2 and 3). Using the AWS console to navigate to API Gateway is the most intuitive.

Figure 2. A view of the AWS console when inspecting the ID for the Basic usage plan

Figure 3. A view of the AWS Console when looking up the API key value (not the ID)

When done, your code in lambda/api_key_pools.js should be the following, but instead of ellipses (…), the IDs for the user plans and API keys specific to your environment will appear.

exports.apiKeyPools = [{
    planName: "FreePlan"
    planId: "...",
    apiKeys: [ "..." ]
  },
 {
    planName: "BasicPlan"
    planId: "...",
    apiKeys: [ "...", "...", "...", "...", "..." ]
  }];

After making the code changes, run cdk deploy from the command line to update the Lambda functions. This change will only affect key creation and deletion because of the system implementation. Updates affect only the user’s specific reference to the key, not the underlying resource managed by API Gateway.

When the web application is run now, it will look similar to before—tenants should not be aware what tiering strategy they have been assigned to. The only way to notice the difference would be to create two Free Tier keys, test them, and note that the value of the X-API-KEY header is unchanged between the two.

Now, you have a virtually unlimited number of users who can have API keys in the Free or Basic Tier. By keeping the Premium Tier siloed, you are subject to the 10,000-API-key maximum (less any keys allocated for the lower tiers). You may consider additional techniques to continue to scale, such as replicating your service in another AWS account.

Other production considerations

The sample code is minimal, and it illustrates just one aspect of scaling a Software-as-a-service (SaaS) application. There are many other aspects be considered in a production setting that we explore in this section.

The throttled endpoint, GET /api rely only on API key for authorization for demonstration purpose. For any production implementation consider authentication options for your REST APIs. You may explore and extend to require authentication with Cognito similar to /admin/* endpoints in the sample code.

One API key for Free Tier access and five API keys for Basic Tier access are illustrative in a sample code but not representative of production deployments. Number of API keys with service quota into consideration, business and technical decisions may be made to minimize noisy neighbor effect such as setting blast radius upper threshold of 0.1% of all users. To satisfy that requirement, each tier would need to spread users across at least 1,000 API keys. The number of keys allocated to Basic or Premium Tier would depend on market needs and pricing strategies. Additional allocations of keys could be held in reserve for troubleshooting, QA, tenant migrations, and key retirement.

In the planning phase of your solution, you will decide how many tiers to provide, how many usage plans are needed, and what throttle limits and quotas to apply. These decisions depend on your architecture and business.

To define API request limits, examine the system API Gateway is protecting and what load it can sustain. For example, if your service will scale up to 1,000 requests per second, it is possible to implement three tiers with a 10/50/40 split: the lowest tier shares one common API key with a 100 request per second limit; an intermediate tier has a pool of 25 API keys with a limit of 20 requests per second each; and the highest tier has a maximum of 10 API keys, each supporting 40 requests per second.

Metrics play a large role in continuously evolving your SaaS-tiering strategy (Figure 4). They provide rich insights into how tenants are using the system. Tenant-aware and SaaS-wide metrics on throttling and quota limits can be used to: assess tiering in-place, if tenants’ requirements are being met, and if currently used tenant usage profiles are valid (Figure 5).

Figure 4. Tiering strategy example with 3 tiers and requests allocation per tier

Figure 5. An example SaaS metrics dashboard

API Gateway provides options for different levels of granularity required, including detailed metrics, and execution and access logging to enable observability of your SaaS solution. Granular usage metrics combined with underlying resource consumption leads to managing optimal experience for your tenants with throttling levels and policies per method and per client.

Cleanup

To avoid incurring future charges, delete the resources. This can be done on the command line by typing:

cd ${TOP}/cdk
cdk destroy

cd ${TOP}/react
amplify delete

${TOP} is the topmost directory of the sample code. For the most up-to-date information, see the README.md file.

Conclusion

In this two-part blog series, we have reviewed the best practices and challenges of effectively guarding a tiered multi-tenant REST API hosted in AWS API Gateway. We also explored how throttling policy and quota management can help you continuously evaluate the needs of your tenants and evolve your tiering strategy to protect your backend systems from being overwhelmed by inbound traffic.

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 1

2022-05-06 Nick Choi

Post Syndicated from Nick Choi original https://aws.amazon.com/blogs/architecture/throttling-a-tiered-multi-tenant-rest-api-at-scale-using-api-gateway-part-1/

Many software-as-a-service (SaaS) providers adopt throttling as a common technique to protect a distributed system from spikes of inbound traffic that might compromise reliability, reduce throughput, or increase operational cost. Multi-tenant SaaS systems have an additional concern of fairness; excessive traffic from one tenant needs to be selectively throttled without impacting the experience of other tenants. This is also known as “the noisy neighbor” problem. AWS itself enforces some combination of throttling and quota limits on nearly all its own service APIs. SaaS providers building on AWS should design and implement throttling strategies in all of their APIs as well.

In this two-part blog series, we will explore tiering and throttling strategies for multi-tenant REST APIs and review tenant isolation models with hands-on sample code. In part 1, we will look at why a tiering and throttling strategy is needed and show how Amazon API Gateway can help by showing sample code. In part 2, we will dive deeper into tenant isolation models as well as considerations for production.

We selected Amazon API Gateway for this architecture since it is a fully managed service that helps developers to create, publish, maintain, monitor, and secure APIs. First, let’s focus on how Amazon API Gateway can be used to throttle REST APIs with fine granularity using Usage Plans and API Keys. Usage Plans define the thresholds beyond which throttling should occur. They also enable quotas, which sets a maximum usage per a day, week, or month. API Keys are identifiers for distinguishing traffic and determining which Usage Plans to apply for each request. We limit the scope of our discussion to REST APIs because other protocols that API Gateway supports — WebSocket APIs and HTTP APIs — have different throttling mechanisms that do not employ Usage Plans or API Keys.

SaaS providers must balance minimizing cost to serve and providing consistent quality of service for all tenants. They also need to ensure one tenant’s activity does not affect the other tenants’ experience. Throttling and quotas are a key aspect of a tiering strategy and important for protecting your service at any scale. In practice, this impact of throttling polices and quota management is continuously monitored and evaluated as the tenant composition and behavior evolve over time.

Architecture Overview

Figure 1. Cloud Architecture of the sample code.

Figure 1 – Architecture of the sample code

To get a firm foundation of the basics of throttling and quotas with API Gateway, we’ve provided sample code in AWS-Samples on GitHub. Not only does it provide a starting point to experiment with Usage Plans and API Keys in the API Gateway, but we will modify this code later to address complexity that happens at scale. The sample code has two main parts: 1) a web frontend and, 2) a serverless backend. The backend is a serverless architecture using Amazon API Gateway, AWS Lambda, Amazon DynamoDB, and Amazon Cognito. As Figure I illustrates, it implements one REST API endpoint, GET /api, that is protected with throttling and quotas. There are additional APIs under the /admin/* resource to provide Read access to Usage Plans, and CRUD operations on API Keys.

All these REST endpoints could be tested with developer tools such as curl or Postman, but we’ve also provided a web application, to help you get started. The web application illustrates how tenants might interact with the SaaS application to browse different tiers of service, purchase API Keys, and test them. The web application is implemented in React and uses AWS Amplify CLI and SDKs.

Prerequisites

To deploy the sample code, you should have the following prerequisites:

AWS Account
AWS CLI
AWS CDK
Amplify CLI
An AWS CLI profile with permissions to deploy the architecture

For clarity, we’ll use the environment variable, ${TOP}, to indicate the top-most directory in the cloned source code or the top directory in the project when browsing through GitHub.

Detailed instructions on how to install the code are in ${TOP}/INSTALL.md file in the code. After installation, follow the ${TOP}/WALKTHROUGH.md for step-by-step instructions to create a test key with a very small quota limit of 10 requests per day, and use the client to hit that limit. Search for HTTP 429: Too Many Requests as the signal your client has been throttled.

Figure 2: The web application (with browser developer tools enabled) shows that a quick succession of API calls starts returning an HTTP 429 after the quota for the day is exceeded.

Responsibilities of the Client to support Throttling

The Client must provide an API Key in the header of the HTTP request, labelled, “X-Api-Key:”. If a resource in API Gateway has throttling enabled and that header is missing or invalid in the request, then API Gateway will reject the request.

Important: API Keys are simple identifiers, not authorization tokens or cryptographic keys. API keys are for throttling and managing quotas for tenants only and not suitable as a security mechanism. There are many ways to properly control access to a REST API in API Gateway, and we refer you to the AWS documentation for more details as that topic is beyond the scope of this post.

Clients should always test for the response to any network call, and implement logic specific to an HTTP 429 response. The correct action is almost always “try again later.” Just how much later, and how many times before giving up, is application dependent. Common approaches include:

Retry – With simple retry, client retries the request up to defined maximum retry limit configured
Exponential backoff – Exponential backoff uses progressively larger wait time between retries for consecutive errors. As the wait time can become very long quickly, maximum delay and a maximum retry limits should be specified.
Jitter – Jitter uses a random amount of delay between retry to prevent large bursts by spreading the request rate.

AWS SDK is an example client-responsibility implementation. Each AWS SDK implements automatic retry logic that uses a combination of retry, exponential backoff, jitter, and maximum retry limit.

SaaS Considerations: Tenant Isolation Strategies at Scale

While the sample code is a good start, the design has an implicit assumption that API Gateway will support as many API Keys as we have number of tenants. In fact, API Gateway has a quota on available per region per account. If the sample code’s requirements are to support more than 10,000 tenants (or if tenants are allowed multiple keys), then the sample implementation is not going to scale, and we need to consider more scalable implementation strategies.

This is one instance of a general challenge with SaaS called “tenant isolation strategies.” We highly recommend reviewing this white paper ‘SasS Tenant Isolation Strategies‘. A brief explanation here is that the one-resource-per-customer (or “siloed”) model is just one of many possible strategies to address tenant isolation. While the siloed model may be the easiest to implement and offers strong isolation, it offers no economy of scale, has high management complexity, and will quickly run into limits set by the underlying AWS Services. Other models besides siloed include pooling, and bridged models. Again, we recommend the whitepaper for more details.

Figure 3- Tiered multi-tenant architectures often employ different tenant isolation strategies at different tiers. Our example is specific to API Keys, but the technique generalizes to storage, compute, and other resources.

In this example, we implement a range of tenant isolation strategies at different tiers of service. This allows us to protect against “noisy-neighbors” at the highest tier, minimize outlay of limited resources (namely, API-Keys) at the lowest tier, and still provide an effective, bounded “blast radius” of noisy neighbors at the mid-tier.

A concrete development example helps illustrate how this can be implemented. Assume three tiers of service: Free, Basic, and Premium. One could create a single API Key that is a pooled resource among all tenants in the Free Tier. At the other extreme, each Premium customer would get their own unique API Key. They would protect Premium tier tenants from the ‘noisy neighbor’ effect. In the middle, the Basic tenants would be evenly distributed across a set of fixed keys. This is not complete isolation for each tenant, but the impact of any one tenant is contained within “blast radius” defined.

In production, we recommend a more nuanced approach with additional considerations for monitoring and automation to continuously evaluate tiering strategy. We will revisit these topics in greater detail after considering the sample code.

Conclusion

In this post, we have reviewed how to effectively guard a tiered multi-tenant REST API hosted in Amazon API Gateway. We also explored how tiering and throttling strategies can influence tenant isolation models. In Part 2 of this blog series, we will dive deeper into tenant isolation models and gaining insights with metrics.

If you’d like to know more about the topic, the AWS Well-Architected SaaS Lens Performance Efficiency pillar dives deep on tenant tiers and providing differentiated levels of performance to each tier. It also provides best practices and resources to help you design and reduce impact of noisy neighbors your SaaS solution.

To learn more about Serverless SaaS architectures in general, we recommend the AWS Serverless SaaS Workshop and the SaaS Factory Serverless SaaS reference solution that inspired it.

Noise

Tag Archives: Software as a Service

Let’s Architect! Learn About Machine Learning on AWS

AWS re:Invent 2023 – Zero to machine learning: Jump-start your data-driven journey

Introduction to MLOps engineering on AWS

Behind-the-scenes look at generative AI infrastructure at Amazon

How our customers are implementing machine learning on AWS

SageMaker Immersion Day

See you next time!

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 2

Enhancing the sample code

Other production considerations

Cleanup

Conclusion

Throttling a tiered, multi-tenant REST API at scale using API Gateway: Part 1

Architecture Overview

Prerequisites

Responsibilities of the Client to support Throttling

SaaS Considerations: Tenant Isolation Strategies at Scale

Conclusion

The collective thoughts of the interwebz

#10 Deploy Stable Diffusion ComfyUI on AWS elastically and efficiently

#9 Let’s Architect! Designing Well-Architected systems

#8 Let’s Architect! Learn About Machine Learning on AWS

#7 Creating an organizational multi-Region failover strategy

#6 Building a three-tier architecture on a budget

#5 Announcing updates to the AWS Well-Architected Framework guidance

#4 Let’s Architect! Serverless developer experience in AWS

#3 London Stock Exchange Group uses chaos engineering on AWS to improve resilience

#2 Achieving Frugal Architecture using the AWS Well-Architected Framework Guidance

#1 How an insurance company implements disaster recovery of 3-tier applications

Thank you!

How our customers are implementing machine learning on AWS

See you next time!

Enhancing the sample code

Other production considerations

Cleanup

Conclusion

Architecture Overview

Prerequisites

Responsibilities of the Client to support Throttling

SaaS Considerations: Tenant Isolation Strategies at Scale

Conclusion

The collective thoughts of the interwebz