Tag Archives: caching

Offloading SQL for Amazon RDS using the Heimdall Proxy

Post Syndicated from Antony Prasad Thevaraj original https://aws.amazon.com/blogs/architecture/offloading-sql-for-amazon-rds-using-the-heimdall-proxy/

Getting the maximum scale from your database often requires fine-tuning the application. This can increase time and incur cost – effort that could be used towards other strategic initiatives. The Heimdall Proxy was designed to intelligently manage SQL connections to help you get the most out of your database.

In this blog post, we demonstrate two SQL offload features offered by this proxy:

  1. Automated query caching
  2. Read/Write split for improved database scale

By leveraging the solution shown in Figure 1, you can save on development costs and accelerate the onboarding of applications into production.

Figure 1. Heimdall Proxy distributed, auto-scaling architecture

Why query caching?

For ecommerce websites with heavy read traffic and infrequent data changes, query caching can drastically improve your Amazon Relational Database Service (Amazon RDS) scale. You can use Amazon ElastiCache to serve cached results. Retrieving data from cache has a shorter access time, which reduces latency and improves I/O operations.

It can take developers considerable effort to create, maintain, and tune the TTLs of a cache subsystem. The proxy technology covered in this article automates results caching into the caching grid of your choice, without code changes. What makes this solution unique is its distributed, scalable architecture. As your traffic grows, you scale by simply adding proxies. Multiple proxies work together as a cohesive unit for caching and invalidation.
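
To make that developer effort concrete, here is a minimal, hand-rolled cache-aside sketch in plain Java. It is not Heimdall's implementation; it simply shows the pattern the proxy automates: look up a query result by key, fall back to the database on a miss, store the result with a TTL, and invalidate affected entries on writes. All names are illustrative.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class QueryResultCache {
    private record Entry(Object result, long expiresAtMillis) {}

    private final Map<String, Entry> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;

    public QueryResultCache(long ttlMillis) {
        this.ttlMillis = ttlMillis;
    }

    // Cache-aside read: serve from cache if present and fresh, otherwise query the database.
    public Object query(String sql, Function<String, Object> database) {
        Entry entry = cache.get(sql);
        if (entry != null && entry.expiresAtMillis() > System.currentTimeMillis()) {
            return entry.result();                        // cache hit
        }
        Object result = database.apply(sql);              // cache miss: hit the database
        cache.put(sql, new Entry(result, System.currentTimeMillis() + ttlMillis));
        return result;
    }

    // Writes must invalidate affected entries, which is the part that is easy to get wrong by hand.
    public void invalidate(String tableName) {
        cache.keySet().removeIf(sql -> sql.toLowerCase().contains(tableName.toLowerCase()));
    }
}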

View video: Heimdall Data: Query Caching Without Code Changes

Why Read/Write splitting?

It can be fairly straightforward to configure a primary and read replica instance on the AWS Management Console. But it may be challenging for the developer to implement such a scale-out architecture.

Some of the issues they might encounter include:

  • Replication lag. A query read-after-write may result in data inconsistency due to replication lag. Many applications require strong consistency.
  • DNS dependencies. Due to the DNS cache, many connections can be routed to a single replica, creating uneven load distribution across replicas.
  • Network latency. When deploying Amazon RDS globally with Amazon Aurora Global Database, it is difficult for the application to intelligently choose the optimal reader.

The Heimdall Proxy streamlines the ability to elastically scale out read-heavy database workloads. Its Read/Write splitting supports the following (a conceptual routing sketch follows the list):

  • ACID compliance. Determines the replication lag and knows when it is safe to access a database table, ensuring data consistency.
  • Database load balancing. Tracks the health of each DB instance and evenly distributes connections without relying on DNS.
  • Intelligent routing. Chooses the optimal reader to access based on the lowest latency to create local-like response times. Check out our Aurora Global Database blog.
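
The sketch below illustrates, in plain Java, the kind of routing decision a splitting proxy must make. It is a conceptual sketch, not Heimdall's actual logic: writes and transactional statements go to the primary, while reads go to the lowest-latency replica whose replication lag is within a safe window, falling back to the primary otherwise.

import java.time.Duration;
import java.util.Comparator;
import java.util.List;

public class ReadWriteRouter {
    public record Replica(String endpoint, Duration replicationLag, Duration networkLatency) {}

    private final String primaryEndpoint;
    private final List<Replica> replicas;
    private final Duration maxAcceptableLag;

    public ReadWriteRouter(String primaryEndpoint, List<Replica> replicas, Duration maxAcceptableLag) {
        this.primaryEndpoint = primaryEndpoint;
        this.replicas = replicas;
        this.maxAcceptableLag = maxAcceptableLag;
    }

    public String route(String sql, boolean inTransaction) {
        // Writes and transactional statements always go to the primary for consistency.
        if (inTransaction || !sql.trim().toLowerCase().startsWith("select")) {
            return primaryEndpoint;
        }
        // Reads go to the lowest-latency replica whose lag is within the safe window;
        // if no replica qualifies, fall back to the primary (read-after-write safety).
        return replicas.stream()
                .filter(r -> r.replicationLag().compareTo(maxAcceptableLag) <= 0)
                .min(Comparator.comparing(Replica::networkLatency))
                .map(Replica::endpoint)
                .orElse(primaryEndpoint);
    }
}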

View video: Heimdall Data: Scale-Out Amazon RDS with Strong Consistency

Customer use case: Tornado

Hayden Cacace, Director of Engineering at Tornado

Tornado is a modern web and mobile brokerage that empowers anyone who aspires to become a better investor.

Our engineering team was tasked to upgrade our backend such that it could handle a massive surge in traffic. With a 3-month timeline, we decided to use read replicas to reduce the load on the main database instance.

First, we migrated from Amazon RDS for PostgreSQL to Aurora for Postgres since it provided better data replication speed. But we still faced a problem – the amount of time it would take to update server code to use the read replicas would be significant. We wanted the team to stay focused on user-facing enhancements rather than server refactoring.

Enter the Heimdall Proxy: We evaluated a handful of options for a database proxy that could automatically do Read/Write splits for us with no code changes, and it became clear that Heimdall was our best option. It had the Read/Write splitting “out of the box” with zero application changes required. And it also came with database query caching built-in (integrated with Amazon ElastiCache), which promised to take additional load off the database.

Before the Tornado launch date, our load testing showed the new system handling several times more load than we were able to previously. We were using a primary Aurora Postgres instance and read replicas behind the Heimdall proxy. When the Tornado launch date arrived, the system performed well, with some background jobs averaging around a 50% hit rate on the Heimdall cache. This has really helped reduce the database load and improve the runtime of those jobs.

Using this solution, we now have a data architecture with additional room to scale. This allows us to continue to focus on enhancing the product for all our customers.

Download a free trial from the AWS Marketplace.

Resources

Heimdall Data, based in the San Francisco Bay Area, is an AWS Advanced Tier ISV partner. They have Amazon Service Ready designations for Amazon RDS and Amazon Redshift. Heimdall Data offers a database proxy that offloads SQL to improve database scale. Deployment does not require code changes. For other proxy options, consider Amazon RDS Proxy, PgBouncer, Pgpool-II, or ProxySQL.

Why We Started Putting Unpopular Assets in Memory

Post Syndicated from Yuchen Wu original https://blog.cloudflare.com/why-we-started-putting-unpopular-assets-in-memory/

Part of Cloudflare’s service is a CDN that makes millions of Internet properties faster and more reliable by caching web assets closer to browsers and end users.

We make improvements to our infrastructure to make end-user experiences faster, more secure, and more reliable all the time. Here’s a case study of one such engineering effort where something counterintuitive turned out to be the right approach.

Our storage layer, which serves millions of cache hits per second globally, is powered by high IOPS NVMe SSDs.

Although SSDs are fast and reliable, cache hit tail latency within our system is dominated by the IO capacity of our SSDs. Moreover, because flash memory chips wear out, a non-negligible portion of our operational cost, including the cost of new devices, shipment, labor and downtime, is spent on replacing dead SSDs.

Recently, we developed a technology that reduces our hit tail latency and reduces the wear out of SSDs. This technology is a memory-SSD hybrid storage system that puts unpopular assets in memory.

The end result: cache hits from our infrastructure are now faster for all customers.

You may have thought that was a typo in my explanation of how the technique works. In fact, a few colleagues thought the same when we proposed this project internally: “I think you meant to say ‘popular assets’ in your document.” Intuitively, we should only put popular assets in memory, since memory is even faster than SSDs.

You are not wrong. But I’m also correct. This blog post explains why.

First let me explain how we already use memory to speed up the IO of popular assets.

Page cache

Since it is so obvious that memory speeds up popular assets, Linux already does much of the hard work for us. This functionality is called the “page cache”. Files in Linux systems are organized into pages internally, hence the name.

The page cache uses the system’s available memory to cache reads from and buffer writes to durable storage. During typical operations of a Cloudflare server, all the services and processes themselves do not consume all physical memory. The remaining memory is used by the page cache.

The chart below shows the memory layout of a typical Cloudflare edge server with 256GB of physical memory.

[Chart: memory layout of a typical Cloudflare edge server]

We can see the used memory (RSS) on top. RSS memory consumption for all services is about 87.71 GB, with our cache service consuming 4.1 GB. At the bottom we see overall page cache usage. Total size is 128.6 GB, with the cache service using 41.6 GB of it to serve cached web assets.

Caching reads

Pages that are frequently used are cached in memory to speed up reads. Linux uses the least recently used (LRU) algorithm among other techniques to decide what to keep in page cache. In other words, popular assets are already in memory.
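
The kernel's implementation is far more sophisticated than plain LRU, but the core idea can be sketched in a few lines of Java using a LinkedHashMap in access order. This is only an illustration of the eviction policy, not how Linux implements the page cache.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: the least recently accessed entry is evicted once capacity is exceeded.
public class LruCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    public LruCache(int capacity) {
        super(16, 0.75f, true);    // accessOrder = true: iteration order follows recency of access
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;  // evict the least recently used entry
    }
}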

Buffering writes

The page cache also buffers writes. Writes are not synchronized to disk right away. They are buffered until the page cache decides to do so at a later time, either periodically or due to memory pressure. Therefore, it also reduces repetitive writes to the same file.

However, because of the nature of the caching workload, a web asset arriving at our cache system is usually static. Unlike workloads of databases, where a single value can get repeatedly updated, in our caching workload, there are very few repetitive updates to an asset before it is completely cached. Therefore, the page cache does not help in reducing writes to disk since all pages will eventually be written to disks.

Smarter than the page cache?

Although the Linux page cache works great out of the box, our caching system can still outsmart it since our system knows more about the context of the data being accessed, such as the content type, access frequency and the time-to-live value.

Instead of improving the caching of popular assets, which the page cache already does, in the next section we will focus on something the page cache cannot do well: putting unpopular assets in memory.

Putting unpopular assets in memory

Motivation: Writes are bad

SSD wear-out is caused solely by writes, not reads. As mentioned above, replacing SSDs is costly. Since Cloudflare operates data centers in 200 cities across the world, the cost of shipment and replacement can be a significant proportion of the cost of the SSD itself.

Moreover, write operations on SSDs slow down read operations. This is because SSD writes require program/erase (P/E) operations issued to the storage chips. These operations block reads to the same chips. In short, the more writes, the slower the reads. Since the page cache already cached popular assets, this effect has the highest impact on our cache hit tail latency. Previously we talked about how we dramatically reduced the impact of such latency. In the meantime, directly reducing the latency itself will also be very helpful.

Motivation: Many assets are “one-hit-wonders”

One-hit-wonders are cached assets that are requested once, written to cache, and then never read again. They also include cached assets that, due to their lack of popularity, are evicted before they serve a single hit. The performance of websites is not harmed even if we don’t cache such assets in the first place.

To quantify how many of our assets are one-hit-wonders and to test the hypothesis that reducing disk writes can speed up disk reads, we conducted a simple experiment. The experiment: using a representative sample of traffic, we modified our caching logic to only cache an asset the second time our server encountered it.

[Charts: disk writes per second and disk read tail latency, experimental vs. control group]

The red line indicates when the experiment started. The green line represents the experimental group and the yellow line represents the control group.

The result: disk writes per second were reduced by roughly half and corresponding disk hit tail latency was reduced by approximately five percent.

This experiment demonstrated that if we don’t cache one-hit-wonders, our disks last longer and our cache hits are faster.

One more benefit of not caching one-hit-wonders, which was not immediately obvious from the experiment, is that it increases the effective capacity of the cache: removing unpopular assets reduces the pressure they exert on cache space. This in turn increases the cache hit ratio and cache retention.

The next question: how can we replicate these results, at scale, in production, without impacting customer experience negatively?

Potential implementation approaches

Remember but don’t cache

One smart way to eliminate one-hit-wonders from cache is to not cache an asset during its first few misses but to remember its appearances. When the cache system encounters the same asset repeatedly over a certain number of times, the system will start to cache the asset. This is basically what we did in the experiment above.

In order to remember these appearances, besides hash tables, many memory-efficient data structures, such as Bloom filters, Counting Bloom filters, and Count-min sketches, can be used, each with its own trade-offs.
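
As a sketch of the idea, the admission check below uses a plain counting map for clarity; in practice one of the memory-efficient structures above would take its place. The threshold and names are illustrative.

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class SecondHitAdmission {
    private final Map<String, Integer> seenCounts = new ConcurrentHashMap<>();
    private final int admitAfter;   // e.g. 2 = cache on the second request

    public SecondHitAdmission(int admitAfter) {
        this.admitAfter = admitAfter;
    }

    // Returns true if the asset should be written to cache on this request.
    public boolean shouldCache(String assetKey) {
        int count = seenCounts.merge(assetKey, 1, Integer::sum);
        return count >= admitAfter;
    }
}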

Such an approach solves one problem but introduces a new one: every asset is missed at least twice before it is cached. Because the number of cache misses is amplified compared to what we do today, this approach multiplies both the cost of bandwidth and server utilization for our customers. This is not an acceptable tradeoff in our eyes.

Transient cache in memory

A better idea we came up with: put every asset we want to cache to disk into memory first. This memory area is called the “transient cache”. Assets in the transient cache are promoted to the permanent cache, backed by SSDs, after they are accessed a certain number of times, indicating they are popular enough to be stored persistently. If they are not promoted, they are eventually evicted from the transient cache for lack of popularity.

[Diagram: transient cache in memory in front of the permanent SSD-backed cache]

This design makes sure that disk writes are only spent on assets that are popular, while ensuring every asset is only “missed” once.
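
A much-simplified sketch of this idea follows: single-threaded, with plain LRU eviction and an invented callback standing in for the write to the permanent cache. The production system is considerably more involved, so treat this as an illustration of the promotion flow only.

import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.BiConsumer;

public class TransientCache {
    private static final class Entry {
        byte[] body;
        int hits;
        Entry(byte[] body) { this.body = body; }
    }

    private final int promoteAfterHits;
    private final BiConsumer<String, byte[]> promoteToSsd;  // writes the asset to the permanent cache
    private final LinkedHashMap<String, Entry> entries;     // LRU order; unpopular assets age out

    public TransientCache(int capacity, int promoteAfterHits, BiConsumer<String, byte[]> promoteToSsd) {
        this.promoteAfterHits = promoteAfterHits;
        this.promoteToSsd = promoteToSsd;
        this.entries = new LinkedHashMap<String, Entry>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Entry> eldest) {
                return size() > capacity;   // eviction: the asset was never popular enough
            }
        };
    }

    // On a cache fill, the asset lands in memory only; no disk write yet.
    public void put(String key, byte[] body) {
        entries.putIfAbsent(key, new Entry(body));
    }

    // On a hit, count it; once the threshold is reached, spend the disk write.
    public byte[] get(String key) {
        Entry entry = entries.get(key);
        if (entry == null) {
            return null;
        }
        if (++entry.hits == promoteAfterHits) {
            promoteToSsd.accept(key, entry.body);
            entries.remove(key);             // now served from the permanent cache
        }
        return entry.body;
    }
}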

The transient cache system on the real world Internet

We implemented this idea and deployed it to our production caching systems. Here are some learnings and data points we would like to share.

Trade-offs

Our transient cache technology does not pull a performance improvement rabbit out of a hat. It achieves improved performance for our customers by consuming other resources in our system. These resource consumption trade-offs must be carefully considered when deploying to systems at the scale at which Cloudflare operates.

Transient cache memory footprint

The amount of memory allocated for our transient cache matters. Cache size directly dictates how long a given asset will live in cache before it is evicted. If the transient cache is too small, new assets will evict old assets before the old assets receive the hits that would promote them to disk. From a customer’s perspective, cache retention being too short is unacceptable because it leads to more misses, commensurately higher cost, and worse eyeball performance.

Competition with page cache

As mentioned before, the page cache uses the system’s available memory. The more memory we allocate to the transient cache for unpopular assets, the less memory we have for popular assets in the page cache. Finding the sweet spot for the trade-off between these two depends on traffic volume, usage patterns, and the specific hardware configuration our software is running on.

Competition with process memory usage

Another competitor to transient cache is regular memory used by our services and processes. Unlike the page cache, which is used opportunistically, process memory usage is a hard requirement for the operation of our software. An additional wrinkle is that some of our edge services increase their total RSS memory consumption when performing a “zero downtime upgrade”, which runs both the old and new version of the service in parallel for a short period of time. From our experience, if there is not enough physical memory to do so, the system’s overall performance will degrade to an unacceptable level due to increased IO pressure from reduced page cache space.

In production

Given the considerations above, we enabled this technology conservatively to start. The new hybrid storage system is used only by our newer generations of servers that have more physical memory than older generations. Additionally, the system is only used on a subset of assets. By tuning the size of the transient cache and what percentage of requests use it at all, we are able to explore the sweet spots for the trade-offs between performance, cache retention, and memory resource consumption.

The chart below shows the IO usage of an enabled cohort before and after (red line) enabling transient cache.

[Chart: disk write throughput of the enabled cohort before and after the transient cache rollout]

We can see that disk write (in bytes per second) was reduced by 25% at peak and 20% off peak. Although it is too early to tell, the life-span of those SSDs should be extended proportionally.

More importantly for our customers: our CDN cache hit tail latency is measurably decreased!

[Chart: CDN cache hit tail latency after enabling the transient cache]

Future work

We just made the first step towards a smart memory-SSD hybrid storage system. There is still a lot to be done.

Broader deployment

Currently, we only apply this technology to specific hardware generations and a portion of traffic. When we decommission our older generations of servers and replace them with newer generations with more physical memory, we will be able to apply the transient cache to a larger portion of traffic. Per our preliminary experiments, in certain data centers, applying the transient cache to all traffic is able to reduce disk writes up to 70% and decrease end user visible tail latency by up to 20% at peak hours.

Smarter promotion algorithms

Our current promotion strategy for moving assets from the transient cache to durable storage is based on a simple heuristic: the number of hits. Other information can be used to make a better promotion decision. For example, the TTL (time to live) of an asset can be taken into account as well. One such strategy is to refuse to promote an asset whose TTL has only a few seconds left, which further reduces unnecessary disk writes.
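
As a small illustration of that heuristic, a promotion check could combine the hit count with the remaining TTL; the five-second threshold here is made up.

import java.time.Duration;

final class PromotionPolicy {
    // Illustrative rule: promote only assets that are popular enough and not about to expire,
    // since writing an asset that expires in a few seconds wastes SSD endurance.
    static boolean shouldPromote(int hits, int promoteAfterHits, Duration ttlRemaining) {
        if (ttlRemaining.compareTo(Duration.ofSeconds(5)) < 0) {
            return false;
        }
        return hits >= promoteAfterHits;
    }
}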

Another aspect of the algorithms is demotion. We can actively or passively (during eviction) demote assets from persistent cache to transient cache based on the criteria mentioned above. The demotion itself does not directly reduce writes to persistent cache but it could help cache hit ratio and cache retention if done smartly.

Transient cache on other types of storage

We chose memory to host the transient cache because its wear-out cost is low (effectively none). The same is true for HDDs. It is possible to build a hybrid storage system across memory, SSDs, and HDDs to find the right balance between their costs and performance characteristics.

Conclusion

By putting unpopular assets in memory, we are able to trade off memory usage against tail hit latency and SSD lifetime. With our current configuration, we extend SSD life while reducing tail hit latency, at the expense of available system memory. The transient cache also opens possibilities for future heuristic and storage-hierarchy improvements.

Whether the techniques described in this post will benefit your system depends on your workload and the hardware resource constraints of your system. Hopefully, sharing this counterintuitive idea can inspire other novel ways of building faster and more efficient systems.

And finally, if doing systems work like this sounds interesting to you, come work at Cloudflare!

re:Invent 2019: Introducing the Amazon Builders’ Library (Part II)

Post Syndicated from Annik Stahl original https://aws.amazon.com/blogs/architecture/reinvent-2019-introducing-the-amazon-builders-library-part-ii/

In last week’s post, I told you about a new site we introduced at re:Invent at the beginning of this month: the Amazon Builders’ Library, a site that’s chock-full of articles by senior technical leaders that help you understand the underpinnings of both Amazon.com and AWS.

Below are four more architecture-based articles that describe how Amazon develops, architects, releases, and operates technology.

Caching Challenges and Strategies

Caching challenges

Over years of building services at Amazon we’ve experienced various versions of the following scenario: We build a new service, and this service needs to make some network calls to fulfill its requests. Perhaps these calls are to a relational database, or an AWS service like Amazon DynamoDB, or to another internal service. In simple tests or at low request rates the service works great, but we notice a problem on the horizon. The problem might be that calls to this other service are slow or that the database is expensive to scale out as call volume increases. We also notice that many requests are using the same downstream resource or the same query results, so we think that caching this data could be the answer to our problems. We add a cache and our service appears much improved. We observe that request latency is down, costs are reduced, and small downstream availability drops are smoothed over. After a while, no one can remember life before the cache. Dependencies reduce their fleet sizes accordingly, and the database is scaled down. Just when everything appears to be going well, the service could be poised for disaster. There could be changes in traffic patterns, failure of the cache fleet, or other unexpected circumstances that could lead to a cold or otherwise unavailable cache. This in turn could cause a surge of traffic to downstream services that can lead to outages both in our dependencies and in our service.

Read the full article by Matt Brinkley, Principal Engineer, and Jas Chhabra, Principal Engineer

Avoiding Fallback in Distributed Systems

Avoiding fallback

Critical failures prevent a service from producing useful results. For example, in an ecommerce website, if a database query for product information fails, the website cannot display the product page successfully. Amazon services must handle the majority of critical failures in order to be reliable.

This article covers fallback strategies and why we almost never use them at Amazon. You might find this surprising. After all, engineers often use the real world as a starting point for their designs. And in the real world, fallback strategies must be planned in advance and used when necessary. Let’s say an airport’s display boards go out. A contingency plan (such as humans writing flight information on whiteboards) must be in place to handle this situation, because passengers still need to find their gates. But consider how awful the contingency plan is: the difficulty of reading the whiteboards, the difficulty of keeping them up-to-date, and the risk that humans will add incorrect information. The whiteboard fallback strategy is necessary but it’s riddled with problems.

Read the full article by Jacob Gabrielson, Senior Principal Engineer

Leader Elections in Distributed Systems

Leader elections

Leader election is the simple idea of giving one thing (a process, host, thread, object, or human) in a distributed system some special powers. Those special powers could include the ability to assign work, the ability to modify a piece of data, or even the responsibility of handling all requests in the system.

Leader election is a powerful tool for improving efficiency, reducing coordination, simplifying architectures, and reducing operations. On the other hand, leader election can introduce new failure modes and scaling bottlenecks. In addition, leader election may make it more difficult for you to evaluate the correctness of a system.

Because of these complications, we carefully consider other options before implementing leader election. For data processing and workflows, workflow services like AWS Step Functions can achieve many of the same benefits as leader election and avoid many of its risks. For other systems, we often implement idempotent APIs, optimistic locking, and other patterns that make a single leader unnecessary.

In this article, I discuss some of the pros and cons of leader election in general and how Amazon approaches leader election in our distributed systems, including insights into leader failure.

Read the full article by Marc Brooker, Senior Principal Engineer

Workload Isolation Using Shuffle-Sharding

Shuffle-sharding

Not long after AWS began offering services, AWS customers made clear that they wanted to be able to use our Amazon Simple Storage Service (S3), Amazon CloudFront, and Elastic Load Balancing services at the “root” of their domain, that is, for names like “amazon.com” and not just for names like “www.amazon.com”.

That may seem very simple. However, due to a design decision in the DNS protocol, made back in the 1980s, it’s harder than it seems. DNS has a feature called CNAME that allows the owner of a domain to offload a part of their domain to another provider to host, but it doesn’t work at the root or top level of a domain. To serve our customers’ needs, we’d have to actually host our customers’ domains. When we host a customer’s domain, we can return whatever the current set of IP addresses are for Amazon S3, Amazon CloudFront, or Elastic Load Balancing. These services are constantly expanding and adding IP addresses, so it’s not something that customers could easily hard-code in their domain configurations either.

It’s no small task to host DNS. If DNS is having problems, an entire business can be offline. However, after we identified the need, we set out to solve it in the way that’s typical at Amazon—urgently. We carved out a small team of engineers, and we got to work.

Read the full article by Colm MacCárthaigh, Senior Principal Engineer

Want to learn more about the Amazon Builders’ Library? Visit our FAQ.

Next week we’ll wrap up 2019 with a top ten list of the most-visited Architecture blog posts of 2019.

Re-Architecting the Video Gatekeeper

Post Syndicated from Netflix Technology Blog original https://medium.com/netflix-techblog/re-architecting-the-video-gatekeeper-f7b0ac2f6b00?source=rss----2615bd06b42e---4

By Drew Koszewnik

This is the story about how the Content Setup Engineering team used Hollow, a Netflix OSS technology, to re-architect and simplify an essential component in our content pipeline — delivering a large amount of business value in the process.

The Context

Each movie and show on the Netflix service is carefully curated to ensure an optimal viewing experience. The team responsible for this curation is Title Operations. Title Operations will confirm, among other things:

  • We are in compliance with the contracts — date ranges and places where we can show a video are set up correctly for each title
  • Video with captions, subtitles, and secondary audio “dub” assets are sourced, translated, and made available to the right populations around the world
  • Title name and synopsis are available and translated
  • The appropriate maturity ratings are available for each country

When a title meets all of the minimum requirements above, it is allowed to go live on the service. Gatekeeper is the system at Netflix responsible for evaluating the “liveness” of videos and assets on the site. A title doesn’t become visible to members until Gatekeeper approves it — and if it can’t validate the setup, it will assist Title Operations by pointing out what’s missing from the baseline customer experience.

Gatekeeper accomplishes its prescribed task by aggregating data from multiple upstream systems, applying some business logic, then producing an output detailing the status of each video in each country.
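
As a rough illustration of that business logic, the per-video, per-country evaluation conceptually looks like the sketch below. The field names and rules are hypothetical stand-ins, not Netflix's actual checks.

import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

final class LivenessEvaluator {
    record TitleSetup(LocalDate windowStart, LocalDate windowEnd, boolean subtitlesAvailable,
                      boolean synopsisTranslated, boolean maturityRatingPresent) {}

    record LivenessResult(boolean live, List<String> missing) {}

    // Evaluate one video in one country: either it is live, or we report what is missing
    // so Title Operations knows what to fix.
    static LivenessResult evaluate(TitleSetup setup, LocalDate today) {
        List<String> missing = new ArrayList<>();
        if (today.isBefore(setup.windowStart()) || today.isAfter(setup.windowEnd())) {
            missing.add("outside contractual availability window");
        }
        if (!setup.subtitlesAvailable())     missing.add("subtitles not sourced");
        if (!setup.synopsisTranslated())     missing.add("synopsis not translated");
        if (!setup.maturityRatingPresent())  missing.add("maturity rating missing");
        return new LivenessResult(missing.isEmpty(), missing);
    }
}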

The Tech

Hollow, an OSS technology we released a few years ago, has been best described as a total high-density near cache:

  • Total: The entire dataset is cached on each node — there is no eviction policy, and there are no cache misses.
  • High-Density: Encoding, bit-packing, and deduplication techniques are employed to optimize the memory footprint of the dataset.
  • Near: The cache exists in RAM on any instance which requires access to the dataset.

One exciting thing about the total nature of this technology — because we don’t have to worry about swapping records in-and-out of memory, we can make assumptions and do some precomputation of the in-memory representation of the dataset which would not otherwise be possible. The net result is, for many datasets, vastly more efficient use of RAM. Whereas with a traditional partial-cache solution you may wonder whether you can get away with caching only 5% of the dataset, or if you need to reserve enough space for 10% in order to get an acceptable hit/miss ratio — with the same amount of memory Hollow may be able to cache 100% of your dataset and achieve a 100% hit rate.

And obviously, if you get a 100% hit rate, you eliminate all I/O required to access your data — and can achieve orders of magnitude more efficient data access, which opens up many possibilities.

The Status-Quo

Until very recently, Gatekeeper was a completely event-driven system. When a change for a video occurred in any one of its upstream systems, that system would send an event to Gatekeeper. Gatekeeper would react to that event by reaching into each of its upstream services, gathering the necessary input data to evaluate the liveness of the video and its associated assets. It would then produce a single-record output detailing the status of that single video.

Old Gatekeeper Architecture

This model had several problems associated with it:

  • This process was completely I/O bound and put a lot of load on upstream systems.
  • Consequently, these events would queue up throughout the day and cause processing delays, which meant that titles may not actually go live on time.
  • Worse, events would occasionally get missed, meaning titles wouldn’t go live at all until someone from Title Operations realized there was a problem.

The mitigation for these issues was to “sweep” the catalog so that videos matching specific criteria (e.g., scheduled to launch next week) would get events automatically injected into the processing queue. Unfortunately, this mitigation added many more events to the queue, which exacerbated the problem.

Clearly, a change in direction was necessary.

The Idea

We decided to employ a total high-density near cache (i.e., Hollow) to eliminate our I/O bottlenecks. For each of our upstream systems, we would create a Hollow dataset which encompasses all of the data necessary for Gatekeeper to perform its evaluation. Each upstream system would now be responsible for keeping its cache updated.

New Gatekeeper Architecture

With this model, liveness evaluation is conceptually separated from the data retrieval from upstream systems. Instead of reacting to events, Gatekeeper would continuously process liveness for all assets in all videos across all countries in a repeating cycle. The cycle iterates over every video available at Netflix, calculating liveness details for each of them. At the end of each cycle, it produces a complete output (also a Hollow dataset) representing the liveness status details of all videos in all countries.

We expected that this continuous processing model was possible because a complete removal of our I/O bottlenecks would mean that we should be able to operate orders of magnitude more efficiently. We also expected that by moving to this model, we would realize many positive effects for the business.

  • A definitive solution for the excess load on upstream systems generated by Gatekeeper.
  • A complete elimination of liveness processing delays and missed go-live dates.
  • A reduction in the time the Content Setup Engineering team spends on performance-related issues.
  • Improved debuggability and visibility into liveness processing.

The Problem

Hollow can also be thought of like a time machine. As a dataset changes over time, it communicates those changes to consumers by breaking the timeline down into a series of discrete data states. Each data state represents a snapshot of the entire dataset at a specific moment in time.

Hollow is like a time machine

Usually, consumers of a Hollow dataset are loading the latest data state and keeping their cache updated as new states are produced. However, they may instead point to a prior state — which will revert their view of the entire dataset to a point in the past.

The traditional method of producing data states is to maintain a single producer which runs a repeating cycle. During that cycle, the producer iterates over all records from the source of truth. As it iterates, it adds each record to the Hollow library. Hollow then calculates the differences between the data added during this cycle and the data added during the last cycle, then publishes the state to a location known to consumers.

Traditional Hollow usage
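
Conceptually, a traditional producer cycle looks like the sketch below: scan the entire source of truth, compute what changed since the previous cycle, and publish a new state. This is plain Java for illustration, not the Hollow API; the expensive part is the full scan.

import java.util.HashMap;
import java.util.Map;

final class SnapshotProducer<K, V> {
    private Map<K, V> previousState = new HashMap<>();

    interface Publisher<K, V> {
        void publish(long version, Map<K, V> fullState, Map<K, V> changedRecords);
    }

    // One cycle: iterate the whole source of truth, diff against the previous cycle, publish.
    // For large sources this full iteration can take hours.
    void runCycle(long version, Map<K, V> sourceOfTruth, Publisher<K, V> publisher) {
        Map<K, V> changed = new HashMap<>();
        for (Map.Entry<K, V> record : sourceOfTruth.entrySet()) {
            V previous = previousState.get(record.getKey());
            if (!record.getValue().equals(previous)) {
                changed.put(record.getKey(), record.getValue());
            }
        }
        publisher.publish(version, sourceOfTruth, changed);
        previousState = new HashMap<>(sourceOfTruth);
    }
}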

The problem with this total-source-of-truth iteration model is that it can take a long time. In the case of some of our upstream systems, this could take hours. This data-propagation latency was unacceptable — we can’t wait hours for liveness processing if, for example, Title Operations adds a rating to a movie that needs to go live imminently.

The Improvement

What we needed was a faster time machine — one which could produce states with a more frequent cadence, so that changes could be more quickly realized by consumers.

Incremental Hollow is like a faster time machine

To achieve this, we created an incremental Hollow infrastructure for Netflix, leveraging earlier work in the Hollow library that was pioneered in production by the Streaming Platform Team at Target (and is now a public, non-beta API).

With this infrastructure, each time a change is detected in a source application, the updated record is encoded and emitted to a Kafka topic. A new component that is not part of the source application, the Hollow Incremental Producer service, performs a repeating cycle at a predefined cadence. During each cycle, it reads all messages which have been added to the topic since the last cycle and mutates the Hollow state engine to reflect the new state of the updated records.

If a message from the Kafka topic contains the exact same data as already reflected in the Hollow dataset, no action is taken.

Hollow Incremental Producer Service

To mitigate issues arising from missed events, we implement a sweep mechanism that periodically iterates over an entire source dataset. As it iterates, it emits the content of each record to the Kafka topic. In this way, any updates which may have been missed will eventually be reflected in the Hollow dataset. Additionally, because this is not the primary mechanism by which updates are propagated to the Hollow dataset, this does not have to be run as quickly or frequently as a cycle must iterate the source in traditional Hollow usage.

The Hollow Incremental Producer is capable of reading a great many messages from the Kafka topic and mutating its Hollow state internally very quickly — so we can configure its cycle times to be very short (we are currently defaulting this to 30 seconds).

This is how we built a faster time machine. Now, if Title Operations adds a maturity rating to a movie, within 30 seconds, that data is available in the corresponding Hollow dataset.
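
A simplified sketch of that incremental cycle is shown below. It is not the actual Hollow Incremental Producer; the topic, the record types, and the in-memory map standing in for the Hollow state engine are all illustrative. The shape of the loop is the point: drain whatever arrived since the last cycle, skip no-op updates, and publish only if something changed.

import java.time.Duration;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

final class IncrementalProducerCycle {
    private final KafkaConsumer<String, String> consumer;
    private final Map<String, String> state = new HashMap<>();   // stands in for the Hollow state engine

    IncrementalProducerCycle(KafkaConsumer<String, String> consumer, String topic) {
        this.consumer = consumer;
        consumer.subscribe(List.of(topic));
    }

    // One short cycle (e.g. every 30 seconds): fold the updates received since the last cycle
    // into the state, ignoring messages that carry no actual change.
    void runCycle() {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        boolean changed = false;
        for (ConsumerRecord<String, String> record : records) {
            String current = state.get(record.key());
            if (record.value().equals(current)) {
                continue;                       // identical to what is already in the dataset: no-op
            }
            state.put(record.key(), record.value());
            changed = true;
        }
        if (changed) {
            publishNewState();                  // produce and announce a new data state for consumers
        }
    }

    private void publishNewState() {
        // Placeholder: in the real system a new Hollow data state would be produced here.
    }
}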

The Tangible Result

With the data propagation latency issue solved, we were able to re-implement the Gatekeeper system to eliminate all I/O boundaries. With the prior implementation of Gatekeeper, re-evaluating all assets for all videos in all countries would have been unthinkable — it would tie up the entire content pipeline for more than a week (and we would then still be behind by a week since nothing else could be processed in the meantime). Now we re-evaluate everything in about 30 seconds — and we do that every minute.

There is no such thing as a missed or delayed liveness evaluation any longer, and the disablement of the prior Gatekeeper system reduced the load on our upstream systems — in some cases by up to 80%.

Load reduction on one upstream system

In addition to these performance benefits, we also get a resiliency benefit. In the prior Gatekeeper system, if one of the upstream services went down, we were unable to evaluate liveness at all because we could not retrieve any data from that system. In the new implementation, if one of the upstream systems goes down, it simply stops publishing — we keep gating with the last (now stale) version of its dataset while all the others make progress. So for example, if the translated synopsis system goes down, we can still bring a movie on-site in a region if it was held back for, and then receives, the correct subtitles.

The Intangible Result

Perhaps even more beneficial than the performance gains has been the improvement in our development velocity in this system. We can now develop, validate, and release changes in minutes which might have before taken days or weeks — and we can do so with significantly increased release quality.

The time-machine aspect of Hollow means that every deterministic process which uses Hollow exclusively as input data is 100% reproducible. For Gatekeeper, this means that an exact replay of what happened at time X can be accomplished by reverting all of our input states to time X, then re-evaluating everything again.

We use this fact to iterate quickly on changes to the Gatekeeper business logic. We maintain a PREPROD Gatekeeper instance which “follows” our PROD Gatekeeper instance. PREPROD is also continuously evaluating liveness for the entire catalog, but publishing its output to a different Hollow dataset. At the beginning of each cycle, the PREPROD environment will gather the latest produced state from PROD, and set each of its input datasets to the exact same versions which were used to produce the PROD output.

The PREPROD Gatekeeper instance “follows” the PROD instance

When we want to make a change to the Gatekeeper business logic, we do so and then publish it to our PREPROD cluster. The subsequent output state from PREPROD can be diffed with its corresponding output state from PROD to view the precise effect that the logic change will cause. In this way, at a glance, we can validate that our changes have precisely the intended effect, and zero unintended consequences.

A Hollow diff shows exactly what changes

This, coupled with some iteration on the deployment process, has resulted in the ability for our team to code, validate, and deploy impactful changes to Gatekeeper in literally minutes — at least an order of magnitude faster than in the prior system — and we can do so with a higher level of safety than was possible in the previous architecture.

Conclusion

This new implementation of the Gatekeeper system opens up opportunities to capture additional business value, which we plan to pursue over the coming quarters. Additionally, this is a pattern that can be replicated to other systems within the Content Engineering space and elsewhere at Netflix — already a couple of follow-up projects have been launched to formalize and capitalize on the benefits of this n-hollow-input, one-hollow-output architecture.

Content Setup Engineering is an exciting space right now, especially as we scale up our pipeline to produce more content with each passing quarter. We have many opportunities to solve real problems and provide massive value to the business — and to do so with a deep focus on computer science, using and often pioneering leading-edge technologies. If this kind of work sounds appealing to you, reach out to Ivan to get the ball rolling.


Re-Architecting the Video Gatekeeper was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.

Multiple Cache Configurations with Caffeine and Spring Boot

Post Syndicated from Bozho original https://techblog.bozho.net/multiple-cache-configurations-with-caffeine-and-spring-boot/

Caching is key for performance of nearly every application. Distributed caching is sometimes needed, but not always. In many cases a local cache would work just fine and there’s no need for the overhead and complexity of the distributed cache.

So, in many applications, including plain Spring and Spring Boot, you can use @Cacheable on any method and its result will be cached so that the next time the method is invoked, the cached result is returned.

Spring has some default cache manager implementations, but external libraries are always better and more flexible than the simple built-in ones. Caffeine, for example, is a high-performance Java caching library. And Spring Boot comes with a CaffeineCacheManager. So, ideally, that’s all you need – you just create a cache manager bean and you have caching for your @Cacheable-annotated methods.

However, the provided cache manager allows you to configure just one cache specification. Cache specifications include the expiry time, initial capacity, max size, and so on. So all of the caches under this cache manager are created with a single cache spec. The cache manager supports a list of predefined caches as well as dynamically created caches, but in both cases a single cache spec is used. And that’s rarely useful in production. Built-in cache managers are something you have to be careful with, as a general rule.

There are a few blog posts that tell you how to define custom caches with custom specs. However, these options do not support the dynamic, default-cache-spec use case that the built-in manager supports. Ideally, you should be able to use any name in @Cacheable and have a cache automatically created with some default spec, but you should also have the option to override that spec for specific caches.

That’s why I decided on a simpler approach than defining all caches in code, one that allows for greater flexibility. It extends CaffeineCacheManager to provide that functionality:

import java.util.HashMap;
import java.util.Map;

import org.springframework.beans.factory.InitializingBean;
import org.springframework.cache.caffeine.CaffeineCacheManager;

import com.github.benmanes.caffeine.cache.Cache;
import com.github.benmanes.caffeine.cache.CacheLoader;
import com.github.benmanes.caffeine.cache.Caffeine;

/**
 * Extending Caffeine cache manager to allow flexible per-cache configuration
 */
public class FlexibleCaffeineCacheManager extends CaffeineCacheManager implements InitializingBean {
    private Map<String, String> cacheSpecs = new HashMap<>();

    private Map<String, Caffeine<Object, Object>> builders = new HashMap<>();

    private CacheLoader cacheLoader;

    @Override
    public void afterPropertiesSet() throws Exception {
        // Pre-build one Caffeine builder per configured cache spec
        for (Map.Entry<String, String> cacheSpecEntry : cacheSpecs.entrySet()) {
            builders.put(cacheSpecEntry.getKey(), Caffeine.from(cacheSpecEntry.getValue()));
        }
    }

    @Override
    @SuppressWarnings("unchecked")
    protected Cache<Object, Object> createNativeCaffeineCache(String name) {
        // Use the per-cache builder if one is configured; otherwise fall back to the default spec
        Caffeine<Object, Object> builder = builders.get(name);
        if (builder == null) {
            return super.createNativeCaffeineCache(name);
        }

        if (this.cacheLoader != null) {
            return builder.build(this.cacheLoader);
        } else {
            return builder.build();
        }
    }

    public Map<String, String> getCacheSpecs() {
        return cacheSpecs;
    }

    public void setCacheSpecs(Map<String, String> cacheSpecs) {
        this.cacheSpecs = cacheSpecs;
    }

    public void setCacheLoader(CacheLoader cacheLoader) {
        super.setCacheLoader(cacheLoader);
        this.cacheLoader = cacheLoader;
    }
}

In short, it creates one Caffeine builder per configured spec and uses it instead of the default builder when a new cache is needed.

Then a sample XML configuration would look like this:

<bean id="cacheManager" class="net.bozho.util.FlexibleCaffeineCacheManager">
    <property name="cacheSpecification" value="expireAfterWrite=10m"/>
    <property name="cacheSpecs">
        <map>
            <entry key="statistics" value="expireAfterWrite=1h"/>
        </map>
    </property>
</bean>

With Java config it’s pretty straightforward – you just set the cacheSpecs map.
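
For example, a minimal Java configuration might look like the following; the cache names and specs are illustrative.

import java.util.Map;

import org.springframework.cache.CacheManager;
import org.springframework.cache.annotation.EnableCaching;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@EnableCaching
public class CacheConfig {

    @Bean
    public CacheManager cacheManager() {
        FlexibleCaffeineCacheManager cacheManager = new FlexibleCaffeineCacheManager();
        // Default spec used for any cache name not listed below
        cacheManager.setCacheSpecification("expireAfterWrite=10m");
        // Per-cache overrides
        cacheManager.setCacheSpecs(Map.of("statistics", "expireAfterWrite=1h"));
        return cacheManager;
    }
}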

While Spring has grown into a huge framework that provides all kinds of features, it hasn’t abandoned its design principle of extensibility.

Extending built-in framework classes is something that happens quite often, and it should be in everyone’s toolbox. These classes are created with extension in mind – you’ll notice that many methods in the CaffeineCacheManager are protected. So we should make use of that whenever needed.

The post Multiple Cache Configurations with Caffeine and Spring Boot appeared first on Bozho's tech blog.