Tag Archives: open source

Testing cloud apps with GitHub Actions and cloud-native open source tools

Post Syndicated from Sarah Khalife original https://github.blog/2020-10-09-devops-cloud-testing/

See this post in action during GitHub Demo Days on October 16.

What makes a project successful? For developers building cloud-native applications, successful projects thrive on transparent, consistent, and rigorous collaboration. That collaboration is one of the reasons that many open source projects, like Docker containers and Kubernetes, grow to become standards for how we build, deliver, and operate software. Our Open Source Guides and Introduction to innersourcing are great first steps to setting up and encouraging these best practices in your own projects.

However, a common challenge that application developers face is manually testing against inconsistent environments. Accurately testing Kubernetes applications can differ from one developer’s environment to another, and implementing a rigorous and consistent environment for end-to-end testing isn’t easy. It can also be very time consuming to spin up and down Kubernetes clusters. The inconsistencies between environments and the time required to spin up new Kubernetes clusters can negatively impact the speed and quality of cloud-native applications.

Building a transparent CI process

On GitHub, integration and testing become a little easier by combining GitHub Actions with open source tools. You can treat Actions as the native continuous integration and continuous delivery (CI/CD) tool for your project, and customize your Actions workflow to include automation and validation as next steps.

Since Actions can be triggered based on nearly any GitHub event, it’s also possible to build in accountability for updating tests and fixing bugs. For example, when a developer creates a pull request, Actions status checks can automatically block the merge if the test fails.

Here are a few more examples:

  • Branch protection rules in the repository help enforce certain workflows, such as requiring more than one pull request review or requiring certain status checks to pass before allowing a pull request to merge.
  • GitHub Actions are natively configured to act as status checks when they’re set up to trigger `on: [pull_request]`.
  • Continuous integration (CI) is extremely valuable because it allows you to run tests before each pull request is merged into production code. In turn, this reduces the number of bugs that are pushed into production and increases confidence that newly introduced changes will not break existing functionality.
  • But transparency remains key: requiring CI status checks on protected branches provides a clearly defined, transparent way to let code reviewers know if the commits meet the conditions set for the repository, right in the pull request view.
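
The merge-blocking behavior described above can be pictured with a small model. This is only a sketch of the idea, not GitHub's API: the function name `can_merge`, the check names, and the `"success"` conclusion string are assumptions made for illustration.

```python
from typing import Dict, Iterable


def can_merge(required_checks: Iterable[str], conclusions: Dict[str, str]) -> bool:
    """Allow the merge only if every required check reported success.

    A required check that has not reported anything yet counts as not passed,
    which mirrors the idea of a protected branch waiting on its status checks.
    """
    return all(conclusions.get(name) == "success" for name in required_checks)


# Hypothetical check names, for illustration only.
required = ["unit-tests", "e2e-tests"]
print(can_merge(required, {"unit-tests": "success", "e2e-tests": "failure"}))
```

A pull request with a failing or missing required check stays blocked, and reviewers see why in one place.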

Using community-powered workflows

Now that we’ve thought through the simple CI policies, automated workflows are next. Think of an Actions workflow as a set of “plug and play” open sourced, automated steps contributed by the community. You can use them as they are, or customize and make them your own. Once you’ve found the right one, open sourced Actions can be plugged into your workflow with the `- uses: repo/action-name` field.

You might ask, “So how do I find available Actions that suit my needs?”

The GitHub Marketplace!

As you’re building automation and CI pipelines, take advantage of Marketplace to find pre-built Actions provided by the community. Examples of pre-built Actions span from a Docker publish and the kubectl CLI installation to container scans and cloud deployments. When it comes to cloud-native Actions, the list keeps growing as container-based development continues to expand.

Testing with kind

Testing is a critical part of any CI/CD pipeline, but running tests in Kubernetes can absorb the extra time that automation saves. Enter kind. kind stands for “Kubernetes in Docker.” It’s an open source project from the Kubernetes Special Interest Groups (SIGs) community, and a tool for running local Kubernetes clusters using Docker container “nodes.” Creating a kind cluster is a simple way to run Kubernetes cluster and application testing, without having to spin up a complete Kubernetes environment.

As the number of Kubernetes users pushing critical applications to production grows, so does the need for a repeatable, reliable, and rigorous testing process. This can be accomplished by combining the creation of a homogeneous Kubernetes testing environment with kind, the community-powered Marketplace, and the native and transparent Actions CI process.

Bringing it all together with kind and Actions

Come see kind and Actions at work during our next GitHub Demo Day live stream on October 16, 2020 at 11am PT. I’ll walk you through how to easily set up automated and consistent tests per pull request, including how to use kind with Actions to automatically run end-to-end tests across a common Kubernetes environment.

New – Redis 6 Compatibility for Amazon ElastiCache

Post Syndicated from Channy Yun original https://aws.amazon.com/blogs/aws/new-redis-6-compatibility-for-amazon-elasticache/

Since the launch of Redis 5.0 compatibility for Amazon ElastiCache, there have been many improvements to Amazon ElastiCache for Redis, including support for upstream releases such as 5.0.6.

Earlier this year, we announced Global Datastore for Redis, which lets you replicate a cluster in one region to clusters in up to two other regions. Recently we improved your ability to monitor your Redis fleet by enabling 18 additional engine and node-level CloudWatch metrics. We also added support for resource-level permission policies, allowing you to assign AWS Identity and Access Management (IAM) principal permissions to a specific ElastiCache resource or resources.

Today, I am happy to announce Redis 6 compatibility for Amazon ElastiCache for Redis. This release brings several new and important features to Amazon ElastiCache for Redis:

  • Managed Role-Based Access Control – Amazon ElastiCache for Redis 6 now provides you with the ability to create and manage users and user groups that can be used to set up Role-Based Access Control (RBAC) for Redis commands. You can now simplify your architecture while maintaining security boundaries by having several applications use the same Redis cluster without being able to access each other’s data. You can also take advantage of granular access control and authorization to create administration and read-only user groups. Amazon ElastiCache enhances the new Access Control Lists (ACL) introduced in open source Redis 6 to provide a managed RBAC experience, making it easy to set up access control across several Amazon ElastiCache for Redis clusters.
  • Client-Side Caching – Amazon ElastiCache for Redis 6 comes with server-side enhancements to deliver efficient client-side caching to further improve your application performance. Redis clusters now support client-side caching by tracking client requests and sending invalidation messages for data stored on the client. In addition, you can also take advantage of a broadcast mode that allows clients to subscribe to a set of notifications from Redis clusters.
  • Significant Operational Improvements – This release also includes several enhancements that improve application availability and reliability. Specifically, Amazon ElastiCache has improved replication under low memory conditions, especially for workloads with medium/large sized keys, by reducing latency and the time it takes to perform snapshots. Open source Redis enhancements include improvements to the expiry algorithm for faster eviction of expired keys and various bug fixes.
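
To make the client-side caching idea from the list above more concrete, here is a minimal in-memory sketch of the pattern: the client keeps a local copy of values it has read and drops them when an invalidation message arrives. It does not talk to Redis at all; the class `TrackedCache`, its methods, and the `fetch` callback that stands in for a server round trip are all hypothetical names chosen for illustration.

```python
from typing import Callable, Dict, Iterable, Optional


class TrackedCache:
    """Local cache whose entries are dropped when an invalidation arrives."""

    def __init__(self, fetch: Callable[[str], Optional[str]]) -> None:
        self._fetch = fetch  # stands in for a round trip to the server
        self._local: Dict[str, Optional[str]] = {}
        self.misses = 0  # number of times the server had to be asked

    def get(self, key: str) -> Optional[str]:
        """Serve from the local copy when possible, otherwise ask the server."""
        if key not in self._local:
            self.misses += 1
            self._local[key] = self._fetch(key)
        return self._local[key]

    def invalidate(self, keys: Iterable[str]) -> None:
        """Handle an invalidation message by forgetting the named keys."""
        for key in keys:
            self._local.pop(key, None)


# A plain dict plays the role of the server in this sketch.
server = {"greeting": "hello"}
cache = TrackedCache(server.get)
cache.get("greeting")            # first read goes to the "server"
cache.get("greeting")            # second read is served locally
server["greeting"] = "hi"        # the value changes on the server...
cache.invalidate(["greeting"])   # ...and the client is told to drop its copy
print(cache.get("greeting"), cache.misses)
```

In a real deployment the invalidation messages would come from the server rather than being called by hand; the sketch only shows what the client does with them.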

Note that open source Redis 6 also announced support for encryption-in-transit, a capability that is already available in Amazon ElastiCache for Redis 4.0.10 onwards. This release of Amazon ElastiCache for Redis 6 does not impact Amazon ElastiCache for Redis’ existing support for encryption-in-transit.

In order to apply RBAC to a new or existing Redis 6 cluster, we first need to ensure you have a user and user group created. We’ll review the process to do this below.

Using Role-Based Access Control – How it works
An alternative to Authenticating Users with the Redis AUTH Command, Amazon ElastiCache for Redis 6 offers Role-Based Access Control (RBAC). With RBAC, you create users and assign them specific permissions via an Access String.
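
An access string is a space-separated list of rules in the open source Redis ACL syntax: `on` or `off` to enable or disable the user, `+command` or `+@category` to allow commands, and `~pattern` to allow keys. As a rough sketch, a small helper can assemble one; the function `build_access_string` and its parameters are hypothetical and not part of any AWS SDK.

```python
from typing import Sequence


def build_access_string(
    enabled: bool,
    commands: Sequence[str],
    key_patterns: Sequence[str],
) -> str:
    """Assemble a Redis ACL-style access string from its parts."""
    parts = ["on" if enabled else "off"]          # user enabled or disabled
    parts += [f"+{name}" for name in commands]    # allowed commands or @categories
    parts += [f"~{pattern}" for pattern in key_patterns]  # allowed key patterns
    return " ".join(parts)


# Reproduces the access string used in this post's CLI example.
print(build_access_string(False, ["get"], ["keys*"]))
```

The resulting string is what you would pass to the `--access-string` option when creating or modifying a user.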

If you want to create, modify, and delete users and user groups, you will need to go to the User Management and User Group Management sections in the ElastiCache console.

ElastiCache will automatically configure a default user with user ID and user name “default”, and then you can add it or newly created users to new groups in User Group Management.

If you want to change the default user with your own password and access setting, you need to create a new user with the username set to “default” and can then swap it with the original default user. We recommend using your own strong password for a default user.

The following example shows how to swap the original default user with another default that has a modified access string via AWS CLI.

$ aws elasticache create-user \
  --user-id "new-default-user" \
  --user-name "default" \
  --engine "REDIS" \
  --passwords "a-str0ng-pa))word" \
  --access-string "off +get ~keys*"

Create a user group and add the user you created previously.

$ aws elasticache create-user-group \
  --user-group-id "new-default-group" \
  --engine "REDIS" \
  --user-ids "default"

Swap the new default user with the original default user.

$ aws elasticache modify-user-group \
    --user-group-id "new-default-group" \
    --user-ids-to-add "new-default-user" \
    --user-ids-to-remove "default"

You can also modify a user’s password or change its access permissions using the modify-user command, or remove a specific user using the delete-user command. A deleted user is removed from any user groups to which it belongs.

Similarly, you can modify a user group by adding new users and/or removing current users using the modify-user-group command, or delete a user group using the delete-user-group command. Note that only the user group itself is deleted, not the users belonging to the group.

Once you have created a user group and added users, you can assign the user group to a replication group, or migrate between Redis AUTH and RBAC. For more information, see the documentation.

Redis 6 cluster for ElastiCache – Getting Started
As usual, you can use the ElastiCache Console, CLI, APIs, or a CloudFormation template to create a new Redis 6 cluster. I’ll use the Console: choose Redis from the navigation pane and click Create with the following settings:

Select the “Encryption in-transit” checkbox to make the “Access Control” options visible. For Access Control, you can choose either User Group Access Control List (the RBAC feature) or Redis AUTH Default User. If you select RBAC, you can choose one of the available user groups.

My cluster is up and running within minutes. You can also use the in-place upgrade feature on an existing cluster: select the cluster, then click Action and Modify. You can change the Engine Version from the 5.0.6-compatible engine to 6.x.

Now Available
Amazon ElastiCache for Redis 6 is now available in all AWS regions. For a list of ElastiCache for Redis supported versions, refer to the documentation. Please send us feedback in the AWS forum for Amazon ElastiCache, through AWS support, or through your account team.

Channy;

Amazon SageMaker Continues to Lead the Way in Machine Learning and Announces up to 18% Lower Prices on GPU Instances

Post Syndicated from Julien Simon original https://aws.amazon.com/blogs/aws/amazon-sagemaker-leads-way-in-machine-learning/

Since 2006, Amazon Web Services (AWS) has been helping millions of customers build and manage their IT workloads. From startups to large enterprises to public sector, organizations of all sizes use our cloud computing services to reach unprecedented levels of security, resiliency, and scalability. Every day, they’re able to experiment, innovate, and deploy to production in less time and at lower cost than ever before. Thus, business opportunities can be explored, seized, and turned into industrial-grade products and services.

As Machine Learning (ML) became a growing priority for our customers, they asked us to build an ML service infused with the same agility and robustness. The result was Amazon SageMaker, a fully managed service launched at AWS re:Invent 2017 that provides every developer and data scientist with the ability to build, train, and deploy ML models quickly.

Today, Amazon SageMaker is helping tens of thousands of customers in all industry segments build, train and deploy high quality models in production: financial services (Euler Hermes, Intuit, Slice Labs, NerdWallet, Root Insurance, Coinbase, NuData Security, Siemens Financial Services), healthcare (GE Healthcare, Cerner, Roche, Celgene, Zocdoc), news and media (Dow Jones, Thomson Reuters, ProQuest, SmartNews, Frame.io, Sportograf), sports (Formula 1, Bundesliga, Olympique de Marseille, NFL, Guinness Six Nations Rugby), retail (Zalando, Zappos, Fabulyst), automotive (Atlas Van Lines, Edmunds, Regit), dating (Tinder), hospitality (Hotels.com, iFood), industry and manufacturing (Veolia, Formosa Plastics), gaming (Voodoo), customer relationship management (Zendesk, Freshworks), energy (Kinect Energy Group, Advanced Microgrid Systems), real estate (Realtor.com), satellite imagery (Digital Globe), human resources (ADP), and many more.

When we asked our customers why they decided to standardize their ML workloads on Amazon SageMaker, the most common answer was: “SageMaker removes the undifferentiated heavy lifting from each step of the ML process.” Zooming in, we identified five areas where SageMaker helps them most.

#1 – Build Secure and Reliable ML Models, Faster
As many ML models are used to serve real-time predictions to business applications and end users, making sure that they stay available and fast is of paramount importance. This is why Amazon SageMaker endpoints have built-in support for load balancing across multiple AWS Availability Zones, as well as built-in Auto Scaling to dynamically adjust the number of provisioned instances according to incoming traffic.

For even more robustness and scalability, Amazon SageMaker relies on production-grade open source model servers such as TensorFlow Serving, the Multi-Model Server, and TorchServe. A collaboration between AWS and Facebook, TorchServe is available as part of the PyTorch project, and makes it easy to deploy trained models at scale without having to write custom code.

In addition to resilient infrastructure and scalable model serving, you can also rely on Amazon SageMaker Model Monitor to catch prediction quality issues that could happen on your endpoints. By saving incoming requests as well as outgoing predictions, and by comparing them to a baseline built from a training set, you can quickly identify and fix problems like missing features or data drift.
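
The comparison against a baseline can be illustrated with a deliberately simple sketch. This is not how SageMaker Model Monitor works internally and it does not use its API; the function names, the feature names, and the three-standard-deviation threshold are assumptions chosen only to show the two kinds of problems mentioned above.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def missing_features(baseline_columns: Sequence[str], request: Dict[str, float]) -> List[str]:
    """Return the baseline feature names that are absent from an incoming request."""
    return [name for name in baseline_columns if name not in request]


def mean_drifted(baseline: Sequence[float], recent: Sequence[float], threshold: float = 3.0) -> bool:
    """Flag drift when the recent mean sits more than `threshold` baseline
    standard deviations away from the baseline mean (an assumed, simple rule)."""
    spread = stdev(baseline)  # requires at least two baseline values
    if spread == 0:
        return mean(recent) != mean(baseline)
    return abs(mean(recent) - mean(baseline)) / spread > threshold


# Hypothetical feature names and values, for illustration only.
print(missing_features(["age", "income"], {"age": 41.0}))
print(mean_drifted([10, 11, 9, 10, 10], [15, 16]))
```

Real monitoring compares full distributions and many statistics per feature; the sketch only shows the shape of the check.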

Says Aude Giard, Chief Digital Officer at Veolia Water Technologies: “In 8 short weeks, we worked with AWS to develop a prototype that anticipates when to clean or change water filtering membranes in our desalination plants. Using Amazon SageMaker, we built a ML model that learns from previous patterns and predicts the future evolution of fouling indicators. By standardizing our ML workloads on AWS, we were able to reduce costs and prevent downtime while improving the quality of the water produced. These results couldn’t have been realized without the technical experience, trust, and dedication of both teams to achieve one goal: an uninterrupted clean and safe water supply.” You can learn more in this video.

#2 – Build ML Models Your Way
When it comes to building models, Amazon SageMaker gives you plenty of options. You can visit AWS Marketplace, pick an algorithm or a model shared by one of our partners, and deploy it on SageMaker in just a few clicks. Alternatively, you can train a model using one of the built-in algorithms, or your own code written for a popular open source ML framework (TensorFlow, PyTorch, and Apache MXNet), or your own custom code packaged in a Docker container.

You could also rely on Amazon SageMaker Autopilot, a game-changing AutoML capability. Whether you have little or no ML experience, or you’re a seasoned practitioner who needs to explore hundreds of datasets, SageMaker Autopilot takes care of everything for you with a single API call. It automatically analyzes your dataset, figures out the type of problem you’re trying to solve, builds several data processing and training pipelines, trains them, and optimizes them for maximum accuracy. In addition, the data processing and training source code is available in auto-generated notebooks that you can review, and run yourself for further experimentation. SageMaker Autopilot also now creates machine learning models up to 40% faster with up to 200% higher accuracy, even with small and imbalanced datasets.

Another popular feature is Automatic Model Tuning. No more manual exploration, no more costly grid search jobs that run for days: using ML optimization, SageMaker quickly converges to high-performance models, saving you time and money, and letting you deploy the best model to production quicker.

“NerdWallet relies on data science and ML to connect customers with personalized financial products,” says Ryan Kirkman, Senior Engineering Manager. “We chose to standardize our ML workloads on AWS because it allowed us to quickly modernize our data science engineering practices, removing roadblocks and speeding time-to-delivery. With Amazon SageMaker, our data scientists can spend more time on strategic pursuits and focus more energy where our competitive advantage is: our insights into the problems we’re solving for our users.” You can learn more in this case study.

Says Tejas Bhandarkar, Senior Director of Product, Freshworks Platform: “We chose to standardize our ML workloads on AWS because we could easily build, train, and deploy machine learning models optimized for our customers’ use cases. Thanks to Amazon SageMaker, we have built more than 30,000 models for 11,000 customers while reducing training time for these models from 24 hours to under 33 minutes. With SageMaker Model Monitor, we can keep track of data drifts and retrain models to ensure accuracy. Powered by Amazon SageMaker, Freddy AI Skills is constantly evolving with smart actions, deep-data insights, and intent-driven conversations.”

#3 – Reduce Costs
Building and managing your own ML infrastructure can be costly, and Amazon SageMaker is a great alternative. In fact, we found out that the total cost of ownership (TCO) of Amazon SageMaker over a 3-year horizon is over 54% lower compared to other options, and developers can be up to 10 times more productive. This comes from the fact that Amazon SageMaker manages all the training and prediction infrastructure that ML typically requires, allowing teams to focus exclusively on studying and solving the ML problem at hand.

Furthermore, Amazon SageMaker includes many features that help training jobs run as fast and as cost-effectively as possible: optimized versions of the most popular machine learning libraries, a wide range of CPU and GPU instances with up to 100 Gbps networking, and of course Managed Spot Training, which lets you save up to 90% on your training jobs. Last but not least, Amazon SageMaker Debugger automatically identifies complex issues developing in ML training jobs. Unproductive jobs are terminated early, and you can use model information captured during training to pinpoint the root cause.

Amazon SageMaker also helps you slash your prediction costs. Thanks to Multi-Model Endpoints, you can deploy several models on a single prediction endpoint, avoiding the extra work and cost associated with running many low-traffic endpoints. For models that require some hardware acceleration without the need for a full-fledged GPU, Amazon Elastic Inference lets you save up to 90% on your prediction costs. At the other end of the spectrum, large-scale prediction workloads can rely on AWS Inferentia, a custom chip designed by AWS, for up to 30% higher throughput and up to 45% lower cost per inference compared to GPU instances.

Lyft, one of the largest transportation networks in the United States and Canada, launched its Level 5 autonomous vehicle division in 2017 to develop a self-driving system to help millions of riders. Lyft Level 5 aggregates over 10 terabytes of data each day to train ML models for their fleet of autonomous vehicles. Managing ML workloads on their own was becoming time-consuming and expensive. Says Alex Bain, Lead for ML Systems at Lyft Level 5: “Using Amazon SageMaker distributed training, we reduced our model training time from days to a couple of hours. By running our ML workloads on AWS, we streamlined our development cycles and reduced costs, ultimately accelerating our mission to deliver self-driving capabilities to our customers.”

#4 – Build Secure and Compliant ML Systems
Security is always priority #1 at AWS. It’s particularly important to customers operating in regulated industries such as financial services or healthcare, as they must implement their solutions with the highest level of security and compliance. For this purpose, Amazon SageMaker implements many security features, making it compliant with the following global standards: SOC 1/2/3, PCI, ISO, FedRAMP, DoD CC SRG, IRAP, MTCS, C5, K-ISMS, ENS High, OSPAR, and HITRUST CSF. It’s also HIPAA BAA eligible.

Says Ashok Srivastava, Chief Data Officer, Intuit: “With Amazon SageMaker, we can accelerate our Artificial Intelligence initiatives at scale by building and deploying our algorithms on the platform. We will create novel large-scale machine learning and AI algorithms and deploy them on this platform to solve complex problems that can power prosperity for our customers.”

#5 – Annotate Data and Keep Humans in the Loop
As ML practitioners know, turning data into a dataset requires a lot of time and effort. To help you reduce both, Amazon SageMaker Ground Truth is a fully managed data labeling service that makes it easy to annotate and build highly accurate training datasets at any scale (text, image, video, and 3D point cloud datasets).

Says Magnus Soderberg, Director, Pathology Research, AstraZeneca: “AstraZeneca has been experimenting with machine learning across all stages of research and development, and most recently in pathology to speed up the review of tissue samples. The machine learning models first learn from a large, representative data set. Labeling the data is another time-consuming step, especially in this case, where it can take many thousands of tissue sample images to train an accurate model. AstraZeneca uses Amazon SageMaker Ground Truth, a machine learning-powered, human-in-the-loop data labeling and annotation service to automate some of the most tedious portions of this work, resulting in a reduction of time spent cataloging samples by at least 50%.”

Amazon SageMaker is Evaluated
The hundreds of new features added to Amazon SageMaker since launch are testimony to our relentless innovation on behalf of customers. In fact, the service was highlighted in February 2020 as the overall leader in Gartner’s Cloud AI Developer Services Magic Quadrant. Gartner subscribers can click here to learn more about why we have an overall score of 84/100 in their “Solution Scorecard for Amazon SageMaker, July 2020”, the highest rating among our peer group. According to Gartner, we met 87% of required criteria, 73% of preferred, and 85% of optional.

Announcing a Price Reduction on GPU Instances

To thank our customers for their trust and to show our continued commitment to make Amazon SageMaker the best and most cost-effective ML service, I’m extremely happy to announce a significant price reduction on all ml.p2 and ml.p3 GPU instances. It will apply starting October 1st for all SageMaker components and across the following regions: US East (N. Virginia), US East (Ohio), US West (Oregon), EU (Ireland), EU (Frankfurt), EU (London), Canada (Central), Asia Pacific (Singapore), Asia Pacific (Sydney), Asia Pacific (Seoul), Asia Pacific (Tokyo), Asia Pacific (Mumbai), and AWS GovCloud (US-Gov-West).

Instance Name       Price Reduction
ml.p2.xlarge        -11%
ml.p2.8xlarge       -14%
ml.p2.16xlarge      -18%
ml.p3.2xlarge       -11%
ml.p3.8xlarge       -14%
ml.p3.16xlarge      -18%
ml.p3dn.24xlarge    -18%
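
To see what these percentages mean for a bill, the arithmetic can be sketched as below. The percentages come from the table above; the hourly price passed in is a made-up number for illustration, not an actual SageMaker rate, and the function name `reduced_price` is hypothetical.

```python
# Price reductions from the table above, in percent.
PRICE_REDUCTION_PERCENT = {
    "ml.p2.xlarge": 11,
    "ml.p2.8xlarge": 14,
    "ml.p2.16xlarge": 18,
    "ml.p3.2xlarge": 11,
    "ml.p3.8xlarge": 14,
    "ml.p3.16xlarge": 18,
    "ml.p3dn.24xlarge": 18,
}


def reduced_price(old_hourly_price: float, instance_type: str) -> float:
    """Apply the announced percentage reduction to a previous hourly price."""
    percent = PRICE_REDUCTION_PERCENT[instance_type]
    return old_hourly_price * (100 - percent) / 100


# 10.0 is a hypothetical previous hourly price, not a real rate.
print(reduced_price(10.0, "ml.p2.16xlarge"))
```

Check the SageMaker pricing page for the actual rates in your region before estimating savings.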

Getting Started with Amazon SageMaker
As you can see, there are a lot of exciting features in Amazon SageMaker, and I encourage you to try them out! Amazon SageMaker is available worldwide, so chances are you can easily get to work on your own datasets. The service is part of the AWS Free Tier, letting new users work with it for free for hundreds of hours during the first two months.

If you’d like to kick the tires, this tutorial will get you started in minutes. You’ll learn how to use SageMaker Studio to build, train, and deploy a classification model based on the XGBoost algorithm.

Last but not least, I just published a book named “Learn Amazon SageMaker”, a 500-page detailed tour of all SageMaker features, illustrated by more than 60 original Jupyter notebooks. It should help you get up to speed in no time.

As always, we’re looking forward to your feedback. Please share it with your usual AWS support contacts, or on the AWS Forum for SageMaker.

– Julien

Migrating cdnjs to serverless with Workers KV

Post Syndicated from Tyler Caslin original https://blog.cloudflare.com/migrating-cdnjs-to-serverless-with-workers-kv/

Cloudflare powers cdnjs, an open-source project that accelerates websites by delivering popular JavaScript libraries and resources via Cloudflare’s network. Since our major update in December, we focused on remodelling cdnjs for scalability and resilience. Today, we are excited to announce how Cloudflare delivers cdnjs—a migration to a serverless infrastructure using Cloudflare Workers and its distributed key-value store Workers KV!

What is cdnjs and why do I care?

For those unfamiliar, cdnjs is an acronym describing a Content Delivery Network (CDN) for JavaScript (JS). A CDN simply refers to a geographically distributed network of servers that provide Internet content, whether it is memes, cat videos, or HTML pages. In our case, the CDN refers to Cloudflare’s ever expanding network of over 200 globally distributed data centers.

And here’s why this is relevant to you: it makes page load times lightning-fast. Virtually every website you visit needs to fetch JS libraries in order to load, including this one. Let’s say you visit a Sydney-based website that contains a local file from jQuery, a popular library found in 76.2% of websites. If you are located in New York, you may notice a delay, as it can easily exceed 300ms to fetch the file—not to mention the time it takes for the round trips involved with the TLS handshake. However, if the website references jQuery using cdnjs.cloudflare.com, you can retrieve the file from the closest Cloudflare data center in Buffalo, reducing the latency to a blazing 20ms.

While cdnjs operates behind the scenes, it is used by over 11% of websites, making the Internet a much faster and more reliable place. In July, cdnjs served almost 190 billion requests—an enormous 3.46PB of data.

Where are the files stored?

While cdnjs speeds up the Internet, it certainly isn’t magic!

Historically, a number of load-balanced machines at one of Cloudflare’s core data centers would periodically pull cdnjs files from a backing store, acting as the origin for cdnjs.cloudflare.com. When a new file is requested, it is cached by Cloudflare, allowing it to be fetched quickly from any of our data centers.

The backing store is a catalogue of JS, CSS, and other web libraries in the form of an open-source GitHub repository. What this means is that anyone—including you—can contribute to it, subject to review and other processes.

However, until recently, these existing operations were very labor intensive and fragile.

This blog post will explain why we changed the infrastructure behind cdnjs to make it faster, more reliable, and easier to maintain. First, we will discuss how the community used to contribute to cdnjs, outlining the pains and concerns of the old system. Then, we will explore the benefits of migrating to Workers KV. Afterward, we will dive into the new architecture, as well as upgrades to the website and cdnjs API. Finally, we will review the history of cdnjs, and where it is headed in the future.

If you think you know how to make a PR, think again

For the non-technical reader, a pull request (PR) is a request to merge changes you’ve made to a repository. Traditionally, if you wanted to include your JavaScript library in cdnjs, you would first create a PR on GitHub to cdnjs/cdnjs with a JSON file describing your package and additional files for any version you wished to include. Once your PR was approved by our old bot, manually reviewed, and then merged by a maintainer, your package would be integrated with cdnjs.

Sounds easy, right? You can just fork the repo, clone it, and copy paste a few files, no?

Exactly. Contributing was easy if you had several hours to burn, a case-sensitive file system, and a couple hundred gigabytes of free disk space to git clone the 300GB repo. If you were short on time—no problem, you could always use your advanced knowledge of git sparse-checkout to get the job done. Don’t know git? Just add one file at a time manually through GitHub’s UI.

I think you get the point. I know I certainly did when I naively spent 10 hours cloning the repo, only to discover that macOS is case-insensitive by default.

However, updating cdnjs was difficult not only for the contributors, but also for the maintainers. Historically, the community was able to contribute version files directly, which could potentially be malicious. This created lots of work for maintainers, requiring them to inspect each file manually, diffing files against the official library source and running malware checks.

So how did packages update once they were in cdnjs? In the JSON file describing each package, there was an optional auto-update definition telling the bot where to look for new versions of the library. If present, when your package released a new version from npm or GitHub, the bot would download it, pushing the files to cdnjs/cdnjs and the computed Subresource Integrity (SRI) hashes to cdnjs/SRIs. If the auto-update property was missing, it would be your responsibility to make manual PRs to update cdnjs with any future versions.
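
For readers who have not met SRI before, an integrity value is the name of a hash algorithm followed by the base64-encoded digest of the file's bytes. The sketch below shows that computation; the choice of SHA-512 and the function name `sri_hash` are assumptions for illustration, not a description of the bot's actual code.

```python
import base64
import hashlib


def sri_hash(content: bytes) -> str:
    """Return an SRI-style integrity string for the given file content.

    SHA-512 is assumed here; the SRI format also allows sha256 and sha384.
    """
    digest = hashlib.sha512(content).digest()
    return "sha512-" + base64.b64encode(digest).decode("ascii")


# Hypothetical file content, for illustration only.
print(sri_hash(b"console.log('hello');"))
```

A browser given this value in a script tag's `integrity` attribute refuses to run the file if its bytes no longer match the hash.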

A wake-up call for cdnjs

In April, during maintenance at one of our core data centers, a technician accidentally disconnected the cables supplying all external connections to our other data centers, causing the data center to go offline for approximately four hours. This incident served as the first wake-up call for cdnjs, especially since the affected data center housed the primary cdnjs origin web servers. In this case, we did have a backup running on an external provider, but what really saved us was Cloudflare’s global cache, which minimized the impact of the outage as only uncached assets failed to load.

We started to think about how we can improve both the reliability and performance of how we serve cdnjs. We went straight to Cloudflare Workers, our own platform for developing on the edge. One powerful tool built into Workers is Workers KV—a low-latency, globally distributed key-value store optimized for high-read applications.

We put two and two together, realizing that instead of pulling the cdnjs/cdnjs repository and serving files from disk, we could cut the physical machines out entirely, distributing the data around the world and serving files straight from the edge. That way, cdnjs would be able to recover from any origin data center failure, while also increasing its scalability.

Workers KV to the rescue

At first glance, the decision to use Workers KV was a no-brainer. Since files in cdnjs never change but require frequent reads, Workers KV was a perfect fit.

However, as we planned our migration, we became concerned that with over 7 million assets in cdnjs, there would undoubtedly exist files that exceed Workers KV’s 10MiB value limit. After investigating, we discovered that several hundred cdnjs files were oversized, the majority being JavaScript Source Maps.

Then the idea hit us. We could store compressed versions of cdnjs files in Workers KV, not only solving our oversized file issue, but also optimizing how we serve files.

If you pay the Internet bill, you’ll know that bandwidth is expensive! For this reason, all modern browsers will try to fetch compressed web content whenever it is available. Similarly, within Cloudflare we often experiment with on-the-fly compression to reduce our bandwidth, always serving compressed content to the eyeball (the end user’s browser) when it is accepted. As a result, we decided to compress all cdnjs files ahead of time, writing them to Workers KV in both Brotli and gzip forms. Because compressing ahead of time removes the latency constraints of on-the-fly compression, we could also use higher compression levels.

This means we now serve cdnjs files faster and smaller!

A complete makeover for cdnjs

Today, if you want to include your JavaScript library in cdnjs, you first create a PR on GitHub to our new repository cdnjs/packages. The repo is easily cloneable at 50MB and consists of thousands of JSON files, each describing a cdnjs package and how it is auto-updated from npm or git. Once your file is validated by our automated CI (powered by a new bot) and merged by a maintainer, your package is automatically enrolled in our auto-update service.

In the new system, security and maintainability are prioritized. For starters, cdnjs version files are created by our bot, minimizing the possibility of human error when merging a new version. While the JSON files in cdnjs/packages are added by error-prone humans, they are inspected by our bot before being approved by a maintainer. Each file is automatically validated against a JSON schema, as well as checked for popularity on npm or GitHub.

When the bot discovers a new release, it pushes Brotli and gzip-compressed versions of the files to a files namespace in Workers KV. With each entry, the bot writes some metadata in Workers KV for the ETag and Last-Modified HTTP headers. Similar to before, the bot also computes Subresource Integrity (SRI) hashes of the uncompressed files, but now pushes them instead to a SRIs namespace in Workers KV.

Then, when a new file is requested from cdnjs.cloudflare.com, a Cloudflare Worker will inspect the client’s Accept-Encoding header, fetching either the Brotli or gzip-compressed version with its ETag and Last-Modified metadata from Workers KV. As the compressed file travels back through Cloudflare, it is cached for future requests and decompressed on the fly if needed.
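As a rough illustration of that negotiation step, the sketch below picks a stored encoding from an Accept-Encoding header. The function names and the `<path>.<encoding>` key layout are assumptions made for this example, not the actual cdnjs Worker code, and wildcard (`*`) entries are ignored for brevity.

```typescript
// Hypothetical helpers; the real cdnjs Worker may be organized differently.
type Encoding = "br" | "gzip";

// Collect the codings a client accepts. Entries with q=0 are explicit
// refusals and are skipped.
function acceptedEncodings(header: string | null): Set<string> {
  const accepted = new Set<string>();
  if (header === null) return accepted;
  for (const part of header.split(",")) {
    const pieces = part.trim().toLowerCase().split(";");
    const coding = (pieces[0] ?? "").trim();
    if (coding === "") continue;
    const refused = pieces.slice(1).some((param) => {
      const match = /^\s*q\s*=\s*([0-9.]+)\s*$/.exec(param);
      return match !== null && Number(match[1]) === 0;
    });
    if (!refused) accepted.add(coding);
  }
  return accepted;
}

// Prefer Brotli, then gzip. null means neither is accepted, so the stored
// copy would need to be decompressed before it reaches the client.
function pickEncoding(header: string | null): Encoding | null {
  const accepted = acceptedEncodings(header);
  if (accepted.has("br")) return "br";
  if (accepted.has("gzip")) return "gzip";
  return null;
}

// Assumed key layout; the post does not describe the real KV key scheme.
function kvKey(path: string, encoding: Encoding): string {
  return `${path}.${encoding}`;
}
```

A request carrying `Accept-Encoding: gzip, deflate, br` would resolve to the Brotli copy under this scheme, while a client that sends `br;q=0, gzip` would get the gzip copy.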

At the moment, there are still a handful of files exceeding Workers KV’s size limit. Consequently, if the Cloudflare Worker fails to retrieve a file from Workers KV, it is fetched from the origin backed by the original git repo. In the coming months, we plan on gradually removing this infrastructure.
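To make the size constraint concrete, here is a small sketch of the per-file decision. The 10 MiB figure is the limit quoted earlier in this post; the rule that both compressed forms must fit, along with every name below, is an assumption for illustration rather than the bot’s actual logic.

```typescript
// 10 MiB, the Workers KV value limit quoted in the post.
const KV_VALUE_LIMIT_BYTES = 10 * 1024 * 1024;

// Hypothetical shape: compressed sizes of one cdnjs file, in bytes.
interface CompressedSizes {
  br: number;
  gzip: number;
}

// Assumed rule: a file is served from KV only when both compressed forms
// fit under the limit; otherwise it stays on the origin fallback.
function servedFrom(sizes: CompressedSizes): "kv" | "origin" {
  const fits =
    sizes.br <= KV_VALUE_LIMIT_BYTES && sizes.gzip <= KV_VALUE_LIMIT_BYTES;
  return fits ? "kv" : "origin";
}
```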

Scaling the website and API

Besides the core cdnjs infrastructure, many of its other components received upgrades as well!

On the cdnjs project’s homepage, you will be greeted by a slick new beta website built by Matt. Constructed with Vue and Nuxt, the beta website is powered entirely by the cdnjs API. As a result, it is always up-to-date with the latest package information and takes few resources to serve (it runs completely on the client side after the first page load), helping us scale with cdnjs’s never-ending growth.

In fact, the cdnjs API also strengthened its scalability, benefitting from a serverless architecture close to the one we have seen with cdnjs and Workers KV.

Before migrating to Workers KV, the cdnjs API relied on a regularly scheduled process that involved generating about 300MB of metadata. The cdnjs API’s backend would then fetch this enormous “package.min.js” file into memory and use it to operate the API. If you are curious, the file is still being hosted here, but be warned—it may lag your browser! Similarly, file SRIs were pushed to cdnjs/SRIs, which was cloned by the API locally to serve SRI responses.

After all cdnjs files (within the permitted size limit) were moved to Workers KV, these legacy processes became unsustainable, requiring millions of reads and an unreasonable amount of time. Therefore, we decided to upload all metadata found into Workers KV. We split the metadata into four namespaces—one for package-level metadata, one for version-specific metadata, one containing aggregated metadata, and one for file SRIs.

Similar to cdnjs’s serverless design, a Cloudflare Worker sits on top of metadata.speedcdnjs.com, serving data from Workers KV using several public endpoints. Currently, the cdnjs API is fully integrated with these endpoints, which provide an elegant solution as cdnjs continues to scale.

Transparency and the future of cdnjs

Since its birth in January 2011, cdnjs has always been deeply rooted in transparency, deriving its strength from the community. Even when cdnjs exploded in size and its founders Ryan Kirkman and Thomas Davis teamed up with us in June 2011, the project remained entirely open-source on GitHub.

As the years passed, it became harder for the founders to stay active, and the project came to depend heavily on the community for support. With a nearly nonexistent budget and little access to the repository, core cdnjs maintainers were challenged every day to keep the project alive.

Last year, this led us to contact the founders, who were happy to have our assistance with the project. With Cloudflare’s increased role, cdnjs is as stable as ever, with active members from both Cloudflare and the community.

However, as we remove our reliance on the legacy system and store files in Workers KV, there are concerns that cdnjs will become proprietary. Don’t worry, we are working hard to ensure that cdnjs remains as transparent and open-source as possible. To help the community audit updates to Workers KV, there is a new repository, cdnjs/logs, which is used by the bot to log all Workers KV-related events. Furthermore, anyone can validate the integrity of cdnjs files by fetching SRIs from the cdnjs API.

Conclusion

Overall, this past year has been a turbulent time for cdnjs, but all of its shortcomings have acted as red flags to help us build a better system. Most recently, we have mitigated the risks of depending on physical machines at a single location, migrating cdnjs to a serverless infrastructure where its files are stored in Workers KV.

Today, cdnjs is in good hands, and is not going away anytime soon. Shout out especially to the maintainers Sven and Matt for creating tons of momentum with the project, working on everything from scaling cdnjs to editing this post.

Moving forward, we are committed to making cdnjs as transparent as possible. As we continue to improve cdnjs, we will release more blog posts to keep the community up to date. If you are interested, please subscribe to our blog. After all, it is the community that makes cdnjs possible! A special thanks to our active GitHub contributors and members of the cdnjs Community Forum for sticking with us!

Bullet Updates – Windowing, Apache Pulsar PubSub, Configuration-based Data Ingestion, and More

Post Syndicated from rosaliebeevm original https://yahooeng.tumblr.com/post/183315480351

yahoodevelopers:

By Akshay Sarma, Principal Engineer, Verizon Media & Brian Xiao, Software Engineer, Verizon Media

This is the first of an ongoing series of blog posts sharing releases and announcements for Bullet, an open-sourced lightweight, scalable, pluggable, multi-tenant query system.

Bullet allows you to query, through its UI or API, any data flowing through a streaming system without having to store it first. The queries are injected into the running system and have minimal overhead. Running hundreds of queries generally fits into the overhead of just reading the streaming data. Bullet requires running an instance of its backend on your data. This backend runs on common stream processing frameworks (Storm and Spark Streaming are currently supported).

The data on which Bullet sits determines what it is used for. For example, our team runs an instance of Bullet on user engagement data (~1M events/sec) to let developers find their own events to validate their code that produces this data. We also use this instance to interactively explore data, throw up quick dashboards to monitor live releases, count unique users, debug issues, and more.

Since open sourcing Bullet in 2017, we’ve been hard at work adding many new features! We’ll highlight some of these here and continue sharing update posts for future releases.

Windowing

Bullet used to operate in a request-response fashion – you would submit a query and wait for the query to meet its termination conditions (usually duration) before receiving results. For short-lived queries, say, a few seconds, this was fine. But as we started fielding more interactive and iterative queries, waiting even a minute for results became too cumbersome.

Enter windowing! Bullet now supports time and record-based windowing. With time windowing, you can break up your query into chunks of time over its duration and retrieve results for each chunk.  For example, you can calculate the average of a field, and stream back results every second:

In the above example, the aggregation is operating on all the data since the beginning of the query, but you can also do aggregations on just the windows themselves. This is often called a Tumbling window:

image
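To make the difference between the two time-windowing modes concrete, here is a minimal sketch that is not Bullet’s implementation. The `Sample` shape and the `windowedAverages` function are hypothetical, samples are assumed to arrive in time order, and windows with no records emit nothing.

```typescript
// Hypothetical record shape: a timestamp in milliseconds and a numeric field.
interface Sample {
  timeMs: number;
  value: number;
}

// Emits one average per time window that contains data.
// tumbling = false: each result covers all data since the query started.
// tumbling = true:  each result covers only the records in its own window.
function windowedAverages(
  samples: Sample[],
  windowMs: number,
  tumbling: boolean
): number[] {
  const results: number[] = [];
  let sum = 0;
  let count = 0;
  let currentWindow = -1;
  for (const sample of samples) {
    const index = Math.floor(sample.timeMs / windowMs);
    if (currentWindow !== -1 && index !== currentWindow && count > 0) {
      results.push(sum / count); // window boundary crossed: emit a result
      if (tumbling) {
        sum = 0;
        count = 0;
      }
    }
    currentWindow = index;
    sum += sample.value;
    count += 1;
  }
  if (count > 0) results.push(sum / count); // emit the final window
  return results;
}
```

With one-second windows over values 2 and 4 in the first second, 6 in the second, and 10 in the third, the tumbling mode yields 3, 6, 10, while the since-the-beginning mode yields 3, 4, 5.5.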

With record windowing, you can get the intermediate aggregation for each record that matches your query (a Sliding window). Or you can do a Tumbling window on records rather than time. For example, you could get results back every three records:

image
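Record-based windows can be sketched the same way. The two hypothetical functions below show one reading of the description above: a running aggregate emitted for every record, and a Tumbling window that emits once per fixed number of records. In this sketch a final partial window is emitted too, which may not match Bullet’s behavior.

```typescript
// One result per record: the aggregate (here a sum) of everything seen so far.
function runningSums(values: number[]): number[] {
  const results: number[] = [];
  let sum = 0;
  for (const value of values) {
    sum += value;
    results.push(sum);
  }
  return results;
}

// One result per `size` records: the sum of just that group of records.
function tumblingSums(values: number[], size: number): number[] {
  const results: number[] = [];
  for (let start = 0; start < values.length; start += size) {
    let sum = 0;
    for (const value of values.slice(start, start + size)) {
      sum += value;
    }
    results.push(sum);
  }
  return results;
}
```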

Overlapping windows in other ways (Hopping windows) or windows that reset based on different criteria (Session windows, Cascading windows) are currently being worked on. Stay tuned!

Apache Pulsar support as a native PubSub

Bullet uses a PubSub (publish-subscribe) message queue to send queries and results between the Web Service and Backend. As with everything else in Bullet, the PubSub is pluggable. You can use your favorite PubSub by implementing a few interfaces if you don’t want to use the ones we provide. Until now, we’ve maintained and supported a REST-based PubSub and an Apache Kafka PubSub. Now we are excited to announce support for Apache Pulsar as well! Bullet Pulsar will be useful to those users who want to use Pulsar as their underlying messaging service.

If you aren’t familiar with Pulsar, setting up a local standalone is very simple, and by default, any Pulsar topics written to will automatically be created. Setting up an instance of Bullet with Pulsar instead of REST or Kafka is just as easy. You can refer to our documentation for more details.

Plug your data into Bullet without code

While Bullet worked on any data source located in any persistence layer, you still had to implement an interface to connect your data source to the Backend and convert it into a record container format that Bullet understands. For instance, your data might be located in Kafka and be in the Avro format. If you were using Bullet on Storm, you would perhaps write a Storm Spout to read from Kafka, deserialize, and convert the Avro data into the Bullet record format. This was the only interface in Bullet that required our customers to write their own code. Not anymore! Bullet DSL is a text/configuration-based format for users to plug in their data to the Bullet Backend without having to write a single line of code.

Bullet DSL abstracts away the two major components for plugging data into the Bullet Backend: a Connector piece to read from arbitrary data sources and a Converter piece to convert that read data into the Bullet record container. We currently support and maintain a few of these – Kafka and Pulsar for Connectors and Avro, Maps and arbitrary Java POJOs for Converters. The Converters understand typed data and can even do a bit of minor ETL (Extract, Transform and Load) if you need to change your data around before feeding it into Bullet. As always, the DSL components are pluggable and you can write your own (and contribute it back!) if you need one that we don’t support.
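The Connector and Converter split can be pictured with a short sketch. Bullet DSL itself is driven by configuration and its real components are not shown in this post, so every name below (`Connector`, `Converter`, `listConnector`, `jsonConverter`, `ingest`) is a hypothetical stand-in, with an in-memory list in place of Kafka or Pulsar and JSON in place of Avro.

```typescript
// Hypothetical stand-in for the Bullet record container.
type BulletRecord = Record<string, string | number>;

// Reads raw messages from some source.
interface Connector<Raw> {
  read(): Raw[];
}

// Turns one raw message into a record.
interface Converter<Raw> {
  convert(raw: Raw): BulletRecord;
}

// In-memory connector, standing in for a real message queue.
function listConnector(messages: string[]): Connector<string> {
  return { read: () => messages };
}

// JSON converter with a bit of minor ETL: fields can be renamed on the way in.
function jsonConverter(rename: Record<string, string>): Converter<string> {
  return {
    convert(raw: string): BulletRecord {
      const parsed = JSON.parse(raw) as Record<string, string | number>;
      const record: BulletRecord = {};
      for (const key of Object.keys(parsed)) {
        const value = parsed[key];
        if (value !== undefined) record[rename[key] ?? key] = value;
      }
      return record;
    },
  };
}

// The pipeline only depends on the two interfaces, so either side can be swapped.
function ingest<Raw>(
  connector: Connector<Raw>,
  converter: Converter<Raw>
): BulletRecord[] {
  return connector.read().map((raw) => converter.convert(raw));
}
```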

We appreciate your feedback and contributions! Explore Bullet on GitHub, use and help contribute to the project, and chat with us on Google Groups. To get started, try our Quickstarts on Spark or Storm to set up an instance of Bullet on some fake data and play around with it.

Achieving Major Stability and Performance Improvements in Yahoo Mail with a Novel Redux Architecture

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/173062946866

yahoodevelopers:

By Mohit Goenka, Gnanavel Shanmugam, and Lance Welsh

At Yahoo Mail, we’re constantly striving to upgrade our product experience. We do this not only by adding new features based on our members’ feedback, but also by providing the best technical solutions to power the most engaging experiences. As such, we’ve recently introduced a number of novel and unique revisions to the way in which we use Redux that have resulted in significant stability and performance improvements. Developers may find our methods useful in achieving similar results in their apps.

Improvements to product metrics

Last year Yahoo Mail implemented a brand new architecture using Redux. Since then, we have transformed the overall architecture to reduce latencies in various operations, reduce JavaScript exceptions, and better synchronize states. As a result, the product is much faster and more stable.

Stability improvements:

  • when checking for new emails – 20%
  • when reading emails – 30%
  • when sending emails – 20%

Performance improvements:

  • 10% improvement in page load performance
  • 40% improvement in frame rendering time

We have also reduced API calls by approximately 20%.

How we use Redux in Yahoo Mail

Redux architecture is reliant on one large store that represents the application state. In a Redux cycle, action creators dispatch actions to change the state of the store. React Components then respond to those state changes. We’ve made some modifications on top of this architecture that are atypical in the React-Redux community.

For instance, when fetching data over the network, the traditional methodology is to use Thunk middleware. Yahoo Mail fetches data over the network from our API. Thunks would create an unnecessary and undesirable dependency between the action creators and our API: if and when the API changes, the action creators must then also change.

To keep these concerns separate, we dispatch the action payload from the action creator and store it in the Redux state for later processing by “action syncers”. Action syncers use the payload information from the store to make requests to the API and process responses. In other words, the action syncers form an API layer by interacting with the store.

An additional benefit of keeping the concerns separate is that the API layer can change as the backend changes, without those changes bubbling back up into the action creators and components. This also allowed us to optimize the API calls by batching, deduping, and processing the requests only when the network is available. We applied similar strategies for handling other side effects like route handling and instrumentation. Overall, action syncers helped us to reduce our API calls by ~20% and bring down API errors by 20-30%.
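The post does not include Yahoo Mail’s code, so the following is only a sketch of how the action syncer idea could look, written without the Redux library to stay self-contained. All names and shapes (`ApiRequest`, `MailState`, `markRead`, `syncOnce`, the endpoint string) are hypothetical.

```typescript
// Hypothetical shapes; not Yahoo Mail's actual state or actions.
interface ApiRequest {
  id: string;
  endpoint: string;
}
interface MailState {
  pending: ApiRequest[];
  online: boolean;
}
type MailAction =
  | { type: "REQUEST_QUEUED"; payload: ApiRequest }
  | { type: "REQUESTS_SENT"; ids: string[] }
  | { type: "NETWORK_CHANGED"; online: boolean };

// Action creator: describes intent only and never touches the API client.
function markRead(messageId: string): MailAction {
  return {
    type: "REQUEST_QUEUED",
    payload: { id: `read:${messageId}`, endpoint: "/messages/read" },
  };
}

function mailReducer(state: MailState, action: MailAction): MailState {
  switch (action.type) {
    case "REQUEST_QUEUED": {
      const queued = action.payload;
      // Dedupe: an identical request that is already pending is not queued again.
      if (state.pending.some((r) => r.id === queued.id)) return state;
      return { ...state, pending: [...state.pending, queued] };
    }
    case "REQUESTS_SENT": {
      const sentIds = action.ids;
      return {
        ...state,
        pending: state.pending.filter((r) => sentIds.indexOf(r.id) === -1),
      };
    }
    case "NETWORK_CHANGED":
      return { ...state, online: action.online };
    default:
      return state;
  }
}

// Action syncer: reads pending requests from the state and, when the network
// is available, sends them as one batch. Returns the action to dispatch next.
function syncOnce(
  state: MailState,
  sendBatch: (batch: ApiRequest[]) => void
): MailAction | null {
  if (!state.online || state.pending.length === 0) return null;
  sendBatch(state.pending);
  return { type: "REQUESTS_SENT", ids: state.pending.map((r) => r.id) };
}
```

Because only `syncOnce` and its `sendBatch` callback know how requests reach the backend, an API change in this sketch would leave `markRead` and the components that call it untouched.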

Another change to the normal Redux architecture was made to avoid unnecessary props. The React-Redux community has learned to avoid passing unnecessary props from high-level components through multiple layers down to lower-level components (prop drilling) for rendering. We have introduced action enhancers middleware to avoid passing additional unnecessary props that are purely used when dispatching actions. Action enhancers add data to the action payload so that data does not have to come from the component when dispatching the action. This spares the component from having to receive that data through props and has improved frame rendering by ~40%. The use of action enhancers also avoids writing utility functions to add commonly-used data to each action from action creators.
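As a hedged sketch of the idea, an action enhancer can be written as a function with the same `store => next => action` shape as Redux middleware. The state fields (`mailboxId`, `locale`) and action shape below are invented for illustration, and the Redux library itself is not imported.

```typescript
// Hypothetical state and action shapes.
interface AppState {
  mailboxId: string;
  locale: string;
}
interface AppAction {
  type: string;
  payload: Record<string, unknown>;
}
type Dispatch = (action: AppAction) => void;
type Middleware = (store: { getState(): AppState }) => (next: Dispatch) => Dispatch;

// Adds commonly needed state to every action payload, so components do not
// have to receive it through props just to pass it along when dispatching.
// Fields set explicitly by the caller take precedence.
const actionEnhancer: Middleware = (store) => (next) => (action) => {
  const { mailboxId, locale } = store.getState();
  next({ ...action, payload: { mailboxId, locale, ...action.payload } });
};
```

A component would then dispatch only what it actually knows, for example `{ type: "MESSAGE_ARCHIVED", payload: { messageId: "m42" } }`, and the enhancer fills in the rest before the reducers see the action.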

image

In our new architecture, the store reducers accept the dispatched action via action enhancers to update the state. The store then updates the UI, completing the action cycle. Action syncers then initiate the call to the backend APIs to synchronize local changes.

Conclusion

Our novel use of Redux in Yahoo Mail has led to significant user-facing benefits through a more performant application. It has also reduced development cycles for new features due to its simplified architecture. We’re excited to share our work with the community and would love to hear from anyone interested in learning more.

Success at Apache: A Newbie’s Narrative

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/170536010891

yahoodevelopers:

Kuhu Shukla (bottom center) and team at the 2017 DataWorks Summit


By Kuhu Shukla

This post first appeared here on the Apache Software Foundation blog as part of ASF’s “Success at Apache” monthly blog series.

As I sit at my desk on a rather frosty morning with my coffee, looking up new JIRAs from the previous day in the Apache Tez project, I feel rather pleased. The latest community release vote is complete, the bug fixes that we so badly needed are in and the new release that we tested out internally on our many thousand strong cluster is looking good. Today I am looking at a new stack trace from a different Apache project process and it is hard to miss how much of the exceptional code I get to look at every day comes from people all around the globe. A contributor leaves a JIRA comment before he goes on to pick up his kid from soccer practice while someone else wakes up to find that her effort on a bug fix for the past two months has finally come to fruition through a binding +1.

Yahoo – which joined AOL, HuffPost, Tumblr, Engadget, and many more brands to form the Verizon subsidiary Oath last year – has been at the frontier of open source adoption and contribution since before I was in high school. So while I have no historical trajectories to share, I do have a story on how I found myself in an epic journey of migrating all of Yahoo jobs from Apache MapReduce to Apache Tez, a then-new DAG based execution engine.

Oath grid infrastructure is driven through and through by Apache technologies, be it storage through HDFS, resource management through YARN, job execution frameworks with Tez, or user interface engines such as Hive, Hue, Pig, Sqoop, Spark, and Storm. Our grid solution is specifically tailored to Oath’s business-critical data pipeline needs using the polymorphic technologies hosted, developed and maintained by the Apache community.

On the third day of my job at Yahoo in 2015, I received a YouTube link on An Introduction to Apache Tez. I watched it carefully trying to keep up with all the questions I had and recognized a few names from my academic readings of YARN ACM papers. I continued to ramp up on YARN and HDFS, the foundational Apache technologies Oath heavily contributes to even today. For the first few weeks I spent time picking out my favorite (necessary) mailing lists to subscribe to and getting started on setting up a pseudo-distributed Hadoop cluster. I continued to find my footing with newbie contributions and being ever more careful with whitespaces in my patches. One thing was clear – Tez was the next big thing for us. By the time I could truly call myself a contributor in the Hadoop community, nearly 80-90% of the Yahoo jobs were running with Tez. But just like hiking up the Grand Canyon, the last 20% is where all the pain was. Being a part of the solution to this challenge was a happy prospect and thankfully contributing to Tez became a goal in my next quarter.

The next sprint planning meeting ended with me getting my first major Tez assignment – progress reporting. The progress reporting in Tez was non-existent – “Just needs an API fix,”  I thought. Like almost all bugs in this ecosystem, it was not easy. How do you define progress? How is it different for different kinds of outputs in a graph? The questions were many.

I, however, did not have to go far to get answers. The Tez community actively came to a newbie’s rescue, finding answers and posing important questions. I started attending the bi-weekly Tez community sync up calls and asking existing contributors and committers for course correction. Suddenly the team was much bigger, the goals much more chiseled. This was new to anyone like me who came from the networking industry, where the most open part of the code is the RFCs and the implementation details are often hidden. These meetings served as a clean room for our coding ideas and experiments. Ideas were shared, to the extent of which data structure we should pick and what a future user of Tez would take from it. In between the usual status updates, extensive knowledge transfers were made.

Oath uses Apache Pig and Apache Hive extensively and most of the urgent requirements and requests came from Pig and Hive developers and users. Each issue led to a community JIRA and as we started running Tez at Oath scale, new feature ideas and bugs around performance and resource utilization materialized. Every year most of the Hadoop team at Oath travels to the Hadoop Summit where we meet our cohorts from the Apache community and we stand for hours discussing the state of the art and what is next for the project. One such discussion set the course for the next year and a half for me.

We needed an innovative way to shuffle data. Frameworks like MapReduce and Tez have a shuffle phase in their processing lifecycle wherein the data from upstream producers is made available to downstream consumers. Even though Apache Tez was designed with a feature set corresponding to optimization requirements in Pig and Hive, the Shuffle Handler Service was retrofitted from MapReduce at the time of the project’s inception. With several thousands of jobs on our clusters leveraging these features in Tez, the Shuffle Handler Service became a clear performance bottleneck. So as we stood talking about our experience with Tez with our friends from the community, we decided to implement a new Shuffle Handler for Tez. All the conversation points were now tracked through an umbrella JIRA, TEZ-3334, and the to-do list was long. I picked a few JIRAs and as I started reading through I realized, this is all new code I get to contribute to and review. There might be a better way to put this, but to be honest it was just a lot of fun! All the whiteboards were full, the team took walks post lunch and discussed how to go about defining the API. Countless hours were spent debugging hangs while fetching data and looking at stack traces and Wireshark captures from our test runs. Six months in and we had the feature on our sandbox clusters. There were moments ranging from sheer frustration to absolute exhilaration with high fives as we continued to address review comments and fix big and small issues with this evolving feature.

As much as owning your code is valued everywhere in the software community, I would never go on to say “I did this!” In fact, “we did!” It is this strong sense of shared ownership and fluid team structure that makes the open source experience at Apache truly rewarding. This is just one example. A lot of the work that was done in Tez was leveraged by the Hive and Pig community and cross Apache product community interaction made the work ever more interesting and challenging. Triaging and fixing issues with the Tez rollout led us to hit a 100% migration score last year and we also rolled the Tez Shuffle Handler Service out to our research clusters. As of last year we have run around 100 million Tez DAGs with a total of 50 billion tasks over almost 38,000 nodes.

In 2018 as I move on to explore Hadoop 3.0 as our future release, I hope that if someone outside the Apache community is reading this, it will inspire and intrigue them to contribute to a project of their choice. As an astronomy aficionado, going from a newbie Apache contributor to a newbie Apache committer was very much like looking through my telescope - it has endless possibilities and challenges you to be your best.

About the Author:

Kuhu Shukla is a software engineer at Oath and did her Masters in Computer Science at North Carolina State University. She works on the Big Data Platforms team on Apache Tez, YARN and HDFS with a lot of talented Apache PMCs and Committers in Champaign, Illinois. A recent Apache Tez Committer herself, she continues to contribute to YARN and HDFS and spoke at the 2017 Dataworks Hadoop Summit on “Tez Shuffle Handler: Shuffling At Scale With Apache Hadoop”. Prior to that she worked on Juniper Networks’ router and switch configuration APIs. She likes to participate in open source conferences and women in tech events. In her spare time she loves singing Indian classical and jazz, laughing, whale watching, hiking and peering through her Dobsonian telescope.

Yahoo Mail’s New Tech Stack, Built for Performance and Reliability

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/162320493306

By Suhas Sadanandan, Director of Engineering 

When it comes to performance and reliability, there is perhaps no application where this matters more than with email. Today, we announced a new Yahoo Mail experience for desktop based on a completely rewritten tech stack that embodies these fundamental considerations and more.

We built the new Yahoo Mail experience using a best-in-class front-end tech stack with open source technologies including React, Redux, Node.js, react-intl (open-sourced by Yahoo), and others. A high-level architectural diagram of our stack is below.

image

New Yahoo Mail Tech Stack

In building our new tech stack, we made use of the most modern tools available in the industry to come up with the best experience for our users by optimizing the following fundamentals:

Performance

A key feature of the new Yahoo Mail architecture is blazing-fast initial loading (aka, launch).

We introduced new network routing which sends users to their nearest geo-located email servers (proximity-based routing). This has resulted in a significant reduction in time to first byte and should be immediately noticeable to our international users in particular.

We now do server-side rendering to allow our users to see their mail sooner. This change will be immediately noticeable to our low-bandwidth users. Our application is isomorphic, meaning that the same code runs on the server (using Node.js) and the client. Prior versions of Yahoo Mail had programming logic duplicated on the server and the client because we used PHP on the server and JavaScript on the client.   

Using efficient bundling strategies (JavaScript code is separated into application, vendor, and lazy loaded bundles) and pushing only the changed bundles during production pushes, we keep the cache hit ratio high. By using react-atomic-css, our homegrown solution for writing modular and scoped CSS in React, we get much better CSS reuse.  

In prior versions of Yahoo Mail, the need to run various experiments in parallel resulted in additional branching and bloating of our JavaScript and CSS code. While rewriting all of our code, we solved this issue using Mendel, our homegrown solution for bucket testing isomorphic web apps, which we have open sourced.  

Rather than using custom libraries, we use native HTML5 APIs and ES6 heavily and use PolyesterJS, our homegrown polyfill solution, to fill the gaps. These factors have further helped us to keep payload size minimal.

With all the above optimizations, we have been able to reduce our JavaScript and CSS footprint by approximately 50% compared to the previous desktop version of Yahoo Mail, helping us achieve a blazing-fast launch.

In addition to initial launch improvements, key features like search and message read (when a user opens an email to read it) have also benefited from the above optimizations and are considerably faster in the latest version of Yahoo Mail.

We also significantly reduced the memory consumed by Yahoo Mail on the browser. This is especially noticeable during a long running session.

Reliability

With this new version of Yahoo Mail, we have a 99.99% success rate on core flows: launch, message read, compose, search, and actions that affect messages. Accomplishing this over several billion user actions a day is a significant feat. Client-side errors (JavaScript exceptions) are reduced significantly when compared to prior Yahoo Mail versions.

Product agility and launch velocity

We focused on independently deployable components. As part of the re-architecture of Yahoo Mail, we invested in a robust continuous integration and delivery flow. Our new pipeline allows for daily (or more) pushes to all Mail users, and we push only the bundles that are modified, which keeps the cache hit ratio high.

Developer effectiveness and satisfaction

In developing our tech stack for the new Yahoo Mail experience, we heavily leveraged open source technologies, which allowed us to ensure a shorter learning curve for new engineers. We were able to implement a consistent and intuitive onboarding program for 30+ developers and are now using our program for all new hires. During the development process, we emphasized predictable flows and easy debugging.

Accessibility

The accessibility of this new version of Yahoo Mail is state of the art and delivers outstanding usability (efficiency) in addition to accessibility. It features six enhanced visual themes that can provide accommodation for people with low vision and has been optimized for use with Assistive Technology including alternate input devices, magnifiers, and popular screen readers such as NVDA and VoiceOver. These features have been rigorously evaluated and incorporate feedback from users with disabilities. It sets a new standard for the accessibility of web-based mail and is our most-accessible Mail experience yet.

Open source 

We have open sourced some key components of our new Mail stack, like Mendel, our solution for bucket testing isomorphic web applications. We invite the community to use and build upon our code. Going forward, we plan on also open sourcing additional components like react-atomic-css, our solution for writing modular and scoped CSS in React, and lazy-component, our solution for on-demand loading of resources.

Many of our company’s best technical minds came together to write a brand new tech stack and enable a delightful new Yahoo Mail experience for our users.

We encourage our users and engineering peers in the industry to test the limits of our application, and to provide feedback by clicking on the Give Feedback call out in the lower left corner of the new version of Yahoo Mail.

Open Sourcing Bullet, Yahoo’s Forward-Looking Query Engine for Streaming Data

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/161855616651

By Michael Natkovich, Akshai Sarma, Nathan Speidel, Marcus Svedman, and Cat Utah

Big Data is no longer just Apache server logs. Nowadays, the data may be user engagement data, performance metrics, IoT (Internet of Things) data, or something else completely atypical. Regardless of the size of the data, or the type of querying patterns on it (exploratory, ad-hoc, periodic, long-term, etc.), everyone wants queries to be as fast as possible and cheap to run in terms of resources. Data can be broadly split into two kinds: the streaming (generally real-time) kind or the batched-up-over-a-time-interval (e.g., hourly or daily) kind. The batch version is typically easier to query since it is stored somewhere like a data warehouse that has nice SQL-like interfaces or an easy-to-use UI provided by tools such as Tableau, Looker, or Superset. Running arbitrary queries on streaming data quickly and cheaply, though, is generally much harder… until now. Today, we are pleased to share our newly open sourced, forward-looking, general-purpose query engine, called Bullet, with the community on GitHub.

With Bullet, you can:

  • Apply powerful, nested filters
  • Fetch raw data records
  • Aggregate data using Group Bys (Sum, Count, Average, etc.), Count Distincts, and Top Ks
  • Get distributions of fields, such as Percentiles or Frequency histograms

One of the key differences between how Bullet queries data and the standard querying paradigm is that Bullet does not store any data. In most other systems where you have a persistence layer (including in-memory storage), you are doing a look-back when you query the layer. Instead, Bullet operates on data flowing through the system after the query is started – it’s a look-forward system that doesn’t need persistence. On a real-time data stream, this means that Bullet is querying data after the query is submitted. This also means that Bullet does not query any data that has already passed through the stream. The fact that Bullet does not rely on a persistence layer is exactly what makes it extremely lightweight and cheap to run. 

To see why this is better for the kinds of use cases Bullet is meant for – such as quickly looking at some metric, checking some assumption, iterating on a query, checking the status of something right now, etc. – consider the following: if you had 1,000 queries in a traditional query system that operated on the same data, the system would most likely scan the data 1,000 times, once per query. Because it is forward-looking, Bullet serves 1,000 queries with a single scan of the data, since the arrival of each query determines and fixes the data that it will see. Essentially, the data comes to the queries instead of the queries being farmed out to where the data is. When the conditions of the query are satisfied (usually a time window or a number of events), the query terminates and returns the result.
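
To make the look-forward model concrete, here is a minimal Python sketch. It is not Bullet's implementation: the `Query` class and `run_stream` function are hypothetical names, and only a record-count termination condition is modeled (a time window would work along the same lines). Each record is read once and offered to every active query.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List


@dataclass
class Query:
    """A hypothetical forward-looking query: a filter plus a record limit."""
    name: str
    predicate: Callable[[dict], bool]
    max_records: int
    results: List[dict] = field(default_factory=list)

    def done(self) -> bool:
        return len(self.results) >= self.max_records


def run_stream(records: Iterable[dict], queries: List[Query]) -> Dict[str, List[dict]]:
    """Read the stream once and offer each record to every still-active query."""
    active = list(queries)
    finished: Dict[str, List[dict]] = {}
    for record in records:                 # single pass over the data
        for query in list(active):         # copy, because we remove while iterating
            if query.predicate(record):
                query.results.append(record)
            if query.done():               # termination condition met
                finished[query.name] = query.results
                active.remove(query)
        if not active:
            break
    # Queries still open when the stream ends return what they have so far.
    for query in active:
        finished[query.name] = query.results
    return finished


stream = [{"id": i, "type": "click" if i % 2 else "view"} for i in range(10)]
queries = [
    Query("first_clicks", lambda r: r["type"] == "click", max_records=2),
    Query("first_views", lambda r: r["type"] == "view", max_records=3),
]
results = run_stream(stream, queries)
```

Adding more queries adds work per record, but the stream itself is still traversed only once, and a query never sees records that passed before it was registered.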

A Brief Architecture Overview

High Level Bullet Architecture

The Bullet architecture is multi-tenant, can scale linearly for more queries and/or more data, and has been tested to handle 700+ simultaneous queries on a data stream that had up to 1.5 million records per second, or 5-6 GB/s. Bullet is currently implemented on top of Storm and can be extended to support other stream processing engines as well, like Spark Streaming or Flink. Bullet is pluggable, so you can plug in any source of data that can be read in Storm by implementing a simple data container interface to let Bullet work with it. 

The UI, web service, and the backend layers constitute your standard three-tier architecture. The Bullet backend can be split into three main subsystems:

  1. Request Processor – receives queries, adds metadata, and sends them to the rest of the system
  2. Data Processor – reads data from an input stream, converts it to a unified data format, and matches it against queries
  3. Combiner – combines results for different queries, performs final aggregations, and returns results 
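
The division of labor among the three subsystems can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not Bullet's code: the function names, the `key=value` record format, and the equality-only filter are all hypothetical, and the only aggregation shown is a count.

```python
import itertools
from collections import Counter
from typing import Dict, Iterable, List

_query_ids = itertools.count(1)


def request_processor(query: dict) -> dict:
    """Receive a query and attach metadata (here, just a generated ID)."""
    return {**query, "id": next(_query_ids)}


def data_processor(raw_records: Iterable[str], queries: List[dict]) -> Counter:
    """Convert raw records to a common format and count matches per query."""
    partial = Counter()
    for raw in raw_records:
        key, _, value = raw.partition("=")
        record = {key: value}              # the "unified data format" in this sketch
        for query in queries:
            if record.get(query["field"]) == query["equals"]:
                partial[query["id"]] += 1
    return partial


def combiner(partials: Iterable[Counter]) -> Dict[int, int]:
    """Merge partial results from all data processors into final counts."""
    total = Counter()
    for partial in partials:
        total.update(partial)              # Counter.update adds counts
    return dict(total)


queries = [request_processor({"field": "event", "equals": "click"})]
partitions = [["event=click", "event=view"], ["event=click", "event=click"]]
final = combiner(data_processor(p, queries) for p in partitions)
```

Each partition is processed independently and only small partial results reach the combiner, which is the property that lets this kind of design scale out across more data.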

The web service can be deployed on any servlet container, like Jetty. The UI is a Node-based Ember application that runs in the client browser. Our full documentation contains all the details on exactly how we perform computationally intractable queries, such as Count Distincts on fields with cardinality in the millions, using DataSketches.
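
To give a feel for how approximate counting keeps memory bounded, here is a from-scratch Python illustration of one classic approach, K-Minimum-Values. It is not DataSketches' or Bullet's code, and the function name and the default `k` are assumptions chosen for the example.

```python
import hashlib
import heapq
from typing import Iterable


def estimate_distinct(values: Iterable[str], k: int = 1024) -> float:
    """Estimate the number of distinct values with a K-Minimum-Values sketch."""
    heap: list = []        # max-heap (via negation) of the k smallest hashes seen
    members: set = set()   # the same hashes, for fast duplicate checks
    for value in values:
        digest = hashlib.sha1(value.encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big") / 2**64   # roughly uniform in [0, 1)
        if h in members:
            continue
        if len(heap) < k:
            heapq.heappush(heap, -h)
            members.add(h)
        elif h < -heap[0]:
            # Replace the current largest retained hash with the smaller one.
            evicted = -heapq.heappushpop(heap, -h)
            members.discard(evicted)
            members.add(h)
    if len(heap) < k:
        return float(len(heap))            # fewer than k distinct hashes: exact
    kth_smallest = -heap[0]
    return (k - 1) / kth_smallest


stream = (f"user-{i % 50_000}" for i in range(200_000))
estimate = estimate_distinct(stream)
```

The sketch keeps at most `k` hashes no matter how many records flow past, and its relative error shrinks roughly as 1/sqrt(k), so memory is traded for accuracy.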

Usage at Yahoo 

An instance of Bullet is currently running at Yahoo in production against a small subset of Yahoo’s user engagement data stream. This data is roughly 100,000 records per second and is about 130 MB/s compressed. Bullet queries this with about 100 CPU Virtual Cores and 120 GB of RAM. This fits on fewer than two of our test Storm cluster machines (64 Virtual Cores and 256 GB RAM each).
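
As a quick sanity check of these figures, the arithmetic can be written out in Python. The only assumption added here is that "MB" means decimal megabytes; all other numbers come from the paragraph above.

```python
records_per_second = 100_000
compressed_bytes_per_second = 130 * 10**6   # 130 MB/s, assuming decimal megabytes
cores_used, ram_used_gb = 100, 120
cores_per_machine, ram_per_machine_gb = 64, 256

bytes_per_record = compressed_bytes_per_second / records_per_second
machines_by_cpu = cores_used / cores_per_machine
machines_by_ram = ram_used_gb / ram_per_machine_gb
machines_needed = max(machines_by_cpu, machines_by_ram)   # the tighter constraint wins

print(f"{bytes_per_record:.0f} compressed bytes per record")
print(f"{machines_by_cpu:.2f} machines by CPU, {machines_by_ram:.2f} by RAM")
```

Under that assumption each record averages about 1.3 KB compressed, and CPU rather than RAM is the binding constraint, at roughly 1.56 machines' worth of cores.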

One of the most popular use cases at Yahoo is to use Bullet to manually validate the instrumentation of an app or web application. Instrumentation produces user engagement data like clicks, views, swipes, etc. Since this data powers everything we do from analytics to personalization to targeting, it is absolutely critical that the data is correct. The usage pattern is generally to: 

  1. Submit a Bullet query to obtain data associated with your mobile device or browser (filter on a cookie value or mobile device ID)
  2. Open and use the application to generate the data while the Bullet query is running
  3. Go back to Bullet and inspect the data 

Bullet is also used programmatically in continuous delivery pipelines for functional testing of instrumentation on product releases. Product usage is simulated, then data is generated and validated in seconds using Bullet. Bullet is orders of magnitude faster to use for this kind of validation and for general data exploration use cases, as opposed to waiting for the data to be available in Hive or other systems. The Bullet UI supports pivot tables and a multitude of charting options that may speed up analysis further compared to other querying options.

We also use Bullet to do a bunch of other interesting things, including instances where we dynamically compute cardinalities of fields (using a Count Distinct Bullet query) as a check to protect systems like Druid that can’t support extremely high cardinalities for fields.

What you do with Bullet is entirely determined by the data you put it on. If you put it on data that is essentially some set of performance metrics (data center statistics for example), you could be running a lot of queries that find the 95th and 99th percentile of a metric. If you put it on user engagement data, you could be validating instrumentation and mostly looking at raw data. 
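
For the performance-metrics case, the kind of question being asked looks like the following Python sketch. It computes exact nearest-rank percentiles over a small in-memory window purely for illustration; it does not show how Bullet computes distributions, and the `percentile` helper and the sample latencies are hypothetical.

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("percentile of an empty window is undefined")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)   # 1-based rank
    return ordered[rank - 1]


latencies_ms = list(range(1, 101))   # a made-up window of 100 measurements
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
```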

We hope you will find Bullet interesting and tell us how you use it. If you find something you want to change, improve, or fix, your contributions and ideas are always welcome! You can contact us here.

Open Sourcing Daytona: A Framework For Automated and Application-agnostic Performance Analysis

Post Syndicated from mikesefanov original https://yahooeng.tumblr.com/post/160987779296

By Sapan Panigrahi and Deepesh Mittal

Today, we are pleased to offer Daytona, an open-source framework for automated performance testing and analysis, to the community. Daytona is an application-agnostic framework to conduct integrated performance testing and analysis with repeatable test execution, standardized reporting, and built-in profiling support.

Daytona gives you the capability to build a customized test harness in a single, unified framework to test and analyze the performance of any application. You’ll get easy repeatability, consistent reporting, and the ability to capture trends. Daytona’s UI accepts any performance testing script that can run on a command line, so it can cover websites, databases, networks, or any other workload you need to test and tune for performance. You can submit tests to the scheduler queue from the Daytona UI or from your CI/CD tool. You can deploy Daytona as a hosted service in your on-prem environment or on the public cloud of your choice. In fact, you can even host test harnesses for multiple applications with a single centralized service so that developers, architects, and systems engineers from different parts of your organization can work together on a unified view and manage your performance analysis on a continuous basis.

Daytona’s differentiation lies in its ability to aggregate and present essential aspects of application, system, and hardware performance metrics with a simple and unified user interface. This helps you maintain your focus on performance analysis without changing context across various sources and formats of data. The overall goal of performance analysis is to find ways of maximizing application throughput with minimal hardware resources and the best user experience. Metrics and insights from Daytona help achieve this objective.

Prior to Daytona, we created multiple, heterogeneous performance tools to meet the specific needs of various applications. This meant that we often stored test results inconsistently, making it harder to analyze performance in a comprehensive manner. We had a difficult time sharing results and analyzing differences in test runs in a standard manner, which could lead to confusion.

With Daytona, we are now able to integrate all our load testing tools under a single framework and aggregate test results in one common central repository. We are gaining insight into the performance characteristics of many of our applications on a continuous basis. These insights help us optimize our applications, which results in better utilization of our hardware resources and helps improve user experience by reducing the latency to serve end-user requests. Ultimately, Daytona helps us reduce capital expenditure on our large-scale infrastructure and makes our applications more robust under load. Sharing performance results in a common format encourages the use of common optimization techniques that we can leverage across many different applications.

Daytona was built knowing that we would want to publish it as open source and share the technology with the community for validation and improvement of the framework. We hope the community can help extend its use cases and make it suitable for an even broader set of applications and workloads.

Architecture

Daytona consists of a centralized scheduler, a distributed set of agents running on SUTs (systems under test), a MySQL database to store all metadata for tests, and a PHP-based UI. A test harness can be customized by answering a simple set of questions about the application/workload. A test can be submitted to Daytona’s queue through the UI or through a CLI (Command Line Interface) from the CI/CD system. The scheduler process polls the database for a test to be run and sends all the actions associated with the execution of the test to the agent running on an SUT. An agent process executes the test, collects application and system performance metrics, and sends the metrics back as a package to the scheduler. The scheduler saves the test metadata in the database and test results in the local file system. Tests from multiple harnesses proceed concurrently.
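
The life cycle described above can be sketched as a small, single-process Python loop. This is not Daytona's code: the in-memory `database` list stands in for the MySQL tables, `results_store` stands in for the file system, the function names and record fields are hypothetical, and the agent runs locally instead of on a remote SUT.

```python
import subprocess
import sys
import time
from typing import Dict, List, Optional

# Stand-in for the tests table (hypothetical schema).
database: List[Dict] = [
    {"id": 1, "command": [sys.executable, "-c", "print('ok')"], "status": "queued"},
]
results_store: Dict[int, Dict] = {}   # stand-in for results on the file system


def poll_for_test(db: List[Dict]) -> Optional[Dict]:
    """Scheduler side: return the next queued test, if any."""
    return next((t for t in db if t["status"] == "queued"), None)


def agent_execute(command: List[str]) -> Dict:
    """Agent side: run the test command and package basic metrics."""
    start = time.perf_counter()
    proc = subprocess.run(command, capture_output=True, text=True)
    return {
        "exit_code": proc.returncode,
        "stdout": proc.stdout.strip(),
        "elapsed_s": time.perf_counter() - start,
    }


def scheduler_step(db: List[Dict]) -> bool:
    """Run one queued test end to end; return False when the queue is empty."""
    test = poll_for_test(db)
    if test is None:
        return False
    test["status"] = "running"
    metrics = agent_execute(test["command"])   # in a real deployment, a call to the SUT
    results_store[test["id"]] = metrics        # save results
    test["status"] = "finished" if metrics["exit_code"] == 0 else "failed"
    return True


while scheduler_step(database):
    pass
```

Because the test is simply a command line, the same loop works for any workload that can be scripted, which is the property that makes the framework application-agnostic.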

Architecture and Life Cycle Of A Test

Looking Forward

Our goal is to integrate Daytona with popular open source CI/CD tools and we welcome contributions from the community to make that happen. It is available under Apache License Version 2.0. To evaluate Daytona, we provide simple instructions to deploy it on your in-house bare metal, VM, or public cloud infrastructure. We also provide instructions so you can quickly have a test and development environment up and running on your laptop with Docker. Please join us on the path of making application performance analysis an enjoyable and insightful experience. Visit the Daytona Yahoo repo to get started!