Grafana is a rich interactive open-source tool by Grafana Labs for visualizing data across one or many data sources. It’s used in a variety of modern monitoring stacks, allowing you to have a common technical base and apply common monitoring practices across different systems. Amazon Managed Grafana is a fully managed, scalable, and secure Grafana-as-a-service solution developed by AWS in collaboration with Grafana Labs.
Amazon Redshift is the most widely used data warehouse in the cloud. You can view your Amazon Redshift cluster’s operational metrics on the Amazon Redshift console, use Amazon CloudWatch, and query Amazon Redshift system tables directly from your cluster. The first two options provide a set of predefined general metrics and visualizations. The last one allows you to use the flexibility of SQL to get deep insights into the details of the workload. However, querying system tables requires knowledge of system table structures. To address that, we came up with a consolidated Amazon Redshift Grafana dashboard that visualizes a set of curated operational metrics and works on top of the Amazon Redshift Grafana data source. You can easily add it to an Amazon Managed Grafana workspace, as well as to any other Grafana deployments where the data source is installed.
This post guides you through a step-by-step process to create an Amazon Managed Grafana workspace and configure an Amazon Redshift cluster with a Grafana data source for it. Lastly, we show you how to set up the Amazon Redshift Grafana dashboard to visualize the cluster metrics.
Solution overview
The following diagram illustrates the solution architecture.
The solution includes the following components:
The Amazon Redshift cluster to get the metrics from.
Amazon Managed Grafana, with the Amazon Redshift data source plugin added to it. Amazon Managed Grafana communicates with the Amazon Redshift cluster via the Amazon Redshift Data API (see the sketch following this list).
The Grafana web UI, with the Amazon Redshift dashboard using the Amazon Redshift cluster as the data source. The web UI communicates with Amazon Managed Grafana via an HTTP API.
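You never have to make these Data API calls yourself; Amazon Managed Grafana and the data source plugin handle them for you. Still, a short sketch can make the interaction concrete. The following illustration uses the AWS SDK for JavaScript v3; the Region and the SQL text are placeholders, not what the plugin actually runs:

import {
  RedshiftDataClient,
  ExecuteStatementCommand,
  DescribeStatementCommand,
  GetStatementResultCommand,
} from "@aws-sdk/client-redshift-data";

const client = new RedshiftDataClient({ region: "us-east-1" });

// Submit a query using temporary credentials for the given database user.
const { Id } = await client.send(new ExecuteStatementCommand({
  ClusterIdentifier: "redshift-demo-cluster-1",
  Database: "dev",
  DbUser: "redshift_data_api_user",
  Sql: "SELECT query, elapsed FROM svl_qlog ORDER BY starttime DESC LIMIT 10",
}));

// The Data API is asynchronous: poll until the statement finishes, then fetch results.
let status;
do {
  await new Promise((resolve) => setTimeout(resolve, 500));
  ({ Status: status } = await client.send(new DescribeStatementCommand({ Id })));
} while (status === "SUBMITTED" || status === "PICKED" || status === "STARTED");

if (status === "FINISHED") {
  const { Records } = await client.send(new GetStatementResultCommand({ Id }));
  console.log(Records);
}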
We walk you through the following steps during the configuration process:
Configure an Amazon Redshift cluster.
Create a database user for Amazon Managed Grafana on the cluster.
Configure a user in AWS Single Sign-On (AWS SSO) for Amazon Managed Grafana UI access.
Configure an Amazon Managed Grafana workspace and sign in to Grafana.
Set up Amazon Redshift as the data source in Grafana.
Import the Amazon Redshift dashboard supplied with the data source.
Prerequisites
To follow along with this walkthrough, you should have the following prerequisites:
Familiarity with the basic concepts of the following services:
Amazon Redshift
Amazon Managed Grafana
AWS SSO
Configure an Amazon Redshift cluster
If you don’t have an Amazon Redshift cluster, create a sample cluster before proceeding with the following steps. For this post, we assume that the cluster identifier is called redshift-demo-cluster-1 and the admin user name is awsuser.
On the Amazon Redshift console, choose Clusters in the navigation pane.
Choose your cluster.
Choose the Properties tab.
To make the cluster discoverable by Amazon Managed Grafana, you must add a special tag to it.
Choose Add tags.
For Key, enter GrafanaDataSource.
For Value, enter true.
Choose Save changes.
Create a database user for Amazon Managed Grafana
Grafana queries the cluster directly, so it requires a database user to connect with. In this step, we create the user redshift_data_api_user and apply some security best practices.
On the cluster details page, choose Query data, then Query in query editor v2.
Choose the redshift-demo-cluster-1 cluster we created previously.
For Database, enter the default dev.
Enter the user name and password that you used to create the cluster.
Choose Create connection.
In the query editor, enter the following statements and choose Run:
CREATE USER redshift_data_api_user PASSWORD '<password>' CREATEUSER;
ALTER USER redshift_data_api_user SET readonly TO TRUE;
ALTER USER redshift_data_api_user SET query_group TO 'superuser';
The first statement creates a user with superuser privileges necessary to access system tables and views (make sure to use a unique password). The second prohibits the user from making modifications. The last statement isolates the queries the user can run to the superuser queue, so they don’t interfere with the main workload.
In this example, we use service managed permissions in Amazon Managed Grafana and a workspace AWS Identity and Access Management (IAM) role as the authentication provider in the Amazon Redshift Grafana data source. We name the database user redshift_data_api_user because that is the user name the AmazonGrafanaRedshiftAccess managed policy (attached to the workspace IAM role) allows credentials to be issued for.
Configure a user in AWS SSO for Amazon Managed Grafana UI access
Two authentication methods are available for accessing Amazon Managed Grafana: AWS SSO and SAML. In this example, we use AWS SSO.
On the AWS SSO console, choose Users in the navigation pane.
Choose Add user.
In the Add user section, provide the required information.
In this post, we select Send an email to the user with password setup instructions. You need access to the email address you enter because you’ll use it later in the process.
Choose Next to proceed to the next step.
Choose Add user.
An email is sent to the email address you specified.
Choose Accept invitation in the email.
You’re redirected to sign in as a new user and set a password for the user.
Enter a new password and choose Set new password to finish the user creation.
Configure an Amazon Managed Grafana workspace and sign in to Grafana
Now you’re ready to set up an Amazon Managed Grafana workspace.
On the Amazon Grafana console, choose Create workspace.
For Workspace name, enter a name, for example grafana-demo-workspace-1.
Choose Next.
For Authentication access, select AWS Single Sign-On.
For Permission type, select Service managed.
Choose Next to proceed.
For IAM permission access settings, select Current account.
For Data sources, select Amazon Redshift.
Choose Next to finish the workspace creation.
You’re redirected to the workspace page.
Next, we need to enable AWS SSO as an authentication method.
On the workspace page, choose Assign new user or group.
Under Users, in the Select users and groups table, select the previously created AWS SSO user.
You need to make the user an admin because we’ll use this user to set up the Amazon Redshift data source.
Select the user from the Users list and choose Make admin.
Go back to the workspace and choose the Grafana workspace URL link to open the Grafana UI.
Sign in with the user name and password you created in the AWS SSO configuration step.
Set up an Amazon Redshift data source in Grafana
To visualize the data in Grafana, we need to access the data first. To do so, we must create a data source pointing to the Amazon Redshift cluster.
On the navigation bar, choose the lower AWS icon (there are two) and then choose Redshift from the list.
For Regions, choose the Region of your cluster.
Select the cluster from the list and choose Add 1 data source.
On the Provisioned data sources page, choose Go to settings.
For Name, enter a name for your data source.
By default, Authentication Provider should be set as Workspace IAM Role, Default Region should be the Region of your cluster, and Cluster Identifier should be the name of the chosen cluster.
For Database, enter dev.
For Database User, enter redshift_data_api_user.
Choose Save & Test.
A success message should appear.
Import the Amazon Redshift dashboard supplied with the data source
As the last step, we import the default Amazon Redshift dashboard and make sure that it works.
In the data source we just created, choose Dashboards on the top navigation bar and choose Import to import the Amazon Redshift dashboard.
Under Dashboards on the navigation sidebar, choose Manage.
In the dashboards list, choose Amazon Redshift.
The dashboard appears, showing operational data from your cluster. When you add more clusters and create data sources for them in Grafana, you can choose them from the Data source list on the dashboard.
Clean up
To avoid incurring unnecessary charges, delete the Amazon Redshift cluster, AWS SSO user, and Amazon Managed Grafana workspace resources that you created as part of this solution.
Conclusion
In this post, we covered the process of setting up an Amazon Redshift dashboard working under Amazon Managed Grafana with AWS SSO authentication and querying from the Amazon Redshift cluster under the same AWS account. This is just one way to create the dashboard. You can modify the process to set it up with SAML as an authentication method, use custom IAM roles to manage permissions with more granularity, query Amazon Redshift clusters outside of the AWS account where the Grafana workspace is, use an access key and secret or AWS Secrets Manager based connection credentials in data sources, and more. You can also customize the dashboard by adding or altering visualizations using the feature-rich Grafana UI.
Because the Amazon Redshift data source plugin is an open-source project, you can install it in any Grafana deployment, whether it’s in the cloud, on premises, or even in a container running on your laptop. That allows you to seamlessly integrate Amazon Redshift monitoring into virtually all your existing Grafana-based monitoring stacks.
For more details about the systems and processes described in this post, refer to the documentation for each of the services used in this solution.
Sergey Konoplev is a Senior Database Engineer on the Amazon Redshift team. Sergey has been focusing on automation and improvement of database and data operations for more than a decade.
Milind Oke is a Data Warehouse Specialist Solutions Architect based out of New York. He has been building data warehouse solutions for over 15 years and specializes in Amazon Redshift.
In this post, we will walk through using Amazon Pinpoint and Amazon QuickSight to create customizable messaging campaign reports. Amazon Pinpoint is a flexible and scalable outbound and inbound marketing communications service that allows customers to connect with users over channels like email, SMS, push, or voice. Amazon QuickSight is a scalable, serverless, embeddable, machine learning-powered business intelligence (BI) service built for the cloud. This solution allows event and user data from Amazon Pinpoint to flow into Amazon QuickSight. Once in QuickSight, customers can build their own reports that show campaign performance at a more granular level.
Engagement Event Dashboard
Customers want to view the results of their messaging campaigns in ever-increasing levels of granularity and ensure their users see value from the email, SMS, or push notifications they receive. Customers also want to analyze how different user segments respond to different messages, and how to optimize subsequent user communication. Previously, customers could only view this data in Amazon Pinpoint analytics, which offers robust reporting on events, funnels, and campaigns. However, it does not allow analysis across these different parameters or the building of custom reports, for example showing campaign revenue across different user segments, or showing which events were generated after a user viewed a campaign in a funnel analysis. Customers would need to extract this data themselves and do the analysis in Excel.
Customers should be prepared to pay for Amazon QuickSight, which has its own set of costs that are not covered by Amazon Pinpoint pricing.
Solution Overview
This solution uses the Athena tables created by the Digital User Engagement (DUE) Events Database solution. The AWS CloudFormation template provided in this post automatically sets up the architecture components that capture detailed Amazon Pinpoint engagement events and expose them in Amazon Athena as views. You still need to manually configure Amazon QuickSight dashboards to link to these newly generated Athena views. Follow the steps below for further information.
Use case(s)
The event dashboard solution supports the following use cases:
Deep dives into engagement insights (for example, SMS, email, campaign, and journey events).
Viewing engagement events at the individual user level.
Data and process mining to turn raw event data into useful marketing insights.
User engagement benchmarking and end-user event funneling.
Computing campaign conversions (post-campaign user analysis to show campaign effectiveness).
Building funnels that show user progression.
Getting started with solution deployment
Prerequisite tasks to complete before deploying the dashboard solution
Step 1 – Create an AWS account and an Amazon Pinpoint project, and implement the Event Database solution. As part of this step, you need to implement the DUE Event Database solution, because the current solution (the DUE event dashboard) is an extension of it. The basic assumption here is that you have already configured an Amazon Pinpoint project or Amazon SES in the required AWS Region before implementing this step.
The steps required to implement an event dashboard solution are as follows.
a. Follow the steps mentioned in the Event Database solution to implement the complete stack. Before installing the complete stack, copy and save the Athena events database name, as shown in the diagram. In my case it is due_eventdb. The database name is required as an input parameter for the current Event Dashboard solution.
b. Once the solution is deployed, navigate to the Outputs tab of the CloudFormation stack, then copy and save the following information, which is required as input parameters in step 2 of the current Event Dashboard solution.
Step 2 – Deploy the CloudFormation template for the Event Dashboard solution. This step generates a number of new Amazon Athena views that will serve as a data source for Amazon QuickSight. Continue with the following actions.
Navigate to the CloudFormation page in the AWS console, choose Create stack in the upper right, and select the option With new resources (standard).
Leave Prerequisite – Prepare template set to Template is ready, and for the Specify template option, select Upload a template file. On the same page, choose Choose file, browse to the Event-dashboard.yaml file, and select it. Once the file is uploaded, choose Next and deploy the stack.
Enter the following information under the Specify stack details section:
EventAthenaDatabaseName – as mentioned in Step 1-a.
S3DataLogBucket – as mentioned in Step 1-b.
This solution creates five additional Athena views:
All_email_events
All_SMS_events
All_custom_events (Custom events can be Mobile app/WebApp/Push Events)
All_campaign_events
All_journey_events
Step 3 – Create the Amazon QuickSight engagement dashboard. This step walks you through the process of creating an Amazon QuickSight dashboard for Amazon Pinpoint engagement events using the Athena views you created in step 2.
To set up Amazon QuickSight for the first time, follow this link (this process is not needed if you have already set up Amazon QuickSight). Make sure you are an Amazon QuickSight administrator.
Search for Amazon QuickSight in the AWS console and open it.
Choose New analysis, and then select New dataset.
Select Athena as the data source.
Next, select which analyses you need for the respective events. This solution provides the option to create five different sets of analyses, as mentioned in step 2: a) all email events, b) all SMS events, c) all custom events (mobile/web app, web push, and so on), d) all campaign events, and e) all journey events. Dashboards can be created from a QuickSight analysis and shared with stakeholders across the organization. The following are the steps to create analyses and dashboards for the different types of events.
Email Events –
For all email events, name the analysis All-emails-events (any naming convention you prefer works), select primary as the Athena workgroup, and then create the data source.
Once you create the data source, QuickSight lists all the views and tables available under the specified database (in our case, due_eventdb). Select the email_all_events view as the data source.
Select the event data location for the analysis. There are two main options: a) Import to SPICE for quicker analytics, or b) Directly query your data. Select the preferred option and then choose Visualize the data.
Import to SPICE for quicker analytics – SPICE is the Amazon QuickSight Super-fast, Parallel, In-memory Calculation Engine. It’s engineered to rapidly perform advanced calculations and serve data. In Enterprise edition, data stored in SPICE is encrypted at rest. (1 GB of storage is available for free; for extra storage, customers need to pay extra. Refer to the cost section in this document.)
Directly query your data – This option enables QuickSight to query Athena (or the source database; in the current case it is Athena) directly, and QuickSight does not store any data.
Now that you have selected a data source, you are taken to a blank QuickSight canvas (a blank analysis page), as shown in the following image. Drag and drop the visualization type you need onto the auto-graph pane. Note that Amazon QuickSight is a business intelligence platform, so you are free to choose the visualization types best suited to observing the individual engagement events.
In this post, we show how to create some simple analysis graphs to visualize the engagement events.
As an initial step, select the tabular visualization, as shown in the image.
Select all the event dimensions that you want to include as table columns on the x-axis. An Amazon QuickSight table can be extended to show as many columns as needed; how much data marketers want to visualize depends entirely on the business requirements.
Further filtering on the table can be done using QuickSight filters; you can apply a filter on specific granular values. For example, to filter on the destination email ID: 1) select Filter from the left-hand menu, 2) add the destination field as the filtering criterion, 3) check the destination value you want to filter on, or search for the destination email ID, and 4) all results in the table are then filtered according to that criterion.
Next, add another visual from the top-left corner (Add -> Add visual), then select Donut chart from the Visual types pane. Donut charts are well suited to displaying aggregations.
Then select event_type as the group to visualize the aggregated events. This helps marketers and business users see how many email events occurred and the aggregated success, click, complaint, and bounce ratios for the emails or campaigns sent to end users.
To create a QuickSight dashboard from the QuickSight analysis, choose the Share menu option at the top-right corner and then select Publish dashboard. Provide the required dashboard name while publishing. The same dashboard can be shared with multiple audiences in the organization.
The following is the final version of the dashboard. As mentioned above, QuickSight dashboards can be shared with other stakeholders, and the complete dashboard can be exported as an Excel sheet.
SMS Events-
As shown above, SMS events can be analyzed using QuickSight, and dashboards can be created from the analysis. Repeat all of the sub-steps listed under Email Events above. The following is a sample SMS dashboard.
Custom Events-
After you integrate your application (app) with Amazon Pinpoint, Amazon Pinpoint can stream event data about user activity, different types of custom events, and message deliveries for the app, for example _session.start, Product_page_view, and _session.stop. Repeat all of the sub-steps listed under Email Events above to create a custom events dashboard.
Campaign events
As shown before, campaign events can also be included in the same dashboard, or you can create a new dashboard only for campaign events.
Cost for the Event Dashboard solution
You are responsible for the cost of the AWS services used while running this solution. As of the date of publication, the cost for running this solution with default settings in the US West (Oregon) Region is approximately $65 a month. The cost estimate includes the cost of AWS Lambda, Amazon Athena, and Amazon QuickSight. The estimate assumes querying 1 TB of data in a month, two authors managing Amazon QuickSight every month, four Amazon QuickSight readers viewing the events dashboard an unlimited number of times in a month, and 50 GB of QuickSight SPICE capacity per month. Prices are subject to change. For full details, see the pricing webpage for each AWS service you will be using in this solution.
Clean up
When you’re done with this exercise, complete the following steps to delete your resources and stop incurring costs:
On the CloudFormation console, select your stack and choose Delete. This cleans up all the resources created by the stack.
Delete the Amazon QuickSight dashboards and datasets that you created.
Conclusion
In this blog post, I have demonstrated how marketers, business users, and business analysts can use Amazon QuickSight dashboards to evaluate and act on user engagement data from Amazon SES and Amazon Pinpoint event streams. Customers can also use this solution to understand how Amazon Pinpoint campaigns lead to business conversions, in addition to analyzing multi-channel communication metrics at the individual user level.
Next steps
The personas for this blog are both the tech team and the marketing analyst team, as it involves a code deployment to create very simple Athena views, as well as the steps to create an Amazon QuickSight dashboard to analyze Amazon SES and Amazon Pinpoint engagement events at the individual user level. Customers may then create their own Amazon QuickSight dashboards to illustrate conversion ratios and propensity trends in real time by integrating campaign events with app-level events such as purchase conversions, order placement, and so on.
Extending the solution
You can download the AWS CloudFormation templates and code for this solution from our public GitHub repository and modify them to fit your needs.
About the Author
Satyasovan Tripathy works at Amazon Web Services as a Senior Specialist Solutions Architect. He is based in Bengaluru, India, and specializes in the AWS Digital User Engagement product portfolio. He enjoys reading and travelling outside of work.
Today, dark mode is available for the Cloudflare Dashboard in beta! From your user profile, you can configure the Cloudflare Dashboard in light mode, dark mode, or match it to your system settings.
For those unfamiliar, dark mode, or light on dark color schemes, uses light text on dark backgrounds instead of the typical dark text on light (usually white) backgrounds. In low-light environments, this can help reduce eyestrain and actually reduce power consumption on OLED screens. For many though, dark mode is simply a preference supported widely by applications and devices.
Side by side comparing the Cloudflare dashboard in dark mode and in light mode
Under Appearance, select an option: Light, Dark, or Use system setting. For the time being, your choice is saved into local storage.
The appearance card in the dashboard for modifying color themes
There are many primers and how-tos on implementing dark mode, and you can find articles talking about the general complications of implementing a dark mode including this straightforward explanation. Instead, we will talk about what enabled us to be able to implement dark mode in only a matter of weeks.
Cloudflare’s Design System – Our Secret Weapon
Before getting into the specifics of how we implemented dark mode, it helps to understand the system that underpins all product design and UI work at Cloudflare – the Cloudflare Design System.
The six pillars of the design system: logo, typography, color, layout, icons, videos
Cloudflare’s Design System defines and documents the interface elements and patterns used to build products at Cloudflare. The system can be used to efficiently build consistent experiences for Cloudflare customers. In practice, the Design System defines primitives like typography, color, layout, and icons in a clear and standard fashion. What this means is that anytime a new interface is designed, or new UI code is written, an easily referenceable, highly detailed set of documentation is available to ensure that the work matches previous work. This increases productivity, especially for new employees, and prevents repetitious discussions about style choices and interaction design.
Built on top of these design primitives, we also have our own component library. This is a set of ready to use components that designers and engineers can combine to form the products our customers use every day. They adhere to the design system, are battle tested in terms of code quality, and enhance the user experience by providing consistent implementations of common UI components. Any button, table, or chart you see looks and works the same because it is the same underlying code with the relevant data changed for the specific use case.
So, what does all of this have to do with dark mode? Everything, it turns out. Due to the widespread adoption of the design system across the dashboard, changing a set of variables like background color and text color in a specific way and seeing the change applied nearly everywhere at once becomes much easier. Let’s take a closer look at how we did that.
Turning Out the Lights
The use of color at Cloudflare has a well documented history. When we originally set out to build our color system, the tools we built and the extensive research we performed resulted in a ten-hue, ten-luminosity set of colors that can be used to build digital products. These colors were built to be accessible — not just in terms of internal use, but for our customers. Take our blue hue scale, for example.
Our blue color scale, as used on the Cloudflare Dashboard. This shows color-contrast accessible text and background pairings for each step in the scale.
Each hue in our color scale contains ten colors, ordered by luminosity in ten increasing increments from low luminosity to high luminosity. This color scale allows us to filter down the choice of color from the 16,777,216 hex codes available on the web to a much simpler choice of just hue and brightness. As a result, we now have a methodology where designers know the first five steps in a scale have sufficient color contrast with white or lighter text, and the last five steps in a scale have sufficient contrast with black or darker text.
Color scales also allow us to make changes while designing in a far more fluid fashion. If a piece of text is too bright relative to its surroundings, drop down a step on the scale. If an element is too visually heavy, take a step-up. With the Design System and these color scales in place, we’ve been able to design and ship products at a rapid rate.
So, with this color system in place, how do we begin to ship a dark mode? It turns out there’s a simple solution to this, and it’s built into the JS standard library. We call reverse() and flip the luminosity scales.
Our blue color scale after calling reverse on it. High luminosity colors are now at the start of the scale, making them contrast accessible with darker backgrounds (and vice-versa).
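In code, the change really is that small. Here is a minimal sketch, assuming the scales are exported as plain arrays of hex values ordered from low to high luminosity (the hex values and variable names are illustrative, not the dashboard’s actual palette):

// Ten steps of a blue scale, ordered from lowest to highest luminosity.
const blue = [
  "#062a78", "#0b3b8c", "#1048a0", "#1556b4", "#1a63c8",
  "#3b82d9", "#62a1e4", "#8cbfee", "#b8daf6", "#e3f1fc",
];

// Dark mode: flip the scale so the lightest steps come first and are
// readable against dark backgrounds. reverse() mutates, so copy first.
const darkBlue = [...blue].reverse();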
By performing this small change within our dashboard’s React codebase and shipping a production preview deploy, we were able to see the Cloudflare Dashboard in dark mode with a whole new set of colors in a matter of minutes.
An early preview of the Cloudflare Dashboard after flipping our color scales.
While not perfect, this brief prototype gave us an incredibly solid baseline and validated the approach with a number of benefits.
Every product built using the Cloudflare Design System now had a dark mode theme built in for free, with no additional work required by teams.
Our color contrast principles remain sound — just as the first five colors in a scale would be accessible with light text, when flipped, the first five colors in the scale are accessible with dark text. Our scales aren’t perfectly symmetrical, but when using white and black, the principle still holds.
In a traditional approach of “inverting” colors, we face the issue of a color’s hue being changed too. When a color is broken down into its constituent hue, saturation, and luminosity values, inverting it would mean a vibrant light blue would become a dull dark orange. Our approach of just inverting the luminosity of a color means that we retain the saturation and hue of a color, meaning we retain Cloudflare’s brand aesthetic and the associated meaning of each hue (blue buttons as calls-to-action, and so on).
Of course, shipping a dark mode for a product as complex as the Cloudflare Dashboard can’t just be done in a matter of minutes.
Not Quite Just Turning the Lights Off
Although our prototype did meet our initial requirements of facilitating the dashboard in a dark theme, some details just weren’t quite right. The data visualization and mapping libraries we use, our icons, text, and various button and link states all had to be audited and required further iterations. One of the most obvious and prominent examples was the page background color. Our prototype had simply changed the background color from white (#FFFFFF) to black (#000000). It quickly became apparent that black wasn’t appropriate. We received feedback that it was “too intense” and “harsh.” We instead opted for off black, specifically what we refer to as “gray.0” or #1D1D1D. The difference may not seem noticeable, but at larger dimensions, the gray background is much less distracting.
Here is what it looks like in our design system:
Black background color contrast for white text
Gray background color contrast for white text
And here is a more realistic example:
lorem ipsum sample text on black background and on gray background
The numbers at the end of each row represent the contrast of the text color on the background. According to the Web Content Accessibility Guidelines (WCAG), the standard contrast ratio for text should be at least 4.5:1. In our case, while both of the above examples exceed the standard, the gray background ends up being less harsh to use across an entire application. This is not an issue in light mode, where dark text on a white (#FFFFFF) background works well.
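For reference, the 4.5:1 figure comes from the WCAG contrast-ratio formula, which compares the relative luminance of the two colors. Here is a small, self-contained sketch of that calculation; this is the standard formula, not code from the dashboard:

// Relative luminance of an sRGB hex color, per the WCAG definition.
function luminance(hex) {
  const [r, g, b] = [1, 3, 5].map((i) => {
    const c = parseInt(hex.slice(i, i + 2), 16) / 255;
    return c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
  });
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// WCAG contrast ratio between two colors: (lighter + 0.05) / (darker + 0.05).
function contrast(a, b) {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

contrast("#FFFFFF", "#000000"); // 21     – white text on black
contrast("#FFFFFF", "#1D1D1D"); // ≈ 16.9 – white text on "gray.0"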
Our technique during the prototyping stage involved flipping our color scale; however, we additionally created a tool to let us replace any color within the scale arbitrarily. As the dashboard is made up of charts, icons, links, shadows, buttons and certainly other components, we needed to be able to see how they reacted in their various possible states. Importantly, we also wanted to improve the accessibility of these components and pay particular attention to color contrast.
Color picker tool screenshot showing a color scale
For example, a button is made up of four distinct states:
1) Default 2) Focus 3) Hover 4) Active
Example showing the various colors for states of buttons in light and dark mode
We wanted to ensure that each of these states would be at least compliant with the AA accessibility standards according to the WCAG. Using a combination of our design systems documentation and a prioritized list of components and pages based on occurrence and visits, we meticulously reviewed each state of our components to ensure their compliance.
Side by side comparison of the navbar in light and dark modes
The navigation bar used to select between the different applications was a component we wanted to treat differently compared to light mode. In light mode, the app icons are a solid blue with an outline of the icon; it’s a distinct look and certainly one that grabs your attention. However, for dark mode, the consensus was that it was too bright and distracting for the overall desired experience. We wanted the overall aesthetic of dark mode to be subtle, but it’s important to not conflate aesthetic with poor usability. With that in mind, we made the decision for the navigation bar to use outlines around each icon, instead of being filled in. Only the selected application has a filled state. By using outlines, we are able to create sufficient contrast between the current active application and the rest. Additionally, this provided a visually distinct way to present hover states, by displaying a filled state.
After applying the same methodology as described to other components like charts, icons, and links, we end up with a nicely tailored experience without requiring a substantial overhaul of our codebase. For any new UI that teams at Cloudflare build going forward, they will not have to worry about extra work to support dark mode. This means we get an improved customer experience without any impact to our long term ability to keep delivering amazing new capabilities — that’s a win-win!
Welcome to the Dark Side
We know many of you have been asking for this, and we are excited to bring dark mode to all. Without the investment into our design system by many folks at Cloudflare, dark mode would not have seen the light of day. You can enable dark mode on the Appearance card in your user profile. You can give feedback to shape the future of the dark theme with the feedback form in the card.
If you find these types of problems interesting, come help us tackle them! We are hiring across product, design, and engineering!
If you’re writing code: what can go wrong, will go wrong.
Many developers know the feeling: “It worked in the local testing suite, it worked in our staging environment, but… it’s broken in production?” Testing can reduce mistakes and debugging can help find them, but logs give us the tools to understand and improve what we are creating.
if (this === undefined) {
console.log("there’s no way… right?") // Narrator: there was.
}
While logging can help you understand when the seemingly impossible is actually possible, it’s something that no developer really wants to set up or maintain on their own. That’s why we’re excited to launch a new addition to the Cloudflare Workers platform: logs and exceptions from the dashboard.
Starting today, you can view and filter the console.log output and exceptions from a Worker… at no additional cost with no configuration needed!
View logs, just a click away
When you view a Worker in the dashboard, you’ll now see a “Logs” tab which you can click on to view a detailed stream of logs and exceptions. Here’s what it looks like in action:
Each log entry contains an event with a list of logs, exceptions, and request headers if it was triggered by an HTTP request. We also automatically redact sensitive URLs and headers such as Authorization, Cookie, or anything else that appears to have a sensitive name.
If you are in the Durable Objects open beta, you will also be able to view the logs and requests sent to each Durable Object. This is a great tool to help you understand and debug the interactions between your Worker and a Durable Object.
For now, we support filtering by event status and type, though you can expect more filters to be added to the dashboard very soon! Today, we support advanced filtering with the wrangler CLI, which is discussed later in this blog.
console.log(), and you’re all set
It’s really simple to get started with logging for Workers. Simply invoke one of the standard console APIs, such as console.log(), and we handle the rest. That’s it! There’s no extra setup, no configuration needed, and no hidden logging fees.
function logRequest (request) {
const { cf, headers } = request
const { city, region, country, colo, clientTcpRtt } = cf
console.log("Detected location:", [city, region, country].filter(Boolean).join(", "))
if (clientTcpRtt) {
console.debug("Round-trip time from client to", colo, "is", clientTcpRtt, "ms")
}
// You can also pass an object, which will be interpreted as JSON.
// This is great if you want to define your own structured log schema.
console.log({ headers })
}
In fact, you don’t even need to use console.log to view an event from the dashboard. If your Worker doesn’t generate any logs or exceptions, you will still be able to see the request headers from the event.
Advanced filters, from your terminal
If you need more advanced filters, you can use wrangler, our command-line tool for deploying Workers. We’ve updated the wrangler tail command to support sampling and a new set of advanced filters. You also no longer need to install or configure cloudflared to use the command. Not to mention it’s much faster: no more waiting around for logs to appear. Here are a few examples:
# Filter by your own IP address, and if there was an uncaught exception.
wrangler tail --format=pretty --ip-address=self --status=error
# Filter by HTTP method, then apply a 10% sampling rate.
wrangler tail --format=pretty --method=GET --sampling-rate=0.1
# Filter using a generic search query.
wrangler tail --format=pretty --search="TypeError"
We recommend using the “pretty” format, since wrangler will output your logs in a colored, human-readable format. (We’re also working on a similar display for the dashboard.)
However, if you want to access structured logs, you can use the “json” format. This is great if you want to pipe your logs to another tool, such as jq, or save them to a file. Here are a few more examples:
# Parses each log event, but only outputs the url.
wrangler tail --format=json | jq .event.request?.url
# You can also specify --once to disconnect the tail after receiving the first log.
# This is useful if you want to run tests in a CI/CD environment.
wrangler tail --format=json --once > event.json
Try it out!
Both logs from the dashboard and wrangler tail are available and free for existing Workers customers. If you would like more information or a step-by-step guide, check out any of the resources below.
Cloudflare’s dashboard now supports four new languages (and multiple locales): Spanish (with country-specific locales: Chile, Ecuador, Mexico, Peru, and Spain), Brazilian Portuguese, Korean, and Traditional Chinese. Our customers are global and diverse, so in helping build a better Internet for everyone, it is imperative that we bring our products and services to customers in their native language.
Since last year Cloudflare has been hard at work internationalizing our dashboard. At the end of 2019, we launched our first language other than US English: German. At the end of March 2020, we released three additional languages: French, Japanese, and Simplified Chinese. If you want to start using the dashboard in any of these languages, you can change your language preference in the top right of the Cloudflare dashboard. The preference selected will be saved and used across all sessions.
In this blog post, I want to help those unfamiliar with internationalization and localization to better understand how it works. I also would like to tell the story of how we made internationalizing and localizing our application a standard and repeatable process along with sharing a few tips that may help you as you do the same.
Beginning the journey
The first step in internationalization is externalizing all the strings in your application. In concrete terms this means taking any text that could be read by a user and extracting it from your application code into separate, stand-alone files. This needs to be done for a few reasons:
It enables translation teams to work on translating these strings without needing to view or change any application code.
Most translators typically use Translation Management applications which automate aspects of the workflow and provide them with useful utilities (like translation memory, change tracking, and a number of useful parsing and formatting tools). These applications expect standardized text formats (such as json, xml, md, or csv files).
From an engineering perspective, separating application code from translations allows for making changes to strings without re-compiling and/or re-deploying code. In our React based application, externalizing most of our strings boiled down to changing blocks of code like this:
<Button>Cancel</Button>
<Button>Next</Button>
Into this:
<Button><Trans id="signup.cancel" /></Button>
<Button><Trans id="signup.next" /></Button>
// And in a separate catalog.json file for en_US:
{
"signup.cancel": "Cancel",
"signup.next": "Next",
// ...many more keys
}
The <Trans> component shown above is the fundamental i18n building block in our application. In this scheme, translated strings are kept in large dictionaries keyed by a translation id. We call these dictionaries “translation catalogs”, and there is a set of translation catalogs for each language that we support.
At runtime, the <Trans> component looks up the translation in the correct catalog for the provided key and then inserts this translation into the page (via the DOM). All of an application’s static text can be externalized with simple transformations like these.
However, when dynamic data needs to be intermixed with static text, the solution becomes a little more complicated. Consider the following seemingly straightforward example which is riddled with i18n landmines:
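Something along these lines, where the sentence is chopped into separately keyed pieces (the ids and exact markup here are illustrative):

<span>
  <Trans id="selected.prefix" /> {totalSelected} <Trans id="pageRules.suffix" />
</span>

// catalog.json (en_US)
{
  "selected.prefix": "You've selected",
  "pageRules.suffix": "Page Rules"
}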
This gets the job done and may even seem like an elegant solution. After all, both the selected.prefix and pageRules.suffix strings seem like they are destined to be reused. Unfortunately, chopping sentences up and then concatenating translated bits back together like this turns out to be the single largest pitfall when externalizing strings for internationalization.
The problem is that when translated, the various words that make up a sentence can be morphed in different ways based on context (singular vs plural contexts, due to word gender, subject/verb agreement, etc). This varies significantly from language to language, as does word order. For example in English, the sentence “We like them” follows a subject-verb-object order, while other languages might follow subject-object-verb (We them like), verb-subject-object (Like we them), or even other orderings. Because of these nuanced differences between languages, concatenating translated phrases into a sentence will almost always lead to localization errors.
The code example above contains actual translations we got back from our translation teams when we supplied them with “You’ve selected” and “Page Rules” as separate strings. Here’s how this sentence would look when rendered in the different languages:
Language | Translation
Japanese | 選択しました { totalSelected } ページ ルール。
German | Sie haben ausgewählt { totalSelected } Page Rules
Portuguese (Brazil) | Você selecionou { totalSelected } Page Rules.
To compare, we also gave them the sentence as a single string using a placeholder for the variable, and here’s the result:
Language | Translation
Japanese | %{ totalSelected } 件のページ ルールを選択しました。
German | Sie haben %{ totalSelected } Page Rules ausgewählt.
Portuguese (Brazil) | Você selecionou %{ totalSelected } Page Rules.
As you can see, the translations differ for Japanese and German. We’ve got a localization bug on our hands.
So, in order to guarantee that translators will be able to convey the true meaning of your text with fidelity, it’s important to keep each sentence intact as a single externalized string. Our <Trans> component allows for easy injection of values into template strings, which allows us to do exactly that:
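In practice that looks something like the following sketch (the id and the exact prop name for passing values are illustrative; the %{ } placeholder syntax comes from Polyglot):

<span>
  <Trans id="pageRules.deleteConfirmation" values={{ totalSelected }} />
</span>

// catalog.json (en_US)
{
  "pageRules.deleteConfirmation": "You've selected %{ totalSelected } Page Rules for deletion."
}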
This allows translators to have the full context of the sentence, ensuring that all words will be translated with the correct inflection.
You may have noticed another potential issue. What happens in this example when totalSelected is just 1? With the above code, the user would see “You’ve selected 1 Page Rules for deletion”. We need to conditionally pluralize the sentence based on the value of our dynamic data. This turns out to be a fairly common use case, and our <Trans> component handles this automatically via the smart_count feature:
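A sketch of how that reads, again with illustrative ids and prop names:

<span>
  <Trans id="pageRules.deleteConfirmation" values={{ smart_count: totalSelected }} />
</span>

// catalog.json (en_US)
{
  "pageRules.deleteConfirmation": "You've selected %{ smart_count } Page Rule for deletion. |||| You've selected %{ smart_count } Page Rules for deletion."
}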
Here, the singular and plural versions are delimited by ||||. <Trans> will automatically select the right translation to use depending on the value of the passed in totalSelected variable.
Yet another stumbling block occurs when markup is mixed in with a block of text we’d like to externalize as a single string. For example, what if you need some phrase in your sentence to be a link to another page?
<VerificationReminder>
Don't forget to <Link>verify your email address.</Link>
</VerificationReminder>
To solve for this use case, the <Trans> component allows for arbitrary elements to be injected into placeholders in a translation string, like so:
<VerificationReminder>
<Trans id="notification.email_verification" Components={[Link]} componentProps={[{ to: '/profile' }]} />
</VerificationReminder>
// catalog.json
{
"notification.email_verification": "Don't forget to <0>verify your email address.</0>",
// ...
}
In this example, the <Trans> component will replace placeholder elements (<0>,<1>, etc.) with instances of the component type located at that index in the Components array. It also passes along any data specified in componentProps to that instance. The example above would boil down to the following in React:
// en-US
<VerificationReminder>
Don't forget to <Link to="/profile">verify your email address.</Link>
</VerificationReminder>
// es-ES
<VerificationReminder>
No olvide <Link to="/profile">verificar la dirección de correo electrónico.</Link>
</VerificationReminder>
Safety third!
The functionality outlined above was enough for us to externalize our strings. However, it did at times result in bulky, repetitive code that was easy to mess up. A couple of pitfalls quickly became apparent.
The first was that small hardcoded strings were now easier to hide in plain sight, and because they weren’t glaringly obvious to a developer until the rest of the page had been translated, the feedback loop in finding these was often days or weeks. A common solution to surfacing these issues is introducing a pseudolocalization mode into your application during development which will transform all properly internationalized strings by replacing each character with a similar looking unicode character.
For example You've selected 3 Page Rules. might be transformed to Ýôú'Ʋè ƨèℓèçƭèδ 3 Þáϱè Rúℓèƨ.
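A toy version of such a transform might look like this (the character map is abbreviated for brevity; a real implementation covers the whole alphabet):

// Map ASCII letters to visually similar accented or Unicode characters.
const PSEUDO_MAP = {
  a: "á", c: "ç", d: "δ", e: "è", g: "ϱ", l: "ℓ", o: "ô",
  P: "Þ", s: "ƨ", t: "ƭ", u: "ú", v: "Ʋ", Y: "Ý",
};

function pseudolocalize(str) {
  return str
    .split("")
    .map((ch) => PSEUDO_MAP[ch] || ch)
    .join("");
}

pseudolocalize("You've selected 3 Page Rules.");
// => "Ýôú'Ʋè ƨèℓèçƭèδ 3 Þáϱè Rúℓèƨ."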
Another handy feature at your disposal in a pseudolocalization mode is the ability to shrink or lengthen all strings by a fixed amount in order to plan for content width differences. Here’s the same pseudolocalized sentence increased in length by 50%: Ýôú'Ʋè ƨèℓèçƭèδ 3 Þáϱè Rúℓèƨ. ℓôřè₥ ïƥƨú₥ δô. This is useful in helping both engineers as well as designers spot places where content length could potentially be an issue. We first recognized this problem when rolling out support for German, which at times tends to have somewhat longer words than English.
This meant that in a lot of places the text in page elements would overflow, such as in this “Add” button:
There aren’t a lot of easy fixes for these types of problems that don’t compromise the user experience.
For best results, variable content width needs to be baked into the design itself. Since fixing these bugs often means sending it back upstream to request a new design, the process tends to be time consuming. If you haven’t given much thought to content design in general, an internationalization effort can be a good time to start. Having standards and consistency around the copy used for various elements in your app can not only cut down on the number of words that need translating, but also eliminate the need to think through the content length pitfalls of using a novel phrase.
The other pitfall we ran into was that the translation ids — especially long and repetitive ones — are highly susceptible to typos.
Pop quiz, which of these translation keys will break our app: traffic.load_balancing.analytics.filters.origin_health_title or traffic.load_balancing.analytics.filters.origin_heath_title?
Nestled among hundreds of other lines of changes, these are hard to spot in code review. Most apps have a fallback so missing translations don’t result in a page breaking error. As a result a bug like this might go unnoticed entirely if it’s hidden well enough (in say, a help text flyout).
Fortunately, with a growing percentage of our codebase in TypeScript, we were able to leverage the type-checker to give developers feedback as they wrote the code. Here’s an example where our code editor is helpfully showing us a red underline to indicate that the id property is invalid (due to the missing “l”):
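One way to get that kind of checking, sketched here under the assumption that the source-language catalog can be imported as JSON (our actual setup differs in its details), is to derive a union type of valid ids from the catalog itself:

// Requires "resolveJsonModule": true in tsconfig.json.
import catalog from "./locales/en_US/catalog.json";

// A union of every key in the source-language catalog, e.g.
// "traffic.load_balancing.analytics.filters.origin_health_title" | ...
type TranslationId = keyof typeof catalog;

interface TransProps {
  id: TranslationId;                        // a typo'd id is now a compile-time error
  values?: Record<string, string | number>; // interpolation values, if any
}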
Not only did it make the problems more obvious, but it also meant that violations would cause builds to fail, preventing bad code from entering the codebase.
Scaling locale files
In the beginning, you’ll probably start out with one translation file per locale that you support. In addition, the naming scheme you use for your keys can remain somewhat simple. As your app scales, your translation file will grow too large and need to be broken up into separate files. Files that are too large will overwhelm Translation Management applications, or if left unchecked, your code editor. All of our translation strings (not including keys), when lumped together into a single file, add up to around 50,000 words. For comparison, that’s roughly the same size as a copy of “The Hitchhiker’s Guide to the Galaxy” or “Slaughterhouse Five”.
We break up our translations into a number of “catalog” files roughly corresponding to feature verticals (like Firewall or Cloudflare Workers). This works out well for our developers since it provides a predictable place to find strings, and keeps the line count of a translation catalog down to a manageable length. It also works out well for the outside translation teams since a single feature vertical is a good unit of work for a translator (or small team).
In addition to per-feature catalogs, we have a common catalog file to hold strings that are re-used throughout the application. It allows us to keep ids short ( common.delete vs some_page.some_tab.some_feature.thing.delete ) and lowers the likelihood of duplication since developers habitually check the common catalog before adding new strings.
Libraries
So far we’ve talked at length about our <Trans> component and what it can do. Now, let’s talk about how it’s built.
Perhaps unsurprisingly, we didn’t want to reinvent the wheel and come up with a base i18n library from scratch. Due to prior efforts to internationalize the legacy parts of our application written in Backbone, we were already using Airbnb’s Polyglot library, a “tiny I18n helper library written in JavaScript” which, among other things, “provides a simple solution for interpolation and pluralization, based off of Airbnb’s experience adding I18n functionality to its Backbone.js and Node apps”.
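For readers who haven’t used it, Polyglot’s core API is small. A minimal sketch, with illustrative phrases:

import Polyglot from "node-polyglot";

const polyglot = new Polyglot({
  phrases: {
    "signup.cancel": "Cancel",
    "pageRules.deleteConfirmation":
      "You've selected %{smart_count} Page Rule for deletion. |||| " +
      "You've selected %{smart_count} Page Rules for deletion.",
  },
});

polyglot.t("signup.cancel");                                    // "Cancel"
polyglot.t("pageRules.deleteConfirmation", { smart_count: 3 }); // "You've selected 3 Page Rules for deletion."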
We took a look at a few of the most popular libraries that had been purpose-built for internationalizing React applications, but ultimately decided to stick with Polyglot. We created our <Trans> component to bridge the gap to React. We chose this direction for a few reasons:
We didn’t want to re-internationalize the legacy code in our application in order to migrate to a new i18n support library.
We also didn’t want the combined overhead of supporting 2 different i18n schemes for new vs legacy code.
Writing our own trans component gave us the flexibility to write the interface we wanted. Since Trans is used just about everywhere, we wanted to make sure it was as ergonomic as possible to developers.
If you’re just getting started with i18n in a new React-based web app, react-intl and i18next are two popular libraries that supply a component similar to the <Trans> described above.
The biggest pain point of the <Trans> component as outlined is that strings have to be kept in a separate file from your source code. Switching between multiple files as you author new code or modify existing features is just plain annoying. It’s even more annoying if the translation files are kept far away in the directory structure, as they often need to be.
There are some newer i18n libraries, such as jslingui, that obviate this problem by taking an extraction-based approach to handling translation catalogs. In this scheme, you still use a <Trans> component, but you keep your strings in the component itself, not in a separate catalog:
<span>
<Trans>Hmm... We couldn't find any matching websites.</Trans>
</span>
A tool that you run at build time then does the work of finding all of these strings and extracting them into catalogs for you. For example, the above would result in the following generated catalogs:
// locales/en_US.json
{
"Hmm... We couldn't find any matching websites.": "Hmm... We couldn't find any matching websites.",
}
// locales/de_DE.json
{
"Hmm... We couldn't find any matching websites.": "Hmm... Wir konnten keine übereinstimmenden Websites finden."
}
The obvious advantage to this approach is that we no longer have separate files! The other advantage is that there’s no longer any need for type checking ids since typos can’t happen anymore.
However, at least for our use case, there were a few downsides.
First, human translators sometimes appreciate the context of the translation keys. It helps with organization, and it gives some clues about the string’s purpose.
And although we no longer have to worry about typos in translation ids, we’re just as susceptible to slight copy mismatches (ex. “Verify your email” vs “Verify your e-mail”). This is almost worse, since in this case it would introduce a near duplication which would be hard to detect. We’d also have to pay for it.
Whichever tech stack you’re working with, there are likely a few i18n libraries that can help you out. Which one to pick is highly dependent on technical constraints of your application and the context of your team’s goals and culture.
Numbers, Dates, and Times
Earlier, when we talked about injecting data into translated strings, we glossed over a major issue: the data we’re injecting may also need to be formatted to conform to the user’s local customs. This is true for dates, times, numbers, currencies, and some other types of data.
Without proper formatting, an injected number will appear correct for small values, but as soon as things get into the thousands, localization problems will arise, since the way that digits are grouped and separated with symbols varies by culture. Here’s how three hundred thousand and three hundredths is formatted in a few different locales:
Language (Country) | Code | Formatted Number
German (Germany) | de-DE | 300.000,03
English (US) | en-US | 300,000.03
English (UK) | en-GB | 300,000.03
Spanish (Spain) | es-ES | 300.000,03
Spanish (Chile) | es-CL | 300.000,03
French (France) | fr-FR | 300 000,03
Hindi (India) | hi-IN | 3,00,000.03
Indonesian (Indonesia) | in-ID | 300.000,03
Japanese (Japan) | ja-JP | 300,000.03
Korean (South Korea) | ko-KR | 300,000.03
Portuguese (Brazil) | pt-BR | 300.000,03
Portuguese (Portugal) | pt-PT | 300 000,03
Russian (Russia) | ru-RU | 300 000,03
The way that dates are formatted varies significantly from country to country. If you’ve developed your UI mainly with a US audience in mind, you’re probably displaying dates in a way that will feel foreign and perhaps unintuitive to users from just about any other place in the world. Among other things, date formatting can vary in terms of separator choice, whether single digits are zero padded, and the way that the day, month, and year portions are ordered. Here’s how March 4 of the current year is formatted in a few different locales:
Language (Country) | Code | Formatted Date
German (Germany) | de-DE | 4.3.2020
English (US) | en-US | 3/4/2020
English (UK) | en-GB | 04/03/2020
Spanish (Spain) | es-ES | 4/3/2020
Spanish (Chile) | es-CL | 04-03-2020
French (France) | fr-FR | 04/03/2020
Hindi (India) | hi-IN | 4/3/2020
Indonesian (Indonesia) | in-ID | 4/3/2020
Japanese (Japan) | ja-JP | 2020/3/4
Korean (South Korea) | ko-KR | 2020. 3. 4.
Portuguese (Brazil) | pt-BR | 04/03/2020
Portuguese (Portugal) | pt-PT | 04/03/2020
Russian (Russia) | ru-RU | 04.03.2020
Time format varies significantly as well. Here’s how time is formatted in a few selected locales:
Language (Country) | Code | Formatted Time
German (Germany) | de-DE | 14:02:37
English (US) | en-US | 2:02:37 PM
English (UK) | en-GB | 14:02:37
Spanish (Spain) | es-ES | 14:02:37
Spanish (Chile) | es-CL | 14:02:37
French (France) | fr-FR | 14:02:37
Hindi (India) | hi-IN | 2:02:37 pm
Indonesian (Indonesia) | in-ID | 14.02.37
Japanese (Japan) | ja-JP | 14:02:37
Korean (South Korea) | ko-KR | 오후 2:02:37
Portuguese (Brazil) | pt-BR | 14:02:37
Portuguese (Portugal) | pt-PT | 14:02:37
Russian (Russia) | ru-RU | 14:02:37
Libraries for Handling Numbers, Dates, and Times
Ensuring the correct format for all these types of data for all supported locales is no easy task. Fortunately, there are a number of mature, battle-tested libraries that can help you out.
When we kicked off our project, we were using the Moment.js library extensively for date and time formatting. This handy library abstracts away the details of formatting dates to different lengths (“Jul 9th 20”, “July 9th 2020”, vs “Thursday”), displaying relative dates (“2 days ago”), amongst many other things. Since almost all of our dates were already being formatted via Moment.js for readability, and since Moment.js already has i18n support for a large number of locales, it meant that we were able to flip a couple of switches and have properly localized dates with very little effort.
There are some strong criticisms of Moment.js (mainly its size), but ultimately the benefits of switching to a lower-footprint alternative didn’t outweigh the cost of reworking every date and time we format.
Numbers were a very different story. We had, as you might imagine, thousands of raw, unformatted numbers being displayed throughout the dashboard. Hunting them down was a laborious and often manual process.
To handle the actual formatting of numbers, we used the Intl API (the Internationalization library defined by the ECMAScript standard):
const number = 300000.03;
const formatted = number.toLocaleString('hi-IN'); // "3,00,000.03"
// This probably works in the browser you're using right now!
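The same Intl API handles dates and times as well. Here is a minimal sketch; the sample values are chosen to match the tables above, though exact output can vary slightly between browsers and ICU versions:

const date = new Date(2020, 2, 4, 14, 2, 37); // March 4th, 2:02:37 PM local time
new Intl.DateTimeFormat('en-US').format(date);    // "3/4/2020"
new Intl.DateTimeFormat('ja-JP').format(date);    // "2020/3/4"
date.toLocaleTimeString('de-DE');                 // "14:02:37"
new Intl.NumberFormat('hi-IN').format(300000.03); // "3,00,000.03"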
Fortunately, browser support for Intl has come quite a long way in recent years, with all modern browsers having full support.
Some modern JavaScript engines like V8 have even moved away from self-hosted JavaScript implementations of these libraries in favor of C++ based builtins, resulting in significant speedup.
Support for older browsers can be somewhat lacking, however. Here’s a simple demo site (source code) built with Cloudflare Workers that shows how dates, times, and numbers are rendered in a handful of locales.
Some combinations of old browsers and OS’s will yield less than ideal results. For example, here’s how the same dates and times from above are rendered on Windows 8 with IE 10:
If you need to support older browsers, this can be solved with a polyfill.
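A rough sketch of what that could look like with feature detection and lazy loading; the polyfill module paths below are placeholders for whichever polyfill packages you choose, not real imports:

async function ensureIntlSupport(locale: string): Promise<void> {
  // Only pay the cost of the polyfill when the native API is missing
  if (typeof Intl === 'undefined' || !Intl.DateTimeFormat || !Intl.NumberFormat) {
    await import('./polyfills/intl');                        // hypothetical polyfill bundle
    await import(`./polyfills/intl-locale-data/${locale}`);  // hypothetical locale data
  }
}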
Translating
With all strings externalized, and all injected data being carefully formatted to locale specific standards, the bulk of the engineering work is complete. At this point, we can now claim that we’ve internationalized our application, since we’ve adapted it in a way that makes it easy to localize.
Next comes the process of localization where we actually create varying content based on the user’s language and cultural norms.
This is no small feat. Like we mentioned before, the strings in our application added together are the size of a small novel. It takes a significant amount of coordination and human expertise to create a translated copy that both captures the information with fidelity and speaks to the user in a familiar way.
There are many ways to handle the translation work: leveraging multi-lingual staff members, contracting the work out to individual translators, agencies, or even going all in and hiring teams of in-house translators. Whatever the case may be, there needs to be a smooth process for both workflow signalling and moving assets between the translation and development teams.
A healthy i18n program will provide developers with a black-box interface to the process: they put new strings in a translation catalog file and commit the change, and without any more effort on their part, the feature code they wrote is available in production for all supported locales a few days later. Similarly, in a well-run process, translators will remain blissfully unaware of the particulars of the development process and application architecture. They receive files that easily load in their tools and clearly indicate what translation work needs to be done.
So, how does it actually work in practice?
We have a set of automated scripts that can be run on-demand by the localization team to package up a snapshot of our localization catalogs for all supported languages. During this process, a few things happen:
JSON files are generated from catalog files authored in TypeScript (see the sketch after this list).
If any new catalog files were added in English, placeholder copies are created for all other supported languages.
Placeholder strings are added for all languages when new strings are added to our base catalog.
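To make that first step concrete, here is a rough sketch; the file names and catalog shape are illustrative, not our actual build scripts:

// catalogs/en_US.ts (hypothetical shape, authored by developers)
export default {
  'verify_email.title': 'Verify your email',
  'verify_email.body': 'Click the link we sent you to verify your email.'
};

// scripts/build-catalogs.ts (hypothetical) generates the JSON handed to translators
import { writeFileSync } from 'fs';
import catalog from '../catalogs/en_US';

writeFileSync('dist/locales/en_US.json', JSON.stringify(catalog, null, 2));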
From there, the translation catalogs are uploaded to the Translation Management system via the UI or automated calls to the API. Before handing it off to translators, the files are pre-processed by comparing each new string against a Translation Memory (a cache of previously translated strings and substrings). If a match is found, the existing translation is used. Not only does this save cost by not re-translating strings, but it improves quality by ensuring that previously reviewed and approved translations are used when possible.
Suppose your locale files end up looking something like this:
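The keys and copy below are invented for illustration; note the verbatim duplicate and the repeated sub-string:

{
  "signup.confirm": "Verify your email",
  "settings.email.verify": "Verify your email",
  "billing.notice": "Verify your email to enable billing alerts"
}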
Here, we have strings that are duplicated verbatim, as well as sub-strings that are copied. Translation services are billed by the word — you don’t want to pay for something twice and run the risk of a consistency issue arising. To this end, having a well-maintained Translation Memory will ensure that these strings are taken care of in the pre-translation steps before translators even see the file.
Once the translation job is marked as ready, it can take translation teams anywhere from hours to weeks to complete and return translated copies, depending on a number of factors such as the size of the job, the availability of translators, and the contract terms. The concerns of this phase could constitute another blog article of similar length: sourcing the right translation team, controlling costs, ensuring quality and consistency, making sure the company’s brand is properly conveyed, and so on. Since the focus of this article is largely technical, we’ll gloss over the details here, but make no mistake: getting this part wrong will tank your entire effort, even if you’ve achieved your technical objectives.
After translation teams signal that new files are ready for pickup, the assets are pulled from the server and unpacked into their correct locations in the application code. We then run a suite of automated checks to make sure that all files are valid and free of any formatting issues.
An optional (but highly recommended) step takes place at this stage: in-context review. A team of translation reviewers looks at the translated output in context to make sure everything looks perfect in its finalized state. Support staff who are both highly proficient with the product and fluent in the target language are especially useful in this effort. Shoutout to all our team members from around the company who have taken the time and effort to do this. To make this possible for outside contractors, we prepare special preview versions of our app that allow them to test with development mode locales enabled.
And there you have it, everything it takes to deliver a localized version of your application to your users all around the world.
Continual Localization
It would be great to stop here, but what we’ve discussed up until this point is the effort required to do it once. As we all know, code changes. New strings will be gradually added, modified, and deleted over the course of time as new features are launched and tweaked.
Since translation is a highly human process that often involves effort from people in different corners of the world, there is a lower bound to the timeframe in which turnover is possible. Since our release cadence (daily) is often faster than this turnover rate (2-5 days), it means that developers making changes to features have to make a choice: slow down to match this cadence, or ship slightly ahead of the localization schedule without full coverage.
In order to ensure that features shipping ahead of translations don’t cause application-breaking errors, we fallback to our base locale (en_US) if a string doesn’t exist for the configured language.
Some applications have a slightly different fallback behavior: displaying raw translation keys (perhaps you’ve seen some.funny.dot.delimited.string in an app you’re using). There’s a tradeoff between velocity and correctness here, and we chose to optimize for velocity and minimal overhead. In some apps correctness is important enough to slow down cadence for i18n. In our case it wasn’t.
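A minimal sketch of that fallback, assuming a simplified catalog shape rather than our actual i18n library:

const BASE_LOCALE = 'en_US';
type Catalogs = Record<string, Record<string, string>>;

function translate(catalogs: Catalogs, locale: string, key: string): string {
  // Prefer the user's locale; fall back to the base locale, which is assumed to always contain the string
  return catalogs[locale]?.[key] ?? catalogs[BASE_LOCALE][key];
}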
Finishing Touches
There are a few more things we can do to optimize the user experience in our newly localized application.
First, we want to make sure there isn’t any performance degradation. If our application made the user fetch all of its translated strings before rendering the page, this would surely happen. So, in order to keep everything running smoothly, the translation catalogs are fetched asynchronously and only as the application needs them to render some content on the page. This is easy to accomplish nowadays with the code splitting features available in module bundlers that support dynamic import statements such as Parcel or Webpack.
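As a sketch of how that lazy loading might look with a dynamic import (the catalog path is illustrative):

async function loadCatalog(locale: string) {
  // The bundler splits each catalog into its own chunk, fetched only when the locale is needed
  const catalogModule = await import(`../locales/${locale}.json`);
  return catalogModule.default;
}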
We also want to eliminate any friction the user might experience with needing to constantly select their desired language when visiting different Cloudflare properties. To this end, we made sure that any language preference a user selects on our marketing site or our support site persists as they navigate to and from our dashboard (all links are in French to belabor the point).
What’s next?
It’s been an exciting journey, and we’ve learned a lot from the process. It’s difficult (perhaps impossible) to call an i18n project truly complete. Expanding into new languages will surface slippery bugs and expose new challenges. Budget pressure will challenge you to find ways of cutting costs and increasing efficiency. In addition, you will discover ways in which you can enhance the localized experience even more for users.
There’s a long list of things we’d like to improve upon, but here are some of the highlights:
Collation. String comparison is language sensitive, and as such, the code you’ve written to lexicographically sort lists and tables of data in your app is probably doing the wrong thing for some of your users. This is especially apparent in languages that use logographic writing systems (such as Chinese or Japanese) as opposed to languages that use alphabets (like English or Spanish).
Localizing API responses is harder than localizing static copy in your user interface, as it takes a coordinated effort between teams. In the age of microservices, finding a solution that works well across the myriad of tech stacks that power each service can be very challenging.
Localizing maps. We’ll be working on making sure all content in our map-based visualizations is translated.
Machine translation has come a long way in recent years, but not far enough to churn out our translations unsupervised. We would, however, like to experiment more with using machine translation as a first pass that translation reviewers then edit for correctness and tone.
I hope you have enjoyed this overview of how Cloudflare internationalized and localized our dashboard. Check out our careers page for more information on full-time positions and internship roles across the globe.
DNS is the very first step in accessing any website, API, or pretty much anything on the Internet, which makes it mission-critical to keeping your site up and running. This week, we are launching two significant changes that allow our customers to better maintain and update their DNS records. For customers who use Cloudflare as their authoritative DNS provider, we’ve added a much-requested feature: confirmation of DNS record edits. For our secondary DNS customers, we’re excited to provide a brand new onboarding experience.
Confirm and Commit
One of the benefits of using Cloudflare DNS is that changes quickly propagate to our 200+ data centers. And I mean very quickly: DNS propagation typically takes <5 seconds worldwide. Our UI was set up to allow customers to edit records, click out of the input box, and boom! The record has propagated!
There are a lot of advantages to fast DNS, but there’s also one clear downside – it leaves room for fat fingering. What if you accidentally toggle the proxy icon, or mistype the content of your DNS record? This could result in users not being able to access your website or API and could cause a significant outage. To protect customers from these kinds of mistakes, we’ve added a Save button for DNS record changes.
Now editing records in the DNS table allows you to take an extra look before committing the change.
The new confirmation layout applies to all record types and affects any content, TTL, or proxy status changes.
Let us know what you think by filling out the feedback survey linked at the top of the DNS tab in the dashboard.
“Deep linking is the use of a hyperlink that links to a specific, generally searchable or indexed, piece of web content on a website (e.g. http://example.com/path/page), rather than the website’s home page (e.g., http://example.com). The URL contains all the information needed to point to a particular item.”
Why DeepLinks in Dashboard?
There are many user experiences in Cloudflare’s Dashboard that are enhanced by the use of deep linking, such as:
We’re able to direct users from marketing pages directly into the Dashboard so they can interact with new/changed features.
Troubleshooting docs can give clearer, more direct instructions, e.g. “Enable SSL encryption here” vs. “Log into the Dashboard, choose your account and zone, navigate to the security tab, change the SSL encryption level, blah blah blah”.
One of the interesting challenges with deep linking in the Dashboard is that most interesting resources are “locked” behind the context of an account and a zone/domain/website. To illustrate this, look at a tree of possible URL paths into Cloudflare’s Dashboard:
You might notice that in order to deep link to anything more interesting than logging in, a deep linker will need to know a user’s account or zone beforehand. A troubleshooting doc might want to send a user to the Page Rules tab in Dashboard to help a user fix their zone, but the linker doesn’t know what that zone is.
Another highly desired feature was the ability for a deep link to scroll to a particular piece of content on a Dashboard page, making it even easier for users to navigate. Instead of a troubleshooting doc asking a user to fumble around to find a setting, we could helpfully scroll that setting right into view. Now that would be slick!
What do DeepLinks look like in Dashboard?
The solution we came up with involves 3 main parts:
Deep link URLs expose an intuitive schema for dynamic value resolution.
A React component, DeepLink, consolidates routing/resolving deep links.
A React component, ScrollAnchor, encapsulates a simple algorithm which scrolls its content into view when the DOM has “finished loading”.
Just to prove that it works, here’s a GIF of us deep linking to the “TLS 1.3” setting on the security settings page:
It works! I was asked to select one of my several accounts, then our DeepLink routing component was smart enough to know that I have only one zone within that account and auto-filled the rest of the URL path. After the page was fully loaded, we were automatically scrolled to the TLS 1.3 setting. If you’re curious how all of this works and want to jump into the nitty gritty details, read on!
How are DeepLinks exposed?
If you were paying attention to the URL bar in the GIF above, you already know what’s coming. In order to deal with dynamic account/zone resolution, a deep link can use a “to” query parameter to specify a path into the Dashboard. I think it reads quite nicely:
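Such a link looks roughly like the following; the exact product path segments are an assumption based on the description below:

https://dash.cloudflare.com/?to=/:account/:zone/ssl-tls/edge-certificates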
This example is saying that we’d like to link to the “Edge Certificates” section of the “SSL-TLS” product for some account and some zone that a user needs to manually resolve, as you saw above. It’s easy to imagine removing “?to=/” to transform the link URL into the resolved one:
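Sketching again from the description, the resolved link would look roughly like:

https://dash.cloudflare.com/1234567890abcdef/:zone/traffic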
This link takes the user to the “Traffic” product tab for some zone inside of account 1234567890abcdef. Indeed, the :account and :zone symbols are placeholders for user-supplied values, but they can be replaced with any permutation of real, known values to speed up resolution time to provide a better UX.
DeepLink routing
These links are parsed and resolved in our top-level routing component, DeepLink. At a high level, this component contains a series of “resolvers” for unknown symbols that need automatic or user-interactive resolution (i.e. :account and :zone). But before we dive in, let’s take a step back and gain appreciation for how cool this component is.
Cloudflare’s Dashboard is a single page React app, which means we use React Router to create routing components that handle what’s rendered on different URLs:
When a page is loaded, a lot of things need to happen: API calls need to be made to fetch all the data needed to render a page, like account/user/zone info not cached in the browser. Many components need to be rendered. It turns out that we can improve the UX for many users by blocking React Router and making specific queries to our API instead of rendering an entire page that incidentally fetches the information we need. For example, there’s no need to render a zone selection page if a user only has one zone, like in our GIF above ☝️.
Resolvers
When a deep link gets parsed and split into parts, the framework iterates over those parts and tries to build a URL string that is later used to redirect users to a specific location in the dashboard.
// to=/:account/:zone/traffic
// parts = [':account', ':zone', 'traffic']
for (const part of parts) {
  // do something with each part
}
We can build up the dynamic URL by looking at prefixes. If a part starts with “:”, it’s considered a symbol that needs to be resolved. Everything else is a static string that just gets appended.
const resolvedParts: string[] = [];
// parts = [':account', ':zone', 'traffic']
for (let part of parts) {
  if (part.startsWith(':')) {
    // resolve
  }
  resolvedParts.push(part);
}
const finalUrl = resolvedParts.join('/');
Symbols are handled by functions we call “resolvers”. A resolver is a function that:
Is async.
Has a context parameter.
Always returns a string – the value it resolves to.
In JavaScript, async functions always return a promise. Return values that are not type of Promise are wrapped in a resolved promise implicitly. They also allow “await” to be used in them. The async/await syntax is used for resolvers so they can perform any kind of asynchronous work – such as calling the API, while being able to “pause” JavaScript with “await” until that asynchronous work is done.
Each dynamic symbol has its own resolver. We currently have two resolvers – for account and for zone.
const RESOLVERS: Resolvers = {
  account: accountResolver,
  zone: zoneResolver
};

const resolvedParts: string[] = [];
// parts = [':account', ':zone', 'traffic']
for (let part of parts) {
  if (part.startsWith(':')) {
    // ctx is the resolver context described below
    // for :account, accountResolver is awaited and returns "abc123"
    // for :zone, zoneResolver is awaited and returns "testsite.io"
    part = await RESOLVERS[part.slice(1)](ctx);
  }
  resolvedParts.push(part);
}
const finalUrl = resolvedParts.join('/');
The internal implementation is a little bit more complicated, but this is a rough overview of how our DeepLink works.
Resolver context
We mentioned that each resolver has a context parameter. Context is an object that is passed to resolvers from the DeepLink component and it contains a bunch of handy utilities that give resolvers control over any part of the app. For example, it has access to the Redux store (we use Redux.js in the Dashboard to help us manage the application’s state). It has access to previously resolved values, and to all other parts of the deep link. It also has functions to help with user interactions.
User interactions
In many cases, a resolver is not able to resolve without the user’s help. For example, if a user has multiple zones, the resolver working on the :zone symbol needs to wait for the user to select a zone.
const zoneResolver: Resolver = async ctx => {
  const zones = await fetchZone();
  // Just one zone: the :zone symbol can be resolved to zone.name without the user's help
  if (zones.length === 1) return zones[0].name;
  if (zones.length > 1) {
    // need the user's help to pick a zone
  }
};
We already have a page in the dashboard with a zone list that looks like this.
What we need to do is give the resolver the ability to somehow show this page and wait for the result of the user’s interaction. You might be asking: “But how do we show this page? You just told me that DeepLink blocks the entire page!” That’s true!
We decided to block the React Router to prevent unnecessary API calls and DOM updates while a deep link is resolving. But there is no harm in showing some part of the UI, if needed. To be able to do that, we added two functions to context – unblockRouter and blockRouter. These functions just toggle the state that is gating our Router component.
const zoneResolver: Resolver = async ctx => {
  // ...
  if (zones.length > 1) {
    // delegate to React Router to render the page with the zone picker
    ctx.unblockRouter();
    // need the user's help to pick a zone
    // block the router again
    ctx.blockRouter();
  }
};
Now, the last piece is to somehow observe user interactions from within the resolver. To be able to do that, we have written a powerful utility.
waitForPageAction
Resolvers are isolated functions that live outside of the application’s components. To be able to observe anything that happens in distant branches of React DOM, we created a function called waitForPageAction. This function takes two parameters:
1. pageToAwaitActionOn – URL string pointing to a page we want to await the user’s action on. For example, “dash.cloudflare.com/123abc”
2. actionType – Unique string describing the action. For example, ZONE_SELECTED.
As you may have guessed, waitForPageAction is an async function. It returns a promise that resolves with action metadata whenever that action happens on the page specified by pageToAwaitActionOn. The promise rejects when the user navigates away from pageToAwaitActionOn. Otherwise, it keeps waiting… forever.
This helps us write code that is very easy to understand.
const zoneResolver: Resolver = async ctx => {
  // ...
  if (zones.length > 1) {
    // delegate to React Router to render the page with the zone picker
    ctx.unblockRouter();
    // need the user's help to pick a zone. Wait for the 'ZONE_SELECTED' action at 'dash.cloudflare.com/abc123'
    // action is an object with metadata about the zone. It contains zoneName, which this resolver uses to resolve the :zone symbol
    const action = await ctx.waitForPageAction(
      'dash.cloudflare.com/abc123',
      'ZONE_SELECTED'
    );
    // block the router again
    ctx.blockRouter();
    return action.zoneName;
  }
};
How does waitForPageAction work?
As mentioned above, we use Redux to manage our state. The actionType parameter is nothing else than a type of Redux action. Whenever a zone is selected, React dispatches a Redux action in an onClick handler.
Now, how does waitForPageAction know that ‘ZONE_SELECTED’ has been dispatched? Aren’t we supposed to write a reducer?!
Not really. waitForPageAction doesn’t change any state; it’s just an observer that resolves whenever a dispatched action satisfies a predicate. And Redux has an API to subscribe to any store changes: store.subscribe(listener).
The listener will be called any time an action is dispatched, and some part of the state tree may have changed. Unfortunately, the listener does not have access to the currently dispatched action. We can only read the current state.
Solution? Store the action in the Redux store!
Redux actions are just plain objects (mostly), and thus easy to serialize. We added a simple reducer that stores all actions in the Redux state.
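Such a reducer can be tiny; a sketch (the actual store shape may differ):

import { AnyAction } from 'redux';

// Keeps a copy of the most recently dispatched action in state.lastAction
export function lastAction(_state: AnyAction | null = null, action: AnyAction): AnyAction {
  return action;
}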
Anytime an action is dispatched, we can read that action’s metadata in store.getState().lastAction. Now, we have everything we need to finally implement waitForPageAction.
export const waitForPageAction = (store: Store<DashState>) => (
  pageToAwaitActionOn: string,
  actionType: string
) =>
  new Promise<AnyAction>((resolve, reject) => {
    // Subscribe to the Redux store
    const unsubscribe = store.subscribe(() => {
      const state = store.getState();
      const currentPage = state.router.location.pathname;
      const lastAction = state.lastAction;
      if (currentPage !== pageToAwaitActionOn) {
        // user navigated away - unsubscribe and reject
        unsubscribe();
        reject('User navigated away');
      } else if (lastAction.type === actionType) {
        // Action types match! Unsubscribe and resolve with the action object
        unsubscribe();
        resolve(lastAction);
      }
    });
  });
The listener reads the current state and grabs the currentPage and lastAction data. If currentPage doesn’t match pageToAwaitActionOn, it means the user navigated away, and there’s no need to continue resolving the deep link – we unsubscribe, and reject the promise. Deep link resolvers are stopped, and React Router unblocked.
Else, if lastAction.type matches the actionType parameter, it means the action we are waiting on just happened! Unsubscribe, and resolve the promise with action metadata. The deep link keeps resolving.
That’s it! We also added a similar function – waitForAction – which does exactly the same thing, but is not restricted to a specific page.
ScrollAnchor component
We implemented a wrapper component ScrollAnchor that will scroll to its wrapped content, making our deep links even more targeted. A client would wrap some content like this:
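The usage sketch below is hypothetical; the real component’s prop names and import paths may differ:

import React from 'react';
import { ScrollAnchor } from '../components/ScrollAnchor';  // hypothetical path
import { TLS13Setting } from '../components/TLS13Setting';  // hypothetical setting component

export const EdgeCertificates = () => (
  <ScrollAnchor id="tls-1-3">
    <TLS13Setting />
  </ScrollAnchor>
);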
You might be thinking that a plain HTML ID anchor would do the trick here. We thought so too! But it turns out that there are a few problems that prevent this super simple approach:
The Dashboard’s fixed header
DOM updates after page load
Since the Dashboard contains a fixed header at the top of the page, we can’t simply anchor to any ID, since the content will be scrolled to the top of the browser window behind the header. Fortunately, there’s a simple CSS solution using negative margins:
This CSS trick alone would work for a static site with a fixed header, but the Dashboard is very dynamic. We found early on in testing that using a normal HTML ID anchor in a URL would cause the browser to jump to the tag on page load but the DOM would change in response to newly fetched information or re-rendering, and the anchored content would be pushed out of view.
A solution: scroll to the anchored content after the page content is fully loaded, i.e. after all API calls are resolved, spinners removed, content is rendered. Fortunately, there’s a good way to programmatically scroll a browser window: Element.scrollIntoView(). However, there isn’t a good way to tell when the DOM is finished changing, since it can be modified at any time after page load. Let’s consider two possible strategies for determining when to scroll anchored content into view.
Strategy #1: scroll after a fixed duration. If our goal is to make sure we only scroll to content after a page is “fully loaded”, we can simplify the problem by making some assumptions. Namely, we can assume a maximum amount of time it will take a given page to fetch resources from the backend and re-render the DOM. Let’s call this assumed max duration M milliseconds. We can then easily scroll to some content by running a timeout on page load:
setTimeout(() => scrollTo(htmlId), M)
The problem with this approach is that the DOM might finish updating before or after we scroll. We end up with vertical alignment problems (as the DOM is still settling) or a jarring, unexpected scroll (if we scroll long after the DOM is settled). Both options are bad UX, and in practice it’s difficult to choose a duration constant M that is “just right” for every single page.
Strategy #2: scroll after the DOM has “settled”. If we know that choosing a good duration M for every page isn’t practical, we should try to come up with an algorithm that can choose a better M:
Define an arbitrary threshold of DOM “busyness”, B milliseconds.
On page load, start a timer that will scroll to anchored content after B milliseconds.
If we observe any changes to the DOM, reset the timer.
Once the timer expires, we know that the DOM hasn’t changed in B milliseconds.
By varying our choice of B, we’re able to have some control over how long we’re willing to wait for a page to “finish loading”. If B is 0 milliseconds, we’ll scroll to the anchored content immediately. If it’s 1000 milliseconds, we’ll wait a full second after any DOM change before scrolling. This algorithm is more resilient than fixed threshold scrolling since it explicitly listens to the DOM, but the chosen threshold is somewhat arbitrary. After some trial and error loading a sample of Dashboard pages, we determined that a 500 millisecond busyness threshold was sufficient to allow all content to load onto a page. Here’s what the implementation looks like:
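A sketch of that algorithm using a MutationObserver and a resettable timer, with B = 500 ms per the tuning described above; this is an approximation, not our exact implementation:

function scrollWhenSettled(element: HTMLElement, busynessThresholdMs = 500): void {
  let timer: number | undefined;

  const scheduleScroll = () => {
    // Any DOM change resets the timer; once the page has been quiet for
    // busynessThresholdMs, we consider it settled and scroll.
    window.clearTimeout(timer);
    timer = window.setTimeout(() => {
      observer.disconnect();
      element.scrollIntoView({ block: 'start' });
    }, busynessThresholdMs);
  };

  const observer = new MutationObserver(scheduleScroll);
  observer.observe(document.body, { childList: true, subtree: true, attributes: true });

  scheduleScroll(); // start the initial timer on page load
}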
A key assumption is that API calls take roughly the same amount of time to resolve. If most fetches take 250ms to resolve but others take 1500ms, we might see that the DOM hasn’t been changed for a while and think that it’s settled. Who knew there would be so much work involved in scrolling!
Conclusion
There you have it. A fully-featured deep linking solution with an intuitive schema, React Router blocking, autofilling, and scrolling. Thanks for reading.
Color is my day-long obsession, joy and torment – Claude Monet
Over the last two years we’ve tried to improve our usage of color at Cloudflare. There were a number of forcing functions that made this work a priority. As a small team of designers and engineers we had inherited a bunch of design work that was a mix of values built by multiple teams. As a result it was difficult and unnecessarily time consuming to add new colors when building new components.
We also wanted to improve our accessibility. While we were doing pretty well, we had room for improvement, largely around how we used green. As our UI is increasingly centered around visualizations of large data sets we wanted to push the boundaries of making our analytics as visually accessible as possible.
Cloudflare had also undergone a rebrand around 2016. While our marketing site had rolled out an updated set of visuals, our product UI as well as a number of existing web properties were still using various versions of our old palette.
Cloudflare.com in 2016 & Cloudflare.com in 2019
Our product palette wasn’t well balanced by itself. Many colors had been chosen one or two at a time. You can see how we chose blueberry, ice, and water at a different point in time than marine and thunder.
The color section of our theme file was partially ordered chronologically
Lacking visual cohesion within our own product, we definitely weren’t providing a cohesive visual experience between our marketing site and our product. The transition from the nice blues and purples to our green CTAs wasn’t the streamlined experience we wanted to afford our users.
Cloudflare.com 2017 & Sign in page 2017. Our app dashboard in 2017.
Reworking our Palette
Our first step was to audit what we already had. Cloudflare has been around long enough to have more than one website. Beyond cloudflare.com we have dozens of web properties that are publicly accessible. From our community forums, support docs, blog, status page, to numerous micro-sites.
All-in-all we have dozens of front-end codebases that each represent one more chance to introduce entropy to our visual language. So we were curious to answer the question – what colors were we currently using? Were there consistent patterns we could document for further reuse? Could we build a living style guide that didn’t cover just one site, but all of them?
Screenshots of pages from cloudflare.com contrasted with screenshots from our product in 2017
Our curiosity got the best of us and we went about exploring ways we could visualize our design language across all of our sites.
As we first started to identify the scale of our color problems, we tried to think outside the box on how we might explore the problem space. After an initial brainstorming session we combined the Internet Archive’s Wayback Machine with the CSS Stats API to build an audit tool that shows how our various websites’ visual properties change over time. We can dynamically select which sites we want to compare and scrub through time to see changes.
Below is a visualization of palettes from 9 different websites changing over a period of 6 years. Above the palettes is a component that spits out common colors, across all of these sites. The only two common colors across all properties (appearing for only a brief flash) were #ffffff (white) and transparent. Over time we haven’t been very consistent with ourselves.
If we drill in to look at our marketing site compared to our dashboard app – it looks like the video below. We see a bit more overlap at first and then a significant divergence at the 16 second mark when our product palette grew significantly. At the 22 second mark you can see the marketing palette completely change as a result of the rebrand while our product palette stays the same. As time goes on you can see us becoming more and more inconsistent across the two code bases.
As a product team we had some catching up to do to improve our usage of color and to align ourselves with the company brand. The good news was, there was nowhere to go but up.
This style of historical audit gives us a visual indication backed by real data. We can show stakeholders how consistent our usage of color is across products and whether we are getting better or worse over time. Having this type of feedback loop was invaluable for us, as auditing this manually is incredibly time consuming and so often doesn’t get done. Hopefully in the future, just as it’s standard to track various performance metrics over time at a company, it will also be standard to visualize your current levels of design entropy.
Picking colors
After our initial audit revealed there wasn’t a lot of consistency across sites, we went to work to try and construct a color palette that could potentially be used for sites the product team owned. It was time to get our hands dirty and start “picking colors.”
Hindsight of course is always 20/20. We didn’t start out on day one trying to generate scales based on our brand palette. No, our first bright idea, was to generate the entire palette from a single color.
Our logo is made up of two oranges. Both of these seemed like prime candidates to generate a palette from.
We played around with a number of algorithms that took a single color and created a palette. From the initial color we generated an array of scales for each hue. Initial attempts applied the exact same luminosity curves to each hue, but because the visual perception of hues differs so much, this resulted in wildly different contrasts at each step of the scale.
We can see the intensity of the colors change across hue in peaks, with yellow and green being the most dominant. One of the downsides of this, is when you are rapidly iterating through theming options, the inconsistent relationships between steps across hues can make it time consuming or impossible to keep visual harmony in your interface.
The video below is another way to visualize this phenomenon. The dividing line in the color picker indicates which part of the palette will be accessible with black and white. Notice how drastically the line changes around green and yellow. And then look back at the charts above.
After fiddling with a few different generative algorithms (we made a lot of ugly palettes…) we decided to try a more manual approach. We pursued creating custom curves for each hue in an effort to keep the contrast scales as optically balanced as possible.
Heavily muted palette
Generating different color palettes makes you confront a basic question. How do you tell if a palette is good? Are some palettes better than others? In an effort to answer this question we constructed various feedback loops to help us evaluate palettes as quickly as possible. We tried a few methods to stress test a palette. At first we attempted to grab the “nearest color” for a bunch of our common UI colors. This wasn’t always helpful as sometimes you actually want the step above or below the closest existing color. But it was helpful to visualize for a few reasons.
Generated palette above a set of components previewing the old and new palette for comparison
Sometime during our exploration in this space, we stumbled across this tweet thread about building a palette for pixel art. There are a lot of places web and product designers can draw inspiration from game designers.
Here we see a similar concept where a number of different palettes are applied to the same component. This view shows us two things, the different ways a single palette can be applied to a sphere, and also the different aesthetics across color palettes.
It’s almost surprising that the default way to construct a color palette for apps and sites isn’t to build it while previewing its application against the most common UI patterns. As designers, there are a lot of consistent uses of color we could have baselines for. Many enterprise apps are centered around white background with blue as the primary color with mixtures of grays to add depth around cards and page sections. Red is often used for destructive actions like deleting some type of record. Gray for secondary actions. Maybe it’s an outline button with the primary color for secondary actions. Either way – the margins between the patterns aren’t that large in the grand scheme of things.
Consider the use case of designing UI while the palette or usage of color hasn’t been established. Given a single palette, you might want to experiment with applying that palette in a variety of ways that will output a wide variety of aesthetics. Alternatively you may need to test out several different palettes. These are two different modes of exploration that can be extremely time consuming to work through. It can be non-trivial to keep an in-progress design synced with several different options for color application, even with the best use of layer comps or symbols.
How do we visualize the various ways a palette will look when applied to an interface? Here are examples of how palettes are shown on a palette list for pixel artists.
One method of visualization is to define a common set of primitive ui elements and show each one of them with a single set of colors applied. In isolation this can be helpful. This mode would make it easy to vet a single combination of colors and which ui elements it might be best applied to.
Alternatively we might want to see a composed interface with the closest colors from the palette applied. Consider a set of buttons that includes red, green, blue, and gray button styles. Seeing all of these together can help us visualize the relative nature of these buttons side by side. Given a baseline palette for common UI, we could swap to a new palette and replace each color with the “closest” color. This isn’t always a foolproof solution as there are many edge cases to cover, e.g. what happens when replacing a palette of 134 colors with a palette of 24 colors? Even still, this could allow us to quickly take a stab at automating how existing interfaces would change their appearance given a change to the underlying system. Whether locally or against a live site, this mode of working would allow designers to view a color in multiple contexts to truly assess its quality.
After moving on from the idea of generating a palette from a single color, we attempted to use our logo colors as well as our primary brand colors to drive the construction of modular scales. Our goal was to create a palette that would improve contrast for accessibility, stay true to our visual brand, work predictably for developers, work for data visualizations, and provide the ability to design visually balanced and attractive interfaces. No sweat.
Brand colors showing Hue and Saturation level
While we knew going in we might not use every step in every hue, we wanted full coverage across the spectrum so that each hue had a consistent optical difference between each step. We also had no idea which steps across which hues we were going to need just yet. As they would just be variables in a theme file it didn’t add any significant code footprint to expose the full generated palette either.
One of the more difficult parts was deciding on the number of steps for the scales. This would allow us to edit the palette in the future to a variety of aesthetics and swap the palette out at the theme level without needing to update anything else.
In the future, if and when we needed to augment the available colors, we could edit the entire palette instead of making a one-off addition, as we had found that to be a difficult way to work over time. In addition to our primary brand colors we also explored adding scales for yellow / gold, violet, and teal, as well as a gray scale.
The first interface we built for this work was to output all of the scales vertically, with their contrast scores with both white and black on the right hand side. To aid scannability we bolded the values that were above the 4.5 threshold. As we edited the curves, we could see how the contrast ratios were affected at each step. Below you can see an early starting point before the scales were balanced. Red has 6 accessible combos with white, while yellow only has 1. We initially explored having the gray scale be larger than the others.
Early iteration of palette preview during development
As both screen luminosity and ambient light can affect perception of color we developed on two monitors, one set to maximum and one set to minimum brightness levels. We also replicated the color scales with a grayscale filter immediately below to help illustrate visual contrast between steps AND across hues. Bouncing back and forth between the grayscale and saturated version of the scale serves as a great baseline reference. We found that going beyond 10 steps made it difficult to keep enough contrast between each step to keep them distinguishable from one another.
Monitors set to different luminosity & a close up of the visual difference
Taking a page from our game design friends – as we were balancing the scales and exploring how many steps we wanted in the scales, we were also stress testing the generated colors against various analytics components from our component library.
Our slightly random collection of grays had been a particular pain point as they appeared muddy in a number of places within our interface. For our new palette we used the slightest hint of blue to keep our grays consistent and just a bit off from being purely neutral.
Optically balanced scales
With a palette consisting of 90 colors, the amount of combinations and permutations that can be applied to data visualizations is vast and can result in a wide variety of aesthetic directions. The same palette applied to both line and bar charts with different data sets can look substantially different, enough that they might not be distinguishable as being the exact same palette. Working with some of our engineering counterparts, we built a pipeline that would put up the same components rendered against different data sets, to simulate the various shapes and sizes the graph elements would appear in. This allowed us to rapidly test the appearance of different palettes. This workflow gave us amazing insights into how a palette would look in our interface. No matter how many hours we spent staring at a palette, we couldn’t get an accurate sense of how the colors would look when composed within an interface.
Full theme minus green & blues and oranges
Grayscale example & monochromatic Indigo
Analytics charts with blues and purples & analytics charts with blues and greens. Analytics charts with blues and oranges. Telling the colors of the lines apart is a different visual experience than separating out the dots in sequential order as they appear in the legend.
We experimented with a number of ideas on visualizing different sizes and shapes of colors and how they affected our perception of how much a color was changing element to element. In the first frame it is most difficult to tell the values at 2% and 6% apart given the size and shape of the elements.
Stress testing the application of a palette to many shapes and sizes
We’ve begun to package up some of this work into a web app others can use to create or import a palette and preview multiple depths of accessible combinations against a set of UI elements.
In an effort to make sure everything we are building will be visually accessible – we built a react component that will preview how a design would look if you were colorblind. The component overlays SVG filters to simulate alternate ways someone can perceive color.
Analytics component previewed against 8 different types of color blindness
While this is previewing an analytics component, really any component or page can be previewed with this method.
After quite a few iterations, we had finally come up with a color palette. Each scale was optically aligned with our brand colors. The 5th step in each scale is the closest to the original brand color, but adjusted slightly so it’s accessible with both black and white.
Our preview panel for palette development, showing a fully desaturated version of the palette for reference
A visual representation of how the legacy palette colors would translate to the new scales.
Getting a team of people to agree on a new color palette is a journey in and of itself. By the time you get everyone to consensus it’s tempting to just collapse into a heap and never think about colors ever again. Unfortunately the work doesn’t stop at this point. Now that we’ve picked our palette, it’s time to get it implemented so this bike shed is painted once and for all.
If you are porting an old legacy part of your app to be updated to the new style guide like we were, even the best color documentation can fall short in helping someone make the necessary changes.
We found it was more common than expected that engineers and designers wanted to know what the new version of a color they were familiar with was. During the transition between palettes we had an interface people could input any color and get the closest color within our palette.
There are times when migrating colors that the closest color isn’t actually what you want. Given the scenario where your brand color has changed from blue to purple, you might want to port all blues to the closest purple within the palette, not to the closest blues, which might still exist in your palette. To help visualize migrations, as well as get suggestions on how to consolidate values within the old scale, we of course built a little tool. Here we can define those translations and import a color palette from a URL. As we still have a number of web properties to update to our palette, this simple tool has continued to prove useful.
We wanted to be as gentle as possible in transitioning to the new palette in usage. While the developers found string names for colors brittle and unpredictable, it was still a more familiar system for some than this new one. We first just added in our new palette to the existing theme for usage moving forward. Then we started to port colors for existing components and pages.
For our colleagues, we wrote out desired translations and offered warnings in the console that a color was deprecated, with a reference to the new theme value to use.
Example of a console warning when using a deprecated color. Example of how to check for usage of deprecated values.
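A sketch of how such a warning could be wired up with a Proxy over the legacy theme colors; the color names and suggested replacements below are illustrative, not our actual theme:

const DEPRECATED_COLORS: Record<string, string> = {
  // legacy name -> suggested replacement in the new palette (illustrative)
  blueberry: 'blue.5',
  thunder: 'gray.9'
};

export function withDeprecationWarnings(colors: Record<string, string>): Record<string, string> {
  return new Proxy(colors, {
    get(target, name) {
      if (typeof name === 'string' && name in DEPRECATED_COLORS) {
        console.warn(`Color "${name}" is deprecated; use "${DEPRECATED_COLORS[name]}" from the new palette instead.`);
      }
      return Reflect.get(target, name);
    }
  });
}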
While we had a few bugs along the way, the team was supportive and helped us fix bugs almost as quickly as we could find them.
We’re still in the process of updating our web properties with our new palette, largely prioritizing accessibility first while trying to create a more consistent visual brand as a nice by-product of the work. A small example of this is our system status page. In the first image, the blue links in the header, the green status bar, and the about copy, were all inaccessible against their backgrounds.
cloudflarestatus.com in 2017 & cloudflarestatus.com in 2019 with an accessible green
A lot of the changes have been subtle. Most notably, the green we use in the dashboard is a lot more in line with our brand colors than before. In addition, we’ve also been able to add visual balance by not just using straight black text on background colors. Here we added one of the darker steps from the corresponding scale, to give it a bit more visual balance.
Notifications 2017 – black text & Notifications 2018 – updated palette. Text is set to the 1st step in the corresponding scale of the background color. Example page within our Dashboard in 2017 vs 2019.
While we aren’t perfect yet, we’re making progress towards more visual cohesion across our marketing materials and products.
2017
Cloudflare.com & Sign in page 2017. Our app dashboard in 2017.
2019
Cloudflare homepage & updated Dashboard in 2019
Next steps
Trying to keep dozens of sites all using the same palette in a consistent manner across time is a task that you can never complete. It’s an ongoing maintenance problem. Engineers familiar with the color system leave, new engineers join and need to learn how the system works. People still launch sites using a different palette that doesn’t meet accessibility standards. Our work continues to be cut out for us. As they say, a garden doesn’t tend itself.
If we do ever revisit our brand colors, we’re excited to have infrastructure in place to update our apps and several of our satellite sites with significantly less effort than our first time around.
Resources
Some of our favorite materials and resources we found while exploring this problem space.
Today, I’m very pleased to announce the release of a completely overhauled version of our Firewall Event log to our Free, Pro and Business customers. This new Firewall Events log is now available in your Dashboard, and you are not required to do anything to receive this new capability.
No more modals!
We have done away with those pesky modals, providing a much smoother user experience. To review more detailed information about an event, you simply click anywhere on the event list row.
In the expanded view, you are provided with all the information you may need to identify or diagnose issues with your Firewall or find more details about a potential threat to your application.
Additional matches per event
Cloudflare has several Firewall features to give customers granular control of their security. With this control comes some complexity when debugging why a request was stopped by the Firewall. To help clarify what happened, we have provided an “Additional matches” count at the bottom for events triggered by multiple services or rules for the same request. Clicking the number expands a list showing each rule and service along with the corresponding action.
Search for any field within a Firewall Event
This is one of my favourite parts of our new Firewall Event Log. Many of our customers have expressed their frustration with the difficulty of pinpointing specific events. This is where our new search capabilities come into their own. Customers can now filter and freeform search for any field that is visible in a Firewall Event!
Let’s say you want to find all the requests originating from a specific ISP or country where your Firewall Rules issued a JavaScript challenge. There are two different ways to do this in the UI.
Firstly, when in the detail view, you can create an include or exclude filter for that field value.
Secondly, you can create a freeform filter using the “+ Add Filter” button at the top, or edit one of the already filtered fields:
As illustrated above, with our WAF Managed Rules enabled in log only, we can see all the rules which would have triggered if this was a legitimate attack. This allows you to confirm that your configuration is working as expected.
Scoping your search to a specific date and time
In our old Firewall Event Log, to find an event, users had to traverse through many pages to find Events from a specific date. The last major change we have added is the capability to select a time window to view events between two points in time over the last 2 weeks. In the time selection window, Free and Pro customers can choose a 24 hour time window and our Business customers can view up to 72 hours.
We want your feedback!
We need your help! Please feel free to leave any feedback on our Community forums, or open a Support ticket with any problems you find. Your feedback is critical to our product improvement process, and we look forward to hearing from you.
Congratulations on making it through Speed Week. In the last week, Cloudflare has: described how our global network speeds up the Internet, launched an HTTP/2 prioritisation model that will improve web experiences on all browsers, launched an image resizing service which will deliver the optimal image to every device, optimized live video delivery, detailed how to stream progressive images so that they render twice as fast using the flexibility of our new HTTP/2 prioritisation model, and finally, prototyped a new over-the-wire format for JavaScript that could improve application start-up performance, especially on mobile devices. As a bonus, we’re also rolling out one more new feature: “TCP Turbo”, which automatically chooses the TCP settings to further accelerate your website.
As a company, we want to help every one of our customers improve web experiences. The growth of Cloudflare, along with the increase in features, has often made simple questions difficult to answer:
How fast is my website?
How should I be thinking about performance features?
How much faster would the site be if I were to enable a particular feature?
This post will describe the exciting changes we have made to the Speed Page on the Cloudflare dashboard to give our customers a much clearer understanding of how their websites are performing and how they can be made even faster. The new Speed Page consists of:
A visual comparison of your website loading on Cloudflare, with caching enabled, compared to connecting directly to the origin.
The measured improvement expected if any performance feature is enabled.
A report describing how fast your website is on desktop and mobile.
We want to simplify the complexity of making web experiences fast and give our customers control. Take a look – We hope you like it.
Why do fast web experiences matter?
Customer experience: No one likes slow service. Imagine if you go to a restaurant and the service is slow, especially when you arrive; you are not likely to go back or recommend it to your friends. It turns out the web works in the same way and Internet customers are even more demanding. As many as 79% of customers who are “dissatisfied” with a website’s performance are less likely to buy from that site again.
Engagement and Revenue: There are many studies explaining how speed affects customer engagement, bounce rates and revenue.
Reputation: There is also brand reputation to consider, as customers associate an online experience with the brand. A study found that for 66% of the sample, website performance influenced their impression of the company.
Diversity: Mobile traffic has grown to be larger than its desktop counterpart over the last few years. Mobile customers’ expectations have become increasingly demanding, and they expect seamless Internet access regardless of location.
Mobile provides a new set of challenges that includes the diversity of device specifications. When testing, be aware that the average mobile device is significantly less capable than the top-of-the-range models. For example, there can be orders-of-magnitude disparity in the time different mobile devices take to run JavaScript. Another challenge is the variance in mobile performance, as customers move from a strong, high-quality office network to mobile networks of varying speed (3G/5G) and quality within the same browsing session.
New Speed Page
There is compelling evidence that a faster web experience is important for anyone online. Most of the major studies involve the largest tech companies, who have whole teams dedicated to measuring and improving web experiences for their own services. At Cloudflare we are on a mission to help build a better and faster Internet for everyone – not just the selected few.
Delivering fast web experiences is not a simple matter. That much is clear. To know what to send and when requires a deep understanding of every layer of the stack, from TCP tuning, protocol-level prioritisation, and content delivery formats through to the intricate mechanics of browser rendering. You will also need a global network that strives to be within 10 ms of every Internet user. The intrinsic value of such a network should be clear to everyone. Cloudflare has this network, but it also offers many additional performance features.
With the Speed Page redesign, we are emphasizing the performance benefits of using Cloudflare and the additional improvements possible from our features.
The de facto standard for measuring website performance has been WebPageTest. Having its creator in-house at Cloudflare encouraged us to use it as the basis for website performance measurement. So, what is the easiest way to understand how a web page loads? A list of statistics does not paint a full picture of the actual user experience. One of the cool features of WebPageTest is that it can generate a filmstrip of screen snapshots taken during a web page load, enabling us to quantify how a page loads, visually. This view makes it significantly easier to determine how long the page is blank for, and how long it takes for the most important content to render. Being able to look at the results in this way provides the ability to empathise with the user.
How fast on Cloudflare?
After moving your website to Cloudflare, you may have asked: How fast did this decision make my website? Well, now we provide the answer:
Comparison of website performance using Cloudflare.
As well as the increase in speed, we provide filmstrips of before and after, so that it is easy to compare and understand how differently a user will experience the website. If our tests are unable to reach your origin and you are already set up on Cloudflare, we will test with development mode enabled, which disables caching and minification.
Site performance statistics
How can we measure the user experience of a website?
Traditionally, page load was the important metric. Page load is a technical measurement used by browser vendors that has no bearing on the presentation or usability of a page. The metric reports on how long it takes not only to load the important content but also all of the 3rd party content (social network widgets, advertising, tracking scripts etc.). A user may very well not see anything until after all the page content has loaded, or they may be able to interact with a page immediately, while content continues to load.
A user will not decide whether a page is fast by a single measure or moment. A user will perceive how fast a website is from a combination of factors:
when they see any response
when they see the content they expect
when they can interact with the page
when they can perform the task they intended
Experience has shown that if you focus on one measure, it will likely be to the detriment of the others.
Importance of Visual response
If an impatient user navigates to your site and sees no content, or no valuable content, for several seconds, they are likely to get frustrated and leave. The Paint Timing spec defines a set of paint metrics that mark when content appears on a page, to measure the key moments in how a user perceives performance.
First Contentful Paint (FCP) is the time when the browser first renders any DOM content.
First Meaningful Paint (FMP) is the point in time when the page’s “primary” content appears on the screen. This metric should relate to what the user has come to the site to see and is designed as the point in time when the largest visible layout change happens.
Speed Index attempts to quantify the value of the filmstrip rather than using a single paint timing. The speed index measures the rate at which content is displayed – essentially the area above the curve. In the chart below, taken from our progressive image feature, you can see that reaching 80% visual completeness happens much earlier for the parallelized (red) load than for the regular (blue) one.
Importance of interactivity
The same impatient user is now happy that the content they want to see has appeared. They will still become frustrated if they are unable to interact with the site. Time to Interactive is the time it takes for content to be rendered and for the page to be ready to receive input from the user. Technically this is defined as when the browser’s main processing thread has been idle for several seconds after first meaningful paint.
The Speed Tab displays these key metrics for mobile and desktop.
How much faster on Cloudflare?
The Cloudflare Dashboard provides a list of performance features which can, admittedly, be both confusing and daunting. What would be the benefit of turning on Rocket Loader, and on which performance metrics will it have the most impact? If you upgrade to Pro, what will be the value of the enhanced HTTP/2 prioritisation? The optimization section answers these questions.
Tests are run with each performance feature turned on and off. The values for the tests for the appropriate performance metrics are displayed, along with the improvement. You can enable or upgrade the feature from this view. Here are a few examples:
If Rocket Loader were enabled for this website, the render-blocking JavaScript would be deferred, causing first paint time to drop from 1.25s to 0.81s – an improvement of 32% on desktop.
Image heavy sites do not perform well on slow mobile connections. If you enable Mirage, your customers on 3G connections would see meaningful content 1s sooner – an improvement of 29.4%.
So how about our new features?
We tested the enhanced HTTP/2 prioritisation feature on an Edge browser on desktop and saw meaningful content display 2s sooner – an improvement of 64%.
This is a more interesting result taken from the blog example used to illustrate the progressive image streaming. At first glance the improvement of 29% in speed index is good. The filmstrip comparison shows a more significant difference. In this case the page with no images shown is already 43% visually complete for both scenarios after 1.5s. At 2.5s the difference is 77% compared to 50%.
This is a great example of how metrics do not tell the full story. They cannot completely replace viewing the page loading flow and understanding what is important for your site.
How to try
This is our first iteration of the new Speed Page and we are eager to get your feedback. We will be rolling this out to beta customers who are interested in seeing how their sites perform. To be added to the queue for activation of the new Speed Page please click on the banner on the overview page,
or click on the banner on the existing Speed Page.
Last year, we released Amazon Connect, a cloud-based contact center service that enables any business to deliver better customer service at low cost. This service is built based on the same technology that empowers Amazon customer service associates. Using this system, associates have millions of conversations with customers when they inquire about their shipping or order information. Because we made it available as an AWS service, you can now enable your contact center agents to make or receive calls in a matter of minutes. You can do this without having to provision any kind of hardware.
There are several advantages of building your contact center in the AWS Cloud, as described in our documentation. In addition, customers can extend Amazon Connect capabilities by using AWS products and the breadth of AWS services. In this blog post, we focus on how to get analytics out of the rich set of data published by Amazon Connect. We make use of an Amazon Connect data stream and create an end-to-end workflow to offer an analytical solution that can be customized based on need.
Solution overview
The following diagram illustrates the solution.
In this solution, Amazon Connect exports its contact trace records (CTRs) using Amazon Kinesis. CTRs are data streams in JSON format, and each has information about individual contacts. For example, this information might include the start and end time of a call, which agent handled the call, which queue the user chose, queue wait times, number of holds, and so on. You can enable this feature by reviewing our documentation.
In this architecture, we use Kinesis Firehose to capture Amazon Connect CTRs as raw data in an Amazon S3 bucket. We don’t use the recent feature added by Kinesis Firehose to save the data in S3 in Apache Parquet format. Instead, we use AWS Glue functionality to automatically detect the schema on the fly from an Amazon Connect data stream.
The primary reason for this approach is that it allows us to use attributes and enables an Amazon Connect administrator to dynamically add more fields as needed. Also, by converting the data to Parquet in batches (every couple of hours), we can achieve higher compression. However, if you need to ingest the data in Parquet format in real time, we recommend using the recently launched Kinesis Firehose feature. You can review this blog post for further information.
By default, Firehose puts these records in time-series format. To make it easy for AWS Glue crawlers to capture information from new records, we use AWS Lambda to move all new records to a single S3 prefix called flatfiles. Our Lambda function is configured using S3 event notification. To comply with AWS Glue and Athena best practices, the Lambda function also converts all column names to lowercase. Finally, we also use the Lambda function to start AWS Glue crawlers. AWS Glue crawlers identify the data schema and update the AWS Glue Data Catalog, which is used by extract, transform, load (ETL) jobs in AWS Glue in the latter half of the workflow.
You can see our approach in the Lambda code following.
from __future__ import print_function
import json
import urllib
import boto3
import os
import re

s3 = boto3.resource('s3')
client = boto3.client('s3')

def convertColumntoLowwerCaps(obj):
    # Lowercase each key and strip characters that aren't letters, digits, or underscores,
    # so the column names comply with AWS Glue and Athena naming conventions.
    for key in obj.keys():
        new_key = re.sub(r'[\W]+', '', key.lower())
        v = obj[key]
        if isinstance(v, dict):
            if len(v) > 0:
                # Recurse into nested attributes (for example, agent or queue details).
                convertColumntoLowwerCaps(v)
        if new_key != key:
            obj[new_key] = obj[key]
            del obj[key]
    return obj

def lambda_handler(event, context):
    # Triggered by an S3 event notification for each new CTR object written by Firehose.
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    try:
        client.download_file(bucket, key, '/tmp/file.json')
        with open('/tmp/out.json', 'w') as output, open('/tmp/file.json', 'rb') as file:
            i = 0
            for line in file:
                # Firehose concatenates JSON records; split them into one record per line.
                for object in line.replace("}{", "}\n{").split("\n"):
                    record = json.loads(object, object_hook=convertColumntoLowwerCaps)
                    if i != 0:
                        output.write("\n")
                    output.write(json.dumps(record))
                    i += 1
        # Move the normalized file to the flatfiles prefix and remove the original object.
        newkey = 'flatfiles/' + key.replace("/", "")
        client.upload_file('/tmp/out.json', bucket, newkey)
        s3.Object(bucket, key).delete()
        return "success"
    except Exception as e:
        print(e)
        print('Error copying object {} from bucket {}'.format(key, bucket))
        raise e
We trigger AWS Glue crawlers based on events because this approach lets us capture any new fields in the data that we want to keep dynamic in nature. CTR attributes are designed to offer multiple custom options based on a particular call flow. Attributes are essentially key-value pairs in nested JSON format. With the help of event-based AWS Glue crawlers, you can easily identify newer attributes automatically.
We recommend setting up an S3 lifecycle policy on the flatfiles folder that keeps records only for 24 hours. Doing this optimizes AWS Glue ETL jobs to process a subset of files rather than the entire set of records.
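If you manage the bucket outside of the CloudFormation template, a minimal boto3 sketch for such a lifecycle rule could look like the following; the bucket name is a placeholder.

import boto3

s3 = boto3.client('s3')

# Expire objects under the flatfiles/ prefix after one day (bucket name is a placeholder).
s3.put_bucket_lifecycle_configuration(
    Bucket='your-connect-ctr-bucket',
    LifecycleConfiguration={
        'Rules': [{
            'ID': 'expire-flatfiles-after-24-hours',
            'Filter': {'Prefix': 'flatfiles/'},
            'Status': 'Enabled',
            'Expiration': {'Days': 1}
        }]
    }
)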
After we have data in the flatfiles folder, we use AWS Glue to catalog the data and transform it into Parquet format inside a folder called parquet/ctr/. The AWS Glue job performs the ETL that transforms the data from JSON to Parquet format. We use AWS Glue crawlers to capture any new fields inside the JSON records that we want to keep dynamic. What this means is that when you add new attributes to an Amazon Connect instance, the solution automatically recognizes them and incorporates them in the schema of the results.
After AWS Glue stores the results in Parquet format, you can perform analytics using Amazon Redshift Spectrum, Amazon Athena, or any third-party data warehouse platform. To keep this solution simple, we have used Amazon Athena for analytics. Amazon Athena allows us to query data without having to set up and manage any servers or data warehouse platforms. Additionally, we only pay for the queries that are executed.
Try it out!
You can get started with our sample AWS CloudFormation template. This template creates the components starting from the Kinesis stream and finishes up with S3 buckets, the AWS Glue job, and crawlers. To deploy the template, open the AWS Management Console by clicking the following link.
In the console, specify the following parameters:
BucketName: The name for the bucket to store all the solution files. This name must be unique; if it’s not, template creation fails.
etlJobSchedule: The schedule in cron format indicating how often the AWS Glue job runs. The default value is every hour.
KinesisStreamName: The name of the Kinesis stream to receive data from Amazon Connect. This name must be different from any other Kinesis stream created in your AWS account.
s3interval: The interval in seconds for Kinesis Firehose to save data inside the flatfiles folder on S3. The value must be between 60 and 900 seconds.
sampledata: When this parameter is set to true, sample CTR records are used. Doing this lets you try this solution without setting up an Amazon Connect instance. All examples in this walkthrough use this sample data.
Select the “I acknowledge that AWS CloudFormation might create IAM resources.” check box, and then choose Create. After the template finishes creating resources, you can see the stream name on the stack Outputs tab.
If you haven’t created your Amazon Connect instance, you can do so by following the Getting Started Guide. When you are done creating, choose your Amazon Connect instance in the console, which takes you to instance settings. Choose Data streaming to enable streaming for CTR records. Here, you can choose the Kinesis stream (defined in the KinesisStreamName parameter) that was created by the CloudFormation template.
Now it’s time to generate the data by making or receiving calls by using Amazon Connect. You can go to the Amazon Connect Contact Control Panel (CCP) to make or receive calls using a software phone or desktop phone. After a few minutes, we should see data inside the flatfiles folder. To make it easier to try this solution, we provide sample data that you can enable by setting the sampledata parameter to true in your CloudFormation template.
You can navigate to the AWS Glue console by choosing Jobs on the left navigation pane of the console. We can select our job here. In my case, the job created by CloudFormation is called glueJob-i3TULzVtP1W0; yours should be similar. You run the job by choosing Run job for Action.
After that, we wait for the AWS Glue job to run and to finish successfully. We can track the status of the job by checking the History tab.
When the job finishes running, we can check the Database section. There should be a new table created called ctr in Parquet format.
To query the data with Athena, we can select the ctr table, and for Action choose View data.
Doing this takes us to the Athena console. If you run a query, Athena shows a preview of the data.
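If you prefer to run the query programmatically instead of through the console, a minimal boto3 sketch like the following works; the database name and query output location are placeholders that depend on what the crawler created and on a bucket you own.

import boto3

athena = boto3.client('athena')

# Preview the ctr table; the database and output location below are placeholders.
response = athena.start_query_execution(
    QueryString='SELECT * FROM ctr LIMIT 10',
    QueryExecutionContext={'Database': 'connectctr_database'},
    ResultConfiguration={'OutputLocation': 's3://your-connect-ctr-bucket/athena-results/'}
)
print(response['QueryExecutionId'])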
When we can query the data using Athena, we can visualize it using Amazon QuickSight. Before connecting Amazon QuickSight to Athena, we must make sure to grant Amazon QuickSight access to Athena and the associated S3 buckets in the account. For more information on doing this, see Managing Amazon QuickSight Permissions to AWS Resources in the Amazon QuickSight User Guide. We can then create a new data set in Amazon QuickSight based on the Athena table that was created.
After setting up permissions, we can create a new analysis in Amazon QuickSight by choosing New analysis.
Then we add a new data set.
We choose Athena as the source and give the data source a name (in this case, I named it connectctr).
Choose the name of the database and the table referencing the Parquet results.
Then choose Visualize.
After that, we should see the following screen.
Now we can create some visualizations. First, search for the agent.username column, and drag it to the AutoGraph section.
We can see the agents and the number of calls for each, so we can easily see which agents have taken the largest amount of calls. If we want to see from what queues the calls came for each agent, we can add the queue.arn column to the visual.
After following all these steps, you can use Amazon QuickSight to add different columns from the call records and perform different types of visualizations. You can build dashboards that continuously monitor your connect instance. You can share those dashboards with others in your organization who might need to see this data.
Conclusion
In this post, you see how you can use services like AWS Lambda, AWS Glue, and Amazon Athena to process Amazon Connect call records. The post also demonstrates how to use AWS Lambda to preprocess files in Amazon S3 and transform them into a format that is recognized by AWS Glue crawlers. Finally, the post shows how to use Amazon QuickSight to perform visualizations.
You can use the provided template to analyze your own contact center instance. Or you can take the CloudFormation template and modify it to process other data streams that can be ingested using Amazon Kinesis or stored on Amazon S3.
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improve the value of their solutions when using AWS.
Peter Dalbhanjan is a Solutions Architect for AWS based in Herndon, VA. Peter has a keen interest in evangelizing AWS solutions and has written multiple blog posts that focus on simplifying complex use cases. At AWS, Peter helps with designing and architecting a variety of customer workloads.
The German charity Save Nemo works to protect coral reefs, and they are developing Nemo-Pi, an underwater “weather station” that monitors ocean conditions. Right now, you can vote for Save Nemo in the Google.org Impact Challenge.
Save Nemo
The organisation says there are two major threats to coral reefs: divers, and climate change. To make diving safer for reefs, Save Nemo installs buoy anchor points where diving tour boats can anchor without damaging corals in the process.
In addition, they provide dos and don’ts for how to behave on a reef dive.
The Nemo-Pi
To monitor the effects of climate change, and to help divers decide whether conditions are right at a reef while they’re still on shore, Save Nemo is also in the process of perfecting Nemo-Pi.
This Raspberry Pi-powered device is made up of a buoy, a solar panel, a GPS device, a Pi, and an array of sensors. Nemo-Pi measures water conditions such as current, visibility, temperature, carbon dioxide and nitrogen oxide concentrations, and pH. It also uploads its readings live to a public webserver.
The Save Nemo team is currently doing long-term tests of Nemo-Pi off the coast of Thailand and Indonesia. They are also working on improving the device’s power consumption and durability, and testing prototypes with the Raspberry Pi Zero W.
The web dashboard showing live Nemo-Pi data
Long-term goals
Save Nemo aims to install a network of Nemo-Pis at shallow reefs (up to 60 metres deep) in South East Asia. Then diving tour companies can check the live data online and decide day-to-day whether tours are feasible. This will lower the impact of humans on reefs and help the local flora and fauna survive.
A healthy coral reef
Nemo-Pi data may also be useful for groups lobbying for reef conservation, and for scientists and activists who want to shine a spotlight on the awful effects of climate change on sea life, such as coral bleaching caused by rising water temperatures.
A bleached coral reef
Vote now for Save Nemo
If you want to help Save Nemo in their mission today, vote for them to win the Google.org Impact Challenge:
Click “Abstimmen” in the footer of the page to vote
Click “JA” in the footer to confirm
Voting is open until 6 June. You can also follow Save Nemo on Facebook or Twitter. We think this organisation is doing valuable work, and that their projects could be expanded to reefs across the globe. It’s fantastic to see the Raspberry Pi being used to help protect ocean life.
Amazon QuickSight is a fully managed cloud business intelligence system that gives you Fast & Easy to Use Business Analytics for Big Data. QuickSight makes business analytics available to organizations of all shapes and sizes, with the ability to access data that is stored in your Amazon Redshift data warehouse, your Amazon Relational Database Service (RDS) relational databases, flat files in S3, and (via connectors) data stored in on-premises MySQL, PostgreSQL, and SQL Server databases. QuickSight scales to accommodate tens, hundreds, or thousands of users per organization.
Today we are launching a new, session-based pricing option for QuickSight, along with additional region support and other important new features. Let’s take a look at each one:
Pay-per-Session Pricing Our customers are making great use of QuickSight and take full advantage of the power it gives them to connect to data sources, create reports, and explore visualizations.
However, not everyone in an organization needs or wants such powerful authoring capabilities. Having access to curated data in dashboards and being able to interact with the data by drilling down, filtering, or slicing-and-dicing is more than adequate for their needs. Subscribing them to a monthly or annual plan can be seen as an unwarranted expense, so a lot of such casual users end up not having access to interactive data or BI.
In order to allow customers to provide all of their users with interactive dashboards and reports, the Enterprise Edition of Amazon QuickSight now allows Reader access to dashboards on a Pay-per-Session basis. QuickSight users are now classified as Admins, Authors, or Readers, with distinct capabilities and prices:
Authors have access to the full power of QuickSight; they can establish database connections, upload new data, create ad hoc visualizations, and publish dashboards, all for $9 per month (Standard Edition) or $18 per month (Enterprise Edition).
Readers can view dashboards, slice and dice data using drill downs, filters and on-screen controls, and download data in CSV format, all within the secure QuickSight environment. Readers pay $0.30 for 30 minutes of access, with a monthly maximum of $5 per reader.
Admins have all authoring capabilities, and can manage users and purchase SPICE capacity in the account. The QuickSight admin now has the ability to set the desired option (Author or Reader) when they invite members of their organization to use QuickSight. They can extend Reader invites to their entire user base without incurring any up-front or monthly costs, paying only for the actual usage.
A New Region QuickSight is now available in the Asia Pacific (Tokyo) Region:
The UI is in English, with a localized version in the works.
Hourly Data Refresh Enterprise Edition SPICE data sets can now be set to refresh as frequently as every hour. In the past, each data set could be refreshed up to 5 times a day. To learn more, read Refreshing Imported Data.
Access to Data in Private VPCs This feature was launched in preview form late last year, and is now available in production form to users of the Enterprise Edition. As I noted at the time, you can use it to implement secure, private communication with data sources that do not have public connectivity, including on-premises data in Teradata or SQL Server, accessed over an AWS Direct Connect link. To learn more, read Working with AWS VPC.
Parameters with On-Screen Controls QuickSight dashboards can now include parameters that are set using on-screen dropdown, text box, numeric slider or date picker controls. The default value for each parameter can be set based on the user name (QuickSight calls this a dynamic default). You could, for example, set an appropriate default based on each user’s office location, department, or sales territory. Here’s an example:
URL Actions for Linked Dashboards You can now connect your QuickSight dashboards to external applications by defining URL actions on visuals. The actions can include parameters, and become available in the Details menu for the visual. URL actions are defined like this:
You can use this feature to link QuickSight dashboards to third party applications (e.g. Salesforce) or to your own internal applications. Read Custom URL Actions to learn how to use this feature.
Dashboard Sharing You can now share QuickSight dashboards across every user in an account.
Larger SPICE Tables The per-data set limit for SPICE tables has been raised from 10 GB to 25 GB.
Upgrade to Enterprise Edition The QuickSight administrator can now upgrade an account from Standard Edition to Enterprise Edition with a click. This enables provisioning of Readers with pay-per-session pricing, private VPC access, row-level security for dashboards and data sets, and hourly refresh of data sets. Enterprise Edition pricing applies after the upgrade.
Available Now Everything I listed above is available now and you can start using it today!
This post is courtesy of Otavio Ferreira, Manager, Amazon SNS, AWS Messaging.
Amazon SNS message filtering provides a set of string and numeric matching operators that allow each subscription to receive only the messages of interest. Hence, SNS message filtering can simplify your pub/sub messaging architecture by offloading the message filtering logic from your subscriber systems, as well as the message routing logic from your publisher systems.
After you set the subscription attribute that defines a filter policy, the subscribing endpoint receives only the messages that carry attributes matching this filter policy. Other messages published to the topic are filtered out for this subscription. Additionally, the native integration between SNS and Amazon CloudWatch provides visibility into the number of messages delivered, as well as the number of messages filtered out.
CloudWatch metrics are captured automatically for you. To get started with SNS message filtering, see Filtering Messages with Amazon SNS.
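As a quick illustration, you can attach a filter policy to an existing subscription with a few lines of boto3; the subscription ARN and the attribute names in this sketch are placeholders, not values from this post.

import json
import boto3

sns = boto3.client('sns')

# Illustrative filter policy: deliver only order_placed messages of at least 100 USD.
filter_policy = {
    'event_type': ['order_placed'],
    'price_usd': [{'numeric': ['>=', 100]}]
}

sns.set_subscription_attributes(
    SubscriptionArn='arn:aws:sns:us-east-1:123456789012:MyTopic:subscription-id',
    AttributeName='FilterPolicy',
    AttributeValue=json.dumps(filter_policy)
)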
Message Filtering Metrics
The following six CloudWatch metrics are relevant to understanding your SNS message filtering activity:
NumberOfMessagesPublished – Inbound traffic to SNS. This metric tracks all the messages that have been published to the topic.
NumberOfNotificationsDelivered – Outbound traffic from SNS. This metric tracks all the messages that have been successfully delivered to endpoints subscribed to the topic. A delivery takes place either when the incoming message attributes match a subscription filter policy, or when the subscription has no filter policy at all, which results in a catch-all behavior.
NumberOfNotificationsFilteredOut – This metric tracks all the messages that were filtered out because they carried attributes that didn’t match the subscription filter policy.
NumberOfNotificationsFilteredOut-NoMessageAttributes – This metric tracks all the messages that were filtered out because they didn’t carry any attributes at all and, consequently, didn’t match the subscription filter policy.
NumberOfNotificationsFilteredOut-InvalidAttributes – This metric keeps track of messages that were filtered out because they carried invalid or malformed attributes and, thus, didn’t match the subscription filter policy.
NumberOfNotificationsFailed – This last metric tracks all the messages that failed to be delivered to subscribing endpoints, regardless of whether a filter policy had been set for the endpoint. This metric is emitted after the message delivery retry policy is exhausted, and SNS stops attempting to deliver the message. At that moment, the subscribing endpoint is likely no longer reachable. For example, the subscribing SQS queue or Lambda function has been deleted by its owner. You may want to closely monitor this metric to address message delivery issues quickly.
Message filtering graphs
Through the AWS Management Console, you can compose graphs to display your SNS message filtering activity. The graph shows the number of messages published, delivered, and filtered out within the timeframe you specify (1h, 3h, 12h, 1d, 3d, 1w, or custom).
To compose an SNS message filtering graph with CloudWatch:
Open the CloudWatch console.
Choose Metrics, SNS, All Metrics, and Topic Metrics.
Select all metrics to add to the graph, such as:
NumberOfMessagesPublished
NumberOfNotificationsDelivered
NumberOfNotificationsFilteredOut
Choose Graphed metrics.
In the Statistic column, switch from Average to Sum.
Title your graph with a descriptive name, such as “SNS Message Filtering”.
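If you would rather pull the same numbers programmatically, a minimal boto3 sketch like the following retrieves one of these metrics; the topic name is a placeholder.

from datetime import datetime, timedelta
import boto3

cloudwatch = boto3.client('cloudwatch')

# Hourly sum of filtered-out notifications for the last day (topic name is a placeholder).
response = cloudwatch.get_metric_statistics(
    Namespace='AWS/SNS',
    MetricName='NumberOfNotificationsFilteredOut',
    Dimensions=[{'Name': 'TopicName', 'Value': 'MyTopic'}],
    StartTime=datetime.utcnow() - timedelta(days=1),
    EndTime=datetime.utcnow(),
    Period=3600,
    Statistics=['Sum']
)
for point in sorted(response['Datapoints'], key=lambda p: p['Timestamp']):
    print(point['Timestamp'], point['Sum'])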
After you have your graph set up, you may want to copy the graph link for bookmarking, emailing, or sharing with co-workers. You may also want to add your graph to a CloudWatch dashboard for easy access in the future. Both actions are available to you on the Actions menu, which is found above the graph.
Summary
SNS message filtering defines how SNS topics behave in terms of message delivery. By using CloudWatch metrics, you gain visibility into the number of messages published, delivered, and filtered out. This enables you to validate the operation of filter policies and more easily troubleshoot during development phases.
SNS message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). CloudWatch metrics for SNS message filtering are available now, in all AWS Regions.
The adoption of Apache Spark has increased significantly over the past few years, and running Spark-based application pipelines is the new normal. Spark jobs that are in an ETL (extract, transform, and load) pipeline have different requirements—you must handle dependencies in the jobs, maintain order during executions, and run multiple jobs in parallel. In most of these cases, you can use workflow scheduler tools like Apache Oozie, Apache Airflow, and even Cron to fulfill these requirements.
Apache Oozie is a widely used workflow scheduler system for Hadoop-based jobs. However, its limited UI capabilities, lack of integration with other services, and heavy XML dependency might not be suitable for some users. On the other hand, Apache Airflow comes with a lot of neat features, along with powerful UI and monitoring capabilities and integration with several AWS and third-party services. However, with Airflow, you do need to provision and manage the Airflow server. The Cron utility is a powerful job scheduler. But it doesn’t give you much visibility into the job details, and creating a workflow using Cron jobs can be challenging.
What if you have a simple use case, in which you want to run a few Spark jobs in a specific order, but you don’t want to spend time orchestrating those jobs or maintaining a separate application? You can do that today in a serverless fashion using AWS Step Functions. You can create the entire workflow in AWS Step Functions and interact with Spark on Amazon EMR through Apache Livy.
In this post, I walk you through a list of steps to orchestrate a serverless Spark-based ETL pipeline using AWS Step Functions and Apache Livy.
Input data
For the source data for this post, I use the New York City Taxi and Limousine Commission (TLC) trip record data. For a description of the data, see this detailed dictionary of the taxi data. In this example, we’ll work mainly with the following three columns for the Spark jobs.
RateCodeID – Represents the rate code in effect at the end of the trip (for example, 1 for standard rate, 2 for JFK airport, 3 for Newark airport, and so on).
FareAmount – Represents the time-and-distance fare calculated by the meter.
TripDistance – Represents the elapsed trip distance in miles reported by the taxi meter.
The trip data is in comma-separated values (CSV) format with the first row as a header. To shorten the Spark execution time, I trimmed the large input data to only 20,000 rows. During the deployment phase, the input file tripdata.csv is stored in Amazon S3 in the <<your-bucket>>/emr-step-functions/input/ folder.
The following image shows a sample of the trip data:
Solution overview
The next few sections describe how Spark jobs are created for this solution, how you can interact with Spark using Apache Livy, and how you can use AWS Step Functions to create orchestrations for these Spark applications.
At a high level, the solution includes the following steps:
Trigger the AWS Step Function state machine by passing the input file path.
The first stage in the state machine triggers an AWS Lambda function.
The Lambda function interacts with Apache Spark running on Amazon EMR using Apache Livy, and submits a Spark job.
The state machine waits a few seconds before checking the Spark job status.
Based on the job status, the state machine moves to the success or failure state.
Subsequent Spark jobs are submitted using the same approach.
The state machine waits a few seconds for the job to finish.
The job finishes, and the state machine updates with its final status.
Let’s take a look at the Spark application that is used for this solution.
Spark jobs
For this example, I built a Spark jar named spark-taxi.jar. It has two different Spark applications:
MilesPerRateCode – The first job that runs on the Amazon EMR cluster. This job reads the trip data from an input source and computes the total trip distance for each rate code. The output of this job consists of two columns and is stored in Apache Parquet format in the output path.
The following are the expected output columns:
rate_code – Represents the rate code for the trip.
total_distance – Represents the total trip distance for that rate code (for example, sum(trip_distance)).
RateCodeStatus – The second job that runs on the EMR cluster, but only if the first job finishes successfully. This job depends on two different input sets:
csv – The same trip data that is used for the first Spark job.
miles-per-rate – The output of the first job.
This job first reads the tripdata.csv file and aggregates the fare_amount by the rate_code. After this point, you have two different datasets, both aggregated by rate_code. Finally, the job uses the rate_code field to join two datasets and output the entire rate code status in a single CSV file.
The output columns are as follows:
rate_code_id – Represents the rate code type.
total_distance – Derived from first Spark job and represents the total trip distance.
total_fare_amount – A new field that is generated during the second Spark application, representing the total fare amount by the rate code type.
Note that in this case, you don’t need to run two different Spark jobs to generate that output. The goal of setting up the jobs in this way is just to create a dependency between the two jobs and use them within AWS Step Functions.
Both Spark applications take one input argument called rootPath. It’s the S3 location where the Spark job is stored along with input and output data. Here is a sample of the final output:
The next section discusses how you can use Apache Livy to interact with Spark applications that are running on Amazon EMR.
Using Apache Livy to interact with Apache Spark
Apache Livy provides a REST interface to interact with Spark running on an EMR cluster. Livy is included in Amazon EMR release version 5.9.0 and later. In this post, I use Livy to submit Spark jobs and retrieve job status. When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy, and it starts listening on port 8998 by default. Livy provides APIs to interact with Spark.
Let’s look at a couple of examples of how you can interact with Spark running on Amazon EMR using Livy.
To list active running jobs, you can execute the following from the EMR master node:
curl localhost:8998/sessions
If you want to do the same from a remote instance, just change localhost to the EMR hostname, as in the following (port 8998 must be open to that remote instance through the security group):
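(The master node DNS name below is a placeholder.)

curl http://<EMR-master-public-DNS>:8998/sessions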
Through Spark submit, you can pass multiple arguments for the Spark job and Spark configuration settings. You can also do that using Livy, by passing the S3 path through the args parameter, as shown following:
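(The bucket, jar path, and class name below are placeholders based on this post’s layout.)

curl -X POST --data '{
  "file": "s3://your-bucket/emr-step-functions/spark-taxi.jar",
  "className": "MilesPerRateCode",
  "args": ["s3://your-bucket"]
}' -H "Content-Type: application/json" http://<EMR-master-public-DNS>:8998/batches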
For a detailed list of Livy APIs, see the Apache Livy REST API page. This post uses GET /batches and POST /batches.
In the next section, you create a state machine and orchestrate Spark applications using AWS Step Functions.
Using AWS Step Functions to create a Spark job workflow
AWS Step Functions automatically triggers and tracks each step and retries when it encounters errors. So your application executes in order and as expected every time. To create a Spark job workflow using AWS Step Functions, you first create a Lambda state machine using different types of states to create the entire workflow.
First, you use the Task state—a simple state in AWS Step Functions that performs a single unit of work. You also use the Wait state to delay the state machine from continuing for a specified time. Later, you use the Choice state to add branching logic to a state machine.
The following is a quick summary of how to use different states in the state machine to create the Spark ETL pipeline:
Task state – Invokes a Lambda function. The first Task state submits the Spark job on Amazon EMR, and the next Task state is used to retrieve the previous Spark job status.
Wait state – Pauses the state machine until a job completes execution.
Choice state – Each Spark job execution can return a failure, an error, or a success state. So, in the state machine, you use the Choice state to create a rule that specifies the next action or step based on the success or failure of the previous step.
Here is one of my Task states, MilesPerRateCode, which simply submits a Spark job:
"MilesPerRate Job": {
"Type": "Task",
"Resource":"arn:aws:lambda:us-east-1:xxxxxx:function:blog-miles-per-rate-job-submit-function",
"ResultPath": "$.jobId",
"Next": "Wait for MilesPerRate job to complete"
}
This Task state configuration specifies the Lambda function to execute. Inside the Lambda function, it submits a Spark job through Livy using Livy’s POST API. Using ResultPath, it tells the state machine where to place the result of the executing task. As discussed in the previous section, Spark submit returns the session ID, which is captured with $.jobId and used in a later state.
The following code section shows the Lambda function, which is used to submit the MilesPerRateCode job. It uses the Python requests library to submit a POST request against the Livy endpoint hosted on Amazon EMR and passes the required parameters in JSON format through the payload. It then parses the response, grabs id from the response, and returns it. The Next field tells the state machine which state to go to next.
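A minimal sketch of that function follows; the Livy endpoint, jar path, and class name are placeholders, and the requests library is assumed to be bundled with the deployment package.

import json
import requests  # assumed to be bundled with the Lambda deployment package

LIVY_ENDPOINT = 'http://<EMR-master-private-DNS>:8998'  # placeholder

def lambda_handler(event, context):
    # Submit the MilesPerRateCode Spark job through Livy's POST /batches API.
    payload = {
        'file': event['rootPath'] + '/emr-step-functions/spark-taxi.jar',
        'className': 'MilesPerRateCode',  # placeholder for the fully qualified class name
        'args': [event['rootPath']]
    }
    headers = {'Content-Type': 'application/json'}
    response = requests.post(LIVY_ENDPOINT + '/batches',
                             data=json.dumps(payload), headers=headers)
    # Livy returns the batch details; the id is what the state machine stores in $.jobId.
    return response.json()['id']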
Just like in the MilesPerRate job, another state submits the RateCodeStatus job, but it executes only when all previous jobs have completed successfully.
Here is the Task state in the state machine that checks the Spark job status:
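(The state names and function ARN below are placeholders.)

"Query MilesPerRate job status": {
    "Type": "Task",
    "Resource": "arn:aws:lambda:us-east-1:xxxxxx:function:blog-spark-job-status-function",
    "ResultPath": "$.jobStatus",
    "Next": "MilesPerRate job complete?"
}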
Just like other states, the preceding Task executes a Lambda function, captures the result (represented by jobStatus), and passes it to the next state. The following is the Lambda function that checks the Spark job status based on a given session ID:
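(The Livy endpoint below is a placeholder; the requests library is assumed to be bundled with the function.)

import requests  # assumed to be bundled with the Lambda deployment package

LIVY_ENDPOINT = 'http://<EMR-master-private-DNS>:8998'  # placeholder

def lambda_handler(event, context):
    # Look up the batch that was submitted earlier, using the session ID stored in $.jobId.
    response = requests.get('{}/batches/{}'.format(LIVY_ENDPOINT, event['jobId']))
    # Livy reports states such as "running", "success", or "dead".
    return response.json()['state']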
In the Choice state, it checks the Spark job status value, compares it with a predefined state status, and transitions the state based on the result. For example, if the status is success, move to the next state (RateCodeJobStatus job), and if it is dead, move to the MilesPerRate job failed state.
To set up this entire solution, you need to create a few AWS resources. To make it easier, I have created an AWS CloudFormation template. This template creates all the required AWS resources and configures all the resources that are needed to create a Spark-based ETL pipeline on AWS Step Functions.
This CloudFormation template requires you to pass the following four parameters during initiation.
ClusterSubnetID – The subnet where the Amazon EMR cluster is deployed and Lambda is configured to talk to this subnet.
KeyName – The name of the existing EC2 key pair to access the Amazon EMR cluster.
VPCID – The ID of the virtual private cloud (VPC) where the EMR cluster is deployed and Lambda is configured to talk to this VPC.
S3RootPath – The Amazon S3 path where all required files (input file, Spark job, and so on) are stored and the resulting data is written.
IMPORTANT: These templates are designed only to show how you can create a Spark-based ETL pipeline on AWS Step Functions using Apache Livy. They are not intended for production use without modification. And if you try this solution outside of the us-east-1 Region, download the necessary files from s3://aws-data-analytics-blog/emr-step-functions, upload the files to the buckets in your Region, edit the script as appropriate, and then run it.
To launch the CloudFormation stack, choose Launch Stack:
Launching this stack creates the following list of AWS resources.
StepFunctionsStateExecutionRole (IAM role) – IAM role to execute the state machine and have a trust relationship with the states service.
SparkETLStateMachine (AWS Step Functions state machine) – State machine in AWS Step Functions for the Spark ETL workflow.
LambdaSecurityGroup (Amazon EC2 security group) – Security group that is used for the Lambda function to call the Livy API.
RateCodeStatusJobSubmitFunction (AWS Lambda function) – Lambda function to submit the RateCodeStatus job.
MilesPerRateJobSubmitFunction (AWS Lambda function) – Lambda function to submit the MilesPerRate job.
SparkJobStatusFunction (AWS Lambda function) – Lambda function to check the Spark job status.
LambdaStateMachineRole (IAM role) – IAM role for all Lambda functions to use the lambda trust relationship.
EMRCluster (Amazon EMR cluster) – EMR cluster where Livy is running and where the job is placed.
During the AWS CloudFormation deployment phase, it sets up S3 paths for input and output. Input files are stored in the <<s3-root-path>>/emr-step-functions/input/ path, whereas spark-taxi.jar is copied under <<s3-root-path>>/emr-step-functions/.
The following screenshot shows how the S3 paths are configured after deployment. In this example, I passed a bucket that I created in the AWS account s3://tm-app-demos for the S3 root path.
If the CloudFormation template completed successfully, you will see Spark-ETL-State-Machine in the AWS Step Functions dashboard, as follows:
Choose the Spark-ETL-State-Machine state machine to take a look at this implementation. The AWS CloudFormation template built the entire state machine along with its dependent Lambda functions, which are now ready to be executed.
On the dashboard, choose the newly created state machine, and then choose New execution to initiate the state machine. It asks you to pass input in JSON format. This input goes to the first state MilesPerRate Job, which eventually executes the Lambda function blog-miles-per-rate-job-submit-function.
Pass the S3 root path as input:
{
    "rootPath": "s3://tm-app-demos"
}
Then choose Start Execution:
The rootPath value is the same value that was passed when creating the CloudFormation stack. It can be an S3 bucket location or a bucket with prefixes, but it should be the same value that is used for AWS CloudFormation. This value tells the state machine where it can find the Spark jar and input file, and where it will write output files. After the state machine starts, each state/task is executed based on its definition in the state machine.
At a high level, the following represents the flow of events:
Execute the first Spark job, MilesPerRate.
The Spark job reads the input file from the location <<rootPath>>/emr-step-functions/input/tripdata.csv. If the job finishes successfully, it writes the output data to <<rootPath>>/emr-step-functions/miles-per-rate.
If the Spark job fails, it transitions to the error state MilesPerRate job failed, and the state machine stops. If the Spark job finishes successfully, it transitions to the RateCodeStatus Job state, and the second Spark job is executed.
If the second Spark job fails, it transitions to the error state RateCodeStatus job failed, and the state machine stops with the Failed status.
If this Spark job completes successfully, it writes the final output data to <<rootPath>>/emr-step-functions/rate-code-status/. It also transitions to the RateCodeStatus job finished state, and the state machine ends its execution with the Success status.
This following screenshot shows a successfully completed Spark ETL state machine:
The right side of the state machine diagram shows the details of individual states with their input and output.
When you execute the state machine for the second time, it fails because the S3 path already exists. The state machine turns red and stops at MilesPerRate job failed. The following image represents that failed execution of the state machine:
You can also check your Spark application status and logs by going to the Amazon EMR console and viewing the Application history tab:
I hope this walkthrough paints a picture of how you can create a serverless solution for orchestrating Spark jobs on Amazon EMR using AWS Step Functions and Apache Livy. In the next section, I share some ideas for making this solution even more elegant.
Next steps
The goal of this post is to show a simple example that uses AWS Step Functions to create an orchestration for Spark-based jobs in a serverless fashion. To make this solution robust and production ready, you can explore the following options:
In this example, I manually initiated the state machine by passing the rootPath as input. You can instead trigger the state machine automatically. To run the ETL pipeline as soon as the files arrive in your S3 bucket, you can pass the new file path to the state machine. Because CloudWatch Events supports AWS Step Functions as a target, you can create a CloudWatch rule for an S3 event. You can then set AWS Step Functions as a target and pass the new file path to your state machine. You’re all set!
You can also improve this solution by adding an alerting mechanism in case of failures. To do this, create a Lambda function that sends an alert email and assign that Lambda function to a Fail state. That way, when any part of your state machine fails, it triggers an email and notifies the user.
If you want to submit multiple Spark jobs in parallel, you can use the Parallel state type in AWS Step Functions. The Parallel state is used to create parallel branches of execution in your state machine.
With Lambda and AWS Step Functions, you can create a very robust serverless orchestration for your big data workload.
Cleaning up
When you’ve finished testing this solution, remember to clean up all those AWS resources that you created using AWS CloudFormation. Use the AWS CloudFormation console or AWS CLI to delete the stack named Blog-Spark-ETL-Step-Functions.
Summary
In this post, I showed you how to use AWS Step Functions to orchestrate your Spark jobs that are running on Amazon EMR. You used Apache Livy to submit jobs to Spark from a Lambda function and created a workflow for your Spark jobs, maintaining a specific order for job execution and triggering different AWS events based on your job’s outcome. Go ahead—give this solution a try, and share your experience with us!
Tanzir Musabbir is an EMR Specialist Solutions Architect with AWS. He is an early adopter of open source Big Data technologies. At AWS, he works with our customers to provide them architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena & AWS Glue. Tanzir is a big Real Madrid fan and he loves to travel in his free time.
Thanks to Susan Ferrell, Senior Technical Writer, for a great blog post on how to use CodeCommit branch-level permissions.
AWS CodeCommit users have been asking for a way to restrict commits to some repository branches to just a few people. In this blog post, we’re going to show you how to do that by creating and applying a conditional policy, an AWS Identity and Access Management (IAM) policy that contains a context key.
Why would I do this?
When you create a branch in an AWS CodeCommit repository, the branch is available, by default, to all repository users. Here are some scenarios in which refining access might help you:
You maintain a branch in a repository for production-ready code, and you don’t want to allow changes to this branch except from a select group of people.
You want to limit the number of people who can make changes to the default branch in a repository.
You want to ensure that pull requests cannot be merged to a branch except by an approved group of developers.
We’ll show you how to create a policy in IAM that prevents users from pushing commits to and merging pull requests to a branch named master. You’ll attach that policy to one group or role in IAM, and then test how users in that group are affected when that policy is applied. We’ll explain how it works, so you can create custom policies for your repositories.
What you need to get started
You’ll need to sign in to AWS with sufficient permissions to:
Create and apply policies in IAM.
Create groups in IAM.
Add users to those groups.
Apply policies to those groups.
You can use existing IAM groups, but because you’re going to be changing permissions, you might want to first test this out on groups and users you’ve created specifically for this purpose.
You’ll need a repository in AWS CodeCommit with at least two branches: master and test-branch. For information about how to create repositories, see Create a Repository. For information about how to create branches, see Create a Branch. In this blog post, we’ve named the repository MyDemoRepo. You can use an existing repository with branches of another name, if you prefer.
Let’s get started!
Create two groups in IAM
We’re going to set up two groups in IAM: Developers and Senior_Developers. To start, both groups will have the same managed policy, AWSCodeCommitPowerUsers, applied. Users in each group will have exactly the same permissions to perform actions in AWS CodeCommit.
Figure 1: Two example groups in IAM, with distinct users but the same managed policy applied to each group
In the navigation pane, choose Groups, and then choose Create New Group.
In the Group Name box, type Developers, and then choose Next Step.
In the list of policies, select the check box for AWSCodeCommitPowerUsers, then choose Next Step.
Choose Create Group.
Now, follow these steps to create the Senior_Developers group and attach the AWSCodeCommitPowerUsers managed policy. You now have two empty groups with the same policy attached.
Create users in IAM
Next, add at least one unique user to each group. You can use existing IAM users, but because you’ll be affecting their access to AWS CodeCommit, you might want to create two users just for testing purposes. Let’s go ahead and create Arnav and Mary.
In the navigation pane, choose Users, and then choose Add user.
For the new user, type Arnav_Desai.
Choose Add another user, and then type Mary_Major.
Select the type of access (programmatic access, access to the AWS Management Console, or both). In this blog post, we’ll be testing everything from the console, but if you want to test AWS CodeCommit using the AWS CLI, make sure you include programmatic access and console access.
For Console password type, choose Custom password. Each user is assigned the password that you type in the box. Write these down so you don’t forget them. You’ll need to sign in to the console using each of these accounts.
Choose Next: Permissions.
On the Set permissions page, choose Add user to group. Add Arnav to the Developers group. Add Mary to the Senior_Developers group.
Choose Next: Review to see all of the choices you made up to this point. When you are ready to proceed, choose Create user.
Sign in as Arnav, and then follow these steps to go to the master branch and add a file. Then sign in as Mary and follow the same steps.
On the Dashboard page, from the list of repositories, choose MyDemoRepo.
In the Code view, choose the branch named master.
Choose Add file, and then choose Create file. Type some text or code in the editor.
Provide information to other users about who added this file to the repository and why.
In Author name, type the name of the user (Arnav or Mary).
In Email address, type an email address so that other repository users can contact you about this change.
In Commit message, type a brief description to help you remember why you added this file or any other details you might find helpful.
Type a name for the file.
Choose Commit file.
Now follow the same steps to add a file in a different branch. (In our example repository, that’s the branch named test-branch.) You should be able to add a file to both branches regardless of whether you’re signed in as Arnav or Mary.
Let’s change that.
Create a conditional policy in IAM
You’re going to create a policy in IAM that will deny API actions if certain conditions are met. We want to prevent users with this policy applied from updating a branch named master, but we don’t want to prevent them from viewing the branch, cloning the repository, or creating pull requests that will merge to that branch. For this reason, we want to pick and choose our APIs carefully. Looking at the Permissions Reference, the logical permissions for this are:
GitPush
PutFile
MergePullRequestByFastForward
Now’s the time to think about what else you might want this policy to do. For example, because we don’t want users with this policy to make changes to this branch, we probably don’t want them to be able to delete it either, right? So let’s add one more permission:
DeleteBranch
The branch in which we want to deny these actions is master. The repository in which the branch resides is MyDemoRepo. We’re going to need more than just the repository name, though. We need the repository ARN. Fortunately, that’s easy to find. Just go to the AWS CodeCommit console, choose the repository, and choose Settings. The repository ARN is displayed on the General tab.
Now we’re ready to create a policy.
1. Open the IAM console at https://console.aws.amazon.com/iam/. Make sure you’re signed in with the account that has sufficient permissions to create policies, and not as Arnav or Mary.
2. In the navigation pane, choose Policies, and then choose Create policy.
3. Choose JSON, and then paste in the following:
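(The Region and account ID in the ARN below are placeholders; replace them with your own values.)

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Deny",
            "Action": [
                "codecommit:GitPush",
                "codecommit:PutFile",
                "codecommit:MergePullRequestByFastForward",
                "codecommit:DeleteBranch"
            ],
            "Resource": "arn:aws:codecommit:us-east-1:111111111111:MyDemoRepo",
            "Condition": {
                "StringEqualsIfExists": {
                    "codecommit:References": [
                        "refs/heads/master"
                    ]
                },
                "Null": {
                    "codecommit:References": "false"
                }
            }
        }
    ]
}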
You’ll notice a few things here. First, change the repository ARN to the ARN for your repository and include the repository name. Second, if you want to restrict access to a branch with a name different from our example, master, change that reference too.
Now let’s talk about this policy and what it does. You might be wondering why we’re using a Git reference (refs/heads) value instead of just the branch name. The answer lies in how Git references things, and how AWS CodeCommit, as a Git-based repository service, implements its APIs. A branch in Git is a simple pointer (reference) to the SHA-1 value of the head commit for that branch.
You might also be wondering about the second part of the condition, the nullification language. This is necessary because of the way git push and git-receive-pack work. Without going into too many technical details, when you attempt to push a change from a local repo to AWS CodeCommit, an initial reference call is made to AWS CodeCommit without any branch information. AWS CodeCommit evaluates that initial call to ensure that:
a) You’re authorized to make calls.
b) A repository exists with the name specified in the initial call.
If you left that null condition out of the policy, users with that policy would be unable to complete any pushes from their local repos to the AWS CodeCommit remote repository at all, regardless of which branch they were trying to push their commits to.
Could you write a policy in such a way that the null is not required? Of course. IAM policy language is flexible. There’s an example of how to do this in the AWS CodeCommit User Guide, if you’re curious. But for the purposes of this blog post, let’s continue with this policy as written.
So what have we essentially said in this policy? We’ve asked IAM to deny the relevant CodeCommit permissions if the request is made to the resource MyDemoRepo and it meets the following condition: the reference is to refs/heads/master. Otherwise, the deny does not apply.
I’m sure you’re wondering if this policy has to be constrained to a specific repository resource like MyDemoRepo. After all, it would be awfully convenient if a single policy could apply to all branches in any repository in an AWS account, particularly since the default branch in any repository is initially the master branch. Good news! Simply replace the ARN with an *, and your policy will affect ALL branches named master in every AWS CodeCommit repository in your AWS account. Make sure that this is really what you want, though. We suggest you start by limiting the scope to just one repository, and then changing things when you’ve tested it and are happy with how it works.
When you’re sure you’ve modified the policy for your environment, choose Review policy to validate it. Give this policy a name, such as DenyChangesToMaster, provide a description of its purpose, and then choose Create policy.
Now that you have a policy, it’s time to apply and test it.
Apply the policy to a group
In theory, you could apply the policy you just created directly to any IAM user, but that really doesn’t scale well. You should apply this policy to a group, if you use IAM groups to manage users, or to a role, if your users assume a role when interacting with AWS resources.
In the IAM console, choose Groups, and then choose Developers.
On the Permissions tab, choose Attach Policy.
Choose DenyChangesToMaster, and then choose Attach policy.
Your groups now have a critical difference: users in the Developers group have an additional policy applied that restricts their actions in the master branch. In other words, Mary can continue to add files, push commits, and merge pull requests in the master branch, but Arnav cannot.
Figure 2: Two example groups in IAM, one with an additional policy applied that will prevent users in this group from making changes to the master branch
Test it out. Sign in as Arnav, and do the following:
On the Dashboard page, from the list of repositories, choose MyDemoRepo.
In the Code view, choose the branch named master.
Choose Add file, and then choose Create file, just as you did before. Provide some text, and then add the file name and your user information.
Choose Commit file.
This time you’ll see an error after choosing Commit file. It’s not a pretty message, but at the very end, you’ll see a telling phrase: “explicit deny”. That’s the policy in action. You, as Arnav, are explicitly denied PutFile, which prevents you from adding a file to the master branch. You’ll see similar results if you try other actions denied by that policy, such as deleting the master branch.
Stay signed in as Arnav, but this time add a file to test-branch. You should be able to add a file without seeing any errors. You can create a branch based on the master branch, add a file to it, and create a pull request that will merge to the master branch, all just as before. However, you cannot perform denied actions on that master branch.
Sign out as Arnav and sign in as Mary. You’ll see that as that IAM user, you can add and edit files in the master branch, merge pull requests to it, and even, although we don’t recommend this, delete it.
Conclusion
You can use conditional statements in policies in IAM to refine how users interact with your AWS CodeCommit repositories. This blog post showed how to use such a policy to prevent users from making changes to a branch named master. There are many other options. We hope this blog post will encourage you to experiment with AWS CodeCommit, IAM policies, and permissions. If you have any questions or suggestions, we’d love to hear from you.
Amazon Kinesis Data Firehose is the easiest way to capture and stream data into a data lake built on Amazon S3. This data can be anything—from AWS service logs like AWS CloudTrail log files, Amazon VPC Flow Logs, Application Load Balancer logs, and others. It can also be IoT events, game events, and much more. To efficiently query this data, a time-consuming ETL (extract, transform, and load) process is required to massage and convert the data to an optimal file format, which increases the time to insight. This situation is less than ideal, especially for real-time data that loses its value over time.
To solve this common challenge, Kinesis Data Firehose can now save data to Amazon S3 in Apache Parquet or Apache ORC format. These are optimized columnar formats that are highly recommended for best performance and cost-savings when querying data in S3. This feature directly benefits you if you use Amazon Athena, Amazon Redshift, AWS Glue, Amazon EMR, or any other big data tools that are available from the AWS Partner Network and through the open-source community.
Amazon Connect is a simple-to-use, cloud-based contact center service that makes it easy for any business to provide a great customer experience at a lower cost than common alternatives. Its open platform design enables easy integration with other systems. One of those systems is Amazon Kinesis—in particular, Kinesis Data Streams and Kinesis Data Firehose.
What’s really exciting is that you can now save events from Amazon Connect to S3 in Apache Parquet format. You can then perform analytics using Amazon Athena and Amazon Redshift Spectrum in real time, taking advantage of this key performance and cost optimization. Of course, Amazon Connect is only one example. This new capability opens the door for a great deal of opportunity, especially as organizations continue to build their data lakes.
Amazon Connect includes an array of analytics views in the Administrator dashboard. But you might want to run other types of analysis. In this post, I describe how to set up a data stream from Amazon Connect through Kinesis Data Streams and Kinesis Data Firehose and out to S3, and then perform analytics using Athena and Amazon Redshift Spectrum. I focus primarily on the Kinesis Data Firehose support for Parquet and its integration with the AWS Glue Data Catalog, Amazon Athena, and Amazon Redshift.
Solution overview
Here is how the solution is laid out:
The following sections walk you through each of these steps to set up the pipeline.
1. Define the schema
When Kinesis Data Firehose processes incoming events and converts the data to Parquet, it needs to know which schema to apply. This is because incoming events often contain only some of the expected fields, depending on which values the producers emit. A typical approach is to normalize the schema during a batch ETL job so that you end up with a consistent schema that can easily be understood and queried, but that introduces latency because of the batch process. To avoid that delay, Kinesis Data Firehose requires the schema to be defined in advance.
To see the available columns and structures, see Amazon Connect Agent Event Streams. For the purpose of simplicity, I opted to make all the columns of type String rather than create the nested structures. But you can definitely do that if you want.
The simplest way to define the schema is to create a table in the Amazon Athena console. Open the Athena console, and paste the following create table statement, substituting your own S3 bucket and prefix for where your event data will be stored. A Data Catalog database is a logical container that holds the different tables that you can create. The default database name shown here should already exist. If it doesn’t, you can create it or use another database that you’ve already created.
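As a hedged sketch of such a statement (the bucket, prefix, and any columns beyond those used by the queries later in this post are assumptions based on the Agent Event Streams documentation), here it is wrapped in a small boto3 call that runs the DDL through Athena; you can equally paste just the SQL into the Athena query editor.

import boto3

# Hypothetical bucket and prefix -- replace with your own. The LOCATION is where
# Kinesis Data Firehose will write the Parquet files.
DDL = """
CREATE EXTERNAL TABLE IF NOT EXISTS default.kfhconnectblog (
  awsaccountid string,
  agentarn string,
  currentagentsnapshot string,
  eventid string,
  eventtimestamp string,
  eventtype string,
  instancearn string,
  previousagentsnapshot string,
  version string
)
STORED AS parquet
LOCATION 's3://your-bucket/kfhconnectblog/'
"""

athena = boto3.client("athena")
athena.start_query_execution(
    QueryString=DDL,
    QueryExecutionContext={"Database": "default"},
    # Athena needs somewhere to write query results, even for DDL statements.
    ResultConfiguration={"OutputLocation": "s3://your-bucket/athena-results/"},
)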
That’s all you have to do to prepare the schema for Kinesis Data Firehose.
2. Define the data streams
Next, you need to define the Kinesis data streams that will be used to stream the Amazon Connect events. Open the Kinesis Data Streams console and create two streams. You can configure them with only one shard each because you don’t have a lot of data right now.
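If you prefer to script this step, here is a minimal boto3 sketch (the stream names are placeholders; the walkthrough only wires up the agent-event stream later):

import boto3

kinesis = boto3.client("kinesis")

# One stream for agent events (used later in this post) and a second one for
# other Amazon Connect data; one shard each is plenty for this walkthrough.
for name in ("kfhconnect-agent-events", "kfhconnect-ctr"):
    kinesis.create_stream(StreamName=name, ShardCount=1)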
3. Define the Kinesis Data Firehose delivery stream for Parquet
Let’s configure the Data Firehose delivery stream using the data stream as the source and Amazon S3 as the output. Start by opening the Kinesis Data Firehose console and creating a new data delivery stream. Give it a name, and associate it with the Kinesis data stream that you created in Step 2.
As shown in the following screenshot, enable Record format conversion (1) and choose Apache Parquet (2). As you can see, Apache ORC is also supported. Scroll down and provide the AWS Glue Data Catalog database name (3) and table names (4) that you created in Step 1. Choose Next.
To make things easier, the output S3 bucket and prefix fields are automatically populated using the values that you defined in the LOCATION parameter of the create table statement from Step 1. Pretty cool. Additionally, you have the option to save the raw events into another location as defined in the Source record S3 backup section. Don’t forget to add a trailing forward slash “ / “ so that Data Firehose creates the date partitions inside that prefix.
On the next page, in the S3 buffer conditions section, there is a note about configuring a large buffer size. The Parquet file format is highly efficient in how it stores and compresses data. Increasing the buffer size allows you to pack more rows into each output file, which is preferred and gives you the most benefit from Parquet.
Compression using Snappy is automatically enabled for both Parquet and ORC. You can modify the compression algorithm by using the Kinesis Data Firehose API and update the OutputFormatConfiguration.
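For example, here is a hedged sketch of switching the Parquet output to GZIP compression with boto3 (the delivery stream name is a placeholder, and the exact nesting of the update structure should be checked against the UpdateDestination API documentation):

import boto3

firehose = boto3.client("firehose")

# Look up the current version and destination IDs, which UpdateDestination requires.
desc = firehose.describe_delivery_stream(DeliveryStreamName="kfhconnect-to-s3")
stream = desc["DeliveryStreamDescription"]

firehose.update_destination(
    DeliveryStreamName="kfhconnect-to-s3",
    CurrentDeliveryStreamVersionId=stream["VersionId"],
    DestinationId=stream["Destinations"][0]["DestinationId"],
    ExtendedS3DestinationUpdate={
        "DataFormatConversionConfiguration": {
            "OutputFormatConfiguration": {
                # Snappy is the default; GZIP trades CPU for smaller files.
                "Serializer": {"ParquetSerDe": {"Compression": "GZIP"}}
            }
        }
    },
)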
Be sure to also enable Amazon CloudWatch Logs so that you can debug any issues that you might run into.
Lastly, finalize the creation of the Firehose delivery stream, and continue on to the next section.
4. Set up the Amazon Connect contact center
After setting up the Kinesis pipeline, you now need to set up a simple contact center in Amazon Connect. The Getting Started page provides clear instructions on how to set up your environment, acquire a phone number, and create an agent to accept calls.
After setting up the contact center, in the Amazon Connect console, choose your Instance Alias, and then choose Data Streaming. Under Agent Event, choose the Kinesis data stream that you created in Step 2, and then choose Save.
At this point, your pipeline is complete. Agent events from Amazon Connect are generated as agents go about their day. Events are sent via Kinesis Data Streams to Kinesis Data Firehose, which converts the event data from JSON to Parquet and stores it in S3. Athena and Amazon Redshift Spectrum can simply query the data without any additional work.
So let’s generate some data. Go back into the Administrator console for your Amazon Connect contact center, and create an agent to handle incoming calls. In this example, I creatively named mine Agent One. After it is created, Agent One can get to work and log into their console and set their availability to Available so that they are ready to receive calls.
To make the data a bit more interesting, I also created a second agent, Agent Two. I then made some incoming and outgoing calls and caused some failures to occur, so I now have enough data available to analyze.
5. Analyze the data with Athena
Let’s open the Athena console and run some queries. One thing you’ll notice is that when we created the schema for the dataset, we defined some of the fields as Strings even though in the documentation they were complex structures. The reason for doing that was simply to show some of the flexibility of Athena to be able to parse JSON data. However, you can define nested structures in your table schema so that Kinesis Data Firehose applies the appropriate schema to the Parquet file.
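For illustration, a nested declaration for the currentagentsnapshot column could look something like the fragment below (shown as a Python string for reference; only a few of the Agent Event Streams fields are included, and the field names are assumptions):

# Sketch of a nested column definition you could use in the Athena table
# instead of a plain string column; extend it with the remaining fields you need.
NESTED_COLUMN_DDL = """
  currentagentsnapshot struct<
    agentstatus:struct<name:string,starttimestamp:string>,
    configuration:struct<firstname:string,lastname:string,username:string>
  >
"""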
Let’s run the first query to see which agents have logged into the system.
The query might look complex, but it’s fairly straightforward:
WITH dataset AS (
SELECT
from_iso8601_timestamp(eventtimestamp) AS event_ts,
eventtype,
-- CURRENT STATE
json_extract_scalar(
currentagentsnapshot,
'$.agentstatus.name') AS current_status,
from_iso8601_timestamp(
json_extract_scalar(
currentagentsnapshot,
'$.agentstatus.starttimestamp')) AS current_starttimestamp,
json_extract_scalar(
currentagentsnapshot,
'$.configuration.firstname') AS current_firstname,
json_extract_scalar(
currentagentsnapshot,
'$.configuration.lastname') AS current_lastname,
json_extract_scalar(
currentagentsnapshot,
'$.configuration.username') AS current_username,
json_extract_scalar(
currentagentsnapshot,
'$.configuration.routingprofile.defaultoutboundqueue.name') AS current_outboundqueue,
json_extract_scalar(
currentagentsnapshot,
'$.configuration.routingprofile.inboundqueues[0].name') as current_inboundqueue,
-- PREVIOUS STATE
json_extract_scalar(
previousagentsnapshot,
'$.agentstatus.name') as prev_status,
from_iso8601_timestamp(
json_extract_scalar(
previousagentsnapshot,
'$.agentstatus.starttimestamp')) as prev_starttimestamp,
json_extract_scalar(
previousagentsnapshot,
'$.configuration.firstname') as prev_firstname,
json_extract_scalar(
previousagentsnapshot,
'$.configuration.lastname') as prev_lastname,
json_extract_scalar(
previousagentsnapshot,
'$.configuration.username') as prev_username,
json_extract_scalar(
previousagentsnapshot,
'$.configuration.routingprofile.defaultoutboundqueue.name') as prev_outboundqueue,
json_extract_scalar(
previousagentsnapshot,
'$.configuration.routingprofile.inboundqueues[0].name') as prev_inboundqueue
from kfhconnectblog
where eventtype <> 'HEART_BEAT'
)
SELECT
current_status as status,
current_username as username,
event_ts
FROM dataset
WHERE eventtype = 'LOGIN' AND current_username <> ''
ORDER BY event_ts DESC
The query output looks something like this:
Here is another query that shows the sessions each agent engaged in. It tells us whether they were incoming or outgoing, whether they were completed, and whether there were missed or failed calls.
WITH src AS (
SELECT
eventid,
json_extract_scalar(currentagentsnapshot, '$.configuration.username') as username,
cast(json_extract(currentagentsnapshot, '$.contacts') AS ARRAY(JSON)) as c,
cast(json_extract(previousagentsnapshot, '$.contacts') AS ARRAY(JSON)) as p
from kfhconnectblog
),
src2 AS (
SELECT *
FROM src CROSS JOIN UNNEST (c, p) AS contacts(c_item, p_item)
),
dataset AS (
SELECT
eventid,
username,
json_extract_scalar(c_item, '$.contactid') as c_contactid,
json_extract_scalar(c_item, '$.channel') as c_channel,
json_extract_scalar(c_item, '$.initiationmethod') as c_direction,
json_extract_scalar(c_item, '$.queue.name') as c_queue,
json_extract_scalar(c_item, '$.state') as c_state,
from_iso8601_timestamp(json_extract_scalar(c_item, '$.statestarttimestamp')) as c_ts,
json_extract_scalar(p_item, '$.contactid') as p_contactid,
json_extract_scalar(p_item, '$.channel') as p_channel,
json_extract_scalar(p_item, '$.initiationmethod') as p_direction,
json_extract_scalar(p_item, '$.queue.name') as p_queue,
json_extract_scalar(p_item, '$.state') as p_state,
from_iso8601_timestamp(json_extract_scalar(p_item, '$.statestarttimestamp')) as p_ts
FROM src2
)
SELECT
username,
c_channel as channel,
c_direction as direction,
p_state as prev_state,
c_state as current_state,
c_ts as current_ts,
c_contactid as id
FROM dataset
WHERE c_contactid = p_contactid
ORDER BY id DESC, current_ts ASC
The query output looks similar to the following:
6. Analyze the data with Amazon Redshift Spectrum
With Amazon Redshift Spectrum, you can query data directly in S3 using your existing Amazon Redshift data warehouse cluster. Because the data is already in Parquet format, Redshift Spectrum gets the same great benefits that Athena does.
Here is a simple query to show querying the same data from Amazon Redshift. Note that to do this, you need to first create an external schema in Amazon Redshift that points to the AWS Glue Data Catalog.
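A minimal sketch of creating that external schema is shown below, run here through the Amazon Redshift Data API (the cluster identifier, database, user, and IAM role ARN are placeholders; you can just as easily run the SQL from your usual client):

import boto3

# 'default_schema' matches the schema referenced in the query below, and
# 'default' is the Glue database created in Step 1. The IAM role must be
# attached to the cluster and allowed to read the Data Catalog and S3.
CREATE_SCHEMA_SQL = """
CREATE EXTERNAL SCHEMA IF NOT EXISTS default_schema
FROM DATA CATALOG
DATABASE 'default'
IAM_ROLE 'arn:aws:iam::111111111111:role/MySpectrumRole'
"""

redshift_data = boto3.client("redshift-data")
redshift_data.execute_statement(
    ClusterIdentifier="my-redshift-cluster",  # placeholder
    Database="dev",                           # placeholder
    DbUser="awsuser",                         # placeholder
    Sql=CREATE_SCHEMA_SQL,
)

With the schema in place, the query below runs directly against the Parquet data in S3: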
SELECT
eventtype,
json_extract_path_text(currentagentsnapshot,'agentstatus','name') AS current_status,
json_extract_path_text(currentagentsnapshot, 'configuration','firstname') AS current_firstname,
json_extract_path_text(currentagentsnapshot, 'configuration','lastname') AS current_lastname,
json_extract_path_text(
currentagentsnapshot,
'configuration','routingprofile','defaultoutboundqueue','name') AS current_outboundqueue
FROM default_schema.kfhconnectblog
The following shows the query output:
Summary
In this post, I showed you how to use Kinesis Data Firehose to ingest and convert data to columnar file format, enabling real-time analysis using Athena and Amazon Redshift. This great feature enables a level of optimization in both cost and performance that you need when storing and analyzing large amounts of data. This feature is equally important if you are investing in building data lakes on AWS.
Roy Hasson is a Global Business Development Manager for AWS Analytics. He works with customers around the globe to design solutions to meet their data processing, analytics, and business intelligence needs. Roy is a big Manchester United fan who enjoys cheering his team on and hanging out with his family.
AWS Config enables continuous monitoring of your AWS resources, making it simple to assess, audit, and record resource configurations and changes. AWS Config does this through the use of rules that define the desired configuration state of your AWS resources. AWS Config provides a number of AWS managed rules that address a wide range of security concerns such as checking if you encrypted your Amazon Elastic Block Store (Amazon EBS) volumes, tagged your resources appropriately, and enabled multi-factor authentication (MFA) for root accounts. You can also create custom rules to codify your compliance requirements through the use of AWS Lambda functions.
In this post we’ll show you how to use AWS Config to monitor your Amazon Simple Storage Service (S3) bucket ACLs and policies for violations that allow public read or public write access. If AWS Config finds a violation, it triggers an Amazon CloudWatch Events rule, which invokes an AWS Lambda function that either corrects the S3 bucket ACL or notifies you via Amazon Simple Notification Service (Amazon SNS) that the policy is in violation and allows public read or public write access. We’ll show you how to do this in five main steps.
Enable AWS Config to monitor Amazon S3 bucket ACLs and policies for compliance violations.
Create an IAM Role and Policy that grants a Lambda function permissions to read S3 bucket policies and send alerts through SNS.
Create and configure a CloudWatch Events rule that triggers the Lambda function when AWS Config detects an S3 bucket ACL or policy violation.
Create a Lambda function that uses the IAM role to review S3 bucket ACLs and policies, correct the ACLs, and notify your team of out-of-compliance policies.
Verify the monitoring solution.
Note: This post assumes your compliance policies require that the buckets you monitor do not allow public read or write access. If you have intentionally open buckets serving static content, for example, you can use this post as a jumping-off point for a solution tailored to your needs.
At the end of this post, we provide an AWS CloudFormation template that implements the solution outlined. The template enables you to deploy the solution in multiple regions quickly.
Important: The use of some of the resources deployed, including those deployed using the provided CloudFormation template, will incur costs as long as they are in use. AWS Config Rules incur costs in each region they are active.
Architecture
Here’s an architecture diagram of what we’ll implement:
Figure 1: Architecture diagram
Step 1: Enable AWS Config and Amazon S3 Bucket monitoring
The following steps demonstrate how to set up AWS Config to monitor Amazon S3 buckets.
Open the AWS Config console. If this is your first time using AWS Config, select Get started. If you’ve already used AWS Config, select Settings.
In the Settings page, under Resource types to record, clear the All resources checkbox. In the Specific types list, select Bucket under S3.
Figure 2: The Settings dialog box showing the “Specific types” list
Choose the Amazon S3 bucket for storing configuration history and snapshots. We’ll create a new Amazon S3 bucket.
Figure 3: Creating an S3 bucket
If you prefer to use an existing Amazon S3 bucket in your account, select the Choose a bucket from your account radio button and, using the dropdown, select an existing bucket.
Figure 4: Selecting an existing S3 bucket
Under Amazon SNS topic, check the box next to Stream configuration changes and notifications to an Amazon SNS topic, and then select the radio button to Create a topic.
Alternatively, you can choose a topic that you have previously created and subscribed to.
Figure 5: Selecting a topic that you’ve previously created and subscribed to
If you created a new SNS topic you need to subscribe to it to receive notifications. We’ll cover this in a later step.
Under AWS Config role, choose Create a role (unless you already have a role you want to use). We’re using the auto-suggested role name.
Figure 6: Creating a role
Select Next.
Configure Amazon S3 bucket monitoring rules:
On the AWS Config rules page, search for S3 and choose the s3-bucket-public-read-prohibited and s3-bucket-public-write-prohibited rules, then click Next.
Figure 7: AWS Config rules dialog
On the Review page, select Confirm. AWS Config is now analyzing your Amazon S3 buckets, capturing their current configurations, and evaluating the configurations against the rules we selected.
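If you’d rather enable the same two managed rules from code, here is a hedged boto3 sketch (the rule names are just labels; the source identifiers are the AWS managed rule identifiers):

import boto3

config = boto3.client("config")

# Enable both S3 public-access managed rules, scoped to S3 buckets only.
for name, identifier in [
    ("s3-bucket-public-read-prohibited", "S3_BUCKET_PUBLIC_READ_PROHIBITED"),
    ("s3-bucket-public-write-prohibited", "S3_BUCKET_PUBLIC_WRITE_PROHIBITED"),
]:
    config.put_config_rule(
        ConfigRule={
            "ConfigRuleName": name,
            "Scope": {"ComplianceResourceTypes": ["AWS::S3::Bucket"]},
            "Source": {"Owner": "AWS", "SourceIdentifier": identifier},
        }
    )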
If you created a new Amazon SNS topic, open the Amazon SNS Management Console and locate the topic you created:
Figure 8: Amazon SNS topic list
Copy the ARN of the topic (the string that begins with arn:) because you’ll need it in a later step.
Select the checkbox next to the topic, and then, under the Actions menu, select Subscribe to topic.
Select Email as the protocol, enter your email address, and then select Create subscription.
After several minutes, you’ll receive an email asking you to confirm your subscription for notifications for this topic. Select the link to confirm the subscription.
Step 2: Create a Role for Lambda
Our Lambda function will need permissions that enable it to inspect and modify Amazon S3 bucket ACLs and policies, log to CloudWatch Logs, and publish to an Amazon SNS topic. We’ll now set up a custom AWS Identity and Access Management (IAM) policy and role to support these actions and assign them to the Lambda function we’ll create in the next section.
In the AWS Management Console, under Services, select IAM to access the IAM Console.
Create a policy with the following permissions, or copy the following policy:
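As a hedged sketch of what that policy could contain (the resources below are intentionally broad placeholders; scope them down to your buckets, SNS topic, and log group), here it is expressed as a Python dict that you can paste as JSON into the console editor or create with boto3:

import json

import boto3

# The function needs to read and reset bucket ACLs, read bucket policies,
# publish to SNS, and write CloudWatch Logs.
lambda_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetBucketAcl", "s3:PutBucketAcl", "s3:GetBucketPolicy"],
            "Resource": "arn:aws:s3:::*",
        },
        {
            "Effect": "Allow",
            "Action": "sns:Publish",
            "Resource": "arn:aws:sns:us-east-1:111111111111:*",  # placeholder
        },
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "arn:aws:logs:*:*:*",
        },
    ],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="LambdaS3PolicySecuringPolicy",  # hypothetical name
    PolicyDocument=json.dumps(lambda_policy),
)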
Next, create a role for the Lambda function: in the navigation pane, choose Roles, and then choose Create role. Select Lambda from the list of services that will use this role.
Select the check box next to the policy you created previously, and then select Next: Review
Name your role, give it a description, and then select Create Role. In this example, we’re naming the role LambdaS3PolicySecuringRole.
Step 3: Create and Configure a CloudWatch Rule
In this section, we’ll create a CloudWatch Rule to trigger the Lambda function when AWS Config determines that your Amazon S3 buckets are non-compliant.
In the AWS Management Console, under Services, select CloudWatch.
On the left-hand side, under Events, select Rules.
Click Create rule.
In Step 1: Create rule, under Event Source, select the dropdown list and select Build custom event pattern.
Copy the following pattern and paste it into the text box:
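The following is a hedged sketch of such a pattern, consistent with the event fields the Lambda function later in this post reads (the managed rule identifiers are assumptions). It’s shown as a Python dict together with the equivalent put_rule call, though pasting the JSON into the console works just as well:

import json

import boto3

# Match AWS Config PutEvaluations calls that report NON_COMPLIANT results
# from the two S3 public-access managed rules.
event_pattern = {
    "source": ["aws.config"],
    "detail": {
        "requestParameters": {
            "evaluations": {"complianceType": ["NON_COMPLIANT"]}
        },
        "additionalEventData": {
            "managedRuleIdentifier": [
                "S3_BUCKET_PUBLIC_READ_PROHIBITED",
                "S3_BUCKET_PUBLIC_WRITE_PROHIBITED",
            ]
        },
    },
}

# Programmatic equivalent of pasting the JSON into the custom event pattern box.
events = boto3.client("events")
events.put_rule(
    Name="AWSConfigFoundOpenBucket",
    EventPattern=json.dumps(event_pattern),
)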
The pattern matches events generated by AWS Config when it checks the Amazon S3 bucket for public accessibility.
We’ll add a Lambda target later. For now, select your Amazon SNS topic created earlier, and then select Configure details.
Figure 9: The “Create rule” dialog
Give your rule a name and description. For this example, we’ll name ours AWSConfigFoundOpenBucket
Click Create rule.
Step 4: Create a Lambda Function
In this section, we’ll create a new Lambda function to examine an Amazon S3 bucket’s ACL and bucket policy. If the bucket ACL is found to allow public access, the Lambda function overwrites it to be private. If a bucket policy is found, the Lambda function creates an SNS message, puts the policy in the message body, and publishes it to the Amazon SNS topic we created. Bucket policies can be complex, and overwriting your policy may cause unexpected loss of access, so this Lambda function doesn’t attempt to alter your policy in any way.
Get the ARN of the Amazon SNS topic created earlier.
In the AWS Management Console, under Services, select Lambda to go to the Lambda Console.
From the Dashboard, select Create Function. Or, if you were taken directly to the Functions page, select the Create Function button in the upper-right.
On the Create function page:
Choose Author from scratch.
Provide a name for the function. We’re using AWSConfigOpenAccessResponder.
The Lambda function we’ve written is Python 3.6 compatible, so in the Runtime dropdown list, select Python 3.6.
Under Role, select Choose an existing role. Select the role you created in the previous section, and then select Create function.
Figure 10: The “Create function” dialog
We’ll now add a CloudWatch Event based on the rule we created earlier.
In the Add triggers section, select CloudWatch Events. A CloudWatch Events box should appear connected to the left side of the Lambda Function and have a note that states Configuration required.
Figure 11: CloudWatch Events in the “Add triggers” section
From the Rule dropdown box, choose the rule you created earlier, and then select Add.
Scroll up to the Designer section and select the name of your Lambda function.
Delete the default code and paste in the following code:
import boto3
from botocore.exceptions import ClientError
import json
import os
ACL_RD_WARNING = "The S3 bucket ACL allows public read access."
PLCY_RD_WARNING = "The S3 bucket policy allows public read access."
ACL_WRT_WARNING = "The S3 bucket ACL allows public write access."
PLCY_WRT_WARNING = "The S3 bucket policy allows public write access."
RD_COMBO_WARNING = ACL_RD_WARNING + PLCY_RD_WARNING
WRT_COMBO_WARNING = ACL_WRT_WARNING + PLCY_WRT_WARNING
def policyNotifier(bucketName, s3client):
    try:
        bucketPolicy = s3client.get_bucket_policy(Bucket = bucketName)
        # notify that the bucket policy may need to be reviewed due to security concerns
        sns = boto3.client('sns')
        subject = "Potential compliance violation in " + bucketName + " bucket policy"
        message = "Potential bucket policy compliance violation. Please review: " + json.dumps(bucketPolicy['Policy'])
        # send SNS message with warning and bucket policy
        response = sns.publish(
            TopicArn = os.environ['TOPIC_ARN'],
            Subject = subject,
            Message = message
        )
    except ClientError as e:
        # error caught due to no bucket policy
        print("No bucket policy found; no alert sent.")

def lambda_handler(event, context):
    # instantiate Amazon S3 client
    s3 = boto3.client('s3')
    resource = list(event['detail']['requestParameters']['evaluations'])[0]
    bucketName = resource['complianceResourceId']
    complianceFailure = event['detail']['requestParameters']['evaluations'][0]['annotation']
    if(complianceFailure == ACL_RD_WARNING or complianceFailure == ACL_WRT_WARNING):
        # ACL violations can be remediated directly by resetting the bucket ACL to private
        s3.put_bucket_acl(Bucket = bucketName, ACL = 'private')
    elif(complianceFailure == PLCY_RD_WARNING or complianceFailure == PLCY_WRT_WARNING):
        # bucket policy violations are only reported, never overwritten
        policyNotifier(bucketName, s3)
    elif(complianceFailure == RD_COMBO_WARNING or complianceFailure == WRT_COMBO_WARNING):
        s3.put_bucket_acl(Bucket = bucketName, ACL = 'private')
        policyNotifier(bucketName, s3)
    return 0  # done
Scroll down to the Environment variables section. This code uses an environment variable to store the Amazon SNS topic ARN.
For the key, enter TOPIC_ARN.
For the value, enter the ARN of the Amazon SNS topic created earlier.
Under Execution role, select Choose an existing role, and then select the role created earlier from the dropdown.
Leave everything else as-is, and then, at the top, select Save.
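If you manage the function from code instead, the same environment variable can be set with a call like the following (the function name matches the example above; the topic ARN is a placeholder):

import boto3

lambda_client = boto3.client("lambda")
lambda_client.update_function_configuration(
    FunctionName="AWSConfigOpenAccessResponder",
    Environment={
        "Variables": {
            # Replace with the ARN of the SNS topic created earlier.
            "TOPIC_ARN": "arn:aws:sns:us-east-1:111111111111:config-s3-topic"
        }
    },
)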
Step 5: Verify it Works
We now have the Lambda function, an Amazon SNS topic, AWS Config watching our Amazon S3 buckets, and a CloudWatch Rule to trigger the Lambda function if a bucket is found to be non-compliant. Let’s test them to make sure they work.
We have an Amazon S3 bucket, myconfigtestbucket, created in the region monitored by AWS Config, along with the associated Lambda function. This bucket has no public read or write access set in an ACL or a policy, so it’s compliant.
Figure 12: The “Config Dashboard”
Let’s change the bucket’s ACL to allow public listing of objects:
Figure 13: Screen shot of “Permissions” tab showing Everyone granted list access
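If you prefer to make this same test change from code rather than in the console, here is a minimal boto3 sketch (using the example bucket name above):

import boto3

s3 = boto3.client("s3")
# The 'public-read' canned ACL grants the AllUsers group READ on the bucket,
# which allows public listing of its objects -- exactly the non-compliant
# state we want AWS Config to catch.
s3.put_bucket_acl(Bucket="myconfigtestbucket", ACL="public-read")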
After saving, the bucket now has public access. After several minutes, the AWS Config Dashboard notes that there is one non-compliant resource:
Figure 14: The “Config Dashboard” shown with a non-compliant resource
In the Amazon S3 Console, we can see that the bucket no longer has public listing of objects enabled after the invocation of the Lambda function triggered by the CloudWatch Rule created earlier.
Figure 15: The “Permissions” tab showing list access no longer allowed
Notice that the AWS Config Dashboard now shows that there are no longer any non-compliant resources:
Figure 16: The “Config Dashboard” showing zero non-compliant resources
Now, let’s try out the Amazon S3 bucket policy check by configuring a bucket policy that allows list access:
Figure 17: A bucket policy that allows list access
A few minutes after setting this bucket policy on the myconfigtestbucket bucket, AWS Config recognizes the bucket is no longer compliant. Because this is a bucket policy rather than an ACL, we publish a notification to the SNS topic we created earlier that lets us know about the potential policy violation:
Figure 18: Notification about potential policy violation
Knowing that the policy allows open listing of the bucket, we can now modify or delete the policy, after which AWS Config will recognize that the resource is compliant.
Conclusion
In this post, we demonstrated how you can use AWS Config to monitor for Amazon S3 buckets with open read and write access ACLs and policies. We also showed how to use Amazon CloudWatch, Amazon SNS, and Lambda to overwrite a public bucket ACL, or to alert you should a bucket have a suspicious policy. You can use the CloudFormation template to deploy this solution in multiple regions quickly. With this approach, you will be able to easily identify and secure open Amazon S3 bucket ACLs and policies. Once you have deployed this solution to multiple regions you can aggregate the results using an AWS Config aggregator. See this post to learn more.
If you have feedback about this blog post, submit comments in the Comments section below. If you have questions about this blog post, start a new thread on the AWS Config forum or contact AWS Support.
Want more AWS Security news? Follow us on Twitter.
If you’re not already familiar with building visualizations for quick access to business insights using Amazon QuickSight, consider this your introduction. In this post, we’ll walk through some common scenarios with sample datasets to provide an overview of how you can connect your data, perform advanced analysis, and access the results from any web browser or mobile device.
The following visualizations are built from the public datasets available in the links below. Before we jump into that, let’s take a look at the supported data sources, file formats and a typical QuickSight workflow to build any visualization.
Which data sources does Amazon QuickSight support?
At the time of publication, you can connect to or import data in the following ways:
Connect to AWS data sources, including:
Amazon RDS
Amazon Aurora
Amazon Redshift
Amazon Athena
Amazon S3
Upload Excel spreadsheets or flat files (CSV, TSV, CLF, and ELF)
Connect to on-premises databases like Teradata, SQL Server, MySQL, and PostgreSQL
Import data from SaaS applications like Salesforce and Snowflake
Use big data processing engines like Spark and Presto
SPICE is the Amazon QuickSight super-fast, parallel, in-memory calculation engine, designed specifically for ad hoc data visualization. SPICE stores your data in a system architected for high availability, where it is saved until you choose to delete it. Improve the performance of database datasets by importing the data into SPICE instead of using a direct database query. To calculate how much SPICE capacity your dataset needs, see Managing SPICE Capacity.
Typical Amazon QuickSight workflow
When you create an analysis, the typical workflow is as follows:
Connect to a data source, and then create a new dataset or choose an existing dataset.
(Optional) If you created a new dataset, prepare the data (for example, by changing field names or data types).
Create a new analysis.
Add a visual to the analysis by choosing the fields to visualize. Choose a specific visual type, or use AutoGraph and let Amazon QuickSight choose the most appropriate visual type, based on the number and data types of the fields that you select.
(Optional) Modify the visual to meet your requirements (for example, by adding a filter or changing the visual type).
(Optional) Add more visuals to the analysis.
(Optional) Add scenes to the default story to provide a narrative about some aspect of the analysis data.
(Optional) Publish the analysis as a dashboard to share insights with other users.
The following graphic illustrates a typical Amazon QuickSight workflow.
Visualizations created in Amazon QuickSight with sample datasets
Data catalog: The World Bank invests in multiple development projects at the national, regional, and global levels. It’s a great source of information for data analysts.
The following graph shows the percentage of the population that has access to electricity (rural and urban) during 2000 in Asia, Africa, the Middle East, and Latin America.
The following graph shows the share of healthcare costs that are paid out-of-pocket (private vs. public). You can also hover over the graph to get detailed statistics at a glance.
Data catalog: The DBG PDS project makes real-time data derived from Deutsche Börse’s trading market systems available to the public for free. This is the first time that such detailed financial market data has been shared freely and continually from the source provider.
The following graph shows the market trend of max trade volume for different EU banks. It builds on the data available on XETRA engines, which is made up of a variety of equities, funds, and derivative securities. This graph can be scrolled to visualize trade for a period of an hour or more.
The following graph shows the common stock beating the rest of the maximum trade volume over a period of time, grouped by security type.
Data catalog: Data derived from different sensor stations placed on the city bridges and surface streets are a core information source. The road weather information station has a temperature sensor that measures the temperature of the street surface. It also has a sensor that measures the ambient air temperature at the station each second.
The following graph shows the present max air temperature in Seattle from different RWI station sensors.
The following graph shows the minimum temperature of the road surface at different times, which helps predict road conditions at a particular time of the year.
Data catalog: Kaggle provides a platform where people can donate open datasets, giving data engineers and other community members open access to them and contributing to the open data movement. It hosts more than 350 datasets in total, with more than 200 featured datasets, including some interesting datasets that aren’t available elsewhere, and it’s a good platform for connecting with other data enthusiasts.
The following graph shows the trending YouTube videos and presents the max likes for the top 20 channels. This is one of the most popular datasets for data engineers.
The following graph shows the YouTube daily statistics for the max views of video titles published during a specific time period.
Data catalog: NYC Open Data hosts some very popular open datasets for all New Yorkers. The platform lets you get involved and dive deep into the datasets to pull out some useful visualizations. The 2016 Green Taxi Trip dataset includes trip records from all trips completed in green taxis in NYC in 2016. Records include fields capturing pick-up and drop-off dates/times, pick-up and drop-off locations, trip distances, itemized fares, rate types, payment types, and driver-reported passenger counts.
The following graph presents the maximum fare amount grouped by passenger count during a period of the day. This can be further expanded to follow different days of the month, based on business need.
The following graph shows the New York taxi data from January 2016, highlighting the dip in the number of taxi rides on January 23, 2016 across all types of taxis.
A quick search for that date and location shows you the following news report:
Summary
Using Amazon QuickSight, you can see patterns across a time-series data by building visualizations, performing ad hoc analysis, and quickly generating insights. We hope you’ll give it a try today!
Karthik Odapally is a Sr. Solutions Architect in AWS. His passion is to build cost effective and highly scalable solutions on the cloud. In his spare time, he bakes cookies and cupcakes for family and friends here in the PNW. He loves vintage racing cars.
Pranabesh Mandal is a Solutions Architect in AWS. He has over a decade of IT experience. He is passionate about cloud technology and focuses on Analytics. In his spare time, he likes to hike and explore the beautiful nature and wildlife of the national parks around the United States alongside his wife.
In today’s guest post, seventh-grade students Evan Callas, Will Ross, Tyler Fallon, and Kyle Fugate share their story of using the Raspberry Pi Oracle Weather Station in their Innovation Lab class, headed by Raspberry Pi Certified Educator Chris Aviles.
United Nations Sustainable Goals
The past couple of weeks in our Innovation Lab class, our teacher, Mr Aviles, has challenged us students to design a project that helps solve one of the United Nations Sustainable Goals. We chose Climate Action. Innovation Lab is a class that gives students the opportunity to learn about where the crossroads of technology, the environment, and entrepreneurship meet. Everyone takes their own paths in innovation and learns about the environment using project-based learning.
Raspberry Pi Oracle Weather Station
For our climate change challenge, we decided to build a Raspberry Pi Oracle Weather Station. Tackling the issues of climate change in a way that helps our community stood out to us because we knew that with the help of this weather station we could send the local data to farmers and fishermen in town. Recent changes in climate have been affecting farmers’ crops. Unexpected rain, heat, and other unusual weather patterns can completely destabilize the natural growth of the plants and destroy their crops altogether. The amount of labour output needed by farmers has also significantly increased, forcing farmers to grow more food with fewer resources. By using our Raspberry Pi Oracle Weather Station to alert local farmers, they can be more prepared and aware of the weather, leading to better crops and safer boating.
Growing teamwork and coding skills
The process of setting up our weather station was fun and simple. Raspberry Pi made the instructions very easy to understand and read, which was very helpful for our team who had little experience in coding or physical computing. We enjoyed working together as a team and were happy to be growing our teamwork skills.
Once we constructed and coded the weather station, we learned that we needed to support the station with PVC pipes. After we completed these steps, we brought the weather station up to the roof of the school and began collecting data. Our information is currently being sent to the Initial State dashboard so that we can share the information with anyone interested. This information will also be recorded and seen by other schools, businesses, and others from around the world who are using the weather station. For example, we can see the weather in countries such as France, Greece and Italy.
Raspberry Pi allows us to build these amazing projects that help us to enjoy coding and physical computing in a fun, engaging, and impactful way. We picked climate change because we care about our community and would like to make a substantial contribution to our town, Fair Haven, New Jersey. It is not every day that kids are given these kinds of opportunities, and we are very lucky and grateful to go to a school and learn from a teacher where these opportunities are given to us. Thanks, Mr Aviles!
To see more awesome projects by Mr Aviles’ class, you can keep up with him on his blog and follow him on Twitter.