Cloud solutions architects should ideally “build today with tomorrow in mind,” meaning their solutions need to cater to current scale requirements as well as the anticipated growth of the solution. This growth can be either the organic growth of a solution or growth driven by a merger and acquisition scenario, where the solution’s size increases dramatically within a short period of time.
Yet when a solution scales, many architects see added complexity in the overall architecture in terms of its manageability, performance, security, and so on. By architecting your solution or application to scale reliably, you can avoid introducing additional complexity, degraded performance, or reduced security as a result of scaling.
Generally, a solution or service’s reliability is influenced by its uptime, performance, security, manageability, and so on. To achieve reliability in the context of scale, take into consideration the following primary design principles.
Modularity
Modularity aims to break a complex component or solution into smaller parts that are less complicated and easier to scale, secure, and manage.
Figure 1: Monolithic architecture vs. modular architecture
Modular design is commonly used in modern application development, where an application’s software is composed of multiple, loosely coupled building blocks (functions). These functions collectively integrate through pre-defined common interfaces or APIs to form the desired application functionality (commonly referred to as a microservices architecture).
This design principle can also be applied to different components of the solution’s architecture. For example, a cloud solution built on a single Amazon VPC may reach certain scaling limits, and its higher level of dependencies makes it harder to introduce changes at scale. This single complex VPC can be divided into multiple smaller and simpler VPCs. The architecture based on multiple VPCs can vary. For example, the VPCs can be divided based on a service or application building block, a specific function of the application, or on organizational functions like a VPC for each department. This principle can also be leveraged at a regional level for very high scale global architectures. You can make the architecture modular at a global level by distributing the multiple VPCs across different AWS Regions to achieve global scale (facilitated by the AWS Global Infrastructure).
In addition, modularity promotes separation of concerns by having well-defined boundaries among the different components of the architecture. As a result, each component can be managed, secured, and scaled independently. It also helps you avoid what is commonly known as “fate sharing,” where a vertically scaled server hosts a monolithic application and any failure of that server impacts the entire application.
Horizontal scaling
Horizontal scaling, commonly referred to as scale-out, is the capability to automatically add systems/instances in a distributed manner in order to handle an increase in load. An example of this increase in load is a growing number of sessions to a web application. With horizontal scaling, the load is distributed across multiple instances. By distributing these instances across Availability Zones, horizontal scaling not only increases performance, but also improves the overall reliability.
In order for the application to work seamlessly in a scale-out distributed manner, it needs to be designed to support a stateless scaling model, where the application’s state information is stored and requested independently of the application’s instances. This makes on-demand horizontal scaling easier to achieve and manage.
This principle can be complemented with the modularity design principle, in which the scaling model is applied only to certain components or microservices of the application stack. For example, you might scale out only the Amazon Elastic Compute Cloud (EC2) front-end web instances that reside behind an Elastic Load Balancing (ELB) layer, using Auto Scaling groups. In contrast, this elastic horizontal scalability can be very difficult to achieve for a monolithic type of application.
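As a rough illustration of this pattern, the sketch below uses the AWS CDK in TypeScript to place a stateless EC2 web tier in an Auto Scaling group behind an Application Load Balancer. It is illustrative only, not taken from the original post; the stack names, instance size, capacity bounds, and scaling target are assumptions you would tune for your own workload.

```typescript
import { App, Stack } from "aws-cdk-lib";
import * as ec2 from "aws-cdk-lib/aws-ec2";
import * as autoscaling from "aws-cdk-lib/aws-autoscaling";
import * as elbv2 from "aws-cdk-lib/aws-elasticloadbalancingv2";

const app = new App();
const stack = new Stack(app, "WebTierStack");

// A VPC spread across multiple Availability Zones for resilience.
const vpc = new ec2.Vpc(stack, "WebVpc", { maxAzs: 3 });

// Stateless web tier: instances can be added or removed freely because
// session state is stored outside the instances (for example in a cache or database).
const webAsg = new autoscaling.AutoScalingGroup(stack, "WebAsg", {
  vpc,
  instanceType: ec2.InstanceType.of(ec2.InstanceClass.T3, ec2.InstanceSize.MICRO),
  machineImage: new ec2.AmazonLinuxImage(),
  minCapacity: 2,
  maxCapacity: 10,
});

// Scale out and in based on average CPU utilization.
webAsg.scaleOnCpuUtilization("CpuScaling", { targetUtilizationPercent: 60 });

// Distribute incoming traffic across the instances and Availability Zones.
const alb = new elbv2.ApplicationLoadBalancer(stack, "WebAlb", {
  vpc,
  internetFacing: true,
});
alb.addListener("Http", { port: 80 }).addTargets("WebTargets", {
  port: 80,
  targets: [webAsg],
});

app.synth();
```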
Leverage the content delivery network
Leveraging Amazon CloudFront and its edge locations as part of the solution architecture can enable your application or service to scale rapidly and reliably at a global level, without adding any complexity to the solution. The integration of a CDN can take different forms depending on the solution use case.
For example, CloudFront played an important role in enabling the scale required throughout Amazon Prime Day 2020, serving web and streamed content to a worldwide audience and handling over 280 million HTTP requests per minute.
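In its simplest form, this kind of integration can be as small as putting a CloudFront distribution in front of an existing HTTP origin. The AWS CDK sketch below (TypeScript) is illustrative only; the origin hostname and the chosen policies are assumptions, not part of the original example.

```typescript
import { App, Stack } from "aws-cdk-lib";
import * as cloudfront from "aws-cdk-lib/aws-cloudfront";
import * as origins from "aws-cdk-lib/aws-cloudfront-origins";

const app = new App();
const stack = new Stack(app, "EdgeStack");

// Serve an existing HTTP origin (hypothetical hostname) through CloudFront's
// edge locations so repeat requests are answered close to the viewer.
new cloudfront.Distribution(stack, "SiteDistribution", {
  defaultBehavior: {
    origin: new origins.HttpOrigin("origin.example.com"),
    cachePolicy: cloudfront.CachePolicy.CACHING_OPTIMIZED,
    viewerProtocolPolicy: cloudfront.ViewerProtocolPolicy.REDIRECT_TO_HTTPS,
  },
});

app.synth();
```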
Go serverless where possible
As discussed earlier in this post, modular architectures based on microservices reduce the complexity of the individual component or microservice. At scale, however, they may introduce a different type of complexity related to the number of these independent components (microservices). This is where serverless services can help to reduce such complexity reliably and at scale. With this design model you no longer have to provision, manually scale, or maintain servers, operating systems, or runtimes to run your applications.
For example, you may consider using a microservices architecture to modernize an application while simplifying the architecture at scale by using Amazon Elastic Kubernetes Service (EKS) with AWS Fargate.
Figure 3: Example of a serverless microservices architecture
Secure the solution
To avoid any major changes at a later stage to accommodate security requirements, it’s essential that security is taken into consideration as part of the initial solution design. For example, if a cloud project is new or small and security isn’t properly considered at the initial stages, then once the solution starts to scale, redesigning the entire cloud project from scratch to accommodate security best practices is usually not a simple option. This may force you to adopt suboptimal security solutions that limit the scale you want to achieve. By leveraging a CDN such as Amazon CloudFront as part of the solution architecture (as discussed above), you can minimize the impact of distributed denial of service (DDoS) attacks as well as perform application-layer filtering at the edge. Also, when considering serverless services and the Shared Responsibility Model, from a security lens you can delegate a considerable part of the application stack to AWS so that you can focus on building applications. See The Shared Responsibility Model for AWS Lambda.
Design with security in mind by incorporating the necessary security services as part of the initial cloud solution. This will allow you to add more security capabilities and features as the solution grows, without the need to make major changes to the design.
Design for failure
The reliability of a service or solution in the cloud depends on multiple factors, the primary of which is resiliency. This design principle becomes even more critical at scale because the magnitude of the failure impact is typically higher. Therefore, to achieve reliable scalability, it is essential to design a resilient solution capable of recovering from infrastructure or service disruptions. This principle involves designing the overall solution in such a way that even if one or more of its components fail, the solution is still capable of providing an acceptable level of its expected function(s). See AWS Well-Architected Framework – Reliability Pillar for more information.
Conclusion
Designing for scale alone is not enough. Reliable scalability should always be the targeted architectural attribute. The design principles discussed in this blog act as the foundational pillars to support it, and ideally should be combined with adopting a DevOps model.
The Workers team here at Cloudflare has been hard at work shipping a bunch of new features in the last year and we’ve seen some amazing things built with the tools we’ve provided. However, as my uncle once said, with great serverless platform growth comes great responsibility.
One of the ways we can help is by ensuring that deploying and maintaining your Workers scripts is a low-risk endeavor. Rotating a set of API keys shouldn’t require risking downtime through code edits and redeployments, and in some cases it may not make sense for the developer writing the script to know the actual API key value at all. To help tackle this problem, we’re releasing Secrets and Environment Variables to the Wrangler CLI and Workers Dashboard.
Supporting secrets
As we started to design support for secrets in Workers we had a sense that this was already a big concern for a lot of our users, but we wanted to learn about all of the use cases to ensure we were building the right thing. We headed to the community forums, Twitter, and the inbox of Louis Grace, business development representative extraordinaire, for some anecdotes about Secrets usage. We also sent out a survey to our existing users to learn about use cases and pain points.
We learned that even though there was already a way to store secrets without exposing them via Workers KV, the solution was not very intuitive, nor did it meet all the needs of our users. Many users didn’t even know we had an interim solution in place. Recognizing that we were not the first platform to encounter this problem, we surveyed the existing landscape of Platform as a Service offerings to get a better sense for what our users would expect of us.
Deciding on a solution
One of the first things we found was that not all environment variables are created equal. While the simplest use case for having a defined environment variable may be storing a piece of text that can be updated no matter where it is referenced in a script, sometimes those variables may have higher stakes associated with them. If you’re storing an API key that controls access to an important system, you may not want to allow anyone with dashboard access to see it, maybe not even the developers themselves.
With this in mind, we had to ensure the feature covered two different use cases: one for storing variables in plain text, where you could see the variable being referenced and make edits to it, and another where the variable would be encrypted as soon as you save it, never to be seen again. This way, we were able to serve both needs of our users, side by side, without one compromising the other.
Testing our prototypes
Once we had a fairly good idea of what we wanted to build, we built some prototypes and rough implementations in staging environments so we would be able to perform some usability testing. We wrangled up some developers and observed them as they performed a series of tasks where they were asked to add some secrets and plain-text environment variables, reference them in one of their Workers, and bind their Worker to a Worker KV namespace.
Along the way we also asked questions to understand each developer’s professional background, familiarity with the product, and the use cases they’ve had for using Workers in the past, along with any pain points they experienced.
While we were testing the new dashboard interface we also began testing the usability of the Wrangler CLI. We had Wrangler users perform the same tasks as the Workers dashboard users to help us find out if users are expecting different things out of their command-line tooling.
Findings and fixes
Through our testing we were able to make a number of changes before the final release. Some of the smaller changes included things like adjusting the behavior of form fields to ensure users knew which variable would be associated with each value. We also made larger changes like electing to separate the KV namespace bindings from the other environment variables as a way to emphasize that KV namespace bindings are not the keys and values themselves but a reference to a namespace where those keys are stored.
Cina, one of our engineers, put together a proposal to align some of our terminology with the terms our developers were naturally using to describe their workflow. In Wrangler, users were accustomed to referencing their KV namespaces by adding a KV namespace binding, so when they came to the Workers dashboard interface and saw a field called “KV Variables,” they were often confused, thinking they were adding keys and values to the namespace itself instead of establishing a variable that could be used to reference the namespace. As a fix, we decided to call it a “KV namespace binding” throughout the experience.
Try it out
Environment variables are available now with the Wrangler CLI and in the Workers Dashboard so go ahead and give them a shot today!
Adding a secret with Wrangler
Managing environment variables and KV bindings in the Workers Dashboard
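For reference, here is a minimal sketch of how a Worker consumes both kinds of values. It uses the current ES module syntax, where bindings arrive on an env object (at the time this feature shipped, they appeared as globals in the service-worker syntax), and the GREETING variable and API_KEY secret are hypothetical names, created with a [vars] entry in wrangler.toml and with `wrangler secret put API_KEY` respectively.

```typescript
// Hypothetical bindings: GREETING is a plain-text variable (defined under [vars]
// in wrangler.toml); API_KEY is a secret (uploaded with `wrangler secret put API_KEY`).
export interface Env {
  GREETING: string;
  API_KEY: string;
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    // The secret is only used server-side and is never echoed back to the client.
    const upstream = await fetch("https://api.example.com/status", {
      headers: { Authorization: `Bearer ${env.API_KEY}` },
    });
    return new Response(`${env.GREETING}: upstream responded with ${upstream.status}`);
  },
};
```

Rotating the key then becomes another `wrangler secret put`, with no edits to the script itself.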
As we continue to build out the Workers platform we’d love to hear from you. Let us know if you’re interested in participating in user research, or reach out if you just have something to say.
Color is my day-long obsession, joy and torment – Claude Monet
Over the last two years we’ve tried to improve our usage of color at Cloudflare. There were a number of forcing functions that made this work a priority. As a small team of designers and engineers we had inherited a bunch of design work that was a mix of values built by multiple teams. As a result it was difficult and unnecessarily time consuming to add new colors when building new components.
We also wanted to improve our accessibility. While we were doing pretty well, we had room for improvement, largely around how we used green. As our UI is increasingly centered around visualizations of large data sets we wanted to push the boundaries of making our analytics as visually accessible as possible.
Cloudflare had also undergone a rebrand around 2016. While our marketing site had rolled out an updated set of visuals, our product UI as well as a number of existing web properties were still using various versions of our old palette.
Cloudflare.com in 2016 & Cloudflare.com in 2019
Our product palette wasn’t well balanced by itself. Many colors had been chosen one or two at a time. You can see how we chose blueberry, ice, and water at a different point in time than marine and thunder.
The color section of our theme file was partially ordered chronologically
Lacking visual cohesion within our own product, we definitely weren’t providing a cohesive visual experience between our marketing site and our product. The transition from the nice blues and purples to our green CTAs wasn’t the streamlined experience we wanted to afford our users.
Cloudflare.com 2017 & Sign in page 2017
Our app dashboard in 2017
Reworking our Palette
Our first step was to audit what we already had. Cloudflare has been around long enough to have more than one website. Beyond cloudflare.com we have dozens of web properties that are publicly accessible. From our community forums, support docs, blog, status page, to numerous micro-sites.
All-in-all we have dozens of front-end codebases that each represent one more chance to introduce entropy to our visual language. So we were curious to answer the question – what colors were we currently using? Were there consistent patterns we could document for further reuse? Could we build a living style guide that didn’t cover just one site, but all of them?
Screenshots of pages from cloudflare.com contrasted with screenshots from our product in 2017
Our curiosity got the best of us and we went about exploring ways we could visualize our design language across all of our sites.
As we first started to identify the scale of our color problems, we tried to think outside the box about how we might explore the problem space. After an initial brainstorming session we combined the Internet Archive’s Wayback Machine with the CSS Stats API to build an audit tool that shows how our various websites’ visual properties change over time. We can dynamically select which sites we want to compare and scrub through time to see changes.
Below is a visualization of palettes from 9 different websites changing over a period of 6 years. Above the palettes is a component that spits out the colors common to all of these sites. The only two common colors across all properties (appearing for only a brief flash) were #ffffff (white) and transparent. Over time we haven’t been very consistent with ourselves.
If we drill in to look at our marketing site compared to our dashboard app – it looks like the video below. We see a bit more overlap at first and then a significant divergence at the 16 second mark when our product palette grew significantly. At the 22 second mark you can see the marketing palette completely change as a result of the rebrand while our product palette stays the same. As time goes on you can see us becoming more and more inconsistent across the two code bases.
As a product team we had some catching up to do to improve our usage of color and to align ourselves with the company brand. The good news was, there was nowhere to go but up.
This style of historical audit gives us a visual indication with real data. We can visualize for stakeholders how consistent and similar our usage of color is across products, and whether we are getting better or worse over time. Having this type of feedback loop was invaluable for us, as auditing this manually is incredibly time consuming and therefore often doesn’t get done. Hopefully in the future, just as it’s standard to track various performance metrics over time at a company, it will be standard to be able to visualize your current levels of design entropy.
Picking colors
After our initial audit revealed there wasn’t a lot of consistency across sites, we went to work to try and construct a color palette that could potentially be used for sites the product team owned. It was time to get our hands dirty and start “picking colors.”
Hindsight of course is always 20/20. We didn’t start out on day one trying to generate scales based on our brand palette. No, our first bright idea was to generate the entire palette from a single color.
Our logo is made up of two oranges. Both of these seemed like prime candidates to generate a palette from.
We played around with a number of algorithms that took a single color and created a palette. From the initial color we generated an array of scales, one for each hue. Initial attempts found us applying the exact same luminosity curve to each hue, but as visual perception of hues is so different, this resulted in wildly different contrasts at each step of the scale.
We can see the intensity of the colors change across hues in peaks, with yellow and green being the most dominant. One of the downsides of this is that when you are rapidly iterating through theming options, the inconsistent relationships between steps across hues can make it time consuming or impossible to keep visual harmony in your interface.
The video below is another way to visualize this phenomenon. The dividing line in the color picker indicates which part of the palette will be accessible with black and white. Notice how drastically the line changes around green and yellow. And then look back at the charts above.
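In spirit, those early generative attempts looked something like the sketch below (illustrative only, not our actual code): a single lightness curve reused for every hue, which is exactly what produces the uneven contrast described above.

```typescript
// Naive scale generation: reuse one lightness curve for every hue. Because
// perceived brightness differs so much between hues (yellow vs. blue, for
// example), equal HSL lightness steps do not produce equal contrast steps.
const LIGHTNESS_CURVE = [0.95, 0.85, 0.75, 0.65, 0.55, 0.45, 0.35, 0.25, 0.15, 0.08];

function hsl(hue: number, saturation: number, lightness: number): string {
  return `hsl(${hue}, ${Math.round(saturation * 100)}%, ${Math.round(lightness * 100)}%)`;
}

// Build a ten-step scale for each hue, all sharing the same curve.
function generatePalette(hues: number[], saturation = 0.7): Record<number, string[]> {
  const palette: Record<number, string[]> = {};
  for (const hue of hues) {
    palette[hue] = LIGHTNESS_CURVE.map((l) => hsl(hue, saturation, l));
  }
  return palette;
}

// e.g. red, yellow, green, blue, violet
console.log(generatePalette([0, 60, 120, 240, 280]));
```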
After fiddling with a few different generative algorithms (we made a lot of ugly palettes…) we decided to try a more manual approach. We pursued creating custom curves for each hue in an effort to keep the contrast scales as optically balanced as possible.
Heavily muted palette
Generating different color palettes makes you confront a basic question. How do you tell if a palette is good? Are some palettes better than others? In an effort to answer this question we constructed various feedback loops to help us evaluate palettes as quickly as possible. We tried a few methods to stress test a palette. At first we attempted to grab the “nearest color” for a bunch of our common UI colors. This wasn’t always helpful as sometimes you actually want the step above or below the closest existing color. But it was helpful to visualize for a few reasons.
Generated palette above a set of components previewing the old and new palette for comparison
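The “nearest color” lookup itself is simple; a generic sketch is below (not our production tooling). It measures distance in RGB space for brevity, though a perceptual space such as CIELAB would match human judgement more closely, and the hex values in the usage comment are placeholders rather than real palette entries.

```typescript
type RGB = [number, number, number];

function hexToRgb(hex: string): RGB {
  const n = parseInt(hex.replace("#", ""), 16);
  return [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
}

// Euclidean distance between two colors in RGB space.
function distance(a: RGB, b: RGB): number {
  return Math.hypot(a[0] - b[0], a[1] - b[1], a[2] - b[2]);
}

// Return the palette entry closest to the target color.
export function nearestColor(target: string, palette: string[]): string {
  const t = hexToRgb(target);
  let best = palette[0];
  let bestDistance = Infinity;
  for (const candidate of palette) {
    const d = distance(t, hexToRgb(candidate));
    if (d < bestDistance) {
      bestDistance = d;
      best = candidate;
    }
  }
  return best;
}

// nearestColor("#2f7bbf", ["#0051c3", "#3b82f6", "#16a34a"]) === "#3b82f6"
```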
Sometime during our exploration in this space, we stumbled across this tweet thread about building a palette for pixel art. There are a lot of places web and product designers can draw inspiration from game designers.
Here we see a similar concept where a number of different palettes are applied to the same component. This view shows us two things, the different ways a single palette can be applied to a sphere, and also the different aesthetics across color palettes.
It’s almost surprising that the default way to construct a color palette for apps and sites isn’t to build it while previewing its application against the most common UI patterns. As designers, there are a lot of consistent uses of color we could have baselines for. Many enterprise apps are centered around a white background with blue as the primary color, with mixtures of grays to add depth around cards and page sections. Red is often used for destructive actions like deleting some type of record. Gray for secondary actions. Maybe it’s an outline button in the primary color for secondary actions. Either way – the margins between the patterns aren’t that large in the grand scheme of things.
Consider the use case of designing UI while the palette or usage of color hasn’t been established. Given a single palette, you might want to experiment with applying that palette in a variety of ways that will output a wide variety of aesthetics. Alternatively you may need to test out several different palettes. These are two different modes of exploration that can be extremely time consuming to work through. It can be non-trivial to keep an in-progress design synced with several different options for color application, even with the best use of layer comps or symbols.
How do we visualize the various ways a palette will look when applied to an interface? Here are examples of how palettes are shown on a palette list for pixel artists.
One method of visualization is to define a common set of primitive ui elements and show each one of them with a single set of colors applied. In isolation this can be helpful. This mode would make it easy to vet a single combination of colors and which ui elements it might be best applied to.
Alternatively we might want to see a composed interface with the closest colors from the palette applied. Consider a set of buttons that includes red, green, blue, and gray button styles. Seeing all of these together can help us visualize the relative nature of these buttons side by side. Given a baseline palette for common UI, we could swap to a new palette and replace each color with the “closest” color. This isn’t always a foolproof solution as there are many edge cases to cover, e.g. what happens when replacing a palette of 134 colors with a palette of 24 colors? Even still, this could allow us to quickly take a stab at automating how existing interfaces would change their appearance given a change to the underlying system. Whether locally or against a live site, this mode of working would allow designers to view a color in multiple contexts to truly assess its quality.
After moving on from the idea of generating a palette from a single color, we attempted to use our logo colors as well as our primary brand colors to drive the construction of modular scales. Our goal was to create a palette that would improve contrast for accessibility, stay true to our visual brand, work predictably for developers, work for data visualizations, and provide the ability to design visually balanced and attractive interfaces. No sweat.
Brand colors showing Hue and Saturation level
While we knew going in we might not use every step in every hue, we wanted full coverage across the spectrum so that each hue had a consistent optical difference between each step. We also had no idea which steps across which hues we were going to need just yet. As they would just be variables in a theme file it didn’t add any significant code footprint to expose the full generated palette either.
One of the more difficult parts was deciding on the number of steps for the scales. This would allow us to edit the palette in the future to achieve a variety of aesthetics and swap the palette out at the theme level without needing to update anything else.
In the future, if we did need to augment the available colors, we could edit the entire palette instead of adding a one-off addition, as we had found that one-off additions were a difficult way to work over time. In addition to our primary brand colors, we also explored adding scales for yellow/gold, violet, and teal, as well as a gray scale.
The first interface we built for this work was to output all of the scales vertically, with their contrast scores with both white and black on the right hand side. To aid scannability we bolded the values that were above the 4.5 threshold. As we edited the curves, we could see how the contrast ratios were affected at each step. Below you can see an early starting point before the scales were balanced. Red has 6 accessible combos with white, while yellow only has 1. We initially explored having the gray scale be larger than the others.
Early iteration of palette preview during development
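The contrast scores come from the standard WCAG 2.x formula rather than anything bespoke; a small generic version in TypeScript looks roughly like this (a sketch, not our build tooling, and the example hex color is a placeholder):

```typescript
// WCAG 2.x relative luminance for an sRGB channel value in the range 0-255.
function linearize(channel: number): number {
  const c = channel / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const [r, g, b] = [(n >> 16) & 0xff, (n >> 8) & 0xff, n & 0xff];
  return 0.2126 * linearize(r) + 0.7152 * linearize(g) + 0.0722 * linearize(b);
}

// Contrast ratio between two colors, always >= 1.
export function contrastRatio(a: string, b: string): number {
  const [lighter, darker] = [relativeLuminance(a), relativeLuminance(b)].sort((x, y) => y - x);
  return (lighter + 0.05) / (darker + 0.05);
}

// A step gets bolded in the preview when it clears 4.5:1 against white or black.
const accessibleOnWhite = contrastRatio("#ffffff", "#0051c3") >= 4.5;
```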
As both screen luminosity and ambient light can affect perception of color we developed on two monitors, one set to maximum and one set to minimum brightness levels. We also replicated the color scales with a grayscale filter immediately below to help illustrate visual contrast between steps AND across hues. Bouncing back and forth between the grayscale and saturated version of the scale serves as a great baseline reference. We found that going beyond 10 steps made it difficult to keep enough contrast between each step to keep them distinguishable from one another.
Monitors set to different luminosity & a close up of the visual difference
Taking a page from our game design friends – as we were balancing the scales and exploring how many steps we wanted in the scales, we were also stress testing the generated colors against various analytics components from our component library.
Our slightly random collection of grays had been a particular pain point as they appeared muddy in a number of places within our interface. For our new palette we used the slightest hint of blue to keep our grays consistent and just a bit off from being purely neutral.
Optically balanced scales
With a palette consisting of 90 colors, the amount of combinations and permutations that can be applied to data visualizations is vast and can result in a wide variety of aesthetic directions. The same palette applied to both line and bar charts with different data sets can look substantially different, enough that they might not be distinguishable as being the exact same palette. Working with some of our engineering counterparts, we built a pipeline that would put up the same components rendered against different data sets, to simulate the various shapes and sizes the graph elements would appear in. This allowed us to rapidly test the appearance of different palettes. This workflow gave us amazing insights into how a palette would look in our interface. No matter how many hours we spent staring at a palette, we couldn’t get an accurate sense of how the colors would look when composed within an interface.
Full theme minus green & blues and oranges
Grayscale example & monochromatic Indigo
Analytics charts with blues and purples & analytics charts with blues and greens
Analytics charts with blues and oranges. Telling the colors of the lines apart is a different visual experience than separating out the dots in sequential order as they appear in the legend.
We experimented with a number of ideas on visualizing different sizes and shapes of colors and how they affected our perception of how much a color was changing element to element. In the first frame it is most difficult to tell the values at 2% and 6% apart given the size and shape of the elements.
Stress testing the application of a palette to many shapes and sizes
We’ve begun to package up some of this work into a web app others can use to create or import a palette and preview multiple depths of accessible combinations against a set of UI elements.
In an effort to make sure everything we are building will be visually accessible – we built a react component that will preview how a design would look if you were colorblind. The component overlays SVG filters to simulate alternate ways someone can perceive color.
Analytics component previewed against 8 different types of color blindness
While this is previewing an analytics component, really any component or page can be previewed with this method.
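The underlying technique is small enough to sketch. The component below is an illustrative stand-in for ours, not the real one: it wraps whatever it is given in a CSS filter that points at an SVG feColorMatrix, here using a commonly cited approximation of protanopia (the matrix values are an approximation, not an exact physiological model).

```tsx
import React from "react";

// Commonly used approximation of protanopia as a 5x4 color matrix.
const PROTANOPIA_MATRIX =
  "0.567 0.433 0     0 0 " +
  "0.558 0.442 0     0 0 " +
  "0     0.242 0.758 0 0 " +
  "0     0     0     1 0";

export function ColorBlindnessPreview({ children }: { children: React.ReactNode }) {
  return (
    <>
      {/* A zero-sized SVG that only exists to host the filter definition. */}
      <svg width="0" height="0" aria-hidden="true">
        <filter id="protanopia">
          <feColorMatrix type="matrix" values={PROTANOPIA_MATRIX} />
        </filter>
      </svg>
      {/* Anything rendered inside is previewed through the filter. */}
      <div style={{ filter: "url(#protanopia)" }}>{children}</div>
    </>
  );
}
```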
After quite a few iterations, we had finally come up with a color palette. Each scale was optically aligned with our brand colors. The 5th step in each scale is the closest to the original brand color, but adjusted slightly so it’s accessible with both black and white.
Our preview panel for palette development, showing a fully desaturated version of the palette for reference
A visual representation of how the legacy palette colors would translate to the new scales.
Getting a team of people to agree on a new color palette is a journey in and of itself. By the time you get everyone to consensus it’s tempting to just collapse into a heap and never think about colors ever again. Unfortunately the work doesn’t stop at this point. Now that we’ve picked our palette, it’s time to get it implemented so this bike shed is painted once and for all.
If you are porting an old legacy part of your app to be updated to the new style guide like we were, even the best color documentation can fall short in helping someone make the necessary changes.
We found it was more common than expected for engineers and designers to want to know the new equivalent of a color they were already familiar with. During the transition between palettes, we had an interface where people could input any color and get the closest color within our new palette.
There are times when migrating colors that the closest color isn’t actually what you want. Given the scenario where your brand color has changed from blue to purple, you might want to port all blues to the closest purple within the palette, not the closest blue, which might still exist in your palette. To help visualize migrations as well as get suggestions on how to consolidate values within the old scale, we of course built a little tool. Here we can define those translations and import a color palette from a URL. As we still have a number of web properties to update to our palette, this simple tool has continued to prove useful.
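Conceptually the tool boils down to a lookup table of explicit translations with a nearest-color fallback, something like the sketch below (the hex values are placeholders, and nearestColor refers to the helper sketched earlier, assumed here to live in a local module).

```typescript
import { nearestColor } from "./nearest-color"; // the helper sketched earlier

// Explicit translations win, e.g. a legacy blue that should become a brand purple.
const explicitTranslations: Record<string, string> = {
  "#2f7bbf": "#6b46c1",
};

export function migrateColor(legacyColor: string, newPalette: string[]): string {
  const key = legacyColor.toLowerCase();
  return explicitTranslations[key] ?? nearestColor(key, newPalette);
}
```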
We wanted to be as gentle as possible in transitioning usage to the new palette. While the developers found string names for colors brittle and unpredictable, it was still a more familiar system for some than this new one. We first added our new palette to the existing theme for usage moving forward. Then we started to port colors for existing components and pages.
For our colleagues, we wrote out desired translations and offered warnings in the console that a color was deprecated, with a reference to the new theme value to use.
Example of console warning when using deprecated color
Example of how to check for usage of deprecated values
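One way to implement that kind of warning, sketched here rather than copied from our theme, is to wrap the color object in a Proxy so reads of a legacy name still work but log the replacement (the mapping values below are hypothetical scale steps):

```typescript
// Legacy color names mapped to the theme value that replaces them.
const deprecatedColors: Record<string, string> = {
  blueberry: "indigo.5", // hypothetical new scale step
  marine: "blue.9",
};

export function withDeprecationWarnings<T extends Record<string, string>>(colors: T): T {
  return new Proxy(colors, {
    get(target, prop, receiver) {
      if (typeof prop === "string" && prop in deprecatedColors) {
        console.warn(
          `Color "${prop}" is deprecated; use theme color "${deprecatedColors[prop]}" instead.`
        );
      }
      return Reflect.get(target, prop, receiver);
    },
  });
}
```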
While we had a few bugs along the way, the team was supportive and helped us fix bugs almost as quickly as we could find them.
We’re still in the process of updating our web properties with our new palette, largely prioritizing accessibility first while trying to create a more consistent visual brand as a nice by-product of the work. A small example of this is our system status page. In the first image, the blue links in the header, the green status bar, and the about copy were all inaccessible against their backgrounds.
cloudflarestatus.com in 2017 & cloudflarestatus.com in 2019 with an accessible green
A lot of the changes have been subtle. Most notably, the green we use in the dashboard is a lot more in line with our brand colors than before. In addition, we’ve also been able to add visual balance by not just using straight black text on background colors. Here we added one of the darker steps from the corresponding scale to give it a bit more visual balance.
Notifications 2017 – Black text & Notifications 2018 – Updated palette. Text is set to 1st step in corresponding scale of background color
Example page within our Dashboard in 2017 vs 2019
While we aren’t perfect yet, we’re making progress towards more visual cohesion across our marketing materials and products.
2017
Cloudflare.com & Sign in page 2017
Our app dashboard in 2017
2019
Cloudflare homepage & updated Dashboard in 2019
Next steps
Trying to keep dozens of sites all using the same palette in a consistent manner across time is a task that you can never complete. It’s an ongoing maintenance problem. Engineers familiar with the color system leave, new engineers join and need to learn how the system works. People still launch sites using a different palette that doesn’t meet accessibility standards. Our work continues to be cut out for us. As they say, a garden doesn’t tend itself.
If we do ever revisit our brand colors, we’re excited to have infrastructure in place to update our apps and several of our satellite sites with significantly less effort than our first time around.
Resources
Some of our favorite materials and resources we found while exploring this problem space.
Southeast Asia is home to around 650 million people from diverse and often contrasting economic, political and social backgrounds. Many people in the region today rely on super apps like Grab to earn a daily living or get from A to B more efficiently and safely. This means that the decisions we make have a real impact on people’s lives – so how do you know when your decisions are right or wrong?
In this post, I’ll share key customer insights that have guided my decisions and informed my design thinking over the last year whilst working as a product designer for Grab in Singapore. I’ve broken my learnings down into 3 key areas for thinking about product development and how each one addressed our customers’ needs.
Relevance – does the design solve the customer problem? For example, loss of connectivity which is common in Southeast Asia should not completely prevent a customer from accessing the content on our app.
Inclusivity – does the design consider the full range of customer diversity? For example, a driver waiting in the hot sun for his passenger can still use the product. Inclusive design covers people with a range of perspectives, disabilities and environments.
Engagement – does the design invoke a feeling of satisfaction? For example, building a compelling narrative around your product that solicits a higher engagement.
Under each of these areas, I’ll elaborate on how we’ve built empathy from customer insights and applied these to our design thinking.
But before jumping in, think about the lens which frames any customer experience – the mobile device. In Southeast Asia, the commonly used devices are inexpensive low-end devices made by OPPO, Xiaomi, and Samsung. Knowing which devices customers use helps us understand potential performance constraints, different screen resolutions, and custom Android UIs.
Designing for relevance
Shopping mall in Medan, Indonesia
Connectivity
In Southeast Asia, it’s not too hard to find public WiFi. However, the main challenge is finding a reliable network. Take this shopping mall in Medan, Indonesia. The WiFi connectivity didn’t live up to the modern infrastructure of the building. The locals knew this and used mobile data over the spotty and congested WiFi. Mobile data is the norm for most people and 4G reach is high, but the strength of the connections is relatively low.
Building empathy
To genuinely design for customers’ needs, designers at Grab regularly get out of the office to understand what people are doing in the real world. But how do we integrate empathy and compassion into the design process? Throughout this article, I’ll explain how the insights we gathered from around Southeast Asia can inform your decision-making process.
To simulate a loss of connectivity, switch to airplane mode to observe current UI states and limitations. If you have the resources, create a 2G network to compare how bandwidth constrains page loading speeds. Network Link Conditioner for Mac and iOS or Lighthouse by Chrome DevTools can replicate a slower network.
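If you prefer something scriptable, Lighthouse can also be driven from Node with throttling settings applied. The sketch below is a rough example, not an official recipe: the URL is a placeholder, the throttling numbers are assumed 3G-like values, and the option names follow recent Lighthouse versions, so check them against the version you use.

```typescript
import lighthouse from "lighthouse";
import * as chromeLauncher from "chrome-launcher";

// Launch headless Chrome and run Lighthouse with DevTools throttling applied,
// using conservative, 3G-like numbers to mimic a slow, congested connection.
const chrome = await chromeLauncher.launch({ chromeFlags: ["--headless"] });
const result = await lighthouse("https://example.com", {
  port: chrome.port,
  onlyCategories: ["performance"],
  throttlingMethod: "devtools",
  throttling: {
    rttMs: 300,
    throughputKbps: 400,
    requestLatencyMs: 300,
    downloadThroughputKbps: 400,
    uploadThroughputKbps: 200,
    cpuSlowdownMultiplier: 4,
  },
});

console.log("Performance score:", result?.lhr.categories.performance.score);
await chrome.kill();
```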
Design implications
This diagram is from Scott Hurff’s book, Designing Products People Love. The book is amazing, but if you don’t have the time to read it, this article offers a quick overview.
Scott Hurff’s UI Stack
An ideal state (the fully loaded experience) is primarily what a lot of designers think about when problem-solving. However, when connectivity is a common customer pain point, designers at Grab have to design for the less desirable Blank, Loading, Partial, and Error states in tandem with all the happy paths. Latency can make or break the user experience, so buffer wait times with visual progress to cushion each millisecond. Loading skeletons, shown when you open Grab, act as momentary placeholders for content and reduce the perceived latency of loading the full experience.
A loss of connectivity shouldn’t mean the end of your product’s experience. Prepare for connectivity issues by keeping screens alive through intuitive visual cues, messaging, and cached content.
Device type and condition
In Southeast Asia, people tend to opt for low-end or hand-me-down devices that can sometimes have cracked screens or depleting batteries. These devices are usually in circulation much longer than in developed markets, and the device’s OS might not be the latest version because of the perceived effort or risk to update.
A driver’s device taken during research in Indonesia
Building empathy
At Grab, we often use a range of popular, in-market devices to understand compatibility during the design process. Installing mainstream apps to a device with a small screen size, 512MB internal memory, low resolution and depleting battery life will provide insights into performance. If these apps have lite versions or Progressive Web Apps (PWA), try to understand the trade-offs in user experience compared to the parent app.
Grab’s passenger app on the left versus the driver app
Design implications
Design for small screens first to reduce the chances of design debt later in the development lifecycle. For interactive elements, it’s important to think about all types of customers that will use the product and in what circumstances. For Grab’s driver-partners who may have their devices mounted to the dashboard, tap targets need to be larger and more explicit.
Similarly, color contrast will vary depending on screen resolution and time of the day. Practical tests involve dimming the screen and standing near a window in bright sunshine (our HQ is in Singapore which helps!). To further improve accessibility, use a tool like Sketch’s Stark plugin to understand if contrast ratios are accessible to visually impaired customers. A general rule is to aim for higher contrast between essential UI components, text and interactive affordances.
Fancy transitions can look great on high-end devices but can appear choppy or delayed on older and less performant phones. Aim for simple animations to offer a more seamless experience.
Passenger verification to improve safety
Day-to-day budgeting
Many people in Southeast Asia earn a daily income, so it’s no surprise that prepaid mobile is more common than a monthly contract. This mindset of rationing on a day-to-day basis also extends to other essentials like washing powder and nappies. Data can be an expensive necessity, and customers are selective about the types of content that will consume a daily or weekly budget. Some customers might turn off data after getting a ride, and not turn it back on until another Grab service is required.
Building empathy
Rationing data consumption daily can be simulated by not connecting to WiFi, or, in a more granular way, by turning off WiFi and using an app like Google’s Datally on Android to cap data usage. Starting low, at around 50MB per day, will increase your understanding of the data trade-offs you make and highlight the apps that require more data to perform certain actions.
Design implications
Where possible, avoid using video when SVG animations can be just as effective, scalable and lightweight. For Grab’s passenger verification flow, we decided to move away from a video tutorial and keep data consumption to a minimum through utilising SVG animations. When a video experience is required, like Grab’s feed on the home screen, disabling autoplay and clearly distinguishing the media as video allowed customers to decide on committing data.
Design for inclusivity
Mobile-only
The expression “mobile-first” has been bounced around for the last decade, but in Southeast Asia, “mobile-only” is probably more accurate. Most customers have never owned a tablet or laptop, and mobile numbers are more synonymous with registration than email addresses. In the region, people rely more on social media and chat apps to understand broadcast or published news reports, events and recommendations. Customers who sign up for a new Grab account prefer phone number and OTP (one-time-password) registration over providing an email address and password. And anecdotally, from interviews conducted at Grab, customers didn’t feel the need for email when communication can take place via SMS, WhatsApp, or other messaging apps.
Building empathy
At Grab, we apply design thinking from a mobile-only perspective for our passenger, merchant, and driver-partner experiences by understanding our customers’ journeys online and off. These journeys are synthesized back in the office and sometimes recreated with video and physical artifacts to simulate the customer experience. It’s always helpful to remove smartwatches, put away laptops and use an in-market device that offers a similar experience to your customers.
Design implications
When onboarding new customers, offer a relevant sign-in method for a mobile-only customer, like phone number and social account registration. Grab’s passenger sign-up experience addresses these priorities with phone number first, social accounts second.
Grab’s sign-in screen
PC-era icons are also widely misunderstood by mobile-only customers, so avoid floppy disks to imply Save, or a folder to Change Directory as these offer little symbolic meaning. When icons are paired with text, this can often reinforce meaning and quicken recognition. For example, a pencil icon alone can be confusing, so adding the word “Edit” will provide more clarity.
Nightfall in Yogyakarta, Indonesia
Diversity and safety
This photo was taken in Yogyakarta, Indonesia. In the evening, women often formed groups to improve personal safety. In an online environment, women often face discrimination, harassment, blackmail, cyberstalking, and more. Minorities in emerging markets are further marginalised due to employment, literacy, and financial issues.
Building empathy
Southeast Asia has a very diverse population, and it’s important to understand gender, ethnic, and class demographics before you plan any research. Research recruitment at Grab involves working with local vendors to recruit diverse groups of customers for interviews and focus groups. When spending time with customers, we try to understand how diversity and safety factors contribute to the experience of the product.
If you don’t have the time and resources to arrange face-to-face interviews, I’d recommend this article for creating a survey: Respectful Collection of Demographic Data
Design for inclusivity
Allow people to control how they represent their identities through pseudonyms and avatars. But does this undermine trust on the platform? No, not really. Credit card registration, or more recently Grab’s passenger and driver selfie verification feature, has closed the loop on suspect accounts whilst maintaining everyone’s privacy and safety.
On the visual design side, our illustration and content guide incorporates diverse representations of ethnic backgrounds, clothing, physical ability, and social class. You can see examples in the app or through our Dribbble page. For user-generated content, allow people to report and flag abusive material. While data and algorithms can do only so much, facts and ethics cannot be policed by machine learning.
Language
In Southeast Asia and other emerging markets, customers may set their phone to a language which they aspire to learn but may not fully comprehend. Swipe, tap, drag, pinch, and other specific terms relating to interactions might not easily translate into the local language, and English might be the preferred language regardless of comprehension. It’s surprisingly common to attend an interview with a translator while the device’s UI is set to English.
A Grab pick-up point taken in Medan, Indonesia
Building empathy
If your app supports multiple languages, try setting your phone to a different language but know how to change it back again! At Grab, we test design robustness by incorporating translated text strings into our mocks. Look for visual cues to infer meaning since some customers might be illiterate or not fully comprehend English.
Grab’s Safety Centre in different languages
Design for different languages, formats and visual cues
To reduce design debt later on, it’s a good idea to start with the smallest screen size and test the most vulnerable parts of the UI with translated text strings. Keep in mind, dates, times, addresses, and phone numbers may have different formats and require special attention. You can apply multiple visual cues to represent important UI states, such as a change in colour, shape and imagery.
Design for engagement
Sharing
From our research studies, word-of-mouth communication and consuming viral content via Instagram or Facebook were more popular than trawling through search results. The social aspect extends to the physical environment, where devices can sometimes be shared with more than one person or, in some cases, one mobile is used concurrently by more than one user at a time. In numerous interviews, customers talked about not using biometric authentication so that family members could access their devices.
Building empathy
To understand the layers of personalisation, privacy and security on a device, it’s worth borrowing a device from your research team or just borrowing a friend’s phone (if they let you!). How far do you get before you require biometric authentication or a PIN to proceed further? If you decide to wipe a personal device, what steps can you skip during the setup, and how does that affect your experience post-setup?
Offline to Online: GrabNow connecting with driver
Design for sharing
If necessary, facilitate device sharing through easy switching of accounts, and enable people to remove or hide private content after use. Allow content to be easily shared for both online and offline in-person situations. Using this approach, GrabNow allows passengers to find and connect with a nearby driver without having to pre-book and wait for a driver to arrive. This offline to online interaction also saves data and battery for the customer.
Support and tutoring
In Southeast Asia, people find troubleshooting issues through a help page cumbersome and generally prefer human assistance, like speaking to someone through a call centre. The opportunity for face-to-face tutoring on how something works is often highly desired and is much more effective than the standard onboarding flows that many apps use. Judging from the many physical phone stores, it’s not uncommon for people to go and ask for help or get apps manually installed onto their device.
Building empathy
Apart from speaking with your customers regularly, always look through the Play and App Store reviews for common issues. Understand your customers’ problems and the jargon they use to describe what happened. If you have a customer support team, the tickets created will be a key indicator of where your customers need the most support.
Help and Feedback
Design implications
Make support accessible through a variety of methods: online forms, email, and if possible, allow customers to call in. With in-app or online forms, try to use drop-downs or pre-populated quick responses to reduce typing, triage the type of support, and decrease misunderstanding when a request comes in. When a customer makes a Grab transport booking for the first time, we assist the customer through step-by-step contextual call-outs.
Local aesthetics
This photo was taken in Medan, Indonesia, on the day of an important wedding. It was impressive to see hundreds of handcrafted, colourful placards lining the streets for miles, but maybe more admirable that such an occasion was shared with the community and passers-by, and not only for the wedding guests.
A wedding celebration flower board in Medan, Indonesia
These types of public displays are not exclusive to weddings in Southeast Asia; vibrant colours and decorative patterns are woven into the fabric of everyday life, indicative of the jovial spirit that many people in the region possess.
Building empathy
What are some of the immediate patterns and surfaces that exist in your workspace? Looking around your environment can provide an immediate assessment of the visual stimuli that influence your decisions on a day-to-day basis.
Wall space can be incredibly valuable when you can display photos from your research trip, or find online inspiration to recreate some of the visual imagery from your target markets. When speaking with your customers, ask to see mobile wallpapers, and think about how fashion could also play a role in determining an aesthetic choice. Lastly, take time out when on a research trip to explore the streets, museums, and absorb some of the local cultures.
Design to delight and surprise customers
Capture local inspiration on research trips to incorporate into visual collections that can be a source of inspiration for colour, imagery, and textures. Find opportunities in your product to delight and engage customers through appropriate images and visuals. Grab’s marketing consent experience leverages illustrative visuals to help customers understand the different categories that require their consent.
For all our markets, we work with local teams around culturally sensitive visuals and imagery to ensure our content is not offensive or portrays the wrong connotations.
My top 5 for guerrilla field research
If you don’t have enough time, stakeholder buy-in or budget to do research, getting out of the office to do your own is sometimes the only answer. Here are my top 5 things to keep in mind.
Don’t jump in. Always start with observation to capture customers’ natural behaviour.
Sanity check scripts. Your time and customers’ time is valuable; streamline your script and prepare for u-turns and potential Facebook and Instagram friend requests at the end!
Ask the right people. It’s difficult to know who wants to or has time for your 10-minute intercept. Look for individuals sitting around and not groups if possible (group feedback can be influenced by the most vocal person).
Focus on the user. Never multitask when speaking to the user. Jotting notes on an answer sheet is less distracting than using your mobile or laptop (and less dangerous in some places!). Ask permission to record audio if you want to avoid notetaking altogether, but this does create more work later on.
Use insights to enrich understanding. Insights are not trends and should be used in conjunction with quantitative data to validate decision making.
Feel inspired by this article and want to learn more? Grab is hiring across Southeast Asia and Seattle. Connect with me on LinkedIn or Twitter @PhilipMadeley to learn more about design at Grab.
In the world of design jargon, meet “service design”. Unlike other design objectives that aim to simplify and clarify, service design is not about building singular touchpoints. Rather, it is about bringing ease and harmony into large and often complex ecosystems.
Think of the human body. There are organ systems such as the cardiovascular, respiratory, musculoskeletal, and nervous systems. These systems perform key functions that we see and feel everyday, like breathing, moving, and feeling.
Service design serves as the connective tissue that brings the amazing systems together to work in harmony. Much of the work done by the service design team at Grab revolves around connecting online experiences to the offline world, connecting challenges across a complex ecosystem, and enabling effective collaboration across cross-functional teams.
Connecting online experiences to the offline world
We explore holistic experiences by visualizing the connections across features, both through the online-offline as well as internal-external interactions. At Grab, we have a collection of (very cool!) features that many teams have worked hard to build. However, equally important is how a person arrives from feature to feature seamlessly, from the app to their physical experiences, as well as how our internal teams at Grab support and execute behind-the-scenes throughout our various systems.
For example, placing an order on GrabFood requires much more work than sending information to the merchant through the Grab app. How might Grab
allocate drivers effectively,
support unhappy paths with our customer support team,
resolve discrepancies in our operations teams, and
store this data in a system that can continue to expand for future uses to come?
Connecting challenges across a complex ecosystem
Sometimes, as designers, we might get too caught up in solving problems through a singular lens, and overlook how it affects the rest of the system. Meanwhile, many problems are part of a connected network. Changing one part of the problem can potentially affect other parts of the network.
Considering those connections, or the “stuff in between”, makes service design a holistic practice – crossing boundaries between teams in search of a root cause, and considering how treating one problem might affect other parts of the network.
If this happens, then what?
Which point in the system is easiest to fix and has the greatest impact?
For example, if we want to introduce a feature for drivers to report restaurant closings, how might Grab
Ensure the report is accurate?
Deal with accidental closings or fraud?
Use that data for our operations team to make decisions?
Let drivers know when their report has led to a successful action?
Last but not least, is this the easiest point in the system to fix restaurant opening inaccuracies, or should this be tackled through an operational fix?
Facilitating effective collaborations in cross-functional teams
Finally, we believe in the power of a participatory design process to unlock meaningful, customer-centric solutions. Working on the “stuff in between” often puts the service design team in the thick of alignment of priorities, creation of a common vision, and coherent action plans. Achieving this requires solid facilitation and processes for cross-team collaboration.
Who are the right stakeholders and how do we engage?
How does an initiative affect stakeholders, and how can they contribute?
How can we create visual processes that allow diverse stakeholders to have a shared understanding and co-create solutions?
What’s the ultimate goal? A Harmonious Backstage for a Delightful Customer Experience
By facilitating cross-functional collaborations and espousing a whole-of-Grab approach, the service design team at Grab helps to connect the dots in an interconnected ‘super-app’ service ecosystem. By empathising with our users, and having a deep understanding of how different parts of the Grab ecosystem affect one another, we hope to unleash the full power of Grab to deliver maximum value and delight to serve our users.
Sherizan Sheikh is a Design Lead at Grab Ventures, an incubation arm that looks at experiences beyond ride-hailing, for example, groceries, healthcare and autonomous deliveries.
Grab Ventures is where exciting initiatives are birthed in Grab. From strategic partnerships like GrabFresh, Grab’s first on-demand grocery delivery service, to exploratory concepts such as on-demand e-scooter rentals, there has never been a more exciting time to be in this unique space.
In my role as Design Lead for Grab Ventures, I juggle between both sides of the coin and whether it’s a partnership or exploratory concept, I ask myself:
“How do I know who my customers are, and what are their pain points?”
I like to answer that question by starting with traditional research methods such as desktop research and surveys, just to name a few. At Grab, however, these are usually not enough to answer those questions.
That said, I find that some of the best insights are formed from immersion trips.
In one sentence, an immersion trip is getting deeply involved in a user’s life by understanding him or her through observation and conversation.
Our CEO, Anthony Tan, picking items for a customer, on an immersion trip.
For designers and researchers in Singapore, it plucks you out of your everyday reality and drops you into someone else’s, somewhere else, where 99.9% of the time, everything you expect and anticipate gets thrown out in a matter of minutes. I’ve trained myself to switch my mindset, go back to basics, and learn (or relearn) everything I need to know about the country I’d be visiting even if I’ve been there countless times.
Fun fact: In 2018, I spent about 100 days in Indonesia. That means roughly 30% of 2018 was spent on the ground, observing, shadowing, interviewing people (and getting stuck in traffic) and loving it.
Why immersions?
Understanding a country, its culture, and its people is something that gets me excited, as I continuously build empathy visit after visit, interview after interview.
I remember one time during an immersion trip, we interviewed locals at different supermarkets to learn and understand their motivations: why they prefer visiting the supermarket to buying groceries online. One of our hypotheses was that the key motivator for Indonesians to buy groceries online must come down to convenience. We were wrong.
It boiled down to 2 key factors.
1) Freshness: We found out that many of the locals still felt the need to touch and feel the products before they buy. There were many instances where they felt the need to touch the fresh produce on the shelves, cutting off a piece of fruit or even poking the eyes of the fish to check its freshness.
2) Price: Price is the deciding factor for most locals, as they are price-sensitive. Every decision was made with the price tag in mind. They are willing to travel far and sit through traffic just to get to the wet market or supermarket that offers the lowest prices and the best value for money. Through observations while shadowing at a local wet market, we also found something interesting. Most of the wet market vendors were getting WhatsApp messages from their regular customers seeking fresh produce and placing orders. The transactions were mostly via e-wallets or bank transfers. The vendors then packed the orders and got bike drivers to help with the delivery. I couldn’t have gotten this valuable information if I was just sitting at my desk.
An immersion trip is an excellent opportunity to learn about our customers and the meaning behind their behaviours. There is only so much we can learn from white papers and reports. As soon as you are in the same environment as your users, seeing your users do everyday errands or acts, like grocery shopping or hopping on a bike, feeling their frustrations and experiencing them yourself, you’ll get so much more fruitful and valuable insights to help shape your next product. (Or even, improve an existing one!)
My colleagues trying to blend in.
Now that I’ve sold you on this idea, here are some tips on how to plan and execute effective immersion trips, share your findings and turn them into actionable insights for your team and stakeholders.
Pro tip #1 – Generate a hypothesis
Generating a hypothesis is a valuable exercise. It enables you to focus on the “wants vs. needs” and to validate your assumptions beyond desktop research. Be sure to get your core team members together, including Business, Ops and Tech, to generate a hypothesis. I’ll give an example below.
Pro tip #2 – Have short immersion days with a debrief at the end for everyone
Scheduling really depends on your project. I have planned for trips that are from a few hours to up to fourteen days long. Be sure not to have too many locations in a single day and spread them out evenly in case there are unexpected roadblocks such as traffic jams that might contribute to rushed research.
Do include Brief and Debrief sessions in your schedule. I’d recommend shorter immersion days so that you have enough energy left for the critical Debrief session at the end of the day. The structure should be kept very simple, with a focus on collating ALL observations from the contextual inquiries you did into writing. It’s up to you how you structure your document.
Be prepared for the unexpected.
Pro tip #3 – Recce locations beforehand
Once you’ve nailed down the locations, it is essential for you to get a local resident to recce the places first. In Southeast Asia, more often than not, information found online is unreliable and misleading, so doing a physical recce will save you a lot of time.
I have experienced a few time-wasting incidents when specific locations turned out not to be what we expected. For example, while on our grocery run, we wanted to visit a local wet market that opens only very early in the morning. We got up at 5 am and drove about 1.5 hours, only to realise the wet market was not open to the public, and we were eventually chased out by the security guards.
Pro tip #4 – Never assume a customer’s journey
(even though you’ve experienced it before as a customer)
One of the most important processes throughout a product life cycle is to understand a customer’s journey. It’s particularly important to understand the journey if we are not familiar with the actual environment. Take our GrabFresh service as an example. It’s a complex journey that happens behind the scenes. Desktop research might not be enough to fully validate the journey; hence, an immersion trip that puts you in the field ensures you go through the entire lifecycle of the process and can observe and note all the phases that happen in the real environment.
Pro tip #5 – Be 100% sure of your open-ended, non-leading questions that will validate your hypothesis.
This part is essential to the quality of your immersion outcome. Not spending enough time crafting or vetting the questions thoroughly might leave you with skewed insights and could jeopardise your entire immersion. Be sure your questions link up with your hypothesis, and provide backup questions to support your assumptions.
For example, don’t suggest answers in questions.
Bad: “Why do you like this supermarket? Cheap? Convenient?”
Good: “Tell me why you chose this particular supermarket?”
Pro tip #6 – Break into smaller groups of 2 to 3. Dress comfortably and like a local. Keep your expensive belongings out of sight.
During my recent trip, I visited a lot of places that unknowingly had very tight security. One of the mistakes I made was going as a group of 6 (foreign-looking, and – okay – maybe a little touristy with our appearances and expensive gadgets).
Out of nowhere, once we started approaching customers for interviews, and snapping photos with our cameras and phones, we could see the security teams walking towards us. Unfortunately, we were asked to leave the premises when we could not provide a permit.
As luck would have it, we eyed a few customers and approached them when they were further away from the original location. Success!
Pro tip #7 – Find translators with people skills and interview experience.
Most of my immersion trips are overseas, where English is not the main language. I get annoyed at myself for not being able to interview non-English speaking customers. Having seasoned, outgoing translators does help a lot! If you feel awkward standing around waiting for a translated answer, feel free to step away and let the translator interview the customer without feeling pressured. Be sure it’s all recorded for transcription later.
Insights + Action Plan = Strategy
Findings are significant; they are the basis of everything you do while on immersion. But what’s more important is the ability to connect those dots and extract value from them. It’s similar to how we can amass tons of raw data that is entirely pointless if nothing is done with it.
A good strategy usually comes from good insights that are actionable.
For example, we found out that a percentage of the customers we interviewed did not know that GrabFresh has a pool of professional shoppers who pick grocery items for customers. Their impression was that a driver would receive their order, drive to the location, get out of the vehicle, and go into the store to do the picking. That’s not right, and the misconception hinders customers from making their first purchase through the app.
Observing a personal shopper interacting with Grab driver-partner.
So, in this case, our hypothesis was: if customers are aware of personal shoppers, the number of orders will increase.
This misconception was a shared one that may have had an impact on our business. So we needed to take this back to the team, look at the data, brainstorm, and come up with a great strategy to improve the perception and assess its impact on our business (whether good or bad).
Wrapping Up
After a full immersion, it is always important to ask each and every member some of these questions:
“What went well? What did you learn?”
“What can be improved? If you could change one thing, what would it be?”
I’d usually document them and have a reflection for myself so that I can pick up what worked, what didn’t and continue to improve for my next immersion trip.
Following the Double Diamond framework, immersion trips are part of the “Discover” phase, where we gather customer insights. Typically, I follow up with a Design Sprint workshop where we start framing the problems. In this session, experts and researchers share their domain knowledge and the research insights uncovered from various methodologies, including immersions.
Then, hopefully, we will have some actionable changes that we can execute confidently.
So, good luck, bring some sunblock and see you on the ground!
If you’d like to connect with Sherizan, you can find him on LinkedIn.
Stuck in traffic in a Grab ride? Pass the time by opening your Grab app and checking out the Feed – just scroll down! You’ll find widgets for games, polls, videos, news and even food recommendations!
Beyond serving your everyday needs, we want to provide our users with information that is interesting, useful and relevant. That’s why we’re always coming up with new widgets.
Building each widget takes close collaboration across multiple different teams – from Product Management to Design, Engineering, Behavioral Science, and Data Science and Analytics. Sounds like a lot of people, doesn’t it? But you’ll be surprised to hear that this behind-the-scenes collaboration works rapidly, usually in the span of one month! Which means we’re often moving from ideation phase to product release in just a few weeks.
This fast-and-furious process is anchored on one word – “customer-centric”. And that’s how it all began with our “Travel Trends Widget” – a widget that provides passengers with an overview of historical supply and demand trends for their current location and nearby time periods.
Because we had so much fun developing this widget, we wanted to write a blog post to share with you what we did and how we did it!
Inspiration: Where it all started
Transport demand can be rather lumpy. Owing to organic patterns (e.g. office hours), a lot of passengers tend to request for cars around the same time. In periods like this, the increase in demand could outpace the arrival of driver supply, increasing the waiting time for passengers.
Our goal at Grab is to make sure people get a ride when they want it and at the price they want, so we got to thinking about how we can ease this friction by leveraging our treasure trove – Big Data! – to help our passengers better plan their trips.
As we were looking at the data, we noticed that there is a seasonality to demand and supply: at certain times and days, imbalances appear, peak and disappear, and the process repeats itself. Studies say that humans in general, unless shown a compelling reason or benefit for change, are habitual beings subject to inertia. So we set out to achieve exactly that: To create a widget to surface information to our passengers that may help them alter their decisions on when they choose to book a ride, thereby redistributing some of the present peak demands to periods just before and after peak – also known as “peak shifting the demand”!
While this widget is the first-of-its-kind in the ride-hailing industry, “peak-shifting” was actually coined and introduced long ago!
As you can see from this post from the London Transport Museum (Source: Transport for London), the London Tube tried peak-shifting long before anyone else: the original ad from 1928 is displayed on the left, and an ad from 2015, comparing the trends to 1928, is on the right.
You may also have seen something similar at the last hotel you stayed at. Notice here a poster in an elevator at a Beijing hotel, announcing the best times to eat breakfast in comfort and avoid the crowd. (Photo credits to Prashant, our Product Manager, who saw this on holiday.)
How the Travel Trends Widget works
To apply “peak-shifting” and help our users better plan their trips, we decided to dig in and leverage our data. It was way more complex than we had initially thought, as market conditions could be different on different days. This meant that generic statements like “5PM-8PM are peak hours and prices will be high” would not hold true. Contrary to general perception, we observed that even during peak hours, there are buckets of time when there is no surge or low surge.
For instance, plots 1 and 2 below show how typical Monday and Tuesday surges look in a given month, respectively. One of the key insights is that the surge trend during peak hours is different on Monday than on Tuesday. It reinforces our initial hypothesis that every day is unique.
So we used machine learning techniques to build a forecasting widget which can help our users and give them the power to plan their trips beforehand. This widget is able to provide the pricing trends for the next 2 hours. So with a bit of flexibility, riders can ride the tide!
So how exactly does this widget work?!
It pulls together historically observed imbalances between supply and demand for the consumer’s current location and nearby time periods. Aggregated data is displayed to consumers in easily interpreted visualisations, so that they can plan to leave at times when there is more supply, with potentially more savings on fares.
How did we build the widget? Loop, agile working process, POC & workstream
Widget-building is an agile, collaborative, and simultaneous process. First, we started the process with analysis from Product Analytics team, pulling out data on traffic trends, surge patterns, and behavioral insights of both passengers and drivers in Singapore.
When we noticed the existence of seasonality for each day of the week, we came up with more precise analytical and business questions to dig deeper into the data. Upon verification of our hypotheses, we decided to build a widget.
The Behavioural Science, UX (User Experience) Design, and Product Management teams then joined and started giving shape to the problem we were solving. Our Behavioural Scientists shared their expertise on how information, suggestions and choices should be presented to enable easy assimilation and beneficial action. Daily whiteboarding breakouts, endless back-and-forth conversations, and a healthy amount of challenge-and-accept culture ensured that we distilled the idea down to its core. We then presented the relevant information with just the right level of detail, and with the right amount of messaging, to allow users to take the intended action, i.e. shift their demand outside of peak periods if possible.
Our amazing regional Copywriting team then swung in to put our intent into words in 7 different languages for our users across South-East Asia. Simultaneously, our UX designers and Full-stack Engineers started exploring the best visual components to communicate data on time trends to users. More on this later, but suffice to say that plenty of ideas were explored and discarded in a collaborative process, which aimed to create something that’s intuitive and engaging while being robust and scalable to work across all types of devices.
While these designs made their way up to engineering, the Data Science team worked on finding the most rigorous method to deduce the historical trend of surge across all our cities and areas, and time periods within them. There were discussions on how to best store and update this data reliably so that the widget itself can access it with great performance.
Soon after, we went into the development process, and voila! We had the first iteration of the widget ready on our staging (internal testing) servers in just 2 weeks! This prototype was opened up to the core team for influx of feedback.
And just two weeks later, the widget made its way to our Singapore and Jakarta Feeds, accessible to the world at large! Feedback from our users started pouring in almost immediately (thanks to the rich feedback functionality that comes with each widget), ranging from great to sometimes not-so-great, and we listened to all of it with a keen ear! And thus began a new cycle of iterations and continuous improvement, more of which we will share in a subsequent post.
In the trenches with the creators: How multiple teams got together to make this come true
Various disciplines within our cross-functional team came together to whip up this widget, each contributing their expertise to the end product.
Using Behavioural Science to simplify choices and design good outcomes
Behavioural Science helped to explore many facets of consumer behaviour in order to plan and design the widget: understanding how consumers think and conceptualizing a widget that can be easily understood and used by the consumers.
While fares are governed entirely by market conditions, it’s important for us to explain the economics to customers. As a customer-centric company, we aim to make the consumers feel like they own their decisions, which they can take based on full information. And this is the role of Behavioral Scientists at Grab!
In guiding the customers through the information, Behavioural Science team had the following three objectives in mind while building this Travel Trends widget:
Offer transparency on the fares: By exposing our historic surge levels for a 4 hour period, we wanted to ensure that the passenger is aware of the surge levels and does not treat the fare as a nasty shock.
Give information that helps them plan: By showing them surge levels for the next 2 hours, we wanted to help customers who have the flexibility to plan for a better time, giving them the power to decide based on transparent information.
Provide helpful tips: Every bar gives users tips on the conditions at that time and the immediate future. For instance, a low surge bar, followed by a high surge bar gives the tip “Psst… Leave now, It might get busy later!”, helping people understand the graph better and nudging them to take an action. If you are interested in saving fares, may we suggest tapping around all the bars to reveal the secret pro-tips?
Designing interfaces that lead to consumer success by abstracting complexity
Design team is the one behind the colors and shapes that make up the widget that you see and interact with! The team took inspiration from Google’s Popular Times.
Source/Credits: Google Live Popular Times
Right from the outset, our content and product teams were keen to surface additional information and actions with each bar to keep the widget interactive and useful. One of the early challenges was to arrive at the right gesture that invites the user to interact and intuitively navigate the bars on the widget, but also does not conflict with other gestures (e.g. scrolling and scrubbing) that the user was pre-trained to perform on the feed. We found that tapping was simultaneously an unused and yet intuitive gesture that we could use for interaction with the bars.
We then went into rounds of iteration on the visual design of the widget. In this process, multiple stakeholders were involved ranging from Product to Content to Engineering. We had to overcome a number of constraints i.e. the limited canvas of a widget and the context of a user when she is exploring the feed. By re-using existing libraries and components, we managed to keep the development light and ship something fast.
Dozens of revisions and four iterations later, we landed with a design that we felt equipped the feature for its user-facing goal, and did so in a manner which was aesthetically appealing!
And finally we managed to deliver on the feature’s goal, by surfacing just the right detail of information in a manner that is intuitive yet effective to peak-shift demand.
Bringing all of this to fruition through high performance engineering
Our Development Engineering team was in charge of developing the widget and making it available to our users in just a few weeks’ time – materialising the work of the other teams.
One of their challenges was to find the best way to process the vast amount of data (millions of database entries) so it can be visualized simply as bar charts. Grab’s engineers had to achieve this while making sure performance is as resilient as possible.
There were two options in doing this:
a) Fetch the data directly from the DB for each API call; or
b) Store the data in an in-memory data structure, refreshed on a schedule, so that API calls no longer have to hit the DB.
Considering that this feature would likely receive a lot of traffic, and thus a high QPS, we decided that the former option would be too costly. Ultimately, we chose the latter option since it is more performant and more scalable.
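As a rough illustration of the chosen approach (the post doesn’t describe the actual implementation, so the refresh interval, data source and helper names below are assumptions), the idea is to refresh an in-memory structure on a schedule and serve every API call from it:

import threading

# Hypothetical in-memory cache of pre-aggregated surge trends, keyed by area ID.
_trend_cache = {}

def load_trends_from_store():
    # Stand-in for the real batch job / datastore read that produces the
    # pre-computed historical trends (placeholder data shown here).
    return {"area-123": [0.0, 0.2, 1.3, 0.8], "area-456": [0.1, 0.1, 0.4, 0.2]}

def refresh_cache(interval_seconds=900):
    # Refresh the cache on a timer so API calls never have to hit the database.
    global _trend_cache
    _trend_cache = load_trends_from_store()
    timer = threading.Timer(interval_seconds, refresh_cache, [interval_seconds])
    timer.daemon = True
    timer.start()

def get_trends(area_id):
    # Called by the widget API handler: a pure in-memory lookup.
    return _trend_cache.get(area_id, [])

refresh_cache()
print(get_trends("area-123"))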
At the frontend, the challenge was to cater to the intricate request from our designers. We use chart libraries to increase our development speed, and not all of the requirements were readily supported by these libraries.
For instance, the library makes visualising charts easy, but not so easy to customise. When the designers wanted an average line shown in dotted form, the library did not support it out of the box. The moving arrow pointers as you move between bars, and the bars changing colour when tapped, all required countless CSS tweaks.
Closing the product loop with user feedback and data driven insights
One of the most crucial parts of launching any product is to ensure that customers are engaging with the widget and finding it useful.
To understand what customers think about the widget, whether they find it useful and whether it is helping them to plan better, we delved into the huge mine of clickstream data.
We found that 1 in 3 users who make a booking every day interact with the widget. And of these people, more than 70% have given the widget a positive rating. This validates our initial hypothesis that, given the option, our customers will love the freedom to plan their trips, and it helps foster a more transparent ecosystem.
These users also indicated what they like most about the widget: 61% gave a positive rating for usefulness, 20% were impressed by the design (kudos to our fantastic designer Ajmal!), and 13% for usability.
Beyond internal data, our widget made some rounds on social media channels. For example, here is a screenshot of what our users had to say on Twitter.
We closely track these metrics on user engagement and feedback to ensure that we keep improving and coming up with new iterations which helps us to serve our customers in a better way.
Conclusion
We hope you enjoyed reading about how we went from ideation, through iterations to a finished widget in the hands of the user, all in 1 month! Many hands helped along the way. If you are interested in joining this hyper-proactive problem-solving team, please check out Grab’s career site!
And if you have feedback for us, we are here to listen! While we cannot be happier to see some positive reaction from the public, we are also thrilled to hear your suggestions and advice. Please leave us a memo using the Widget’s comment function!
Epilogue
We just released an upgrade to this widget which allows users to set reminders and be notified about availability of good fares in a time period of their choosing. We will keep a watch and come knocking! Go ahead, find the widget on your Grab feed, set a reminder and save on fares on your next ride!
One of the most common enquiries I receive at Pi Towers is “How can I get my hands on a Raspberry Pi Oracle Weather Station?” Now the answer is: “Why not build your own version using our guide?”
Tadaaaa! The BYO weather station fully assembled.
Our Oracle Weather Station
In 2016 we sent out nearly 1000 Raspberry Pi Oracle Weather Station kits to schools around the world that had applied to be part of our weather station programme. In the original kit was a special HAT that allows the Pi to collect weather data with a set of sensors.
The original Raspberry Pi Oracle Weather Station HAT
We designed the HAT to enable students to create their own weather stations and mount them at their schools. As part of the programme, we also provide an ever-growing range of supporting resources. We’ve seen Oracle Weather Stations in great locations with huge differences in climate, and they’ve even recorded the effects of a solar eclipse.
Our new BYO weather station guide
We only had a single batch of HATs made, and unfortunately we’ve given nearly* all the Weather Station kits away. Not only are the kits really popular, we also receive lots of questions about how to add extra sensors or how to take more precise measurements of a particular weather phenomenon. So today, to satisfy your demand for a hackable weather station, we’re launching our Build your own weather station guide!
Fun with meteorological experiments!
Our guide suggests the use of many of the sensors from the Oracle Weather Station kit, so you can build a station that’s as close as possible to the original. As you know, the Raspberry Pi is incredibly versatile, and we’ve made it easy to hack the design in case you want to use different sensors.
Many other tutorials for Pi-powered weather stations don’t explain how the various sensors work or how to store your data. Ours goes into more detail. It shows you how to put together a breadboard prototype, it describes how to write Python code to take readings in different ways, and it guides you through recording these readings in a database.
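To give a flavour of the database step, here is a minimal sketch of logging readings into SQLite; the sensor-reading function below is a stand-in for whichever sensor code you end up using, not something taken from the guide:

import sqlite3
import time
import random

def read_temperature():
    # Stand-in for a real sensor read; replace with your own sensor code.
    return 20.0 + random.uniform(-2.0, 2.0)

conn = sqlite3.connect("weather.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS readings (taken_at TEXT, temperature REAL)"
)

while True:
    temp = read_temperature()
    conn.execute(
        "INSERT INTO readings VALUES (datetime('now'), ?)", (temp,)
    )
    conn.commit()
    time.sleep(60)  # one reading per minute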
There’s also a section on how to make your station weatherproof. And in case you want to move past the breadboard stage, we also help you with that. The guide shows you how to solder together all the components, similar to the original Oracle Weather Station HAT.
Who should try this build
We think this is a great project to tackle at home, at a STEM club, Scout group, or CoderDojo, and we’re sure that many of you will be chomping at the bit to get started. Before you do, please note that we’ve designed the build to be as straightforward as possible, but it’s still fairly advanced both in terms of electronics and programming. You should read through the whole guide before purchasing any components.
The sensors and components we’re suggesting balance cost, accuracy, and ease of use. Depending on what you want to use your station for, you may wish to use different components. Similarly, the final soldered design in the guide may not be the most elegant, but we think it is achievable for someone with modest soldering experience and basic equipment.
You can build a functioning weather station without soldering with our guide, but the build will be more durable if you do solder it. If you’ve never tried soldering before, that’s OK: we have a Getting started with soldering resource plus video tutorial that will walk you through how it works step by step.
For those of you who are more experienced makers, there are plenty of different ways to put the final build together. We always like to hear about alternative builds, so please post your designs in the Weather Station forum.
Our plans for the guide
Our next step is publishing supplementary guides for adding extra functionality to your weather station. We’d love to hear which enhancements you would most like to see! Our current ideas under development include adding a webcam, making a tweeting weather station, adding a light/UV meter, and incorporating a lightning sensor. Let us know which of these is your favourite, or suggest your own amazing ideas in the comments!
*We do have a very small number of kits reserved for interesting projects or locations: a particularly cool experiment, a novel idea for how the Oracle Weather Station could be used, or places with specific weather phenomena. If you have such a project in mind, please send a brief outline to [email protected], and we’ll consider how we might be able to help you.
Last year, we released Amazon Connect, a cloud-based contact center service that enables any business to deliver better customer service at low cost. This service is built based on the same technology that empowers Amazon customer service associates. Using this system, associates have millions of conversations with customers when they inquire about their shipping or order information. Because we made it available as an AWS service, you can now enable your contact center agents to make or receive calls in a matter of minutes. You can do this without having to provision any kind of hardware.
There are several advantages of building your contact center in the AWS Cloud, as described in our documentation. In addition, customers can extend Amazon Connect capabilities by using AWS products and the breadth of AWS services. In this blog post, we focus on how to get analytics out of the rich set of data published by Amazon Connect. We make use of an Amazon Connect data stream and create an end-to-end workflow to offer an analytical solution that can be customized based on need.
Solution overview
The following diagram illustrates the solution.
In this solution, Amazon Connect exports its contact trace records (CTRs) using Amazon Kinesis. CTRs are data streams in JSON format, and each has information about individual contacts. For example, this information might include the start and end time of a call, which agent handled the call, which queue the user chose, queue wait times, number of holds, and so on. You can enable this feature by reviewing our documentation.
In this architecture, we use Kinesis Firehose to capture Amazon Connect CTRs as raw data in an Amazon S3 bucket. We don’t use the recent feature added by Kinesis Firehose to save the data in S3 as Apache Parquet format. We use AWS Glue functionality to automatically detect the schema on the fly from an Amazon Connect data stream.
The primary reason for this approach is that it allows us to use attributes and enables an Amazon Connect administrator to dynamically add more fields as needed. Also, by converting data to Parquet in batches (every couple of hours), compression can be higher. However, if your requirement is to ingest the data in Parquet format in real time, we recommend using the recently launched Kinesis Data Firehose feature. You can review this blog post for further information.
By default, Firehose puts these records in time-series format. To make it easy for AWS Glue crawlers to capture information from new records, we use AWS Lambda to move all new records to a single S3 prefix called flatfiles. Our Lambda function is configured using S3 event notification. To comply with AWS Glue and Athena best practices, the Lambda function also converts all column names to lowercase. Finally, we also use the Lambda function to start AWS Glue crawlers. AWS Glue crawlers identify the data schema and update the AWS Glue Data Catalog, which is used by extract, transform, load (ETL) jobs in AWS Glue in the latter half of the workflow.
You can see our approach in the Lambda code following.
from __future__ import print_function
import json
import urllib
import boto3
import os
import re

s3 = boto3.resource('s3')
client = boto3.client('s3')

def convertColumntoLowwerCaps(obj):
    # Recursively lowercase keys and strip non-alphanumeric characters so the
    # column names comply with AWS Glue and Athena naming best practices.
    for key in obj.keys():
        new_key = re.sub(r'[\W]+', '', key.lower())
        v = obj[key]
        if isinstance(v, dict):
            if len(v) > 0:
                convertColumntoLowwerCaps(v)
        if new_key != key:
            obj[new_key] = obj[key]
            del obj[key]
    return obj

def lambda_handler(event, context):
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(event['Records'][0]['s3']['object']['key'].encode('utf8'))
    try:
        client.download_file(bucket, key, '/tmp/file.json')
        # Split the concatenated JSON records delivered by Kinesis Firehose into
        # one record per line, normalizing the keys along the way.
        with open('/tmp/out.json', 'w') as output, open('/tmp/file.json', 'rb') as file:
            i = 0
            for line in file:
                for object in line.replace("}{", "}\n{").split("\n"):
                    record = json.loads(object, object_hook=convertColumntoLowwerCaps)
                    if i != 0:
                        output.write("\n")
                    output.write(json.dumps(record))
                    i += 1
        # Move the flattened file to the flatfiles/ prefix and delete the original.
        newkey = 'flatfiles/' + key.replace("/", "")
        client.upload_file('/tmp/out.json', bucket, newkey)
        s3.Object(bucket, key).delete()
        return "success"
    except Exception as e:
        print(e)
        print('Error copying object {} from bucket {}'.format(key, bucket))
        raise e
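The listing above covers the flattening and renaming steps; the crawler trigger mentioned earlier isn’t shown in it, but with boto3 it amounts to a call like the following (the crawler name here is a placeholder for whatever your deployment created):

import boto3

glue = boto3.client('glue')

def start_ctr_crawler(crawler_name='connect-ctr-crawler'):  # placeholder name
    # Kick off the AWS Glue crawler so new attributes are picked up in the Data Catalog.
    try:
        glue.start_crawler(Name=crawler_name)
    except glue.exceptions.CrawlerRunningException:
        # The crawler is already running; nothing to do.
        pass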
We trigger AWS Glue crawlers based on events because this approach lets us capture any new data frame that we want to be dynamic in nature. CTR attributes are designed to offer multiple custom options based on a particular call flow. Attributes are essentially key-value pairs in nested JSON format. With the help of event-based AWS Glue crawlers, you can easily identify newer attributes automatically.
We recommend setting up an S3 lifecycle policy on the flatfiles folder that keeps records only for 24 hours. Doing this optimizes AWS Glue ETL jobs to process a subset of files rather than the entire set of records.
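One way to set up that lifecycle rule with boto3 is sketched below; the bucket name is a placeholder, and because S3 lifecycle expiration works in whole days, one day is the closest equivalent to 24 hours:

import boto3

s3 = boto3.client('s3')

# Expire objects under the flatfiles/ prefix after one day.
s3.put_bucket_lifecycle_configuration(
    Bucket='your-connect-ctr-bucket',  # placeholder bucket name
    LifecycleConfiguration={
        'Rules': [
            {
                'ID': 'expire-flatfiles',
                'Filter': {'Prefix': 'flatfiles/'},
                'Status': 'Enabled',
                'Expiration': {'Days': 1},
            }
        ]
    },
)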
After we have data in the flatfiles folder, we use AWS Glue to catalog the data and transform it into Parquet format inside a folder called parquet/ctr/. The AWS Glue job performs the ETL that transforms the data from JSON to Parquet format. We use AWS Glue crawlers to capture any new data frame inside the JSON code that we want to be dynamic in nature. What this means is that when you add new attributes to an Amazon Connect instance, the solution automatically recognizes them and incorporates them in the schema of the results.
After AWS Glue stores the results in Parquet format, you can perform analytics using Amazon Redshift Spectrum, Amazon Athena, or any third-party data warehouse platform. To keep this solution simple, we have used Amazon Athena for analytics. Amazon Athena allows us to query data without having to set up and manage any servers or data warehouse platforms. Additionally, we only pay for the queries that are executed.
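For instance, a query that counts contacts per agent could be kicked off with boto3 as shown below; the database name and results location are placeholders, and the exact column names depend on the schema your crawler discovered:

import boto3

athena = boto3.client('athena')

# Count contacts handled per agent from the ctr table.
response = athena.start_query_execution(
    QueryString=(
        "SELECT agent.username AS agent_name, COUNT(*) AS contacts "
        "FROM ctr GROUP BY 1 ORDER BY 2 DESC"
    ),
    QueryExecutionContext={'Database': 'your_glue_database'},          # placeholder
    ResultConfiguration={'OutputLocation': 's3://your-athena-results/'},  # placeholder
)
print(response['QueryExecutionId'])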
Try it out!
You can get started with our sample AWS CloudFormation template. This template creates the components starting from the Kinesis stream and finishes up with S3 buckets, the AWS Glue job, and crawlers. To deploy the template, open the AWS Management Console by clicking the following link.
In the console, specify the following parameters:
BucketName: The name for the bucket to store all the solution files. This name must be unique; if it’s not, template creation fails.
etlJobSchedule: The schedule in cron format indicating how often the AWS Glue job runs. The default value is every hour.
KinesisStreamName: The name of the Kinesis stream to receive data from Amazon Connect. This name must be different from any other Kinesis stream created in your AWS account.
s3interval: The interval in seconds for Kinesis Firehose to save data inside the flatfiles folder on S3. The value must be between 60 and 900 seconds.
sampledata: When this parameter is set to true, sample CTR records are used. Doing this lets you try this solution without setting up an Amazon Connect instance. All examples in this walkthrough use this sample data.
Select the “I acknowledge that AWS CloudFormation might create IAM resources.” check box, and then choose Create. After the template finishes creating resources, you can see the stream name on the stack Outputs tab.
If you haven’t created your Amazon Connect instance, you can do so by following the Getting Started Guide. When you are done creating, choose your Amazon Connect instance in the console, which takes you to instance settings. Choose Data streaming to enable streaming for CTR records. Here, you can choose the Kinesis stream (defined in the KinesisStreamName parameter) that was created by the CloudFormation template.
Now it’s time to generate the data by making or receiving calls using Amazon Connect. You can go to the Amazon Connect Contact Control Panel (CCP) to make or receive calls using a software phone or desktop phone. After a few minutes, we should see data inside the flatfiles folder. To make it easier to try this solution, we provide sample data that you can enable by setting the sampledata parameter to true in your CloudFormation template.
You can navigate to the AWS Glue console by choosing Jobs on the left navigation pane of the console. We can select our job here. In my case, the job created by CloudFormation is called glueJob-i3TULzVtP1W0; yours should be similar. You run the job by choosing Run job for Action.
After that, we wait for the AWS Glue job to run and to finish successfully. We can track the status of the job by checking the History tab.
When the job finishes running, we can check the Database section. There should be a new table created called ctr in Parquet format.
To query the data with Athena, we can select the ctr table, and for Action choose View data.
Doing this takes us to the Athena console. If you run a query, Athena shows a preview of the data.
When we can query the data using Athena, we can visualize it using Amazon QuickSight. Before connecting Amazon QuickSight to Athena, we must make sure to grant Amazon QuickSight access to Athena and the associated S3 buckets in the account. For more information on doing this, see Managing Amazon QuickSight Permissions to AWS Resources in the Amazon QuickSight User Guide. We can then create a new data set in Amazon QuickSight based on the Athena table that was created.
After setting up permissions, we can create a new analysis in Amazon QuickSight by choosing New analysis.
Then we add a new data set.
We choose Athena as the source and give the data source a name (in this case, I named it connectctr).
Choose the name of the database and the table referencing the Parquet results.
Then choose Visualize.
After that, we should see the following screen.
Now we can create some visualizations. First, search for the agent.username column, and drag it to the AutoGraph section.
We can see the agents and the number of calls for each, so we can easily see which agents have taken the largest amount of calls. If we want to see from what queues the calls came for each agent, we can add the queue.arn column to the visual.
After following all these steps, you can use Amazon QuickSight to add different columns from the call records and perform different types of visualizations. You can build dashboards that continuously monitor your connect instance. You can share those dashboards with others in your organization who might need to see this data.
Conclusion
In this post, you see how you can use services like AWS Lambda, AWS Glue, and Amazon Athena to process Amazon Connect call records. The post also demonstrates how to use AWS Lambda to preprocess files in Amazon S3 and transform them into a format that is recognized by AWS Glue crawlers. Finally, the post shows how to use Amazon QuickSight to perform visualizations.
You can use the provided template to analyze your own contact center instance. Or you can take the CloudFormation template and modify it to process other data streams that can be ingested using Amazon Kinesis or stored on Amazon S3.
Luis Caro is a Big Data Consultant for AWS Professional Services. He works with our customers to provide guidance and technical assistance on big data projects, helping them improve the value of their solutions when using AWS.
Peter Dalbhanjan is a Solutions Architect for AWS based in Herndon, VA. Peter has a keen interest in evangelizing AWS solutions and has written multiple blog posts that focus on simplifying complex use cases. At AWS, Peter helps with designing and architecting a variety of customer workloads.
Python code creates curious, wordless comic strips at random, spewing them from the thermal printer mouth of a laser-cut body reminiscent of Disney Pixar’s WALL-E: meet the Vomit Comic Robot!
The age of the thermal printer!
Thermal printers allow you to instantly print photos, data, and text using a few lines of code, with no need for ink. More and more makers are using this handy, low-maintenance bit of kit for truly creative projects, from Pierre Muth’s tiny PolaPi-Zero camera to the sound-printing Waves project by Eunice Lee, Matthew Zhang, and Bomani McClendon (and our own Secret Santa Babbage).
Vomiting robots
Interaction designer and developer Cadin Batrack, whose background is in game design and interactivity, has built the Vomit Comic Robot, which creates “one-of-a-kind comics on demand by processing hand-drawn images through a custom software algorithm.”
The robot is made up of a Raspberry Pi 3, a USB thermal printer, and a handful of LEDs.
At the press of a button, Processing code selects one of a set of Cadin’s hand-drawn empty comic grids and then randomly picks images from a library to fill in the gaps.
Each image is associated with data that allows the code to fit it correctly into the available panels. Cadin says about the concept behind his build:
Although images are selected and placed randomly, the comic panel format suggests relationships between elements. Our minds create a story where there is none in an attempt to explain visuals created by a non-intelligent machine.
The Raspberry Pi saves the final image as a high-resolution PNG file (so that Cadin can sell prints on thick paper via Etsy), and a Python script sends it to be vomited up by the thermal printer.
For more about the Vomit Comic Robot, check out Cadin’s blog. If you want to recreate it, you can find the info you need in the Imgur album he has put together.
We cute robots
We have a soft spot for cute robots here at Pi Towers, and of course we make no exception for the Vomit Comic Robot. If, like us, you’re a fan of adorable bots, check out Mira, the tiny interactive robot by Alonso Martinez, and Peeqo, the GIF bot by Abhishek Singh.
Warning: a GIF used in today’s blog contains flashing images.
Students at the University of Bremen, Germany, have built a wearable camera that records the seconds of vision lost when you blink. Augenblick uses a Raspberry Pi Zero and Camera Module alongside muscle sensors to record footage whenever you close your eyes, producing a rather disjointed film of the sights you miss out on.
Blink and you’ll miss it
The average person blinks up to five times a minute, with each blink lasting 0.5 to 0.8 seconds. These half-seconds add up to about 30 minutes a day. What sights are we losing during these minutes? That is the question asked by students Manasse Pinsuwan and René Henrich when they set out to design Augenblick.
Blinking is a highly invasive mechanism for our eyesight. Every day we close our eyes thousands of times without noticing it. Our mind manages to never let us wonder what exactly happens in the moments that we miss.
Capturing lost moments
For Augenblick, the wearer sticks MyoWare Muscle Sensor pads to their face, and these detect the electrical impulses that trigger blinking.
Two pads are applied over the orbicularis oculi muscle that forms a ring around the eye socket, while the third pad is attached to the cheek as a neutral point.
Biology fact: there are two muscles responsible for blinking. The orbicularis oculi muscle closes the eye, while the levator palpebrae superioris muscle opens it — and yes, they both sound like the names of Harry Potter spells.
The sensor is read 25 times a second. Whenever it detects that the orbicularis oculi is active, the Camera Module records video footage.
Pressing a button on the side of the Augenblick glasses sets the code running. An LED lights up whenever the camera is recording and also serves to confirm the correct placement of the sensor pads.
The Pi Zero saves the footage so that it can be stitched together later to form a continuous, if disjointed, film.
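As a rough sketch of this loop — assuming the MyoWare signal reaches the Pi Zero through an MCP3008 ADC (the write-up doesn’t specify how the analogue signal is read) and that a simple threshold decides when the eye is closed — the logic might look like this:

import time
from gpiozero import MCP3008
from picamera import PiCamera

sensor = MCP3008(channel=0)   # MyoWare output via ADC channel 0 (assumption)
camera = PiCamera()
BLINK_THRESHOLD = 0.5         # tuned per wearer (assumption)

recording = False
while True:
    if sensor.value > BLINK_THRESHOLD and not recording:
        # Eye muscle active: start capturing the moment the wearer misses.
        camera.start_recording('/home/pi/blink-%d.h264' % int(time.time()))
        recording = True
    elif sensor.value <= BLINK_THRESHOLD and recording:
        camera.stop_recording()
        recording = False
    time.sleep(1 / 25.0)      # poll the sensor 25 times a second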
Learn more about the Augenblick blink camera
You can find more information on the conception, design, and build process of Augenblick here in German, with a shorter explanation including lots of photos here in English.
And if you’re keen to recreate this project, our free project resource for a wearable Pi Zero time-lapse camera will come in handy as a starting point.
Version 26.1 of the Emacs editor is out. Highlights include a built-in Lisp threading mechanism that provides some concurrency, double buffering when running under X, a redesigned flymake mode, 24-bit color support in text mode, and a systemd unit file.
SHB is a small invitational gathering of people studying various aspects of the human side of security, organized each year by Alessandro Acquisti, Ross Anderson, and myself. The 50 or so people in the room include psychologists, economists, computer security researchers, sociologists, political scientists, neuroscientists, designers, lawyers, philosophers, anthropologists, business school professors, and a smattering of others. It’s not just an interdisciplinary event; most of the people here are individually interdisciplinary.
The goal is to maximize discussion and interaction. We do that by putting everyone on panels, and limiting talks to 7-10 minutes. The rest of the time is left to open discussion. Four hour-and-a-half panels per day over two days equals eight panels; six people per panel means that 48 people get to speak. We also have lunches, dinners, and receptions — all designed so people from different disciplines talk to each other.
I invariably find this to be the most intellectually stimulating conference of my year. It influences my thinking in many different, and sometimes surprising, ways.
Here are my posts on the first, second, third, fourth, fifth, sixth, seventh, eighth, ninth, and tenth SHB workshops. Follow those links to find summaries, papers, and occasionally audio recordings of the various workshops.
The adoption of Apache Spark has increased significantly over the past few years, and running Spark-based application pipelines is the new normal. Spark jobs that are in an ETL (extract, transform, and load) pipeline have different requirements—you must handle dependencies in the jobs, maintain order during executions, and run multiple jobs in parallel. In most of these cases, you can use workflow scheduler tools like Apache Oozie, Apache Airflow, and even Cron to fulfill these requirements.
Apache Oozie is a widely used workflow scheduler system for Hadoop-based jobs. However, its limited UI capabilities, lack of integration with other services, and heavy XML dependency might not be suitable for some users. On the other hand, Apache Airflow comes with a lot of neat features, along with powerful UI and monitoring capabilities and integration with several AWS and third-party services. However, with Airflow, you do need to provision and manage the Airflow server. The Cron utility is a powerful job scheduler. But it doesn’t give you much visibility into the job details, and creating a workflow using Cron jobs can be challenging.
What if you have a simple use case, in which you want to run a few Spark jobs in a specific order, but you don’t want to spend time orchestrating those jobs or maintaining a separate application? You can do that today in a serverless fashion using AWS Step Functions. You can create the entire workflow in AWS Step Functions and interact with Spark on Amazon EMR through Apache Livy.
In this post, I walk you through a list of steps to orchestrate a serverless Spark-based ETL pipeline using AWS Step Functions and Apache Livy.
Input data
For the source data for this post, I use the New York City Taxi and Limousine Commission (TLC) trip record data. For a description of the data, see this detailed dictionary of the taxi data. In this example, we’ll work mainly with the following three columns for the Spark jobs.
RateCodeID – Represents the rate code in effect at the end of the trip (for example, 1 for standard rate, 2 for JFK airport, 3 for Newark airport, and so on).
FareAmount – Represents the time-and-distance fare calculated by the meter.
TripDistance – Represents the elapsed trip distance in miles reported by the taxi meter.
The trip data is in comma-separated values (CSV) format with the first row as a header. To shorten the Spark execution time, I trimmed the large input data to only 20,000 rows. During the deployment phase, the input file tripdata.csv is stored in Amazon S3 in the <<your-bucket>>/emr-step-functions/input/ folder.
The following image shows a sample of the trip data:
Solution overview
The next few sections describe how Spark jobs are created for this solution, how you can interact with Spark using Apache Livy, and how you can use AWS Step Functions to create orchestrations for these Spark applications.
At a high level, the solution includes the following steps:
Trigger the AWS Step Function state machine by passing the input file path.
The first stage in the state machine triggers an AWS Lambda function.
The Lambda function interacts with Apache Spark running on Amazon EMR using Apache Livy, and submits a Spark job.
The state machine waits a few seconds before checking the Spark job status.
Based on the job status, the state machine moves to the success or failure state.
Subsequent Spark jobs are submitted using the same approach.
The state machine waits a few seconds for the job to finish.
The job finishes, and the state machine updates with its final status.
Let’s take a look at the Spark application that is used for this solution.
Spark jobs
For this example, I built a Spark jar named spark-taxi.jar. It has two different Spark applications:
MilesPerRateCode – The first job that runs on the Amazon EMR cluster. This job reads the trip data from an input source and computes the total trip distance for each rate code. The output of this job consists of two columns and is stored in Apache Parquet format in the output path.
The following are the expected output columns:
rate_code – Represents the rate code for the trip.
total_distance – Represents the total trip distance for that rate code (for example, sum(trip_distance)).
RateCodeStatus – The second job that runs on the EMR cluster, but only if the first job finishes successfully. This job depends on two different input sets:
tripdata.csv – The same trip data that is used for the first Spark job.
miles-per-rate – The output of the first job.
This job first reads the tripdata.csv file and aggregates the fare_amount by the rate_code. After this point, you have two different datasets, both aggregated by rate_code. Finally, the job uses the rate_code field to join two datasets and output the entire rate code status in a single CSV file.
The output columns are as follows:
rate_code_id – Represents the rate code type.
total_distance – Derived from first Spark job and represents the total trip distance.
total_fare_amount – A new field that is generated during the second Spark application, representing the total fare amount by the rate code type.
Note that in this case, you don’t need to run two different Spark jobs to generate that output. The goal of setting up the jobs in this way is just to create a dependency between the two jobs and use them within AWS Step Functions.
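The jobs themselves ship as compiled code inside spark-taxi.jar, so their source isn’t shown here. Purely to illustrate the logic described above, an equivalent PySpark version of the two steps might look like the following (the column names follow the trip data description earlier in this post, and the S3 paths are placeholders):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("taxi-illustration").getOrCreate()
root_path = "s3://your-bucket/emr-step-functions"  # placeholder for the rootPath argument

trips = spark.read.csv(root_path + "/input/tripdata.csv", header=True, inferSchema=True)

# Step 1 (MilesPerRateCode): total trip distance per rate code, written as Parquet.
miles_per_rate = (
    trips.groupBy("RateCodeID")
    .agg(F.sum("TripDistance").alias("total_distance"))
    .withColumnRenamed("RateCodeID", "rate_code")
)
miles_per_rate.write.mode("overwrite").parquet(root_path + "/miles-per-rate")

# Step 2 (RateCodeStatus): reads the trip data plus the first step's output,
# aggregates the fare amount by rate code, and joins the two datasets.
miles = spark.read.parquet(root_path + "/miles-per-rate")
fares_per_rate = trips.groupBy("RateCodeID").agg(F.sum("FareAmount").alias("total_fare_amount"))
rate_code_status = (
    fares_per_rate.join(miles, fares_per_rate["RateCodeID"] == miles["rate_code"])
    .select(F.col("rate_code").alias("rate_code_id"), "total_distance", "total_fare_amount")
)
rate_code_status.coalesce(1).write.mode("overwrite").csv(root_path + "/rate-code-status", header=True)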
Both Spark applications take one input argument called rootPath. It’s the S3 location where the Spark job is stored along with input and output data. Here is a sample of the final output:
The next section discusses how you can use Apache Livy to interact with Spark applications that are running on Amazon EMR.
Using Apache Livy to interact with Apache Spark
Apache Livy provides a REST interface to interact with Spark running on an EMR cluster. Livy is included in Amazon EMR release version 5.9.0 and later. In this post, I use Livy to submit Spark jobs and retrieve job status. When Amazon EMR is launched with Livy installed, the EMR master node becomes the endpoint for Livy, and it starts listening on port 8998 by default. Livy provides APIs to interact with Spark.
Let’s look at a couple of examples of how you can interact with Spark running on Amazon EMR using Livy.
To list active running jobs, you can execute the following from the EMR master node:
curl localhost:8998/sessions
If you want to do the same from a remote instance, just replace localhost with the EMR master node’s hostname (port 8998 must be open to that remote instance through the security group).
Through Spark submit, you can pass multiple arguments for the Spark job and Spark configuration settings. You can also do that using Livy, by passing the S3 path through the args parameter, as in the sketch below.
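For example, here is how the same submission could be done with the Python requests library from a machine that can reach the Livy endpoint; the EMR DNS, class name, and S3 paths are placeholders:

import json
import requests

livy_url = "http://<emr-master-dns>:8998/batches"  # placeholder endpoint
payload = {
    "file": "s3://your-bucket/emr-step-functions/spark-taxi.jar",  # placeholder jar path
    "className": "com.example.MilesPerRateCode",                   # placeholder class name
    "args": ["s3://your-bucket/emr-step-functions"],               # the rootPath argument
}
headers = {"Content-Type": "application/json"}

# POST /batches submits the job; the returned id is the Livy batch (session) ID.
response = requests.post(livy_url, data=json.dumps(payload), headers=headers)
print(response.json())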
For a detailed list of Livy APIs, see the Apache Livy REST API page. This post uses GET /batches and POST /batches.
In the next section, you create a state machine and orchestrate Spark applications using AWS Step Functions.
Using AWS Step Functions to create a Spark job workflow
AWS Step Functions automatically triggers and tracks each step and retries when it encounters errors. So your application executes in order and as expected every time. To create a Spark job workflow using AWS Step Functions, you first create a Lambda state machine using different types of states to create the entire workflow.
First, you use the Task state—a simple state in AWS Step Functions that performs a single unit of work. You also use the Wait state to delay the state machine from continuing for a specified time. Later, you use the Choice state to add branching logic to a state machine.
The following is a quick summary of how to use different states in the state machine to create the Spark ETL pipeline:
Task state – Invokes a Lambda function. The first Task state submits the Spark job on Amazon EMR, and the next Task state is used to retrieve the previous Spark job status.
Wait state – Pauses the state machine until a job completes execution.
Choice state – Each Spark job execution can return a failure, an error, or a success state. So, in the state machine, you use the Choice state to create a rule that specifies the next action or step based on the success or failure of the previous step.
Here is one of my Task states, MilesPerRateCode, which simply submits a Spark job:
"MilesPerRate Job": {
"Type": "Task",
"Resource":"arn:aws:lambda:us-east-1:xxxxxx:function:blog-miles-per-rate-job-submit-function",
"ResultPath": "$.jobId",
"Next": "Wait for MilesPerRate job to complete"
}
This Task state configuration specifies the Lambda function to execute. Inside the Lambda function, it submits a Spark job through Livy using Livy’s POST API. Using ResultPath, it tells the state machine where to place the result of the executing task. As discussed in the previous section, Spark submit returns the session ID, which is captured with $.jobId and used in a later state.
The Lambda function used to submit the MilesPerRateCode job uses the Python requests library to submit a POST request against the Livy endpoint hosted on Amazon EMR, passing the required parameters in JSON format through the payload. It then parses the response, grabs the id from the response, and returns it. The Next field tells the state machine which state to go to next.
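A minimal sketch of such a submit function — the environment variable name, class name, and jar path below are assumptions for illustration, not values taken from the CloudFormation template — might look like this:

import json
import os
import requests

def lambda_handler(event, context):
    # Assumed env var holding the EMR master node DNS where Livy listens on 8998.
    livy_url = "http://" + os.environ["EMR_MASTER_DNS"] + ":8998/batches"
    payload = {
        "file": event["rootPath"] + "/emr-step-functions/spark-taxi.jar",
        "className": "com.example.MilesPerRateCode",   # placeholder class name
        "args": [event["rootPath"]],
    }
    headers = {"Content-Type": "application/json"}

    # Submit the batch job through Livy and return its ID to the state machine.
    response = requests.post(livy_url, data=json.dumps(payload), headers=headers)
    return response.json()["id"]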
Just like in the MilesPerRate job, another state submits the RateCodeStatus job, but it executes only when all previous jobs have completed successfully.
Another Task state in the state machine checks the Spark job status. Just like the other states, this Task executes a Lambda function, captures the result (represented by jobStatus), and passes it to the next state. The Lambda function checks the Spark job status based on a given session ID.
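As a minimal sketch of this status check — again assuming the EMR master DNS is supplied through a hypothetical EMR_MASTER_DNS environment variable — the function could look like this:

import os
import requests

def lambda_handler(event, context):
    # The submit state stored the Livy batch (session) ID under jobId.
    batch_id = event["jobId"]
    livy_url = "http://" + os.environ["EMR_MASTER_DNS"] + ":8998/batches/" + str(batch_id)

    # Livy's GET /batches/{batchId} returns the batch details, including its
    # state (for example: starting, running, success, or dead).
    response = requests.get(livy_url)
    return response.json()["state"]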
The Choice state checks the Spark job status value, compares it with a predefined status, and transitions the state based on the result. For example, if the status is success, it moves to the next state (the RateCodeJobStatus job), and if the status is dead, it moves to the MilesPerRate job failed state.
To set up this entire solution, you need to create a few AWS resources. To make it easier, I have created an AWS CloudFormation template. This template creates all the required AWS resources and configures all the resources that are needed to create a Spark-based ETL pipeline on AWS Step Functions.
This CloudFormation template requires you to pass the following four parameters when you create the stack:
ClusterSubnetID – The subnet where the Amazon EMR cluster is deployed and Lambda is configured to talk to this subnet.
KeyName – The name of the existing EC2 key pair to access the Amazon EMR cluster.
VPCID – The ID of the virtual private cloud (VPC) where the EMR cluster is deployed and Lambda is configured to talk to this VPC.
S3RootPath – The Amazon S3 path where all required files (input file, Spark job, and so on) are stored and the resulting data is written.
IMPORTANT: These templates are designed only to show how you can create a Spark-based ETL pipeline on AWS Step Functions using Apache Livy. They are not intended for production use without modification. And if you try this solution outside of the us-east-1 Region, download the necessary files from s3://aws-data-analytics-blog/emr-step-functions, upload the files to the buckets in your Region, edit the script as appropriate, and then run it.
To launch the CloudFormation stack, choose Launch Stack:
Launching this stack creates the following AWS resources:
StepFunctionsStateExecutionRole (IAM role) – IAM role to execute the state machine and have a trust relationship with the states service.
SparkETLStateMachine (AWS Step Functions state machine) – State machine in AWS Step Functions for the Spark ETL workflow.
LambdaSecurityGroup (Amazon EC2 security group) – Security group that is used for the Lambda function to call the Livy API.
RateCodeStatusJobSubmitFunction (AWS Lambda function) – Lambda function to submit the RateCodeStatus job.
MilesPerRateJobSubmitFunction (AWS Lambda function) – Lambda function to submit the MilesPerRate job.
SparkJobStatusFunction (AWS Lambda function) – Lambda function to check the Spark job status.
LambdaStateMachineRole (IAM role) – IAM role for all Lambda functions to use the lambda trust relationship.
EMRCluster (Amazon EMR cluster) – EMR cluster where Livy is running and where the job is placed.
During deployment, the AWS CloudFormation template sets up S3 paths for input and output. Input files are stored in the <<s3-root-path>>/emr-step-functions/input/ path, whereas spark-taxi.jar is copied under <<s3-root-path>>/emr-step-functions/.
The following screenshot shows how the S3 paths are configured after deployment. In this example, I passed a bucket that I created in the AWS account s3://tm-app-demos for the S3 root path.
If the CloudFormation template completed successfully, you will see Spark-ETL-State-Machine in the AWS Step Functions dashboard, as follows:
Choose the Spark-ETL-State-Machine state machine to take a look at this implementation. The AWS CloudFormation template built the entire state machine along with its dependent Lambda functions, which are now ready to be executed.
On the dashboard, choose the newly created state machine, and then choose New execution to initiate the state machine. It asks you to pass input in JSON format. This input goes to the first state MilesPerRate Job, which eventually executes the Lambda function blog-miles-per-rate-job-submit-function.
Pass the S3 root path as input:
{
  "rootPath": "s3://tm-app-demos"
}
Then choose Start Execution:
The rootPath value is the same value that was passed when creating the CloudFormation stack. It can be an S3 bucket location or a bucket with prefixes, but it should be the same value that is used for AWS CloudFormation. This value tells the state machine where it can find the Spark jar and input file, and where it will write output files. After the state machine starts, each state/task is executed based on its definition in the state machine.
At a high level, the following represents the flow of events:
Execute the first Spark job, MilesPerRate.
The Spark job reads the input file from the location <<rootPath>>/emr-step-functions/input/tripdata.csv. If the job finishes successfully, it writes the output data to <<rootPath>>/emr-step-functions/miles-per-rate.
If the Spark job fails, it transitions to the error state MilesPerRate job failed, and the state machine stops. If the Spark job finishes successfully, it transitions to the RateCodeStatus Job state, and the second Spark job is executed.
If the second Spark job fails, it transitions to the error state RateCodeStatus job failed, and the state machine stops with the Failed status.
If this Spark job completes successfully, it writes the final output data to the <<rootPath>>/emr-step-functions/rate-code-status/ path. It also transitions to the RateCodeStatus job finished state, and the state machine ends its execution with the Success status.
The following screenshot shows a successfully completed Spark ETL state machine:
The right side of the state machine diagram shows the details of individual states with their input and output.
When you execute the state machine a second time, it fails because the S3 output path already exists. The state machine turns red and stops at the MilesPerRate job failed state. The following image represents that failed execution of the state machine:
You can also check your Spark application status and logs by going to the Amazon EMR console and viewing the Application history tab:
I hope this walkthrough paints a picture of how you can create a serverless solution for orchestrating Spark jobs on Amazon EMR using AWS Step Functions and Apache Livy. In the next section, I share some ideas for making this solution even more elegant.
Next steps
The goal of this post is to show a simple example that uses AWS Step Functions to create an orchestration for Spark-based jobs in a serverless fashion. To make this solution robust and production ready, you can explore the following options:
In this example, I manually initiated the state machine by passing the rootPath as input. You can instead trigger the state machine automatically. To run the ETL pipeline as soon as the files arrive in your S3 bucket, you can pass the new file path to the state machine. Because CloudWatch Events supports AWS Step Functions as a target, you can create a CloudWatch rule for an S3 event. You can then set AWS Step Functions as a target and pass the new file path to your state machine. You’re all set!
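As a rough sketch of that idea with boto3 (assuming CloudTrail data events are enabled for the bucket; the ARNs, rule name, and bucket are placeholders, and passing the actual object key would additionally need an input transformer):

import json
import boto3

events = boto3.client('events')

# Match PutObject calls on the input bucket (S3 events reach CloudWatch Events via CloudTrail).
pattern = {
    "source": ["aws.s3"],
    "detail-type": ["AWS API Call via CloudTrail"],
    "detail": {
        "eventSource": ["s3.amazonaws.com"],
        "eventName": ["PutObject"],
        "requestParameters": {"bucketName": ["tm-app-demos"]}
    }
}
events.put_rule(Name='spark-etl-on-upload',
                EventPattern=json.dumps(pattern),
                State='ENABLED')

# Point the rule at the state machine; the role must allow states:StartExecution.
events.put_targets(
    Rule='spark-etl-on-upload',
    Targets=[{
        'Id': 'spark-etl-state-machine',
        'Arn': 'arn:aws:states:us-east-1:123456789012:stateMachine:Spark-ETL-State-Machine',   # placeholder
        'RoleArn': 'arn:aws:iam::123456789012:role/cloudwatch-events-step-functions',           # placeholder
        'Input': json.dumps({'rootPath': 's3://tm-app-demos'})
    }]
)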
You can also improve this solution by adding an alerting mechanism in case of failures. To do this, create a Lambda function that sends an alert email and assign that Lambda function to a Fail state. That way, when any part of your state machine fails, it triggers an email and notifies the user.
If you want to submit multiple Spark jobs in parallel, you can use the Parallel state type in AWS Step Functions. The Parallel state is used to create parallel branches of execution in your state machine.
With Lambda and AWS Step Functions, you can create a very robust serverless orchestration for your big data workload.
Cleaning up
When you’ve finished testing this solution, remember to clean up all those AWS resources that you created using AWS CloudFormation. Use the AWS CloudFormation console or AWS CLI to delete the stack named Blog-Spark-ETL-Step-Functions.
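For example, a one-line boto3 call does it (a sketch, assuming your credentials and Region are configured and the stack name from the template was not changed):

import boto3

# Deleting the stack tears down the EMR cluster, Lambda functions, and state machine it created.
boto3.client('cloudformation').delete_stack(StackName='Blog-Spark-ETL-Step-Functions')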
Summary
In this post, I showed you how to use AWS Step Functions to orchestrate your Spark jobs that are running on Amazon EMR. You used Apache Livy to submit jobs to Spark from a Lambda function and created a workflow for your Spark jobs, maintaining a specific order for job execution and triggering different AWS events based on your job’s outcome. Go ahead—give this solution a try, and share your experience with us!
Tanzir Musabbir is an EMR Specialist Solutions Architect with AWS. He is an early adopter of open source Big Data technologies. At AWS, he works with our customers to provide them architectural guidance for running analytics solutions on Amazon EMR, Amazon Athena & AWS Glue. Tanzir is a big Real Madrid fan and he loves to travel in his free time.
When I talk with customers and partners, I find that they are in different stages in the adoption of DevOps methodologies. They are automating the creation of application artifacts and the deployment of their applications to different infrastructure environments. In many cases, they are creating and supporting multiple applications using a variety of coding languages and artifacts.
The management of these processes and artifacts can be challenging, but using the right tools and methodologies can simplify the process.
In this post, I will show you how you can automate the creation and storage of application artifacts through the implementation of a pipeline and custom deploy action in AWS CodePipeline. The example includes a Node.js code base stored in an AWS CodeCommit repository. A Node Package Manager (npm) artifact is built from the code base, and the build artifact is published to a JFrog Artifactory npm repository.
I frequently recommend AWS CodePipeline, the AWS continuous integration and continuous delivery tool. You can use it to quickly innovate through integration and deployment of new features and bug fixes by building a workflow that automates the build, test, and deployment of new versions of your application. And, because AWS CodePipeline is extensible, it allows you to create a custom action that performs customized, automated actions on your behalf.
JFrog’s Artifactory is a universal binary repository manager where you can manage multiple applications, their dependencies, and versions in one place. Artifactory also enables you to standardize the way you manage your package types across all applications developed in your company, no matter the code base or artifact type.
If you already have a Node.js CodeCommit repository and a JFrog Artifactory host, and would like to automate the creation of the pipeline, including the custom action and CodeBuild project, you can use this AWS CloudFormation template to create your AWS CloudFormation stack.
This figure shows the path defined in the pipeline for this project. It starts with a change to Node.js source code committed to a private code repository in AWS CodeCommit. With this change, CodePipeline triggers AWS CodeBuild to create the npm package from the node.js source code. After the build, CodePipeline triggers the custom action job worker to commit the build artifact to the designated artifact repository in Artifactory.
This blog post assumes you have already:
· Created a CodeCommit repository that contains a Node.js project.
· Configured a two-stage pipeline in AWS CodePipeline.
The Source stage of the pipeline is configured to poll the Node.js CodeCommit repository. The Build stage is configured to use a CodeBuild project to build the npm package using a buildspec.yml file located in the code repository.
If you do not have a Node.js repository, you can create a CodeCommit repository that contains this simple ‘Hello World’ project. This project also includes a buildspec.yml file that is used when you define your CodeBuild project. It defines the steps to be taken by CodeBuild to create the npm artifact.
If you do not already have a pipeline set up in CodePipeline, you can use this template to create a pipeline with a CodeCommit source action and a CodeBuild build action through the AWS Command Line Interface (AWS CLI). If you do not want to install the AWS CLI on your local machine, you can use AWS Cloud9, our managed integrated development environment (IDE), to interact with AWS APIs.
In your development environment, open your favorite editor and fill out the template with values appropriate to your project. For information, see the readme in the GitHub repository.
Use this CLI command to create the pipeline from the template:
It creates a pipeline that has a CodeCommit source action and a CodeBuild build action.
Integrating JFrog Artifactory
JFrog Artifactory provides default repositories for your project needs. For my NPM package repository, I am using the default virtual npm repository (named npm) that is available in Artifactory Pro. You might want to consider creating a repository per project, but for the example used in this post, using the default lets me get started without having to configure a new repository.
I can use the steps in the Set Me Up -> npm section on the landing page to configure my worker to interact with the default NPM repository.
To integrate Artifactory into the pipeline, I need two pieces:
· A custom action definition – Describes the required values to run the custom action. I will define my custom action in the ‘Deploy’ category, identify the provider as ‘Artifactory’, set the version to ‘1’, and specify a variety of configurationProperties whose values will be defined when this stage is added to my pipeline.
· A custom job worker – Polls CodePipeline for a job, scanning for its action-definition properties. In this blog post, after a job has been found, the job worker does the work required to publish the npm artifact to the Artifactory repository.
Here is the JSON file that defines the custom action:
{
"category": "Deploy",
"configurationProperties": [{
"name": "TypeOfArtifact",
"required": true,
"key": true,
"secret": false,
"description": "Package type, ex. npm for node packages",
"type": "String"
},
{ "name": "RepoKey",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Name of the repository in which this artifact should be stored"
},
{ "name": "UserName",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Username for authenticating with the repository"
},
{ "name": "Password",
"required": true,
"key": true,
"secret": true,
"type": "String",
"description": "Password for authenticating with the repository"
},
{ "name": "EmailAddress",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Email address used to authenticate with the repository"
},
{ "name": "ArtifactoryHost",
"required": true,
"key": true,
"secret": false,
"type": "String",
"description": "Public address of Artifactory host, ex: https://myexamplehost.com or http://myexamplehost.com:8080"
}],
"provider": "Artifactory",
"version": "1",
"settings": {
"entityUrlTemplate": "{Config:ArtifactoryHost}/artifactory/webapp/#/artifacts/browse/tree/General/{Config:RepoKey}"
},
"inputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 1
},
"outputArtifactDetails": {
"maximumCount": 5,
"minimumCount": 0
}
}
There are seven sections to the custom action definition:
category: This is the stage in which you will be creating this action. It can be Source, Build, Deploy, Test, Invoke, Approval. Except for source actions, the category section simply allows us to organize our actions. I am setting the category for my action as ‘Deploy’ because I’m using it to publish my node artifact to my Artifactory instance.
configurationProperties: These are the parameters or variables required for your project to authenticate and commit your artifact. In the case of my custom worker, I need:
TypeOfArtifact: In this case, npm, because it’s for the Node Package Manager.
RepoKey: The name of the repository. In this case, it’s the default npm.
UserName and Password for the user to authenticate with the Artifactory repository.
EmailAddress used to authenticate with the repository.
ArtifactoryHost: The public address of the Artifactory host (host name or IP address).
provider: The name you define for your custom action stage. I have named the provider Artifactory.
version: Version number for the custom action. Because this is the first version, I set the version number to 1.
entityUrlTemplate: This URL is presented to your users for the deploy stage along with the title you define in your provider. The link takes the user to their artifact repository page in the Artifactory host.
inputArtifactDetails: The number of artifacts to expect from the previous stage in the pipeline.
outputArtifactDetails: The number of artifacts that should be the result from the custom action stage. Later in this blog post, I define 0 for my output artifacts because I am publishing the artifact to the Artifactory repository as the final action.
After I define the custom action in a JSON file, I use the AWS CLI to create the custom action type in CodePipeline:
After I create the custom action type in the same region as my pipeline, I edit the pipeline to add a Deploy stage and configure it to use the custom action I created for Artifactory:
I have created a custom worker for the actions required to commit the npm artifact to the Artifactory repository. The worker is in Python and it runs in a loop on an Amazon EC2 instance. My custom worker polls for a deploy job and publishes the NPM artifact to the Artifactory repository.
The EC2 instance is running Amazon Linux and has an IAM instance role attached that gives the worker permission to access CodePipeline. The worker process is as follows:
Take the configuration properties from the custom worker and poll CodePipeline for a custom action job.
After there is a job in the job queue with the appropriate category, provider, and version, acknowledge the job.
Download the zipped artifact created in the previous Build stage from the provided S3 buckets with the provided temporary credentials.
Unzip the artifact into a temporary directory.
Use the user-defined Artifactory user name and password to obtain a temporary API key from Artifactory.
To avoid having to write the password to a file, use that temporary API key and user name to authenticate with the NPM repository.
Publish the Node.js package to the specified repository.
Because I am running my custom worker on an Amazon Linux EC2 instance, I installed npm with the following command:
sudo yum install nodejs npm --enablerepo=epel
For my custom worker, I used pip to install the required Python libraries:
pip install boto3 requests
For a full Python package list, see requirements.txt in the GitHub repository.
Let’s take a look at some of the code snippets from the worker.
First, the worker polls for jobs:
def action_type():
    ActionType = {
        'category': 'Deploy',
        'owner': 'Custom',
        'provider': 'Artifactory',
        'version': '1' }
    return(ActionType)

def poll_for_jobs():
    try:
        artifactory_action_type = action_type()
        print(artifactory_action_type)
        jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
        while not jobs['jobs']:
            time.sleep(10)
            jobs = codepipeline.poll_for_jobs(actionTypeId=artifactory_action_type)
        if jobs['jobs']:
            print('Job found')
            return jobs['jobs'][0]
    except ClientError as e:
        print("Received an error: %s" % str(e))
        raise
When there is a job in the queue, the poller returns a number of values from the queue such as jobId, the input and output S3 buckets for artifacts, temporary credentials to access the S3 buckets, and other configuration details from the stage in the pipeline.
After successfully receiving the job details, the worker sends an acknowledgement to CodePipeline to ensure that the work on the job is not duplicated by other workers watching for the same job:
def job_acknowledge(jobId, nonce):
    try:
        print('Acknowledging job')
        result = codepipeline.acknowledge_job(jobId=jobId, nonce=nonce)
        return result
    except Exception as e:
        print("Received an error when trying to acknowledge the job: %s" % str(e))
        raise
With the job now acknowledged, the worker publishes the source code artifact into the desired repository. The worker gets the value of the artifact S3 bucket and objectKey from the inputArtifacts in the response from the poll_for_jobs API request. Next, the worker creates a new directory in /tmp and downloads the S3 object into this directory:
def get_bucket_location(bucketName, init_client):
    region = init_client.get_bucket_location(Bucket=bucketName)['LocationConstraint']
    if not region:
        region = 'us-east-1'
    return region

def get_s3_artifact(bucketName, objectKey, ak, sk, st):
    init_s3 = boto3.client('s3')
    region = get_bucket_location(bucketName, init_s3)
    session = Session(aws_access_key_id=ak,
                      aws_secret_access_key=sk,
                      aws_session_token=st)
    s3 = session.resource('s3',
                          region_name=region,
                          config=botocore.client.Config(signature_version='s3v4'))
    try:
        tempdirname = tempfile.mkdtemp()
    except OSError as e:
        print('Could not create a temporary directory: %s' % str(e))
        raise
    bucket = s3.Bucket(bucketName)
    obj = bucket.Object(objectKey)
    filename = tempdirname + '/' + objectKey
    try:
        if os.path.dirname(objectKey):
            directory = os.path.dirname(filename)
            os.makedirs(directory)
        print('Downloading the %s object and writing it to disk in %s location' % (objectKey, tempdirname))
        with open(filename, 'wb') as data:
            obj.download_fileobj(data)
    except ClientError as e:
        print('Downloading the object and writing the file to disk raised this error: ' + str(e))
        raise
    return(filename, tempdirname)
Because the downloaded artifact from S3 is a zip file, the worker must unzip it first. To have a clean area in which to work, I extract the downloaded zip archive into a new directory:
def unzip_codepipeline_artifact(artifact, origtmpdir):
    # create a new temp directory
    # Unzip artifact into new directory
    try:
        newtempdir = tempfile.mkdtemp()
        print('Extracting artifact %s into temporary directory %s' % (artifact, newtempdir))
        zip_ref = zipfile.ZipFile(artifact, 'r')
        zip_ref.extractall(newtempdir)
        zip_ref.close()
        shutil.rmtree(origtmpdir)
        return(os.listdir(newtempdir), newtempdir)
    except OSError as e:
        if e.errno != errno.EEXIST:
            shutil.rmtree(newtempdir)
            raise
The worker now has the npm package that I want to store in my Artifactory NPM repository.
To authenticate with the NPM repository, the worker requests a temporary token from the Artifactory host. After receiving this temporary token, it creates a .npmrc file in the worker user’s home directory that includes a hash of the user name and temporary token. After it has authenticated, the worker runs npm config set registry <URL OF REPOSITORY> to configure the npm registry value to be the Artifactory host. Next, the worker runs npm publish --registry <URL OF REPOSITORY>, which publishes the node package to the NPM repository in the Artifactory host.
def push_to_npm(configuration, artifact_list, temp_dir, jobId):
    reponame = configuration['RepoKey']
    art_type = configuration['TypeOfArtifact']
    print("Putting artifact into NPM repository " + reponame)
    token, hostname, username = gen_artifactory_auth_token(configuration)
    npmconfigfile = create_npmconfig_file(configuration, username, token)
    url = hostname + '/artifactory/api/' + art_type + '/' + reponame
    print("Changing directory to " + str(temp_dir))
    os.chdir(temp_dir)
    try:
        print("Publishing following files to the repository: %s " % os.listdir(temp_dir))
        print("Sending artifact to Artifactory NPM registry URL: " + url)
        subprocess.call(["npm", "config", "set", "registry", url])
        req = subprocess.call(["npm", "publish", "--registry", url])
        print("Return code from npm publish: " + str(req))
        if req != 0:
            err_msg = "npm ERR! Received non-OK response while sending response to Artifactory. Return code from npm publish: " + str(req)
            signal_failure(jobId, err_msg)
        else:
            signal_success(jobId)
    except requests.exceptions.RequestException as e:
        print("Received an error when trying to commit artifact %s to repository %s: %s" % (str(art_type), str(configuration['RepoKey']), str(e)))
        raise
    return(req, npmconfigfile)
If the return value from publishing to the repository is not 0, the worker signals a failure to CodePipeline. If the value is 0, the worker signals success to CodePipeline to indicate that the stage of the pipeline has been completed successfully.
For the custom worker code, see npm_job_worker.py in the GitHub repository.
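The helper functions gen_artifactory_auth_token and create_npmconfig_file live in that repository and are not shown above. As an illustration only, a minimal .npmrc helper along the lines of Artifactory's documented npm setup (base64-encoded _auth, email, and always-auth entries) might look like the following; the field names follow Artifactory's standard npm configuration and are not the author's exact code:

import base64
import os

def create_npmconfig_file(configuration, username, token):
    # Write ~/.npmrc so npm can authenticate against the Artifactory npm repository.
    # _auth is the base64 encoding of "user:token", per Artifactory's npm setup instructions.
    auth = base64.b64encode(('%s:%s' % (username, token)).encode('utf-8')).decode('utf-8')
    npmrc_path = os.path.join(os.path.expanduser('~'), '.npmrc')
    with open(npmrc_path, 'w') as npmrc:
        npmrc.write('_auth = %s\n' % auth)
        npmrc.write('email = %s\n' % configuration['EmailAddress'])
        npmrc.write('always-auth = true\n')
    return npmrc_path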
I run my custom worker on an EC2 instance using the command python npm_job_worker.py, with an optional --version flag that can be used to specify worker versions other than 1. Then I trigger a release change in my pipeline:
From my custom worker output logs, I have just committed a package named node_example at version 1.0.3:
On artifact: index.js
Committing to the repo: https://artifactory.myexamplehost.com/artifactory/api/npm/npm
Sending artifact to Artifactory URL: https://artifactoryhost.myexamplehost.com/artifactory/api/npm/npm
npm config: 0
npm http PUT https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
npm http 201 https://artifactory.myexamplehost.com/artifactory/api/npm/npm/node_example
+ node_example@1.0.3
Return code from npm publish: 0
Signaling success to CodePipeline
After that has been built successfully, I can find my artifact in my Artifactory repository:
To help you automate this process, I have created this AWS CloudFormation template that automates the creation of the CodeBuild project, the custom action, and the CodePipeline pipeline. It also launches the Amazon EC2-based custom job worker in an AWS Auto Scaling group. This template requires you to have a VPC and CodeCommit repository for your Node.js project. If you do not currently have a VPC in which you want to run your custom worker EC2 instances, you can use this AWS QuickStart to create one. If you do not have an existing Node.js project, I’ve provided a sample project in the GitHub repository.
Conclusion
I’ve shown you the steps to integrate your JFrog Artifactory repository with your CodePipeline workflow, how to create a custom action in CodePipeline, and how to create a custom worker that runs in your CI/CD pipeline. To dig deeper into custom actions and see how you can integrate your Artifactory repositories into your AWS CodePipeline projects, check out the full code base on GitHub.
If you have any questions or feedback, feel free to reach out to us through the AWS CodePipeline forum.
Erin McGill is a Solutions Architect in the AWS Partner Program with a focus on DevOps and automation tooling.
Businesses and organizations that rely on macOS server for essential office and data services are facing some decisions about the future of their IT services.
Apple recently announced that it is deprecating a significant portion of essential network services in macOS Server, as they described in a support statement posted on April 24, 2018, “Prepare for changes to macOS Server.” Apple’s note includes:
macOS Server is changing to focus more on management of computers, devices, and storage on your network. As a result, some changes are coming in how Server works. A number of services will be deprecated, and will be hidden on new installations of an update to macOS Server coming in spring 2018.
The note lists the services that will be removed in a future release of macOS Server, including calendar and contact support, Dynamic Host Configuration Protocol (DHCP), Domain Name Services (DNS), mail, instant messages, virtual private networking (VPN), NetInstall, Web server, and the Wiki.
Apple assures users who have already configured any of the listed services that they will be able to use them in the spring 2018 macOS Server update, but the statement ends with links to a number of alternative services, including hosted services, that macOS Server users should consider as viable replacements to the features it is removing. These alternative services are all FOSS (Free and Open-Source Software).
As difficult as this could be for organizations that use macOS server, this is not unexpected. Apple left the server hardware space back in 2010, when Steve Jobs announced the company was ending its line of Xserve rackmount servers, which were introduced in May, 2002. Since then, macOS Server has hardly been a prominent part of Apple’s product lineup. It’s not just the product itself that has lost some luster, but the entire category of SMB office and business servers, which has been undergoing a gradual change in recent years.
Some might wonder how important the news about macOS Server is, given that macOS Server represents a pretty small share of the server market. macOS Server has been important to design shops, agencies, education users, and small businesses that likely have been on Macs for ages, but it’s not a significant part of the IT infrastructure of larger organizations and businesses.
What Comes After macOS Server?
Lovers of macOS Server don’t have to fear having their Mac minis pried from their cold, dead hands quite yet. Installed services will continue to be available. In the fall of 2018, new installations and upgrades of macOS Server will require users to migrate most services to other software. Since many of the services of macOS Server were already open-source, this means that a change in software might not be required. It does mean more configuration and management required from those who continue with macOS Server, however.
Users can continue with macOS Server if they wish, but many will see the writing on the wall and look for a suitable substitute.
The Times They Are A-Changin’
For many people working in organizations, what is significant about this announcement is how it reflects the move away from the once ubiquitous server-based IT infrastructure. Services that used to be centrally managed and office-based, such as storage, file sharing, communications, and computing, have moved to the cloud.
In selecting the next office IT platforms, there’s an opportunity to move to solutions that reflect and support how people are working and the applications they are using both in the office and remotely. For many, this means including cloud-based services in office automation, backup, and business continuity/disaster recovery planning. This includes Software as a Service, Platform as a Service, and Infrastructure as a Service (SaaS, PaaS, IaaS) options.
IT solutions that integrate well with the cloud are worth strong consideration for what comes after a macOS Server-based environment.
Synology NAS as a macOS Server Alternative
One solution that is becoming popular is to replace macOS Server with a device that has the ability to provide important office services, but also bridges the office and cloud environments. Using Network-Attached Storage (NAS) to take up the server slack makes a lot of sense. Many customers are already using NAS for file sharing, local data backup, automatic cloud backup, and other uses. In the case of Synology, their operating system, Synology DiskStation Manager (DSM), is Linux based, and integrates the basic functions of file sharing, centralized backup, RAID storage, multimedia streaming, virtual storage, and other common functions.
Synology NAS
Since DSM is based on Linux, there are numerous server applications available, including many of the same ones that are available for macOS Server, which shares conceptual roots with Linux as it comes from BSD Unix.
Synology DiskStation Manager Package Center
According to Ed Lukacs, COO at 2FIFTEEN Systems Management in Salt Lake City, their customers have found the move from macOS Server to Synology NAS not only painless, but positive. DSM works seamlessly with macOS and has been faster for their customers, as well. Many of their customers are running Adobe Creative Suite and Google G Suite applications, so a workflow that combines local storage, remote access, and the cloud, is already well known to them. Remote users are supported by Synology’s QuickConnect or VPN.
Business continuity and backup are simplified by the flexible storage capacity of the NAS. Synology has built-in backup to Backblaze B2 Cloud Storage with Synology’s Cloud Sync, as well as a choice of a number of other B2-compatible applications, such as Cloudberry, Comet, and Arq.
Customers have been able to get up and running quickly, with only initial data transfers requiring some time to complete. After that, management of the NAS can be handled in-house or with the support of a Managed Service Provider (MSP).
Are You Sticking with macOS Server or Moving to Another Platform?
If you’re affected by this change in macOS Server, please let us know in the comments how you’re planning to cope. Are you using Synology NAS for server services? Please tell us how that’s working for you.
I’m in danger of contradicting myself, after previously pointing out that x86 machine code is a high-level language, but this article claiming that C is not a low-level language is bunk. C certainly has some problems, but it’s still the closest language to assembly. This is obvious from the fact that it’s still the fastest compiled language. What we see is a typical academic out of touch with the real world.
The author makes the (wrong) observation that we’ve been stuck emulating the PDP-11 for the past 40 years. C was written for the PDP-11, and since then CPUs have been designed to make C run faster. The author imagines a different world, such as one where CPU designers instead target something like LISP as their preferred language, or Erlang. This misunderstands the state of the market. CPUs do indeed support lots of different abstractions, and C has evolved to accommodate this.
The author criticizes things like “out-of-order” execution, which has led to the Spectre side-channel vulnerabilities. Out-of-order execution is necessary to make C run faster. The author claims instead that those resources should be spent on having more, slower CPUs with more threads. This sacrifices single-threaded performance in exchange for a lot more threads executing in parallel. The author cites Sparc Tx CPUs as his ideal processor.
But here’s the thing: the Sparc Tx was a failure. To be fair, it’s mostly a failure because most of the time, people wanted to run old C code instead of new Erlang code. But it was still a failure at running Erlang.
Time after time, engineers keep finding that “out-of-order”, single-threaded performance is still the winner. A good example is ARM processors for both mobile phones and servers. All the theory points to in-order CPUs as being better, but all the products are out-of-order, because this theory is wrong. The custom ARM cores from Apple and Qualcomm used in most high-end phones are so deeply out-of-order they give Intel CPUs competition. The same is true on the server front with the latest Qualcomm Centriq and Cavium ThunderX2 processors, deeply out of order supporting more than 100 instructions in flight.
The Cavium is especially telling. Its ThunderX CPU had 48 simple cores; it was replaced by the ThunderX2, which has 32 complex, deeply out-of-order cores. The performance increase was massive, even on multithread-friendly workloads. Every competitor to Intel’s dominance in the server space has learned the lesson from Sparc Tx: many wimpy cores is a failure, you need fewer beefy cores. Yes, they don’t need to be as beefy as Intel’s processors, but they need to be close.
Even Intel’s “Xeon Phi” custom chip learned this lesson. This is their GPU-like chip, running 60 cores with 512-bit wide “vector” (sic) instructions, designed for supercomputer applications. Its first version was purely in-order. Its current version is slightly out-of-order. It supports four threads and focuses on basic number crunching, so in-order cores seem to be the right approach, but Intel found in this case that out-of-order processing still provided a benefit. Practice is different than theory.
As an academic, the author of the above article focuses on abstractions. The criticism of C is that it has the wrong abstractions which are hard to optimize, and that if we instead expressed things in the right abstractions, it would be easier to optimize.
This is an intellectually compelling argument, but so far bunk.
The reason is that while the theoretical base language has issues, everyone programs using extensions to the language, like “intrinsics” (C ‘functions’ that map to assembly instructions). Programmers write libraries using these intrinsics, which the rest of the programmers then use. In other words, if your criticism is that C is not itself low level enough, it still provides the best access to low level capabilities.
Because C can access new functionality in CPUs, CPU designers keep adding new paradigms, from SIMD to transaction processing. In other words, while in the 1980s CPUs were designed to optimize C (stacks, scaled pointers), these days CPUs are designed to optimize tasks regardless of language.
The author of that article criticizes the memory/cache hierarchy, claiming it has problems. Yes, it has problems, but only compared to how well it normally works. The author praises the many simple cores/threads idea as hiding memory latency with little caching, but misses the point that caches also dramatically increase memory bandwidth. Intel processors are optimized to read a whopping 256 bits every clock cycle from L1 cache. Main memory bandwidth is orders of magnitude slower.
The author goes on to criticize cache coherency as a problem. C uses it, but other languages like Erlang don’t need it. But that’s largely due to the problems each language solves. Erlang solves the problem where a large number of threads work on largely independent tasks, needing to send only small messages to each other across threads. The problem C solves is when you need many threads working on a huge, common set of data.
For example, consider the “intrusion prevention system”. Any thread can process any incoming packet that corresponds to any region of memory. There’s no practical way of solving this problem without a huge coherent cache. It doesn’t matter which language or abstractions you use, it’s the fundamental constraint of the problem being solved. RDMA is an important concept that’s moved from supercomputer applications to the data center, such as with memcached. Again, we have the problem of huge quantities (terabytes worth) shared among threads rather than small quantities (kilobytes).
The fundamental issue the author of the paper is ignoring is decreasing marginal returns. Moore’s Law has gifted us more transistors than we can usefully use. We can’t apply those additional transistors to just one thing, because the useful returns we get diminish.
For example, Intel CPUs have two hardware threads per core. That’s because there are good returns by adding a single additional thread. However, the usefulness of adding a third or fourth thread decreases. That’s why many CPUs have only two threads, or sometimes four threads, but no CPU has 16 threads per core.
You can apply the same discussion to any aspect of the CPU, from register count, to SIMD width, to cache size, to out-of-order depth, and so on. Rather than focusing on one of these things and increasing it to the extreme, CPU designers make each one a bit larger with every process generation that adds more transistors to the chip.
The same applies to cores. It’s why the “more simpler cores” strategy fails, because more cores have their own decreasing marginal returns. Instead of adding cores tied to limited memory bandwidth, it’s better to add more cache. Such cache already increases the size of the cores, so at some point it’s more effective to add a few out-of-order features to each core rather than more cores. And so on.
The question isn’t whether we can change this paradigm and radically redesign CPUs to match some academic’s view of the perfect abstraction. Instead, the goal is to find new uses for those additional transistors. For example, “message passing” is a useful abstraction in languages like Go and Erlang that’s often more useful than sharing memory. It’s implemented with shared memory and atomic instructions, but I can’t help thinking it could be done better with direct hardware support.
Of course, as soon as they do that, it’ll become an intrinsic in C, then added to languages like Go and Erlang.
Summary
Academics live in an ideal world of abstractions; the rest of us live in practical reality. The reality is that the vast majority of programmers work with the C family of languages (JavaScript, Go, etc.), whereas academics love the epiphanies they learned using other languages, especially functional languages. CPUs are only superficially designed to run C and preserve “PDP-11 compatibility”. Instead, they keep adding features to support other abstractions, abstractions available to C. They are driven by decreasing marginal returns: they would love to add new abstractions to the hardware because it’s a cheap way to make use of additional transistors. Academics are wrong to believe that the entire system needs to be redesigned from scratch. Instead, they just need to come up with new abstractions that CPU designers can add.
Today we’re launching a new partnership between the Scouts and the Raspberry Pi Foundation that will help tens of thousands of young people learn crucial digital skills for life. In this blog post, I want to explain what we’ve got planned, why it matters, and how you can get involved.
This is personal
First, let me tell you why this partnership matters to me. As a child growing up in North Wales in the 1980s, Scouting changed my life. My time with 2nd Rhyl provided me with countless opportunities to grow and develop new skills. It taught me about teamwork and community in ways that continue to shape my decisions today.
As my own kids (now seven and ten) have joined Scouting, I’ve seen the same opportunities opening up for them, and like so many parents, I’ve come back to the movement as a volunteer to support their local section. So this is deeply personal for me, and the same is true for many of my colleagues at the Raspberry Pi Foundation who in different ways have been part of the Scouting movement.
That shouldn’t come as a surprise. Scouting and Raspberry Pi share many of the same values. We are both community-led movements that aim to help young people develop the skills they need for life. We are both powered by an amazing army of volunteers who give their time to support that mission. We both care about inclusiveness, and pride ourselves on combining fun with learning by doing.
Raspberry Pi
Raspberry Pi started life in 2008 as a response to the problem that too many young people were growing up without the skills to create with technology. Our goal is that everyone should be able to harness the power of computing and digital technologies, for work, to solve problems that matter to them, and to express themselves creatively.
In 2012 we launched our first product, the world’s first $35 computer. Just six years on, we have sold over 20 million Raspberry Pi computers and helped kickstart a global movement for digital skills.
The Raspberry Pi Foundation now runs the world’s largest network of volunteer-led computing clubs (Code Clubs and CoderDojos), and creates free educational resources that are used by millions of young people all over the world to learn how to create with digital technologies. And lots of what we are able to achieve is because of partnerships with fantastic organisations that share our goals. For example, through our partnership with the European Space Agency, thousands of young people have written code that has run on two Raspberry Pi computers that Tim Peake took to the International Space Station as part of his Mission Principia.
Digital makers
Today we’re launching the new Digital Maker Staged Activity Badge to help tens of thousands of young people learn how to create with technology through Scouting. Over the past few months, we’ve been working with the Scouts all over the UK to develop and test the new badge requirements, along with guidance, project ideas, and resources that really make them work for Scouting. We know that we need to get two things right: relevance and accessibility.
Relevance is all about making sure that the activities and resources we provide are a really good fit for Scouting and Scouting’s mission to equip young people with skills for life. From the digital compass to nature cameras and the reinvented wide game, we’ve had a lot of fun thinking about ways we can bring to life the crucial role that digital technologies can play in the outdoors and adventure.
We are beyond excited to be launching a new partnership with the Raspberry Pi Foundation, which will help tens of thousands of young people learn digital skills for life.
We also know that there are great opportunities for Scouts to use digital technologies to solve social problems in their communities, reflecting the movement’s commitment to social action. Today we’re launching the first set of project ideas and resources, with many more to follow over the coming weeks and months.
Accessibility is about providing every Scout leader with the confidence, support, and kit to enable them to offer the Digital Maker Staged Activity Badge to their young people. A lot of work and care has gone into designing activities that require very little equipment: for example, activities at Stages 1 and 2 can be completed with a laptop without access to the internet. For the activities that do require kit, we will be working with Scout Stores and districts to make low-cost kit available to buy or loan.
We’re producing accessible instructions, worksheets, and videos to help leaders run sessions with confidence, and we’ll also be planning training for leaders. We will work with our network of Code Clubs and CoderDojos to connect them with local sections to organise joint activities, bringing both kit and expertise along with them.
Get involved
Today’s launch is just the start. We’ll be developing our partnership over the next few years, and we can’t wait for you to join us in getting more young people making things with technology.
Take a look at the brand-new Raspberry Pi resources designed especially for Scouts, to get young people making and creating right away.
As you can see from my EC2 Instance History post, we add new instance types on a regular and frequent basis. Driven by increasingly powerful processors and designed to address an ever-widening set of use cases, the size and diversity of this list reflects the equally diverse group of EC2 customers!
Near the bottom of that list you will find the new compute-intensive C5 instances. With a 25% to 50% improvement in price-performance over the C4 instances, the C5 instances are designed for applications like batch and log processing, distributed and/or real-time analytics, high-performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding. Some of these applications can benefit from access to high-speed, ultra-low latency local storage. For example, video encoding, image manipulation, and other forms of media processing often necessitate large amounts of I/O to temporary storage. While the input and output files are valuable assets and are typically stored as Amazon Simple Storage Service (S3) objects, the intermediate files are expendable. Similarly, batch and log processing runs in a race-to-idle model, flushing volatile data to disk as fast as possible in order to make full use of compute resources.
New C5d Instances with Local Storage
In order to meet this need, we are introducing C5 instances equipped with local NVMe storage. Available for immediate use in 5 regions, these instances are a great fit for the applications that I described above, as well as others that you will undoubtedly dream up! Here are the specs:
c5d.large – 2 vCPUs, 4 GiB RAM, 1 x 50 GB NVMe SSD, up to 2.25 Gbps EBS bandwidth, up to 10 Gbps network bandwidth
c5d.xlarge – 4 vCPUs, 8 GiB RAM, 1 x 100 GB NVMe SSD, up to 2.25 Gbps EBS bandwidth, up to 10 Gbps network bandwidth
c5d.2xlarge – 8 vCPUs, 16 GiB RAM, 1 x 225 GB NVMe SSD, up to 2.25 Gbps EBS bandwidth, up to 10 Gbps network bandwidth
c5d.4xlarge – 16 vCPUs, 32 GiB RAM, 1 x 450 GB NVMe SSD, 2.25 Gbps EBS bandwidth, up to 10 Gbps network bandwidth
c5d.9xlarge – 36 vCPUs, 72 GiB RAM, 1 x 900 GB NVMe SSD, 4.5 Gbps EBS bandwidth, 10 Gbps network bandwidth
c5d.18xlarge – 72 vCPUs, 144 GiB RAM, 2 x 900 GB NVMe SSD, 9 Gbps EBS bandwidth, 25 Gbps network bandwidth
Other than the addition of local storage, the C5 and C5d share the same specs. Both are powered by 3.0 GHz Intel Xeon Platinum 8000-series processors, optimized for EC2 and with full control over C-states on the two largest sizes, giving you the ability to run two cores at up to 3.5 GHz using Intel Turbo Boost Technology.
You can use any AMI that includes drivers for the Elastic Network Adapter (ENA) and NVMe; this includes the latest Amazon Linux, Microsoft Windows (Server 2008 R2, Server 2012, Server 2012 R2 and Server 2016), Ubuntu, RHEL, SUSE, and CentOS AMIs.
Here are a couple of things to keep in mind about the local NVMe storage:
Naming – You don’t have to specify a block device mapping in your AMI or during the instance launch; the local storage will show up as one or more devices (/dev/nvme*1 on Linux) after the guest operating system has booted.
Encryption – Each local NVMe device is hardware encrypted using the XTS-AES-256 block cipher and a unique key. Each key is destroyed when the instance is stopped or terminated.
Lifetime – Local NVMe devices have the same lifetime as the instance they are attached to, and do not stick around after the instance has been stopped or terminated.
Available Now
C5d instances are available in On-Demand, Reserved Instance, and Spot form in the US East (N. Virginia), US West (Oregon), EU (Ireland), US East (Ohio), and Canada (Central) Regions. Prices vary by Region, and are just a bit higher than for the equivalent C5 instances.
Three soldiers from Blandford Camp have successfully designed and built an autonomous robot as part of their Foreman of Signals Course at the Dorset Garrison.
Autonomous robots
Forces Radio BFBS carried a story last week about Staff Sergeant Jolley, Sergeant Rana, and Sergeant Paddon, also known as the “Project ROVER” team. As part of their Foreman of Signals training, their task was to design an autonomous robot that can move between two specified points, take a temperature reading, and transmit the information to a remote computer. The team comments that, while semi-autonomous robots have been used as far back as 9/11 for tasks like finding people trapped under rubble, nothing like their robot, on a similar scale, currently exists within the British Army.
The ROVER buggy
Their build is named ROVER, which stands for Remote Obstacle aVoiding Environment Robot. It’s a buggy that moves on caterpillar tracks, and it’s tethered; we wonder whether that might be because it doesn’t currently have an on-board power supply. A demo shows the robot moving forward, then changing its path when it encounters an obstacle. The team is using RealVNC‘s remote access software to allow ROVER to send data back to another computer.
Applications for ROVER
Dave Ball, Senior Lecturer in charge of the Foreman of Signals course, comments that the project is “a fantastic opportunity for [the team] to, even only halfway through the course, showcase some of the stuff they’ve learnt and produce something that’s really quite exciting.” The Project ROVER team explains that the possibilities for autonomous robots like this one are extensive: they include mine clearance, bomb disposal, and search-and-rescue campaigns. They point out that existing semi-autonomous hardware is not as easy to program as their build. In contrast, they say, “with the invention of the Raspberry Pi, this has allowed three very inexperienced individuals to program a robot very capable of doing these things.”
We make Raspberry Pi computers because we want building things with technology to be as accessible as possible. So it’s great to see a project like this, made by people who aren’t techy and don’t have a lot of computing experience, but who want to solve a problem and see that the Pi is an affordable and powerful tool that can help.