Well, we won’t actually show you how we create the magic in our big Oath consumer mail factory. But we did want to share how interested developers can leverage some of the unique features we offer to our Yahoo and AOL Mail customers.
To drive experiences like our travel and shopping smart views or message threading, we tag qualifying messages with something we call DECOS and THREADID. While we won’t go into detail about exactly how we use them internally, we wanted to share how they can be used and accessed through IMAP.
So let’s look at a sample IMAP command chain. We’ll assume at this point that you’re familiar with the IMAP protocol and know how to talk to an IMAP server properly.
Here’s how you would retrieve DECOS and THREADIDs for specific messages:
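The actual command chain isn’t reproduced in this excerpt. Purely to illustrate the shape of such a FETCH, here is a minimal Python (imaplib) sketch; the fetch item names X-MSG-DECOS and X-MSG-THREADID are placeholders rather than confirmed identifiers, and the host and login details are assumptions about the setup.

# Illustrative sketch only. The fetch items below are placeholders for
# whatever attribute names the Yahoo/AOL IMAP servers actually expose.
import imaplib

conn = imaplib.IMAP4_SSL('imap.mail.yahoo.com')
conn.login('user@yahoo.com', 'app-password')   # use an app password, not your account password
conn.select('INBOX', readonly=True)

# Fetch the (hypothetical) DECOS and THREADID items for messages 1-5
typ, data = conn.fetch('1:5', '(X-MSG-DECOS X-MSG-THREADID)')
for line in data:
    print(line)

conn.logout()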
Researchers have demonstrated the ability to send inaudible commands to voice assistants like Alexa, Siri, and Google Assistant.
Over the last two years, researchers in China and the United States have begun demonstrating that they can send hidden commands that are undetectable to the human ear to Apple’s Siri, Amazon’s Alexa and Google’s Assistant. Inside university labs, the researchers have been able to secretly activate the artificial intelligence systems on smartphones and smart speakers, making them dial phone numbers or open websites. In the wrong hands, the technology could be used to unlock doors, wire money or buy stuff online – simply with music playing over the radio.
A group of students from University of California, Berkeley, and Georgetown University showed in 2016 that they could hide commands in white noise played over loudspeakers and through YouTube videos to get smart devices to turn on airplane mode or open a website.
This month, some of those Berkeley researchers published a research paper that went further, saying they could embed commands directly into recordings of music or spoken text. So while a human listener hears someone talking or an orchestra playing, Amazon’s Echo speaker might hear an instruction to add something to your shopping list.
Containers are, of course, all the rage these days; in fact, during his 2018 Legal and Licensing Workshop (LLW) talk, Dirk Hohndel said with a grin that he hears “containers may take off”. But, while containers are easy to set up and use, license compliance for containers is “incredibly hard”. He has been spending “way too much time” thinking about container compliance recently and, beyond the standard “let’s go shopping” solution to hard problems, has come up with some ideas. Hohndel is a longtime member of the FOSS community who is now the chief open source officer at VMware—a company that ships some container images.
Brian Carrigan found the remains of a $500 supermarket barcode scanner at a Scrap Exchange for $6.25, and decided to put it to use as a shopping list builder for his pantry.
Upcycling from scraps
Brian wasn’t planning to build the Wunderscan. But when he stumbled upon the remains of a $500 Cubit barcode scanner at his local reuse center, his inner maker took hold of the situation.
It had been ripped from its connectors and had unlabeled wires hanging from it; a bit of hardware gore if such a thing exists. It was labeled on sale for $6.25, and a quick search revealed that it originally retailed at over $500… I figured I would try to reverse engineer it, and if all else fails, scrap it for the laser and motor.
Brian decided that the scanner, once refurbished with a Raspberry Pi Zero W and new wiring, would make a great addition to his home pantry as a shopping list builder using Wunderlist. “I thought a great use of this would be to keep near our pantry so that when we are out of a spice or snack, we could just scan the item and it would get posted to our shopping list.”
Reverse engineering
The datasheet for the Cubit scanner was available online, and Brian was able to discover the missing pieces required to bring the unit back to working order.
However, no wiring diagram was provided with the datasheet, so he was forced to figure out the power connections and signal output for himself using a bit of luck and an oscilloscope.
Now that the part was powered and working, all that was left was finding the RS232 transmit line. I used my oscilloscope to do this part and found it by scanning items and looking for the signal. It was not long before this wire was found and I was able to receive UPC codes.
Scanning codes and building (Wunder)lists
When the scanner reads a barcode, it sends the ASCII representation of a UPC code to the attached Raspberry Pi Zero W. Brian used the free UPC Database to convert each code to the name of the corresponding grocery item. Next, he needed to add it to the Wunderlist shopping list that his wife uses for grocery shopping.
Wunderlist provides an API token so users can incorporate list-making into their projects. With a little extra coding, Brian was able to convert the scanning of a pantry item’s barcode into a new addition to the family shopping list.
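Brian’s actual code is on his blog (see below); as a rough sketch of the flow just described, the Python outline here reads UPC strings from the scanner’s serial line, looks each one up, and posts a task to Wunderlist. The serial device, the UPC lookup endpoint, the list ID, and the credentials are all placeholders.

# Rough sketch of the flow described above, not Brian's actual code.
import serial      # pyserial, for the scanner's RS232 output
import requests

WUNDERLIST_HEADERS = {
    'X-Access-Token': 'YOUR_WUNDERLIST_TOKEN',
    'X-Client-ID': 'YOUR_WUNDERLIST_CLIENT_ID',
}
SHOPPING_LIST_ID = 123456789                              # placeholder Wunderlist list ID
UPC_LOOKUP = 'https://api.upcdatabase.org/product/{}'     # placeholder lookup endpoint

scanner = serial.Serial('/dev/ttyUSB0', 9600)             # placeholder serial device

while True:
    upc = scanner.readline().strip().decode('ascii')      # the scanner sends the UPC as ASCII
    if not upc:
        continue
    # Convert the UPC into a human-readable product name
    item = requests.get(UPC_LOOKUP.format(upc)).json()
    title = item.get('title', upc)
    # Add the item to the shared shopping list
    requests.post('https://a.wunderlist.com/api/v1/tasks',
                  headers=WUNDERLIST_HEADERS,
                  json={'list_id': SHOPPING_LIST_ID, 'title': title})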
Curious as to how it all came together? You can find information on the project, including code and hardware configurations, on Brian’s blog. If you’ve built something similar, we’d love to see it in the comments below.
HackSpace magazine is back with our brand-new issue 6, available for you on shop shelves, in your inbox, and on our website right now.
Inside Hackspace magazine 6
Paper is probably the first thing you ever used for making, and for good reason: in no other medium can you iterate through 20 designs at the cost of only a few pennies. We’ve roped in Rob Ives to show us how to make a barking paper dog with moveable parts and a cam mechanism. Even better, the magazine includes this free paper automaton for you to make yourself. That’s right: free!
At the other end of the scale, there’s the forge, where heat, light, and noise combine to create immutable steel. We speak to Alec Steele, YouTuber, blacksmith, and philosopher, about his amazingly beautiful Damascus steel creations, and about why there’s no difference between grinding a knife and blowing holes in a mountain to build a road through it.
Do it yourself
You’ve heard of reading glasses — how about glasses that read for you? Using a camera, optical character recognition software, and a text-to-speech engine (and of course a Raspberry Pi to hold it all together), reader Andrew Lewis has hacked together his own system to help deal with age-related macular degeneration.
It’s the definition of hacking: here’s a problem, there’s no solution in the shops, so you go and build it yourself!
Radio
60 years ago, the cutting edge of home hacking was the transistor radio. Before the internet was dreamt of, the transistor radio made the world smaller and brought people together. Nowadays, the components you need to build a radio are cheap and easily available, so if you’re in any way electronically inclined, building a radio is an ideal excuse to dust off your soldering iron.
Tutorials
If you’re a 12-month subscriber (if you’re not, you really should be), you’ve no doubt been thinking of all sorts of things to do with the Adafruit Circuit Playground Express we gave you for free. How about a sewable circuit for a canvas bag? Use the accelerometer to detect patterns of movement — walking, for example — and flash a series of lights in response. It’s clever, fun, and an easy way to add some programmable fun to your shopping trips.
We’re also making gin, hacking a children’s toy car to unlock more features, and getting started with robot sumo to fill the void left by the cancellation of Robot Wars.
All this, plus an 11-metre tall mechanical miner, in HackSpace magazine issue 6 — subscribe here from just £4 an issue or get the PDF version for free. You can also find HackSpace magazine in WHSmith, Tesco, Sainsbury’s, and independent newsagents in the UK. If you live in the US, check out your local Barnes & Noble, Fry’s, or Micro Center next week. We’re also shipping to stores in Australia, Hong Kong, Canada, Singapore, Belgium, and Brazil, so be sure to ask your local newsagent whether they’ll be getting HackSpace magazine.
This post was contributed by Christoph Kassen, AWS Solutions Architect
With the emergence of microservices architectures, the number of services that are part of a web application has increased a lot. It’s not unusual anymore to build and operate hundreds of separate microservices, all as part of the same application.
Think of a typical e-commerce application that displays products, recommends related items, provides search and faceting capabilities, and maintains a shopping cart. Behind the scenes, many more services are involved, such as clickstream tracking, ad display, targeting, and logging. When handling a single user request, many of these microservices are involved in responding. Understanding, analyzing, and debugging the landscape is becoming complex.
AWS X-Ray provides application-tracing functionality, giving deep insights into all microservices deployed. With X-Ray, every request can be traced as it flows through the involved microservices. This provides your DevOps teams the insights they need to understand how your services interact with their peers and enables them to analyze and debug issues much faster.
With microservices architectures, every service should be self-contained and use the technologies best suited for the problem domain. Depending on how the service is built, it is deployed and hosted differently.
One of the most popular choices for packaging and deploying microservices at the moment is containers. The application and its dependencies are clearly defined, the container can be built on CI infrastructure, and the deployment is simplified greatly. Container schedulers, such as Kubernetes and Amazon Elastic Container Service (Amazon ECS), greatly simplify deploying and running containers at scale.
Running X-Ray on Kubernetes
Kubernetes is an open-source container management platform that automates deployment, scaling, and management of containerized applications.
This post shows you how to run X-Ray on top of Kubernetes to provide application tracing capabilities to services hosted on a Kubernetes cluster. X-Ray also works for applications hosted on Amazon ECS, AWS Elastic Beanstalk, and Amazon EC2, and even when building services with AWS Lambda functions. This flexibility helps you pick the technology you need while still being able to trace requests through all of the services running within your AWS environment.
The complete code, including a simple Node.js-based demo application, is available in the corresponding aws-xray-kubernetes GitHub repository, so you can quickly get started with X-Ray.
The sample application within the repository consists of two simple microservices, Service-A and Service-B. The following architecture diagram shows how each service is deployed with two Pods on the Kubernetes cluster:
Requests are sent to Service-A from clients.
Service-A then contacts Service-B.
The requests are serviced by Service-B.
Service-B adds a random delay to each request to show different response times in X-Ray.
To test the sample applications on your own Kubernetes cluster, use the Dockerfiles provided in the GitHub repository, build the two containers, push them to a container registry, and apply the YAML configuration with kubectl to your Kubernetes cluster.
Prerequisites
If you currently do not have a cluster running within your AWS environment, take a look at Amazon Elastic Container Service for Kubernetes (Amazon EKS), or use the instructions from the Manage Kubernetes Clusters on AWS Using Kops blog post to spin up a self-managed Kubernetes cluster.
Security Setup for X-Ray
The nodes in the Kubernetes cluster hosting web application Pods need IAM permissions so that the Pods hosting the X-Ray daemon can send traces to the X-Ray service backend.
The easiest way is to set up a new IAM policy allowing all worker nodes within your Kubernetes cluster to write data to X-Ray. In the IAM console or AWS CLI, create a new policy like the following:
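The policy document itself isn’t reproduced in this excerpt; a minimal sketch that grants the two X-Ray write actions might look like the following, with a placeholder account ID in the resource ARN:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "xray:PutTraceSegments",
        "xray:PutTelemetryRecords"
      ],
      "Resource": [
        "arn:aws:xray:*:000000000000:*"
      ]
    }
  ]
}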
Adjust the AWS account ID within the resource. Give the policy a descriptive name, such as k8s-nodes-XrayWriteAccess.
Next, attach the policy to the instance profile for the Kubernetes worker nodes. To do so, select the IAM role assigned to your worker instances (check the EC2 console if you are unsure) and attach the IAM policy created earlier to it. You can also attach the IAM policy directly from the command line with the following command:
aws iam attach-role-policy --role-name k8s-nodes --policy-arn arn:aws:iam::000000000000:policy/k8s-nodes-XrayWriteAccess
Build the X-Ray daemon Docker image
The X-Ray daemon is available as a single, statically compiled binary that can be downloaded directly from the AWS website.
The first step is to create a Docker container hosting the X-Ray daemon binary and exposing port 2000 via UDP. The daemon is configured either via command line parameters or a configuration file. The most important option is to set the listen address correctly so that tracing requests from application Pods can be accepted.
To build your own Docker image containing the X-Ray daemon, use the Dockerfile shown below.
# Use Amazon Linux Version 1
FROM amazonlinux:1
# Download latest 2.x release of X-Ray daemon
RUN yum install -y unzip && \
    cd /tmp/ && \
    curl https://s3.dualstack.us-east-2.amazonaws.com/aws-xray-assets.us-east-2/xray-daemon/aws-xray-daemon-linux-2.x.zip > aws-xray-daemon-linux-2.x.zip && \
    unzip aws-xray-daemon-linux-2.x.zip && \
    cp xray /usr/bin/xray && \
    rm aws-xray-daemon-linux-2.x.zip && \
    rm cfg.yaml
# Expose port 2000 on udp
EXPOSE 2000/udp
ENTRYPOINT ["/usr/bin/xray"]
# No cmd line parameters, use default configuration
CMD [""]
This container image is based on Amazon Linux, which keeps the image small. To build and tag the container image, run docker build -t xray:latest . from the directory containing the Dockerfile.
Create an Amazon ECR repository
Create a repository in Amazon Elastic Container Registry (Amazon ECR) to hold your X-Ray Docker image. Your Kubernetes cluster pulls the image from this repository when the X-Ray Pods are deployed.
Use the following AWS CLI command to create your repository, or create it in the AWS Management Console instead:
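The command itself isn’t reproduced in this excerpt; a typical invocation looks like this (the repository name xray-daemon is just a placeholder):

aws ecr create-repository --repository-name xray-daemon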
Make sure that the kubectl tool is configured properly for your cluster so that you can deploy the X-Ray Pods. After the X-Ray Pods are deployed to your Kubernetes cluster, applications can send tracing information to the X-Ray daemon on their host. The biggest advantage is that you do not need to run X-Ray as a sidecar container alongside your application. This simplifies the configuration and deployment of your applications and saves resources on your cluster overall.
To deploy the X-Ray daemon as Pods onto your Kubernetes cluster, run the following from the cloned GitHub repository:
kubectl apply -f xray-k8s-daemonset.yaml
This deploys and maintains an X-Ray Pod on each worker node, which accepts tracing data from your microservices and routes it to one of the X-Ray Pods. When deploying the container using a DaemonSet, the X-Ray port is also exposed directly on the host. This way, clients can connect directly to the daemon on their node, which avoids unnecessary network traffic going across your cluster.
Connecting to the X-Ray daemon
To integrate application tracing with your applications, use the X-Ray SDK for one of the supported programming languages:
Java
Node.js
.NET (Framework and Core)
Go
Python
The SDKs provide classes and methods for generating and sending trace data to the X-Ray daemon. Trace data includes information about incoming HTTP requests served by the application, and calls that the application makes to downstream services using the AWS SDK or HTTP clients.
By default, the X-Ray SDK expects the daemon to be available on 127.0.0.1:2000. This needs to be changed in this setup, as the daemon is not part of each Pod but hosted within its own Pod.
The deployed X-Ray DaemonSet exposes all Pods via the Kubernetes service discovery, so applications can use this endpoint to discover the X-Ray daemon. If you deployed to the default namespace, the endpoint is:
xray-service.default
Applications now need to set the daemon address either with the AWS_XRAY_DAEMON_ADDRESS environment variable (preferred) or directly within the SDK setup code:
To set up the environment variable, include the following information in your Kubernetes application deployment description YAML. That exposes the X-Ray service address via the environment variable, where it is picked up automatically by the SDK.
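The manifest isn’t reproduced in this excerpt, but the relevant fragment of a container spec would look roughly like the following, assuming the DaemonSet’s service is named xray-service in the default namespace and the daemon listens on its default port 2000:

    spec:
      containers:
      - name: service-a
        image: service-a:latest                  # placeholder image
        env:
        - name: AWS_XRAY_DAEMON_ADDRESS
          value: xray-service.default:2000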
Sending tracing information from your application is straightforward with the X-Ray SDKs. The example code below serves as a starting point to instrument your application with traces. Take a look at the two sample applications in the GitHub repository to see how to send traces from Service A to Service B. The diagram below visualizes the flow of requests between the services.
Because your application is running within containers, enable both the EC2Plugin and ECSPlugin, which gives you information about the Kubernetes node hosting the Pod as well as the container name. Despite the name ECSPlugin, this plugin gives you additional information about your container when running your application on Kubernetes.
var express = require('express');
var AWSXRay = require('aws-xray-sdk');

// Enable the EC2 and ECS plugins to capture host and container metadata
AWSXRay.config([AWSXRay.plugins.EC2Plugin, AWSXRay.plugins.ECSPlugin]);

var app = express();
//...
app.use(AWSXRay.express.openSegment('defaultName')); // required at the start of your routes

app.get('/', function (req, res) {
  res.render('index');
});

app.use(AWSXRay.express.closeSegment()); // required at the end of your routes / first in error handling routes
For more information about all options and possibilities to instrument your application code, see the X-Ray documentation page for the corresponding SDK information.
The picture below shows the resulting service map that provides insights into the flow of requests through the microservice landscape. You can drill down here into individual traces and see which path each request has taken.
From the service map, you can drill down into individual requests and see where they originated from and how much time was spent in each service processing the request.
You can also click on any individual segment of the trace to display more details about it.
On the Resources tab, you will see the Kubernetes Pod picked up by the ECSPlugin, which handled the request, as well as the instance that Pod was running on.
Summary
In this post, I shared how to deploy and run X-Ray on an existing Kubernetes cluster. Using tracing gives you deep insights into your applications to ease analysis and spot potential problems early. With X-Ray, you get these insights for all your applications running on AWS, no matter if they are hosted on Amazon ECS, AWS Lambda, or a Kubernetes cluster.
My career is a different story. Over the past two decades and change, I went from writing CGI scripts and setting up WAN routers for a chain of shopping malls, to doing pentests for institutional customers, to designing a series of network monitoring platforms and handling incident response for a big telco, to building and running the product security org for one of the largest companies in the world. It’s been an interesting ride – and now that I’m on the hook for the well-being of about 100 folks across more than a dozen subteams around the world, I’ve been thinking a bit about the lessons learned along the way.
Of course, I’m a bit hesitant to write such a post: sometimes, your efforts pan out not because of your approach, but despite it – and it’s possible to draw precisely the wrong conclusions from such anecdotes. Still, I’m very proud of the culture we’ve created and the caliber of folks working on our team. It happened through the work of quite a few talented tech leads and managers even before my time, but it did not happen by accident – so I figured that my observations may be useful for some, as long as they are taken with a grain of salt.
But first, let me start on a somewhat somber note: what nobody tells you is that one’s level on the leadership ladder tends to be inversely correlated with several measures of happiness. The reason is fairly simple: as you get more senior, a growing number of people will come to you expecting you to solve increasingly fuzzy and challenging problems – and you will no longer be patted on the back for doing so. This should not scare you away from such opportunities, but it definitely calls for a particular mindset: your motivation must come from within. Look beyond the fight-of-the-day; find satisfaction in seeing how far your teams have come over the years.
With that out of the way, here’s a collection of notes, loosely organized into three major themes.
The curse of a techie leader
Perhaps the most interesting observation I have is that for a person coming from a technical background, building a healthy team is first and foremost about the subtle art of letting go.
There is a natural urge to stay involved in any project you’ve started or helped improve; after all, it’s your baby: you’re familiar with all the nuts and bolts, and nobody else can do this job as well as you. But as your sphere of influence grows, this becomes a choke point: there are only so many things you could be doing at once. Just as importantly, the project-hoarding behavior robs more junior folks of the ability to take on new responsibilities and bring their own ideas to life. In other words, when done properly, delegation is not just about freeing up your plate; it’s also about empowerment and about signalling trust.
Of course, when you hand your project over to somebody else, the new owner will initially be slower and more clumsy than you; but if you pick the new leads wisely, give them the right tools and the right incentives, and don’t make them deathly afraid of messing up, they will soon excel at their new jobs – and be grateful for the opportunity.
A related affliction of many accomplished techies is the conviction that they know the answers to every question even tangentially related to their domain of expertise; that belief is coupled with a burning desire to have the last word in every debate. When practiced in moderation, this behavior is fine among peers – but for a leader, one of the most important skills to learn is knowing when to keep your mouth shut: people learn a lot better by experimenting and making small mistakes than by being schooled by their boss, and they often try to read into your passing remarks. Don’t run an authoritarian camp focused on total risk aversion or perfectly efficient resource management; just set reasonable boundaries and exit conditions for experiments so that they don’t spiral out of control – and be amazed by the results every now and then.
Death by planning
When nothing is on fire, it’s easy to get preoccupied with maintaining the status quo. If your current headcount or budget request lists all the same projects as last year’s, or if you ever find yourself ending an argument by deferring to a policy or a process document, it’s probably a sign that you’re getting complacent. In security, complacency usually ends in tears – and when it doesn’t, it leads to burnout or boredom.
In my experience, your goal should be to develop a cadre of managers or tech leads capable of coming up with clever ideas, prioritizing them among themselves, and seeing them to completion without your day-to-day involvement. In your spare time, make it your mission to challenge them to stay ahead of the curve. Ask your vendor security lead how they’d streamline their work if they had a 40% jump in the number of vendors but no extra headcount; ask your product security folks what’s the second line of defense or containment should your primary defenses fail. Help them get good ideas off the ground; set some mental success and failure criteria to be able to cut your losses if something does not pan out.
Of course, malfunctions happen even in the best-run teams; to spot trouble early on, instead of overzealous project tracking, I found it useful to encourage folks to run a data-driven org. I’d usually ask them to imagine that a brand new VP shows up in our office and, as his first order of business, asks “why do you have so many people here and how do I know they are doing the right things?”. Not everything in security can be quantified, but hard data can validate many of your assumptions – and will alert you to unseen issues early on.
When focusing on data, it’s important not to treat pie charts and spreadsheets as an art unto itself; if you run a security review process for your company, your CSAT scores are going to reach 100% if you just rubberstamp every launch request within ten minutes of receiving it. Make sure you’re asking the right questions; instead of “how satisfied are you with our process”, try “is your product better as a consequence of talking to us?”
Whenever things are not progressing as expected, it is a natural instinct to fall back to micromanagement, but it seldom truly cures the ill. It’s probable that your team disagrees with your vision or its feasibility – and that you’re either not listening to their feedback, or they don’t think you’d care. It’s good to assume that most of your employees are as smart or smarter than you; barking your orders at them more loudly or more frequently does not lead anyplace good. It’s good to listen to them and either present new facts or work with them on a plan you can all get behind.
In some circumstances, all that’s needed is honesty about the business trade-offs, so that your team feels like your “partner in crime”, not a victim of circumstance. For example, we’d tell our folks that by not falling behind on basic, unglamorous work, we earn the trust of our VPs and SVPs – and that this translates into the independence and the resources we need to pursue more ambitious ideas without being told what to do; it’s how we game the system, so to speak. Oh: leading by example is a pretty powerful tool at your disposal, too.
The human factor
I’ve come to appreciate that hiring decent folks who can get along with others is far more important than trying to recruit conference-circuit superstars. In fact, hiring superstars is a decidedly hit-and-miss affair: while certainly not a rule, there is a proportion of folks who put the maintenance of their celebrity status ahead of job responsibilities or the well-being of their peers.
For teams, one of the most powerful demotivators is a sense of unfairness and disempowerment. This is where tech-originating leaders can shine, because their teams usually feel that their bosses understand and can evaluate the merits of the work. But it also means you need to be decisive and actually solve problems for them, rather than just letting them vent. You will need to make unpopular decisions every now and then; in such cases, I think it’s important to move quickly, rather than prolonging the uncertainty – but it’s also important to sincerely listen to concerns, explain your reasoning, and be frank about the risks and trade-offs.
Whenever you see a clash of personalities on your team, you probably need to respond swiftly and decisively; being right should not justify being a bully. If you don’t react to repeated scuffles, your best people will probably start looking for other opportunities: it’s draining to put up with constant pie fights, no matter if the pies are thrown straight at you or if you just need to duck one every now and then.
More broadly, personality differences seem to be a much better predictor of conflict than any technical aspects underpinning a debate. As a boss, you need to identify such differences early on and come up with creative solutions. Sometimes, all you need is taking some badly-delivered but valid feedback and having a conversation with the other person, asking some questions that can help them reach the same conclusions without feeling that their worldview is under attack. Other times, the only path forward is making sure that some folks simply don’t run into each other for a while.
Finally, dealing with low performers is a notoriously hard but important part of the game. Especially within large companies, there is always the temptation to just let it slide: sideline a struggling person and wait for them to either get over their issues or leave. But this sends an awful message to the rest of the team; for better or worse, fairness is important to most. Simply firing the low performers is seldom the best solution, though; successful recovery cases are what sets great managers apart from the average ones.
Oh, one more thought: people in leadership roles have their allegiance divided between the company and the people who depend on them. The obligation to the company is more formal, but the impact you have on your team is longer-lasting and more intimate. When the obligations to the employer and to your team collide in some way, make sure you can make the right call; it might be one of the most consequential decisions you’ll ever make.
Jason Barnett used the pots feature of the Monzo banking API to create a simple e-paper display so that his kids can keep track of their pocket money.
Monzo
For those outside the UK: Monzo is a smartphone-based bank that allows customers to manage their money and payment cards via an app, removing the bank clerk middleman.
In the Monzo banking app, users can set up pots, which allow them to organise their money into various, you guessed it, pots. You want to put aside holiday funds, budget your food shopping, or, like Jason, manage your kids’ pocket money? Using pots is an easy way to do it.
Jason’s Monzo Pot ePaper tracker
After failed attempts at keeping track of his sons’ pocket money via a scrap of paper stuck to the fridge, Jason decided to try a new approach.
He started his build by installing Stretch Lite to the SD card of his Raspberry Pi Zero W. “The Pi will be running headless (without screen, mouse or keyboard)”, he explains on his blog, “so there is no need for a full-fat Raspbian image.” While Stretch Lite was downloading, he set up the Waveshare ePaper HAT on his Zero W. He notes that Pimoroni’s “Inky pHAT would be easiest,” but his tutorial is specific to the Waveshare device.
Before ejecting the SD card, Jason updated the boot partition to allow him to access the Pi via SSH. He talks makers through that process here.
Among the libraries he installed for the project is pyMonzo, a Python wrapper for the Monzo API created by Paweł Adamczak. Monzo is still in its infancy, and the API is partly under construction. Until it’s completed, Paweł’s wrapper offers a more stable way to use it.
After installing the software, it was time to set up the e-paper screen for the tracker. Jason adjusted the code for the API so that the screen reloads information every 15 minutes, displaying the up-to-date amount of pocket money in both kids’ pots.
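Jason’s full code is on his blog; purely as an illustration of the polling loop he describes, a sketch might look like the following. The pymonzo calls and pot names are assumptions (check the wrapper’s documentation for the exact interface), and the e-paper drawing is reduced to a placeholder function.

# Illustrative sketch only, not Jason's actual code. The pymonzo interface
# shown here (MonzoAPI and a pots() helper) is an assumption; consult the
# wrapper's documentation for the real method and attribute names.
import time
from pymonzo import MonzoAPI

monzo = MonzoAPI()                      # assumes credentials are already configured
POT_NAMES = ['Pocket money: Kid 1', 'Pocket money: Kid 2']   # placeholder pot names

def update_display(balances):
    # Placeholder for the Waveshare ePaper drawing code
    for name, pence in balances.items():
        print('{}: £{:.2f}'.format(name, pence / 100))

while True:
    balances = {p.name: p.balance for p in monzo.pots() if p.name in POT_NAMES}
    update_display(balances)
    time.sleep(15 * 60)                 # refresh every 15 minutes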
Here is how Jason describes going to the supermarket with his sons, now that he has completed the tracker:
“Daddy, I want (insert first thing picked up here), I’ve always wanted one of these my whole life!” […] Even though you have never seen that (insert thing here) before, I can quickly open my Monzo app, flick to Account, and say “You have £3.50 in your money box”. If my boy wants it, a 2-second withdrawal is made whilst queueing, and done — he walks away with a new (again, insert whatever he wanted his whole life here) and is happy!
Jason’s blog offers a full breakdown of his project, including all necessary code and the specs for the physical build. Be sure to head over and check it out.
Have you used an API in your projects? What would you build with one?
During Q4, Backblaze deployed 100 petabytes worth of Seagate hard drives to our data centers. The newly deployed Seagate 10 and 12 TB drives are doing well and will help us meet our near term storage needs, but we know we’re going to need more drives — with higher capacities. That’s why the success of new hard drive technologies like Heat-Assisted Magnetic Recording (HAMR) from Seagate is very relevant to us here at Backblaze and to the storage industry in general. In today’s guest post we are pleased to have Mark Re, CTO at Seagate, give us an insider’s look behind the hard drive curtain to tell us how Seagate engineers are developing the HAMR technology and making it market ready starting in late 2018.
What is HAMR and How Does It Enable the High-Capacity Needs of the Future?
Guest Blog Post by Mark Re, Seagate Senior Vice President and Chief Technology Officer
Earlier this year Seagate announced plans to make the first hard drives using Heat-Assisted Magnetic Recording, or HAMR, available by the end of 2018 in pilot volumes. Even as today’s market has embraced 10TB+ drives, the need for 20TB+ drives remains imperative in the relative near term. HAMR is the Seagate research team’s next major advance in hard drive technology.
HAMR is a technology that over time will enable a big increase in the amount of data that can be stored on a disk. A small laser is attached to a recording head, designed to heat a tiny spot on the disk where the data will be written. This allows a smaller bit cell to be written as either a 0 or a 1. The smaller bit cell size enables more bits to be crammed into a given surface area — increasing the areal density of data, and increasing drive capacity.
It sounds almost simple, but the science and engineering expertise required to perfect this technology, spanning research, experimentation, lab development, and product development, has been enormous. Below is an overview of the HAMR technology, and you can dig into the details in our technical brief, which provides a point-by-point rundown describing several key advances enabling the HAMR design.
As much time and resources as have been committed to developing HAMR, the need for its increased data density is indisputable. Demand for data storage keeps increasing. Businesses’ ability to manage and leverage more capacity is a competitive necessity, and IT spending on capacity continues to increase.
History of Increasing Storage Capacity
For the last 50 years, areal density in the hard disk drive has been growing faster than Moore’s law, which is a very good thing. After all, customers from data centers and cloud service providers to creative professionals and game enthusiasts rarely go shopping for a hard drive just like the one they bought two years ago. As data keeps growing, the demand for storage capacity inevitably increases, and so the technology constantly evolves.
According to the Advanced Storage Technology Consortium, HAMR will be the next significant storage technology innovation to increase the amount of storage in the area available to store data, also called the disk’s “areal density.” We believe this boost in areal density will help fuel hard drive product development and growth through the next decade.
Why do we Need to Develop Higher-Capacity Hard Drives? Can’t Current Technologies do the Job?
Why is HAMR’s increased data density so important?
Data has become critical to all aspects of human life, changing how we’re educated and entertained. It affects and informs the ways we experience each other and interact with businesses and the wider world. IDC research shows the datasphere — all the data generated by the world’s businesses and billions of consumer endpoints — will continue to double in size every two years. IDC forecasts that by 2025 the global datasphere will grow to 163 zettabytes (that is a trillion gigabytes). That’s ten times the 16.1 ZB of data generated in 2016. IDC cites five key trends intensifying the role of data in changing our world: embedded systems and the Internet of Things (IoT), instantly available mobile and real-time data, cognitive artificial intelligence (AI) systems, increased security data requirements, and critically, the evolution of data from playing a business background to playing a life-critical role.
Consumers use the cloud to manage everything from family photos and videos to data about their health and exercise routines. Real-time data created by connected devices — everything from Fitbit, Alexa and smart phones to home security systems, solar systems and autonomous cars — are fueling the emerging Data Age. On top of the obvious business and consumer data growth, our critical infrastructure like power grids, water systems, hospitals, road infrastructure and public transportation all demand and add to the growth of real-time data. Data is now a vital element in the smooth operation of all aspects of daily life.
All of this entails a significant infrastructure cost behind the scenes with the insatiable, global appetite for data storage. While a variety of storage technologies will continue to advance in data density (Seagate announced the first 60TB 3.5-inch SSD unit for example), high-capacity hard drives serve as the primary foundational core of our interconnected, cloud and IoT-based dependence on data.
HAMR Hard Drive Technology
Seagate has been working on heat assisted magnetic recording (HAMR) in one form or another since the late 1990s. During this time we’ve made many breakthroughs in making reliable near field transducers, special high capacity HAMR media, and figuring out a way to put a laser on each and every head that is no larger than a grain of salt.
The development of HAMR has required Seagate to consider and overcome a myriad of scientific and technical challenges including new kinds of magnetic media, nano-plasmonic device design and fabrication, laser integration, high-temperature head-disk interactions, and thermal regulation.
A typical hard drive inside any computer or server contains one or more rigid disks coated with a magnetically sensitive film consisting of tiny magnetic grains. Data is recorded when a magnetic write-head flies just above the spinning disk; the write head rapidly flips the magnetization of one magnetic region of grains so that its magnetic pole points up or down, to encode a 1 or a 0 in binary code.
Increasing the amount of data you can store on a disk requires cramming magnetic regions closer together, which means the grains need to be smaller so they won’t interfere with each other.
Heat Assisted Magnetic Recording (HAMR) is the next step to enable us to increase the density of grains — or bit density. Current projections are that HAMR can achieve 5 Tbpsi (Terabits per square inch) on conventional HAMR media, and in the future will be able to achieve 10 Tbpsi or higher with bit patterned media (in which discrete dots are predefined on the media in regular, efficient, very dense patterns). These technologies will enable hard drives with capacities higher than 100 TB before 2030.
The major problem with packing bits so closely together is that if you do that on conventional magnetic media, the bits (and the data they represent) become thermally unstable, and may flip. So, to make the grains maintain their stability — their ability to store bits over a long period of time — we need to develop a recording media that has higher coercivity. That means it’s magnetically more stable during storage, but it is more difficult to change the magnetic characteristics of the media when writing (harder to flip a grain from a 0 to a 1 or vice versa).
That’s why HAMR’s first key hardware advance required developing a new recording media that keeps bits stable — using high anisotropy (or “hard”) magnetic materials such as iron-platinum alloy (FePt), which resist magnetic change at normal temperatures. Over years of HAMR development, Seagate researchers have tested and proven out a variety of FePt granular media films, with varying alloy composition and chemical ordering.
In fact the new media is so “hard” that conventional recording heads won’t be able to flip the bits, or write new data, under normal temperatures. If you add heat to the tiny spot on which you want to write data, you can make the media’s coercive field lower than the magnetic field provided by the recording head — in other words, enable the write head to flip that bit.
So, a challenge with HAMR has been to replace conventional perpendicular magnetic recording (PMR), in which the write head operates at room temperature, with a write technology that heats the thin film recording medium on the disk platter to temperatures above 400 °C. The basic principle is to heat a tiny region of several magnetic grains for a very short time (~1 nanoseconds) to a temperature high enough to make the media’s coercive field lower than the write head’s magnetic field. Immediately after the heat pulse, the region quickly cools down and the bit’s magnetic orientation is frozen in place.
Applying this dynamic nano-heating is where HAMR’s famous “laser” comes in. A plasmonic near-field transducer (NFT) has been integrated into the recording head, to heat the media and enable magnetic change at a specific point. Plasmonic NFTs are used to focus and confine light energy to regions smaller than the wavelength of light. This enables us to heat an extremely small region, measured in nanometers, on the disk media to reduce its magnetic coercivity.
Moving HAMR Forward
As always in advanced engineering, the devil — or many devils — is in the details. As noted earlier, our technical brief provides a point-by-point short illustrated summary of HAMR’s key changes.
Although hard work remains, we believe this technology is nearly ready for commercialization. Seagate has the best engineers in the world working towards a goal of a 20 Terabyte drive by 2019. We hope we’ve given you a glimpse into the amount of engineering that goes into a hard drive. Keeping up with the world’s insatiable appetite to create, capture, store, secure, manage, analyze, rapidly access and share data is a challenge we work on every day.
With thousands of HAMR drives already being made in our manufacturing facilities, our internal and external supply chain is solidly in place, and volume manufacturing tools are online. This year we began shipping initial units for customer tests, and production units will ship to key customers by the end of 2018. Prepare for breakthrough capacities.
Looking for the perfect Christmas gift for a beloved maker in your life? Maybe you’d like to give a relative or friend a taste of the world of coding and Raspberry Pi? Whatever you’re looking for, the Raspberry Pi Christmas shopping list will point you in the right direction.
For those getting started
Thinking about introducing someone special to the wonders of Raspberry Pi during the holidays? Although you can set up your Pi with peripherals from around your home, such as a mobile phone charger, your PC’s keyboard, and the old mouse dwelling in an office drawer, a starter kit is a nice all-in-one package for the budding coder.
You can also buy the Raspberry Pi Press’s brand-new Raspberry Pi Beginner’s Book, which includes a Raspberry Pi Zero W, a case, a ready-made SD card, and adapter cables.
Once you’ve presented a lucky person with their first Raspberry Pi, it’s time for them to spread their maker wings and learn some new skills.
If you’re looking for something for a confident digital maker, you can’t go wrong with adding to their arsenal of electric and electronic bits and bobs that are no doubt cluttering drawers and boxes throughout their house.
Components such as servomotors, displays, and sensors are staples of the maker world. And when it comes to jumper wires, buttons, and LEDs, one can never have enough.
You could also consider getting your person a soldering iron, some helping hands, or small tools such as a Dremel or screwdriver set.
And to make their life a little less messy, pop it all inside a Really Useful Box…because they’re really useful.
For kit makers
While some people like to dive into making head-first and to build whatever comes to mind, others enjoy working with kits.
The Naturebytes kit allows you to record the animal visitors of your garden with the help of a camera and a motion sensor. Footage of your local badgers, birds, deer, and more will be saved to an SD card, or tweeted or emailed to you if it’s in range of WiFi.
Coretec’s Tiny 4WD is a kit for assembling a Pi Zero–powered remote-controlled robot at home. Not only is the robot adorable, building it is also a great introduction to motors and wireless control.
Finally, why not help your favourite maker create their own gaming arcade using the Arcade Building Kit from The Pi Hut?
For the reader
For those who like to curl up with a good read, or spend too much of their day on public transport, a book or magazine subscription is the perfect treat.
For makers, hackers, and those interested in new technologies, our brand-new HackSpace magazine and the ever popular community magazine The MagPi are ideal. Both are available via a physical or digital subscription, and new subscribers to The MagPi also receive a free Raspberry Pi Zero W plus case.
Looking for something small to keep your loved ones occupied on Christmas morning? Or do you have to buy a Secret Santa gift for the office tech? Here are some wonderful stocking fillers to fill your boots with this season.
The Pi Hut 3D Xmas Tree: available as both a pre-soldered and a DIY version, this gadget will work with any 40-pin Raspberry Pi and allows you to create your own mini light show.
Google AIY Voice kit: build your own home assistant using a Raspberry Pi, the MagPi Essentials guide, and this brand-new kit. “Google, play Mariah Carey again…”
Pimoroni’s Raspberry Pi Zero W Project Kits offer everything you need, including the Pi, to make your own time-lapse cameras, music players, and more.
The official Raspberry Pi Sense HAT, Camera Module, and cases for the Pi 3 and Pi Zero will complete the collection of any Raspberry Pi owner, while also opening up exciting project opportunities.
LEGO Ideas brought out this amazing ‘Women of NASA’ set, and I thought it would be fun to build, play, and learn from these inspiring women! First up, let’s discover a little more about Sally Ride and Mae Jemison, two AWESOME ASTRONAUTS!
Treat the kids, and big kids, in your life to the newest LEGO Ideas set, the Women of NASA — starring Nancy Grace Roman, Margaret Hamilton, Sally Ride, and Mae Jemison!
Explore the world of wearables with Pimoroni’s sewable, hackable, wearable, adorable Bearables kits.
Add lights and motors to paper creations with the Activating Origami Kit, available from The Pi Hut.
With so many amazing kits, HATs, and books available from members of the Raspberry Pi community, it’s hard to only pick a few. Have you found something splendid for the maker in your life? Maybe you’ve created your own kit that uses the Raspberry Pi? Share your favourites with us in the comments below or via our social media accounts.
Glenn Gore here, Chief Architect for AWS. I’m in Las Vegas this week — with 43K others — for re:Invent 2017. We’ve got a lot of exciting announcements this week. I’m going to check in to the Architecture blog with my take on what’s interesting about some of the announcements from a cloud architectural perspective. My first post can be found here.
The Media and Entertainment industry has been a rapid adopter of AWS due to the scale, reliability, and low costs of our services. This has enabled customers to create new, online, digital experiences for their viewers ranging from broadcast to streaming to Over-the-Top (OTT) services that can be a combination of live, scheduled, or ad-hoc viewing, while supporting devices ranging from high-def TVs to mobile devices. Creating an end-to-end video service requires many different components often sourced from different vendors with different licensing models, which creates a complex architecture and a complex environment to support operationally.
AWS Media Services
Based on customer feedback, we have developed AWS Media Services to help simplify distribution of video content. AWS Media Services is comprised of five individual services that can either be used together to provide an end-to-end service or individually to work within existing deployments: AWS Elemental MediaConvert, AWS Elemental MediaLive, AWS Elemental MediaPackage, AWS Elemental MediaStore, and AWS Elemental MediaTailor. These services can help you with everything from storing content safely and durably to setting up a live-streaming event in minutes without having to be concerned about the underlying infrastructure and scalability of the stream itself.
In my role, I participate in many AWS and industry events and often work with the production and event teams that put these shows together. With all the logistical tasks they have to deal with, the biggest question is often: “Will the live stream work?” Compounding this fear is the reality that, as users, we are also quick to jump on social media and make noise when a live stream drops while we are following along remotely. Worse is when I see event organizers actively deciding not to live stream content because of the risk of failure and exposure, leading them to take the safe option and not stream at all.
With AWS Media Services addressing many of the issues around putting together a high-quality media service, live streaming, and providing access to a library of content through a variety of mechanisms, I can’t wait to see more event teams use live streaming without the concern and worry I’ve seen in the past. I am excited for what this also means for non-media companies, as video becomes an increasingly common way of sharing information and adding a more personalized touch to internally- and externally-facing content.
AWS Media Services will allow you to focus more on the content and not worry about the platform. Awesome!
Amazon Neptune
As a civilization, we have been developing new ways to record and store information, and to model the relationships between sets of information, for more than a thousand years. Government census data, tax records, births, deaths, and marriages were all recorded on media ranging from knotted cords in the Inca civilization and clay tablets in ancient Babylon to written texts in Western Europe during the late Middle Ages.
One of the first challenges of computing was figuring out how to store and work with vast amounts of information in a programmatic way, especially as the volume of information was increasing at a faster rate than ever before. We have seen different generations of how to organize this information in some form of database, ranging from flat files to the Information Management System (IMS) used in the 1960s for the Apollo space program, to the rise of the relational database management system (RDBMS) in the 1970s. These innovations drove a lot of subsequent innovations in information management and application development as we were able to move from thousands of records to millions and billions.
Today, as architects and developers, we have a vast variety of database technologies to select from, which have different characteristics that are optimized for different use cases:
Relational databases are well understood after decades of use in the majority of companies who required a database to store information. Amazon Relational Database Service (Amazon RDS) supports many popular relational database engines such as MySQL, Microsoft SQL Server, PostgreSQL, MariaDB, and Oracle. We have even brought the traditional RDBMS into the cloud world through Amazon Aurora, which provides MySQL and PostgreSQL support with the performance and reliability of commercial-grade databases at 1/10th the cost.
Non-relational databases (NoSQL) provided a simpler method of storing and retrieving information that was often faster and more scalable than traditional RDBMS technology. The concept of non-relational databases has existed since the 1960s but really took off in the early 2000s with the rise of web-based applications that required performance and scalability that relational databases struggled with at the time. AWS published this Dynamo whitepaper in 2007, with DynamoDB launching as a service in 2012. DynamoDB has quickly become one of the critical design elements for many of our customers who are building highly-scalable applications on AWS. We continue to innovate with DynamoDB, and this week launched global tables and on-demand backup at re:Invent 2017. DynamoDB excels in a variety of use cases, such as tracking of session information for popular websites, shopping cart information on e-commerce sites, and keeping track of gamers’ high scores in mobile gaming applications, for example.
Graph databases focus on the relationship between data items in the store. With a graph database, we work with nodes, edges, and properties to represent data, relationships, and information. Graph databases are designed to make it easy and fast to traverse and retrieve complex hierarchical data models. Graph databases share some concepts from the NoSQL family of databases such as key-value pairs (properties) and the use of a non-SQL query language such as Gremlin. Graph databases are commonly used for social networking, recommendation engines, fraud detection, and knowledge graphs. We released Amazon Neptune to help simplify the provisioning and management of graph databases as we believe that graph databases are going to enable the next generation of smart applications.
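For a flavor of what querying a graph looks like, here is a small, hypothetical Gremlin traversal using the gremlinpython client; the endpoint, labels, and edge names are made up for illustration.

# Hypothetical example: find items purchased by a user's friends, the kind
# of traversal a recommendation engine might run against Amazon Neptune.
from gremlin_python.structure.graph import Graph
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

conn = DriverRemoteConnection('wss://my-neptune-cluster:8182/gremlin', 'g')   # placeholder endpoint
g = Graph().traversal().withRemote(conn)

recommendations = (g.V().has('user', 'name', 'alice')
                    .out('friend')          # follow friendship edges
                    .out('purchased')       # then purchase edges
                    .values('title')
                    .dedup()
                    .toList())
print(recommendations)
conn.close()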
A common use case I am hearing every week as I talk to customers is how to incorporate chatbots within their organizations. Amazon Lex and Amazon Polly have made it easy for customers to experiment and build chatbots for a wide range of scenarios, but one of the missing pieces of the puzzle was how to model decision trees and knowledge graphs so the chatbot could guide the conversation in an intelligent manner.
Graph databases are ideal for this particular use case, and having Amazon Neptune simplifies the deployment of a graph database while providing high performance, scalability, availability, and durability as a managed service. Security of your graph database is critical. To help ensure this, you can store your encrypted data by running Amazon Neptune within your Amazon Virtual Private Cloud (Amazon VPC) and using encryption at rest integrated with AWS Key Management Service (AWS KMS). Neptune also supports Amazon VPC and AWS Identity and Access Management (AWS IAM) to help further protect and restrict access.
Our customers now have the choice of many different database technologies to ensure that they can optimize each application and service for their specific needs. Just as DynamoDB has unlocked and enabled many new workloads that weren’t possible in relational databases, I can’t wait to see what new innovations and capabilities are enabled from graph databases as they become easier to use through Amazon Neptune.
Look for more on DynamoDB and Amazon S3 from me on Monday.
Contributed by: Stephen Liedig, Senior Solutions Architect, ANZ Public Sector, and Otavio Ferreira, Manager, Amazon Simple Notification Service
Want to make your cloud-native applications scalable, fault-tolerant, and highly available? Recently, we wrote a couple of posts about using AWS messaging services Amazon SQS and Amazon SNS to address messaging patterns for loosely coupled communication between highly cohesive components. For more information, see:
Today, AWS is releasing new message filtering functionality for SNS. This new feature simplifies the pub/sub messaging architecture by offloading the filtering logic from subscribers, as well as the routing logic from publishers, to SNS.
In this post, we walk you through the new message filtering feature, and how to use it to clean up unnecessary logic in your components, and reduce the number of topics in your architecture.
Topic-based filtering
SNS is a fully managed pub/sub messaging service that lets you fan out messages to large numbers of recipients at one time, using topics. SNS topics support a variety of subscription types, allowing you to push messages to SQS queues, AWS Lambda functions, HTTP endpoints, email addresses, and mobile devices (SMS, push).
In the above scenario, every subscriber receives the same message published to the topic, allowing them to process the message independently. For many use cases, this is sufficient.
However, in more complex scenarios, the subscriber may only be interested in a subset of the messages being published. The onus, in that case, is on each subscriber to ensure that they are filtering and only processing those messages in which they are actually interested.
To avoid this additional filtering logic on each subscriber, many organizations have adopted a practice in which the publisher is now responsible for routing different types of messages to different topics. However, as depicted in the following diagram, this topic-based filtering practice can lead to overly complicated publishers, topic proliferation, and additional overhead in provisioning and managing your SNS topics.
Attribute-based filtering
To leverage the new message filtering capability, SNS requires the publisher to set message attributes and each subscriber to set a subscription attribute (a subscription filter policy). When the publisher posts a new message to the topic, SNS attempts to match the incoming message attributes to the filter policy set on each subscription, to determine whether a particular subscriber is interested in that incoming event. If there is a match, SNS then pushes the message to the subscriber in question. The new attribute-based message filtering approach is depicted in the following diagram.
Message filtering in action
Let's look at how message filtering works. The following example is based on a sports merchandise ecommerce website, which publishes a variety of events to an SNS topic. The events range from checkout events (triggered when orders are placed or canceled) to buyers' navigation events (triggered when product pages are visited). The code below is based on the existing AWS SDK for Python.
First, create the single SNS topic to which all shopping events are published.
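A minimal sketch with the AWS SDK for Python (boto3); the topic name "shopping-events" is an assumption for illustration:

import boto3

sns = boto3.client("sns")

# Single topic that receives every shopping event (name is illustrative).
topic_arn = sns.create_topic(Name="shopping-events")["TopicArn"]
print(topic_arn)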
Next, subscribe the endpoints that will be listening to those shopping events. The first subscriber is an SQS queue that is processed by a payment gateway, while the second subscriber is a Lambda function that indexes the buyer’s shopping interests against a search engine.
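Along the same lines, the two subscriptions might be created as follows. The topic, queue, and function ARNs are placeholders, and the SQS queue policy and Lambda invoke permission that SNS needs are omitted for brevity:

import boto3

sns = boto3.client("sns")

topic_arn = "arn:aws:sns:us-east-1:123456789012:shopping-events"  # from create_topic

# Payment gateway: an SQS queue (its access policy must also allow SNS to send
# messages to it; that setup is omitted here).
queue_sub_arn = sns.subscribe(
    TopicArn=topic_arn,
    Protocol="sqs",
    Endpoint="arn:aws:sqs:us-east-1:123456789012:payment-gateway-queue",
)["SubscriptionArn"]

# Search indexer: a Lambda function (it also needs permission for
# sns.amazonaws.com to invoke it; omitted here).
lambda_sub_arn = sns.subscribe(
    TopicArn=topic_arn,
    Protocol="lambda",
    Endpoint="arn:aws:lambda:us-east-1:123456789012:function:index-shopping-interests",
)["SubscriptionArn"]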
A subscription filter policy is set as a subscription attribute, by the subscription owner, as a simple JSON object, containing a set of key-value pairs. This object defines the kind of event in which the subscriber is interested.
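For example, filter policies for the two subscriptions above could be set like this. The subscription ARNs are placeholders for the values returned by the subscribe calls, and the "order_cancelled" value is an assumption based on the checkout events described earlier:

import json
import boto3

sns = boto3.client("sns")

# Payment gateway cares only about checkout events.
sns.set_subscription_attributes(
    SubscriptionArn="arn:aws:sns:us-east-1:123456789012:shopping-events:payment-sub-id",
    AttributeName="FilterPolicy",
    AttributeValue=json.dumps({"event_type": ["order_placed", "order_cancelled"]}),
)

# Search indexer cares only about navigation events.
sns.set_subscription_attributes(
    SubscriptionArn="arn:aws:sns:us-east-1:123456789012:shopping-events:search-sub-id",
    AttributeName="FilterPolicy",
    AttributeValue=json.dumps({"event_type": ["product_page_visited"]}),
)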
You’re now ready to start publishing events with attributes!
Message attributes allow you to provide structured metadata items (such as time stamps, geospatial data, event type, signatures, and identifiers) about the message. Message attributes are optional and separate from, but sent along with, the message body. You can include up to 10 message attributes with your message.
The first message published in this example is related to an order that has been placed on the ecommerce website. The message attribute “event_type” with the value “order_placed” matches only the filter policy associated with the payment gateway subscription. Therefore, only the SQS queue subscribed to the SNS topic is notified about this checkout event.
The second message published is related to a buyer’s navigation activity on the ecommerce website. The message attribute “event_type” with the value “product_page_visited” matches only the filter policy associated with the search engine subscription. Therefore, only the Lambda function subscribed to the SNS topic is notified about this navigation event.
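A hedged sketch of the two publish calls; the message bodies and topic ARN are made up for illustration, while the "event_type" attribute values match the scenario above:

import json
import boto3

sns = boto3.client("sns")
topic_arn = "arn:aws:sns:us-east-1:123456789012:shopping-events"

# Checkout event: only the payment gateway's filter policy matches.
sns.publish(
    TopicArn=topic_arn,
    Message=json.dumps({"order_id": 1024, "amount": 59.90}),
    MessageAttributes={
        "event_type": {"DataType": "String", "StringValue": "order_placed"}
    },
)

# Navigation event: only the search indexer's filter policy matches.
sns.publish(
    TopicArn=topic_arn,
    Message=json.dumps({"product_id": 42, "buyer_id": 7}),
    MessageAttributes={
        "event_type": {"DataType": "String", "StringValue": "product_page_visited"}
    },
)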
The following diagram represents the architecture for this ecommerce website, with the message filtering mechanism in action. As described earlier, checkout events are pushed only to the SQS queue, whereas navigation events are pushed to the Lambda function only.
Message filtering criteria
It is important to remember the following things about subscription filter policy matching (a short illustration follows the list):
A subscription filter policy either matches an incoming message, or it doesn’t. It’s Boolean logic.
For a filter policy to match a message, the message must contain all the attribute keys listed in the policy.
Attributes of the message not mentioned in the filtering policy are ignored.
The value of each key in the filter policy is an array containing one or more values. The policy matches if any of the values in the array match the value in the corresponding message attribute.
If the value in the message attribute is an array, then the filter policy matches if the intersection of the policy array and the message array is non-empty.
The matching is exact (character-by-character), without case-folding or any other string normalization.
The values being matched follow JSON rules: Strings enclosed in quotes, numbers, and the unquoted keywords true, false, and null.
Number matching is at the string representation level. Example: 300, 300.0, and 3.0e2 aren’t considered equal.
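As a quick illustration of these rules, consider the following hypothetical filter policy and message attributes:

# Hypothetical filter policy set on a subscription.
filter_policy = {
    "event_type": ["order_placed", "order_cancelled"],
    "store": ["web"],
}

# Matches: every policy key is present and each value appears in the policy's
# array; the extra "currency" attribute is simply ignored.
matching_attributes = {"event_type": "order_placed", "store": "web", "currency": "AUD"}

# Does not match: "store" is missing from the message attributes,
# even though "event_type" would match.
non_matching_attributes = {"event_type": "order_placed"}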
When should I use message filtering?
We recommend using message filtering and grouping subscribers into a single topic only when all of the following are true:
Subscribers are semantically related to each other
Subscribers consume similar types of events
Subscribers are supposed to share the same access permissions on the topic
Technically, you could get away with creating a single topic for your entire domain to handle all event processing, even unrelated use cases, but this wouldn’t be recommended. This option could result in an unnecessarily large topic, which could potentially impact your message delivery latency. Also, you would lose the ability to implement fine-grained access control on your topics.
Finally, if you already use SNS, but had to add filtering logic in your subscribers or routing logic in your publishers (topic-based filtering), you can now immediately benefit from message filtering. This new approach lets you clean up any unnecessary logic in your components, and reduce the number of topics in your architecture.
Summary
As we’ve shown in this post, the new message filtering capability in Amazon SNS gives you a great amount of flexibility in your messaging pattern. It allows you to really simplify your pub/sub infrastructure requirements.
Message filtering can be implemented easily with existing AWS SDKs by applying message and subscription attributes across all SNS supported protocols (Amazon SQS, AWS Lambda, HTTP, SMS, email, and mobile push). It’s now available in all AWS commercial regions, at no extra charge.
Here are a few ideas for next steps to get you started:
Add filter policies to your subscriptions on the SNS console,
As a school supply aficionado, the month of September has always held a special place in my heart. Nothing sets the tone for success like getting a killer deal on pens and a crisp college ruled notebook. Even if back to school shopping trips have secured a seat in your distant memory, this is still a perfect time of year to stock up on office supplies and set aside some time for flexing those learning muscles. A great way to get started: scan through our September Tech Talks and check out the ones that pique your interest. This month we are covering re:Invent, AI, and much more.
September 2017 – Schedule
Noted below are the upcoming scheduled live, online technical sessions being held during the month of September. Make sure to register ahead of time so you won’t miss out on these free talks conducted by AWS subject matter experts.
The AWS Online Tech Talks series covers a broad range of topics at varying technical levels. These sessions feature live demonstrations & customer examples led by AWS engineers and Solution Architects. Check out the AWS YouTube channel for more on-demand webinars on AWS technologies.
The powerhouse Amazon retail store is set to launch in Australia toward the end of 2018 and Aussie ecommerce retailers need to ready themselves for the competition storm ahead.
2018 may seem a while away but getting your ecommerce site in tip top shape and ready to compete can take time. Check out these helpful hints from the Anchor crew.
Speed kills
If you’ve ever heard of the tale of the tortoise and the hare, the moral is that “slow and steady wins the race”. This is definitely not the place for that phrase, because if your site loads as slowly as a 1995 dial up connection, your ecommerce store will not, I repeat, will not win the race.
Site speed can be impacted by a number of factors, and the challenge is getting the balance right between a site that loads at lightning speed and one that delivers engaging content to your audience. There are many ways to check the performance of your site, including Anchor’s free hosting check-up or Pingdom.
Taking action can boost the performance of your site:
As an ecommerce store, getting credit card details as fast as possible is probably at the top of your list, but it’s important to remember that it’s an actual person who needs to hand over those details.
Consider the customer’s experience whilst checking out. Making people log in to their account before checkout can lead to abandoned carts as customers try to remember the vital details. Similarly, making a customer enter all their details before displaying shipping costs is more of an annoyance than a benefit.
Built for growth
Before you blast out a promo email to your entire database or spend up big on PPC, consider what happens when a fivefold increase in traffic all jumps onto your site at around the same time.
Will your site come screeching to a sudden halt with a 504 or 408 error message, or ride high on the wave of increased traffic? If you have fixed infrastructure such as a dedicated server, or are utilising a VPS, then consider the maximum concurrent users that your site can handle.
Consider this: Amazon.com.au will be built on the scalable cloud infrastructure of Amazon Web Services and will utilise microservices and data-mining technology to offer customers a seamless, personalised shopping experience. How will your business compete?
Search ready
Being found online is important for any business, but for ecommerce sites, it’s essential. Gaining results from SEO practices can take time so beware of ‘quick fix guarantees’ from outsourced agencies.
Search Engine Optimisation (SEO) practices can have lasting effects. Good practices can ensure your site is found via organic search without huge advertising budgets; on the other hand, ‘black hat’ practices can push your ecommerce store into search oblivion.
SEO takes discipline and focus to get right. Here are some of our favourite hints for SEO greatness from those who live and breathe SEO:
Leverage descriptive alt tags and image file names
Create content for people, not bots (keyword stuffing is a no no!)
SEO best practices are continually evolving, but the constant is creating a site that is designed to give users a great experience and the content they expect to find.
Google My Business is a free service that EVERY business should take advantage of. It is a listing service where your business can provide details such as address, phone number, website, and trading hours. It’s easy to update and manage: you can add photos and a physical address (if applicable), and display shopper reviews.
Get your site ship shape
Overwhelmed by these starter tips? If you are ready to get your site into tip-top shape, get in touch. We work with awesome partners like eWave who can help create a seamless online shopping experience.
Amazon has been issued a patent on security measures that prevent people from comparison shopping while in the store. It’s not a particularly sophisticated patent — it basically detects when you’re using the in-store Wi-Fi to visit a competitor’s site and then blocks access — but it is an indication of how retail has changed in recent years.
What’s interesting is that Amazon is on the other side of this arms race. As an on-line retailer, it wants people to walk into stores and then comparison shop on its site. Yes, I know it’s buying Whole Foods, but it’s still predominantly an online retailer. Maybe it patented this to prevent stores from implementing the technology.
It’s probably not nearly that strategic. It’s hard to build a business strategy around a security measure that can be defeated with cellular access.
As a hosting provider, we speak with many businesses that need a fix for their slow site speeds. There are many reasons why hosting infrastructure may be constraining your site performance: typically old infrastructure used by some hosting providers, contention issues, and even the physical location of the servers. Having your site hosted in a high-speed environment with world-class managed services (such as Anchor) provides the right foundations, and utilising a Content Delivery Network (CDN) can give you that extra boost in speed and performance you desire – and deserve. One of the more popular site performance applications is Cloudflare, a global network designed to optimize security, performance and reliability without the bloat of legacy technologies. Cloudflare has some robust CDN capabilities in addition to other security services like DDoS (Distributed Denial of Service) protection and reverse proxies.
A traditional CDN is a group of web servers distributed across multiple locations around the world, which delivers content more efficiently to users. The server selected for delivering content to a specific user is typically based on a measure of network proximity. For example, the server with the fewest network hops or the server with the quickest response time is chosen.
If you are looking to take advantage of a CDN, a great place to start is Cloudflare’s free plan. This basic plan can be set up in less than 5 minutes and only requires a simple change to your domain’s DNS settings to get you up and running. There is no hardware or software to install or maintain and you do not need to change any of your site’s existing code. As a partner of Cloudflare, we can offer discounted pricing to our customers if you are looking to take advantage of some of Cloudflare’s advanced performance and security features such as image optimisations, firewalls and PCI compliance to name just a few.
Cloudflare utilises more than 40 data centres in almost as many countries, and uses the size of its ‘quietly built cloud’ to process more than 5% of all web requests. The service includes:
A Global CDN
DDoS Protection
Page Rules
DDoS Protection: why do I need it, and how do I protect against an attack?
In 2015, the internet saw the highest rate of DDoS attacks ever. Generally, attackers flood a network or service (often from thousands of IP addresses) in order to overwhelm the server and make a network or website unavailable to its users. It is extremely important to make sure your site is protected from such an attack, especially if it is an ecommerce site where downtime will prevent customers from completing their purchases.
What are Global CDNs?
As mentioned above, Content Delivery Networks (CDNs) are important for a number of reasons. The primary thing a CDN does is provide alternative server nodes, or locations, from which the user can download resources (usually JavaScript or static content). This means that although the origin server may be located in the US, someone in Sydney can still experience fast load speeds and response times thanks to the reduced latency. This is extremely important for sites that have users in other countries, especially those who are shopping online, as these sites generally have large volumes of images, which can take time to load. Overall, it improves your users’ experience in terms of speed.
Page Rules
Page Rules give you the ability to control how Cloudflare actually works on a URL or subdomain basis, which means you can customise its functionality to match your domain’s unique needs. They give you the ability to take various actions based on the page’s URL, such as creating redirects, fine-tuning caching behavior, or enabling and disabling Cloudflare’s various services. This helps you to optimize speed, harden security, increase reliability, maximize bandwidth savings, and much more.
Other benefits include the added scalability and capacity that a CDN like Cloudflare brings: not only higher availability, but also lower packet loss. Further, Cloudflare provides website traffic insights and other analytics, such as threat monitoring, so that you can improve your site even further.
As a partner of Cloudflare, Anchor receives discounted rates for the Pro and Business plans, and can also help you install the free plan if you are a customer. The easiest part about Cloudflare, however, is that it only requires a simple change to your domain’s DNS settings. There is no hardware or software to install or maintain, and you do not need to change any of your site’s existing code.
If your site is running slow and you want to know how you can boost its performance, contact us for a free, no-obligation site hosting check-up.
I encourage all of you to either listen to or read the transcript of Terry Gross’ Fresh Air interview with Joseph Turow about his book “The Aisles Have Eyes: How Retailers Track Your Shopping, Strip Your Privacy, And Define Your Power”.
Now, most of you who read my blog know the difference between proprietary and Free Software, and the difference between a network service and software that runs on your own device. I want all of you who have a good understanding of that to do a simple thought experiment:
How many of the horrible things that Turow talks about can happen if there is no proprietary software on your IoT or mobile devices?
AFAICT, other than the facial recognition in the store itself that he talked about in Russia, everything he talks about would be mitigated or eliminated completely as a threat if users could modify the software on their devices.
Yes, universal software freedom will not solve all the world’s problems. But it does solve a lot of them, at least with regard to the bad things the powerful want to do to us via technology.
(BTW, the blog title is a reference to Philip K. Dick’s Minority Report, which includes a scene about systems reading people’s eyes to target-market to them. It’s not the main theme of that particular book, though… Dick was always going off on tangents in his books.)
Organizations often grow organically—and so does their data in individual silos. Such systems are often powered by traditional RDBMS systems and they grow orthogonally in size and features. To gain intelligence across heterogeneous data sources, you have to join the data sets. However, this imposes new challenges, as joining data over dblinks or into a single view is extremely cumbersome and an operational nightmare.
This post walks through using AWS Database Migration Service (AWS DMS) and other AWS services to make it easy to converge multiple heterogeneous data sources to Amazon Redshift. You can then use Amazon QuickSight to visualize the converged dataset and gain additional business insights.
AWS service overview
Here’s a brief overview of AWS services that help with data convergence.
AWS DMS
With DMS, you can migrate your data to and from most widely used commercial and open-source databases. The service supports homogenous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora or Microsoft SQL Server to MySQL. It also allows you to stream data to Amazon Redshift from any of the supported sources including:
Amazon Aurora
PostgreSQL
MySQL
MariaDB
Oracle
SAP ASE
SQL Server
DMS enables consolidation and easy analysis of data in the petabyte-scale data warehouse. It can also be used for continuous data replication with high availability.
Amazon QuickSight
Amazon QuickSight provides very fast, easy-to-use, cloud-powered business intelligence at 1/10th the cost of traditional BI solutions. QuickSight uses a new, super-fast, parallel, in-memory calculation engine (“SPICE”) to perform advanced calculations and render visualizations rapidly.
QuickSight integrates automatically with AWS data services, enables organizations to scale to hundreds of thousands of users, and delivers fast and responsive query performance to them. You can easily connect QuickSight to AWS data services, including Amazon Redshift, Amazon RDS, Amazon Aurora, Amazon S3, and Amazon Athena. You can also upload CSV, TSV, and spreadsheet files or connect to third-party data sources such as Salesforce.
Amazon Redshift
Amazon Redshift delivers fast query performance by using columnar storage technology to improve I/O efficiency and parallelizing queries across multiple nodes. Amazon Redshift is typically priced at 1/10th of the price of the competition. We have many customers running petabyte scale data analytics on AWS using Amazon Redshift.
Amazon Redshift is also ANSI SQL compliant, supports JDBC/ODBC, and is easy to connect to your existing business intelligence (BI) solution. However, if your storage requirement is in the 10s of TB range and requires high levels of concurrency across small queries, you may want to consider Amazon Aurora as the target converged database.
Walkthrough
Assume that you have an events company specializing in sports, and you have built a MySQL database that holds data for the players and the sporting events. Customer and ticket information is stored in another database; in this case, assume it is PostgreSQL, and it gets updated when a customer purchases tickets from the website or mobile apps. You can download a sample dataset from the aws-database-migration-samples GitHub repo.
These databases could be anywhere: at an on-premises facility; on AWS in Amazon EC2 or Amazon RDS, or other cloud provider; or in a mixture of such locations. To complicate things a little more, you can assume that the lost opportunities (where a customer didn’t complete buying the ticket even though it was added to the shopping cart) are streamed via clickstream through Amazon Kinesis and then stored on Amazon S3. We then use AWS Data Pipeline to orchestrate a process to cleanse that data using Amazon EMR and make it ready for loading to Amazon Redshift. The clickstream integration is not covered in this post but was demonstrated in the recent Real-time Clickstream Anomaly Detection with Amazon Kinesis Analytics post.
Architecture
In this solution, you use DMS to bring the two data sources into Amazon Redshift and run analytics to gain business insights. The following diagram demonstrates the proposed solution.
After the data is available on Amazon Redshift, you could easily build BI dashboards and generate intelligent reports to gain insights using Amazon QuickSight. You could also take this a step further and build a model using Amazon Machine Learning. Amazon Machine Learning uses powerful algorithms to create ML models by finding patterns in your existing data stored in Amazon S3, or Amazon Redshift. It is also highly scalable and can generate billions of predictions daily, and serve those predictions in real time and at high throughput.
Creating source databases
For the purposes of this post, create two RDS databases, one with a MySQL engine, and the other with PostgreSQL and then load some data. These represent a real-life scenario where databases could be located on-premises, on AWS, or both. Just as in real life, there may be more than two source databases; the process described in this post would still be reasonably similar.
Follow the steps in Tutorial: Create a Web Server and an Amazon RDS Database to create the two source databases. Use the links from the main tutorial page to see how to connect to specific databases and load data. For more information, see:
Make a note of the security group that you create and associate all the RDS instances with it. Call it “MyRDSSecurityGroup”.
Afterward, you should be able to see all the databases listed in the RDS Instances dashboard.
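If you prefer to script the source databases rather than follow the console tutorial, a rough boto3 sketch could look like the following; the instance identifiers, sizing, credentials, and security group ID are all placeholders:

import boto3

rds = boto3.client("rds")

# Placeholder sizing and credentials; substitute your own values.
common = dict(
    DBInstanceClass="db.t2.small",
    AllocatedStorage=20,
    MasterUsername="masteruser",
    MasterUserPassword="ChangeMe123!",
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],  # MyRDSSecurityGroup
)

# MySQL source: players and sporting events.
rds.create_db_instance(DBInstanceIdentifier="source-mysql", Engine="mysql", **common)

# PostgreSQL source: customers and tickets.
rds.create_db_instance(DBInstanceIdentifier="source-postgres", Engine="postgres", **common)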
Setting up a target Amazon Redshift cluster
Set up a two-node cluster as shown below, with a cluster name similar to “consolidated-dwh” and a database name similar to “mydwh”. You could also set up a one-node cluster; depending on the instance type, it may be available on the AWS Free Tier.
In the next step, choose Publicly Accessible for non-production usage to keep the configuration simple.
Also, for simplicity, choose the same VPC where you have placed the RDS instances and include the MyRDSSecurityGroup in the list of security groups allowed to access the Amazon Redshift cluster.
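For reference, an equivalent boto3 sketch for the cluster; the node type, credentials, and security group ID are assumptions, while the cluster and database names mirror the walkthrough:

import boto3

redshift = boto3.client("redshift")

redshift.create_cluster(
    ClusterIdentifier="consolidated-dwh",
    DBName="mydwh",
    NodeType="dc2.large",          # assumption; pick a type that suits your data volume
    ClusterType="multi-node",
    NumberOfNodes=2,
    MasterUsername="masteruser",
    MasterUserPassword="ChangeMe123!",
    PubliclyAccessible=True,       # non-production convenience only
    VpcSecurityGroupIds=["sg-0123456789abcdef0"],  # include MyRDSSecurityGroup
)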
Setting up DMS
You can set up DMS easily, as indicated in the AWS Database Migration Service post on the AWS blog. However, rather than using the wizard, you may take a step-by-step approach (a scripted sketch of these steps follows the list):
Create a replication instance.
Create the endpoints for the two source databases and the target Amazon Redshift database.
Create a task to synchronize each of the sources to the target.
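The walkthrough below uses the DMS console, but for completeness, here is a rough boto3 equivalent of those three steps. The instance class, hostnames, and credentials are placeholders, and only the MySQL source is shown; the PostgreSQL source and its task would be analogous:

import json
import boto3

dms = boto3.client("dms")

# 1. Replication instance (size it to the data volume you deal with).
instance = dms.create_replication_instance(
    ReplicationInstanceIdentifier="dms-convergence",
    ReplicationInstanceClass="dms.t2.medium",
    AllocatedStorage=50,
)

# 2. Source and target endpoints (placeholder hosts and credentials).
mysql_source = dms.create_endpoint(
    EndpointIdentifier="source-mysql",
    EndpointType="source",
    EngineName="mysql",
    ServerName="source-mysql.xxxxxxxx.us-east-1.rds.amazonaws.com",
    Port=3306,
    Username="masteruser",
    Password="ChangeMe123!",
)
redshift_target = dms.create_endpoint(
    EndpointIdentifier="target-redshift",
    EndpointType="target",
    EngineName="redshift",
    ServerName="consolidated-dwh.xxxxxxxx.us-east-1.redshift.amazonaws.com",
    Port=5439,
    Username="masteruser",
    Password="ChangeMe123!",
    DatabaseName="mydwh",
)

# 3. One task per source: full load plus ongoing replication (CDC).
dms.create_replication_task(
    ReplicationTaskIdentifier="mysql-to-redshift",
    SourceEndpointArn=mysql_source["Endpoint"]["EndpointArn"],
    TargetEndpointArn=redshift_target["Endpoint"]["EndpointArn"],
    ReplicationInstanceArn=instance["ReplicationInstance"]["ReplicationInstanceArn"],
    MigrationType="full-load-and-cdc",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "%", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)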
Create a replication instance
In the DMS console, choose Replication instances, Create replication instance. The instance type you select depends on the data volume you deal with. After setup, you should be able to see your replication instance.
Create endpoints
In the DMS console, choose Endpoints, Create endpoint. You need to configure the two source endpoints representing the PostgreSQL and MySQL RDS databases. You also need to create the target endpoint by supplying the Amazon Redshift database that you created in the previous steps. After configuration, the endpoints look similar to the following screenshot:
Create a task and start data migration
You can rely on DMS to create the target tables in your target Amazon Redshift database or you may want to take advantage of AWS Schema Conversion Tool to create the target schema and also do a compatibility analysis in the process. Using the AWS Schema Conversion Tool is particularly useful when migrating using heterogeneous data sources. For more information, see Getting Started with the AWS Schema Conversion Tool.
For simplicity, I avoided using the AWS Schema Conversion Tool in this post and jumped straight to DMS to create the target schema and underlying tables, and then set up the synchronization between the data sources and the target.
In the DMS console, choose Tasks, Create Tasks. Fill in the fields as shown in the following screenshot:
Note that because the source is RDS MySQL and you chose Migrate data and replicate ongoing changes, you need to enable binlog retention. Other engines have other requirements, and DMS prompts you accordingly. For this particular case, run the following command:
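The original command isn't reproduced here; on RDS MySQL, binary log retention is typically enabled with the rds_set_configuration stored procedure (the 24-hour window is illustrative):

call mysql.rds_set_configuration('binlog retention hours', 24);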
Now, choose Start task on create. In the task settings, choose Drop tables on target to have DMS create the tables, if you haven’t already created the target tables using the AWS Schema Conversion Tool, as described earlier. Choose Enable logging but note that this incurs additional costs as the generated CloudWatch logs require storage.
In the table mappings, for Schema to migrate, ensure that the correct schema has been selected from the source databases. DMS creates the schema on the target if it does not already exist.
Repeat for the other data source, choosing the other source endpoint and the same Amazon Redshift target endpoint. In the table mappings section, choose Custom and customize as appropriate. For example, you can specify the schema names to include and tables to exclude, as shown in the following screenshot:
Using this custom configuration, you can perform some minor transformations, such as lowercasing target table names or choosing a different target schema for both sources.
After both tasks have successfully completed, the Tasks tab now looks like the following:
Running queries on Amazon Redshift
In Amazon Redshift, select your target cluster and choose Loads. You can see all operations that DMS performed in the background to load the data from the two source databases into Amazon Redshift.
Ensure change data capture is working
Generate additional data on Amazon RDS PostgreSQL in the ticketing.sporting_event_ticket table by running the generate_mlb_season.sql script provided in the aws-database-migration-samples GitHub repository. Notice that the tasks have caught up and are showing the migration in progress. You can also query the target tables and see that the new data is in the target table.
Visualization options
Set up QuickSight and configure your data source to be your Amazon Redshift database. If you have a Redshift cluster in the same account and the same region, it appears when you click Redshift (Auto-discovered) on the data sets page, as shown below.
Access to any other Redshift cluster can be configured as follows using the Redshift (Manual connect) link:
Now, create your data set. Choose New Data Set and select either a new data source or an existing data source listed at the bottom of the page. Choose Ticketing for Sports.
In the next step, choose Create Data Set.
In the next step, when QuickSight prompts you to choose your table, you can select the schema and the required table and choose Select. Alternatively, you may choose Edit/Preview data.
You could use the graphical options shown below to start creating your data set. Given that you have data from multiple sources, it’s safe to assume that your target tables are in separate schemas. Select the schema and tables, select the other schemas, and bring the appropriate tables to the palette by selecting them using the check box to the right. For each join, select the join type and then map the appropriate keys between the tables until the two red join markers turn to one of the blue join types.
In this case, rather than preparing the data set in the palette, you provide a custom SQL query. On the left pane, choose Tables, Switch to Custom SQL tool.
Paste the following SQL query in the Custom SQL field and enter a name.
select to_char( e.start_date_time, 'YYYY-MM-DD' ) event_date,
to_char( e.start_date_time, 'HH24:MI' ) start_time, e.sold_out,
e.sport_type_name, l.name event_location, l.city event_city,
l.seating_capacity, hteam.name home_team, hl.name home_field,
hl.city home_city, ateam.name away_team, al.name away_field,
al.city away_city, sum( t.ticket_price ) total_ticket_price,
avg( t.ticket_price ) average_ticket_price,
min ( t.ticket_price ) cheapest_ticket,
max( t.ticket_price ) most_expensive_ticket, count(*) num_tickets
from ticketing.sporting_event_ticket t, sourcemysql.sporting_event e,
sourcemysql.sport_location l, sourcemysql.sport_team hteam,
sourcemysql.sport_team ateam, sourcemysql.sport_location hl,
sourcemysql.sport_location al
where t.sporting_event_id = e.id
and t.sport_location_id = l.id
and e.home_team_id = hteam.id
and e.away_team_id = ateam.id
and hteam.home_field_id = hl.id
and ateam.home_field_id = al.id
group by to_char( e.start_date_time, 'YYYY-MM-DD' ),
to_char( e.start_date_time, 'HH24:MI' ), e.start_date_time,
e.sold_out, e.sport_type_name, l.name, l.city, l.seating_capacity,
hteam.name, ateam.name, hl.name, hl.city, al.name, al.city;
You can choose Save and visualize and view the QuickSight visualization toolkit and filter options. Here you can build your story or dashboards and start sharing them with your team.
Now, you can choose various fields from the field list and the various measures to get the appropriate visualization, like the one shown below. This one was aimed to understand the date at which each event in each city reached the maximum capacity.
You can also combine many such visualizations and prepare your dashboard for management reporting. The analysis may also help you decide where you need to invest in campaigns and where things are going better than expected, ensuring a healthy sales pipeline.
Summary
In this post, you used AWS DMS to converge multiple heterogeneous data sources to an Amazon Redshift cluster. You also used Amazon QuickSight to create a data visualization on the converged dataset to provide you with additional insights. Although we used an ecommerce use case related to an events company, this concept of converging multiple data silos into a single target is also applicable to other verticals such as retail, healthcare, finance, insurance and banking, gaming, and so on.
If you have questions or suggestions, please comment below.
About the Author
Pratim Das is a Specialist Solutions Architect for Analytics in EMEA. He works with customers on big data and analytics projects, helping them build solutions on AWS using AWS services and/or other open source or commercial solutions from the big data ecosystem. In his spare time, he enjoys cooking and creating exciting new recipes, always with that spicy kick.
Note: The elves at Pi Towers are all taking next week off to spend some time with their families, and this blog will be quiet for the week. We’ll be back at the start of January. Happy holidays!
Happy 25th of December, everybody!
If you’re one of the many who woke up this morning to find some Raspberry Pi goodies under your tree, congratulations.
Now you’ve unpacked the Pi, confirmed it to indeed be roughly the size of a credit card, and confused a less tech-savvy loved one by telling them “This is a computer!”, you may be wondering what to do with it next…and that’s where we come in.
You’ll need to make sure you have the latest Raspbian operating system (OS) on your Pi. You may have been given an SD card with Raspbian pre-installed, but if not, head to our downloads page to get it.
2. Start me up
ALL THE POWER!
You’ll need to plug your Pi into a monitor (your TV will do), keyboard and mouse in order to get started. You’ll also need a good-quality power supply providing at least 2A.
We’ve some great instructions within our help pages to get you up and running. And if you’re still stuck, our forum has loads of information and is full of helpful people. Feel free to join and ask a question, and search previous topics for advice.
3. So how do I build a robot then?!
With tinsel and tape and bows and…
Excellent question. But if you’ve never tried to code before, you may want to start with something a little smaller…like Scratch or Sonic Pi, or a physical build such as the Parent Detector or a Burping Jelly Baby.
You’ll find more projects on our resources pages, along with some brilliant inspirational builds on our YouTube channel and blog. Or simply search for Raspberry Pi online. We’ve an amazing community of makers who share their code and builds for all to use, and now you’re one of us…WELCOME!