Backblaze is hiring a Director of Sales. This is a critical role for Backblaze as we continue to grow the team. We need a strong leader who has experience in scaling a sales team and who has an excellent track record of exceeding goals by selling Software as a Service (SaaS) solutions. In addition, this leader will need to be highly motivated, and able to create and develop a highly motivated, success-oriented sales team that has fun and enjoys what it does.
The History of Backblaze from our CEO
In 2007, after a friend’s computer crash caused her some suffering, we realized that with every photo, video, song, and document going digital, everyone would eventually lose all of their information. Five of us quit our jobs to start a company with the goal of making it easy for people to back up their data.
Like many startups, for a while we worked out of a co-founder’s one-bedroom apartment. Unlike most startups, we made an explicit agreement not to raise funding during the first year. We would then touch base every six months and decide whether to raise or not. We wanted to focus on building the company and the product, not on pitching and slide decks. And critically, we wanted to build a culture that understood money comes from customers, not the magical VC giving tree. Over the course of 5 years we built a profitable, multi-million dollar revenue business — and only then did we raise a VC round.
Fast forward 10 years and our world looks quite different. You’ll have some fantastic assets to work with:
A brand millions recognize for openness, ease-of-use, and affordability.
A computer backup service that stores over 500 petabytes of data, has recovered over 30 billion files for hundreds of thousands of paying customers — most of whom self-identify as the people who find and recommend technology products to their friends.
Our B2 service that provides the lowest cost cloud storage on the planet at 1/4th the price Amazon, Google, or Microsoft charge. While it is a newer product on the market, it already has over 100,000 IT professionals and developers signed up, as well as an ecosystem building up around it.
A growing, profitable and cash-flow positive company.
And last, but most definitely not least: a great sales team.
You might be saying, “sounds like you’ve got this under control — why do you need me?” Don’t be misled. We need you. Here’s why:
We have a great team, but we are in the process of expanding, and we need to develop a structure that scales easily and sets the team up to drive revenue.
We just launched our outbound sales efforts and we need someone to help develop that into a fully successful program that’s building a strong pipeline and closing business.
We need someone to work with the marketing department and figure out how to generate more inbound opportunities that the sales team can follow up on and close.
We need someone who will work closely with our current sales team to develop their skills and build a path for career growth and advancement.
We want someone to manage our Customer Success program.
So that’s a bit about us. What are we looking for in you?
Experience: As a sales leader, you will strategically build and drive the territory’s sales pipeline by assembling and leading a skilled team of sales professionals. This leader should be familiar with generating, developing, and closing software subscription (SaaS) opportunities. We are looking for a self-starter who can manage a team and make an immediate impact selling our Backup and Cloud Storage solutions. In this role, the sales leader will work closely with the VP of Sales, marketing staff, and service staff to develop and implement specific strategic plans to achieve and exceed revenue targets, including new business acquisition as well as building out our customer success program.
Leadership: We have an experienced team who’s brought us to where we are today. You need to have the people and management skills to get them excited about working with you. You need to be a strong leader who is compassionate about developing and supporting your team.
Data driven and creative: The data has to show something makes sense before we scale it up. However, without creativity, it’s easy to say “the data shows it’s impossible” or to find a local maximum. Whether it’s deciding how to scale the team, figuring out what our outbound sales efforts should look like, or putting a plan in place to develop the team for career growth, we’ve seen a bit of creativity get us places a few extra dollars couldn’t.
Jive with our culture: Strong leaders affect culture and the person we hire for this role may well shape, not only fit into, ours. But to shape the culture you have to be accepted by the organism, which means a certain set of shared values. We default to openness with our team, our customers, and everyone if possible. We love initiative — without arrogance or dictatorship. We work to create a place people enjoy showing up to work. That doesn’t mean ping pong tables and foosball (though we do try to have perks & fun), but it means people are friendly, non-political, and working to build both a good service and a good place to work.
Do the work: Ideas and strategy are critical, but good execution makes them happen. We’re looking for someone who can help the team execute both from the perspective of being capable of guiding and organizing, but also someone who is hands-on themselves.
Additional Responsibilities needed for this role:
Recruit, coach, mentor, manage and lead a team of sales professionals to achieve yearly sales targets. This includes closing new business and expanding upon existing clientele.
Expand the customer success program to provide the best customer experience possible resulting in upsell opportunities and a high retention rate.
Develop effective sales strategies and deliver compelling product demonstrations and sales pitches.
Acquire and develop the appropriate sales tools to make the team efficient in their daily workflow.
Apply a thorough understanding of the marketplace, industry trends, funding developments, and products to all management activities and strategic sales decisions.
Ensure that sales department operations function smoothly, with the goal of facilitating sales and/or closings; operational responsibilities include accurate pipeline reporting and sales forecasts.
This position will report directly to the VP of Sales and will be staffed in our headquarters in San Mateo, CA.
Requirements:
7 – 10+ years of successful sales leadership experience as measured by sales performance against goals. Experience in developing skill sets and providing career growth and opportunities through advancement of team members.
Background in selling SaaS technologies with a strong track record of success.
Strong presentation and communication skills.
Must be able to travel occasionally nationwide.
BA/BS degree required.
Think you want to join us on this adventure? Send an email to jobscontact@backblaze.com with the subject “Director of Sales.” (Recruiters and agencies, please don’t email us.) Include a resume and answer these two questions:
How would you approach evaluating the current sales team and what is your process for developing a growth strategy to scale the team?
What are the goals you would set for yourself in the 3-month and 1-year timeframes?
Thank you for taking the time to read this and I hope that this sounds like the opportunity for which you’ve been waiting.
We have been busy adding new features and capabilities to Amazon Redshift, and we wanted to give you a glimpse of what we’ve been doing over the past year. In this article, we recap a few of our enhancements and provide a set of resources that you can use to learn more and get the most out of your Amazon Redshift implementation.
In 2017, we made more than 30 announcements about Amazon Redshift. We listened to you, our customers, and delivered Redshift Spectrum, a feature of Amazon Redshift, that gives you the ability to extend analytics to your data lake—without moving data. We launched new DC2 nodes, doubling performance at the same price. We also announced many new features that provide greater scalability, better performance, more automation, and easier ways to manage your analytics workloads.
To see a full list of our launches, visit our what’s new page—and be sure to subscribe to our RSS feed.
Major launches in 2017
Amazon Redshift Spectrum—extend analytics to your data lake, without moving data
We launched Amazon Redshift Spectrum to give you the freedom to store data in Amazon S3, in open file formats, and have it available for analytics without the need to load it into your Amazon Redshift cluster. It enables you to easily join datasets across Redshift clusters and S3 to provide unique insights that you would not be able to obtain by querying independent data silos.
With Redshift Spectrum, you can run SQL queries against data in an Amazon S3 data lake as easily as you analyze data stored in Amazon Redshift. And you can do it without loading data or resizing the Amazon Redshift cluster based on growing data volumes. Redshift Spectrum separates compute and storage to meet workload demands for data size, concurrency, and performance. Redshift Spectrum scales processing across thousands of nodes, so results are fast, even with massive datasets and complex queries. You can query open file formats that you already use—such as Apache Avro, CSV, Grok, ORC, Apache Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV—directly in Amazon S3, without any data movement.
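To make this concrete, here is a minimal sketch of what querying Parquet data in S3 through Redshift Spectrum can look like from a Python client. The cluster endpoint, credentials, IAM role, bucket, schema, and table names below are placeholders, and the local users table is assumed to already exist in the cluster:

```python
# Sketch: query open-format data in S3 via Redshift Spectrum (psycopg2 client).
# All identifiers below are placeholders -- substitute your own.
import psycopg2

conn = psycopg2.connect(
    host="my-cluster.abc123xyz789.us-east-1.redshift.amazonaws.com",
    port=5439, dbname="dev", user="awsuser", password="...",
)
conn.autocommit = True  # external DDL cannot run inside a transaction block
cur = conn.cursor()

# Register an external schema backed by the data catalog.
cur.execute("""
    CREATE EXTERNAL SCHEMA IF NOT EXISTS spectrum
    FROM DATA CATALOG DATABASE 'spectrumdb'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MySpectrumRole'
    CREATE EXTERNAL DATABASE IF NOT EXISTS;
""")

# Define an external table over Parquet files that already live in S3.
cur.execute("""
    CREATE EXTERNAL TABLE spectrum.clickstream (
        event_time TIMESTAMP,
        user_id    BIGINT,
        url        VARCHAR(2048)
    )
    STORED AS PARQUET
    LOCATION 's3://my-data-lake/clickstream/';
""")

# Join S3 data with a local Redshift table -- no loading or resizing required.
cur.execute("""
    SELECT u.plan, COUNT(*) AS clicks
    FROM spectrum.clickstream c
    JOIN users u ON u.user_id = c.user_id
    GROUP BY u.plan
    ORDER BY clicks DESC;
""")
print(cur.fetchall())
```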
“For complex queries, Redshift Spectrum provided a 67 percent performance gain,” said Rafi Ton, CEO, NUVIAD. “Using the Parquet data format, Redshift Spectrum delivered an 80 percent performance improvement. For us, this was substantial.”
DC2 nodes—twice the performance of DC1 at the same price
We launched second-generation Dense Compute (DC2) nodes to provide low latency and high throughput for demanding data warehousing workloads. DC2 nodes feature powerful Intel E5-2686 v4 (Broadwell) CPUs, fast DDR4 memory, and NVMe-based solid state disks (SSDs). We’ve tuned Amazon Redshift to take advantage of the better CPU, network, and disk on DC2 nodes, providing up to twice the performance of DC1 at the same price. Our DC2.8xlarge instances now provide twice the memory per slice of data and an optimized storage layout with 30 percent better storage utilization.
“Redshift allows us to quickly spin up clusters and provide our data scientists with a fast and easy method to access data and generate insights,” said Bradley Todd, technology architect at Liberty Mutual. “We saw a 9x reduction in month-end reporting time with Redshift DC2 nodes as compared to DC1.”
On average, our customers are seeing 3x to 5x performance gains for most of their critical workloads.
We introduced short query acceleration to speed up execution of queries such as reports, dashboards, and interactive analysis. Short query acceleration uses machine learning to predict the execution time of a query, and to move short running queries to an express short query queue for faster processing.
We launched results caching to deliver sub-second response times for queries that are repeated, such as dashboards, visualizations, and those from BI tools. Results caching has an added benefit of freeing up resources to improve the performance of all other queries.
We also introduced late materialization to reduce the amount of data scanned for queries with predicate filters by batching and factoring in the filtering of predicates before fetching data blocks in the next column. For example, if only 10 percent of the table rows satisfy the predicate filters, Amazon Redshift can potentially save 90 percent of the I/O for the remaining columns to improve query performance.
We launched query monitoring rules and pre-defined rule templates. These features make it easier for you to set metrics-based performance boundaries for workload management (WLM) queries, and specify what action to take when a query goes beyond those boundaries. For example, for a queue that’s dedicated to short-running queries, you might create a rule that aborts queries that run for more than 60 seconds. To track poorly designed queries, you might have another rule that logs queries that contain nested loops.
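As a rough illustration of what such a rule looks like in practice, the sketch below attaches the two example rules to a WLM queue by updating a cluster parameter group. The parameter group name and queue settings are placeholders, and the exact WLM JSON schema is worth confirming against the Amazon Redshift documentation:

```python
# Sketch: add query monitoring rules to a WLM queue with boto3 (names are placeholders).
import json
import boto3

wlm_config = [
    {
        "query_group": [],
        "user_group": [],
        "query_concurrency": 5,
        "rules": [
            {
                "rule_name": "abort_long_running",
                # Abort anything in this queue that runs longer than 60 seconds.
                "predicate": [{"metric_name": "query_execution_time", "operator": ">", "value": 60}],
                "action": "abort",
            },
            {
                "rule_name": "log_nested_loops",
                # Log queries whose plans produce rows from nested-loop joins.
                "predicate": [{"metric_name": "nested_loop_join_row_count", "operator": ">", "value": 0}],
                "action": "log",
            },
        ],
    }
]

redshift = boto3.client("redshift")
redshift.modify_cluster_parameter_group(
    ParameterGroupName="my-wlm-parameter-group",
    Parameters=[{
        "ParameterName": "wlm_json_configuration",
        "ParameterValue": json.dumps(wlm_config),
    }],
)
```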
Customer insights
Amazon Redshift and Redshift Spectrum serve customers across a variety of industries and sizes, from startups to large enterprises. Visit our customer page to see the success that customers are having with our recent enhancements. Learn how companies like Liberty Mutual Insurance saw a 9x reduction in month-end reporting time using DC2 nodes. On this page, you can find case studies, videos, and other content that show how our customers are using Amazon Redshift to drive innovation and business results.
In addition, check out these resources to learn about the success our customers are having building out a data warehouse and data lake integration solution with Amazon Redshift:
You can enhance your Amazon Redshift data warehouse by working with industry-leading experts. Our AWS Partner Network (APN) Partners have certified their solutions to work with Amazon Redshift. They offer software, tools, integration, and consulting services to help you at every step. Visit our Amazon Redshift Partner page and choose an APN Partner. Or, use AWS Marketplace to find and immediately start using third-party software.
To see what our Partners are saying about Amazon Redshift Spectrum and our DC2 nodes mentioned earlier, read these blog posts:
If you are evaluating or considering a proof of concept with Amazon Redshift, or you need assistance migrating your on-premises or other cloud-based data warehouse to Amazon Redshift, our team of product experts and solutions architects can help you with architecting, sizing, and optimizing your data warehouse. Contact us using this support request form, and let us know how we can assist you.
If you are an Amazon Redshift customer, we offer a no-cost health check program. Our team of database engineers and solutions architects give you recommendations for optimizing Amazon Redshift and Amazon Redshift Spectrum for your specific workloads. To learn more, email us at [email protected].
Larry Heathcote is a Principal Product Marketing Manager at Amazon Web Services for data warehousing and analytics. Larry is passionate about seeing the results of data-driven insights on business outcomes. He enjoys family time, home projects, grilling out and the taste of classic barbeque.
Bitcoin was a neat idea. No, really! Decentralization is cool. Overhauling our terrible financial infrastructure is cool. Hash functions are cool.
Unfortunately, it seems to have devolved into mostly a get-rich-quick scheme for nerds, and by nearly any measure it’s turning into a spectacular catastrophe. Its “success” is measured in how much a bitcoin is worth in US dollars, which is pretty close to an admission from its own investors that its only value is in converting back to “real” money — all while that same “success” is making it less useful as a distinct currency.
Blah, blah, everyone already knows this.
What concerns me slightly more is the gold rush hype cycle, which is putting cryptocurrency and “blockchain” in the news and lending it all legitimacy. People have raked in millions of dollars on ICOs of novel coins I’ve never heard mentioned again. (Note: again, that value is measured in dollars.) Most likely, none of the investors will see any return whatsoever on that money. They can’t, really, unless a coin actually takes off as a currency, and that seems at odds with speculative investing since everyone either wants to hoard or ditch their coins. When the coins have no value themselves, the money can only come from other investors, and eventually the hype winds down and you run out of other investors.
I fear this will hurt a lot of people before it’s over, so I’d like for it to be over as soon as possible.
That said, the hype itself has gotten way out of hand too. First it was the obsession with “blockchain” like it’s a revolutionary technology, but hey, Git is a fucking blockchain. The novel part is the way it handles distributed consensus (which in Git is basically left for you to figure out), and that’s uniquely important to currency because you want to be pretty sure that money doesn’t get duplicated or lost when moved around.
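(If that sounds hand-wavy, the linking trick itself fits in a few lines of Python. This is a toy sketch, not how Git or Bitcoin actually store anything, but it’s the same core idea: each block’s hash covers the previous block’s hash, so you can’t quietly rewrite history.)

```python
# Toy hash chain: each block's hash commits to its data AND its parent's hash.
import hashlib
import json

def make_block(data, prev_hash):
    block = {"data": data, "prev": prev_hash}
    # Hash the block's contents (including the parent hash) to get its identity.
    block["hash"] = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block

genesis = make_block("first commit", prev_hash=None)
second = make_block("second commit", prev_hash=genesis["hash"])
third = make_block("third commit", prev_hash=second["hash"])

# Tampering with any earlier block changes its hash, which breaks every link after it.
for block in (genesis, second, third):
    parent = block["prev"][:12] if block["prev"] else "(root)"
    print(block["hash"][:12], "<-", parent)
```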
But now we have startups trying to use blockchains for website backends and file storage and who knows what else? Why? What advantage does this have? When you say “blockchain”, I hear “single Git repository” — so when you say “email on the blockchain”, I have an aneurysm.
Bitcoin seems to have sparked imagination in large part because it’s decentralized, but I’d argue it’s actually a pretty bad example of a decentralized network, since people keep forking it. The ability to fork is a feature, sure, but the trouble here is that the Bitcoin family has no notion of federation — there is one canonical Bitcoin ledger and it has no notion of communication with any other. That’s what you want for currency, not necessarily other applications. (Bitcoin also incentivizes frivolous forking by giving the creator an initial pile of coins to keep and sell.)
And federation is much more interesting than decentralization! Federation gives us email and the web. Federation means I can set up my own instance with my own rules and still be able to meaningfully communicate with the rest of the network. Federation has some amount of tolerance for changes to the protocol, so such changes are more flexible and rely more heavily on consensus.
Federation is fantastic, and it feels like a massive tragedy that this rekindled interest in decentralization is mostly focused on peer-to-peer networks, which do little to address our current problems with centralized platforms.
Again, the tech is cool and all, but the marketing hype is getting way out of hand.
Maybe what I really want from 2018 is less marketing?
For one, I’ve seen a huge uptick in uncritically referring to any software that creates or classifies creative work as “AI”. Can we… can we not. It’s not AI. Yes, yes, nerds, I don’t care about the hair-splitting about the nature of intelligence — you know that when we hear “AI” we think of a human-like self-aware intelligence. But we’re applying it to stuff like a weird dog generator. Or to whatever neural network a website threw into production this week.
And this is dangerously misleading — we already had massive tech companies scapegoating The Algorithm™ for the poor behavior of their software, and now we’re talking about those algorithms as though they were self-aware, untouchable, untameable, unknowable entities of pure chaos whose decisions we are arbitrarily bound to. Ancient, powerful gods who exist just outside human comprehension or law.
It’s weird to see this stuff appear in consumer products so quickly, too. It feels quick, anyway. The latest iPhone can unlock via facial recognition, right? I’m sure a lot of effort was put into ensuring that the same person’s face would always be recognized… but how confident are we that other faces won’t be recognized? I admit I don’t follow all this super closely, so I may be imagining a non-problem, but I do know that humans are remarkably bad at checking for negative cases.
Hell, take the recurring problem of major platforms like Twitter and YouTube classifying anything mentioning “bisexual” as pornographic — because the word is also used as a porn genre, and someone threw a list of porn terms into a filter without thinking too hard about it. That’s just a word list, a fairly simple thing that any human can review; but suddenly we’re confident in opaque networks of inferred details?
I don’t know. “Traditional” classification and generation are much more comforting, since they’re a set of fairly abstract rules that can be examined and followed. Machine learning, as I understand it, is less about rules and much more about pattern-matching; it’s built out of the fingerprints of the stuff it’s trained on. Surely that’s just begging for tons of edge cases. They’re practically made of edge cases.
I’m reminded of a point I saw made a few days ago on Twitter, something I’d never thought about but should have. TurnItIn is a service for universities that checks whether students’ papers match any others, in order to detect cheating. But this is a paid service, one that fundamentally hinges on its corpus: a large collection of existing student papers. So students pay money to attend school, where they’re required to let their work be given to a third-party company, which then profits off of it? What kind of a goofy business model is this?
And my thoughts turn to machine learning, which is fundamentally different from an algorithm you can simply copy from a paper, because it’s all about the training data. And to get good results, you need a lot of training data. Where is that all coming from? How many for-profit companies are setting a neural network loose on the web — on millions of people’s work — and then turning around and selling the result as a product?
This is really a question of how intellectual property works in the internet era, and it continues our proud decades-long tradition of just kinda doing whatever we want without thinking about it too much. Nothing if not consistent.
A bit tougher, since computers are pretty alright now and everything continues to chug along. Maybe we should just quit while we’re ahead. There’s some real pie-in-the-sky stuff that would be nice, but it certainly won’t happen within a year, and may never happen except in some horrific Algorithmic™ form designed by people that don’t know anything about the problem space and only works 60% of the time but is treated as though it were bulletproof.
The giants are getting more giant. Maybe too giant? Granted, it could be much worse than Google and Amazon — it could be Apple!
Amazon has its own delivery service and brick-and-mortar stores now, as well as providing the plumbing for vast amounts of the web. They’re not doing anything particularly outrageous, but they kind of loom.
Ad company Google just put ad blocking in its majority-share browser — albeit for the ambiguously-noble goal of only blocking obnoxious ads so that people will be less inclined to install a blanket ad blocker.
Twitter is kind of a nightmare but no one wants to leave. I keep trying to use Mastodon as well, but I always forget about it after a day, whoops.
Facebook sounds like a total nightmare but no one wants to leave that either, because normies don’t use anything else, which is itself direly concerning.
IRC is rapidly bleeding mindshare to Slack and Discord, both of which are far better at the things IRC sadly never tried to do and absolutely terrible at the exact things IRC excels at.
The problem is the same as ever: there’s no incentive to interoperate. There’s no fundamental technical reason why Twitter and Tumblr and MySpace and Facebook can’t intermingle their posts; they just don’t, because why would they bother? It’s extra work that makes it easier for people to not use your ecosystem.
I don’t know what can be done about that, except to hope for a really big player to decide to play nice out of the kindness of their heart. The really big federated success stories — say, the web — mostly won out because they came along first. At this point, how does a federated social network take over? I don’t know.
I… don’t really have a solid grasp on what’s happening in tech socially at the moment. I’ve drifted a bit away from the industry part, which is where that all tends to come up. I have the vague sense that things are improving, but that might just be because the Rust community is the one I hear the most about, and it puts a lot of effort into being inclusive and welcoming.
So… more projects should be like Rust? Do whatever Rust is doing? And not so much what Linus is doing.
I haven’t heard this brought up much lately, but it would still be nice to see. The Bay Area runs on open source and is raking in zillions of dollars on its back; pump some of that cash back into the ecosystem, somehow.
I’ve seen a couple open source projects on Patreon, which is fantastic, but feels like a very small solution given how much money is flowing through the commercial tech industry.
One might wonder where the money to host a website comes from, then? I don’t know. Maybe we should loop this in with the above thing and find a more informal way to pay people for the stuff they make when we find it useful, without the financial and cognitive overhead of A Transaction or Giving Someone My Damn Credit Card Number. You know, something like Bitco— ah, fuck.
I don’t know. What are we working on at the moment? Wayland? Do Wayland, I guess. Oh, and hi-DPI, which I hear sucks. And please fix my sound drivers so PulseAudio stops blaming them when it fucks up.
Note to readers! Starting next month, we will be publishing our monthly Hot Startups blog post on the AWS Startup Blog. Please come check us out.
As visual communication—whether through social media channels like Instagram or white space-heavy product pages—becomes a central part of everyone’s life, accessible design platforms and tools become more and more important in the world of tech. This trend is why we have chosen to spotlight three design-related startups—namely Canva, Figma, and InVision—as our hot startups for the month of February. Please read on to learn more about these design-savvy companies and be sure to check out our full post here.
Canva (Sydney, Australia)
For a long time, creating designs required expensive software, extensive studying, and time spent waiting for feedback from clients or colleagues. With Canva, a graphic design tool that makes creating designs much simpler and more accessible, users have the opportunity to design anything and publish anywhere. The platform—which integrates professional design elements, including stock photography, graphic elements, and fonts for users to build designs either entirely from scratch or from thousands of free templates—is available on desktop, iOS, and Android, making it possible to spin up an invitation, poster, or graphic on a smartphone at any time.
Figma (San Francisco, CA)
Figma is a cloud-based design platform that empowers designers to communicate and collaborate more effectively. Using recent advancements in WebGL, Figma offers a design tool that doesn’t require users to install any software or special operating systems. It also allows multiple people to work in a file at the same time—a crucial feature.
As the need for new design talent increases, the industry will need plenty of junior designers to keep up with the demand. Figma is prepared to help students by offering their platform for free. Through this, they “hope to give young designers the resources necessary to kick-start their education and eventually, their careers.”
InVision (New York, NY)
Founded in 2011 with the goal of helping improve every digital experience in the world, digital product design platform InVision helps users create a streamlined and scalable product design process, build and iterate on prototypes, and collaborate across organizations. The company, which raised a $100 million series E last November, bringing its total funding to $235 million, currently powers the digital product design process at more than 80 percent of the Fortune 100 and brands like Airbnb, HBO, Netflix, and Uber.
We expand AWS by picking a geographic area (which we call a Region) and then building multiple, isolated Availability Zones in that area. Each Availability Zone (AZ) has multiple Internet connections and power connections to multiple grids.
Today I am happy to announce that we are opening our 50th AWS Availability Zone, with the addition of a third AZ to the EU (London) Region. This will give you additional flexibility to architect highly scalable, fault-tolerant applications that run across multiple AZs in the UK.
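As a quick, illustrative sketch (the region code is the only specific here; everything else uses your default credentials), you can confirm the Availability Zones your account sees in the EU (London) Region before spreading subnets or Auto Scaling groups across them:

```python
# Sketch: list the Availability Zones available to your account in EU (London).
import boto3

ec2 = boto3.client("ec2", region_name="eu-west-2")  # eu-west-2 is the EU (London) Region
zones = ec2.describe_availability_zones(
    Filters=[{"Name": "state", "Values": ["available"]}]
)["AvailabilityZones"]

for az in zones:
    print(az["ZoneName"], az["State"])
# Spreading subnets and Auto Scaling groups across all of these zones is what
# provides the fault tolerance described above.
```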
Since launching the EU (London) Region, we have seen an ever-growing set of customers, particularly in the public sector and in regulated industries, use AWS for new and innovative applications. Here are a couple of examples, courtesy of my AWS colleagues in the UK:
Enterprise – Some of the UK’s most respected enterprises are using AWS to transform their businesses, including BBC, BT, Deloitte, and Travis Perkins. Travis Perkins is one of the largest suppliers of building materials in the UK and is implementing the biggest systems and business change in its history, including an all-in migration of its data centers to AWS.
Startups – Cross-border payments company Currencycloud has migrated its entire payments production and demo platform to AWS, resulting in a 30% saving on its infrastructure costs. Clearscore, which plans to disrupt the credit score industry, has also chosen to host its entire platform on AWS. UnderwriteMe is using the EU (London) Region to offer an underwriting platform to its customers as a managed service.
Public Sector – The Met Office chose AWS to support the Met Office Weather App, available for iPhone and Android phones. Since the Met Office Weather App went live in January 2016, it has attracted more than half a million users. Using AWS, the Met Office has been able to increase agility, speed, and scalability while reducing costs. The Driver and Vehicle Licensing Agency (DVLA) is using the EU (London) Region for services such as the Strategic Card Payments platform, which helps the agency achieve PCI DSS compliance.
The AWS EU (London) Region has achieved Public Services Network (PSN) assurance, which provides UK Public Sector customers with an assured infrastructure on which to build UK Public Sector services. In conjunction with AWS’s Standardized Architecture for UK-OFFICIAL, PSN assurance enables UK Public Sector organizations to move their UK-OFFICIAL classified data to the EU (London) Region in a controlled and risk-managed manner.
For a complete list of AWS Regions and Services, visit the AWS Global Infrastructure page. As always, pricing for services in the Region can be found on the detail pages; visit our Cloud Products page to get started.
Last week I attended a talk given by Bryan Mistele, president of Seattle-based INRIX. Bryan’s talk provided a glimpse into the future of transportation, centering around four principal attributes, often abbreviated as ACES:
Autonomous – Cars and trucks are gaining the ability to scan and to make sense of their environments and to navigate without human input.
Connected – Vehicles of all types have the ability to take advantage of bidirectional connections (either full-time or intermittent) to other cars and to cloud-based resources. They can upload road and performance data, communicate with each other to run in packs, and take advantage of traffic and weather data.
Electric – Continued development of battery and motor technology will make electric vehicles more convenient, cost-effective, and environmentally friendly.
Shared – Ride-sharing services will change usage from an ownership model to an as-a-service model (sound familiar?).
Individually and in combination, these emerging attributes mean that the cars and trucks we will see and use in the decade to come will be markedly different than those of the past.
On the Road with AWS
AWS customers are already using our AWS IoT, edge computing, Amazon Machine Learning, and Alexa products to bring this future to life – vehicle manufacturers, their tier 1 suppliers, and AutoTech startups all use AWS for their ACES initiatives. AWS Greengrass is playing an important role here, attracting design wins and helping our customers to add processing power and machine learning inferencing at the edge.
AWS customer Aptiv (formerly Delphi) talked about their Automated Mobility on Demand (AMoD) smart vehicle architecture in an AWS re:Invent session. Aptiv’s AMoD platform will use Greengrass and microservices to drive the onboard user experience, along with edge processing, monitoring, and control. Here’s an overview:
Another customer, Denso of Japan (one of the world’s largest suppliers of auto components and software) is using Greengrass and AWS IoT to support their vision of Mobility as a Service (MaaS). Here’s a video:
AWS at CES
The AWS team will be out in force at CES in Las Vegas and would love to talk to you. They’ll be running demos that show how AWS can help to bring innovation and personalization to connected and autonomous vehicles.
Personalized In-Vehicle Experience – This demo shows how AWS AI and Machine Learning can be used to create a highly personalized and branded in-vehicle experience. It makes use of Amazon Lex, Polly, and Amazon Rekognition, but the design is flexible and can be used with other services as well. The demo encompasses driver registration, login and startup (including facial recognition), voice assistance for contextual guidance, personalized e-commerce, and vehicle control. Here’s the architecture for the voice assistance:
Connected Vehicle Solution – This demo shows how a connected vehicle can combine local and cloud intelligence, using edge computing and machine learning at the edge. It handles intermittent connections and uses AWS DeepLens to train a model that responds to distracted drivers. Here’s the overall architecture, as described in our Connected Vehicle Solution:
Digital Content Delivery – This demo will show how a customer uses a web-based 3D configurator to build and personalize their vehicle. It will also show a high-resolution (4K) 3D image and an optional immersive AR/VR experience, both designed for use within a dealership.
Autonomous Driving – This demo will showcase the AWS services that can be used to build autonomous vehicles. There’s a 1/16th scale model vehicle powered and driven by Greengrass and an overview of a new AWS Autonomous Toolkit. As part of the demo, attendees drive the car, training a model via Amazon SageMaker for subsequent on-board inferencing, powered by Greengrass ML Inferencing.
To speak to one of my colleagues or to set up a time to see the demos, check out the Visit AWS at CES 2018 page.
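To give a rough, purely illustrative sense of the building blocks behind a demo like the Personalized In-Vehicle Experience described above (this is not the demo’s code; the bucket, object keys, threshold, and voice are invented), driver recognition and a spoken greeting might be wired together with Amazon Rekognition and Amazon Polly like this:

```python
# Sketch: recognize the driver, then greet them with synthesized speech.
# Bucket, key names, and the similarity threshold are placeholders.
import boto3

rekognition = boto3.client("rekognition")
polly = boto3.client("polly")

# "Driver login": compare the cabin camera frame against the enrolled photo.
result = rekognition.compare_faces(
    SourceImage={"S3Object": {"Bucket": "demo-vehicle-bucket", "Name": "enrolled/driver.jpg"}},
    TargetImage={"S3Object": {"Bucket": "demo-vehicle-bucket", "Name": "cabin/current-frame.jpg"}},
    SimilarityThreshold=90,
)

if result["FaceMatches"]:
    # Personalized voice greeting once the driver is recognized.
    speech = polly.synthesize_speech(
        Text="Welcome back. Your seat and playlist are ready.",
        OutputFormat="mp3",
        VoiceId="Joanna",
    )
    with open("greeting.mp3", "wb") as audio_file:
        audio_file.write(speech["AudioStream"].read())
```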
Some Resources
If you are interested in this topic and want to learn more, the AWS for Automotive page is a great starting point, with discussions on connected vehicles & mobility, autonomous vehicle development, and digital customer engagement.
When you are ready to start building a connected vehicle, the AWS Connected Vehicle Solution contains a reference architecture that combines local computing, sophisticated event rules, and cloud-based data processing and storage. You can use this solution to accelerate your own connected vehicle projects.
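For a small, hypothetical taste of the ingestion side (the topic name and payload fields below are invented, not part of the solution itself), publishing a vehicle telemetry message to AWS IoT from Python might look like this:

```python
# Sketch: publish a telemetry message to AWS IoT (topic and fields are placeholders).
import json
import boto3

# Look up the account-specific AWS IoT data endpoint, then publish through it.
iot = boto3.client("iot")
endpoint = iot.describe_endpoint(endpointType="iot:Data-ATS")["endpointAddress"]
iot_data = boto3.client("iot-data", endpoint_url="https://" + endpoint)

iot_data.publish(
    topic="connectedcar/telemetry/vin-123",
    qos=1,
    payload=json.dumps({
        "vin": "vin-123",
        "speed_kmh": 72,
        "fuel_pct": 58,
        "location": {"lat": 47.61, "lon": -122.33},
    }),
)
```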
Today we launched our 17th Region globally, and the second in China. The AWS China (Ningxia) Region, operated by Ningxia Western Cloud Data Technology Co. Ltd. (NWCD), is generally available now and provides customers another option to run applications and store data on AWS in China.
Operating Partner
To comply with China’s legal and regulatory requirements, AWS has formed a strategic technology collaboration with NWCD to operate and provide services from the AWS China (Ningxia) Region. Founded in 2015, NWCD is a licensed datacenter and cloud services provider, based in Ningxia, China. NWCD joins Sinnet, the operator of the AWS China (Beijing) Region, as an AWS operating partner in China. Through these relationships, AWS provides its industry-leading technology, guidance, and expertise to NWCD and Sinnet, while NWCD and Sinnet operate and provide AWS cloud services to local customers. While the cloud services offered in both AWS China Regions are the same as those available in other AWS Regions, the AWS China Regions are different in that they are isolated from all other AWS Regions and operated by AWS’s Chinese partners separately from all other AWS Regions. Customers using the AWS China Regions enter into customer agreements with Sinnet and NWCD, rather than with AWS.
Use it Today
The AWS China (Ningxia) Region, operated by NWCD, is open for business, and you can start using it now! Starting today, Chinese developers, startups, and enterprises, as well as government, education, and non-profit organizations, can leverage AWS to run their applications and store their data in the new AWS China (Ningxia) Region, operated by NWCD. Customers already using the AWS China (Beijing) Region, operated by Sinnet, can select the AWS China (Ningxia) Region directly from the AWS Management Console, while new customers can request an account at www.amazonaws.cn to begin using both AWS China Regions.
We don’t often recognize or celebrate anniversaries at AWS. With nearly 100 services on our list, we’d be eating cake and drinking champagne several times a week. While that might sound like fun, we’d rather spend our working hours listening to customers and innovating. With that said, Amazon QuickSight has now been generally available for a little over a year and I would like to give you a quick update!
QuickSight in Action
Today, tens of thousands of customers (from startups to enterprises, in industries as varied as transportation, legal, mining, and healthcare) are using QuickSight to analyze and report on their business data.
Here are a couple of examples:
Gemini provides legal evidence procurement for California attorneys who represent injured workers. They have gone from creating custom reports and running one-off queries to creating and sharing dynamic QuickSight dashboards with drill-downs and filtering. QuickSight is used to track sales pipeline, measure order throughput, and to locate bottlenecks in the order processing pipeline.
Jivochat provides a real-time messaging platform to connect visitors to website owners. QuickSight lets them create and share interactive dashboards while also providing access to the underlying datasets. This has allowed them to move beyond the sharing of static spreadsheets, ensuring that everyone is looking at the same data and is empowered to make timely decisions based on current data.
Transfix is a tech-powered freight marketplace that matches loads and increases visibility into logistics for Fortune 500 shippers in retail, food and beverage, manufacturing, and other industries. QuickSight has made analytics accessible to both BI engineers and non-technical business users. They scrutinize key business and operational metrics including shipping routes, carrier efficiency, and process automation.
Looking Back / Looking Ahead
The feedback on QuickSight has been incredibly helpful. Customers tell us that their employees are using QuickSight to connect to their data, perform analytics, and make high-velocity, data-driven decisions, all without setting up or running their own BI infrastructure. We love all of the feedback that we get, and use it to drive our roadmap, leading to the introduction of over 40 new features in just a year. Here’s a summary:
Looking forward, we are watching an interesting trend develop within our customer base. As these customers take a close look at how they analyze and report on data, they are realizing that a serverless approach offers some tangible benefits. They use Amazon Simple Storage Service (S3) as a data lake and query it using a combination of QuickSight and Amazon Athena, giving them agility and flexibility without static infrastructure. They also make great use of QuickSight’s dashboards feature, monitoring business results and operational metrics, then sharing their insights with hundreds of users. You can read Building a Serverless Analytics Solution for Cleaner Cities and review Serverless Big Data Analytics using Amazon Athena and Amazon QuickSight if you are interested in this approach.
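If you want to try that pattern yourself, a minimal sketch might look like the following; the Athena database, table, query, and S3 output location are placeholders standing in for your own data lake:

```python
# Sketch: run an ad-hoc Athena query over an S3 data lake -- the kind of result
# set a QuickSight dashboard can sit on top of. All names are placeholders.
import time
import boto3

athena = boto3.client("athena")

execution = athena.start_query_execution(
    QueryString="""
        SELECT city, COUNT(*) AS trips
        FROM rides
        WHERE pickup_date >= DATE '2017-12-01'
        GROUP BY city
        ORDER BY trips DESC
        LIMIT 10
    """,
    QueryExecutionContext={"Database": "serverless_analytics"},
    ResultConfiguration={"OutputLocation": "s3://my-athena-results/quicksight/"},
)
query_id = execution["QueryExecutionId"]

# Poll until the query finishes, then print the result rows.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(1)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows:
        print([col.get("VarCharValue") for col in row["Data"]])
```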
New Features and Enhancements
We’re still doing our best to listen and to learn, and to make sure that QuickSight continues to meet your needs. I’m happy to announce that we are making seven big additions today:
Geospatial Visualization – You can now create geospatial visuals on geographical data sets.
Private VPC Access – You can now sign up to access a preview of a new feature that allows you to securely connect to data within VPCs or on-premises, without the need for public endpoints.
Flat Table Support – In addition to pivot tables, you can now use flat tables for tabular reporting. To learn more, read about Using Tabular Reports.
Calculated SPICE Fields – You can now perform run-time calculations on SPICE data as part of your analysis. Read Adding a Calculated Field to an Analysis for more information.
Wide Table Support – You can now use tables with up to 1000 columns.
HIPAA Compliance – You can now run HIPAA-compliant workloads on QuickSight.
Geospatial Visualization
Everyone seems to want this feature! You can now take data that contains a geographic identifier (country, city, state, or zip code) and create beautiful visualizations with just a few clicks. QuickSight will geocode the identifier that you supply, and can also accept lat/long map coordinates. You can use this feature to visualize sales by state, map stores to shipping destinations, and so forth. Here’s a sample visualization:
Private VPC Access Preview
If you have data in AWS (perhaps in Amazon Redshift, Amazon Relational Database Service (RDS), or on EC2) or on-premises in Teradata or SQL Server on servers without public connectivity, this feature is for you. Private VPC Access for QuickSight uses an Elastic Network Interface (ENI) for secure, private communication with data sources in a VPC. It also allows you to use AWS Direct Connect to create a secure, private link with your on-premises resources. Here’s what it looks like:
If you are ready to join the preview, you can sign up today.
Contributed by Tiffany Jernigan, Developer Advocate for Amazon ECS
Get ready for takeoff!
We made sure that this year’s re:Invent is chock-full of containers: there are over 40 sessions! New to containers? No problem, we have several introductory sessions for you to dip your toes in. Been using containers for years and know the ins and outs? Don’t miss our technical deep-dives and interactive chalk talks led by container experts.
If you can’t make it to Las Vegas, you can catch the keynotes and session recaps from our livestream and on Twitch.
Session types
Not everyone learns the same way, so we have multiple types of breakout content:
Birds of a Feather
An interactive discussion with industry leaders about containers on AWS.
Breakout sessions
60-minute presentations about building on AWS. Sessions are delivered by both AWS experts and customers and span all content levels.
Workshops
2.5-hour, hands-on sessions that teach how to build on AWS. AWS credits are provided. Bring a laptop, and have an active AWS account.
Chalk Talks
1-hour, highly interactive sessions with a smaller audience. They begin with a short lecture delivered by an AWS expert, followed by a discussion with the audience.
Session levels
Whether you’re new to containers or you’ve been using them for years, you’ll find useful information at every level.
Introductory Sessions are focused on providing an overview of AWS services and features, with the assumption that attendees are new to the topic.
Advanced Sessions dive deeper into the selected topic. Presenters assume that the audience has some familiarity with the topic, but may or may not have direct experience implementing a similar solution.
Expert Sessions are for attendees who are deeply familiar with the topic, have implemented a solution on their own already, and are comfortable with how the technology works across multiple services, architectures, and implementations.
Session locations
All container sessions are located in the Aria Resort.
MONDAY 11/27
Breakout sessions
Level 200 (Introductory)
CON202 – Getting Started with Docker and Amazon ECS
By packaging software into standardized units, Docker gives code everything it needs to run, ensuring consistency from your laptop all the way into production. But once you have your code ready to ship, how do you run and scale it in the cloud? In this session, you become comfortable running containerized services in production using Amazon ECS. We cover container deployment, cluster management, service auto-scaling, service discovery, secrets management, logging, monitoring, security, and other core concepts. We also cover integrated AWS services and supplementary services that you can take advantage of to run and scale container-based services in the cloud.
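If you have never run a container on ECS, the following hypothetical sketch shows the two core calls the session builds on: registering a task definition and running it as a service on an existing cluster. The cluster name, image URI, and counts are placeholders:

```python
# Sketch: register an ECS task definition and run it as a service (placeholder names).
import boto3

ecs = boto3.client("ecs")

# Describe the container to run: image, CPU/memory, and port mapping.
task_def = ecs.register_task_definition(
    family="web-demo",
    containerDefinitions=[
        {
            "name": "web",
            "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/web-demo:latest",
            "cpu": 256,
            "memory": 512,
            "essential": True,
            "portMappings": [{"containerPort": 80, "hostPort": 0}],  # dynamic host port
        }
    ],
)

# Keep two copies of the task running on an existing cluster.
ecs.create_service(
    cluster="demo-cluster",
    serviceName="web-demo-service",
    taskDefinition=task_def["taskDefinition"]["taskDefinitionArn"],
    desiredCount=2,
)
```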
Chalk talks
Level 200 (Introductory)
CON211 – Reducing your Compute Footprint with Containers and Amazon ECS
Tomas Riha, platform architect for Volvo, shows how Volvo transitioned its WirelessCar platform from using Amazon EC2 virtual machines to containers running on Amazon ECS, significantly reducing cost. Tomas dives deep into the architecture that Volvo used to achieve the migration in under four months, including Amazon ECS, Amazon ECR, Elastic Load Balancing, and AWS CloudFormation.
CON212 – Anomaly Detection Using Amazon ECS, AWS Lambda, and Amazon EMR
Learn about the architecture that Cisco CloudLock uses to enable automated security and compliance checks throughout the entire development lifecycle, from the first line of code through runtime. It includes integration with IAM roles, Amazon VPC, and AWS KMS.
Level 400 (Expert)
CON410 – Advanced CICD with Amazon ECS Control Plane
Mohit Gupta, product and engineering lead for Clever, demonstrates how to extend the Amazon ECS control plane to optimize management of container deployments and how the control plane can be broadly applied to take advantage of new AWS services. This includes ark—an AWS CLI-based deployment to Amazon ECS, Dapple—a slack-based automation system for deployments and notifications, and Kayvee—log and event routing libraries based on Amazon Kinesis.
Workshops
Level 200 (Introductory)
CON209 – Interstella 8888: Learn How to Use Docker on AWS
Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to get hands-on experience with Docker as you containerize Interstella 8888’s aging monolithic application and deploy it using Amazon ECS.
CON213 – Hands-on Deployment of Kubernetes on AWS
In this workshop, attendees get hands-on experience using Kubernetes and Kops (Kubernetes Operations), as described in our recent blog post. Attendees learn how to provision a cluster, assign role-based permissions and security, and launch a container. If you’re interested in learning best practices for running Kubernetes on AWS, don’t miss this workshop.
TUESDAY 11/28
Breakout Sessions
Level 200 (Introductory)
CON206 – Docker on AWS
In this session, Docker Technical Staff Member Patrick Chanezon discusses how Finnish Rail, the national train system for Finland, is using Docker on Amazon Web Services to modernize their customer-facing applications, from ticket sales to reservations. Patrick also shares the state of Docker development and adoption on AWS, including explaining the opportunities and implications of efforts such as Project Moby, Docker EE, and how developers can use and contribute to Docker projects.
CON208 – Building Microservices on AWS
Increasingly, organizations are turning to microservices to help them empower autonomous teams, letting them innovate and ship software faster than ever before. But implementing a microservices architecture comes with a number of new challenges that need to be dealt with. Chief among these is finding an appropriate platform to help manage a growing number of independently deployable services. In this session, Sam Newman, author of Building Microservices and a renowned expert in microservices strategy, discusses strategies for building scalable and robust microservices architectures. He also tells you how to choose the right platform for building microservices, and about common challenges and mistakes organizations make when they move to microservices architectures.
Level 300 (Advanced)
CON302 – Building a CICD Pipeline for Containers on AWS
Containers can make it easier to scale applications in the cloud, but how do you set up your CICD workflow to automatically test and deploy code to containerized apps? In this session, we explore how developers can build effective CICD workflows to manage their containerized code deployments on AWS.
Ajit Zadgaonkar, Director of Engineering and Operations at Edmunds, walks through best practices for CICD architectures used by his team to deploy containers. We also deep dive into topics such as how to create an accessible CICD platform and architect for safe blue/green deployments.
CON307 – Building Effective Container Images
Sick of getting paged at 2am and wondering “where did all my disk space go?” New Docker users often start with a stock image in order to get up and running quickly, but this can cause problems as your application matures and scales. Creating efficient container images is important to maximize resources and deliver critical security benefits.
In this session, AWS Sr. Technical Evangelist Abby Fuller covers how to create effective images to run containers in production. This includes an in-depth discussion of how Docker image layers work, things you should think about when creating your images, working with Amazon ECR, and mise-en-place for installing dependencies. Prakash Janakiraman, Co-Founder and Chief Architect at Nextdoor, discusses high-level and language-specific best practices for building images and how Nextdoor uses these practices to successfully scale their containerized services with a small team.
CON309 – Containerized Machine Learning on AWS
Image recognition is a field of deep learning that uses neural networks to recognize the subject and traits for a given image. In Japan, Cookpad uses Amazon ECS to run an image recognition platform on clusters of GPU-enabled EC2 instances. In this session, hear from Cookpad about the challenges they faced building and scaling this advanced, user-friendly service to ensure high availability and low latency for tens of millions of users.
CON320 – Monitoring, Logging, and Debugging for Containerized Services
As containers become more embedded in the platform, debug tools, traces, and logs become increasingly important. Nare Hayrapetyan, Senior Software Engineer, and Calvin French-Owen, Senior Technical Officer for Segment, discuss the principles of monitoring and debugging containers and the tools Segment has implemented and built for logging, alerting, metric collection, and debugging of containerized services running on Amazon ECS.
Chalk Talks
Level 300 (Advanced)
CON314 – Automating Zero-Downtime Production Cluster Upgrades for Amazon ECS
Containers make it easy to deploy new code into production to update the functionality of a service, but what happens when you need to update the Amazon EC2 compute instances that your containers are running on? In this talk, we’ll deep dive into how to upgrade the Amazon EC2 infrastructure underlying a live production Amazon ECS cluster without affecting service availability. Matt Callanan, Engineering Manager at Expedia, walks through Expedia’s “PRISM” project that safely relocates hundreds of tasks onto new Amazon EC2 instances with zero downtime to applications.
CON322 – Maximizing Amazon ECS for Large-Scale Workloads
Head of Mobfox DevOps, David Spitzer, shows how Mobfox used Docker and Amazon ECS to scale the Mobfox services and development teams to achieve low-latency networking and automatic scaling. This session covers Mobfox’s ecosystem architecture, comparing 2015 with today, the challenges Mobfox faced in growing their platform, and how they overcame them.
CON323 – Microservices Architectures for the Enterprise
Salva Jung, Principal Engineer for Samsung Mobile, shares how Samsung Connect is architected as microservices running on Amazon ECS to securely, stably, and efficiently handle requests from millions of mobile and IoT devices around the world.
CON324 – Windows Containers on Amazon ECS
Docker containers are commonly regarded as powerful and portable runtime environments for Linux code, but Docker also offers API and toolchain support for running Windows Server in containers. In this talk, we discuss the various options for running Windows-based applications in containers on AWS.
CON326 – Remote Sensing and Image Processing on AWS
Learn how Encirca services by DuPont Pioneer uses Amazon ECS powered by GPU instances and Amazon EC2 Spot Instances to run proprietary image-processing algorithms against satellite imagery. Mark Lanning and Ethan Harstad, engineers at DuPont Pioneer, show how this architecture has allowed them to process satellite imagery multiple times a day for each agricultural field in the United States in order to identify crop health changes.
Workshops
Level 300 (Advanced)
CON317 – Advanced Container Management at Catsndogs.lol
Catsndogs.lol is a (fictional) company that needs help deploying and scaling its container-based application. During this workshop, attendees join the new DevOps team at CatsnDogs.lol, and help the company to manage their applications using Amazon ECS, and help release new features to make our customers happier than ever. Attendees get hands-on with service and container-instance auto-scaling, spot-fleet integration, container placement strategies, service discovery, secrets management with AWS Systems Manager Parameter Store, time-based and event-based scheduling, and automated deployment pipelines. If you are a developer interested in learning more about how Amazon ECS can accelerate your application development and deployment workflows, or if you are a systems administrator or DevOps person interested in understanding how Amazon ECS can simplify the operational model associated with running containers at scale, then this workshop is for you. You should have basic familiarity with Amazon ECS, Amazon EC2, and IAM.
Additional requirements:
The AWS CLI or AWS Tools for PowerShell installed
An AWS account with administrative permissions (including the ability to create IAM roles and policies) created at least 24 hours in advance.
WEDNESDAY 11/29
Birds of a Feather (BoF)
CON01 – Birds of a Feather: Containers and Open Source at AWS
Cloud native architectures take advantage of on-demand delivery, global deployment, elasticity, and higher-level services to enable developer productivity and business agility. Open source is a core part of making cloud native possible for everyone. In this session, we welcome thought leaders from the CNCF, Docker, and AWS to discuss the cloud’s direction for growth and enablement of the open source community. We also discuss how AWS is integrating open source code into its container services and its contributions to open source projects.
Breakout Sessions
Level 300 (Advanced)
CON308 – Mastering Kubernetes on AWS
Much progress has been made on how to bootstrap a cluster since Kubernetes’ first commit, and it now takes only a matter of minutes to go from zero to a running cluster on Amazon Web Services. However, evolving a simple Kubernetes architecture to be ready for production in a large enterprise can quickly become overwhelming with options for configuration and customization.
In this session, Arun Gupta, Open Source Strategist for AWS and Raffaele Di Fazio, software engineer at leading European fashion platform Zalando, show the common practices for running Kubernetes on AWS and share insights from experience in operating tens of Kubernetes clusters in production on AWS. We cover options and recommendations on how to install and manage clusters, configure high availability, perform rolling upgrades and handle disaster recovery, as well as continuous integration and deployment of applications, logging, and security.
CON310 – Moving to Containers: Building with Docker and Amazon ECS
If you’ve ever considered moving part of your application stack to containers, don’t miss this session. We cover best practices for containerizing your code, implementing automated service scaling and monitoring, and setting up automated CI/CD pipelines with fail-safe deployments. Manjeeva Silva and Thilina Gunasinghe show how McDonald’s implemented their home delivery platform in four months using Docker containers and Amazon ECS to serve tens of thousands of customers.
Level 400 (Expert)
CON402 – Advanced Patterns in Microservices Implementation with Amazon ECS
Scaling a microservice-based infrastructure can be challenging in terms of both technical implementation and developer workflow. In this talk, AWS Solutions Architect Pierre Steckmeyer is joined by Will McCutchen, Architect at BuzzFeed, to discuss Amazon ECS as a platform for building a robust infrastructure for microservices. We look at the key attributes of microservice architectures and how Amazon ECS supports these requirements in production, from configuration to sophisticated workload scheduling to networking capabilities to resource optimization. We also examine what it takes to build an end-to-end platform on top of the wider AWS ecosystem, and what it’s like to migrate a large engineering organization from a monolithic approach to microservices.
CON404 – Deep Dive into Container Scheduling with Amazon ECS As your application’s infrastructure grows and scales, well-managed container scheduling is critical to ensuring high availability and resource optimization. In this session, we deep dive into the challenges and opportunities around container scheduling, as well as the different tools available within Amazon ECS and AWS to carry out efficient container scheduling. We discuss patterns for container scheduling available with Amazon ECS, the Blox scheduling framework, and how you can customize and integrate third-party scheduler frameworks to manage container scheduling on Amazon ECS.
Chalk Talks
Level 300 (Advanced)
CON312 – Building a Selenium Fleet on the Cheap with Amazon ECS with Spot Fleet Roberto Rivera and Matthew Wedgwood, engineers at RetailMeNot, give a practical overview of setting up a fleet of Selenium nodes running on Amazon ECS with Spot Fleet. They discuss the challenges of running Selenium with high availability at minimum cost, using Amazon ECS container introspection to connect the Selenium Hub with its nodes.
CON315 – Virtually There: Building a Render Farm with Amazon ECS Learn how 8i Corp scales its multi-tenanted, volumetric render farm up to thousands of instances using AWS, Docker, and an API-driven infrastructure. This render farm enables them to turn the video footage from an array of synchronized cameras into a photo-realistic hologram capable of playback on a range of devices, from mobile phones to high-end head mounted displays. Join Owen Evans, VP of Engineering for 8i, as they dive deep into how 8i’s rendering infrastructure is built and maintained by just a handful of people and powered by Amazon ECS.
CON325 – Developing Microservices – from Your Laptop to the Cloud Wesley Chow, Staff Engineer at Adroll, shows how his team extends Amazon ECS by enabling local development capabilities. Hologram, Adroll’s local development program, brings the capabilities of the Amazon EC2 instance metadata service to non-EC2 hosts, so that developers can run the same software on local machines with the same credentials source as in production.
CON327 – Patterns and Considerations for Service Discovery Roven Drabo, head of cloud operations at Kaplan Test Prep, illustrates Kaplan’s complete container automation solution using Amazon ECS along with how his team uses NGINX and HashiCorp Consul to provide an automated approach to service discovery and container provisioning.
CON328 – Building a Development Platform on Amazon ECS Quinton Anderson, Head of Engineering for Commonwealth Bank of Australia, walks through how they migrated their internal development and deployment platform from Mesos/Marathon to Amazon ECS. The platform uses a custom DSL to abstract a layered application architecture, in a way that makes it easy to plug or replace new implementations into each layer in the stack.
Workshops
Level 300 (Advanced)
CON318 – Interstella 8888: Monolith to Microservices with Amazon ECS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to get hands-on experience deploying Docker containers as you break Interstella 8888’s aging monolithic application into containerized microservices. Using Amazon ECS and an Application Load Balancer, you create API-based microservices and deploy them leveraging integrations with other AWS services.
CON332 – Build a Java Spring Application on Amazon ECS This workshop teaches you how to lift and shift existing Spring and Spring Cloud applications onto the AWS platform. Learn how to build a Spring application container, understand bootstrap secrets, push container images to Amazon ECR, and deploy the application to Amazon ECS. Then, learn how to configure the deployment for production.
THURSDAY 11/30
Breakout Sessions
Level 200 (Introductory)
CON201 – Containers on AWS – State of the Union Just over four years after the first public release of Docker, and three years to the day after the launch of Amazon ECS, the use of containers has surged to run a significant percentage of production workloads at startups and enterprise organizations. Join Deepak Singh, General Manager of Amazon Container Services, as he covers the state of containerized application development and deployment trends, new container capabilities on AWS that are available now, options for running containerized applications on AWS, and how AWS customers successfully run container workloads in production.
Level 300 (Advanced)
CON304 – Batch Processing with Containers on AWS Batch processing is useful to analyze large amounts of data. But configuring and scaling a cluster of virtual machines to process complex batch jobs can be difficult. In this talk, we show how to use containers on AWS for batch processing jobs that can scale quickly and cost-effectively. We also discuss AWS Batch, our fully managed batch-processing service. You also hear from GoPro and Here about how they use AWS to run batch processing jobs at scale including best practices for ensuring efficient scheduling, fine-grained monitoring, compute resource automatic scaling, and security for your batch jobs.
Level 400 (Expert)
CON406 – Architecting Container Infrastructure for Security and Compliance While organizations gain agility and scalability when they migrate to containers and microservices, they also benefit from compliance and security, advantages that are often overlooked. In this session, Kelvin Zhu, lead software engineer at Okta, joins Mitch Beaumont, enterprise solutions architect at AWS, to discuss security best practices for containerized infrastructure. Learn how Okta built their development workflow with an emphasis on security through testing and automation. Dive deep into how containers enable automated security and compliance checks throughout the development lifecycle. Also understand best practices for implementing AWS security and secrets management services for any containerized service architecture.
Chalk Talks
Level 300 (Advanced)
CON329 – Full Software Lifecycle Management for Containers Running on Amazon ECS Learn how The Washington Post uses Amazon ECS to run Arc Publishing, a digital journalism platform that powers The Washington Post and a growing number of major media websites. Amazon ECS enabled The Washington Post to containerize their existing microservices architecture, avoiding a complete rewrite that would have delayed the platform’s launch by several years. In this session, Jason Bartz, Technical Architect at The Washington Post, discusses the platform’s architecture. He addresses the challenges of optimizing Arc Publishing’s workload, and managing the application lifecycle to support 2,000 containers running on more than 50 Amazon ECS clusters.
CON330 – Running Containerized HIPAA Workloads on AWS Nihar Pasala, Engineer at Aetion, discusses the Aetion Evidence Platform, a system for generating the real-world evidence used by healthcare decision makers to implement value-based care. This session discusses the architecture Aetion uses to run HIPAA workloads using containers on Amazon ECS, best practices, and learnings.
Level 400 (Expert)
CON408 – Building a Machine Learning Platform Using Containers on AWS DeepLearni.ng develops and implements machine learning models for complex enterprise applications. In this session, Thomas Rogers, Engineer for DeepLearni.ng discusses how they worked with Scotiabank to leverage Amazon ECS, Amazon ECR, Docker, GPU-accelerated Amazon EC2 instances, and TensorFlow to develop a retail risk model that helps manage payment collections for millions of Canadian credit card customers.
Workshops
Level 300 (Advanced)
CON319 – Interstella 8888: CICD for Containers on AWS Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. Join this workshop to learn how to set up a CI/CD pipeline for containerized microservices. You get hands-on experience deploying Docker container images using Amazon ECS, AWS CloudFormation, AWS CodeBuild, and AWS CodePipeline, automating everything from code check-in to production.
FRIDAY 12/1
Breakout Sessions
Level 400 (Expert)
CON405 – Moving to Amazon ECS – the Not-So-Obvious Benefits If you ask 10 teams why they migrated to containers, you will likely get answers like ‘developer productivity’, ‘cost reduction’, and ‘faster scaling’. But teams often find there are several other ‘hidden’ benefits to using containers for their services. In this talk, Franziska Schmidt, Platform Engineer at Mapbox, and Yaniv Donenfeld from AWS discuss the obvious and not-so-obvious benefits of moving to a containerized architecture. These include using Docker and Amazon ECS to achieve shared libraries for dev teams, separating private infrastructure from shareable code, and making it easier for non-ops engineers to run services.
Chalk Talks
Level 300 (Advanced)
CON331 – Deploying a Regulated Payments Application on Amazon ECS Travelex discusses how they built an FCA-compliant international payments service using a microservices architecture on AWS. This chalk talk covers the challenges of designing and operating an Amazon ECS-based PaaS in a regulated environment using a DevOps model.
Workshops
Level 400 (Expert)
CON407 – Interstella 8888: Advanced Microservice Operations Interstella 8888 is an intergalactic trading company that deals in rare resources, but their antiquated monolithic logistics systems are causing the business to lose money. In this workshop, you help Interstella 8888 build a modern microservices-based logistics system to save the company from financial ruin. We give you the hands-on experience you need to run microservices in the real world. This includes implementing advanced container scheduling and scaling to deal with variable service requests, implementing a service mesh, issue tracing with AWS X-Ray, container and instance-level logging with Amazon CloudWatch, and load testing.
Know before you go
Want to brush up on your container knowledge before re:Invent? Here are some helpful resources to get started:
In recent years, Amazon Web Services and Microsoft have shifted more of their focus to artificial intelligence (AI) and, as a result, have vastly increased their investment in implementing and advancing AI across their fleets of online products and services.
The reasoning Microsoft gives for this focus is that the popularity of AI has been steadily increasing, which can be attributed to three main factors: cloud computing, powerful algorithms, and multitudes of data.
Amazon, on the other hand, is planning to introduce a number of new AI products into the market in order to catch up to the likes of Microsoft and Google. Through a project codenamed “Ironman”, AWS is working internally, as well as partnering with startups, to take a years-old AWS data warehousing service and gear it toward organizing data for workloads that employ machine learning techniques.
For more information on recent AI endeavors in the cloud computing market, check out our previous article below:
This post by Backblaze’s CEO and co-founder Gleb Budman is the seventh in a series about entrepreneurship. You can choose posts in the series from the list below:
Use the Join button above to receive notification of new posts in this series.
“Are you crazy?” “Why would you do that?!” “You shouldn’t share that!”
These are just a few of the common questions and comments we heard after posting some of the information we have shared over the years. So was it crazy? Misguided? Should you do it?
With that background I’d like to dig into the decision to become so transparent, from releasing stats on hard drive failures, to storage pod specs, to publishing our cloud storage costs, and open sourcing the Reed-Solomon code. What was the thought process behind becoming so transparent when most companies work so hard to hide their inner workings, especially information such as the Storage Pod specs that would normally be considered a proprietary advantage? Most importantly I’d like to explore the positives and negatives of being so transparent.
Sharing Intellectual Property
The first “transparency” that garnered a flurry of “why would you share that?!” came as a result of us deciding to open source our Storage Pod design: publishing the specs, parts, prices, and how to build it yourself. The Storage Pod was a key component of our infrastructure, gave us a cost (and thus competitive) advantage, took significant effort to develop, and had a fair bit of intellectual property: the “IP.”
The negatives of sharing this are obvious: it allows our competitors to use the design to reduce our cost advantage, and it gives away the IP, which could be patentable or have value as a trade secret.
The positives were certainly less obvious, and at the time we couldn’t have guessed how massive they would be.
We wrestled with the decision: prospective users and others online didn’t believe we could offer our service for such a low price, thinking that we would burn through some cash hoard and then go out of business. We wanted to reassure them, but how?
This is how our response evolved:
We’ve built a lower cost storage platform.
But why would anyone believe us?
Because, we’ve designed our own servers and they’re less expensive.
But why would anyone believe they were so low cost and efficient?
Because here’s how much they cost versus others.
But why would anyone believe they cost that little and still enabled us to efficiently store data?
Because here are all the components they’re made of, this is how to build them, and this is how they work.
Ok, you can’t argue with that.
Great — so that would reassure people. But should we do this? Is it worth it?
This was 2009, we were a tiny company of seven people working from our co-founder’s one-bedroom apartment. We decided that the risk of not having potential customers trust us was more impactful than the risk of our competitors possibly deciding to use our server architecture. The former might kill the company in short order; the latter might make it harder for us to compete in the future. Moreover, we figured that most competitors were established on their own platforms and were unlikely to switch to ours, even if it were better.
Takeaway: Build your brand today. There are no assurances you will make it to tomorrow if you can’t make people believe in you today.
A Sharing Success Story — The Backblaze Storage Pod
So with that, we decided to publish everything about the Storage Pod. As for deciding to actually open source it? That was a ‘thank you’ to the open source community upon whose shoulders we stood as we used software such as Linux, Tomcat, etc.
With eight years of hindsight, here’s what happened:
As best as I can tell, none of our direct competitors ever used our Storage Pod design, opting instead to continue paying more for commercial solutions.
Hundreds of press articles have been written about Backblaze as a direct result of sharing the Storage Pod design.
Millions of people have read press articles or our blog posts about the Storage Pods.
Backblaze was established as a storage tech thought leader, and a resource for those looking for information in the space.
Our blog became viewed as a resource, not a corporate mouthpiece.
Recruiting has been made easier through the awareness of Backblaze, the appreciation for us taking on challenging tech problems in interesting ways, and for our openness.
Sourcing for our Storage Pods has become easier because we can point potential vendors to our blog posts and say, “here’s what we need.”
And those are just the direct benefits for us. One of the things that warms my heart is that doing this has helped others:
Several companies have started selling servers based on our Storage Pod designs.
Netflix credits Backblaze with being the inspiration behind their CDN servers.
Many schools, labs, and others have shared that they’ve been able to do what they didn’t think was possible because using our Storage Pod designs provided lower-cost storage.
And I want to believe that in general we pushed forward the development of low-cost storage servers in the industry.
So overall, the decision on being transparent and sharing our Storage Pod designs was a clear win.
Takeaway: Never underestimate the value of goodwill. It can help build new markets that fuel your future growth and create new ecosystems.
Sharing An “Almost Acquisition”
Acquisition announcements are par for the course. No company, however, talks about the acquisition that fell through. If rumors appear in the press, the company’s response is always, “no comment.” But in 2010, when Backblaze was almost, but not, acquired, we wrote about it in detail. Crazy?
The negatives of sharing this are slightly less obvious, but the two issues most people worried about were: 1) the fact that the company could be acquired might spook customers, and 2) the fact that it wasn’t might signal to potential acquirers that something was wrong.
So, why share this at all? No one was asking “did you almost get acquired?”
First, we had established a culture of transparency and this was a significant event that occurred for us, thus we defaulted to assuming we would share. Second, we learned that acquisitions fall through all the time, not just during the early fishing stage, but even after term sheets are signed, diligence is done, and all the paperwork is complete. I felt we had learned some things about the process that would be valuable to others who were going through it.
As it turned out, we received emails from startup founders saying they saved the post for the future, and from lawyers, VCs, and advisors saying they shared it with their portfolio companies. Among the most touching emails I received was from a founder who said that after an acquisition fell through she felt so alone that she became incredibly depressed, and that reading our post helped her see that this happens and that things could be OK after. Being transparent about almost getting acquired was worth it just to help that one founder.
And what about the concerns? As for spooking customers, maybe some were, but our sign-ups went up, not down, afterward. Any company can be acquired, and many of the world’s largest have been. I believe that being both thoughtful about where to take the company and open about the process gave customers a sense that we would do the right thing if it happened. And as for signaling to potential acquirers? The ones I’ve spoken with all knew this happens regularly enough that it’s not a factor.
Takeaway: Being open and transparent is also a form of giving back to others.
Sharing Strategic Data
For years people have been desperate to know how reliable hard drives are. They could go to Amazon for individual reviews, but someone saying “this drive died for me” doesn’t provide statistical insight. Google published a study that showed annualized drive failure rates, but didn’t break down the results by manufacturer or model. Since Backblaze has deployed about 100,000 hard drives to store customer data, we have been able to collect a wealth of data on the reliability of the drives by make, model, and size. Was Backblaze the only one with this data? Of course not — Google, Amazon, Microsoft, and any other cloud-scale storage provider tracked it. Yet none would publish. Should Backblaze?
Again, starting with the main negatives: 1) sharing which drives we liked could increase demand for them, thus reducing availability or increasing prices, and 2) publishing the data might make the drive vendors unhappy with us, thereby making it difficult for us to buy drives.
But we felt that the largest drive purchasers (Amazon, Google, etc.) already had their own stats and would buy the drives they chose, and if individuals or smaller companies used our stats, they wouldn’t sufficiently move the overall market demand. Also, we hoped that the drive companies would see that we were being fair in our analysis and, if anything, would leverage our data to make drives even better.
Again, publishing the data resulted in tremendous value for Backblaze, with millions of people having read the analysis that we put out quarterly. Also, becoming known as the place to go for drive reliability information is a natural fit with being a backup and storage provider. In addition, in a twist from many people’s expectations, some of the drive companies actually started working closer with us, seeing that we could be a good source of data for them as feedback. We’ve also seen many individuals and companies make more data-based decisions on which drives to buy, and researchers have used the data for a variety of analyses.
Takeaway: Being open and transparent is rarely as risky as it seems.
Sharing Revenue (And Other Metrics)
Journalists always want to publish company revenue and other metrics, and private companies always shy away from sharing. For a long time we did, too. Then, we opened up about that, as well.
The negatives of sharing these numbers are: 1) external parties may have perceived you were doing better than you actually are, 2) if you share numbers regularly, you may have to show that growth has slowed or worse, and 3) it gives your competitors information to compare their own business to.
We decided that, while some may have perceived we were bigger, our scale was plenty significant. Since we choose what we share and when, it’s up to us whether to disclose at any point. And if our competitors compare, what will they actually change that would affect us?
I did wait to share revenue until I felt I had the right person to write about it. At one point a journalist said she wouldn’t write about us unless I disclosed revenue. I suggested we had a lot to offer for the story, but didn’t want to share revenue yet. She refused to budge and I walked away from the article. Several years later, I reached out to a journalist who had covered Backblaze before and whom I felt understood our business, and offered to share revenue with him. He wrote a deep dive about the company, with revenue being one of the components of the story.
Sharing these metrics showed that we were at scale and running a real business, one with positive unit economics and margins, but not one where we were gouging customers.
Takeaway: Being open with the press about items typically not shared can be uncomfortable, but the press can amplify your story.
Should You Share?
For Backblaze, I believe the results of transparency have been staggering. However, it’s not for everyone. Apple has, clearly, been wildly successful taking secrecy to the extreme. In their case, early disclosure combined with the long cycle of hardware releases could significantly impact sales of current products.
“For Backblaze, I believe the results of transparency have been staggering.” — Gleb Budman
I will argue, however, that for most startups transparency wins. Most startups need to establish credibility and trust, build awareness and a fan base, show that they understand what their customers need and be useful to them, and show the soul and passion behind the company. Some startup companies try to buy these virtues with investor money, and sometimes amplifying your brand via paid marketing helps. But, authentic transparency can build awareness and trust not only less expensively, but more deeply than money can buy.
Backblaze was open from the beginning. With no outside investors, as founders we were able to express ourselves and make our decisions. And it’s easier to be a company that shares if you do it from the start, but for any company, here are a few suggestions:
Ask about sharing: If something significant happens — good or bad — ask “should we share this?” If you made a tough decision, ask “should we share the thinking behind the decision and why it was tough?”
Default to yes: It’s often scary to share, but look for the reasons to say ‘yes,’ not the reasons to say ‘no.’ That doesn’t mean you won’t sometimes decide not to, but make that the high bar.
Minimize reviews: Press releases tend to be sanitized and boring because they’ve been endlessly wordsmithed by committee. Establish the few things you don’t want shared, but minimize the number of people that have to see anything else before it can go out. Teach, then trust.
Engage: Sharing will result in comments on your blog, social, articles, etc. Reply to people’s questions and engage. It’ll make the readers more engaged and give you a better understanding of what they’re looking for.
Accept mistakes: Things will become public that aren’t perfectly sanitized. Accept that and don’t punish people for oversharing.
Building a company culture that is open to sharing takes time, but continuous practice will get you there, and over time the company will find its voice and its approach to sharing.
The AWS Community Heroes program helps shine a spotlight on some of the innovative work being done by rockstar AWS developers around the globe. Marrying cloud expertise with a passion for community building and education, these heroes share their time and knowledge across social media and through in-person events. Heroes also actively help drive community-led tracks at conferences. At this year’s re:Invent, many Heroes will be speaking during the Monday Community Day track.
This November, we are thrilled to have four Heroes joining our network of cloud innovators. Without further ado, meet our newest AWS Community Heroes!
Anh Ho Viet is the founder of AWS Vietnam User Group, Co-founder & CEO of OSAM, an AWS Consulting Partner in Vietnam, an AWS Certified Solutions Architect, and a cloud lover.
At OSAM, Anh and his enthusiastic team have helped many companies, from SMBs to enterprises, move to the cloud with AWS. They offer a wide range of services, including migration, consultation, architecture, and solution design on AWS. Anh’s vision for OSAM goes beyond being a cloud service provider; the company will take part in building a complete AWS ecosystem in Vietnam, where other companies are encouraged to become AWS partners through training and collaboration activities.
In 2016, Anh founded the AWS Vietnam User Group as a channel to share knowledge and hands-on experience among cloud practitioners. Since then, the community has reached more than 4,800 members and is still expanding. The group holds monthly meetups, connects many SMEs to AWS experts, and provides real-time, free-of-charge consultancy to startups. In August 2017, Anh joined as lead content creator of a program called “Cloud Computing Lectures for Universities” which includes translating AWS documentation & news into Vietnamese, providing students with fundamental, up-to-date knowledge of AWS cloud computing, and supporting students’ career paths.
Thorsten Höger is CEO and cloud consultant at Taimos, where he advises customers on how to use AWS. As a developer, he focuses on improving development processes and automating everything to build efficient deployment pipelines for customers of all sizes.
Before becoming self-employed, Thorsten worked as a developer and CTO of Germany’s first private bank running on AWS. With his colleagues, he migrated the core banking system to the AWS platform in 2013. Since then, he has organized the AWS user group in Stuttgart and is a frequent speaker at Meetups, BarCamps, and other community events.
As a supporter of open source software, Thorsten maintains or contributes to several projects on GitHub, such as test frameworks for AWS Lambda and Amazon Alexa, and developer tools for CloudFormation. He is also the maintainer of the Jenkins AWS Pipeline plugin.
In his spare time, he enjoys indoor climbing and cooking.
Yu Zhang (Becky Zhang) is COO of BootDev, which focuses on Big Data solutions on AWS and high concurrency web architecture. Before she helped run BootDev, she was working at Yubis IT Solutions as an operations manager.
Becky plays a key role in the AWS User Group Shanghai (AWSUGSH), regularly organizing AWS UG events including AWS Tech Meetups and happy hours, gathering AWS talent together to discuss the latest technology and AWS services. As a woman in the technology industry, Becky is keen on promoting Women in Tech and encourages more women to get involved in the community.
Becky also connects the China AWS User Group with user groups in other regions, including Korea, Japan, and Thailand. She was invited as a panelist at AWS re:Invent 2016 and spoke at the Seoul AWS Summit this April to introduce AWS User Group Shanghai and communicate with other AWS User Groups around the world.
Besides events, Becky also promotes the Shanghai AWS User Group by posting AWS-related tech articles, event forecasts, and event reports to Weibo, Twitter, Meetup.com, and WeChat (which now has over 2000 official account followers).
Nilesh Vaghela is the founder of ElectroMech Corporation, an AWS Cloud and open source focused company (the company started with an open source motto). Nilesh has been very active in the Linux community since 1998. He started working with AWS Cloud technologies in 2013, and in 2014 he trained a dedicated cloud team and began fully supporting AWS cloud services as an AWS Standard Consulting Partner. He always works to establish and encourage cloud and open source communities.
He started the AWS Meetup community in Ahmedabad in 2014, and 12 Meetups focusing on various AWS technologies have been held so far. The Meetup has quickly grown to include over 2000 members. Nilesh also created a Facebook group for AWS enthusiasts in Ahmedabad, with over 1500 members.
Apart from the AWS Meetup, Nilesh has delivered a number of seminars, workshops, and talks around AWS introduction and awareness, at various organizations, as well as at colleges and universities. He has also been active in working with startups, presenting AWS services overviews and discussing how startups can benefit the most from using AWS services.
Nilesh is also a trainer in Red Hat Linux and AWS Cloud technologies.
To learn more about the AWS Community Heroes Program and how to get involved with your local AWS community, click here.
We launched Convertible Reserved Instances for EC2 just about a year ago. The Convertible RIs give you a significant discount (typically 54% when compared to On-Demand) and allow you to change the instance family and other parameters associated with the RI if your needs change.
Today we are introducing Convertible RIs with a 1-year term, complementing the existing 3-year term. We are also making the Convertible Reserved Instance model more flexible by allowing you to exchange portions of your RIs and to perform bulk exchanges.
New 1-Year Convertible RIs Convertible Reserved Instances with a 1-year term are now available. This gives you more options and more flexibility; you can now purchase a mix of 1-year and 3-year Convertible Reserved Instances (CRIs) to match your needs. Startups with financial constraints will find this option attractive, as will other ventures that are not in a position to make a commitment that runs for longer than one year.
Merging and Splitting Convertible RIs Let’s say that you start running your web and application servers on M4 instances and use Convertible RIs to save money. Later, after a tuning exercise, you move your application servers to C4 instances. With today’s launch you can exchange a portion of your M4 Convertible RIs for C4 Convertible RIs. You can also merge two or more CRIs (perhaps for smaller instances) and obtain one for a larger instance.
The exchange model for Convertible Reserved Instances is based on splitting, exchanging, and merging. Let’s say I own a 3-year Partial Upfront CRI for four t2.micro instances:
My application has changed and now I want to use a pair of t2.micro instances and a single r4.xlarge. The first step is to split this CRI into the part that I want to keep and the part that I want to exchange. I select it and click on Modify Reserved Instances. Then I create my desired configuration and click on Continue:
I review the request and click on Submit Modifications:
The state of the CRI changes to indicate that it is being modified. After a moment or two it will be marked as retired, replaced by a pair that are active:
Now I can exchange one of the 2-instance CRIs. I select it, click on Exchange Reserved Instance, and enter the desired configuration for my new CRI:
I click on Find Offering to see my options and choose the desired one, an r4.xlarge Partial Upfront. As you can see, the console “does the math” and takes the remaining upfront value ($139.995 in this case) of the unneeded CRIs into account when computing the upfront payment:
When I am ready to move forward I click on Exchange. This initiates the exchange process and lets me know that it may take a few minutes to complete.
I can also merge two or more Convertible Reserved Instances together and then use them as the starting point for an exchange. To do this I simply select the existing CRIs, click on Action, and choose Exchange Reserved Instances. I can see the total remaining upfront value of the selected CRIs and proceed accordingly:
You can merge CRIs that have different start dates and/or term lengths. The merged CRI will have the expiry date of the RI that is furthest from the date of exchange. Merging CRIs with different term lengths always produces a 3-year CRI.
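If you prefer to script these changes rather than click through the console, the same exchange flow can be driven through the EC2 API. The boto3 sketch below is illustrative only: the source Reserved Instance ID is a placeholder, the offering filters are assumptions about the configuration you want to exchange into, and passing several source IDs in a single call is how a merge is expressed. As in the console, the quote reflects the remaining upfront value of the CRIs being exchanged.

import boto3

ec2 = boto3.client("ec2")

# Find a Convertible offering to exchange into (assumed target: r4.xlarge, Partial Upfront, Linux).
offering = ec2.describe_reserved_instances_offerings(
    InstanceType="r4.xlarge",
    OfferingClass="convertible",
    OfferingType="Partial Upfront",
    ProductDescription="Linux/UNIX",
)["ReservedInstancesOfferings"][0]

target = [{"OfferingId": offering["ReservedInstancesOfferingId"], "InstanceCount": 1}]
source_ids = ["replace-with-your-convertible-ri-id"]  # placeholder; list several IDs to merge

# Ask for a quote first; it shows whether the exchange is valid and what payment is due.
quote = ec2.get_reserved_instances_exchange_quote(
    ReservedInstanceIds=source_ids,
    TargetConfigurations=target,
)
print(quote["IsValidExchange"], quote.get("PaymentDue"))

# Accept the quote to start the exchange.
result = ec2.accept_reserved_instances_exchange_quote(
    ReservedInstanceIds=source_ids,
    TargetConfigurations=target,
)
print("Exchange started:", result["ExchangeId"])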
In 2015, the Centers for Medicare and Medicaid Services (CMS) reported that healthcare spending made up 17.8% of the U.S. GDP – that’s almost $3.2 trillion or $9,990 per person. By 2025, the CMS estimates this number will increase to nearly 20%. As cloud technology evolves in the healthcare and life science industries, we are seeing how companies of all sizes are using AWS to provide powerful and innovative solutions to customers across the globe. This month we are excited to feature the following startups:
ClearCare – helping home care agencies operate efficiently and grow their business.
DNAnexus – providing a cloud-based global network for sharing and managing genomic data.
ClearCare (San Francisco, CA)
ClearCare envisions a future where home care is the only choice for aging in place. Home care agencies play a critical role in the economy and their communities by significantly lowering the overall cost of care, reducing the number of hospital admissions, and bending the cost curve of aging. Patients receiving home care typically have multiple chronic conditions and functional limitations, driving over $190 billion in healthcare spending in the U.S. each year. To offset these costs, health insurance payers are developing in-home care management programs for patients. ClearCare’s goal is to help home care agencies leverage technology to improve costs, outcomes, and quality of life for the aging population. The company’s powerful software platform is specifically designed for use by non-medical, in-home care agencies to manage their businesses.
Founder and CEO Geoff Nudd created ClearCare because of his own grandmother’s need for care. Keeping family members and caregivers up to date on a loved one’s well-being can be difficult, so Geoff created what is now ClearCare’s Family Room, which enables caregivers and agency staff to check schedules and receive real-time updates about what’s happening in the home. Since then, agencies have provided feedback on other areas of their businesses that could be streamlined. ClearCare has now built over 20 modules to help home care agencies optimize operations, with services including telephony, billing and payroll, and more. ClearCare now serves over 4,000 home care agencies, representing 500,000 caregivers and 400,000 seniors.
Using AWS, ClearCare is able to spin up reliable infrastructure for proofs of concept and iterate on those systems to quickly get value to market. The company runs many AWS services including Amazon Elasticsearch Service, Amazon RDS, and Amazon CloudFront. Amazon EMR and Amazon Athena have enabled ClearCare to build a Hadoop-based ETL and data warehousing system that processes terabytes of data each day. By utilizing these managed services, ClearCare has been able to go from concept to customer delivery in less than three months.
DNAnexus is accelerating the application of genomic data in precision medicine by providing a cloud-based platform for sharing and managing genomic and biomedical data and analysis tools. The company was founded in 2009 by Stanford graduate student Andreas Sundquist and two Stanford professors Arend Sidow and Serafim Batzoglou, to address the need for scaling secondary analysis of next-generation sequencing (NGS) data in the cloud. The founders quickly learned that users needed a flexible solution to build complex analysis workflows and tools that enable them to share and manage large volumes of data. DNAnexus is optimized to address the challenges of security, scalability, and collaboration for organizations that are pursuing genomic-based approaches to health, both in clinics and research labs. DNAnexus has a global customer base – spanning North America, Europe, Asia-Pacific, South America, and Africa – that runs a million jobs each month and is doubling their storage year-over-year. The company currently stores more than 10 petabytes of biomedical and genomic data. That is equivalent to approximately 100,000 genomes, or in simpler terms, over 50 billion Facebook photos!
DNAnexus is working with its customers to help expand their translational informatics research, which includes expanding into clinical trial genomic services. This will help companies developing different medicines to better stratify clinical trial populations and develop companion tests that enable the right patient to get the right medicine. In collaboration with Janssen Human Microbiome Institute, DNAnexus is also launching Mosaic – a community platform for microbiome research.
AWS provides DNAnexus and its customers the flexibility to grow and scale research programs. Building the technology infrastructure required to manage these projects in-house is expensive and time-consuming. DNAnexus removes that barrier for labs of any size by using AWS scalable cloud resources. The company deploys its customers’ genomic pipelines on Amazon EC2, using Amazon S3 for high-performance, high-durability storage, and Amazon Glacier for low-cost data archiving. DNAnexus is also an AWS Life Sciences Competency Partner.
Amazon Redshift makes analyzing exabyte-scale data fast, simple, and cost-effective. It delivers advanced data warehousing capabilities, including parallel execution, compressed columnar storage, and end-to-end encryption as a fully managed service, for less than $1,000/TB/year. With Amazon Redshift Spectrum, you can run SQL queries directly against exabytes of unstructured data in Amazon S3 for $5/TB scanned.
Today, we are making our Dense Compute (DC) family faster and more cost-effective with new second-generation Dense Compute (DC2) nodes at the same price as our previous generation DC1. DC2 is designed for demanding data warehousing workloads that require low latency and high throughput. DC2 features powerful Intel E5-2686 v4 (Broadwell) CPUs, fast DDR4 memory, and NVMe-based solid state disks.
We’ve tuned Amazon Redshift to take advantage of the better CPU, network, and disk on DC2 nodes, providing up to twice the performance of DC1 at the same price. Our DC2.8xlarge instances now provide twice the memory per slice of data and an optimized storage layout with 30 percent better storage utilization.
Customer successes
Several flagship customers, ranging from fast-growing startups to large Fortune 100 companies, previewed the new DC2 node type. In their tests, DC2 provided up to twice the performance of DC1. Our preview customers saw faster ETL (extract, transform, and load) jobs, higher query throughput, better concurrency, faster reports, and a shorter path from data to insights, all at the same cost as DC1. DC2.8xlarge customers also noted that their databases used up to 30 percent less disk space due to our optimized storage format, reducing their costs.
4Cite Marketing, one of America’s fastest growing private companies, uses Amazon Redshift to analyze customer data and determine personalized product recommendations for retailers. “Amazon Redshift’s new DC2 node is giving us a 100 percent performance increase, allowing us to provide faster insights for our retailers, more cost-effectively, to drive incremental revenue,” said Jim Finnerty, 4Cite’s senior vice president of product.
BrandVerity, a Seattle-based brand protection and compliance company, provides solutions to monitor, detect, and mitigate online brand, trademark, and compliance abuse. “We saw a 70 percent performance boost with the DC2 nodes for running Redshift Spectrum queries. As a result, we can analyze far more data for our customers and deliver results much faster,” said Hyung-Joon Kim, principal software engineer at BrandVerity.
“Amazon Redshift is at the core of our operations and our marketing automation tools,” said Jarno Kartela, head of analytics and chief data scientist at DNA Plc, one of the leading Finnish telecommunications groups and Finland’s largest cable operator and pay TV provider. “We saw a 52 percent performance gain in moving to Amazon Redshift’s DC2 nodes. We can now run queries in half the time, allowing us to provide more analytics power and reduce time-to-insight for our analytics and marketing automation users.”
You can try the new node type using our getting started guide. Just choose dc2.large or dc2.8xlarge in the Amazon Redshift console:
If you have a DC1.large Amazon Redshift cluster, you can restore to a new DC2.large cluster using an existing snapshot. To migrate from DS2.xlarge, DS2.8xlarge, or DC1.8xlarge Amazon Redshift clusters, you can use the resize operation to move data to your new DC2 cluster. For more information, see Clusters and Nodes in Amazon Redshift.
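If you would rather perform the migration programmatically, the boto3 sketch below mirrors the two paths just described. The cluster and snapshot identifiers are placeholders, and you should confirm node counts and billing implications for your own clusters before running anything like this.

import boto3

redshift = boto3.client("redshift")

# Path 1: restore an existing dc1.large snapshot into a new dc2.large cluster.
redshift.restore_from_cluster_snapshot(
    ClusterIdentifier="analytics-dc2",            # new cluster name (placeholder)
    SnapshotIdentifier="analytics-dc1-snapshot",  # existing snapshot (placeholder)
    NodeType="dc2.large",
)

# Path 2: for DS2.xlarge, DS2.8xlarge, or DC1.8xlarge clusters, resize onto DC2 nodes.
redshift.modify_cluster(
    ClusterIdentifier="warehouse-prod",           # existing cluster (placeholder)
    ClusterType="multi-node",
    NodeType="dc2.8xlarge",
    NumberOfNodes=4,                              # pick a count that fits your data volume
)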
To get the latest Amazon Redshift feature announcements, check out our What’s New page, and subscribe to the RSS feed.
This post by Backblaze’s CEO and co-founder Gleb Budman is the sixth in a series about entrepreneurship. You can choose posts in the series from the list below:
Use the Join button above to receive notification of new posts in this series.
Perhaps your business is competing in a brand new space free from established competitors. Most of us, though, start companies that compete with existing offerings from large, established companies. You need to come up with a better mousetrap — not the first mousetrap.
That’s the challenge Backblaze faced. In this post, I’d like to share some of the lessons I learned from that experience.
Backblaze vs. Giants
Competing with established companies that are orders of magnitude larger can be daunting. How can you succeed?
I’ll set the stage by offering a few sets of giants we compete with:
When we started Backblaze, we offered online backup in a market where companies had been offering “online backup” for at least a decade, and even the newer entrants had raised tens of millions of dollars.
When we built our storage servers, the alternatives were EMC, NetApp, and Dell — each of which had a market cap of over $10 billion.
When we introduced our cloud storage offering, B2, our direct competitors were Amazon, Google, and Microsoft. You might have heard of them.
What did we learn by competing with these giants on a bootstrapped budget? Let’s take a look.
Determine What Success Means
For a long time Apple considered Apple TV to be a hobby, not a real product worth focusing on, because it did not generate a billion in revenue. For a $10 billion per year revenue company, a new business that generates $50 million won’t move the needle and often isn’t worth putting focus on. However, for a startup, getting to $50 million in revenue can be the start of a wildly successful business.
Lesson Learned: Don’t let the giants set your success metrics.
The Advantages Startups Have
The giants have a lot of advantages: more money, people, scale, resources, access, etc. Following their playbook and attacking head-on means you’re simply outgunned. Common paths to failure are trying to build more features, enter more markets, outspend on marketing, and other similar approaches where scale and resources are the primary determinants of success.
But being a startup affords many advantages most giants would salivate over. As a nimble startup you can leverage those to succeed. Let’s break down nine competitive advantages we’ve used that you can use, too.
1. Drive Focus
It’s hard to build a $10 billion revenue business doing just one thing, and most giants have a broad portfolio of businesses, numerous products for each, and a variety of customer segments targeted across multiple markets. That adds complexity and distributes management attention.
Startups get the benefit of having everyone in the company be extremely focused, often on a singular mission, product, customer segment, and market. While our competitors sell everything from advertising to Zantac, and are investing in groceries and shipping, Backblaze has focused exclusively on cloud storage. This means all of our best people (i.e., everyone) are focused on our cloud storage business. Where is all of your focus going?
Lesson Learned: Align everyone in your company to a singular focus to dramatically out-perform larger teams.
2. Use Lack-of-Scale as an Advantage
You may have heard Paul Graham say “Do things that don’t scale.” There are a host of things you can do specifically because you don’t have the same scale as the giants. Use that as an advantage.
When we look for data center space, we have more options than our largest competitors because there are simply more spaces available with room for 100 cabinets than for 1,000 cabinets. With some searching, we can find data center space that is better/cheaper.
When a flood in Thailand destroyed factories, causing the world’s supply of hard drives to plummet and prices to triple, we started drive farming. The giants certainly couldn’t. It was a bit crazy, but it let us keep prices unchanged for our customers.
Our Chief Cloud Officer, Tim, used to work at Adobe. Because of their size, any new product needed to always launch in a multitude of languages and in global markets. Once launched, they had scale. But getting any new product launched was incredibly challenging.
Lesson Learned: Use lack-of-scale to exploit opportunities that are closed to giants.
3. Build a Better Product
This one is probably obvious. If you’re going to provide the same product, at the same price, to the same customers — why do it? Remember that better does not always mean more features. Here’s one way we built a better product that didn’t require being a bigger company.
All online backup services required customers to choose what to include in their backup. We found that this was complicated for users since they often didn’t know what needed to be backed up. We flipped the model to back up everything and allow users to exclude if they wanted to, but it was not required. This reduced the number of features/options, while making it easier and better for the user.
This didn’t require the resources of a huge company; it just required understanding customers a bit deeper and thinking about the solution differently. Building a better product is the most classic startup competitive advantage.
Lesson Learned: Dig deep with your customers to understand and deliver a better mousetrap.
4. Provide Better Service
How can you provide better service? Use your advantages. Escalations from your customer care folks to engineering can go through fewer hoops. Fixing an issue and shipping can be quicker. Access to real answers on Twitter or Facebook can be more effective.
A strategic decision we made was to have all customer support people as full-time employees in our headquarters. This ensures they are in close contact with the whole company, so feedback can quickly flow both ways.
Having a smaller team and fewer layers enables faster internal communication, which increases customer happiness. And the option to do things that don’t scale — such as help a customer in a unique situation — can go a long way in building customer loyalty.
Lesson Learned: Service your customers better by establishing clear internal communications.
5. Remove The Unnecessary
After determining that the industry standard EMC/NetApp/Dell storage servers would be too expensive to build our own cloud storage upon, we decided to build our own infrastructure. Many said we were crazy to compete with these multi-billion dollar companies and that it would be impossible to build a lower cost storage server. However, not only did it prove to not be impossible — it wasn’t even that hard.
One key trick? Remove the unnecessary. While EMC and others built servers to sell to other companies for a wide variety of use cases, Backblaze needed servers that only Backblaze would run, and for a single use case. As a result we could tailor the servers for our needs by removing redundancy from each server (since we would run redundant servers), and using lower-performance components (since we would get high-performance by running parallel servers).
What do your customers and use cases not need? This can trim costs and complexity while often improving the product for your use case.
Lesson Learned: Don’t think “what can we add” to what the giants offer — think “what can we remove.”
6. Be Easy
How many times have you visited a large company website, particularly one that’s not consumer-focused, only to leave saying, “Huh? I don’t understand what you do.” Keeping your website clear, and your product and pricing simple, will dramatically increase conversion and customer satisfaction. If you’re able to make it 2x easier, and thus increase your conversion by 2x, you’ve just allowed yourself to spend half as much acquiring a customer. (For example, if $1,000 of marketing brings in 1,000 visitors and 2% convert, each customer costs $50; at a 4% conversion rate, the same spend acquires customers at $25 each.)
Providing unlimited data backup wasn’t specifically about providing more storage — it was about making it easier. Since users didn’t know how much data they needed to back up, charging per gigabyte meant they wouldn’t know the cost. Providing unlimited data backup meant they could just relax.
Customers love easy — and being smaller makes easy easier to deliver. Use that as an advantage in your website, marketing materials, pricing, product, and in every other customer interaction.
Lesson Learned: Ease-of-use isn’t a slogan: it’s a competitive advantage. Treat it as seriously as any other feature of your product.
7. Don’t Be Afraid of Risk
Obviously, unnecessary risks are unnecessary, and some risks aren’t worth taking. However, large companies that have given guidance to Wall Street with a $0.01 range on their earnings-per-share are inherently going to be very risk-averse. Use risk-tolerance to open up opportunities, and adjust your tolerance level as you scale. In your first year, there are likely an infinite number of ways your business may vaporize; don’t be too worried about taking a risk that might have a 20% downside when the upside is hockey stick growth.
Using consumer-grade hard drives in our servers may have caused pain and suffering for us years down-the-line, but they were priced at approximately 50% of enterprise drives. Giants wouldn’t have considered the option. Turns out, the consumer drives performed great for us.
Lesson Learned: Use calculated risks as an advantage.
8. Be Open
The larger a company grows, the more it wants to hide information. Some of this is driven by regulatory requirements as a public company. But most of this is cultural. Sharing something might cause a problem, so let’s not. All external communication is treated as a critical press release, with rounds and rounds of editing by multiple teams and approvals. However, customers are often desperate for information. Moreover, sharing information builds trust, understanding, and advocates.
I started blogging at Backblaze before we launched. When we blogged about our Storage Pod and open-sourced the design, many thought we were crazy to share this information. But it was transformative for us, establishing Backblaze as a tech thought leader in storage and giving people a sense of how we were able to provide our service at such a low cost.
Over the years we’ve developed a culture of being open internally and externally, on our blog and with the press, and in communities such as Hacker News and Reddit. Often we’ve been asked, “why would you share that!?” — but it’s the continual openness that builds trust. And that culture of openness is incredibly challenging for the giants.
Lesson Learned: Overshare to build trust and brand where giants won’t.
9. Be Human
As companies scale, typically a smaller percent of founders and executives interact with customers. The people who build the company become more hidden, the language feels “corporate,” and customers start to feel they’re interacting with the cliche “faceless, nameless corporation.” Use your humanity to your advantage. From day one the Backblaze About page listed all the founders, and my email address. While contacting us shouldn’t be the first path for a customer support question, I wanted it to be clear that we stand behind the service we offer; if we’re doing something wrong — I want to know it.
To scale, it’s important to have processes and procedures, but sometimes a situation falls outside of a well-established process. While we want our employees to follow processes, they’re still encouraged to be human and “try to do the right thing.” How do you strike this balance? Simon Sinek gives a good talk about it: make your employees feel safe. If employees feel safe they’ll be human.
If your customer is a consumer, they’ll appreciate being treated as a human. Even if your customer is a corporation, the purchasing decision-makers are still people.
Lesson Learned: Being human is the ultimate antithesis to the faceless corporation.
Build Culture to Sustain Your Advantages at Scale
Presumably the goal is not to always be competing with giants, but to one day become a giant. Does this mean you’ll lose all of these advantages? Some, yes — but not all. Some of these advantages are cultural, and if you build these into the culture from the beginning, and fight to keep them as you scale, you can keep them as you become a giant.
Tesla still comes across as human, with Elon Musk frequently interacting with people on Twitter. Apple continues to provide great service through their Genius Bar. And, worst case, if you lose these at scale, you’ll still have the other advantages of being a giant such as money, people, scale, resources, and access.
Of course, some new startup will be gunning for you with grand ambitions, so just be sure not to get complacent. 😉
As consumers continue to demand faster, simpler, and more on-the-go services, FinTech companies are responding with ever more innovative solutions to fit everyone’s needs and to improve customer experience. This month, we are excited to feature the following startups—all of whom are disrupting traditional financial services in unique ways:
Acorns – allowing customers to invest spare change automatically.
Bondlinc – improving the bond trading experience for clients, financial institutions, and private banks.
Lenda – reimagining homeownership with a secure and streamlined online service.
Acorns (Irvine, CA)
Driven by the belief that anyone can grow wealth, Acorns is relentlessly pursuing ways to help make that happen. Currently the fastest-growing micro-investing app in the U.S., Acorns takes mere minutes to get started and is currently helping over 2.2 million people grow their wealth. And unlike other FinTech apps, Acorns is focused on helping America’s middle class – namely the 182 million citizens who make less than $100,000 per year – and looking after their financial best interests.
Acorns is able to help their customers effortlessly invest their money, little by little, by offering ETF portfolios put together by Dr. Harry Markowitz, a Nobel Laureate in economic sciences. They also offer a range of services, including “Round-Ups,” whereby customers can automatically invest spare change from everyday purchases, and “Recurring Investments,” through which customers can set up automatic transfers of just $5 per week into their portfolio. Additionally, Found Money, Acorns’ earning platform, helps anyone spend smarter: the company connects customers to brands like Lyft, Airbnb, and Skillshare, who then automatically invest in customers’ Acorns accounts.
The Acorns platform runs entirely on AWS, allowing them to deliver a secure and scalable cloud-based experience. By utilizing AWS, Acorns is able to offer an exceptional customer experience and fulfill its core mission. Acorns uses Terraform to manage services such as Amazon EC2 Container Service, Amazon CloudFront, and Amazon S3. They also use Amazon RDS and Amazon Redshift for data storage, and Amazon Glacier to manage document retention.
Acorns is hiring! Be sure to check out their careers page if you are interested.
Bondlinc (Singapore)
Eng Keong, Founder and CEO of Bondlinc, has long wanted to standardize, improve, and automate the traditional workflows that revolve around bond trading. As a former trader at BNP Paribas and Jefferies & Company, E.K. – as Keong is known – had personally seen how manual processes led to information bottlenecks in over-the-counter practices. This drove him, along with future Bondlinc CTO Vincent Caldeira, to start a new service that maximizes efficiency, information distribution, and accessibility for both clients and bankers in the bond market.
Currently, bond trading requires banks to spend a significant amount of resources retrieving data from expensive and restricted institutional sources, performing suitability checks, and attaching required documentation before presenting all relevant information to clients – usually by email. Bankers are often overwhelmed by these time-consuming tasks, which means clients don’t always get proper access to time-sensitive bond information and pricing. Bondlinc bridges this gap between banks and clients by providing a variety of solutions, including easy access to basic bond information and analytics, updates of new issues and relevant news, consolidated management of your portfolio, and a chat function between banker and client. By making the bond market much more accessible to clients, Bondlinc is taking private banking to the next level, while improving efficiency of the banks as well.
As a startup running on AWS since inception, Bondlinc has built and operated its SaaS product by leveraging Amazon EC2, Amazon S3, Elastic Load Balancing, and Amazon RDS across multiple Availability Zones to provide its customers (namely, financial institutions) a highly available and seamlessly scalable product distribution platform. Bondlinc also makes extensive use of Amazon CloudWatch, AWS CloudTrail, and Amazon SNS to meet the stringent operational monitoring, auditing, compliance, and governance requirements of its customers. Bondlinc is currently experimenting with Amazon Lex to build a conversational interface into its mobile application via a chat-bot that provides trading assistance services.
To see how Bondlinc works, request a demo at Bondlinc.com.
Lenda (San Francisco, CA)
Lenda is a digital mortgage company founded by seasoned FinTech entrepreneur Jason van den Brand. Jason wanted to create a smarter, simpler, and more streamlined system for people to either get a mortgage or refinance their homes. With Lenda, customers can find out if they are pre-approved for loans, and receive accurate, real-time mortgage rate quotes from industry-experienced home loan advisors. Lenda’s advisors support customers through the loan process by providing financial advice and guidance for a seamless experience.
Lenda’s innovative platform allows borrowers to complete their home loans online from start to finish. Through a savvy combination of being a direct lender with proprietary technology, Lenda has simplified the mortgage application process to save customers time and money. With an interactive dashboard, customers know exactly where they are in the mortgage process and can manage all of their documents in one place. The company recently received its Series A funding of $5.25 million, and van den Brand shared that most of the capital investment will be used to improve Lenda’s technology and fulfill the company’s mission, which is to reimagine homeownership, starting with home loans.
A robust analytics platform is critical for an organization’s success. However, an analytical system is only as good as the data that is used to power it. Wrong or incomplete data can derail analytical projects completely. Moreover, with varied data types and sources being added to the analytical platform, it’s important that the platform is able to keep up. This means the system has to be flexible enough to adapt to changing data types and volumes.
For a platform based on relational databases, this would mean countless data model updates, code changes, and regression testing, all of which can take a very long time. For a decision support system, it's imperative to provide correct answers quickly. Architects must design analytical platforms that can:
Ingest data easily from multiple sources
Analyze it for completeness and accuracy
Use it for metrics computations
Store the data assets and scale as they grow rapidly, without causing disruptions
Adapt to changes as they happen
Support a relatively short development cycle that is repeatable and easy to implement
Schema-on-read is a unique approach for storing and querying datasets. It reverses the order of things when compared to schema-on-write in that the data can be stored as is and you apply a schema at the time that you read it. This approach has changed the time to value for analytical projects. It means you can immediately load data and can query it to do exploratory activities.
In this post, I show how to build a schema-on-read analytical pipeline, similar to the one used with relational databases, using Amazon Athena. The approach is completely serverless, which allows the analytical platform to scale as more data is stored and processed via the pipeline.
The data integration challenge
Much of the difficulty in building analytical platforms stems from the challenges associated with data integration. This is usually done as a series of Extract, Transform, Load (ETL) jobs that pull data from multiple varied sources and integrate them into a central repository. The multiple steps of ETL can be viewed as a pipeline where raw data is fed in one end and integrated data accumulates at the other.
Two major hurdles complicate the development of ETL pipelines:
The sheer volume of data needing to be integrated makes it very difficult to scale these jobs.
There is an immense variety of data formats from which information needs to be gathered.
Put this in context by looking at the healthcare industry. Building an analytical pipeline for healthcare usually involves working with multiple datasets that cover care quality, provider operations, patient clinical records, and patient claims data. Customers are now looking for even more data sources that may contain valuable information that can be analyzed. Examples include social media data, data from devices and sensors, and biometric and human-generated data such as notes from doctors. These new data sources do not rely on fixed schemas, so integrating them by conventional ETL jobs is much more difficult.
The volume of data available for analysis is growing as well. According to an article published by NCBI, the US healthcare system reached a collective data volume of 150 exabytes in 2011. With the current rate of growth, it is expected to quickly reach zettabyte scale, and soon grow into the yottabytes.
Why schema-on-read?
The traditional data warehouses of the early 2000s, like the one shown in the following diagram, were based on a standard four-layer approach of ingest, stage, store, and report. This involved building and maintaining huge data models that suited certain sources and analytical queries. This type of design is based on a schema-on-write approach, where you write data into a predefined schema and read data by querying it.
As we progressed to more varied datasets and analytical requirements, the predefined schemas were not able to keep up. Moreover, by the time the models were built and put into production, the requirements had changed.
Schema-on-read provides much needed flexibility to an analytical project. Not having to rely on physical schemas improves the overall performance when it comes to high volume data loads. Because there are no physical constraints for the data, it works well with datasets with undefined data structures. It gives customers the power to experiment and try out different options.
How can Athena help?
Athena is a serverless analytical query engine that allows you to start querying data stored in Amazon S3 instantly. It supports standard formats like CSV and Parquet and integrates with Amazon QuickSight, which allows you to build interactive business intelligence reports.
The Amazon Athena User Guide provides multiple best practices that should be considered when using Athena for analytical pipelines at production scale. There are various performance tuning techniques that you can apply to speed up query performance. For more information, see Top 10 Performance Tuning Tips for Amazon Athena.
Solution architecture
To demonstrate this solution, I use the healthcare dataset from the Centers for Disease Control and Prevention (CDC) Behavioral Risk Factor Surveillance System (BRFSS). It gathers data via telephone surveys on health-related risk behaviors and conditions, and the use of preventive services. The dataset is available as zip files from the CDC FTP portal for general download and analysis. There is also a user guide with comprehensive details about the program and the process of collecting data.
The following diagram shows the schema-on-read pipeline that demonstrates this solution.
In this architecture, S3 is the central data repository for the CSV files, which are divided by behavioral risk factors like smoking, drinking, obesity, and high blood pressure.
Athena is used to collectively query the CSV files in S3. It uses the AWS Glue Data Catalog to maintain the schema details and applies it at the time of querying the data.
The dataset is further filtered and transformed into a subset that is specifically used for reporting with Amazon QuickSight.
As new data files become available, they are incrementally added to the S3 bucket and the subset query automatically appends them to the end of the table. The dashboards in Amazon QuickSight are refreshed with the new values of calculated metrics.
Walkthrough
To use Athena in an analytical pipeline, you have to consider how to design it for initial data ingestion and the subsequent incremental ingestions. Moreover, you also have to decide on the mechanism to trigger a particular stage in the pipeline, which can either be scheduled or event-based.
For the purposes of this post, you build a simple pipeline where the data is:
Ingested into a staging area in S3
Filtered and transformed for reporting into a different S3 location
Transformed into a reporting table in Athena
Included in a dashboard created using Amazon QuickSight
Incrementally ingested for updates
This approach is highly customizable and can be used for building more complex pipelines with many more steps.
Data staging in S3
The data ingestion into S3 is fairly straightforward. The files can be ingested via the S3 command line interface (CLI), API, or the AWS Management Console. The data files are in CSV format and already divided by behavioral condition. To improve performance, I recommend that you use partitioned data with Athena, especially when dealing with large volumes. You can use pre-partitioned data in S3 or build partitions later in the process. The example you are working with has a total of 247 CSV files storing about 205 MB of data, but a typical production-scale deployment would be much larger.
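As a rough illustration, the staging upload itself takes only a few lines of the AWS SDK. The sketch below is not from the original post; the bucket name and prefix are hypothetical, and the condition=... key layout simply shows one partition-friendly way to organize the files.

import boto3

s3 = boto3.client("s3")

def stage_file(local_path, condition, filename,
               bucket="my-brfss-staging",   # hypothetical staging bucket
               prefix="brfsdata"):
    # Keys such as brfsdata/condition=smoking/file.csv let Athena treat
    # "condition" as a partition later, which keeps query scans small.
    key = f"{prefix}/condition={condition}/{filename}"
    s3.upload_file(local_path, bucket, key)
    return key

# Example usage: stage_file("smoking_2011.csv", "smoking", "smoking_2011.csv")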
To automate the pipeline, you can make it either event-based or schedule-based. If you take the event-based approach, you can use S3 event notifications to trigger an action when files are uploaded. The event invokes an AWS Lambda function that runs the next step in the pipeline. Traditional ETL jobs have to rely on mechanisms like database triggers to enable this, which can add performance overhead.
If you choose a schedule-based approach, you can use Lambda with scheduled events. The schedule is managed via a cron expression, and the Lambda function runs the next step of the pipeline. This is suitable for workloads similar to a scheduled batch ETL job.
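The following is a minimal sketch, not code from this post, of the Lambda "glue" described above. The same handler works for either trigger style: an S3 upload event or a scheduled (cron) event simply causes it to start the next Athena query in the pipeline. The query text, database name, and results bucket are hypothetical placeholders.

import boto3

athena = boto3.client("athena")

NEXT_STEP_QUERY = "SELECT count(*) FROM brfsdata"       # placeholder for the next pipeline step
RESULTS_LOCATION = "s3://my-athena-results/pipeline/"   # hypothetical results bucket

def handler(event, context):
    # S3 events describe the uploaded objects in event["Records"];
    # scheduled events have no such key, so the loop is simply skipped.
    for record in event.get("Records", []):
        print("New object staged:", record["s3"]["object"]["key"])

    response = athena.start_query_execution(
        QueryString=NEXT_STEP_QUERY,
        QueryExecutionContext={"Database": "default"},
        ResultConfiguration={"OutputLocation": RESULTS_LOCATION},
    )
    return {"QueryExecutionId": response["QueryExecutionId"]}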
Filter and data transformation
To filter and transform the dataset, first look at the overall counts and structure of the data. This allows you to choose the columns that are important for a given report and use the filter clause to extract a subset of the data. Transforming the data as you progress through the pipeline ensures that you are only exposing relevant data to the reporting layer, to optimize performance.
To look at the entire dataset, create a table in Athena that spans the entire data volume. This can be done using the following query:
CREATE EXTERNAL TABLE IF NOT EXISTS brfsdata(
ID STRING,
HIW STRING,
SUSA_NAME STRING,
MATCH_NAME STRING,
CHSI_NAME STRING,
NSUM STRING,
MEAN STRING,
FLAG STRING,
IND STRING,
UP_CI STRING,
LOW_CI STRING,
SEMEAN STRING,
AGE_ADJ STRING,
DATASRC STRING,
FIPS STRING,
FIPNAME STRING,
HRR STRING,
DATA_YR STRING,
UNIT STRING,
AGEGRP STRING,
GENDER STRING,
RACE STRING,
EHN STRING,
EDU STRING,
FAMINC STRING,
DISAB STRING,
METRO STRING,
SEXUAL STRING,
FAMSTRC STRING,
MARITAL STRING,
POP_SPC STRING,
POP_POLICY STRING
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION "s3://<YourBucket/YourPrefix>"
Replace YourBucket and YourPrefix with your corresponding values.
In this case, there are a total of ~1.4 million records, which you get by running a simple COUNT(*) query on the table.
From here, you can run multiple analysis queries on the dataset. For example, you can find the number of records that fall into a certain behavioral risk, or the state that has the highest number of diabetic patients recorded. These metrics provide data points that help to determine the attributes that would be needed from a reporting perspective.
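As a hedged sketch (not from the post), the same exploratory queries can also be run through the Athena API, which becomes useful once the pipeline is automated. The database name, output location, and the example GROUP BY query below are assumptions; only the brfsdata table and its columns come from the DDL above.

import time
import boto3

athena = boto3.client("athena")

def run_query(sql, database="default",
              output="s3://my-athena-results/adhoc/"):   # hypothetical output bucket
    qid = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)

    if state != "SUCCEEDED":
        raise RuntimeError(f"Query {qid} finished in state {state}")
    return athena.get_query_results(QueryExecutionId=qid)

# Example: record counts per source system
rows = run_query("SELECT datasrc, count(*) AS cnt FROM brfsdata GROUP BY datasrc")
for row in rows["ResultSet"]["Rows"]:
    print([col.get("VarCharValue") for col in row["Data"]])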
As you can see, this is much simpler than a schema-on-write approach, where analyzing the source dataset is far more difficult. This solution allows you to design the reporting platform in accordance with the questions you are looking to answer from your data. Source data analysis is the first step in designing a good analytical platform, and this approach allows customers to do it much earlier in the project lifecycle.
Reporting table
After you have completed the source data analysis, the next step is to filter out the required data and transform it to create a reporting database. This is analogous to the data mart in a standard analytical pipeline. Based on the analysis carried out in the previous step, you might notice some mismatches with the data headers. You might also identify the filter clauses to apply to the dataset to get to your reporting data.
Athena automatically saves query results in S3 for every run. The default bucket for this is created in the following format: aws-athena-query-results-<account ID>-<region>.
Athena creates a prefix for each saved query and stores the result set as CSV files organized by dates. You can use this feature to filter out result datasets and store them in an S3 bucket for reporting.
To enable this, create queries that can filter out and transform the subset of data on which to report. For this use case, create three separate queries to filter out unwanted data and fix the column headers:
Query 1:
SELECT ID, up_ci AS source, semean AS state, datasrc AS year, fips AS unit, fipname AS age, mean, current_date AS dt, current_time AS tm
FROM brfsdata
WHERE ID != '' AND hrr IS NULL AND semean NOT LIKE '%29193%'
Query 2:
SELECT ID, up_ci AS source, semean AS state, datasrc AS year, fips AS unit, fipname AS age, mean, current_date AS dt, current_time AS tm
FROM brfsdata
WHERE ID != '' AND hrr IS NOT NULL AND up_ci LIKE '%BRFSS%' AND semean NOT LIKE '"%' AND semean NOT LIKE '%29193%'
Query 3:
SELECT ID, low_ci AS source, age_adj AS state, fips AS year, fipname AS unit, hrr AS age, mean, current_date AS dt, current_time AS tm
FROM brfsdata
WHERE ID != '' AND hrr IS NOT NULL AND up_ci NOT LIKE '%BRFSS%' AND age_adj NOT LIKE '"%' AND semean NOT LIKE '%29193%' AND low_ci LIKE 'BRFSS'
You can save these queries in Athena so that you can get to the query results easily every time they are executed. The following screenshot is an example of the results when query 1 is executed four times.
The next step is to copy these results over to a new bucket for creating your reporting table. This can be done by running an S3 CP command from the CLI or API, as shown below:
Note the prefix structure in which Athena stores query results. It creates a separate prefix for each day in which the query is executed, and stores the corresponding CSV and metadata file for each run. Copy the result set over to a new prefix “Reporting_Data”. Use the “exclude” and “include” option of S3 CP to only copy the CSV files and use “recursive” to copy all the files from the run.
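For readers using the SDK rather than the CLI, the following sketch is a boto3 equivalent of that copy step: list the saved-query prefix, keep only the .csv result files (skipping the .metadata files), and copy them into the Reporting_Data prefix. It is not code from the post, and the bucket names and prefixes are placeholders that you would replace with your own values.

import boto3

s3 = boto3.client("s3")

def copy_query_results(results_bucket="aws-athena-query-results-123456789012-us-east-1",  # placeholder account/region
                       source_prefix="Saved/Query1/",                                     # placeholder saved-query prefix
                       dest_bucket="healthdataset",
                       dest_prefix="brfsdata/Reporting_Data/"):
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=results_bucket, Prefix=source_prefix):
        for obj in page.get("Contents", []):
            key = obj["Key"]
            if not key.endswith(".csv"):   # skip the .csv.metadata files
                continue
            s3.copy_object(
                Bucket=dest_bucket,
                Key=dest_prefix + key.split("/")[-1],
                CopySource={"Bucket": results_bucket, "Key": key},
            )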
You can replace the value of the saved query name from “Query1” to “Query2” or “Query3” to copy all data resulting from those queries to the same target prefix. For pipelines that require more complicated transformations, divide the query transformation into multiple steps and execute them based on events or schedule them, as described in the earlier data staging step.
Amazon QuickSight dashboard
After the filtered results sets are copied as CSV files into the new Reporting_Data prefix, create a new table in Athena that is used specifically for BI reporting. This can be done using a create table statement similar to the one below:
CREATE EXTERNAL TABLE IF NOT EXISTS BRFSS_REPORTING(
ID varchar(100),
source varchar(100),
state varchar(100),
year int,
unit varchar(10),
age varchar(10),
mean float
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
ESCAPED BY '\\'
LINES TERMINATED BY '\n'
LOCATION "s3://healthdataset/brfsdata/Reporting_Data"
This table can now act as a source for a dashboard on Amazon QuickSight, which is a straightforward process to enable. When you choose a new data source in Amazon QuickSight, Athena shows up as an option, and QuickSight automatically detects the tables in Athena that are exposed for querying.
After choosing Athena, give a name to the data source and choose the database. The tables available for querying automatically show up in the list.
If you choose “BRFSS_REPORTING”, you can create custom metrics using the columns in the reporting table, which can then be used in reports and dashboards.
To build a complete pipeline, think about ingesting data incrementally as new records become available. To enable this, make sure that the data can be incrementally ingested into the reporting schema and that the reporting metrics are refreshed on each run of the report. To demonstrate this, look at a scenario where a new dataset is ingested into S3 on a periodic basis and has to be included in the reporting schema when calculating the metrics.
Look at the number of records in the reporting table before the incremental ingestion.
SELECT count(*) FROM brfss_reporting;
The results are as follows:
_col0
713123
Use Query1 as an example transformation query that can isolate the incremental load.
After the incremental data is ingested into S3, trigger (or pre-schedule) an execution of Query1 in Athena, which writes a new CSV result set and its metadata file to the query results location.
Next, trigger (or schedule) the copy command to copy the incremental records file into the reporting prefix. This is easily automated by using the predefined structure in which Athena saves the query results on S3. On checking the records count in the reporting table after copying, you get an increased count.
_col0
810042
This shows that 96,919 records were added to our reporting table that can be used in metric calculations.
This process can be implemented programmatically to add incremental records into the reporting table every time new records are ingested into the staging area. As a result, you can simulate an end-to-end analytical pipeline that runs based on events or is scheduled to run as a batch workload.
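A compact, hedged sketch of that idea follows: run the transformation query, wait for it to finish, and copy its CSV output into the reporting prefix, all in one function that an S3 event or a schedule could invoke. The bucket names are placeholders, and the sketch assumes Athena's convention of writing the result set as <QueryExecutionId>.csv in the configured output location.

import time
import boto3

athena = boto3.client("athena")
s3 = boto3.client("s3")

RESULTS_BUCKET = "aws-athena-query-results-123456789012-us-east-1"   # placeholder account/region
REPORTING_BUCKET = "healthdataset"                                   # from the reporting table LOCATION
REPORTING_PREFIX = "brfsdata/Reporting_Data/"

def run_incremental_load(transformation_sql, database="default"):
    # Start the transformation query (for example, Query1 above).
    qid = athena.start_query_execution(
        QueryString=transformation_sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": f"s3://{RESULTS_BUCKET}/incremental/"},
    )["QueryExecutionId"]

    # Wait for the query to finish.
    while True:
        state = athena.get_query_execution(QueryExecutionId=qid)["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(1)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Incremental load failed in state {state}")

    # Append the new result file to the reporting prefix; the reporting table
    # picks it up automatically on the next query.
    s3.copy_object(
        Bucket=REPORTING_BUCKET,
        Key=f"{REPORTING_PREFIX}{qid}.csv",
        CopySource={"Bucket": RESULTS_BUCKET, "Key": f"incremental/{qid}.csv"},
    )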
Conclusion
Using Athena, you can benefit from the advantages of a schema-on-read analytical application. You can combine other AWS services like Lambda and Amazon QuickSight to build an end-to-end analytical pipeline.
However, it’s important to note that a schema-on-read analytical pipeline may not be the answer for all use cases. Carefully consider the choices between schema-on-read and schema-on-write. The source systems you are working with play a critical role in making that decision. For example:
Some systems have well-defined data structures, and variations are rare. These systems can work with a fixed relational target schema.
If the target queries mostly involve joining across normalized tables, they work better with a relational database. For this use case, schema-on-write is a good choice.
The query performance in a schema-on-write application is faster as the data is pre-structured. For fixed dashboards with little to no changes, a schema-on-write is a good choice.
You can also choose to go hybrid. Offload a part of the pipeline that deals with unstructured flat datasets into a schema-on-read architecture, and integrate the output into the main schema-on-write pipeline at a later stage. AWS provides multiple options to build analytical pipelines that suit various use cases. For more information, read about the AWS big data and analytics services.
If you have questions or suggestions, please comment below.
Ujjwal Ratan is a healthcare and life sciences Solutions Architect at AWS. He has worked with organizations ranging from large enterprises to smaller startups on problems related to distributed computing, analytics and machine learning. In his free time, he enjoys listening to (and playing) music and taking unplanned road trips with his family.
This post by Backblaze's CEO and co-founder Gleb Budman is the fifth in a series about entrepreneurship.
In my previous posts, I talked about coming up with an idea, determining the solution, and getting your first customers. But you’re building a company, not a product. Let’s talk about what the first year should look like.
The primary goals for that first year are to: 1) set up the company; 2) build, launch, and learn; and 3) survive.
Setting Up the Company
The company you’re building is more than the product itself, and you’re not going to do it alone. You don’t want to spend too much time on this since getting customers is key, but if you don’t set up the basics, there are all sorts of issues down the line.
Find Your Co-Founders & Determine Roles
You may already have the idea, but who do you need to execute it? At Backblaze, we needed people to build the web experience, the client backup application, and the server/storage side. We also needed someone to handle the business/marketing aspects, and we felt that the design and user experience were critical. As a result, we started with five co-founders: three engineers, a designer, and me for the business and marketing.
Of course not every role needs to be filled by a co-founder. You can hire employees for positions as well. But think through the strategic skills you’ll need to launch and consider co-founders with those skill sets.
Too many people think they can just “work together” on everything. Don’t. Determine roles as quickly as possible so that it’s clear who is responsible for what work and which decisions. We were lucky in that we had worked together and thus knew what each person would do, but even so we assigned titles early on to clarify roles.
Takeaway: Fill critical roles and explicitly split roles and responsibilities.
Get Your Legal Basics In Place
When we’re excited about building a product, legal basics are often the last thing we want to deal with. You don’t need to go overboard, but it’s critical to get certain things done.
Determine ownership split. What is the percentage breakdown of the company that each of the founders will own? It can be a tough discussion, but it only becomes more difficult later when there is more value and people have put more time into it. At Backblaze we split the equity equally five ways. This is uncommon. The benefit of this is that all the founders feel valued and "in it together." The benefit of the more common split, where someone has a dominant share, is that that person is typically empowered to be the ultimate decision-maker. Slicing Pie provides some guidance on how to think about splitting equity. Regardless of which way you go, don't put it off.
Incorporate. Hard to be a company if you’re not. There are various formats, but if you plan to raise angel/venture funding, a Delaware-based C-corp is standard.
Deal With Stock. At a minimum, issue stock to the founders, have each one buy their shares, and file an 83(b). Buying your shares at this stage might cost $100. Filing the 83(b) election marks the date at which you purchased your shares and shows that you bought them for what they were worth. This one piece of paper can make the difference between paying long-term capital gains rates (~20%) or income tax rates (~40%).
Assign Intellectual Property. Ask everyone to sign a Proprietary Information and Inventions Assignment (“PIIA”). This document says that what they do at the company is owned by the company. Early on we had a friend who came by and brainstormed ideas. We thought of it as interesting banter. He later said he owned part of our storage design. While we worked it out together, a PIIA makes ownership clear.
The ownership split can be worked out by the founders directly. For the other items, I would involve lawyers. Some law firms will set up the basics and defer payment until you raise money or the business can pay for services out of operations. Gunderson Dettmer did that for us (ask for Bennett Yee). Cooley will do this on a case-by-case basis as well.
Takeaway: Don’t let the excitement of building a company distract you from filing the basic legal documents required to protect and grow your company.
Get Health Insurance
This item may seem out of place, but not having health insurance can easily bankrupt you personally, and that certainly won’t bode well for your company. While you can buy individual health insurance, it will often be less expensive to buy it as a company. Also, it will make recruiting employees more difficult if you do not offer healthcare. When we contacted brokers they asked us to send the W-2 of each employee that wanted coverage, but the founders weren’t taking a salary at first. To work around this, make the founders ‘officers’ of the company, and the healthcare brokers can then insure them. (Of course, you need to be ok with your co-founders being officers, but hopefully, that is logical anyway.)
Takeaway: Don’t take your co-founders’ physical and financial health for granted. Health insurance can serve as both individual protection and a recruiting tool for future employees.
Building, Launching & Learning
Getting the company set up gives you the foundation, but ultimately a company with no product and no customers isn’t very interesting.
Build
Ideally, you have one person on the team focusing on all of the items above and everyone else can be heads-down building product. There is a lot to say about building product, but for this post, I’ll just say that your goal is to get something out the door that is good enough to start collecting feedback. It doesn’t have to have every feature you dream of and doesn’t have to support 1 billion users on day one.
Launch
If you’re building a car or rocket, that may take some time. But with the availability of open-source software and cloud services, most startups should launch inside of a year.
Launching forces a scoping of the feature set to what’s critical, rallies the company around a goal, starts building awareness of your company and solution, and pushes forward the learning process. Backblaze launched in public beta on June 2, 2008, eight months after the founders all started working on it full-time.
Takeaway: Focus on the most important features and launch.
Learn & Iterate
As much as we think we know about the customers and their needs, the launch process and beyond opens up all sorts of insights. This early period is critical to collect feedback and iterate, especially while both the product and company are still quite malleable. We initially planned on building peer-to-peer and local backup immediately on the heels of our online offering, but after launching found minimal demand for those features. On the other hand, there was tremendous demand from companies and resellers.
Takeaway: Use the critical post-launch period to collect feedback and iterate.
Surviving
“Live to fight another day.” If the company doesn’t survive, it’s hard to change the world. Let’s talk about some of the survival components.
Consider What You As A Founding Team Want & How You Work
Are you doing this because you hope to get rich? See yourself on the cover of Fortune? Make your own decisions? Work from home all the time? Founder fighting is the number one reason companies fail; the founders need to be on the same page as much as possible.
At Backblaze we agreed very early on that we wanted three things:
Build products we were proud of
Have fun
Make money
This has driven various decisions over the years and has evolved into being part of the culture. For example, while Backblaze is absolutely a company with a profit motive, we do not compromise the product to make more money. Other directions are not bad; they’re just different.
Do you want a lifestyle business? Or do you want to build a billion-dollar business? Do you want to run it forever, or build it for a couple of years and then do something else?
Pretend you’re getting married to each other. Do some introspection and talk about your vision of the future a lot. Do you expect everyone to work 20 or 100 hours every week? In the office or remote? How do you like to work? What pet peeves do you have?
When getting married each person brings the “life they’ve known,” often influenced by the life their parents lived. Together they need to decide which aspects of their previous lives they want to keep, toss, or change. As founders coming together, you have the same opportunity for your new company.
Takeaway: In order for a company to survive, the founders must agree on what they want the company to be. Have the discussions early.
Determine How You Will Fund Your Business
Raising venture capital is often seen as the only path, and considered the most important thing to start doing on day one. However, there are a variety of options for funding your business, including using money from savings, part-time work, friends & family money, loans, angels, and customers. Consider the right option for you, your founding team, and your business.
Conserve Cash
Whichever option you choose for funding your business, chances are high that you will not be flush with cash on day one. In certain situations, you actually don’t want to conserve cash because you’ve raised $100m and now you want to run as fast as you can to capture a market — cash is plentiful and time is not. However, with the exception of founder struggles, running out of cash is the most common way companies go under. There are many ways to conserve cash — limit hiring of employees and consultants, use lawyers and accountants sparingly, don’t spend on advertising, work from a home office, etc. The most important way is to simply ensure that you and your team are cash conscious, challenging decisions that commit you to spending cash.
Backblaze spent a total of $94,122 to get to public beta launch. That included building the backup application, our own server infrastructure, the website with account/billing/restore functionality, the marketing involved in getting to launch, and all the steps above in setting up the company, paying for healthcare, etc. The five founders took no salary during this time (which, of course, would have cost dramatically more), so most of this money went to computers, servers, hard drives, and other infrastructure.
Takeaway: Minimize cash burn — it extends your runway and gives you options.
Slowly Flesh Out Your Team
We started with five co-founders, and thus a fairly fleshed-out team. A year in, we only added one person, a Mac architect. Three months later we shipped a beta of our Mac version, which has resulted in more than 50% of our revenue.
Minimizing hiring is key to cash conservation, and hiring ahead of getting market feedback is risky since you may realize that the talent you need will change. However, once you start getting feedback, think about the key people that you need to move your company forward. But be rigorous in determining whether they’re critical. We didn’t hire our first customer support person until all five founders were spending 20% of their time on it.
Takeaway: Don’t hire in anticipation of market growth; hire to fuel the growth.
Keep Your Spirits Up
Startups are roller coasters of emotion. There have been some serious articles about founders suffering from depression and worse. The idea phase is exhilarating, then there is the slog of building. The launch is a blast, but the week after there are crickets.
On June 2, 2008, we launched in public beta with great press and hordes of customers. But a few months later we were signing up only about 10 new customers per month. That’s $50 new monthly recurring revenue (MRR) after a year of work and no salary.
On August 25, 2008, we brought on our Mac architect. Two months later, on October 26, 2008, Apple launched Time Machine — completely free and built-in backup for all Macs.
There were plenty of times when our prospects looked bleak. In the rearview mirror it’s easy to say, “well sure, but now you have lots of customers,” or “yes, but Time Machine doesn’t do cloud backup.” But at the time neither of these were a given.
Takeaway: Getting up each day and believing that as a team you’ll figure it out will let you get to the point where you can look in the rearview mirror and say, “It looked bleak back then.”
Succeeding in Your First Year
I titled the post “Surviving Your First Year,” but if you manage to, 1) set up the company; 2) build, launch, and learn; and 3) survive, you will have done more than survive: you’ll have truly succeeded in your first year.
There’s no doubt about it – Artificial Intelligence is changing the world and how it operates. Across industries, organizations from startups to Fortune 500s are embracing AI to develop new products, services, and opportunities that are more efficient and accessible for their consumers. From driverless cars to better preventative healthcare to smart home devices, AI is driving innovation at a fast rate and will continue to play a more important role in our everyday lives.
This month we’d like to highlight startups using AI solutions to help companies grow. We are pleased to feature:
SignalBox – a simple and accessible deep learning platform to help businesses get started with AI.
Valossa – an AI video recognition platform for the media and entertainment industry.
Kaliber – innovative applications for businesses using facial recognition, deep learning, and big data.
SignalBox (UK)
In 2016, SignalBox founder Alain Richardt was hearing the same comments being made by developers, data scientists, and business leaders. They wanted to get into deep learning but didn’t know where to start. Alain saw an opportunity to commodify and apply deep learning by providing a platform that does the heavy lifting with an easy-to-use web interface, blueprints for common tasks, and just a single-click to productize the models. With SignalBox, companies can start building deep learning models with no coding at all – they just select a data set, choose a network architecture, and go. SignalBox also offers step-by-step tutorials, tips and tricks from industry experts, and consulting services for customers that want an end-to-end AI solution.
SignalBox offers a variety of solutions that are being used across many industries for energy modeling, fraud detection, customer segmentation, insurance risk modeling, inventory prediction, real estate prediction, and more. Existing data science teams are using SignalBox to accelerate their innovation cycle. One innovative UK startup, Energi Mine, recently worked with SignalBox to develop deep networks that predict anomalous energy consumption patterns and do time series predictions on energy usage for businesses with hundreds of sites.
SignalBox uses a variety of AWS services including Amazon EC2, Amazon VPC, Amazon Elastic Block Store, and Amazon S3. The ability to rapidly provision EC2 GPU instances has been a critical factor in their success – both in terms of keeping their operational expenses low, as well as speed to market. The Amazon API Gateway has allowed for operational automation, giving SignalBox the ability to control its infrastructure.
Valossa (Oulu, Finland)
As students at the University of Oulu in Finland, the Valossa founders spent years doing research in the computer science and AI labs. During that time, the team witnessed how the world was moving beyond text, with video playing a greater role in day-to-day communication. This spawned an idea to use technology to automatically understand what an audience is viewing and share that information with a global network of content producers. Since 2015, Valossa has been building next-generation AI applications to benefit the media and entertainment industry and is moving beyond the capabilities of traditional visual recognition systems.
Valossa’s AI is capable of analyzing any video stream. The AI studies a vast array of data within videos and converts that information into descriptive tags, categories, and overviews automatically. Basically, it sees, hears, and understands videos like a human does. The Valossa AI can detect people, visual and auditory concepts, key speech elements, and labels explicit content to make moderating and filtering content simpler. Valossa’s solutions are designed to provide value for the content production workflow, from media asset management to end-user applications for content discovery. AI-annotated content allows online viewers to jump directly to their favorite scenes or search specific topics and actors within a video.
Valossa leverages AWS to deliver the industry’s first complete AI video recognition platform. Using Amazon EC2 GPU instances, Valossa can easily scale their computation capacity based on customer activity. High-volume video processing with GPU instances provides the necessary speed for time-sensitive workflows. The geo-located Availability Zones in EC2 allow Valossa to bring resources close to their customers to minimize network delays. Valossa also uses Amazon S3 for video ingestion and to provide end-user video analytics, which makes managing and accessing media data easy and highly scalable.
Serial entrepreneurs Ray Rahman and Risto Haukioja founded Kaliber in 2016. The pair had previously worked in startups building smart cities and online privacy tools, and teamed up to bring AI to the workplace and change the hospitality industry. Our world is designed to appeal to our senses – stores and warehouses have clearly marked aisles, products are colorfully packaged, and we use these designs to differentiate one thing from another. We tell each other apart by our faces, and previously that was something only humans could measure or act upon. Kaliber is using facial recognition, deep learning, and big data to create solutions for business use. Markets and companies that aren’t typically associated with cutting-edge technology will be able to use their existing camera infrastructure in a whole new way, making them more efficient and better able to serve their customers.
Computer video processing is rapidly expanding, and Kaliber believes that video recognition will extend to far more than security cameras and robots. Using the clients’ network of in-house cameras, Kaliber’s platform extracts key data points and maps them to actionable insights using their machine learning (ML) algorithm. Dashboards connect users to the client’s BI tools via the Kaliber enterprise APIs, and managers can view these analytics to improve their real-world processes, taking immediate corrective action with real-time alerts. Kaliber’s Real Metrics are aimed at combining the power of image recognition with ML to ultimately provide a more meaningful experience for all.