Backblaze is growing rapidly, and with more and more job listings coming online and more employees to corral, we needed another member on our Human Resources team! Enter Michele, who is joining the HR folks to help recruit, onboard, and expand our HR organization. Let's learn a bit more about Michele, shall we?
What is your Backblaze Title? HR Coordinator.
Where are you originally from? I was born and raised in the East Bay.
What attracted you to Backblaze? The opportunity to learn new skills, as most of my experience is in office administration… I’m excited to jump into the HR world!
What do you expect to learn while being at Backblaze? So much! All of the ins and outs of HR, the hiring and onboarding processes, and everything in between…so excited!
Where else have you worked? I previously worked at Clars Auction Gallery, where I was in Consignor Relations for six years, and most recently at Stellar Academy for Dyslexics, where I was the Office Administrator/Bookkeeper.
Where did you go to school? San Francisco Institute of Esthetics and Cosmetology.
What’s your dream job? Pastry Chef!
Favorite place you’ve traveled? Maui. I could lay on the beach and bob in the water all day, every day! But also, Disney World…who doesn’t love a good Disney vacation?
Favorite hobby? Baking, traveling, reading, exploring new restaurants, SF Giants games
Star Trek or Star Wars? Star Wars.
Coke or Pepsi? Black iced tea?
Favorite food? Pretty much everything…street tacos, ramen, sushi, Thai, pho.
Why do you like certain things? Because why not?
Anything else you'd like to tell us? I love Disney!
Another person who loves Disney! Welcome to the team, Michele. We'll have lots of tea ready for you!
As Backblaze continues to grow, and as we go down the path of sharing our stories, we found ourselves in need of someone who could wrangle our content calendar, write blog posts, and come up with interesting ideas to share with our readers and fans. We put out the call and found Roderick! As you'll read below, he has an incredibly interesting history, and we're thrilled to have his perspective join our marketing team! Let's learn a bit more about Roderick, shall we?
What is your Backblaze Title? Content Director
Where are you originally from? I was born in Southern California, but have lived a lot of different places, including Alaska, Washington, Oregon, Texas, New Mexico, Austria, and Italy.
What attracted you to Backblaze? I met Gleb a number of years ago at the Failcon Conference in San Francisco. I spoke with him and was impressed by him and his description of the company. We connected on LinkedIn after the conference, and I ultimately saw his post for this position about a month ago.
What do you expect to learn while being at Backblaze? I hope to learn about Backblaze’s customers and dive deep into the latest in cloud storage and other technologies. I also hope to get to know my fellow employees.
Where else have you worked? I've worked for Microsoft, Adobe, Autodesk, and a few startups. I've also consulted for Apple, HP, Stanford, the White House, and startups in the U.S. and abroad. I've mentored at incubators in Silicon Valley, including IndieBio and Founders Space. I used to own vineyards and a food education and event center in the Napa Valley with my former wife, and worked in a number of restaurants, hotels, and wineries. Recently, I taught part-time at the Culinary Institute of America at Greystone in the Napa Valley. I've been a partner in a restaurant and am currently a partner in a mozzarella di bufala company in Marin County, where we have about 50 water buffalo, which are amazing animals. They are named after famous rock and roll vocalists. Our most active studs now are Sting and Van Morrison. I think singing "a fantabulous night to make romance 'neath the cover of October skies" works for Van.
Where did you go to school? I studied at Reed College, U.C. Berkeley, U.C. Davis, and the Università per Stranieri di Perugia in Italy. I put myself through college, so I was in and out of school a number of times to earn money. Some of the jobs I held to pay for college were cook, waiter, dishwasher, bartender, courier, teacher, bookstore clerk, head of hotel maintenance, bookkeeper, lifeguard, journalist, and commercial salmon fisherman in Alaska.
What’s your dream job? I think my dream would be having a job that would continually allow me to learn new things and meet new challenges. I love to learn, travel, and be surprised by things I don’t know.
I love animals and sometimes think I should have become a veterinarian.
Favorite place you’ve traveled? I lived and studied in Italy, and would have to say the Umbria region of Italy is perhaps my favorite place. I also worked in my father’s home country of Austria, which is incredibly beautiful.
Favorite hobby? I love foreign languages, and have studied Italian, French, German, and a few others. I am a big fan of literature and theatre; I read widely and have attended theatre productions all over the world. That was my motivation to learn other languages: so I could enjoy literature and theatre in the languages they were written in. I started scuba diving when I was very young because I wanted to be Jacques-Yves Cousteau and explore the oceans. I also sail, motorcycle, ski, bicycle, hike, play music, and hope to finish my pilot's license someday.
Coke or Pepsi? Red Burgundy
Favorite food? Both my parents are chefs, so I was exposed to a lot of great food growing up. I would have to give more than one answer to that question: fresh baked bread and bouillabaisse. Oh, and white truffles.
Not sure we’ll be able to stock our cupboards with Red Burgundy, but we’ll see what our office admin can do! Welcome to the team!
By Francisco Perez-Sorrosal, Ohad Shacham, Kostas Tsioutsiouliklis, and Edward Bortnikov
We are proud to announce that Omid (“Hope” in Persian), Yahoo’s transaction manager for HBase [1][2], has been accepted as an Apache Incubator project. Yahoo has been a long-time contributor to the Apache community in the Hadoop ecosystem, including HBase, YARN, Storm, and Pig. Our acceptance as an Apache Incubator project is another step forward following the success of ZooKeeper [3] and BookKeeper [4], which were born at Yahoo and graduated to top-level Apache projects.
These days, most NoSQL databases, including HBase, do not provide the OLTP support available in traditional relational databases, forcing the applications running on top of them to trade transactional support for greater agility and scalability. However, transactions are essential in many applications using NoSQL datastores as the main source of data, for example, in incremental content processing systems. Omid enables these applications to benefit from the best of both worlds: the scalability provided by NoSQL datastores, such as HBase, and the concurrency and atomicity provided by transaction processing systems.
Omid provides a high-performance ACID transactional framework with Snapshot Isolation guarantees on top of HBase [5], able to scale to thousands of clients triggering transactions on application data. It is one of the few open-source transactional frameworks that can scale beyond 100K transactions per second on mid-range hardware while incurring minimal impact on the latency of accessing the datastore.
At its core, Omid utilizes a lock-free approach to support multiple concurrent clients. Its design relies on a centralized conflict detection component called Transaction Status Oracle (TSO), which efficiently resolves write-set collisions among concurrent transactions [6]. Another important benefit is that Omid does not require any modification of the underlying key-value datastore – HBase in this case. Moreover, the recently-added high-availability algorithm eliminates the single point of failure represented by the TSO in those deployments that require a higher degree of dependability [7]. Last but not least, the API is very simple – mimicking the transaction manager APIs in the relational world: begin, commit, rollback – and the client and server configuration processes have been simplified to help both application developers and system administrators.
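To make the shape of that API concrete, here is a minimal sketch of a transactional write against HBase using the Omid client. It assumes a running TSO and HBase cluster reachable through the default Omid client configuration; the table, column family, row, and value names are hypothetical, and the package names are those of the Apache incubation release.

```java
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.omid.transaction.HBaseTransactionManager;
import org.apache.omid.transaction.RollbackException;
import org.apache.omid.transaction.TTable;
import org.apache.omid.transaction.Transaction;
import org.apache.omid.transaction.TransactionManager;

public class OmidSketch {
    public static void main(String[] args) throws Exception {
        // Connects to the TSO using the default client configuration.
        TransactionManager tm = HBaseTransactionManager.newInstance();
        // TTable wraps a plain HBase table with transactional put/get semantics.
        TTable txTable = new TTable("MY_TABLE"); // hypothetical table name

        Transaction tx = tm.begin();
        try {
            Put row = new Put(Bytes.toBytes("row1"));
            row.addColumn(Bytes.toBytes("MY_CF"), Bytes.toBytes("q"), Bytes.toBytes("value"));
            txTable.put(tx, row);
            // The TSO checks the transaction's write set for collisions at commit time.
            tm.commit(tx);
        } catch (RollbackException e) {
            // A concurrent transaction wrote to the same cell, so this
            // transaction was aborted; the application can retry it.
        }
    }
}
```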
Efforts to grow the community have already been underway for the last few months. Apache Hive [8] contributors from Hortonworks expressed interest in storing Hive metadata in HBase using Omid, and this led to a fruitful collaboration that resulted in Omid now supporting HBase 1.x versions. Omid could also be used as the transaction manager in other SQL abstraction layers on top of HBase, such as Apache Phoenix [9], or as the transaction coordinator in distributed systems such as the Apache DistributedLog project [10] and Pulsar, a distributed pub-sub messaging platform recently open sourced by Yahoo.
Since its inception in 2011 at Yahoo Research, Omid has matured to operate at Web scale in a production environment. For example, since 2014 Omid has been used at Yahoo – along with other Hadoop technologies – to power our incremental content ingestion platform for search and personalization products. In this role, Omid is serving millions of transactions per day over HBase data.
We have decided to move the Omid project to "the Apache Way" because we think it is the next logical step after battle-testing the project in production at Yahoo and open-sourcing the code on Yahoo's public GitHub in 2012. The Omid GitHub repository currently has 269 stars and 101 forks, and our colleagues in the open source community asked us to release it as an Apache Incubator project. As we aim to form a larger Omid community outside Yahoo, we think the Apache Software Foundation is the perfect umbrella under which to achieve this. We invite the Apache community to contribute by providing patches, reviewing code, proposing new features or improvements, and giving talks at conferences such as Hadoop Summit, HBaseCon, and ApacheCon, under the Apache rules.
We see Omid's acceptance as an Apache Incubator project as the first step in growing a vibrant community around this technology. We are confident that contributors in the Apache community will add more features to Omid and further improve its performance and latency. Stay tuned to @ApacheOmid on Twitter!
Pub-sub messaging is a very common design pattern, increasingly found in the distributed systems powering Internet applications. These applications provide real-time services and need publish latencies of 5 ms on average and no more than 15 ms at the 99th percentile. At Internet scale, they require a messaging system with ordering, strong durability, and delivery guarantees. To meet the "five nines" durability requirements of a production environment, messages must be committed on multiple disks or nodes.
At the time we started, we could not find any existing open-source messaging solution that could provide the scale, performance, and features Yahoo required to offer messaging as a hosted service supporting a million topics. So we set out to build Pulsar as a general messaging solution that also addresses these specific requirements.
Pulsar is a highly scalable, low-latency pub-sub messaging system running on commodity hardware. It provides simple pub-sub messaging semantics over topics, guaranteed at-least-once delivery of messages, automatic cursor management for subscribers, and cross-datacenter replication.
Using Pulsar, one can set up a centrally-managed cluster to provide pub-sub messaging as a service; applications can be onboarded as tenants. Pulsar is horizontally scalable; the number of topics, messages processed, throughput, and storage capacity can be expanded by adding servers to the pool.
Pulsar has a robust set of APIs to manage the service, covering account management activities like provisioning users, allocating capacity, usage accounting, and monitoring. Tenants can administer, manage, and monitor their own domains via these APIs. Pulsar also provides security via a pluggable authentication scheme, and access control features that let tenants manage access to their data.
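As a rough illustration, provisioning a tenant and a namespace through the admin API might look like the following sketch. It is written against a recent Apache Pulsar admin client (the original Yahoo release used different package names), and the service URL, tenant, role, cluster, and namespace names are all hypothetical.

```java
import java.util.Set;
import org.apache.pulsar.client.admin.PulsarAdmin;
import org.apache.pulsar.common.policies.data.TenantInfo;

public class AdminSketch {
    public static void main(String[] args) throws Exception {
        // The admin client talks to the broker's HTTP service.
        PulsarAdmin admin = PulsarAdmin.builder()
                .serviceHttpUrl("http://localhost:8080") // hypothetical address
                .build();

        // Provision a tenant restricted to one cluster, managed by one admin role.
        admin.tenants().createTenant("my-tenant",
                TenantInfo.builder()
                        .adminRoles(Set.of("my-admin-role"))
                        .allowedClusters(Set.of("standalone"))
                        .build());

        // Each tenant organizes its topics under namespaces.
        admin.namespaces().createNamespace("my-tenant/my-namespace");

        admin.close();
    }
}
```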
Application development using Pulsar is easy due to the simple messaging model and API. Pulsar includes a client library that encapsulates the messaging protocol; complex functions like service discovery, as well as connection establishment and recovery, are handled internally by the library.
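For instance, publishing a message takes only a few lines. The following is a minimal sketch using the current Apache Pulsar client API (the original Yahoo release used com.yahoo.pulsar package names); the broker URL and topic name are hypothetical.

```java
import org.apache.pulsar.client.api.Producer;
import org.apache.pulsar.client.api.PulsarClient;

public class ProducerSketch {
    public static void main(String[] args) throws Exception {
        // Service discovery, connection establishment, and reconnection are
        // handled inside the client library; the application only names a
        // broker URL and a topic.
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // hypothetical address
                .build();

        Producer<byte[]> producer = client.newProducer()
                .topic("persistent://my-tenant/my-namespace/my-topic")
                .create();

        // send() blocks until the broker acknowledges the write, i.e. once
        // the message has been durably committed.
        producer.send("hello-pulsar".getBytes());

        producer.close();
        client.close();
    }
}
```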
Architecture
At a high level, a Pulsar instance is composed of multiple clusters, typically residing in different geographical regions. A Pulsar cluster is composed of a set of Brokers and BookKeepers (bookies), plus ZooKeeper ensembles for coordination and configuration management.
A Pulsar broker serves topics. Each topic is assigned to a broker, and a broker serves thousands of topics. The broker accepts messages from writers, commits them to a durable store, and dispatches them to readers. The broker also serves admin requests. It has no durable state. The broker has built-in optimizations; for example, it caches the data in order to avoid additional disk reads when dispatching messages to clients as well as replication clusters. Pulsar brokers also manage the replicators, which asynchronously push messages published in the local cluster to remote clusters.
Apache BookKeeper is the building block for Pulsar's durable storage. BookKeeper is a distributed write-ahead log system and a top-level Apache project, originally developed and open-sourced by Yahoo in 2011. BookKeeper has an active developer community with contributors across the industry. Using BookKeeper's built-in semantics, Pulsar creates multiple independent logs, called ledgers, and uses them for durable message storage. BookKeeper hosts, called bookies, are designed to handle thousands of ledgers with concurrent reads and writes. BookKeeper is horizontally scalable in capacity and throughput; from an operational perspective, we can elastically add more bookies to a Pulsar cluster to increase capacity.
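To illustrate the ledger abstraction, here is a minimal sketch that uses the BookKeeper client directly (Pulsar applications never do this themselves; the broker's managed ledgers handle it internally). The ZooKeeper address and password are hypothetical.

```java
import org.apache.bookkeeper.client.BookKeeper;
import org.apache.bookkeeper.client.LedgerHandle;

public class LedgerSketch {
    public static void main(String[] args) throws Exception {
        // Connects to the bookies via the ZooKeeper ensemble that stores
        // BookKeeper metadata.
        BookKeeper bk = new BookKeeper("localhost:2181"); // hypothetical address

        // A ledger is an append-only log whose entries are replicated across
        // a quorum of bookies before the write is acknowledged.
        LedgerHandle ledger = bk.createLedger(
                BookKeeper.DigestType.CRC32, "secret".getBytes());
        long entryId = ledger.addEntry("a durable log entry".getBytes());
        System.out.println("wrote entry " + entryId);

        ledger.close();
        bk.close();
    }
}
```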
By using separate physical disks (one for journal and another for general storage), bookies are able to isolate the effects of read operations from impacting the latency of ongoing write operations, and vice-versa. Since read and write paths are decoupled, spikes in reads – which commonly occur when readers drain backlog to catch up – do not impact publish latencies in Pulsar. This sets Pulsar apart from other commonly-used messaging systems.
A managed ledger represents the storage layer for a single topic. It is the abstraction of a stream of messages with a single writer and multiple readers, each with its own associated cursor position: the reader's offset in the message stream. A single managed ledger uses multiple BookKeeper ledgers to store the data, and cursor positions are maintained in per-cursor ledgers.
A Pulsar cluster runs a ZooKeeper (another top-level Apache project, open-sourced by Yahoo in 2008) ensemble used for coordinating the assignment of topics among brokers and storing BookKeeper metadata. In addition, Pulsar runs a Global ZooKeeper ensemble to store provisioning and configuration data. At Yahoo, we have a presence in multiple regions, and our users create global topics that are replicated between these regions. The Global ZooKeeper ensemble keeps provisioning and configuration data consistent globally. We can tolerate higher latencies on these writes (e.g., ~150 ms for configuration writes).
The load balancer is a distributed service that runs on the brokers, to make sure the traffic is equally spread across all available brokers. Since Pulsar brokers have no durable state, topics can be redistributed within seconds.
Messaging Model
The Pulsar topic is the core of the system; applications and components communicate by publishing to and consuming from the same topic. Topics are created dynamically when a producer (writer) starts publishing to them, and removed when not in use.
Subscriptions are created automatically when a consumer (reader) subscribes to the topic. A subscription persists until it is deleted, and receives all messages published during its lifetime. Common messaging semantics (like JMS Topic or Queue) are available as subscription modes; an exclusive subscription is equivalent to a “topic,” and a shared subscription is equivalent to a “queue.”
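A sketch of the consuming side, again using the current Apache Pulsar client API with hypothetical names; the subscriptionType setting selects between the two semantics described above.

```java
import org.apache.pulsar.client.api.Consumer;
import org.apache.pulsar.client.api.Message;
import org.apache.pulsar.client.api.PulsarClient;
import org.apache.pulsar.client.api.SubscriptionType;

public class ConsumerSketch {
    public static void main(String[] args) throws Exception {
        PulsarClient client = PulsarClient.builder()
                .serviceUrl("pulsar://localhost:6650") // hypothetical address
                .build();

        // Shared subscription: messages are spread across all consumers
        // attached to "my-subscription" (queue semantics). Using
        // SubscriptionType.Exclusive instead would deliver every message
        // to a single consumer (topic semantics).
        Consumer<byte[]> consumer = client.newConsumer()
                .topic("persistent://my-tenant/my-namespace/my-topic")
                .subscriptionName("my-subscription")
                .subscriptionType(SubscriptionType.Shared)
                .subscribe();

        Message<byte[]> msg = consumer.receive();
        // Acknowledging lets the broker advance this subscription's cursor.
        consumer.acknowledge(msg);

        consumer.close();
        client.close();
    }
}
```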
Performance
Pulsar is designed for low publish latencies at scale. Our typical publish latencies are well below 5 ms on average. With an SSD as the bookie journal device, Pulsar can achieve 99th-percentile latencies of 5 ms with two guaranteed copies and total ordering.
The latency remains within the acceptable range until the throughput reaches the limit of the disk IO capacity.
Pulsar supports partitioned topics, which can further increase the per-topic throughput.
Pulsar at Yahoo
Pulsar backs major Yahoo applications like Mail, Finance, Sports, Gemini Ads, and Sherpa, Yahoo’s distributed key-value service.
We deployed our first Pulsar instance in Q2 2015. Pulsar use has rapidly grown since then, and as of today, Yahoo runs Pulsar at scale.
Deployed globally, in 10+ datacenters, with full mesh replication capability
Greater than 100 billion messages/day published
More than 1.4 million topics
Average publish latency across the service of less than 5 ms
As Pulsar use grows at Yahoo, we have been scaling the service horizontally. Most of the challenges we faced involved JVM garbage collection impacting publish latencies, and reducing failover times when the number of topics on a broker grew to tens of thousands (now 40,000). Addressing these led to significant changes to the Pulsar broker and to BookKeeper.
Looking to the Future
We are actively engaged in pushing the scale and reliability boundaries of Pulsar further. Current improvements being worked on include:
Migrating a topic between brokers in under 1 second, down from 10 seconds
Improving 99.9th-percentile publish latencies to 5 ms
Providing additional language bindings for Pulsar
Conclusion
Pulsar is a highly scalable pub-sub messaging system, production-ready and battle-tested at Yahoo. We are glad to make Pulsar available as open source under the Apache License, Version 2.0. Detailed instructions and documentation are available in Yahoo's GitHub repository. Our goal is to make Pulsar widely used and well integrated with other large-scale open source software, and we welcome contributions from the community to make that happen.