In the early stages of the war in Ukraine in 2022, PIPEDREAM, a known malware was quietly on the brink of wiping out a handful of critical U.S. electric and liquid natural gas sites. PIPEDREAM is an attack toolkit with unmatched and unprecedented capabilities developed for use against industrial control systems (ICSs).
The malware was built to manipulate the network communication protocols used by programmable logic controllers (PLCs) leveraged by two critical producers of PLCs for ICSs within the critical infrastructure sector, Schneider Electric and OMRON.
As home to over 200 million Internet users and the fourth-largest population in the world, Indonesians depend on fast and reliable Internet, but this has always been a challenging part of the world for Internet infrastructure. This has real world implications on performance and reliability (IP transit is on average 6x more expensive than our major South East Asian interconnection markets). That said, first we wanted to share what makes things challenging in Indonesia; geography, infrastructure, and market dynamics.
Geography: The Internet backbone for many countries is almost entirely delivered by terrestrial fiber optic cables, where connectivity is more affordable and easier to build when the land mass is contiguous and there is a concentrated population distribution. However, Indonesia is a collection of over 18,000 islands, spanning three time zones, and approximately 3,200 miles (5,100 km) east to west. By comparison, the United States is 2,800 miles (4,500 km) east to west. While parts of Indonesia are geographically close to Singapore (the regional Internet hub with over 60% of the region’s data centers) given how large Indonesia is, much of it is far away.
Infrastructure: Indonesia is a large country and to connect it to the rest of the Internet it currently relies on submarine fiber optic cables. There are a total of 22 separate submarine cables connecting Indonesia to Singapore, Malaysia, Australia and onward. Many of the cable systems cross the Strait of Malacca, a narrow stretch of water, between the Malay Peninsula (Peninsular Malaysia) and the Indonesian island of Sumatra to the southwest, connecting the Indian and Pacific Oceans. This makes reliability challenging as a result of human activities, such as ships dropping their anchors, fishing trawlers, and dredging as it is one of the world’s top five busiest shipping lanes. Additionally, Indonesia is geographically located in a very active seismic zone and is very earthquake prone.
There are a number of new submarine cable systems that have come online and four significant builds planned (Apricot, ACC-1, Echo and Nui) that will improve both available capacity and cost economics in the market. Right now the cost is still significantly higher than comparable distances. For example Jakarta to Singapore is approximately 60 times more expensive than a service the same distance would be in the continental US or Europe for a 100Gbps wavelength service. Staying in Asia, a similar distance from Hong Kong to Taiwan costs around 1/6th that of Jakarta to Singapore.
Cyber 1 and Cyber 3 (NTT NexCenter) Data Center Buildings in Jakarta, 2019 (Photo Credit: Tom Paseka).Picture of Cyber 1 Lobby Directory
While areas like Batam are becoming increasingly popular for data center builds due to its proximity to Singapore, Jakarta is still the most developed and mature market. It has the largest and best interconnected data centers in the country, including the two pictured.
Cloudflare is deployed in the facility on the right (NTT NexCenter), however most ISPs are inside the building on the left (Cyber 1). The two buildings are approximately 30-50 meters apart, yet it’s surprisingly difficult to be able to connect between them. One of the reasons why is market fragmentation and how many options are available. In the adjacent picture of the Cyber 1 building lobby directory many of the listings are unique data centers each with different policies and access conditions.
In the past, we’ve talked about the Cost of Bandwidth around the world (and updated here), but we’ve never talked about Indonesia specifically. Using the same methodology as we’ve used in the past, Indonesia’s cost is 43x times more expensive than North America or Europe, or even multiples more expensive than other countries in Asia.
Market dynamics: While Indonesia has good and functioning Internet Exchanges, there are a few ISPs who dominate the market. The three largest ISPs in the country (Telkom Indonesia, Indosat Ooredoo Hutchison and XL Axiata) collectively control 80% of the market, while Telkom Indonesia alone has a market share of around 60% by revenue.
This results in Telkom Indonesia having a heavily dominant market share position to leverage resulting in refusal to peer, or exchange Internet traffic in Indonesia without expensive payments, or instead, preferring to connect to other networks outside of Indonesia, introducing latency and diminished performance.
Despite all of these challenges, our network has come a long way since our initial deployment to Jakarta in 2019.
We’ve established:
A carrier neutral local point of presence at NTT Indonesia Nexcenter Data Center, one of the major interconnection hubs in Jakarta
An edge partnership point of presence in Yogyakarta with CitranetIX
Direct interconnections in country with two of the top three networks.
Peering across three of the larger local internet exchanges, Indonesia Internet Exchange, Jakarta Internet Exchange and Biznet Internet Exchange
Dedicated 100G wavelength transport back to Singapore
All of this results in a more performant and reliable network for our local customers.
We wanted to see how our network is performing since deployment. We mentioned during Speed Week in 2021 how we benchmark against different networks, and sharing some of those benchmarks here.
At the end of December 2021, Cloudflare was only faster in a few networks, as compared to other providers in Indonesia.
Fast forward twelve months to December 2022, Cloudflare is significantly faster in even more networks.
The TCP protocol is a connection-oriented protocol, which means that a connection is established and maintained until the application programs at each end have finished exchanging messages. The Connect Time summarizes how fast a session can be set up between a client and a server over a network. TTLB (or time to last byte) is the time taken to send the entire response to the web browser. It’s a good measure of how long a complete download takes. Check out our recent blog on Benchmarking Edge Network Performance for more information on how we measure the performance of our network and benchmark ourselves against industry players.
On closer inspection against the three major ISPs specifically, we’re the top provider for two out of the three networks. Cloudflare’s performance has improved year-on-year (16% reduction) and continues to lead (comparative to the other networks) meaning faster and more responsive services for our customers.
Helping build a better Internet for Indonesia doesn’t stop here and there is always more work to be done! We want to be the number one network everywhere and won’t rest until we are. We are continuing to connect to more networks locally, invest in direct submarine cable capacity, as well as further deployments into new data center buildings, Internet Exchanges and new cities too!
Are you operating a network and not yet peering with Cloudflare? Log-in to our Peering Portal or find out more information here for ways to set up peering, or request we deploy nodes into your network directly.
Cyber defense assistance in Ukraine is working. The Ukrainian government and Ukrainian critical infrastructure organizations have better defended themselves and achieved higher levels of resiliency due to the efforts of CDAC and many others. But this is not the end of the road—the ability to provide cyber defense assistance will be important in the future. As a result, it is timely to assess how to provide organized, effective cyber defense assistance to safeguard the post-war order from potential aggressors.
The conflict in Ukraine is resetting the table across the globe for geopolitics and international security. The US and its allies have an imperative to strengthen the capabilities necessary to deter and respond to aggression that is ever more present in cyberspace. Lessons learned from the ad hoc conduct of cyber defense assistance in Ukraine can be institutionalized and scaled to provide new approaches and tools for preventing and managing cyber conflicts going forward.
I am often asked why where weren’t more successful cyberattacks by Russia against Ukraine. I generally give four reasons: (1) Cyberattacks are more effective in the “grey zone” between peace and war, and there are better alternatives once the shooting and bombing starts. (2) Setting these attacks up takes time, and Putin was secretive about his plans. (3) Putin was concerned about attacks spilling outside the war zone, and affecting other countries. (4) Ukrainian defenses were good, aided by other countries and companies. This paper gives a fifth reasons: they were technically successful, but keeping them out of the news made them operationally unsuccessful.
What will it take for policy makers to take cybersecurity seriously? Not minimal-change seriously. Not here-and-there seriously. But really seriously. What will it take for policy makers to take cybersecurity seriously enough to enact substantive legislative changes that would address the problems? It’s not enough for the average person to be afraid of cyberattacks. They need to know that there are engineering fixes—and that’s something we can provide.
For decades, I have been waiting for the “big enough” incident that would finally do it. In 2015, Chinese military hackers hacked the Office of Personal Management and made off with the highly personal information of about 22 million Americans who had security clearances. In 2016, the Mirai botnet leveraged millions of Internet-of-Things devices with default admin passwords to launch a denial-of-service attack that disabled major Internet platforms and services in both North America and Europe. In 2017, hackers—years later we learned that it was the Chinese military—hacked the credit bureau Equifax and stole the personal information of 147 million Americans. In recent years, ransomware attacks have knocked hospitals offline, and many articles have been written about Russia inside the U.S. power grid. And last year, the Russian SVR hacked thousands of sensitive networks inside civilian critical infrastructure worldwide in what we’re now calling Sunburst (and used to call SolarWinds).
Those are all major incidents to security people, but think about them from the perspective of the average person. Even the most spectacular failures don’t affect 99.9% of the country. Why should anyone care if the Chinese have his or her credit records? Or if the Russians are stealing data from some government network? Few of us have been directly affected by ransomware, and a temporary Internet outage is just temporary.
Cybersecurity has never been a campaign issue. It isn’t a topic that shows up in political debates. (There was one question in a 2016 Clinton–Trump debate, but the response was predictably unsubstantive.) This just isn’t an issue that most people prioritize, or even have an opinion on.
So, what will it take? Many of my colleagues believe that it will have to be something with extreme emotional intensity—sensational, vivid, salient—that results in large-scale loss of life or property damage. A successful attack that actually poisons a water supply, as someone tried to do in January by raising the levels of lye at a Florida water-treatment plant. (That one was caught early.) Or an attack that disables Internet-connected cars at speed, something that was demonstrated by researchers in 2014. Or an attack on the power grid, similar to what Russia did to the Ukraine in 2015 and 2016. Will it take gas tanks exploding and planes falling out of the sky for the average person to read about the casualties and think “that could have been me”?
Here’s the real problem. For the average nonexpert—and in this category I include every lawmaker—to push for change, they not only need to believe that the present situation is intolerable, they also need to believe that an alternative is possible. Real legislative change requires a belief that the never-ending stream of hacks and attacks is not inevitable, that we can do better. And that will require creating working examples of secure, dependable, resilient systems.
Providing alternatives is how engineers help facilitate social change. We could never have eliminated sales of tungsten-filament household light bulbs if fluorescent and LED replacements hadn’t become available. Reducing the use of fossil fuel for electricity generation requires working wind turbines and cost-effective solar cells.
We need to demonstrate that it’s possible to build systems that can defend themselves against hackers, criminals, and national intelligence agencies; secure Internet-of-Things systems; and systems that can reestablish security after a breach. We need to prove that hacks aren’t inevitable, and that our vulnerability is a choice. Only then can someone decide to choose differently. When people die in a cyberattack and everyone asks “What can be done?” we need to have something to tell them.
We don’t yet have the technology to build a truly safe, secure, and resilient Internet and the computers that connect to it. Yes, we have lots of security technologies. We have older secure systems—anyone still remember Apollo’s DomainOS and MULTICS?—that lost out in a market that didn’t reward security. We have newer research ideas and products that aren’t successful because the market still doesn’t reward security. We have even newer research ideas that won’t be deployed, again, because the market still prefers convenience over security.
What I am proposing is something more holistic, an engineering research task on a par with the Internet itself. The Internet was designed and built to answer this question: Can we build a reliable network out of unreliable parts in an unreliable world? It turned out the answer was yes, and the Internet was the result. I am asking a similar research question: Can we build a secure network out of insecure parts in an insecure world? The answer isn’t obviously yes, but it isn’t obviously no, either.
While any successful demonstration will include many of the security technologies we know and wish would see wider use, it’s much more than that. Creating a secure Internet ecosystem goes beyond old-school engineering to encompass the social sciences. It will include significant economic, institutional, and psychological considerations that just weren’t present in the first few decades of Internet research.
Cybersecurity isn’t going to get better until the economic incentives change, and that’s not going to change until the political incentives change. The political incentives won’t change until there is political liability that comes from voter demands. Those demands aren’t going to be solely the results of insecurity. They will also be the result of believing that there’s a better alternative. It is our task to research, design, build, test, and field that better alternative—even though the market couldn’t care less right now.
This essay originally appeared in the May/June 2021 issue of IEEE Security & Privacy. I forgot to publish it here.
The idea was to bring together people from a variety of disciplines who are all thinking about different aspects of democracy, less from a “what we need to do today” perspective and more from a blue-sky future perspective. My remit to the participants was this:
The idea is to start from scratch, to pretend we’re forming a new country and don’t have any precedent to deal with. And that we don’t have any unique interests to perturb our thinking. The modern representative democracy was the best form of government mid-eighteenth century politicians technology could invent. The twenty-first century is a very different place technically, scientifically, and philosophically. What could democracy look like if it were reinvented today? Would it even be democracy—what comes after democracy?
Some questions to think about:
Representative democracies were built under the assumption that travel and communications were difficult. Does it still make sense to organize our representative units by geography? Or to send representatives far away to create laws in our name? Is there a better way for people to choose collective representatives?
Indeed, the very idea of representative government is due to technological limitations. If an AI system could find the optimal solution for balancing every voter’s preferences, would it still make sense to have representatives—or should we vote for ideas and goals instead?
With today’s technology, we can vote anywhere and any time. How should we organize the temporal pattern of voting—and of other forms of participation?
Starting from scratch, what is today’s ideal government structure? Does it make sense to have a singular leader “in charge” of everything? How should we constrain power—is there something better than the legislative/judicial/executive set of checks and balances?
The size of contemporary political units ranges from a few people in a room to vast nation-states and alliances. Within one country, what might the smaller units be—and how do they relate to one another?
Who has a voice in the government? What does “citizen” mean? What about children? Animals? Future people (and animals)? Corporations? The land?
And much more: What about the justice system? Is the twelfth-century jury form still relevant? How do we define fairness? Limit financial and military power? Keep our system robust to psychological manipulation?
My perspective, of course, is security. I want to create a system that is resilient against hacking: one that can evolve as both technologies and threats evolve.
The format was one that I have used before. Forty-eight people meet over two days. There are four ninety-minute panels per day, with six people on each. Everyone speaks for ten minutes, and the rest of the time is devoted to questions and comments. Ten minutes means that no one gets bogged down in jargon or details. Long breaks between sessions and evening dinners allow people to talk more informally. The result is a very dense, idea-rich environment that I find extremely valuable.
It was amazing event. Everyone participated. Everyone was interesting. (Details of the event—emerging themes, notes from the speakers—are in the comments.) It’s a week later and I am still buzzing with ideas. I hope this is only the first of an ongoing series of similar workshops.
The International Committee of the Red Cross wants some digital equivalent to the iconic red cross, to alert would-be hackers that they are accessing a medical network.
The emblem wouldn’t provide technical cybersecurity protection to hospitals, Red Cross infrastructure or other medical providers, but it would signal to hackers that a cyberattack on those protected networks during an armed conflict would violate international humanitarian law, experts say, Tilman Rodenhäuser, a legal adviser to the International Committee of the Red Cross, said at a panel discussion hosted by the organization on Thursday.
I can think of all sorts of problems with this idea and many reasons why it won’t work, but those also apply to the physical red cross on buildings, vehicles, and people’s clothing. So let’s try it.
Prevention is often seen as the responsibility of the software developer, as they are required to securely develop and deliver code, verify third party components, and harden the build environment. But the supplier also holds a critical responsibility in ensuring the security and integrity of our software. After all, the software vendor is responsible for liaising between the customer and software developer. It is through this relationship that additional security features can be applied via contractual agreements, software releases and updates, notifications and mitigations of vulnerabilities.
Software suppliers will find guidance from NSA and our partners on preparing organizations by defining software security checks, protecting software, producing well-secured software, and responding to vulnerabilities on a continuous basis. Until all stakeholders seek to mitigate concerns specific to their area of responsibility, the software supply chain cycle will be vulnerable and at risk for potential compromise.
A combination of ransomware and distributed denial-of-service attacks, the onslaught disrupted government services and prompted the country’s electrical utility to switch to manual control.
[…]
But the attack against Montenegro’s infrastructure seemed more sustained and extensive, with targets including water supply systems, transportation services and online government services, among many others.
Government officials in the country of just over 600,000 people said certain government services remained temporarily disabled for security reasons and that the data of citizens and businesses were not endangered.
The Director of the Directorate for Information Security, Dusan Polovic, said 150 computers were infected with malware at a dozen state institutions and that the data of the Ministry of Public Administration was not permanently damaged. Polovic said some retail tax collection was affected.
Russia is being blamed, but I haven’t seen any evidence other than “they’re the obvious perpetrator.”
EDITED TO ADD (9/12): The Montenegro government is hedging on that Russia attribution. It seems to be a regular criminal ransomware attack. The Cuba Ransomware gang has Russian members, but that’s not the same thing as the government.
Good essay arguing that open-source software is a critical national-security asset and needs to be treated as such:
Open source is at least as important to the economy, public services, and national security as proprietary code, but it lacks the same standards and safeguards. It bears the qualities of a public good and is as indispensable as national highways. Given open source’s value as a public asset, an institutional structure must be built that sustains and secures it.
This is not a novel idea. Open-source code has been called the “roads and bridges” of the current digital infrastructure that warrants the same “focus and funding.” Eric Brewer of Google explicitly called open-source software “critical infrastructure” in a recent keynote at the Open Source Summit in Austin, Texas. Several nations have adopted regulations that recognize open-source projects as significant public assets and central to their most important systems and services. Germany wants to treat open-source software as a public good and launched a sovereign tech fund to support open-source projects “just as much as bridges and roads,” and not just when a bridge collapses. The European Union adopted a formal open-source strategy that encourages it to “explore opportunities for dedicated support services for open source solutions [it] considers critical.”
Designing an institutional framework that would secure open source requires addressing adverse incentives, ensuring efficient resource allocation, and imposing minimum standards. But not all open-source projects are made equal. The first step is to identify which projects warrant this heightened level of scrutiny—projects that are critical to society. CISA defines critical infrastructure as industry sectors “so vital to the United States that [its] incapacity or destruction would have a debilitating impact on our physical or economic security or public health or safety.” Efforts should target the open-source projects that share those features.
CISA, NSA, FBI, and similar organizations in the other Five Eyes countries are warning that attacks on MSPs — as a vector to their customers — are likely to increase. No details about what this prediction is based on. Makes sense, though. The SolarWinds attack was incredibly successful for the Russian SVR, and a blueprint for future attacks.
The Department of Energy, CISA, the FBI, and the NSA jointly issued an advisory describing a sophisticated piece of malware called Pipedream that’s designed to attack a wide range of industrial control systems. This is clearly from a government, but no attribution is given. There’s also no indication of how the malware was discovered. It seems not to have been used yet.
The White House has issued its starkest warning that Russia may be planning cyberattacks against critical-sector U.S. companies amid the Ukraine invasion.
[…]
Context: The alert comes after Russia has lobbed a series of digital attacks at the Ukrainian government and critical industry sectors. But there’s been no sign so far of major disruptive hacks against U.S. targets even as the government has imposed increasingly harsh sanctions that have battered the Russian economy.
The public alert followed classified briefings government officials conducted last week for more than 100 companies in sectors at the highest risk of Russian hacks, Neuberger said. The briefing was prompted by “preparatory activity” by Russian hackers, she said.
U.S. analysts have detected scanning of some critical sectors’ computers by Russian government actors and other preparatory work, one U.S. official told my colleague Ellen Nakashima on the condition of anonymity because of the matter’s sensitivity. But whether that is a signal that there will be a cyberattack on a critical system is not clear, Neuberger said.
Neuberger declined to name specific industry sectors under threat but said they’re part of critical infrastructure – a government designation that includes industries deemed vital to the economy and national security, including energy, finance, transportation and pipelines.
President Biden’s statement. White House fact sheet. And here’s a video of the extended Q&A with deputy national security adviser Anne Neuberger.
EDITED TO ADD (3/23): Long — three hour — conference call with CISA.
Companies critical to U.S. national interests will now have to report when they’re hacked or they pay ransomware, according to new rules approved by Congress.
[…]
The reporting requirement legislation was approved by the House and the Senate on Thursday and is expected to be signed into law by President Joe Biden soon. It requires any entity that’s considered part of the nation’s critical infrastructure, which includes the finance, transportation and energy sectors, to report any “substantial cyber incident” to the government within three days and any ransomware payment made within 24 hours.
Even better would be if they had to report it to the public.
Today, in partnership with CrowdStrike and Ping Identity, Cloudflare is launching the Critical Infrastructure Defense Project (CriticalInfrastructureDefense.org). The Project was born out of conversations with cybersecurity and government experts concerned about potential retaliation to the sanctions that resulted from the Russian invasion of Ukraine.
In particular, there is a fear that critical United States infrastructure will be targeted with cyber attacks. While these attacks may target any industry, the experts we consulted with were particularly concerned about three areas that were often underprepared and could cause significant disruption: hospitals, energy, and water.
To help address that need, Cloudflare, CrowdStrike, and Ping Identity have committed under the Critical Infrastructure Defense Project to offer a broad suite of our products for free for at least the next four months to any United States-based hospital, or energy or water utility. You can learn more at: www.CriticalInfrastructureDefense.org.
We are not powerless against hackers. Organizations that have adopted a Zero Trust approach to security have been successful at mitigating even determined attacks. There are three core components to any Zero Trust security approach: 1) Network Security, 2) Endpoint Security; and 3) Identity.
Cloudflare, CrowdStrike, and Ping Identity are three of the leading Zero Trust security companies securing each of these components. Cloudflare’s Zero Trust network security offers a broad set of services that organizations can easily implement to ensure their connections are protected no matter where users access the network. CrowdStrike provides a broad set of end point security services to ensure that laptops, phones, and servers are not compromised. And Ping Identity provides identity solutions, including multi-factor authentication, that are foundational to any organization’s posture.
Each of us is great at what we do on our own. Together, we provide an integrated solution that is unrivaled and proven to stand up to even the most sophisticated nation state cyber attacks.
And this is what we think is required, because the current threat is significantly higher than what we have seen since any of our companies was founded. We all built our companies relying on the nation’s infrastructure, and we believe it is incumbent on us to provide our technology in order to protect that infrastructure when it is threatened. For this period of heightened risk, we are all providing our services at no cost to organizations in these most vulnerable sectors.
We’ve also worked together to ensure our products function in harmony and are easy to implement. We don’t want short-staffed IT teams, long requisition processes, or limited budgets to stand in the way of getting the protection that’s needed in place immediately. We’ve taken a cue from hospitals to triage the risks through a recommended list showing organizations that may be short of IT staff how they can proceed: suggesting what they should prioritize over the next day, over the next week, and over the next month.
You can download the recommended security triage program here. We know that not every organization will be able to implement every recommendation. But every step you get through on the list will help your organization be incrementally better prepared for whatever is to come.
Our teams are also committed to working directly with organizations in these sectors to make onboarding as quick and painless as possible. We will onboard customers under this project with the same level of service as if they were our largest paying customers. We believe it is our duty to help ensure that the nation’s critical infrastructure remains online and available through this challenging time.
We anticipate that, based on what we learn over the days ahead, the Critical Infrastructure Defense Project may expand to additional sectors and countries. We hope the predictions of retaliatory cyberattacks don’t come true. But, if they do, we know our solutions can mitigate the risk, and we stand ready to fully deploy them to protect our most critical infrastructure.
According to a report from CISA last week, there were three ransomware attacks against water treatment plants last year.
WWS Sector cyber intrusions from 2019 to early 2021 include:
In August 2021, malicious cyber actors used Ghost variant ransomware against a California-based WWS facility. The ransomware variant had been in the system for about a month and was discovered when three supervisory control and data acquisition (SCADA) servers displayed a ransomware message.
In July 2021, cyber actors used remote access to introduce ZuCaNo ransomware onto a Maine-based WWS facility’s wastewater SCADA computer. The treatment system was run manually until the SCADA computer was restored using local control and more frequent operator rounds.
In March 2021, cyber actors used an unknown ransomware variant against a Nevada-based WWS facility. The ransomware affected the victim’s SCADA system and backup systems. The SCADA system provides visibility and monitoring but is not a full industrial control system (ICS).
It’s a matter of going after those with deep pockets. From Wired:
Cloudflare was sued in November 2018 by Mon Cheri Bridals and Maggie Sottero Designs, two wedding dress manufacturers and sellers that alleged Cloudflare was guilty of contributory copyright infringement because it didn’t terminate services for websites that infringed on the dressmakers’ copyrighted designs….
[Judge] Chhabria noted that the dressmakers have been harmed “by the proliferation of counterfeit retailers that sell knock-off dresses using the plaintiffs’ copyrighted images” and that they have “gone after the infringers in a range of actions, but to no avail — every time a website is successfully shut down, a new one takes its place.” Chhabria continued, “In an effort to more effectively stamp out infringement, the plaintiffs now go after a service common to many of the infringers: Cloudflare. The plaintiffs claim that Cloudflare contributes to the underlying copyright infringement by providing infringers with caching, content delivery, and security services. Because a reasonable jury could not — at least on this record — conclude that Cloudflare materially contributes to the underlying copyright infringement, the plaintiffs’ motion for summary judgment is denied and Cloudflare’s motion for summary judgment is granted.”
I was an expert witness for Cloudflare in this case, basically explaining to the court how the service works.
In this post, we are introducing Netflix Drive, a Cloud drive for media assets and providing a high level overview of some of its features and interfaces. We intend this to be a first post in a series of posts covering Netflix Drive. In the future posts, we will do an architectural deep dive into the several components of Netflix Drive.
Netflix, and particularly Studio applications (and Studio in the Cloud) produce petabytes of data backed by billions of media assets. Several artists and workflows that may be globally distributed, work on different projects, and each of these projects produce content that forms a part of the large corpus of assets.
Here is an example of globally distributed production where several artists and workflows work in conjunction to create and share assets for one or many projects.
Fig 1: Globally distributed production with artists working on different assets from different parts of the world
There are workflows in which these artists may want to view a subset of these assets from this large dataset, for example, pertaining to a specific project. These artists may want to create personal workspaces and work on generating intermediate assets. To support such use cases, access control at the user workspace and project workspace granularity is extremely important for presenting a globally consistent view of pertinent data to these artists.
Netflix Drive aims to solve this problem of exposing different namespaces and attaching appropriate access control to help build a scalable, performant, globally distributed platform for storing and retrieving pertinent assets.
Netflix Drive is envisioned to be a Cloud Drive for Studio and Media applications and lends itself to be a generic paved path solution for all content in Netflix.
It exposes a file/folder interface for applications to save their data and an API interface for control operations. Netflix Drive relies on a data store that will be the persistent storage layer for assets, and a metadata store which will provide a relevant mapping from the file system hierarchy to the data store entities. The major pieces, as shown in Fig. 2, are the file system interface, the API interface, and the metadata and data stores. We will delve into these in the following sections.
Fig 2: Netflix Drive components
File interface for Netflix Drive
Creative applications such as Nuke, Maya, Adobe Photoshop store and retrieve content using files and folders. Netflix Drive relies on FUSE (File System In User Space) to provide POSIX files and folders interface to such applications. A FUSE based POSIX interface provides feature customization elasticity, deployment configuration flexibility as well as a standard and seamless file/folder interface. A similar user space abstraction is available for Windows (WinFSP) and MacOS (MacFUSE)
The operations that originate from user, application and system actions on files and folders translate to a well defined set of function and system calls which are forwarded by the Linux Virtual File System Layer (or a pass-through/filter driver in Windows) to the FUSE layer in user space. The resulting metadata and data operations will be implemented by appropriate metadata and data adapters in Netflix Drive.
Fig 3: POSIX interface of Netflix Drive
The POSIX files and folders interface for Netflix Drive is designed as a layered system with the FUSE implementation hooks forming the top layer. This layer will provide entry points for all of the relevant VFS calls that will be implemented. Netflix Drive contains an abstraction layer below FUSE which allows different metadata and data stores to be plugged into the architecture by having their corresponding adapters implement the interface. We will discuss more about the layered architecture in the section below.
API Interface for Netflix Drive
Along with exposing a file interface which will be a hub of all abstractions, Netflix Drive also exposes API and Polled Task interfaces to allow applications and workflow tools to trigger control operations in Netflix Drive.
For example, applications can explicitly use REST endpoints to publish files stored in Netflix Drive to cloud, and later use a REST endpoint to retrieve a subset of the published files from cloud. The API interface can also be used to track the transfers of large files and allows other applications to be built on top of Netflix Drive.
Fig 4: Control interface of Netflix Drive
The Polled Task interface allows studio and media workflow orchestrators to post or dispatch tasks to Netflix Drive instances on disparate workstations or containers. This allows Netflix Drive to be bootstrapped with an empty namespace when the workstation comes up and dynamically project a specific set of assets relevant to the artists’ work sessions or workflow stages. Further these assets can be projected into a namespace of the artist’s or application’s choosing.
Alternatively, workstations/containers can be launched with the assets of interest prefetched at startup. These allow artists and applications to obtain a workstation which already contains relevant files and optionally add and delete asset trees during the work session. For example, artists perform transformative work on files, and use Netflix Drive to store/fetch intermediate results as well as the final copy which can be transformed back into a media asset.
Bootstrapping Netflix Drive
Given the two different modes in which applications can interact with Netflix Drive, now let us discuss how Netflix Drive is bootstrapped.
On startup, Netflix Drive expects a manifest that contains information about the data store, metadata store, and credentials (tied to a user login) to form an instance of namespace hierarchy. A Netflix Drive mount point may contain multiple Netflix Drive namespaces.
A dynamic instance allowsNetflix Drive to show a user-selected and user-accessible subset of data from a large corpus of assets. A user instance allows it to act like a Cloud Drive, where users can work on content which is automatically synced in the background periodically to Cloud. On restart on a new machine, the same files and folders will be prefetched from the cloud. We will cover the different namespaces of Netflix Drive in more detail in a subsequent blog post.
Here is an example of a typical bootstrap manifest file.
A sample manifest file.
The manifest is a persistent artifact which renders a user workstation its Netflix Drive personality. It survives instance failures and is able to recreate the same stateful interface on any newly deployed instance.
Metadata and Data Store Abstractions
In order to allow a variety of different metadata stores and data stores to be easily plugged into the architecture, Netflix Drive exposes abstract interfaces for both metadata and data stores. Here is a high level diagram explaining the different layers of abstractions in Netflix Drive
Fig 5: Layered architecture of Netflix Drive
Metadata Store Characteristics
Each file in Netflix Drive would have one or many corresponding metadata nodes, corresponding to different versions of the file. The file system hierarchy would be modeled as a tree in the metadata store where the root node is the top level folder for the application.
Each metadata node will contain several attributes, such as checksum of the file, location of the data, user permissions to access data, file metadata such as size, modification time, etc. A metadata node may also provide support for extended attributes which can be used to model ACLs, symbolic links, or other expressive file system constructs.
Metadata Store may also expose the concept of workspaces, where each user/application can have several workspaces, and can share workspaces with other users/applications. These are higher level constructs that are very useful to Studio applications.
Data Store Characteristics
Netflix Drive relies on a data store that allows streaming bytes into files/objects persisted on the storage media. The data store should expose APIs that allow Netflix Drive to perform I/O operations. The transfer mechanism for transport of bytes is a function of the data store.
In the first manifestation, Netflix Drive is using an object store (such as Amazon S3) as a data store. In order to expose file store-like properties, there were some changes needed in the object store. Each file can be stored as one or more objects. For Studio applications, file sizes may exceed the maximum object size for Cloud Storage, and so, the data store service should have the ability to store multiple parts of a file as separate objects. It is the responsibility of the data store service to tie these objects to a single file and inform the metadata store of the single unique Id for these several object parts. This Data store internally implements the chunking of file into several parts, encrypting of the content, and life cycle management of the data.
Multi-tiered architecture
Netflix Drive allows multiple data stores to be a part of the same installation via its bootstrap manifest.
Fig 6: Multiple data stores of Netflix Drive
Some studio applications such as encoding and transcoding have different I/O characteristics than a typical cloud drive.
Most of the data produced by these applications is ephemeral in nature, and is read often initially. The final encoded copy needs to be persisted and the ephemeral data can be deleted. To serve such applications, Netflix Drive can persist the ephemeral data in storage tiers which are closer to the application that allow lower read latencies and better economies for read request, since cloud storage reads incur an egress cost. Finally, once the encoded copy is prepared, this copy can be persisted by Netflix Drive to a persistent storage tier in the cloud. A single data store may also choose to archive some subset of content stored in cheaper alternatives.
Security
Studio applications require strict adherence to security models where only users or applications with specific permissions should be allowed to access specific assets. Security is one of the cornerstones of Netflix Drive design. Netflix Drive dynamic namespace design allows an artist or workflow to access only a small subset of the assets based on the workspace information and access control and is one of the benefits of using Netflix Drive in Studio workflows. Netflix Drive encapsulates the authentication and authorization models in its metadata store. These are translated into POSIX ACLs in Netflix Drive. In the future, Netflix Drive can allow more expressive ACLs by leveraging extended attributes associated with Metadata nodes corresponding to an asset.
Netflix Drive is currently being used by several Studio teams as the paved path solution for working with assets and is integrated with several media suite applications. As of today, Netflix Drive can be installed on CentOS, MacOS and Windows. In the future blog posts, we will cover implementation details, learnings, performance analysis of Netflix Drive, and some of the applications and workflows built on top of Netflix Drive.
If you are passionate about building Storage and Infrastructure solutions for Netflix Data Platform, we are always looking for talented engineers and managers. Please check out our job listings
Netflix Drive was originally published in Netflix TechBlog on Medium, where people are continuing the conversation by highlighting and responding to this story.
Really good op-ed in the New York Times about how vulnerable the GPS system is to interference, spoofing, and jamming — and potential alternatives.
The 2018 National Defense Authorization Act included funding for the Departments of Defense, Homeland Security and Transportation to jointly conduct demonstrations of various alternatives to GPS, which were concluded last March. Eleven potential systems were tested, including eLoran, a low-frequency, high-power timing and navigation system transmitted from terrestrial towers at Coast Guard facilities throughout the United States.
“China, Russia, Iran, South Korea and Saudi Arabia all have eLoran systems because they don’t want to be as vulnerable as we are to disruptions of signals from space,” said Dana Goward, the president of the Resilient Navigation and Timing Foundation, a nonprofit that advocates for the implementation of an eLoran backup for GPS.
Also under consideration by federal authorities are timing systems delivered via fiber optic network and satellite systems in a lower orbit than GPS, which therefore have a stronger signal, making them harder to hack. A report on the technologies was submitted to Congress last week.
GPS is a piece of our critical infrastructure that is essential to a lot of the rest of our critical infrastructure. It needs to be more secure.
A water treatment plant in Oldsmar, Florida, wasattacked last Friday. The attacker took control of one of the systems, and increased the amount of sodium hydroxide — that’s lye — by a factor of 100. This could have been fatal to people living downstream, if an alert operator hadn’t noticed the change and reversed it.
We don’t know who is behind this attack. Despite its similarities to a Russian attack of a Ukrainian power plant in 2015, my bet is that it’s a disgruntled insider: either a current or former employee. It just doesn’t make sense for Russia to be behind this.
ArsTechnica is reporting on the poor cybersecurity at the plant:
The Florida water treatment facility whose computer system experienced a potentially hazardous computer breach last week used an unsupported version of Windows with no firewall and shared the same TeamViewer password among its employees, government officials have reported.
Brian Krebs points out that the fact that we know about this attack is what’s rare:
Spend a few minutes searching Twitter, Reddit or any number of other social media sites and you’ll find countless examples of researchers posting proof of being able to access so-called “human-machine interfaces” — basically web pages designed to interact remotely with various complex systems, such as those that monitor and/or control things like power, water, sewage and manufacturing plants.
And yet, there have been precious few known incidents of malicious hackers abusing this access to disrupt these complex systems. That is, until this past Monday, when Florida county sheriff Bob Gualtieri held a remarkably clear-headed and fact-filled news conference about an attempt to poison the water supply of Oldsmar, a town of around 15,000 not far from Tampa.
The collective thoughts of the interwebz
Manage Consent
To provide the best experiences, we use technologies like cookies to store and/or access device information. Consenting to these technologies will allow us to process data such as browsing behavior or unique IDs on this site. Not consenting or withdrawing consent, may adversely affect certain features and functions.
Functional
Always active
The technical storage or access is strictly necessary for the legitimate purpose of enabling the use of a specific service explicitly requested by the subscriber or user, or for the sole purpose of carrying out the transmission of a communication over an electronic communications network.
Preferences
The technical storage or access is necessary for the legitimate purpose of storing preferences that are not requested by the subscriber or user.
Statistics
The technical storage or access that is used exclusively for statistical purposes.The technical storage or access that is used exclusively for anonymous statistical purposes. Without a subpoena, voluntary compliance on the part of your Internet Service Provider, or additional records from a third party, information stored or retrieved for this purpose alone cannot usually be used to identify you.
Marketing
The technical storage or access is required to create user profiles to send advertising, or to track the user on a website or across several websites for similar marketing purposes.