In some way, shape or form, Internet piracy has always been carried out through some kind of application. Whether that’s a peer-to-peer client utilizing BitTorrent or eD2K, or a Usenet or FTP tool taking things back to their roots, software has always played a crucial role.
Of course, the nature of the Internet beast means that software usage is unavoidable, but in recent years piracy has swung more towards the regular web browser, meaning that sites and services offering pirated content are largely easy to locate, identify and block, if authorities so choose.
As revealed this week by the MPA, thousands of platforms around the world are now targeted for blocking, with 1,800 sites and 5,300 domains blocked in Europe alone.
However, as the Kodi phenomenon has shown, web-based content doesn’t always have to be accessed via a standard web browser. Clever but potentially illegal addons and third-party apps are able to scrape web-based resources and present links to content on a wide range of devices, from mobile phones and tablets to set-top boxes.
While it’s still possible to block the resources upon which these addons rely, the scattered nature of the content makes the process much more difficult. One can’t simply block a whole platform because a few movies are illegally hosted there and even Google has found itself hosting thousands of infringing titles, a situation that’s ruthlessly exploited by addon and app developers alike.
Needless to say, the situation hasn’t gone unnoticed. The Alliance for Creativity and Entertainment has spent the last year targeting many people involved in the addon and app scene, hoping they’ll take their tools and run, rather than further develop a rapidly evolving piracy ecosystem.
Over in Russia, a country that will happily block millions of IP addresses if it suits them, the topic of infringing apps was raised this week. It happened during the International Strategic Forum on Intellectual Property, a gathering of 500 experts from more than 30 countries. There were strong calls for yet more tools and measures to deal with films and music being made available via ‘pirate’ apps.
The forum heard that in response to widespread website blocking, people behind pirate sites have begun creating applications for mobile devices to achieve the same ends – the provision of illegal content. This, key players in the music industry say, means that the law needs to be further tightened to tackle the rising threat.
“Consumption of content is now going into the mobile sector and due to this we plan to prevent mass migration of ‘pirates’ to the mobile sector,” said Leonid Agronov, general director of the National Federation of the Music Industry.
The same concerns were echoed by Alexander Blinov, CEO of Warner Music Russia. According to TASS, the powerful industry player said that while recent revenues had been positively affected by site-blocking, it’s now time to start taking more action against apps.
“I agree with all speakers that we can not stop at what has been achieved so far. The music industry has a fight against illegal content in mobile applications on the agenda,” Blinov said.
And if Blinov is to be believed, music in Russia is doing particularly well at the moment. Attributing successes to efforts by parliament, the Ministry of Communications, and copyright holders, Blinov said the local music market has doubled in the past two years.
“We are now in the top three fastest growing markets in the world, behind only China and South Korea,” Blinov said.
While some apps can work in the same manner as a basic web interface, others rely on more complex mechanisms, ‘scraping’ content from diverse sources that can be easily and readily changed if mitigation measures kick in. It will be very interesting to see how Russia deals with this threat and whether it will opt for highly technical solutions or the nuclear options demonstrated recently.
Last year I wrote about the Digital Security Exchange. The project is live:
The DSX works to strengthen the digital resilience of U.S. civil society groups by improving their understanding and mitigation of online threats.
We do this by pairing civil society and social sector organizations with credible and trustworthy digital security experts and trainers who can help them keep their data and networks safe from exposure, exploitation, and attack. We are committed to working with community-based organizations, legal and journalistic organizations, civil rights advocates, local and national organizers, and public and high-profile figures who are working to advance social, racial, political, and economic justice in our communities and our world.
If you are an organization that needs help, or an expert who can provide it, visit their website.
The OpenBSD 6.3 release is out. “The release was scheduled for April
15, but since all the components are ready ahead of schedule it is being
released now.” This release includes mitigation for the Meltdown
vulnerability but not for Spectre on x86.
Linus has released the 4.16 kernel, as
expected. “We had a number of fixes and cleanups elsewhere, but none
of it made me go ‘uhhuh, better let this soak for another week’.”
Some of the headline changes in this release include initial support for the Jailhouse hypervisor, the usercopy whitelisting
hardening patches, some improvements to the deadline scheduler and, of
course, a lot of Meltdown and Spectre mitigation work.
Version 6.0.0 of the LLVM compiler suite is out.
“This release is the result of the community’s work over the past six
months, including: retpoline Spectre variant 2 mitigation,
significantly improved CodeView debug info for Windows, GlobalISel by
default for AArch64 at -O0, improved scheduling on several x86
micro-architectures, Clang defaults to -std=gnu++14 instead of
-std=gnu++98, support for some upcoming C++2a features, improved
optimizations, new compiler warnings, many bug fixes, and more.”
DDoS vandals have long intensified their attacks by sending a small number of specially designed data packets to publicly available services. The services then unwittingly respond by sending a much larger number of unwanted packets to a target. The best known vectors for these DDoS amplification attacks are poorly secured domain name system resolution servers, which magnify volumes by as much as 50-fold, and network time protocol servers, which increase volumes by about 58 times.
On Tuesday, researchers reported attackers are abusing a previously obscure method that delivers attacks 51,000 times their original size, making it by far the biggest amplification method ever used in the wild. The vector this time is memcached, a database caching system for speeding up websites and networks. Over the past week, attackers have started abusing it to deliver DDoSes with volumes of 500 gigabits per second and bigger, DDoS mitigation service Arbor Networks reported in a blog post.
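To put that figure in perspective, the amplification factor is just the ratio of response bytes to request bytes. A back-of-the-envelope sketch in Python (the ~15-byte request and ~750 kB response sizes are commonly cited illustrative values for the memcached vector, not numbers taken from Arbor's post):

    # Bandwidth amplification factor (BAF) of a reflection attack:
    #   BAF = bytes delivered to the victim / bytes sent by the attacker
    def amplification_factor(request_bytes: int, response_bytes: int) -> float:
        return response_bytes / request_bytes

    # Illustrative sizes: a ~15-byte spoofed "get" request pulling a ~750 kB cached value.
    print(f"{amplification_factor(15, 750 * 1024):,.0f}x")  # ~51,200x, roughly the reported 51,000x

The usual countermeasure for this particular vector is simply not exposing memcached to the Internet at all and, where possible, disabling its UDP listener.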
Product security is an interesting animal: it is a uniquely cross-disciplinary endeavor that spans policy, consulting,
process automation, in-depth software engineering, and cutting-edge vulnerability research. And in contrast to many
other specializations in our field of expertise – say, incident response or network security – we have virtually no
time-tested and coherent frameworks for setting it up within a company of any size.
In my previous post, I shared some thoughts
on nurturing technical organizations and cultivating the right kind of leadership within. Today, I figured it would
be fitting to follow up with several notes on what I learned about structuring product security work – and about actually
making the effort count.
The “comfort zone” trap
For security engineers, knowing your limits is a sought-after quality: there is nothing more dangerous than a security
expert who goes off script and starts dispensing authoritative-sounding but bogus advice on a topic they know very
little about. But that same quality can be destructive when it prevents us from growing beyond our most familiar role: that of
a critic who pokes holes in other people’s designs.
The role of a resident security critic lends itself all too easily to a sense of supremacy: the mistaken
belief that our cognitive skills exceed the capabilities of the engineers and product managers who come to us for help
– and that the cool bugs we file are the ultimate proof of our special gift. We start taking pride in the mere act
of breaking somebody else’s software – and then write scathing but ineffectual critiques addressed to executives,
demanding that they either put a stop to a project or sign off on a risk. And hey, in the latter case, they better
brace for our triumphant “I told you so” at some later date.
Of course, escalations of this type have their place, but they need to be a very rare sight; when practiced routinely, they are a telltale
sign of a dysfunctional team. We might be failing to think up viable alternatives that are in tune with business or engineering needs; we might
be very unpersuasive, failing to communicate with other rational people in a language they understand; or it might be that our tolerance for risk
is badly out of whack with the rest of the company. Whatever the cause, I’ve seen high-level escalations where the security team
spoke of valiant efforts to resist inexplicably awful design decisions or data sharing setups; and where product leads in turn talked about
pressing business needs randomly blocked by obstinate security folks. Sometimes, simply having them compare their notes would be enough to arrive
at a technical solution – such as sharing a less sensitive subset of the data at hand.
To be effective, any product security program must be rooted in a partnership with the rest of the company, focused on helping them get stuff done
while eliminating or reducing security risks. To combat the toxic us-versus-them mentality, I found it helpful to have some team members with
software engineering backgrounds, even if it’s just the ownership of a small open-source project. This can broaden our horizons, helping us see
that we all make the same mistakes – and that not every solution that sounds good on paper is usable once we code it up.
Getting off the treadmill
All security programs involve a good chunk of operational work. For product security, this can be a combination of product launch reviews, design consulting requests, incoming bug reports, or compliance-driven assessments of some sort. And curiously, such reactive work also has the property of gradually expanding to consume all the available resources on a team: next year is bound to bring even more review requests, even more regulatory hurdles, and even more incoming bugs to triage and fix.
Being more tractable, such routine tasks are also more readily enshrined in SDLs, SLAs, and all kinds of other official documents that are often mistaken for a mission statement that justifies the existence of our teams. Soon, instead of explaining to a developer why they should fix a particular problem right away, we end up pointing them to page 17 in our severity classification guideline, which defines that “severity 2” vulnerabilities need to be resolved within a month. Meanwhile, another policy may be telling them that they need to run a fuzzer or a web application scanner for a particular number of CPU-hours – no matter whether it makes sense or whether the job is set up right.
To run a product security program that scales sublinearly, stays abreast of future threats, and doesn’t erect bureaucratic speed bumps just for the sake of it, we need to recognize this inherent tendency for operational work to take over – and we need to rein it in. No matter what the last year’s policy says, we usually don’t need to be doing security reviews with a particular cadence or to a particular depth; if we need to scale them back 10% to staff a two-quarter project that fixes an important API and squashes an entire class of bugs, it’s a short-term risk we should feel empowered to take.
As noted in my earlier post, I find contingency planning to be a valuable tool in this regard: why not ask ourselves how the team would cope if the workload went up another 30%, but bad financial results precluded any team growth? It’s actually fun to think about such hypotheticals ahead of time – and hey, if the ideas sound good, why not try them out today?
Living for a cause
It can be difficult to understand if our security efforts are structured and prioritized right; when faced with such uncertainty, it is natural to stick to the safe fundamentals – investing most of our resources into the very same things that everybody else in our industry appears to be focusing on today.
I think it’s important to combat this mindset – and if so, we might as well tackle it head on. Rather than focusing on tactical objectives and policy documents, try to write down a concise mission statement explaining why you are a team in the first place, what specific business outcomes you are aiming for, how you prioritize that work, and how you want it all to change in a year or two. It should be a fluid narrative that reads right and that everybody on your team can take pride in; my favorite way of starting the conversation is telling folks that we could always have a new VP tomorrow – and that the VP’s first order of business could be asking, “why do you have so many people here and how do I know they are doing the right thing?”. It’s a playful but realistic framing device that motivates people to get it done.
In general, a comprehensive product security program should probably start with the assumption that no matter how many resources we have at our disposal, we will never be able to stay in the loop on everything that’s happening across the company – and even if we did, we’re not going to be able to catch every single bug. It follows that one of our top priorities for the team should be making sure that bugs don’t happen very often; a scalable way of getting there is equipping engineers with intuitive and usable tools that make it easy to perform common tasks without having to worry about security at all. Examples include standardized, managed containers for production jobs; safe-by-default APIs, such as strict contextual autoescaping for XSS or type safety for SQL; security-conscious style guidelines; or plug-and-play libraries that take care of common crypto or ACL enforcement tasks.
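To make the “type safety for SQL” point concrete, here is a minimal sketch (my own illustration, not any particular company’s framework) contrasting string interpolation with a parameterized query:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
    conn.execute("INSERT INTO users VALUES ('alice', 'admin')")

    user_input = "alice' OR '1'='1"  # attacker-controlled string

    # Unsafe: the input becomes part of the SQL grammar and changes the query's meaning.
    unsafe = f"SELECT role FROM users WHERE name = '{user_input}'"
    print(conn.execute(unsafe).fetchall())  # leaks rows it shouldn't

    # Safe by default: the placeholder keeps data and code separate, so the engineer
    # never has to think about escaping at all.
    print(conn.execute("SELECT role FROM users WHERE name = ?", (user_input,)).fetchall())  # []

The point isn’t this specific API; it’s that when the easy, default way of doing a task is also the safe way, whole classes of bugs stop showing up in review queues.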
Of course, not all problems can be addressed on framework level, and not every engineer will always reach for the right tools. Because of this, the next principle that I found to be worth focusing on is containment and mitigation: making sure that bugs are difficult to exploit when they happen, or that the damage is kept in check. The solutions in this space can range from low-level enhancements (say, hardened allocators or seccomp-bpf sandboxes) to client-facing features such as browser origin isolation or Content Security Policy.
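As a concrete example of a client-facing mitigation, Content Security Policy is ultimately just a response header set by the application; a minimal sketch (hypothetical policy values, using Flask purely for illustration):

    from flask import Flask

    app = Flask(__name__)

    @app.after_request
    def add_csp(response):
        # Deny everything by default, then allow scripts, styles, and images only from
        # our own origin; even if an XSS bug slips through, injected inline script won't run.
        response.headers["Content-Security-Policy"] = (
            "default-src 'none'; script-src 'self'; style-src 'self'; img-src 'self'"
        )
        return response

    @app.route("/")
    def index():
        return "hello"

Like the sandboxing examples, this doesn’t prevent the bug; it limits how much damage the bug can do.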
The usual consulting, review, and outreach tasks are an important facet of a product security program, but probably shouldn’t be the sole focus of your team. It’s also best to avoid undue emphasis on vulnerability showmanship: while valuable in some contexts, it creates a hypercompetitive environment that may be hostile to less experienced team members – not to mention, squashing individual bugs offers very limited value if the same issue is likely to be reintroduced into the codebase the next day. I like to think of security reviews as a teaching opportunity instead: it’s a way to raise awareness, form partnerships with engineers, and help them develop lasting habits that reduce the incidence of bugs. Metrics to understand the impact of your work are important, too; if your engagements are seen mostly as yet another layer of red tape, product teams will stop reaching out to you for advice.
The other tenet of a healthy product security effort requires us to recognize that, at scale and given enough time, every defense mechanism is bound to fail – and so, we need ways to prevent bugs from turning into incidents. The efforts in this space may range from developing product-specific signals for the incident response and monitoring teams; to offering meaningful vulnerability reward programs and nourishing a healthy and respectful relationship with the research community; to organizing regular offensive exercises in hopes of spotting bugs before anybody else does.
Oh, one final note: an important feature of a healthy security program is the existence of multiple feedback loops that help you spot problems without the need to micromanage the organization and without being deathly afraid of taking chances. For example, the data coming from bug bounty programs, if analyzed correctly, offers a wonderful way to alert you to systemic problems in your codebase – and later on, to measure the impact of any remediation and hardening work.
This is part one of a series. The second part will be posted later this week.
Though most of us have never set foot inside of a data center, as citizens of a data-driven world we nonetheless depend on the services that data centers provide almost as much as we depend on a reliable water supply, the electrical grid, and the highway system. Every time we send a tweet, post to Facebook, check our bank balance or credit score, watch a YouTube video, or back up a computer to the cloud we are interacting with a data center.
In this series, The Challenges of Opening a Data Center, we’ll talk in general terms about the factors that an organization needs to consider when opening a data center and the challenges that must be met in the process. Many of the factors to consider will be similar for opening a private data center or seeking space in a public data center, but we’ll assume for the sake of this discussion that our needs are more modest than requiring a data center dedicated solely to our own use (i.e. we’re not Google, Facebook, or China Telecom).
Data center technology and management are changing rapidly, with new approaches to design and operation appearing every year. This means we won’t be able to cover everything happening in the world of data centers in our series, but we hope our brief overview proves useful.
What is a Data Center?
A data center is the structure that houses a large group of networked computer servers typically used by businesses, governments, and organizations for the remote storage, processing, or distribution of large amounts of data.
While many organizations will have computing services in the same location as their offices that support their day-to-day operations, a data center is a structure dedicated to 24/7 large-scale data processing and handling.
Depending on how you define the term, there are anywhere from half a million to many millions of data centers in the world. While it’s possible to say that an organization’s on-site servers and data storage can be called a data center, in this discussion we are using the term data center to refer to facilities that are expressly dedicated to housing computer systems and associated components, such as telecommunications and storage systems. The facility might be a private center, which is owned or leased by one tenant only, or a shared data center that offers what are called “colocation services,” and rents space, services, and equipment to multiple tenants in the center.
A large, modern data center operates around the clock, placing a priority on providing secure and uninterrupted service, and generally includes redundant or backup power systems or supplies, redundant data communication connections, environmental controls, fire suppression systems, and numerous security devices. Such a center is an industrial-scale operation often using as much electricity as a small town.
Types of Data Centers
There are a number of ways to classify data centers: according to how they will be used, whether they are owned or used by one or multiple organizations, whether and how they fit into a topology of other data centers, which technologies and management approaches they use for computing, storage, cooling, power, and operations, and, increasingly visible these days, how green they are.
Data centers can be loosely classified into three types according to who owns them and who uses them.
Exclusive Data Centers are facilities wholly built, maintained, operated and managed by the business for the optimal operation of its IT equipment. Some of these centers are run by well-known companies such as Facebook, Google, or Microsoft, while others belong to less public-facing big telecoms, insurance companies, or other service providers.
Managed Hosting Providers are data centers managed by a third party on behalf of a business. The business does not own the data center or space within it. Rather, the business rents the IT equipment and infrastructure it needs instead of purchasing it outright.
Colocation Data Centers are usually large facilities built to accommodate multiple businesses within the center. The business rents its own space within the data center and subsequently fills the space with its IT equipment, or possibly uses equipment provided by the data center operator.
Backblaze, for example, doesn’t own its own data centers but colocates in data centers owned by others. As Backblaze’s storage needs grow, Backblaze increases the space it uses within a given data center and/or expands to other data centers in the same or different geographic areas.
Availability is Key
When designing or selecting a data center, an organization needs to decide what level of availability is required for its services. The type of business or service it provides likely will dictate this. Any organization that provides real-time and/or critical data services will need the highest level of availability and redundancy, as well as the ability to rapidly failover (transfer operation to another center) when and if required. Some organizations require multiple data centers not just to handle the computer or storage capacity they use, but to provide alternate locations for operation if something should happen temporarily or permanently to one or more of their centers.
Organizations that can’t afford any downtime at all will typically operate data centers with a mirrored site that can take over if something happens to the first site, or they operate a second site in parallel with the first one. These data center topologies are called Active/Passive and Active/Active, respectively. Should disaster or an outage occur, disaster mode would dictate immediately moving all of the primary data center’s processing to the second data center.
While some data center topologies are spread throughout a single country or continent, others extend around the world. In practice, data transmission speeds put a cap on how far apart centers can be while still operating in parallel with the appearance of simultaneous operation. Linking two data centers located within roughly 60 miles of each other (to limit data latency issues) with dark fiber (leased fiber optic cable) could enable both data centers to be operated as if they were in the same location, reducing staffing requirements yet providing immediate failover to the secondary data center if needed.
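To see why distances on the order of tens of miles matter, here is a rough latency sketch (assuming the standard rule of thumb that light in fiber travels at about two-thirds of its speed in a vacuum, and ignoring equipment and protocol overhead):

    # One-way propagation delay over dark fiber between two data centers.
    SPEED_OF_LIGHT_KM_S = 300_000
    FIBER_FRACTION = 2 / 3  # light in glass travels at roughly 2/3 c

    def one_way_delay_ms(distance_km: float) -> float:
        return distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FRACTION) * 1000

    for miles in (60, 600):
        km = miles * 1.609
        print(f"{miles:>4} miles: ~{one_way_delay_ms(km):.2f} ms one way, "
              f"~{2 * one_way_delay_ms(km):.2f} ms round trip")

At 60 miles the round trip stays under a millisecond, which is why two sites that close together can behave as one; at ten times the distance the delay grows tenfold and synchronous operation becomes much harder.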
This redundancy of facilities and ensured availability is of paramount importance to those needing uninterrupted data center services.
Leadership in Energy and Environmental Design (LEED) is a rating system devised by the United States Green Building Council (USGBC) for the design, construction, and operation of green buildings. Facilities can achieve ratings of certified, silver, gold, or platinum based on criteria within six categories: sustainable sites, water efficiency, energy and atmosphere, materials and resources, indoor environmental quality, and innovation and design.
Green certification has become increasingly important in data center design and operation as data centers require great amounts of electricity and often cooling water to operate. Green technologies can reduce costs for data center operation, as well as make data centers more acceptable to environmentally conscious communities.
The ACT, Inc. data center in Iowa City, Iowa was the first data center in the U.S. to receive LEED-Platinum certification, the highest level available.
ACT Data Center exterior
ACT Data Center interior
Factors to Consider When Selecting a Data Center
There are numerous factors to consider when deciding to build or to occupy space in a data center. Aspects such as proximity to available power grids, telecommunications infrastructure, networking services, transportation lines, and emergency services can affect costs, risk, security and other factors that need to be taken into consideration.
The size of the data center will be dictated by the business requirements of the owner or tenant. A data center can occupy one room of a building, one or more floors, or an entire building. Most of the equipment is often in the form of servers mounted in 19 inch rack cabinets, which are usually placed in single rows forming corridors (so-called aisles) between them. This allows staff access to the front and rear of each cabinet. Servers differ greatly in size from 1U servers (i.e. one “U” or “RU” rack unit measuring 44.45 millimeters or 1.75 inches), to Backblaze’s Storage Pod design that fits a 4U chassis, to large freestanding storage silos that occupy many square feet of floor space.
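A quick sketch of how rack units translate into physical space (the 42U rack height used here is an assumption for illustration; actual rack sizes vary):

    RACK_UNIT_MM = 44.45  # 1U = 1.75 inches

    def chassis_height_mm(units: int) -> float:
        return units * RACK_UNIT_MM

    rack_units = 42                  # assumed full-height rack
    print(chassis_height_mm(4))      # ~177.8 mm for one 4U chassis
    print(rack_units // 4)           # 10 such chassis per rack, with 2U to spare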
Location will be one of the biggest factors to consider when selecting a data center and encompasses many other factors that should be taken into account, such as geological risks, neighboring uses, and even local flight paths. Access to suitable available power at a suitable price point is often the most critical factor and the longest lead time item, followed by broadband service availability.
With more and more data centers available providing varied levels of service and cost, the choices increase each year. Data center brokers can be employed to find a data center, just as one might use a broker for home or other commercial real estate.
Websites listing available colocation space, such as upstack.io, or entire data centers for sale or lease, are widely used. A common practice is for a customer to publish its data center requirements, and the vendors compete to provide the most attractive bid in a reverse auction.
Business and Customer Proximity
The center’s closeness to a business or organization may or may not be a factor in the site selection. The organization might wish to be close enough to manage the center or supervise the on-site staff from a nearby business location. The location of customers might be a factor, especially if data transmission speeds and latency are important, or the business or customers have regulatory, political, tax, or other considerations that dictate areas suitable or not suitable for the storage and processing of data.
Local climate is a major factor in data center design because the climatic conditions dictate what cooling technologies should be deployed. In turn this impacts uptime and the costs associated with cooling, which can account for 50% or more of a center’s power costs. The topology and the cost of managing a data center in a warm, humid climate will vary greatly from managing one in a cool, dry climate. Nevertheless, data centers are located in both extremely cold regions and extremely hot ones, with innovative approaches used in both extremes to maintain desired temperatures within the center.
Geographic Stability and Extreme Weather Events
An obvious major factor in locating a data center is the stability of the actual site as regards seismic activity and the likelihood of extreme weather events such as hurricanes, as well as fire or flooding.
Backblaze’s Sacramento data center describes its location as one of the most stable geographic locations in California, outside fault zones and floodplains.
Sometimes the location of the center comes first and the facility is hardened to withstand anticipated threats, such as Equinix’s NAP of the Americas data center in Miami, one of the largest single-building data centers on the planet (six stories and 750,000 square feet), which is built 32 feet above sea level and designed to withstand category 5 hurricane winds.
Equinix “NAP of the Americas” Data Center in Miami
Most data centers don’t have the extreme protection or history of the Bahnhof data center, which is located inside the ultra-secure former nuclear bunker Pionen, in Stockholm, Sweden. It is buried 100 feet below ground inside the White Mountains and secured behind 15.7 in. thick metal doors. It prides itself on its self-described “Bond villain” ambiance.
Bahnhof Data Center under White Mountain in Stockholm
Usually, the data center owner or tenant will want to take into account the balance between cost and risk in the selection of a location; the ideal combination of low cost and low risk is obviously favored when making this compromise.
Risk mitigation also plays a strong role in pricing. The extent to which providers must implement special building techniques and operating technologies to protect the facility will affect price. When selecting a data center, organizations must make note of the data center’s certification level on the basis of regulatory requirements in the industry. These certifications can ensure that an organization is meeting necessary compliance requirements.
Electrical power usually represents the largest cost in a data center. The cost a service provider pays for power will be affected by the source of the power, the regulatory environment, the facility size and the rate concessions, if any, offered by the utility. At higher tiers, battery, generator, and redundant power grids are a required part of the picture.
Fault tolerance and power redundancy are absolutely necessary to maintain uninterrupted data center operation. Parallel redundancy is a safeguard that ensures an uninterruptible power supply (UPS) system is in place to provide electrical power if necessary. The UPS system can be based on batteries, saved kinetic energy, or some type of generator using diesel or another fuel. The center operates on its primary UPS system while a second UPS system stands by as a backup; if a power outage occurs, the additional system takes over.
Many data centers require the use of independent power grids, with service provided by different utility companies or services, to protect against loss of electrical service no matter what the cause. Some data centers have intentionally located themselves near national borders so that they can obtain redundant power from not just separate grids, but from separate geopolitical sources.
Higher redundancy levels required by a company will invariably lead to higher prices. If one company requires high availability backed by a service-level agreement (SLA), it can expect to pay more than another company with less demanding redundancy requirements.
Stay Tuned for Part 2 of The Challenges of Opening a Data Center
That’s it for part 1 of this post. In subsequent posts, we’ll take a look at some other factors to consider when moving into a data center such as network bandwidth, cooling, and security. We’ll take a look at what is involved in moving into a new data center (including stories from Backblaze’s experiences). We’ll also investigate what it takes to keep a data center running, and some of the new technologies and trends affecting data center design and use. You can discover all posts on our blog tagged with “Data Center” by following the link https://www.backblaze.com/blog/tag/data-center/.
The second part of this series on The Challenges of Opening a Data Center will be posted later this week.
Researchers have discovered new variants of Spectre and Meltdown. The software mitigations for Spectre and Meltdown seem to block these variants, although the eventual CPU fixes will have to be expanded to account for these new attacks.
The initial panic over the Meltdown and Spectre processor vulnerabilities
has faded, and work on mitigations in the kernel has slowed since our mid-January report. That work has not
stopped, though. Fully equipping the kernel to protect systems from these
vulnerabilities is a task that may well require years. Read on for an
update on the current status of that work.
Linus has released the 4.15 kernel.
“After a release cycle that was unusual in so many (bad) ways, this
last week was really pleasant. Quiet and small, and no last-minute
panics, just small fixes for various issues. I never got a feeling
that I’d need to extend things by yet another week, and 4.15 looks
fine to me.”
Some of the more significant features in this release include:
the long-awaited CPU controller for the
version-2 control-group interface,
significant live-patching improvements,
initial support for the RISC-V architecture,
support for AMD’s secure encrypted virtualization feature, and
the MAP_SYNC mechanism for working
with nonvolatile memory.
This release also, of course, includes mitigations for the Meltdown and Spectre variant-2
vulnerabilities though, as Linus points out in the announcement, the
work of dealing with these issues is not yet done.
By now, almost everybody has probably seen the press coverage of Linus Torvalds’s remarks about one of the
patches addressing Spectre variant 2. Less noted, but much more
informative, is David Woodhouse’s response
on why those patches are the way they are. “That’s why my initial
idea, as implemented in this RFC patchset, was to stick with IBRS on
Skylake, and use retpoline everywhere else. I’ll give you ‘garbage
patches’, but they weren’t being ‘just mindlessly sent around’. If we’re
going to drop IBRS support and accept the caveats, then let’s do it as a
conscious decision having seen what it would look like, not just drop it
quietly because poor Davey is too scared that Linus might shout at him.”
While some aspects of the kernel’s defenses against the Meltdown and
Spectre vulnerabilities were more-or-less in place when the problems were
disclosed on January 3, others were less fully formed. Additionally,
many of the mitigations (especially for the two Spectre variants) had not
been seen in public prior to the disclosure, meaning that there was a lot
of scope for discussion once they came out. Many of those discussions are
slowing down, and the kernel’s initial response has mostly come into
focus. The 4.15 kernel will include a broad set of mitigations, while some
others will have to wait for later; read on
for details on where things stand.
When the Meltdown and Spectre vulnerabilities were disclosed on
January 3, attention quickly turned to mitigations. There was already
a clear defense against Meltdown in the form of kernel page-table isolation (KPTI), but the
defenses against the two Spectre variants had not been developed in public and still
do not exist in the mainline kernel. Initial versions of proposed
defenses have now been disclosed. The resulting picture shows what has
been done to fend off Spectre-based attacks in the near future, but the
situation remains chaotic, to put it lightly.
One of the main concerns about the mitigations for the Meltdown/Spectre speculative execution bugs has been performance. The Google Security Blog is reporting negligible performance impact on Google systems for two of the mitigations (kernel page-table isolation and Retpoline): “In response to the vulnerabilities that were discovered we developed a novel mitigation called “Retpoline” — a binary modification technique that protects against “branch target injection” attacks. We shared Retpoline with our industry partners and have deployed it on Google’s systems, where we have observed negligible impact on performance.
In addition, we have deployed Kernel Page Table Isolation (KPTI) — a general purpose technique for better protecting sensitive information in memory from other software running on a machine — to the entire fleet of Google Linux production servers that support all of our products, including Search, Gmail, YouTube, and Google Cloud Platform.
There has been speculation that the deployment of KPTI causes significant performance slowdowns. Performance can vary, as the impact of the KPTI mitigations depends on the rate of system calls made by an application. On most of our workloads, including our cloud infrastructure, we see negligible impact on performance.”
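A crude way to get a feel for why the impact tracks the system-call rate is to time a syscall-heavy loop yourself; a minimal sketch (my own illustration, not Google's methodology), which can be compared on kernels with KPTI enabled and disabled:

    import os, time

    def syscalls_per_second(n: int = 200_000) -> float:
        """Time a tight loop of os.stat() calls, each of which enters the kernel."""
        start = time.perf_counter()
        for _ in range(n):
            os.stat("/")
        return n / (time.perf_counter() - start)

    print(f"~{syscalls_per_second():,.0f} stat() calls per second")

KPTI adds a small fixed cost to every entry into the kernel, so a loop like this shows something close to the worst case, while workloads that rarely make system calls see correspondingly little change.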
Greg Kroah-Hartman has announced the release of the 4.14.12, 4.9.75, and 4.4.110 stable kernels. The bulk of the
changes are either to fix the mitigations for Meltdown/Spectre (in 4.14.12) or to backport
those mitigations (in the two older kernels). There are apparently known (or
suspected) problems with
each of the releases, which Kroah-Hartman is hoping to get shaken out in
the near term. For example, the 4.4.110 announcement warns: “But be
careful, there have been some reports of problems with this
release during the -rc review cycle. Hopefully all of those issues are now resolved.
So please test, as of right now, it should be ‘bug compatible’ with the
‘enterprise’ kernel releases with regards to the Meltdown bug and proper
support on all virtual platforms (meaning there is still a vdso issue
that might trip up some old binaries, again, please test!)”
Jann Horn has reported eight bugs in the
eBPF verifier, one for the 4.9 kernel and seven introduced in 4.14, to the
oss-security mailing list. Some
of these bugs result in eBPF programs being able to read and write arbitrary
kernel memory, and thus can be used for a variety of ill effects, including
privilege escalation. As Ben Hutchings notes,
one mitigation would be to disable unprivileged access to BPF using the
following sysctl: kernel.unprivileged_bpf_disabled=1. More information can also be found
in this Project
Zero bug entry. The fixes are not yet in the mainline tree, but are in
the netdev tree. Hutchings goes on to say: “There is a public
exploit that uses several of these bugs to get root privileges. It doesn’t
work as-is on stretch [Debian 9] with the Linux 4.9 kernel, but is easy to adapt. I
recommend applying the above mitigation as soon as possible to all systems
running Linux 4.4 or later.”
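For administrators who want to check or apply that mitigation programmatically, the same knob is exposed under /proc; a small sketch (changing the value needs root, and on many kernels the setting cannot be flipped back to 0 without a reboot):

    from pathlib import Path

    KNOB = Path("/proc/sys/kernel/unprivileged_bpf_disabled")

    current = KNOB.read_text().strip()
    print(f"unprivileged_bpf_disabled = {current}")

    if current == "0":
        KNOB.write_text("1\n")  # same effect as `sysctl kernel.unprivileged_bpf_disabled=1`
        print("unprivileged BPF is now disabled")

To make the setting persist across reboots, it would normally also go into a sysctl configuration file.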
(Note: all discussion here is based on publicly disclosed information, and I am not speaking on behalf of my employers)
I wrote about the potential impact of the most recent Intel ME vulnerabilities a couple of weeks ago. The details of the vulnerability were released last week, and it’s not absolutely the worst case scenario but it’s still pretty bad. The short version is that one of the (signed) pieces of early bringup code for the ME reads an unsigned file from flash and parses it. Providing a malformed file could result in a buffer overflow, and a moderately complicated exploit chain could be built that allowed the ME’s exploit mitigation features to be bypassed, resulting in arbitrary code execution on the ME.
Getting this file into flash in the first place is the difficult bit. The ME region shouldn’t be writable at OS runtime, so the most practical way for an attacker to achieve this is to physically disassemble the machine and directly reprogram it. The AMT management interface may provide a vector for a remote attacker to achieve this – for this to be possible, AMT must be enabled and provisioned and the attacker must have valid credentials. Most systems don’t have provisioned AMT, so most users don’t have to worry about this.
Overall, for most end users there’s little to worry about here. But the story changes for corporate users or high value targets who rely on TPM-backed disk encryption. The way the TPM protects access to the disk encryption key is to insist that a series of “measurements” are correct before giving the OS access to the disk encryption key. The first of these measurements is obtained through the ME hashing the first chunk of the system firmware and passing that to the TPM, with the firmware then hashing each component in turn and storing those in the TPM as well. If someone compromises a later point of the chain then the previous step will generate a different measurement, preventing the TPM from releasing the secret.
However, if the first step in the chain can be compromised, all these guarantees vanish. And since the first step in the chain relies on the ME to be running uncompromised code, this vulnerability allows that to be circumvented. The attacker’s malicious code can be used to pass the “good” hash to the TPM even if the rest of the firmware has been tampered with. This allows a sufficiently skilled attacker to extract the disk encryption key and read the contents of the disk.
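The measurement chain described above is essentially a chained hash: each boot component is folded into a TPM Platform Configuration Register (PCR) with an "extend" operation, so a change anywhere along the chain changes the final value. A simplified sketch of the idea (illustrative only; real PCR extends operate on digests of actual firmware blobs via the TPM's hash banks, not strings):

    import hashlib

    def extend(pcr: bytes, measurement: bytes) -> bytes:
        # PCR_new = SHA-256(PCR_old || SHA-256(measurement))
        return hashlib.sha256(pcr + hashlib.sha256(measurement).digest()).digest()

    pcr = bytes(32)  # PCRs start out as all zeros
    for component in (b"initial firmware", b"bootloader", b"kernel"):
        pcr = extend(pcr, component)

    print(pcr.hex())  # the TPM releases the disk key only if this matches the sealed value

The guarantee collapses exactly as described: if the very first measurement is produced by compromised ME code, that code can feed the "good" value into the chain regardless of what firmware actually ran afterwards.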
In addition, TPMs can be used to perform something called “remote attestation”. This allows the TPM to provide a signed copy of the recorded measurements to a remote service, allowing that service to make a policy decision around whether or not to grant access to a resource. Enterprises using remote attestation to verify that systems are appropriately patched (eg) before they allow them access to sensitive material can no longer depend on those results being accurate.
Things are even worse for people relying on Intel’s Platform Trust Technology (PTT), which is an implementation of a TPM that runs on the ME itself. Since this vulnerability allows full access to the ME, an attacker can obtain all the private key material held in the PTT implementation and, effectively, adopt the machine’s cryptographic identity. This allows them to impersonate the system with arbitrary measurements whenever they want to. This basically renders PTT worthless from an enterprise perspective – unless you’ve maintained physical control of a machine for its entire lifetime, you have no way of knowing whether it’s had its private keys extracted and so you have no way of knowing whether the attestation attempt is coming from the machine or from an attacker pretending to be that machine.
Bootguard, the component of the ME that’s responsible for measuring the firmware into the TPM, is also responsible for verifying that the firmware has an appropriate cryptographic signature. Since that can be bypassed, an attacker can reflash modified firmware that can do pretty much anything. Yes, that probably means you can use this vulnerability to install Coreboot on a system locked down using Bootguard.
(An aside: The Titan security chips used in Google Cloud Platform sit between the chipset and the flash and verify the flash before permitting anything to start reading from it. If an attacker tampers with the ME firmware, Titan should detect that and prevent the system from booting. However, I’m not involved in the Titan project and don’t know exactly how this works, so don’t take my word for this)
Intel have published an update that fixes the vulnerability, but it’s pretty pointless – there’s apparently no rollback protection in the affected 11.x MEs, so while the attacker is modifying your flash to insert the payload they can just downgrade your ME firmware to a vulnerable version. Version 12 will reportedly include optional rollback protection, which is little comfort to anyone who has current hardware. Basically, anyone whose threat model depends on the low-level security of their Intel system is probably going to have to buy new hardware.
This is a big deal for enterprises and any individuals who may be targeted by skilled attackers who have physical access to their hardware, and entirely irrelevant for almost anybody else. If you don’t know that you should be worried, you shouldn’t be.
 Although admins should bear in mind that any system that hasn’t been patched against CVE-2017-5689 considers an empty authentication cookie to be a valid credential.
 TPMs are not intended to be strongly tamper resistant, so an attacker could also just remove the TPM, decap it and (with some effort) extract the key that way. This is somewhat more time consuming than just reflashing the firmware, so the ME vulnerability still amounts to a change in attack practicality.