“Ironically, one of the safest places to be right now is in a cleanroom,” points out Thomas Sonderman, president of SkyWater Technology, in Bloomington, Minn.
Like every business, semiconductor foundries like SkyWater and GlobalFoundries have had to make some pretty radical changes to their operations in order to keep their workers safe and comply with new government mandates, but there are some unique problems to running a 24/7 chip-making operation.
GlobalFoundries’ COVID-19 plan is basically an evolution of its response to a previous coronavirus outbreak, the 2002–2003 SARS epidemic. When the company acquired Singapore-based Chartered Semiconductor in 2010, it inherited a set of fabs that had managed to produce chips through the worst of that outbreak. (According to the World Health Organization, Singapore suffered 238 SARS cases and 33 deaths.)
“During that period we established business policies, protocols, and health and safety measures to protect our team while maintaining operations,” says Ronald Sampson, GlobalFoundries’ senior vice president and general manager of U.S. fab operations. “That was a successful protocol that served as the basis for this current pandemic that we’re experiencing together now. Since that time we’ve implemented it worldwide and of course in our three U.S. factories.”
At Fab 8 in Malta, N.Y., GlobalFoundries’ most advanced 300-mm CMOS facility, that translates into a host of procedures. Some are commonplace, such as working from home, forbidding travel, limiting visitors, and temperature screening. Others are particular to fab operations. For example, workers are split into two teams that never come into contact with each other; they aren’t in the building on the same day, and they even use separate gowning rooms to enter the cleanroom floor. Those gowning rooms are marked off in roughly 2-meter squares, and no two people are allowed to occupy the same square.
Once employees are suited up and in the cleanroom, they take advantage of it. “It’s one of the cleanest places on earth,” says Sampson. “We’ve moved all of our operations meetings onto the factory floor itself,” instead of having physically separated team members in a conference room.
GlobalFoundries is sharing some of what makes that safety possible, too. It’s assisted healthcare facilities in New York and Vermont, where its U.S. fabs are located, with available personal protective equipment, such as face shields and masks, in addition to making cash donations to local food banks and other causes near its fabs around the world. (SkyWater is currently evaluating what the most significant needs are in its community and whether it is able to play a meaningful role in addressing them.)
But there are plenty of similarities with GlobalFoundries in SkyWater’s current operations, including telecommuting engineers, staggered in-person work shifts, and restricted entry for visitors. There are, of course, few visitors these days. Customers and technology development partners are meeting remotely with SkyWater’s engineers. And many chip-making tools can be monitored by service companies remotely.
(Applied Materials, a major chip equipment maker, says that many customers’ tools are monitored and diagnosed remotely already. The company installs a server in the fab that allows field engineers access to the tools without having to set foot on premises.)
With the whole world in economic upheaval, you might expect that the crisis would lead to some surprises in foundry supply chains. Both GlobalFoundries and SkyWater say they are well prepared. For SkyWater, a relatively small U.S.-based foundry with just one fab, the big reason for that preparedness was the trade war between the United States and China, which began in 2018.
“If you look at the broader supply chain, we’ve been preparing for this since tariffs began,” says Sonderman. The tariffs necessitated a deep dive into the business’s potential vulnerabilities, which has helped guide the response to the current crisis, he says.
At press time no employees of either company had tested positive for the virus. But that situation is likely to change as the virus spreads, and the companies say they will adapt. Like everybody else, “we’re finding our new normal,” says Sampson.
There’s been a lot of intense and well-funded work developing chips that are specially designed to perform AI algorithms faster and more efficiently. The trouble is that it takes years to design a chip, and the universe of machine learning algorithms moves a lot faster than that. Ideally you want a chip that’s optimized to do today’s AI, not the AI of two to five years ago. Google’s solution: have an AI design the AI chip.
“We believe that it is AI itself that will provide the means to shorten the chip design cycle, creating a symbiotic relationship between hardware and AI, with each fueling advances in the other,” the researchers write in a paper describing the work, posted today to arXiv.
“We have already seen that there are algorithms or neural network architectures that… don’t perform as well on existing generations of accelerators, because the accelerators were designed like two years ago, and back then these neural nets didn’t exist,” says Azalia Mirhoseini, a senior research scientist at Google. “If we reduce the design cycle, we can bridge the gap.”
Mirhoseini and senior software engineer Anna Goldie have come up with a neural network that learns to do a particularly time-consuming part of design called placement. After studying chip designs long enough, it can produce a design for a Google Tensor Processing Unit in less than 24 hours that beats several weeks’ worth of design effort by human experts in terms of power, performance, and area.
Placement is so complex and time-consuming because it involves placing blocks of logic and memory, or clusters of those blocks called macros, in a way that minimizes power consumption and chip area while maximizing performance. Heightening the challenge is the requirement that all this happen while obeying rules about the density of interconnects. Goldie and Mirhoseini targeted chip placement because, even with today’s advanced tools, it takes a human expert weeks of iteration to produce an acceptable design.
Goldie and Mirhoseini modeled chip placement as a reinforcement learning problem. Reinforcement learning systems, unlike typical deep learning, do not train on a large set of labeled data. Instead, they learn by doing, adjusting the parameters in their networks according to a reward signal when they succeed. In this case, the reward was a proxy measure of a combination of power reduction, performance improvement, and area reduction. As a result, the placement-bot becomes better at its task the more designs it does.
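As a toy illustration of that formulation (emphatically not Google’s actual system: the tiny grid, the chain netlist, and the plain REINFORCE update are all simplifying assumptions), a learnable policy can place blocks on a grid one at a time and be nudged toward layouts with shorter wirelength, a common proxy for power and performance:

```python
# Toy sketch of placement as reinforcement learning. A softmax policy
# places 4 chained blocks on a 4x4 grid; REINFORCE rewards layouts
# with low Manhattan wirelength (a stand-in for power/performance/area).
import numpy as np

rng = np.random.default_rng(0)
GRID = 4                                   # 4x4 grid of candidate slots
BLOCKS = 4                                 # blocks wired in a chain
logits = np.zeros((BLOCKS, GRID * GRID))   # learnable policy parameters

def sample_placement():
    """Sample one grid slot per block, avoiding collisions."""
    taken, slots = set(), []
    for b in range(BLOCKS):
        p = np.exp(logits[b] - logits[b].max())
        for t in taken:
            p[t] = 0.0                     # forbid occupied slots
        p /= p.sum()
        s = int(rng.choice(GRID * GRID, p=p))
        taken.add(s)
        slots.append(s)
    return slots

def reward(slots):
    """Proxy reward: negative Manhattan wirelength of the chain."""
    xy = [(s % GRID, s // GRID) for s in slots]
    return -sum(abs(a[0] - b[0]) + abs(a[1] - b[1])
                for a, b in zip(xy, xy[1:]))

baseline, lr = 0.0, 0.5
for _ in range(2000):
    slots = sample_placement()
    adv = reward(slots) - baseline         # reward relative to baseline
    baseline += 0.05 * adv                 # running reward baseline
    for b, s in enumerate(slots):          # approximate REINFORCE step
        p = np.exp(logits[b] - logits[b].max())
        p /= p.sum()
        grad = -p
        grad[s] += 1.0                     # d log pi / d logits
        logits[b] += lr * adv * grad

print("learned wirelength:", -reward(sample_placement()))
```

The reward-the-result loop is the key idea: nothing tells the policy *where* blocks go, only how good the finished layout was, and it improves with every placement it tries.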
The team hopes AI systems like theirs will lead to the design of “more chips in the same time period, and also chips that run faster, use less power, cost less to build, and use less area,” says Goldie.
Researchers at Intel and Cornell University report that they’ve made an electronic nose that can learn the scent of a chemical after just one exposure to it and then identify that scent even when it’s masked by others. The system is built around Intel’s neuromorphic research chip, Loihi, and an array of 72 chemical sensors. Loihi was programmed to mimic the workings of neurons in the olfactory bulb, the part of the brain that distinguishes different smells. The system’s inventors say it could one day watch for hazardous substances in the air, sniff out hidden drugs or explosives, or aid in medical diagnoses.
Israeli startup Hailo says it has raised US $60 million in a second round of funding, which it will use to mass-produce its Hailo-8 chip. The Hailo-8 is designed to do deep learning in cars, robots, and other “edge” machines. Such edge chips are meant to reduce the cost, size, and power consumption needs of using AI to process high-resolution information from sensors such as HD cameras.
Dr. Arthur Kreitenberg and his son Elliot got some strange looks when they began the design work for the GermFalcon, a new machine that uses ultraviolet light to wipe out coronavirus and other germs inside an airplane. The father-son founders of Dimer UVC took tape measures with them on flights to unobtrusively record the distances that would form the key design constraints for their system.
“We definitely got lots of looks from passengers and lots of inquiries from flight attendants,” Dr. Kreitenberg recalls. “You can imagine that would cause some attention: taking out a tape measure midflight and measuring armrests. The truth is that when we explained to the flight attendants what we were doing and what we were designing, they [were] really excited about it.”
Honeywell may be a giant industrial technology firm, but it’s definitely not synonymous with advanced computing. Yet the company has made a ten-year commitment to developing an in-house quantum computer, and that commitment is about to start paying off.
“We expect within next three months we will be releasing world’s most powerful quantum computer,” says Tony Uttley, president of Honeywell Quantum Solutions. It’s the kind of claim competitors like IBM and Google have made periodically, but with Honeywell there’s a difference. Those others, using superconducting components chilled to near absolute zero, have been racing to cram more and more qubits onto a chip; Google reached its “quantum supremacy” milestone with 53 qubits. Uttley says Honeywell can beat it with a handful of its ion qubits.
Uttley measures Honeywell’s success using a relatively new metric, pushed by IBM, called quantum volume. It’s essentially a combined measure of the number of physical qubits, how connected they are, and how error-prone they are. IBM claimed a leading quantum volume of 32 using a 28-qubit system in early January. Honeywell’s four-qubit system reached 16, and it will hit 64 in the coming months, says Uttley.
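A quick sketch may make the metric concrete. Quantum volume is defined as 2 raised to the width of the largest "square" random circuit (n qubits at depth n) the machine can run while passing a heavy-output statistical test. The scoring function below follows that definition in spirit; the pass/fail tables are hypothetical, chosen only to match the figures quoted in the article:

```python
# Toy tally of quantum volume: QV = 2**n for the largest width n at
# which random n-qubit, depth-n circuits pass the heavy-output test.
def quantum_volume(passed_widths):
    """passed_widths maps circuit width n -> True/False (did width-n,
    depth-n circuits clear the heavy-output threshold?)."""
    best = 0
    for n, passed in passed_widths.items():
        if passed:
            best = max(best, n)
    return 2 ** best if best else 1

# Hypothetical results consistent with the numbers in the article:
# a 4-qubit machine passing every square circuit it can run -> QV 16,
print(quantum_volume({1: True, 2: True, 3: True, 4: True}))
# while a 28-qubit machine whose errors cap it at width 5 -> QV 32.
print(quantum_volume({n: n <= 5 for n in range(1, 29)}))
```

This is why a four-qubit ion machine can credibly challenge a 28-qubit superconducting one: qubit quality, not qubit count, sets the largest square circuit that still passes.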
The company has an ambitious path toward rapid expansion after that. “We expect to be on a trajectory to increase quantum volume 10-fold every year for the next five years,” he says. IBM is planning to double its figure every year.
Honeywell’s computer uses ytterbium ions trapped in an electromagnetic field in a narrow groove built in a chip. The qubit relies on the spin state of the ion’s outermost electron and that of its nucleus. This can be manipulated by lasers and can hold its state—remain coherent—for a fairly long time compared to other types of qubits. Importantly, the qubits can be moved around on the trap chip, allowing them to interact in the ways quantum algorithms require.
“We chose trapped ions because we believe in these early days of quantum computing, quality of qubit is going to matter most,” says Uttley.
Honeywell is claiming qubits that are so free from corruption that they’ve achieved a first: a “mid-circuit” measurement. That is, the system can interrogate the state of a qubit during a computation without damaging the states of the others, and, based on that observed qubit, it can change what the rest of the computation does. “It’s equivalent to an ‘if’ statement,” explains Uttley. Mid-circuit measurements are not currently possible in other technologies. “It’s theoretically possible,” he says. “But practically speaking, it will be a point of differentiation [for Honeywell] for a while.”
Ion-trap quantum systems were first developed at the U.S. National Institute of Standards and Technology in the 1990s. In 2015, Chris Monroe, a veteran of that group, cofounded the ion-trap quantum computer company IonQ. IonQ has already fit 160 ytterbium-based qubits in its system and performed operations on 79 of them. The startup has published several tests of its system, but not a quantum volume measure.
French startup Cartesiam was founded because of the predicted inundation of IoT sensors and products. Even a few years ago, the idea was that these tens of billions of smart sensors would deliver their data to the cloud. AI and other software there would understand what it meant and trigger the appropriate action.
To Cartesiam’s founders, as to many others in the embedded-systems space, this scheme looked a little ludicrous. “We were thinking: it doesn’t make sense,” says general manager and cofounder Marc Dupaquier. Transporting all that data was expensive in terms of energy and money, it wasn’t secure, it added latency between an event and the needed reaction, and it was a privacy-endangering use of data. So Cartesiam set about building a system that allows ordinary Arm microcontrollers to run a kind of AI called unsupervised learning.
There’s something inherently inefficient about the way video captures motion today. Cameras capture frame after frame at regular intervals, but most of the pixels in those frames don’t change from one to the other, and whatever is moving in those frames is only captured episodically.
Event-based cameras work differently; their pixels only react if they detect a change in the amount of light falling on them. They capture motion better than any other camera, while generating only a small amount of data and burning little power.
Paris-based startup Prophesee has been developing and selling event-based cameras since 2016, but the applications for its chips were limited. That’s because the circuitry surrounding the light-sensing element took up so much space that the imagers had a fairly low resolution. In a partnership announced this week at the IEEE International Solid-State Circuits Conference in San Francisco, Prophesee used Sony technology to put that circuitry on a separate chip that sits behind the pixels.
“Using the Sony process, which is probably the most advanced process, we managed to shrink the pixel pitch down to 4.86 micrometers” from their previous 15 micrometers, says Luca Verre, the company’s cofounder and CEO.
The resulting 1280 x 720 HD event-based imager is suitable for a much wider range of applications. “We want to enter the space of smart home cameras and smart infrastructure, [simultaneous localization and mapping] solutions for AR/VR, 3D sensing for drones or industrial robots,” he says.
The company is also looking to enter the automotive market, where imagers need a high dynamic range to deal with the big differences between day and night driving. “This is where our technology excels,” he says.
Besides the photodiode, each pixel requires circuits to change the diode’s current into a logarithmic voltage and determine if there’s been an increase or decrease in luminosity. It’s that circuitry that Sony’s technology puts on a separate chip that sits behind the pixels and is linked to them by a dense array of copper connections. Previously, the photodiode made up only 25 percent of the area of the pixel; now it’s 77 percent.
When a pixel detects a change (an event), all that is output is the location of the pixel, the polarity of the change, and a 1-microsecond-resolution time stamp. The imager consumes 32 milliwatts to register 100,000 events per second and ramps up to just 73 milliwatts at 300 million events per second. A system that dynamically compresses the event data allows the chip to sustain a rate of more than 1 billion events per second.
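To make that data economy concrete, here is a toy Python sketch of such an event stream. The bit widths are illustrative choices sized to a 1280 x 720 sensor, not Prophesee’s actual wire format:

```python
# Toy encoding of a camera "event": just pixel coordinates, a polarity
# bit (brighter/darker), and a microsecond timestamp -- a few bytes,
# versus a full frame from a conventional imager.
def pack_event(x, y, polarity, t_us):
    """Pack one event into a single integer word.
    Layout (illustrative): 11-bit x | 10-bit y | 1-bit polarity |
    32-bit microsecond timestamp."""
    assert 0 <= x < 1280 and 0 <= y < 720 and polarity in (0, 1)
    return (x << 43) | (y << 33) | (polarity << 32) | (t_us & 0xFFFFFFFF)

def unpack_event(word):
    """Recover (x, y, polarity, t_us) from a packed event word."""
    return (word >> 43,
            (word >> 33) & 0x3FF,
            (word >> 32) & 1,
            word & 0xFFFFFFFF)

# One pixel brightening at t = 123456 us:
evt = pack_event(640, 360, 1, 123456)
assert unpack_event(evt) == (640, 360, 1, 123456)
```

A static scene emits almost nothing, while a conventional HD camera would keep shipping nearly a million pixels per frame regardless; that asymmetry is the source of the power and bandwidth savings.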
For decades, the trend was for more and more of a computer’s systems to be integrated onto a single chip. Today’s system-on-chips, which power smartphones and servers alike, are the result. But complexity and cost are starting to erode the idea that everything should be on a single slice of silicon.
Already, some of the most advanced processors, such as AMD’s Zen 2 processor family, are actually a collection of chiplets bound together by high-bandwidth connections within a single package. This week at the IEEE Solid-State Circuits Conference (ISSCC) in San Francisco, French research organization CEA-Leti showed how far this scheme can go, creating a 96-core processor out of six chiplets.
Using an unrelated technology the company had in development, Eta Compute pivoted toward more traditional neural networks such as deep learning and is reaping the rewards. The West Lake Village, Calif.-based company revealed on Wednesday that its first production chips using that technology are now shipping.
MicroLEDs appear to be, forgive the pun, the bright future of displays. Made of micrometer-scale gallium-nitride LEDs, the technology offers an unmatchable ratio of brightness to power consumption.
The problem is that the most easily manufacturable ones are so small they’re suitable only for augmented reality and similar applications. Making bigger ones, like for a watch display or a smartphone screen, requires the near-perfect transfer of tens of thousands of individual microLEDs per second onto a prefabricated silicon backplane. It’s a very difficult proposition, but Apple and some startups are trying to tackle it.
If New Mexico-based startup iBeam is correct, even larger microLED displays could be produced quickly and cheaply on flexible substrates. “iBeam is a new paradigm in manufacturing for microLEDs,” says Julian Osinski, the startup’s vice president of product technology. “We have a way of growing microLEDs directly on a roll of metal foil, and that’s something nobody else can do.”
iBeam’s technology is adapted from the superconductor manufacturing industry, where a similar process has produced product by the kilometer. Founder and CEO Vladimir Matias is a superconductor manufacturing veteran who saw its potential for producing gallium-nitride devices.
LEDs and other gallium-nitride devices are usually grown atop a silicon or sapphire wafer by a process called epitaxy. For that to work, you need a single crystal, preferably with a similar crystal structure, for the gallium nitride to grow on. The iBeam process can produce that crystal-like substrate on an otherwise amorphous or polycrystalline surface such as metal or glass.
When you deposit material on an amorphous substrate you normally get a film of randomly oriented grains of crystal, explains Matias. But briefly blasting that film with ions from just the right angle gets all the grains to line up. iBeam chooses the film’s material so that the aligned grains match well with gallium nitride’s crystal structure. From there, they grow layers of gallium nitride using standard techniques and fashion them into microLEDs. Quantum dots will then be added to convert the color of some of the microLEDs from their natural blue to red and green.
Just as is done with superconductors, the procedure could be rapidly done in a roll-to-roll fashion, Osinski says. Today’s industry processes produce gallium nitride for about US $2 to $3 per square centimeter. “We’d like to take it down to 10 cents [per square centimeter] or less so it becomes competitive with OLEDs,” he says.
The company has used its existing process to produce microLEDs, and last month it announced the production of high-electron mobility transistors (HEMTs), as well. If the HEMTs could be constructed along with the microLEDs, they could form the circuitry that controls the microLED pixels.
(Kei May Lau’s team at Hong Kong University of Science and Technology developed a structure that integrates the HEMT and microLED so tightly that they effectively become one device.)
The startup’s near-term goal is to produce a small prototype display, which Osinski thinks may take until the end of next year. They hope to have large-scale manufacturing nailed down by 2022. That’s later than some microLED firms are planning to commercialize their products, but iBeam is counting on being able to produce much larger displays at much lower cost. The company plans to sell its manufacturing process and materials to established display makers rather than become a manufacturer itself.
The kind of memory most people are familiar with returns data when given an address for that data. Content addressable memory (CAM) does the reverse: When given a set of data, it returns the address—typically in a single clock cycle—of where to find it. That ability, so useful in network routers and other systems that require a lot of lookups, is now getting a chance in new kinds of data-intensive tasks such as pattern matching and accelerating neural networks, as well as for doing logic operations in the memory itself.
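In software terms, a CAM inverts the usual lookup direction. A minimal behavioral model follows (a sketch only: real CAMs compare the search word against every stored entry in parallel hardware, which is how they answer in a single cycle):

```python
# Behavioral model of a content-addressable memory (CAM).
# RAM:  address -> data.   CAM:  data -> address(es).
class CAM:
    def __init__(self):
        self.words = {}              # address -> stored word

    def write(self, addr, word):
        self.words[addr] = word

    def search(self, word):
        """Return every address holding `word`, or [] on a miss.
        Hardware does this comparison across all entries at once."""
        return [a for a, w in self.words.items() if w == word]

# Router-style use: find which table entry holds a given prefix.
cam = CAM()
cam.write(0x0, "192.168.1.0/24")
cam.write(0x1, "10.0.0.0/8")
assert cam.search("10.0.0.0/8") == [0x1]
assert cam.search("172.16.0.0/12") == []
```

The sequential scan in `search` is exactly the cost that CAM hardware eliminates, which is why the same primitive is attractive for pattern matching and in-memory logic.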
As researchers strive to boost the capacity of quantum computers, they’ve run into a problem that many people have after a big holiday: There’s just not enough room in the fridge.
Today’s quantum-computer processors must operate inside cryogenic enclosures at near absolute zero, but the electronics needed for readout and control don’t work at such temperatures. So those circuits must reside outside the refrigerator. For today’s sub-100-qubit systems, there’s still enough space for specialized cabling to make the connection. But for future million-qubit systems, there just won’t be enough room. Such systems will need ultralow-power control chips that can operate inside the refrigerator. Engineers unveiled some potential solutions in December during the IEEE International Electron Devices Meeting (IEDM), in San Francisco. They ranged from the familiar to the truly exotic.
Perhaps the most straightforward way to make cryogenic controls for quantum computers is to modify CMOS technology. Unsurprisingly, that’s Intel’s solution. The company unveiled a cryogenic CMOS chip called Horse Ridge that translates quantum-computer instructions into basic qubit operations, which it delivers to the processor as microwave signals.
Horse Ridge is designed to work at 4 kelvins, a slightly higher temperature than the qubit chip itself, but low enough to sit inside the refrigerator with it. The company used its 22-nanometer FinFET manufacturing process to build the chip, but the transistors that make up the control circuitry needed substantial reengineering.
“If you take a transistor and cool it to 4 K, it’s not a foregone conclusion that it will work,” says Jim Clarke, director of quantum hardware at Intel. “There are a lot of fundamental characteristics of devices that are temperature dependent.”
At room temperature, microscale relays—another candidate technology for cryogenic control circuits—suffer some mechanical peculiarities. First, ambient oxygen can react with a relay’s electrode surfaces. Over time, this reaction can form a high-resistance layer, limiting the device’s ability to conduct current. But at cryogenic temperatures, oxygen freezes out of the air, so that problem doesn’t exist.
Second, the contacts in microscale relays tend to stick together. This shows up as a hysteresis effect: The relay opens at a slightly different voltage than the one at which it closes. But because the adhesive forces are weaker at cryogenic temperatures, the hysteresis is less than 5 percent of what it is at room temperature.
“We didn’t suspect ahead of time that these devices would operate so well at cryogenic temperatures,” says Liu, who led the research presented at IEDM by her graduate student Xiaoer Hu. “In retrospect, we should have.”
Single-flux quantum logic
Hypres, in Elmsford, N.Y., has been commercializing cryogenic ICs for several years. Seeking to steer its rapid single-flux quantum (RSFQ) logic tech into the realm of quantum computing, the company recently spun out a startup called Seeqc.
Seeqc is now designing an entire system using the technology: a digital-control, error-correction, and readout chip designed to work at 3 to 4 K and a separate chip designed to work at 20 millikelvins to interface with the quantum processor.
Quantum computing is already strange, but it might take some even stranger tech to make it work. Scientists at Lund University, in Sweden, and at IBM Research–Zurich have designed a new device called a Weyl semimetal amplifier that they say could bring readout electronics closer to the qubits. Don’t worry if you don’t know what a Weyl semimetal is. There are things about these materials that even the scientists trying to make devices from them don’t fully understand.
What they do know is that these materials, such as tungsten diphosphide, exhibit extremely strong, temperature-dependent magnetoresistance when chilled to below about 50 K. The device they simulated has a gate electrode that produces a magnetic field inside the Weyl semimetal, causing its resistance to go from tiny to huge in a matter of picoseconds. Connecting the input from a qubit to the device could make a high-gain amplifier that dissipates a mere 40 microwatts. That could be low enough for the amplifier to live in the part of the fridge close to where the qubits themselves reside.
This article appears in the February 2020 print issue as “4 Ways to Handle More Qubits.”
Engineers at Purdue University and at Georgia Tech have constructed the first devices from a new kind of two-dimensional material that combines memory-retaining properties and semiconductor properties. The engineers used a newly discovered ferroelectric semiconductor, alpha indium selenide, in two applications: as the basis of a type of transistor that stores memory as the amount of amplification it produces; and in a two-terminal device that could act as a component in future brain-inspired computers. The latter device was unveiled last month at the IEEE International Electron Devices Meeting in San Francisco.
Ferroelectric materials become polarized in an electric field and retain that polarization even after the field has been removed. Ferroelectric RAM cells in commercial memory chips use the former ability to store data in a capacitor-like structure. Recently, researchers have been trying to coax more tricks from these ferroelectric materials by bringing them into the transistor structure itself or by building other types of devices from them.
In particular, they’ve been embedding ferroelectric materials into a transistor’s gate dielectric, the thin layer that separates the electrode responsible for turning the transistor on and off from the channel through which current flows. Researchers have also been seeking a ferroelectric equivalent of the memristors, or resistive RAM, two-terminal devices that store data as resistance. Such devices, called ferroelectric tunnel junctions, are particularly attractive because they could be made into a very dense memory configuration called a cross-bar array. Many researchers working on neuromorphic- and low-power AI chips use memristors to act as the neural synapses in their networks. But so far, ferroelectric tunnel junction memories have been a problem.
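The appeal of the cross-bar array can be seen with a little linear algebra: if each row-column crossing holds a programmable conductance, then driving the rows with voltages produces, by Ohm’s and Kirchhoff’s laws, column currents equal to a matrix-vector product. A minimal numerical sketch, with arbitrary values:

```python
# Why cross-bar arrays suit neural networks: the array computes an
# analog matrix-vector multiply. Conductances G (the stored weights)
# times row voltages V give summed column currents I = V @ G.
import numpy as np

G = np.array([[0.2, 0.8, 0.1],     # conductance at each crossing,
              [0.5, 0.3, 0.9]])    # e.g. set by a ferroelectric device
V = np.array([1.0, 0.5])           # input voltages driving the rows

I = V @ G                          # column currents = weighted sums
print(I)                           # -> [0.45 0.95 0.55]
```

Each column current is a complete dot product, delivered in one step and in place, which is exactly the operation that dominates neural-network inference.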
“It’s very difficult to do,” says IEEE Fellow Peide Ye, who led the research at Purdue University. Because traditional ferroelectric materials are insulators, when the device is scaled down, there’s too little current passing through, explains Ye. When researchers try to solve that problem by making the ferroelectric layer very thin, the layer loses its ferroelectric properties.
Instead, Ye’s group sought to solve the conductance problem by using a new ferroelectric material—alpha indium selenide—that acts as a semiconductor instead of an insulator. Under the influence of an electric field, its molecules undergo a structural change that holds the polarization. Even better, the material is ferroelectric even as a single-molecule layer that is only about a nanometer thick. “This material is very unique,” says Ye.
Ye’s group made both transistors and memristor-like devices using the semiconductor. The memristor-like device, which they called a ferroelectric-semiconductor junction (FSJ), is just the semiconductor sandwiched between two conductors. This simple configuration could be formed into a dense cross-bar array and potentially shrunk down so that each device is only about 10 nanometers across, says Ye.
Proving the ability to scale the device down is the next goal for the research, along with characterizing how quickly the devices can switch, explains Ye. Further on, his team will look at applications for the FSJ in neuromorphic chips, where researchers have been trying a variety of new devices in the search for the perfect artificial neural synapse.
Artificial intelligence today is much less than it could be, according to Andrew Feldman, CEO and cofounder of AI computer startup Cerebras Systems.
The problem, as he and his fellow Cerebras founders see it, is that today’s artificial neural networks are too time-consuming and compute-intensive to train. For, say, a self-driving car to recognize all the important objects it will encounter on the road, the car’s neural network has to be shown many, many images of all those things. That process happens in a data center where computers consuming tens or sometimes hundreds of kilowatts are dedicated to what is too often a weeks-long task. Assuming the resulting network can carry out the task with the needed accuracy, the many coefficients that define the strength of connections in the network are then downloaded to the car’s computer, which performs the other half of deep learning, called inference.
Cerebras’s customers—and it already has some, despite emerging from stealth mode only this past summer—complain that training runs for big neural networks on today’s computers can take as long as six weeks. At that rate, they are able to train only maybe six neural networks in a year. “The idea is to test more ideas,” says Feldman. “If you can [train a network] instead in 2 or 3 hours, you can run thousands of ideas.”
When IEEE Spectrum visited Cerebras’s headquarters in Los Altos, Calif., those customers and some potential new ones were already pouring their training data into four CS-1 computers through orange-jacketed fiber-optic cables. These 64-centimeter-tall machines churned away, while the heat exhaust of the 20 kilowatts being consumed by each blew out into the Silicon Valley streets through a hole cut into the wall.
The CS-1 computers themselves weren’t much to look at from the outside. Indeed, about three-quarters of each chassis is taken up with the cooling system. What’s inside that last quarter is the real revolution: a hugely powerful computer made up almost entirely of a single chip. But that one chip extends over 46,255 square millimeters—more than 50 times the size of any other processor chip you can buy. With 1.2 trillion transistors, 400,000 processor cores, 18 gigabytes of SRAM, and interconnects capable of moving 100 million billion bits per second, Cerebras’s Wafer Scale Engine (WSE) defies easy comparison with other systems.
The statistics Cerebras quotes are pretty astounding. According to the company, a 10-rack TPU2 cluster—the second of what are now three generations of Google AI computers—consumes five times as much power and takes up 30 times as much space to deliver just one-third of the performance of a single computer with the WSE. Whether a single massive chip is really the answer the AI community has been waiting for should start to become clear this year. “The [neural-network] models are becoming more complex,” says Mike Demler, a senior analyst with the Linley Group, in Mountain View, Calif. “Being able to quickly train or retrain is really important.”
Customers such as supercomputing giant Argonne National Laboratory, near Chicago, already have the machines on their premises, and if Cerebras’s conjecture is true, the number of neural networks doing amazing things will explode.
When the founders of Cerebras—veterans of Sea Micro, a server business acquired by AMD—began meeting in 2015, they wanted to build a computer that perfectly fit the nature of modern AI workloads, explains Feldman. Those workloads are defined by a few things: They need to move a lot of data quickly, they need memory that is close to the processing core, and those cores don’t need to work on data that other cores are crunching.
This suggested a few things immediately to the company’s veteran computer architects, including Gary Lauterbach, its chief technical officer. First, they could use thousands and thousands of small cores designed to do the relevant neural-network computations, as opposed to fewer more general-purpose cores. Second, those cores should be linked together with an interconnect scheme that moves data quickly and at low energy. And finally, all the needed data should be on the processor chip, not in separate memory chips.
The need to move data to and from these cores was, in large part, what led to the WSE’s uniqueness. The fastest, lowest-energy way to move data between two cores is to have them on the same silicon substrate. The moment data has to travel from one chip to another, there’s a huge cost in speed and power because distances are longer and the “wires” that carry the signals must be wider and less densely packed.
The drive to keep all communications on silicon, coupled with the desire for small cores and local memory, all pointed to making as big a chip as possible, maybe one as big as a whole silicon wafer. “It wasn’t obvious we could do that, that’s for sure,” says Feldman. But “it was fairly obvious that there were big benefits.”
For decades, engineers had assumed that a wafer-scale chip was a dead end. After all, no less a luminary than the late Gene Amdahl, chief architect of the IBM System/360 mainframe, had tried and failed spectacularly at it with a company called Trilogy Systems. But Lauterbach and Feldman say that any comparison with Amdahl’s attempt is laughably out-of-date. The wafers Amdahl was working with were one-tenth the size of today’s, and features that made up devices on those wafers were 30 times the size of today’s.
More important, Trilogy had no way of handling the inevitable errors that arise in chip manufacturing. Everything else being equal, the likelihood of there being a defect increases as the chip gets larger. If your chip is nearly the size of a sheet of letter-size paper, then you’re pretty much asking for it to have defects.
But Lauterbach saw an architectural solution: Because the workload they were targeting favors having thousands of small, identical cores, it was possible to fit in enough redundant cores to account for the defect-induced failure of even 1 percent of them and still have a very powerful, very large chip.
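The spare-core arithmetic can be sketched roughly as follows. The defect density here is a hypothetical figure chosen purely for illustration, since neither Cerebras nor its foundry publishes one:

```python
# Why small, identical cores make a wafer-scale chip yieldable: a defect
# kills one tiny core rather than the whole wafer, so a ~1 percent
# spare-core budget can absorb the expected damage.
wafer_area_mm2 = 46_225            # WSE die area, roughly 215 mm on a side
n_cores = 400_000
spare_budget = n_cores // 100      # 1 percent redundancy -> 4,000 cores

defects_per_mm2 = 0.05             # hypothetical defect density, for illustration
expected_dead_cores = wafer_area_mm2 * defects_per_mm2   # ~2,311 cores lost

print(expected_dead_cores < spare_budget)  # enough spares under these assumptions
```

Under these assumed numbers, a 1 percent redundancy budget comfortably covers the expected losses; the real margin depends on the actual defect density of the process.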
Of course, Cerebras still had to solve a host of manufacturing issues to build its defect-tolerant giganto chip. For example, photolithography tools are designed to cast their feature-defining patterns onto relatively small rectangles, and to do that over and over. That limitation alone would keep a lot of systems from being built on a single wafer, because of the cost and difficulty of casting different patterns in different places on the wafer.
But the WSE doesn’t require that. It resembles a typical wafer full of the exact same chips, just as you’d ordinarily manufacture. The big challenge was finding a way to link those pseudochips together. Chipmakers leave narrow edges of blank silicon called scribe lines around each chip. The wafer is typically diced up along those lines. Cerebras worked with Taiwan Semiconductor Manufacturing Co. (TSMC) to develop a way to build interconnects across the scribe lines so that the cores in each pseudochip could communicate.
With all communications and memory now on a single slice of silicon, data could zip around unimpeded, producing a core-to-core bandwidth of 1,000 petabits per second and an SRAM-to-core bandwidth of 9 petabytes per second. “It’s not just a little more,” says Feldman. “It’s four orders of magnitude greater bandwidth, because we stay on silicon.”
Scribe-line-crossing interconnects weren’t the only invention needed. Chip-manufacturing hardware had to be modified. Even the software for electronic design automation had to be customized for working on such a big chip. “Every rule and every tool and every manufacturing device was designed to pick up a normal-sized chocolate chip cookie, and [we] delivered something the size of the whole cookie sheet,” says Feldman. “Every single step of the way, we have to invent.”
Wafer-scale integration “has been dismissed for the last 40 years, but of course, it was going to happen sometime,” he says. Now that Cerebras has done it, the door may be open to others. “We think others will seek to partner with us to solve problems outside of AI.”
Indeed, engineers at the University of Illinois and the University of California, Los Angeles, see Cerebras’s chip as a boost to their own wafer-scale computing efforts using a technology called silicon-interconnect fabric [see “Goodbye, Motherboard. Hello, Silicon-Interconnect Fabric,” IEEE Spectrum, October 2019]. “This is a huge validation of the research we’ve been doing,” says the University of Illinois’s Rakesh Kumar. “We like the fact that there is commercial interest in something like this.”
The CS-1 is more than just the WSE chip, of course, but it’s not much more. That’s both by design and necessity. What passes for the motherboard is a power-delivery system that sits above the chip and a water-cooled cold plate below it. Surprisingly enough, it was the power-delivery system that was the biggest challenge in the computer’s development.
The WSE’s 1.2 trillion transistors are designed to operate at about 0.8 volts, pretty standard for a processor. There are so many of them, though, that in all they need 20,000 amperes of current. “Getting 20,000 amps into the wafer without significant voltage drop is quite an engineering challenge—much harder than cooling it or addressing the yield problems,” says Lauterbach.
Power can’t be delivered from the edge of the WSE, because the resistance in the interconnects would drop the voltage to zero long before it reached the middle of the chip. The answer was to deliver it vertically from above. Cerebras designed a fiberglass circuit board holding hundreds of special-purpose chips for power control. One million copper posts bridge the millimeter or so from the fiberglass board to points on the WSE.
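The numbers behind that decision are simple Ohm's-law arithmetic. The lateral-resistance figure below is illustrative, not a measured value:

```python
# Power-delivery arithmetic for the WSE (the resistance value is a
# hypothetical figure to show the scale of the problem, not a measurement).
supply_voltage_v = 0.8     # core operating voltage
total_current_a = 20_000   # total current the wafer draws

power_w = supply_voltage_v * total_current_a   # 16,000 W into the silicon

# Feeding that current in laterally from the edge fails immediately:
# at 20 kA, even a single milliohm of on-chip resistance drops 20 V,
# 25 times the entire 0.8 V supply. Hence the vertical copper posts.
lateral_resistance_ohm = 1e-3                          # hypothetical
ir_drop_v = total_current_a * lateral_resistance_ohm   # 20 V

print(power_w, ir_drop_v)
```

Note that the resulting 16 kW of chip power is consistent with the roughly 17 kW the full CS-1 system is reported to consume.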
Delivering power in this way might seem straightforward, but it isn’t. In operation, the chip, the circuit board, and the cold plate all warm up to the same temperature, but they expand when doing so by different amounts. Copper expands the most, silicon the least, and the fiberglass somewhere in between. Mismatches like this are a headache in normal-size chips because the change can be enough to shear away their connection to a printed circuit board or produce enough stress to break the chip. For a chip the size of the WSE, even a small percentage change in size translates to millimeters.
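A rough sketch with textbook expansion coefficients shows the scale of the problem; the 50-kelvin warm-up is an assumed operating swing, not a Cerebras figure:

```python
# Differential thermal expansion across a wafer-scale die. CTE values
# are textbook figures; the temperature rise is an assumption.
die_width_mm = 215            # the WSE is roughly 215 mm on a side
delta_t_k = 50                # assumed warm-up from idle to operation

cte_copper_per_k = 17e-6      # copper expands the most
cte_silicon_per_k = 2.6e-6    # silicon the least

relative_shift_mm = die_width_mm * (cte_copper_per_k - cte_silicon_per_k) * delta_t_k
print(round(relative_shift_mm, 2))  # roughly 0.15 mm of relative movement
```

Even about 0.15 millimeters of relative movement between copper and silicon dwarfs the micrometer-scale tolerances of a million tiny power-delivery posts.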
“The challenge of [coefficient of thermal expansion] mismatch with the motherboard was a brutal problem,” says Lauterbach. Cerebras searched for a material with the right intermediate coefficient of thermal expansion, something between those of silicon and fiberglass. Only that would keep the million power-delivery posts connected. But in the end, the engineers had to invent one themselves, an endeavor that took a year and a half to accomplish.
In 2018, Google, Baidu, and some top academic groups began working on benchmarks that would allow apples-to-apples comparisons among systems. The result, MLPerf, released training benchmarks in May 2018.
According to those benchmarks, the technology for training neural networks has made some huge strides in the last few years. On the ResNet-50 image-classification problem, the Nvidia DGX SuperPOD—essentially a 1,500-GPU supercomputer—finished in 80 seconds. It took 8 hours on Nvidia’s DGX-1 machine (circa 2017) and 25 days using the company’s K80 from 2015.
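Those quoted times work out to dramatic speedups:

```python
# Speedups implied by the ResNet-50 training times quoted above.
k80_2015_s = 25 * 24 * 3600    # 25 days on the 2015-era K80
dgx1_2017_s = 8 * 3600         # 8 hours on the 2017 DGX-1
superpod_s = 80                # 80 seconds on the DGX SuperPOD

print(dgx1_2017_s // superpod_s)  # SuperPOD vs. DGX-1: 360x faster
print(k80_2015_s // superpod_s)   # SuperPOD vs. K80: 27,000x faster
```

In other words, the best reported training hardware improved by more than four orders of magnitude in about four years.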
Cerebras hasn’t released MLPerf results or any other independently verifiable apples-to-apples comparisons. Instead the company prefers to let customers try out the CS-1 using their own neural networks and data.
This approach is not unusual, according to analysts. “Everybody runs their own models that they developed for their own business,” says Karl Freund, an AI analyst at Moor Insights. “That’s the only thing that matters to buyers.”
Early customer Argonne National Laboratory, for one, has some pretty intense needs. In training a neural network to recognize, in real time, different types of gravitational-wave events, scientists recently used one-quarter of the resources of Argonne’s megawatt-consuming Theta supercomputer, the 28th most powerful system in the world.
Cutting power consumption down to mere kilowatts seems like a key benefit in supercomputing. Unfortunately, Lauterbach doubts that this feature will be much of a selling point in data centers. “While a lot of data centers talk about [conserving] power, when it comes down to it…they don’t care,” he says. “They want performance.” And that’s something a processor nearly the size of a dinner plate can certainly provide.
This article appears in the January 2020 print issue as “Huge Chip Smashes Deep Learning’s Speed Barrier.”
People often think that Moore’s Law is all about making smaller and smaller transistors. But these days, much of the difficulty lies in squeezing in the tangle of interconnects needed to get signals and power to those transistors. Smaller, denser interconnects are more resistive, which can waste power. At the IEEE International Electron Devices Meeting in December, Arm engineers presented a processor design that demonstrates a way to reduce the density of interconnects and deliver power to chips with less waste.
Compared with TSMC’s 7-nanometer process, used to make iPhone processors among other high-end systems, the company’s next-generation N5 process leads to devices that are 15 percent faster and 30 percent more power efficient. It produces logic that is 1.84 times as dense as the previous process and SRAM cells of just 0.021 square micrometers, the most compact ever reported, said TSMC’s Geoffrey Yeap.
The process is currently in what’s called risk production—initial customers are taking a risk that it will work for their designs. Yeap reported that initial average SRAM yield was about 80 percent and that yield improvement has been faster for N5 than any other recent process introduction.
Some of that yield improvement is likely due to the use of extreme ultraviolet lithography (EUV). N5 is the first TSMC process designed around EUV. The previous generation was developed using established 193-nanometer immersion lithography first; when EUV was introduced, some of the most difficult-to-produce chip features were made with the new technology. Because it uses 13.5-nanometer light instead of 193-nanometer light, EUV can define chip features in one step—compared with three or more steps using 193-nanometer light. With more than 10 EUV layers, N5 is the first new process “in quite a long time” that uses fewer photolithography masks than its predecessor, Yeap said.
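The single-exposure advantage follows from the Rayleigh resolution criterion, resolution ≈ k1 · λ / NA. The k1 and numerical-aperture values below are typical published figures for each tool class, not TSMC's numbers:

```python
# Rayleigh-criterion sketch of the resolution gap between 193 nm
# immersion lithography and EUV (k1 and NA are typical values for
# each tool class, chosen for illustration).
def half_pitch_nm(k1: float, wavelength_nm: float, na: float) -> float:
    # Minimum resolvable half-pitch: k1 * wavelength / numerical aperture
    return k1 * wavelength_nm / na

immersion_193 = half_pitch_nm(0.30, 193.0, 1.35)  # ~43 nm in one exposure
euv_135 = half_pitch_nm(0.30, 13.5, 0.33)         # ~12 nm in one exposure

print(round(immersion_193), round(euv_135))
```

Features finer than what a 193-nanometer tool resolves in one pass must be built up with multiple patterning steps, which is exactly where EUV's shorter wavelength saves masks.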
Part of the performance enhancement comes from the inclusion, for the first time in TSMC’s process, of a “high-mobility channel”. Charge carrier mobility is the speed with which current moves through the transistor, and therefore limits how quickly the device can switch. Asked (several times) about the makeup of the high-mobility channel, Yeap declined to offer details. “Those who know, know,” he said, prompting laughter from the audience. TSMC and others have explored germanium-based channels in the past. And earlier in the day, Intel showed a 3D process with silicon NMOS on the bottom and a layer of germanium PMOS above it.
Yeap would not even be tied down on which type of transistor, NMOS or PMOS or both, has the enhanced channel. But the answer is probably not very mysterious: Holes generally travel more slowly through silicon devices than electrons do, so it is the PMOS devices that would benefit from enhanced mobility. When pressed, Yeap confirmed that only one variety of device has the high-mobility channel.
At the IEEE International Electron Devices Meeting in San Francisco this week, Intel is unveiling a cryogenic chip designed to accelerate the development of the quantum computers it is building with Delft University’s QuTech research group. The chip, called Horse Ridge after one of the coldest spots in Oregon, uses specially designed transistors to provide microwave control signals to Intel’s quantum computing chips.
The quantum computer chips in development at IBM, Google, Intel, and other firms today operate at fractions of a degree above absolute zero and must be kept inside a dilution refrigerator. However, as companies have managed to increase the number of quantum bits (qubits) in the chips, and therefore the chips’ capacity to compute, they’ve begun to run into a problem. Each qubit needs its own set of wires leading to control and readout systems outside of the cryogenic container. It’s already getting crowded, and as quantum computers continue to scale—Intel’s is up to 49 qubits now—there soon won’t be enough room for the wires.
At Supercomputing 2019 in Denver, Colo., Cerebras Systems unveiled the computer powered by the world’s biggest chip. Cerebras says the computer, the CS-1, has the equivalent machine learning capabilities of hundreds of racks’ worth of GPU-based computers consuming hundreds of kilowatts, but it takes up only one-third of a standard rack and consumes about 17 kW. Argonne National Laboratory, future home of what’s expected to be the United States’ first exascale supercomputer, says it has already deployed a CS-1. Argonne is one of two announced U.S. national laboratory customers for Cerebras; the other is Lawrence Livermore National Laboratory.
The U.S. is investing in upgrades to the fabrication facility that makes the radiation-hardened chips for its nuclear arsenal. It is also spending up to US $170 million to enhance the capabilities of SkyWater Technology Foundry, in Bloomington, Minn., in part to improve the company’s radiation-hardened-chip line for other Defense Department needs.