Last summer, DARPA asked hackers to take their best shots at a set of newly designed hardware architectures. After 13,000 hours of hacking by 580 cybersecurity researchers, the results are finally in: just 10 vulnerabilities. DARPA is calling it a win, not because the new hardware fought off every attack, but because it “proved the value of the secure hardware architectures developed under its System Security Integration Through Hardware and Firmware (SSITH) program while pinpointing critical areas to further harden defenses,” says the agency.
Researchers in SSITH, which is part of DARPA’s multibillion-dollar Electronics Resurgence Initiative, are now in the third and final phase of developing security architectures and tools that guard systems against common classes of hardware vulnerabilities that can be exploited by malware. [See “How the Spectre and Meltdown Hacks Really Worked.”] The idea is to find a way past the long-standing security model of “patch and pray,” in which vulnerabilities are found and software is updated after the fact.
In an essay introducing the bug bounty, Keith Rebello, the project’s leader, wrote that patching and praying is a particularly ineffective strategy for IoT hardware, because of the cost and inconsistency of updating and qualifying a hugely diverse set of systems. [See “DARPA: Hack Our Hardware”]
“Knowing that virtually no system is unhackable, we expected to discover bugs within the processors. But FETT really showed us that the SSITH technologies are quite effective at protecting against classes of common software-based hardware exploits,” said Rebello, in a press release. “The majority of the bug reports did not come from exploitation of the vulnerable software applications that we provided to the researchers, but rather from our challenge to the researchers to develop any application with a vulnerability that could be exploited in contradiction with the SSITH processors’ security claims. We’re clearly developing hardware defenses that are raising the bar for attackers.”
Of the 10 vulnerabilities discovered, four were fixed during the bug bounty, which ran from July to October 2020. Seven of those 10 were deemed critical, according to the Common Vulnerability Scoring System 3.0 standards. Most of those resulted from weaknesses introduced by interactions between the hardware, firmware, and the operating system software. For example, one hacker managed to steal the Linux password authentication manager from a protected enclave by hacking the firmware that monitors security, Rebello explains.
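For reference, “critical” has a precise meaning here. The CVSS v3.0 specification maps a numeric base score to a qualitative severity rating; the published thresholds (from the spec itself, not from DARPA’s report) can be sketched as:

```python
def cvss_v3_severity(score: float) -> str:
    """Map a CVSS v3.0 base score (0.0-10.0) to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError("CVSS base scores range from 0.0 to 10.0")
    if score == 0.0:
        return "None"
    if score <= 3.9:
        return "Low"
    if score <= 6.9:
        return "Medium"
    if score <= 8.9:
        return "High"
    return "Critical"

# A vulnerability must score 9.0 or above to be rated critical.
print(cvss_v3_severity(9.1))  # prints "Critical"
```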
In the program’s third and final phase, research teams will work on boosting the performance of their technologies and then fabricating a silicon system-on-chip that implements the security enhancements. They will also take the security tech, which was developed for the open-source RISC-V instruction set architecture, and adapt it to processors with the much more common Arm and x86 instruction set architectures. How long that last part will take depends on the approach the research team took, says Rebello. However, he notes that three teams have already ported their architectures to Arm processors in a fraction of the time it took to develop the initial RISC-V version.
One of the biggest problems in computing today is the “memory wall”—the difference between processing time and the time it takes to shuttle data over to the processor from separate DRAM memory chips. The increasing popularity of AI applications has only made that problem more pronounced, because the huge networks that find faces, understand speech, and recommend consumer goods rarely fit in a processor’s on-board memory.
In December at the IEEE International Electron Devices Meeting (IEDM), separate research groups in the United States and in Belgium argued that a new kind of DRAM might be the solution. The new DRAM, made from oxide semiconductors and built in the layers above the processor, holds bits hundreds or thousands of times longer than commercial DRAM and could provide huge area and energy savings when running large neural nets, they say.
Fiber optic links are already the main method of slinging data between clusters of computers in data centers, and engineers want to bring their blazing bandwidth to the processor. That step comes at a cost that researchers at the University of Toronto and Arm think they can greatly reduce.
Silicon photonics components are huge in comparison to their electronic counterparts. That’s a function of optical wavelengths being so much larger than today’s transistors and the copper interconnects that tie them together into circuits. Silicon photonic components are also surprisingly sensitive to changes in temperature, so much so that photonics chips must include heating elements that take up about half of their area and energy consumption, as Charles Lin, a member of the team at the University of Toronto, explained last month at the IEEE International Electron Devices Meeting.
At the virtual conference, Lin, a researcher in the laboratory of Amr S. Helmy, described new silicon transceiver components that dodge both of these problems by relying on plasmonics instead of photonics. The results so far point to transceivers capable of at least double the bandwidth while consuming only one-third the energy and taking up a mere 20 percent of the area. What’s more, they could be built right atop the processor, instead of on separate chiplets as is done with silicon photonics.
When light strikes the interface between a metal and an insulator at a shallow angle, it forms plasmons: waves of electron density that propagate along the metal surface. Conveniently, plasmons can travel down a waveguide that is much narrower than the light that forms them, but they typically peter out very quickly because the metal absorbs light.
The Toronto researchers invented a structure to take advantage of plasmonics’ smaller size while greatly reducing the loss. Called the coupled hybrid plasmonic waveguide (CPHW), it is essentially a stack made up of silicon, the conductor indium tin oxide, silicon dioxide, aluminum, and more silicon. That combination forms two types of semiconductor junctions—a Schottky diode and a metal-oxide-semiconductor—with the aluminum that contains the plasmon in common between the two. Within the metal, the plasmon in the top junction interferes with the plasmon in the bottom junction in such a way that loss is reduced by almost two orders of magnitude, Lin said.
Using the CPHW as a base, the Toronto group built two key photonics components—a modulator, which turns electronic bits into photonic bits, and a photodetector, which does the reverse. (As is done in silicon photonics, a separate laser provides the light; the modulator blocks the light or lets it pass to represent bits.) The modulator took up just 2 square micrometers and could switch as fast as 26 gigahertz, the limit of the Toronto team’s test equipment. Based on the device’s measured capacitance, the real limit could be as high as 636 GHz. The plasmonic photodetector was a near match for silicon photonics’ sensitivity, but at only 1/36th the size.
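A capacitance-derived speed limit like that follows from treating the modulator as a simple first-order RC-limited switch. As a back-of-the-envelope sketch (the article states neither the drive impedance nor the capacitance; the 50-ohm source and the femtofarad-scale capacitance below are our illustrative assumptions):

```python
import math

def rc_bandwidth_hz(resistance_ohm: float, capacitance_f: float) -> float:
    """3-dB bandwidth of a first-order RC-limited device: f = 1/(2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * resistance_ohm * capacitance_f)

# With a hypothetical 50-ohm source and about 5 femtofarads of device
# capacitance, the RC limit lands in the hundreds of gigahertz -- the same
# order as the 636-GHz estimate quoted for the plasmonic modulator.
f = rc_bandwidth_hz(50.0, 5e-15)
print(f / 1e9)  # roughly 636 GHz
```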
One of the CPHW’s biggest advantages is its insensitivity to temperature. Silicon photonics components, by contrast, must be kept to within about one degree of their design temperature to operate at the proper wavelength. Temperature sensitivity is a “big challenge for silicon photonics,” explains Saurabh Sinha, a principal research engineer at Arm. Managing that tolerance requires both extra circuitry and extra energy. In a simulated 16-channel silicon photonics transceiver, heating circuits consumed half of the energy and nearly half of the total area, and that translates to a huge difference in footprint: 0.37 mm² for silicon photonics versus 0.07 mm² for plasmonic transceivers.
Simulations of the CPHW-based plasmonics transceiver predict a number of benefits over silicon photonics. The CPHW system consumed less than one-third of the energy per bit of a competing silicon photonics system—0.49 picojoules per bit versus 1.52 pJ/b. And it could comfortably transmit more than three times as many bits per second at acceptable Ethernet error rates without relying on error correction—150 gigabits per second versus 39 Gb/s.
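Both comparisons check out as quick ratios of the simulated figures quoted above:

```python
# Simulated transceiver figures quoted in the article.
plasmonic_pj_per_bit = 0.49
photonic_pj_per_bit = 1.52
plasmonic_gbps = 150
photonic_gbps = 39

energy_ratio = plasmonic_pj_per_bit / photonic_pj_per_bit
throughput_ratio = plasmonic_gbps / photonic_gbps

assert energy_ratio < 1 / 3  # "less than one-third of the energy per bit"
assert throughput_ratio > 3  # "more than three times as many bits per second"
print(f"{energy_ratio:.2f}x energy, {throughput_ratio:.1f}x throughput")
```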
Sinha says Arm and the Toronto group are discussing next steps. Those might include exploring other potential benefits of these transceivers, such as the fact that CPHWs could be constructed atop processor chips, while silicon photonics devices must be made separately and then linked to the processor inside its package using chiplet technology.
What does an ideal neural network chip look like? The most important part is to have oodles of memory on the chip itself, say engineers. That’s because data transfer (from main memory to the processor chip) generally uses the most energy and produces most of the system lag—even compared to the AI computation itself.
Cerebras Systems solved these problems, collectively called the memory wall, by making a computer consisting almost entirely of a single, large chip containing 18 gigabytes of memory. But researchers in France, Silicon Valley, and Singapore have come up with another way.
Called Illusion, it uses processors whose resistive RAM memory sits in a 3D stack above the silicon logic, so it costs little energy or time to fetch data. By itself, even this isn’t enough, because neural networks are increasingly too large to fit on one chip. So the scheme also requires multiple such hybrid processors and an algorithm that both intelligently slices up the network among the processors and knows when to rapidly turn processors off while they’re idle.
In tests, an eight-chip version of Illusion came within 3 to 4 percent of the energy use and latency of a “dream” processor that had all the needed memory and processing on one chip.
Late last week, the U.S. Congress passed the annual policy law that guides the U.S. Defense Department. Tucked away inside the National Defense Authorization Act of 2021 (NDAA) are provisions that supporters hope will lead to a resurgence in chip manufacturing in the United States. The provisions include billions of dollars of financial incentives for construction or modernization of facilities “relating to the fabrication, assembly, testing, advanced packaging, or advanced research and development of semiconductors.”
The microelectronics incentives in the law stem from U.S. officials’ concerns about China’s rapidly growing share of the global chipmaking industry and the United States’ shrinking stake. The legislation frames this as an issue of U.S. national security.
Although China does not hold a technological lead in chipmaking, its geographic proximity to those who do worries some in the United States. Today, foundries using the most advanced manufacturing processes (currently the 5-nanometer node) are operated by Samsung in South Korea and by Taiwan Semiconductor Manufacturing Company (TSMC) in Taiwan and nowhere else.
Both companies provide foundry services, manufacturing chips for U.S.-based tech giants like Nvidia, AMD, Google, Facebook, and Qualcomm. For years, Intel more than matched them in manufacturing technology, but the company has struggled to move to new process nodes.
But the Semiconductor Industry Association, the U.S. trade group, says government incentives will accelerate construction. The SIA calculates that a $20-billion incentive program over 10 years would yield 14 new fabs and attract $174 billion in investment versus 9 fabs and $69 billion without the federal incentives. A $50-billion program would yield 19 fabs and attract $279 billion.
The NDAA specifies a cap of $3 billion per project unless Congress and the President agree to more, but how much money actually gets spent in total on microelectronics capacity will depend on separate “appropriations” bills.
“The next step is for leaders in Washington to fully fund the NDAA’s domestic chip manufacturing incentives and research initiatives,” said Bob Bruggeworth, chair of SIA and president, CEO, and director of RF-chipmaker Qorvo, in a press release.
Getting the NDAA’s microelectronics and other technology provisions funded will be one of IEEE USA’s top priorities in 2021, says the organization’s director of government relations, Russell T. Harrison.
Beyond financial incentives, the NDAA also authorizes microelectronics-related R&D, development of a “provably secure” microelectronics supply chain, the creation of a National Semiconductor Research Technology Center to help move new technology into industrial facilities, and establishment of committees to create strategies toward adding capacity at the cutting edge. It also authorizes quantum computing and artificial intelligence initiatives.
The NDAA “has a lot of provisions in it that are very good for IEEE members,” says Harrison.
The semiconductor strategy and investment portion of the law began as separate bills in the House of Representatives and the Senate. In the Senate, it was called the American Foundries Act of 2020 and was introduced in July. The act called for $15 billion for state-of-the-art construction or modernization and $5 billion in R&D spending, including $2 billion for the Defense Advanced Research Projects Agency’s Electronics Resurgence Initiative. In the House, the bill was called the CHIPS for America Act. It was introduced in June and offered similar levels of R&D funding.
Some in industry objected to early conceptions of the legislation, believing them to be too narrowly focused on cutting-edge silicon CMOS. Industry lobbied Congress to make the law more inclusive—potentially allowing for expansion of facilities like SkyWater Technology’s 200-mm fab in Bloomington, Minn.
The language in later versions of the bill signals that the government “still wants to pursue advanced nodes but that they understand that we have an existing manufacturing capability in the U.S. that needs support and can still play a big role in making us competitive,” says John Cooney, director of strategic government relations at SkyWater.
Realizing that little legislating was likely to happen in an election year, supporters chose to try to fold the microelectronics language into the NDAA, which is considered a must-pass bill and had a 59-year bipartisan streak going into December. President Trump vetoed the NDAA last month, but Congress quickly overrode the veto at the start of January.
“What we’ve seen increasingly over the last nine months, is that there is a bicameral, bipartisan consensus building in Congress that the United States needs to do more to promote technology and technology research [domestically],” says Harrison.
“All of this is a huge step in the right direction, and we’re really excited about it,” says SkyWater’s Cooney. “But it is just the first step to be competitive.”
The U.S. move is just one among a series of maneuvers taking place globally as countries and regions seek to build up or regain chipmaking capabilities. China has been on an investment streak through its Made in China 2025 plan. In December, Belgium, France, Germany, and 15 other European Union nations agreed to jointly bolster Europe’s semiconductor industry, including moving toward 2-nanometer node production. The money for this would come from the 145-billion-euro portion of the EU’s pandemic recovery fund set aside for “digital transition.”
Cranes are a familiar fixture of practically any city skyline, but one in the Swiss canton of Ticino, near the Italian border, would stand out anywhere: It has six arms. This 110-meter-high starfish of the skyline isn’t intended for construction. It’s meant to prove that renewable energy can be stored by hefting heavy loads and dispatched by releasing them.
Energy Vault, the Swiss company that built the structure, has already begun a test program that will lead to its first commercial deployments in 2021. At least one competitor, Gravitricity, in Scotland, is nearing the same point. And there are at least two companies with similar ideas, New Energy Let’s Go and Gravity Power, that are searching for the funding to push forward.
To be sure, nearly all the world’s currently operational energy-storage facilities, which can generate a total of 174 gigawatts, rely on gravity. Pumped hydro storage, where water is pumped to a higher elevation and then run back through a turbine to generate electricity, has long dominated the energy-storage landscape. But pumped hydro requires some very specific geography—two big reservoirs of water at elevations with a vertical separation that’s large, but not too large. So building new sites is difficult.
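The physics behind every gravity-storage scheme is the same potential-energy relation, E = mgh, and a quick calculation shows why pumped hydro needs such big reservoirs. Storing a single megawatt-hour with a 300-meter head (a representative figure we've assumed, not one from the article) takes more than a thousand metric tons of water:

```python
G = 9.81  # gravitational acceleration, m/s^2

def mass_for_energy_kg(energy_joules: float, height_m: float) -> float:
    """Mass needed to store a given energy at a given lift height (m = E/(g*h))."""
    return energy_joules / (G * height_m)

ONE_MWH_J = 3.6e9  # 1 megawatt-hour in joules

# Roughly 1,200 metric tons of water per MWh at a 300-m head.
tonnes = mass_for_energy_kg(ONE_MWH_J, 300.0) / 1000.0
print(round(tonnes))  # -> 1223
```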
Energy Vault, Gravity Power, and their competitors seek to use the same basic principle—lifting a mass and letting it drop—while making an energy-storage facility that can fit almost anywhere. At the same time they hope to best batteries—the new darling of renewable-energy storage—by offering lower long-term costs and fewer environmental issues.
In action, Energy Vault’s towers are constantly stacking and unstacking 35-metric-ton bricks arrayed in concentric rings. Bricks in an inner ring, for example, might be stacked up to store 35 megawatt-hours of energy. Then the system’s six arms would systematically disassemble it, lowering the bricks to build an outer ring and discharging energy in the process.
This joule-storing Jenga game can be complicated. To maintain a constant output, one block needs to be accelerating while another is decelerating. “That’s why we use six arms,” explains Robert Piconi, the company’s CEO and cofounder.
What’s more, the control system has to compensate for gusts of wind, the deflection of the crane as it picks up and sets down bricks, the elongation of the cable, pendulum effects, and more, he says.
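The energy at stake per brick is modest. For a 35-metric-ton brick raised most of the tower’s 110-meter height, E = mgh gives on the order of 10 kilowatt-hours per lift (the 100-meter average travel below is our assumption, not a company figure), which is why a 35-MWh ring means stacking and unstacking thousands of bricks:

```python
G = 9.81  # gravitational acceleration, m/s^2

def brick_energy_kwh(mass_kg: float, drop_m: float) -> float:
    """Potential energy released by lowering one brick, in kilowatt-hours."""
    return mass_kg * G * drop_m / 3.6e6

# One 35-t brick over ~100 m stores roughly 9.5 kWh, so a 35-MWh ring
# implies on the order of 3,700 brick moves.
e = brick_energy_kwh(35_000, 100.0)
bricks_for_35_mwh = 35_000 / e
print(round(e, 1), round(bricks_for_35_mwh))  # -> 9.5 3670
```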
Piconi sees several advantages over batteries. Advantage No. 1 is environmental. Instead of chemically reactive and difficult-to-recycle lithium-ion batteries, Energy Vault’s main expenditure is the bricks themselves, which can be made on-site using available dirt and waste material mixed with a new polymer from the Mexico-based cement giant Cemex.
Another advantage, according to Piconi, is the lower operating expense, which the company calculates to be about half that of a battery installation with equivalent storage capacity. Battery-storage facilities must continually replace cells as they degrade. But that’s not the case for Energy Vault’s infrastructure.
The startup is confident enough in its numbers to claim that 2021 will see the start of multiple commercial installations. Energy Vault raised US $110 million in 2019 to build the demonstration unit in Ticino and prepare for a “multicontinent build-out,” says Piconi.
Compared with Energy Vault’s effort, Gravitricity’s energy-storage scheme seems simple. Instead of a six-armed crane shuttling blocks, Gravitricity plans to pull one or just a few much heavier weights up and down abandoned, kilometer-deep mine shafts.
These great masses, each one between 500 and 5,000 metric tons, need only move at mere centimeters per second to produce megawatt-level outputs. Using a single weight lends itself to applications that need high power quickly and for a short duration, such as dealing with second-by-second fluctuations in the grid and maintaining grid frequency, explains Chris Yendell, Gravitricity’s project development manager. Multiple-weight systems would be more suited to storing more energy and generating for longer periods, he says.
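Power from a steadily falling mass is P = mgv, so even centimeter-per-second speeds yield megawatts at these masses. A sketch using the quoted figures (the 5-cm/s speed is our illustrative choice within the "mere centimeters per second" range):

```python
G = 9.81  # gravitational acceleration, m/s^2

def gravity_power_watts(mass_kg: float, speed_m_per_s: float) -> float:
    """Instantaneous power delivered by a mass descending at constant speed."""
    return mass_kg * G * speed_m_per_s

# A 5,000-metric-ton weight creeping down at 5 cm/s delivers about 2.45 MW.
p_mw = gravity_power_watts(5_000_000, 0.05) / 1e6
print(round(p_mw, 2))  # -> 2.45
```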
Proving the second-to-second response is a primary goal of a 250-kilowatt concept demonstrator that Gravitricity is building in Scotland. Its 50-metric-ton weight will be suspended 7 meters up on a lattice tower. Testing should start during the first quarter of 2021. “We expect to be able to achieve full generation within less than one second of receiving a signal,” says Yendell.
The company will also be developing sites for a full-scale prototype during 2021. “We are currently liaising with mine owners in Europe and in South Africa, [and we’re] certainly interested in the United States as well,” says Yendell. Such a full-scale system would then come on line in 2023.
Gravity Power and its competitor New Energy Let’s Go, which acquired its technology from the now bankrupt Heindl Energy, are also looking underground for energy storage, but they are more closely inspired by pumped hydro. Instead of storing energy using reservoirs at different elevations, they pump water underground to lift an extremely heavy piston. Allowing the piston to fall pushes water through a turbine to generate electricity.
“Reservoirs are the Achilles’ heel of pumped hydro,” says Jim Fiske, the company’s founder. “The whole purpose of a Gravity Power plant is to remove the need for reservoirs. [Our plants] allow us to put pumped-hydro-scale power and storage capacity in 3 to 5 acres [1 to 2 hectares] of flat land.”
Fiske estimates that a 400-megawatt plant with 16 hours of storage (or 6.4 gigawatt-hours of energy) would have a piston with a mass of more than 8 million metric tons. That might sound ludicrous, but it’s well within the lifting abilities of today’s pumps and the constraints of construction processes, he says.
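Those numbers hang together: rearranging E = mgh for the drop height shows an 8-million-tonne piston storing 6.4 GWh needs a stroke of roughly 300 meters (the stroke length is our inference from the quoted figures, not a company specification):

```python
G = 9.81  # gravitational acceleration, m/s^2

def required_drop_m(energy_gwh: float, mass_tonnes: float) -> float:
    """Drop height needed to store a given energy in a given mass (h = E/(m*g))."""
    energy_j = energy_gwh * 3.6e12  # gigawatt-hours to joules
    return energy_j / (mass_tonnes * 1000.0 * G)

# 6.4 GWh in an 8-million-tonne piston implies a shaft on the order of 300 m.
h = required_drop_m(6.4, 8e6)
print(round(h))  # -> 294
```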
While these companies expect such underground storage sites to be more economical than battery installations, they will still be expensive. But nations concerned about the changing climate may be willing to pay for storage options like these when they recognize the gravity of the crisis.
This article appears in the January 2021 print issue as “The Ups and Downs of Gravity Energy Storage.”
The logic circuits behind just about every digital device today rely on a pairing of two types of transistors—NMOS and PMOS. The same voltage signal that turns one of them on turns the other off. Putting them together means that electricity should flow only when a bit changes, greatly cutting down on power consumption. These pairs have sat beside each other for decades, but if circuits are to continue shrinking they’re going to have to get closer still. This week, at the IEEE International Electron Devices Meeting (IEDM), Intel showed a different way: stacking the pairs so that one is atop the other. The scheme effectively cut the footprint of a simple CMOS circuit in half, meaning a potential doubling of transistor density on future ICs.
Researchers at Hewlett Packard Labs, where the first practical memristor was created, have invented a new variation on the device—a memristor laser. It’s a laser that can have its wavelength electronically shifted and, uniquely, hold that adjustment even if the power is turned off. At the IEEE International Electron Devices Meeting, the researchers suggested that, in addition to simplifying photonic transceivers for data transmission between processors, the new devices could form the components of superefficient brain-inspired photonic circuits.
Carbon nanotube devices are getting closer to silicon’s abilities thanks to a series of developments, the latest of which was revealed today at the IEEE International Electron Devices Meeting (IEDM). Engineers from Taiwan Semiconductor Manufacturing Company (TSMC), the University of California, San Diego, and Stanford University explained a new fabrication process that leads to better control of carbon nanotube transistors. Such control is crucial to ensuring that transistors, which act as switches in logic circuits, turn fully off when they are meant to. Interest in carbon nanotube transistors has accelerated recently, because they can potentially be shrunk down further than silicon transistors can and offer a way to produce stacked layers of circuitry much more easily than can be done in silicon.
The team invented a process for producing a better gate dielectric. That’s the layer of insulation between the gate electrode and the transistor channel region. In operation, voltage at the gate sets up an electric field in the channel region that cuts off the flow of current. As silicon transistors were scaled down over the decades, however, that layer of insulation, which was made of silicon dioxide, had to become thinner and thinner in order to control the current using less voltage, reducing energy consumption. Eventually, the insulation barrier was so thin that charge could actually tunnel through it, leaking current and wasting energy.
Cerebras Systems, which makes a specialized AI computer based on the largest chip ever made, is breaking out of its original role as a neural-network training powerhouse and turning its talents toward more traditional scientific computing. In a simulation with 500 million variables, the CS-1 trounced the 69th-most powerful supercomputer in the world.
It also solved the problem—combustion in a coal-fired power plant—faster than the real-world flame it simulates. To top it off, Cerebras and its partners at the U.S. National Energy Technology Laboratory (NETL) claim, the CS-1 performed the feat faster than any present-day CPU- or GPU-based supercomputer could.
The research, which was presented this week at the supercomputing conference SC20, shows that Cerebras’ AI architecture “is not a one-trick pony,” says Cerebras CEO Andrew Feldman.
Weather forecasting, design of airplane wings, predicting temperatures in a nuclear power plant, and many other complex problems are solved by simulating “the movement of fluids in space over time,” he says. The simulation divides the world up into a set of cubes, models the movement of fluid in those cubes, and determines the interactions between the cubes. There can be 1 million or more of these cubes and it can take 500,000 variables to describe what’s happening.
According to Feldman, solving that takes a computer system with lots of processor cores, tons of memory very close to the cores, oodles of bandwidth connecting the cores and the memory, and loads of bandwidth connecting the cores to each other. Conveniently, that’s what a neural-network training computer needs, too. The CS-1 contains a single piece of silicon with 400,000 cores, 18 gigabytes of memory, 9 petabytes per second of memory bandwidth, and 100 petabits per second of core-to-core bandwidth.
Scientists at NETL simulated combustion in a power plant using both a Cerebras CS-1 and the Joule supercomputer, which has 84,000 CPU cores and consumes 450 kilowatts. By comparison, the CS-1 runs on about 20 kilowatts. Joule completed the calculation in 2.1 milliseconds. The CS-1 was more than 200 times faster, finishing in 6 microseconds.
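The speedup claim is easy to verify from the two timings quoted above:

```python
joule_seconds = 2.1e-3  # Joule supercomputer: 2.1 milliseconds
cs1_seconds = 6e-6      # Cerebras CS-1: 6 microseconds

speedup = joule_seconds / cs1_seconds
assert speedup > 200  # "more than 200 times faster"
print(round(speedup))  # -> 350
```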
This speed has two implications, according to Feldman. One is that there is no combination of CPUs or even of GPUs today that could beat the CS-1 on this problem. He backs this up by pointing to the nature of the simulation—it does not scale well. Just as you can have too many cooks in the kitchen, throwing too many cores at a problem can actually slow the calculation down. Joule’s speed peaked when using 16,384 of its 84,000 cores.
The limitation comes from connectivity between the cores and between cores and memory. Imagine the volume to be simulated as a 370 x 370 x 370 stack of cubes (136,900 vertical stacks, each 370 layers tall). Cerebras maps the problem to the wafer-scale chip by assigning the array of vertical stacks to a corresponding array of processor cores. Because of that arrangement, communicating the effects of one cube on another is done by transferring data between neighboring cores, which is as fast as it gets. And while each layer of the stack is computed, the data representing the other layers reside in the core’s memory, where they can be quickly accessed.
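That mapping is easiest to see in code: each of the 370 × 370 vertical stacks gets its own core, so neighboring cubes sit on neighboring cores. This is a schematic sketch of the geometry, not Cerebras’ actual software:

```python
GRID = 370  # cubes per side of the simulated volume

# One vertical stack of cubes per core: neighboring stacks land on
# neighboring cores, so cube-to-cube interactions become fast
# core-to-core transfers.
stacks = [(x, y) for x in range(GRID) for y in range(GRID)]
assert len(stacks) == 136_900  # the 136,900 vertical stacks cited above

def core_for_stack(x: int, y: int) -> int:
    """Assign stack (x, y) to a core index in a matching 2-D core array."""
    return y * GRID + x

# Each core then sweeps through its stack's 370 layers from local memory.
assert core_for_stack(369, 369) == 136_899
```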
(Cerebras takes advantage of a similar kind of geometric mapping when training neural networks. [See sidebar “The Software Side of Cerebras,” January 2020.])
And because the simulation completed faster than the real-world combustion event being simulated, the CS-1 could now have a new job on its hands—playing a role in control systems for complex machines.
Feldman reports that the CS-1 has made inroads in the purpose for which it was originally built, as well. Drugmaker GlaxoSmithKline is a known customer, and the CS-1 is doing AI work at Argonne National Laboratory, Lawrence Livermore National Laboratory, and the Pittsburgh Supercomputing Center. He says there are several customers he cannot name in the military, intelligence, and heavy-manufacturing industries.
A next-generation CS-1 is in the works, he says. The first generation used TSMC’s 16-nanometer process, but Cerebras already has a 7-nanometer version in hand that more than doubles the memory, to 40 GB, and the number of AI processor cores, to 850,000.
Last week, Honeywell’s Quantum Solutions division released its first commercial quantum computer: a system based on trapped ions comprising 10 qubits. The H1, as it’s called, is actually the same ion trap chip the company debuted as a prototype, but with four additional ions. The company revealed a roadmap that it says will rapidly lead to much more powerful quantum computers. Separately, a competitor in ion-trap quantum computing, Maryland-based startup IonQ, unveiled a 32-qubit ion computer last month.
Ion trap quantum computers are made of chips designed to trap and hold ions in a line using a specially designed RF electromagnetic field. The chip can also move specific ions along the line using an electric field. Lasers then encode the ions’ quantum states to perform calculations. Proponents say trapped-ion qubits are attractive because they are thought to be longer lasting, offer much higher fidelity, and are potentially easier to connect together than other options, allowing for more reliable computation.
For Honeywell, that means a system that is the only one capable of performing a “mid-circuit measurement” (a kind of quantum equivalent of an if/then statement) and then recycling the measured qubit back into the computation, says Patty Lee, Honeywell Quantum Solutions’ chief scientist. That distinction allows for different kinds of quantum algorithms and the ability to perform more complex calculations with fewer ions.
IBM’s team defines quantum volume as 2 to the power of the size of the largest circuit with equal width and depth that can pass a certain reliability test involving random two-qubit gates. The circuit’s width is its number of qubits and its depth is its number of gate layers; for these test circuits, the two are equal.
That means a 6-qubit quantum computing system would have a quantum volume of 2 to the power of 6, or 64—but only if the qubits are relatively free of noise and the potential errors that can accompany such noise.
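Under that definition, the reported figures fall out directly. This is a sketch of the arithmetic only, not IBM’s full benchmarking protocol (which also requires the circuits to pass the reliability test):

```python
def quantum_volume(largest_square_circuit: int) -> int:
    """Quantum volume = 2^n, where n is the width (= depth) of the largest
    square random circuit that passes the reliability test."""
    return 2 ** largest_square_circuit

assert quantum_volume(6) == 64    # the 6-qubit example above
assert quantum_volume(7) == 128   # a 7-wide passing circuit gives QV 128
assert quantum_volume(5) == 32    # a 5-wide passing circuit gives QV 32
```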
Honeywell says its 10-qubit system has a measured quantum volume of 128, the highest in the industry. IonQ’s earlier 11-qubit prototype had a measured quantum volume of 32. Its 32-ion system could theoretically reach higher than 4 million, the company claims, but this hasn’t been proven yet.
With the launch of its commercial system, Honeywell revealed that it will use a subscription model for access to its computers. Customers pay for time and engagement with the systems, even as the systems scale up throughout the year. “Imagine if you have Netflix, and next week it’s twice as good, and 3 months from now it’s 1000 times as good,” says Tony Uttley, president of Honeywell Quantum Solutions. “That’d be a pretty cool subscription. And that’s the approach we’ve taken with this.”
Honeywell’s path forward involves first adding more ions to the H1, which has capacity for 40. “We built a large auditorium,” Uttley says. “Now we’re just filling seats.”
The next step is to change the ion trap chip’s single line to a racetrack configuration. This system, called H2, is already in testing; it allows faster interactions between ions, because ions at the ends of the line can be moved around the track to interact with each other. A further scale-up, H3, will come with a chip that has a grid of traps instead of a single line. For this, ions will have to be steered around corners, something Uttley says the company can already do.
For H4, the grid will be integrated with on-chip photonics. Today the laser beams that encode quantum states onto the ions are sent in from outside the vacuum chamber that houses the trap, and that configuration limits the number of points on the chip where computation can happen. An integrated photonics system, which has been designed and tested, would increase the available computation points. In a final step, targeted for 2030, tiles of H4 chips will be stitched together to form a massive integrated system.
For its part, IonQ CEO Peter Chapman told Ars Technica that the company plans to double the number of qubits in its systems every eight months for the next few years. Instead of physically moving ions to get them to interact, IonQ’s system uses carefully crafted pairs of laser pulses on a stationary line of ions.
Despite the progress so far, these systems can’t yet do anything that can’t be done already on a classical computer system. So why are customers buying in now? “With this roadmap we’re showing that we are going to very quickly cross a boundary where there is no way you can fact check” a result, says Uttley. Companies need to see that their quantum algorithms work on these systems now, so that when they reach a capability beyond today’s supercomputers, they can still trust the result, he says.
MLPerf, a consortium of AI experts and computing companies, has released a new set of machine learning records. The records were set on a series of benchmarks that measure the speed of inferencing: how quickly an already-trained neural network can accomplish its task with new data. For the first time, benchmarks for mobiles and tablets were contested. According to David Kanter, executive director of MLPerf’s parent organization, a downloadable app is in the works that will allow anyone to test the AI capabilities of their own smartphone or tablet.
MLPerf’s goal is to present a fair and straightforward way to compare AI systems. Twenty-three organizations—including Dell, Intel, and Nvidia—submitted a total of 1200 results, which were peer reviewed and subjected to random third-party audits. (Google was conspicuously absent this round.) As with the MLPerf records for training AIs released over the summer, Nvidia was the dominant force, besting what competition there was in all six categories for both datacenter and edge computing systems. Including submissions by partners like Cisco and Fujitsu, 1029 results, or 85 percent of the total for edge and data center categories, used Nvidia chips, according to the company.
“Nvidia outperforms by a wide range on every test,” says Paresh Kharaya, senior director of product management, accelerated computing at Nvidia. Nvidia’s A100 GPUs powered its wins in the datacenter categories, while its Xavier was behind the GPU-maker’s edge-computing victories. According to Kharaya, on one of the new MLPerf benchmarks, Deep Learning Recommendation Model (DLRM), a single DGX A100 system was the equivalent of 1000 CPU-based servers.
There were four new inferencing benchmarks introduced this year, adding to the two carried over from the previous round:
BERT, for Bi-directional Encoder Representation from Transformers, is a natural language processing AI contributed by Google. Given a question input, BERT predicts a suitable answer.
DLRM, for Deep Learning Recommendation Model, is a recommender system that is trained to optimize click-through rates. It’s used to recommend items for online shopping and to rank search results and social media content. Facebook was the major contributor of the DLRM code.
3D U-Net is used in medical imaging systems to tell which 3D voxels in an MRI scan are part of a tumor and which are healthy tissue. It’s trained on a dataset of brain tumors.
RNN-T, for Recurrent Neural Network Transducer, is a speech recognition model. Given a sequence of speech input, it predicts the corresponding text.
In addition to those new metrics, MLPerf put together the first set of benchmarks for mobile devices, which were used to test smartphone and tablet platforms from MediaTek, Qualcomm, and Samsung as well as a notebook from Intel. The new benchmarks included:
MobileNetEdgeTPU is an image classification benchmark; image classification is considered the most ubiquitous task in computer vision. It’s representative of how a photo app might be able to pick out the faces of you or your friends.
SSD-MobileNetV2, for Single Shot multibox Detection with MobileNetv2, is trained to detect 80 different object categories in input frames with 300×300 resolution. It’s commonly used to identify and track people and objects in photography and live video.
DeepLabv3+ MobileNetV2: This is used to understand a scene for things like VR and navigation, and it plays a role in computational photography apps.
MobileBERT is a mobile-optimized variant of the larger BERT natural language processing model, fine-tuned for question answering. Given a question input, MobileBERT generates an answer.
The benchmarks were run on a purpose-built app that should be available to everyone within months, according to Kanter. “We want something people can put into their hands for newer phones,” he says.
The results released this week were dubbed version 0.7, as the consortium is still ramping up. Version 1.0 is likely to be complete in 2021.
Earlier this month, a group of eight Arm Research engineers established a startup, Cerfe Labs, to commercialize an experimental memory technology they had been working on for the past five years with Austin-based Symetrix. The technology, called correlated electron RAM (CeRAM), could become a nonvolatile replacement for the fast-access embedded SRAM used in processor high-level cache memory today. Besides being able to hold data in the absence of a power supply, which SRAM cannot do, CeRAM is likely to be considerably smaller than SRAM, potentially easing IC area issues as the industry’s ability to keep shrinking transistors reaches its end.
Much of semiconductor physics relies on the assumption that you can treat electrons individually. But more than half a century ago, Neville Francis Mott showed that in certain materials, when electrons are forced together, “[those materials] will do weird things,” explains Greg Yeric, formerly a Fellow at Arm Research and now Cerfe Labs’ CTO. One of those things is a reversible transition between a metallic state and an insulating state, called a Mott transition. Laboratories around the world have been studying this phenomenon in vanadium oxide and other materials, and HP Labs recently described a neuron-like device that relies on the principle.
“What we think we have through our partnership with Symetrix is a correlated-electron switch—a material that can switch resistance states,” says Yeric. The companies are exploring a number of materials, but the one they’ve invested the most in so far is a carbon-doped nickel oxide. Its native state is non-conducting; that is, there is a gap between the allowable energy states of electrons that are bound to atoms and those that are free to move. But if enough electrons are injected into the material, they “screen out” the presence of the nickel atoms (from the perspective of other electrons). This has the effect of shifting the two energy bands so they meet, allowing current to flow freely as if the material were a metal.
“We think we have a set of materials that exhibit this transition and, importantly, have a nonvolatile state on either side” of it, says Yeric.
The device itself is just the correlated electron material sandwiched between two electrodes, similar in structure to resistive RAM, phase change RAM, and magnetic RAM but less complex than the latter. And like those three, it is constructed in the metal interconnect layers above the silicon, requiring only one transistor in the silicon layer to access it, as opposed to SRAM’s six. Yeric says the company has made devices that fit with 7-nanometer CMOS processes and that they should be scalable in both size and voltage to 5 nanometers (today’s cutting edge).
But it’s CeRAM’s speed that could make it a good replacement for SRAM. To date, the company has made CeRAM with a 2-nanosecond pulse width for writing data, which is on par with what’s needed for a processor’s L3 cache; Yeric says they expect this speed to improve with development.
The carbon-doped nickel oxide material also has properties that go well beyond what today’s nonvolatile memories can do, though these are not as completely proven. For example, Cerfe Labs has shown that the device works at temperatures as low as 1.5 kelvins—well beyond what any nonvolatile memory can do, and in range for a role in quantum computing control circuits. In the other direction, they’ve demonstrated device operation up to 125 °C and shown that it retains its bits at up to 400 °C. But these figures were limited by the equipment the company had available. What’s more, the device’s theory of operation suggests that CeRAM should be naturally resistant to ionizing radiation and magnetic field disturbances.
Symetrix, which also develops ferroelectric RAM, explored correlated electron materials in theoretical studies for a Defense Advanced Research Projects Agency (DARPA) program called FRANC, for Foundations Required for Novel Computing. Symetrix “put together models and were able to predict the materials,” says CEO Eric Hennenhoefer, another Arm veteran.
“System designers are always on the lookout for improved memory, as virtually every system is limited in some way by the memory it can access,” says Yeric. “In [Arm’s] canvassing of possible future technologies, we came across the Symetrix technology. We eventually licensed the technology, based on its (very early and speculative) promise to advance embedded memory speed, density, cost, and power, without tradeoff.”
Cerfe Labs’ goal isn’t to manufacture CeRAM, but to develop the technology to the point that a large-scale manufacturer will want to take over development, he says. A new memory technology’s journey from discovery to commercialization typically takes eight or nine years at least. CeRAM is about halfway there, he estimates.
Among the questions still to be answered is the memory’s endurance—how many times it can switch before it begins to fail. In theory, there’s no element of the CeRAM device that would wear out. But it would be naïve to think there won’t be problems in the real world. “There are always extrinsic things that wind up limiting endurance,” Yeric says.
One thing that’s kept engineers from copying the brain’s power efficiency and quirky computational skill is the lack of an electronic device that can, all on its own, act like a neuron. It would take a special kind of device to do that, one whose behavior is more complex than any yet created.
Suhas Kumar of Hewlett Packard Laboratories, R. Stanley Williams now at Texas A&M, and the late Stanford student Ziwen Wang have invented a device that meets those requirements. On its own, using a simple DC voltage as the input, the device outputs not just simple spikes, as some other devices can manage, but the whole array of neural activity—bursts of spikes, self-sustained oscillations, and other stuff that goes on in your brain. They described the device last week in Nature.
It combines resistance, capacitance, and what’s called a Mott memristor all in the same device. Memristors are devices that hold a memory, in the form of resistance, of the current that has flowed through them. Mott memristors have an added ability in that they can also reflect a temperature-driven change in resistance. Materials in a Mott transition go between insulating and conducting according to their temperature. It’s a property seen since the 1960s, but only recently explored in nanoscale devices.
The transition happens in a nanoscale sliver of niobium oxide in the memristor. Here when a DC voltage is applied, the NbO2 heats up slightly, causing it to transition from insulating to conducting. Once that switch happens, the charge built up in the capacitance pours through. Then the device cools just enough to trigger the transition back to insulating. The result is a spike of current that resembles a neuron’s action potential.
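The heat-driven charge-and-discharge cycle described above behaves like a relaxation oscillator, and a toy model makes the spiking mechanism concrete. Everything here is illustrative: the component values and switching thresholds are invented for the sketch, not taken from the Nature paper. The Mott transition is reduced to a hysteretic switch between a high and a low resistance.

```python
# Toy relaxation-oscillator model of the spiking behavior: a
# hysteretic switch (insulating/conducting) in parallel with a
# capacitor, fed through a series resistor from a DC source.
V_DC, R_SERIES, C = 1.0, 50e3, 1e-12    # volts, ohms, farads (illustrative)
R_OFF, R_ON = 1e9, 1e3                  # device resistance in each state
V_SWITCH_ON, V_SWITCH_OFF = 0.6, 0.2    # hysteresis thresholds (volts)

def simulate(steps=200_000, dt=1e-10):
    """Euler-step the circuit; count output spikes."""
    v, conducting, spikes = 0.0, False, 0
    for _ in range(steps):
        r_dev = R_ON if conducting else R_OFF
        # current into the capacitor = source current - device current
        v += ((V_DC - v) / R_SERIES - v / r_dev) * dt / C
        if not conducting and v >= V_SWITCH_ON:
            conducting = True   # "Mott transition": insulator -> metal
            spikes += 1         # the discharge that follows is the spike
        elif conducting and v <= V_SWITCH_OFF:
            conducting = False  # device "cools", back to insulating
    return spikes

print(simulate() > 1)  # True: self-sustained spiking from a plain DC input
```

The point the sketch captures is the one in the text: with nothing but a DC input, the charge-heat-discharge-cool loop produces a repeating train of current spikes, much like a neuron’s action potentials.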
“We’ve been working for five years to get that,” Williams says. “There’s a lot going on in that one little piece of nanoscale material structure.”
According to Kumar, memristor inventor Leon Chua predicted that if you mapped out the possible device parameters there would be regions of chaotic behavior in between regions where behavior is stable. At the edge of some of these chaotic regions, devices can exist that do what the new artificial neuron does.
Williams credits Kumar with doggedly fine-tuning the device’s material and physical parameters to find a combination that works. “You cannot find this by accident,” he says. “Everything has to be perfect before you see this characteristic, but once you’re able to make this thing, it’s actually very robust and reproducible.”
They tested the device first by building spiking versions of Boolean logic gates—NAND and NOR—and then by building a small analog optimization circuit.
There’s a lot of work ahead to turn these into practical devices and scale them up to useful systems that could challenge today’s machines. For example, Kumar and Williams plan to explore other possible materials that experience Mott transitions at different temperatures. NbO2’s happens at a worrying 800 °C. That temperature occurs only in a nanometers-thin layer, but scaled up to millions of devices, it could be a problem.
Others have researched vanadium oxide, which transitions at a more pleasant 60 °C. But that might be too low, says Williams, given that systems in data centers often operate at 100 °C.
There may even be materials that can use other types of transitions to achieve the same result. “Finding the Goldilocks material is a very interesting issue,” says Williams.
Testing so far has focused on 94 gigahertz frequencies, which are at the edge of terahertz. “We have just broken through records of millimeter-wave operation by factors which are just stunning,” says Umesh K. Mishra, an IEEE Fellow who heads the UCSB group that published the papers. “If you’re in the device field, if you improve things by 20 percent people are happy. Here, we have improved things by 200 to 300 percent.”
The key power amplifier technology is called a high-electron-mobility transistor (HEMT). It is formed around a junction between two materials having different bandgaps: in this case, gallium nitride and aluminum gallium nitride. At this “heterojunction,” gallium nitride’s natural polarity causes a sheet of excess charge called a two-dimensional electron gas to collect. The presence of this charge gives the device the ability to operate at high frequencies, because the electrons are free to move quickly through it without obstruction.
Gallium nitride HEMTs are already making their mark in amplifiers, and they are a contender for 5G power amplifiers. But to efficiently amplify terahertz frequencies, the typical GaN HEMT needs to scale down in a particular way. Just as with silicon logic transistors, bringing a HEMT’s gate closer to the channel through which current flows—the electron gas in this case—lets it control the flow of current using less energy, making the device more efficient. More specifically, explains Mishra, you want to maximize the ratio of the gate’s length to the distance from the gate to the electron gas. That’s usually done by reducing the amount of barrier material between the gate’s metal and the rest of the device. But you can only go so far with that strategy. Eventually the barrier will be too thin to prevent current from leaking through, thereby harming efficiency.
But Mishra says his group has come up with a better way: They stood the gallium nitride on its head.
Ordinary gallium nitride is what’s called gallium-polar. That is, if you look down at the surface, the top layer of the crystal will always be gallium. But the Santa Barbara team discovered a way to make nitrogen-polar crystals, so that the top layer is always nitrogen. It might seem like a small difference, but it means that the structure that makes the sheet of charge, the heterojunction, is now upside down.
This delivers a bunch of advantages. First, the source and drain electrodes now make contact with the electron gas via a lower band-gap material (a nanometers-thin layer of GaN) rather than a higher-bandgap one (aluminum gallium nitride), lowering resistance. Second, the gas itself is better confined as the device approaches its lowest current state, because the AlGaN layer beneath acts as a barrier against scattered charge.
Devices made to take advantage of these two characteristics have already yielded record-breaking results. At 94 GHz, one device produced 8.8 watts per millimeter at 27 percent efficiency. A similar gallium-polar device produced only about 2 W/mm at that efficiency.
But the new geometry also allows for further improvements by positioning the gate even closer to the electron gas, giving it better control. For this to work, however, the gate has to act as a low-leakage Schottky diode. Unlike ordinary p-n junction diodes, which are formed by the junction of regions of semiconductor chemically doped to have different excess charges, Schottky diodes are formed by a layer of metal, insulator, and semiconductor. The Schottky diode Mishra’s team cooked up—ruthenium deposited one atomic layer at a time on top of N-polar GaN—provides a high barrier against current sneaking through it. And, unlike in other attempts at the gate diode, this one doesn’t lose current through random pathways that shouldn’t exist in theory but do in real life.
“Schottky diodes are typically extremely difficult to get on GaN without them being leaky,” says Mishra. “We showed that this material combination… gave us the nearly ideal Schottky diode characteristics.”
The UC Santa Barbara team hasn’t yet published results from a HEMT made with this new diode as the gate, says Mishra. But the data so far is promising. And they plan to eventually test the new devices at even higher frequencies than before—140 GHz and 230 GHz—both firmly in the terahertz range.
Quantum computing may have shown its “supremacy” over classical computing a little over a year ago, but it still has a long way to go. Intel’s director of quantum hardware, Jim Clarke, says that quantum computing will really have arrived when it can do something unique that can change our lives, calling that point “quantum practicality.” Clarke talked to IEEE Spectrum about how he intends to get silicon-based quantum computers there:
IEEE Spectrum: Intel seems to have shifted focus from quantum computers that rely on superconducting qubits to ones with silicon spin qubits. Why do you think silicon has the best chance of leading to a useful quantum computer?
Jim Clarke: It’s simple for us… Silicon spin qubits look exactly like a transistor. … The infrastructure is there from a tool fabrication perspective. We know how to make these transistors. So if you can take a technology like quantum computing and map it to such a ubiquitous technology, then the prospect for developing a quantum computer is much clearer.
I would concede that today silicon spin qubits are not the most advanced quantum computing technology out there. There has been a lot of progress in the last year with superconducting and ion trap qubits.
But there are a few more things: A silicon spin qubit is the size of a transistor—which is to say it is roughly 1 million times smaller than a superconducting qubit. So if you take a relatively large superconducting chip, and you say “how do I get to a useful number of qubits, say 1000 or a million qubits?” all of a sudden you’re dealing with a form factor that is … intimidating.
We’re currently making server chips with billions and billions of transistors on them. So if our spin qubit is about the size of a transistor, from a form-factor and energy perspective, we would expect it to scale much better.
Spectrum: What are silicon spin qubits and how do they differ from competing technologies, such as superconducting qubits and ion trap systems?
Clarke: In an ion trap you are basically using a laser to manipulate a metal ion through its excited states where the population density of two excited states represents the zero and one of the qubit. In a superconducting circuit, you are creating the electrical version of a nonlinear LC (inductor-capacitor) oscillator circuit, and you’re using the two lowest energy levels of that oscillator circuit as the zero and one of your qubit. You use a microwave pulse to manipulate between the zero and one state.
We do something similar with the spin qubit, but it’s a little different. You turn on a transistor, and you have a flow of electrons from one side to another. In a silicon spin qubit, you essentially trap a single electron in your transistor, and then you put the whole thing in a magnetic field [using a superconducting electromagnet in a refrigerator]. This orients the electron to either spin up or spin down. We are essentially using its spin state as the zero and one of the qubit.
That would be an individual qubit. Then with very good control, we can get two separated electrons in close proximity and control the amount of interaction between them. And that serves as our two-qubit interaction.
So we’re basically taking a transistor, operating at the single electron level, getting it in very close proximity to what would amount to another transistor, and then we’re controlling the electrons.
Spectrum: Does the proximity between adjacent qubits limit how the system can scale?
Clarke: I’m going to answer that in two ways. First, the interaction distance between two electrons to provide a two-qubit gate is not asking too much of our process. We make smaller devices every day at Intel. There are other problems, but that’s not one of them.
Typically, these qubits operate on a sort of a nearest neighbor interaction. So you might have a two-dimensional grid of qubits, and you would essentially only have interactions between one of its nearest neighbors. And then you would build up [from there]. That qubit would then have interactions with its nearest neighbors and so forth. And then once you develop an entangled system, that’s how you would get a fully entangled 2D grid. [Entanglement is a condition necessary for certain quantum computations.]
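The nearest-neighbor connectivity Clarke describes is easy to make concrete. In a minimal sketch (the function and grid dimensions are my own, purely illustrative), each qubit can run a two-qubit gate only with the qubits directly above, below, left, or right of it:

```python
def nearest_neighbors(row: int, col: int, rows: int, cols: int):
    """Qubits a given qubit at (row, col) can interact with in a
    rows x cols grid that allows only nearest-neighbor gates."""
    candidates = [(row - 1, col), (row + 1, col),
                  (row, col - 1), (row, col + 1)]
    # Keep only positions that actually lie on the grid.
    return [(r, c) for r, c in candidates if 0 <= r < rows and 0 <= c < cols]

# An interior qubit talks to 4 neighbors; a corner qubit to only 2.
print(len(nearest_neighbors(1, 1, 3, 3)))  # 4
print(len(nearest_neighbors(0, 0, 3, 3)))  # 2
```

Entanglement spreads across such a grid one neighbor at a time, which is how the fully entangled 2D array Clarke mentions is built up.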
Spectrum: What are some of the difficult issues right now with silicon spin qubits?
Clarke: By highlighting the challenges of this technology, I’m not saying that this is any harder than other technologies. I’m prefacing this, because certainly some of the things that I read in the literature would suggest that qubits are straightforward to fabricate or scale. Regardless of the qubit technology, they’re all difficult.
With a spin qubit, we take a transistor that normally has a current of electrons going through it, and we operate it at the single-electron level. This is the equivalent of placing a single electron into a sea of several hundred thousand silicon atoms and still being able to manipulate whether it’s spin up or spin down.
So we essentially have a small amount of silicon, we’ll call this the channel of our transistor, and we’re controlling a single electron within that piece of silicon. The challenge is that silicon, even a single crystal, may not be as clean as we need it. Some of the defects—extra bonds, charge defects, dislocations in the silicon—can all impact that single electron that we’re studying. This is really a materials issue that we’re trying to solve.
Spectrum: Just briefly, what is coherence time and what’s its importance to computing?
Clarke: The coherence time is the window during which information is maintained in the qubit. So, in the case of a silicon spin qubit, it’s how long before that electron loses its orientation, and randomly scrambles the spin state. It’s the operating window for a qubit.
Now, all of the qubit types have what amounts to coherence times. Some are better than others. The coherence times for spin qubits, depending on the type of coherence time measurement, can be on the order of milliseconds, which is pretty compelling compared to other technologies.
What needs to happen [to compensate for brief coherence times] is that we need to develop an error correction technique. That’s a complex way of saying we’re going to put together a bunch of real qubits and have them function as one very good logical qubit.
Spectrum: How close is that kind of error correction?
Clarke: It was one of the four items I wrote about earlier that really need to happen for us to realize a quantum computer. The first is we need better qubits. The second is we need better interconnects. The third is we need better control. And the fourth is we need error correction. We still need improvements on the first three before we’re really going to get, in a fully scalable manner, to error correction.
You will see groups starting to do little bits of error correction on just a few qubits. But we need better qubits and we need a more efficient way of wiring them up and controlling them before you’re really going to see fully fault-tolerant quantum computing.
Spectrum: One of the improvements to qubits recently was the development of “hot” silicon qubits. Can you explain their significance?
Clarke: Part of it equates to control.
Right now you have a chip at the bottom of a dilution refrigerator, and then, for every qubit, you have several wires that go from there all the way outside of the fridge. And these are not small wires; they’re coax cables. And so from a form factor perspective and a power perspective—each of these wires dissipates power—you really have a scaling problem.
One of the things that Intel is doing is that we are developing control chips. We have a control chip called Horse Ridge that’s a conventional CMOS chip that we can place in the fridge in close proximity to our qubit chip. Today that control chip sits at 4 kelvins and our qubit chip is at 10 millikelvins and we still have to have wires between those two stages in the fridge.
Now, imagine if we can operate our qubit slightly warmer. And by slightly warmer, I mean maybe 1 kelvin. All of a sudden, the cooling capacity of our fridge becomes much greater. The cooling capacity of our fridge at 10 millikelvin is roughly a milliwatt. That’s not a lot of power. At 1 kelvin, it’s probably a couple of watts. So, if we can operate at higher temperatures, we can then place control electronics in very close proximity to our qubit chip.
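Clarke’s point can be put in back-of-envelope numbers. The cooling budgets below are the ones he quotes; the per-line heat load is an assumed, illustrative figure, not a measured one:

```python
# Back-of-envelope version of Clarke's argument. Cooling budgets
# are from the interview; the per-line heat load is an assumption.
BUDGET_10_MK = 1e-3     # ~1 milliwatt available at 10 millikelvin
BUDGET_1_K = 2.0        # ~a couple of watts available at 1 kelvin
HEAT_PER_LINE = 1e-5    # assumed watts dissipated per control line

print(round(BUDGET_10_MK / HEAT_PER_LINE))  # 100 lines fit at 10 mK
print(round(BUDGET_1_K / HEAT_PER_LINE))    # 200000 lines fit at 1 K
```

Whatever the true per-line figure, the ratio is what matters: roughly three orders of magnitude more heat budget at 1 kelvin, which is what makes co-located control electronics plausible.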
By having hot qubits we can co-integrate our control with our qubits, and we begin to solve some of the wiring issues that we’re seeing in today’s early quantum computers.
Spectrum: Are hot qubits structurally the same as regular silicon spin qubits?
Clarke: Within silicon spin qubits there are several different types of materials; some are what I would call silicon MOS type qubits—very similar to today’s transistor materials. In other silicon spin qubits you have silicon that’s buried below a layer of silicon germanium. We’ll call that a buried channel device. Each has its benefits and challenges.
We’ve done a lot of work with TU Delft working on a certain type of [silicon MOS] material system, which is a little different than most in the community are studying [and lets us] operate the system at a slightly higher temperature.
I loved the quantum supremacy work. I really did. It’s good for our community. But it’s a contrived problem, on a brute force system, where the wiring is a mess (or at least complex).
What we’re trying to do with the hot qubits and with the Horse Ridge chip is put us on a path to scaling that will get us to a useful quantum computer that will change your life or mine. We’ll call that quantum practicality.
Spectrum: What do you think you’re going to work on next most intensely?
Clarke: In other words, “What keeps Jim up at night?”
There are a few things. The first is time-to-information. Across most of the community, we use these dilution refrigerators. And the standard way [to perform an experiment] is: You fabricate a chip; you put it in a dilution refrigerator; it cools down over the course of several days; you experiment with it over the course of several weeks; then you warm it back up and put another chip in.
Compare that to what we do for transistors: We take a 300-millimeter wafer, put it on a probe station, and after two hours we have thousands and thousands of data points across the wafer that tells us something about our yield, our uniformity, and our performance.
That doesn’t really exist in quantum computing. So we asked, “Is there a way—at slightly higher temperatures—to combine a probe station with a dilution refrigerator?” Over the last two years, Intel has been working with two companies in Finland [Bluefors Oy and Afore Oy] to develop what we call the cryoprober. And this is just coming online now. We’ve been doing an impressive job of installing this massive piece of equipment in the complete absence of field engineers from Finland due to the coronavirus.
What this will do is speed up our time-to-information by a factor of up to 10,000. So instead of wire bonding a single sample, putting it in the fridge, taking a week to study it, or even a few days to study it, we’re going to be able to put a 300-millimeter wafer into this unit and over the course of an evening step and scan. So we’re going to get a tremendous increase in throughput. I would say a 100× improvement. My engineers would say 10,000. I’ll leave that as a challenge for them to impress me beyond the 100.
Here’s the other thing that keeps me up at night. Prior to starting the Intel quantum computing program, I was in charge of interconnect research in Intel’s Components Research Group. (This is the wiring on chips.) So, I’m a little less concerned with the wiring into and out of the fridge than I am just about the wiring on the chip.
I’ll give an example: An Intel server chip has probably north of 10 billion transistors on a single chip. Yet the number of wires coming off that chip is a couple of thousand. A quantum computing chip has more wires coming off the chip than there are qubits. This was certainly the case for the Google [quantum supremacy] work last year. This was certainly the case for the Tangle Lake chip that Intel manufactured in 2018, and it’s the case with our spin qubit chips we make now.
So we’ve got to find a way to make the interconnects more elegant. We can’t have more wires coming off the chip than we have devices on the chip. It’s ineffective.
This is something the conventional computing community discovered in the late 1960s with Rent’s Rule [which empirically relates the number of interconnects coming out of a block of logic circuitry to the number of gates in the block]. Last year we published a paper with TU Delft on the quantum equivalent of Rent’s Rule. It talks about, among other things, the Horse Ridge control chip, the hot qubits, and multiplexing.
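Rent’s Rule says a block of G gates needs roughly T = t·G^p external terminals, with the exponent p below 1 for classical chips. A small sketch shows why today’s qubit chips break the pattern; the coefficient and exponent here are illustrative, chosen only to roughly match the pin count cited for a server chip, not taken from the Delft paper:

```python
def rent_terminals(gates: int, t: float = 2.5, p: float = 0.3) -> int:
    """Rent's rule: external terminals T = t * G**p for a block of
    G gates. t and p are illustrative, not fitted values."""
    return round(t * gates ** p)

# A classical chip: pin count grows far slower than gate count,
# so 10 billion transistors need only thousands of pins.
print(rent_terminals(10_000_000_000))  # 2500

# Today's qubit chips behave as if p = 1: wires scale one-for-one
# (or worse) with qubit count, the problem Clarke describes.
print(rent_terminals(1_000_000, t=1.0, p=1.0))  # 1000000
```

Multiplexing and on-chip control are, in effect, ways of pushing the qubit chip’s exponent back below 1.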
We have to find a way to multiplex at low temperatures. And that will be hard. You can’t have a million-qubit quantum computer with two million coax cables coming out of the top of the fridge.
Spectrum: Doesn’t Horse Ridge do multiplexing?
Clarke: It has multiplexing. The second generation will have a little bit more. The form factor of the wires [in the new generation] is much smaller, because we can put it in closer proximity to the [quantum] chip.
So if you kind of combine everything I’ve talked about. If I give you a package that has a classical control chip—call it a future version of Horse Ridge—sitting right next to and in the same package as a quantum chip, both operating at a similar temperature and making use of very small interconnect wires and multiplexing, that would be the vision.
Spectrum: What’s that going to require?
Clarke: It’s going to require a few things. It’s going to require improvements in the operating temperature of the control chip. It’s probably going to require some novel implementations of the packaging so there isn’t a lot of thermal cross talk between the two chips. It’s probably going to require even greater cooling capacity from the dilution refrigerator. And it’s probably going to require some qubit topology that facilitates multiplexing.
Spectrum: Given the significant technical challenges you’ve talked about here, how optimistic are you about the future of quantum computing?
Clarke: At Intel, we’ve consistently maintained that we are early in the quantum race. Every major change in the semiconductor industry has happened on the decade timescale and I don’t believe quantum will be any different. While it’s important to not underestimate the technical challenges involved, the promise and potential are real. I’m excited to see and participate in the meaningful progress we’re making, not just within Intel but the industry as a whole. A computing shift of this magnitude will take technology leaders, scientific research communities, academia, and policy makers all coming together to drive advances in the field, and there is tremendous work already happening on that front across the quantum ecosystem today.
Fremont, Calif.-based magnetic RAM startup Spin Memory says it has developed a transistor that allows MRAM and resistive RAM to be scaled down considerably. According to the company, the device could also defeat a stubborn security vulnerability in DRAM called Row Hammer.
Spin Memory calls the device the “Universal Selector.” In a memory cell, the selector is the transistor used to access the memory element—a magnetic tunnel junction in MRAM, a resistive material in RRAM, and a capacitor in DRAM. These are usually built into the body of the silicon, with the memory element constructed above them. Making the selector smaller and simplifying the layout of the interconnects that contact it leads to more compact memory cells.
Conventionally, transistors are built parallel to the plane of the silicon. When the device is on, current flows through a channel region between a source and a drain. The Universal Selector tilts that geometry 90 degrees: The source is at the bottom, attached to a conductor buried in the silicon; the channel region is a vertical silicon pillar; and the drain is on top. The gate, the part of the device that controls the flow of charge, surrounds the channel region on all sides.
Such vertical gate-all-around devices are similar to those used to make today’s multilayer NAND flash storage chips. But Spin Memory’s devices span only one layer and are tuned to operate at much lower voltages.
According to the company, the vertical device would improve DRAM array density by 20-35% and allow manufacturers to pack up to five times more MRAM or RRAM memory into the same area.
The selector is part of a trio of inventions Spin Memory is developing to boost MRAM’s adoption. The other two are an improved magnetic tunnel junction, and a circuit design that boosts MRAM’s endurance and read and write speeds, as well as eliminating sources of error. The combination, according to Jeff Lewis, senior vice president of product development, would bring MRAM to a level of performance on par with SRAM, the superfast memory embedded in today’s CPUs and other processors.
“The use of SRAM as the main on-chip memory is becoming problematic because of its known scalability [limits],” says Lewis. Because it’s just a single transistor and a magnetic tunnel junction, MRAM could one day have a density advantage over SRAM, which is made up of six transistors. More important, unlike SRAM, MRAM keeps its data even when there is no power to the memory cell. Right now, however, MRAM cells are considerably larger than SRAM’s. “One of our key objectives was to come up with a smaller cell size for MRAM so that it could have greater attraction as an SRAM replacement.”
With DRAM, the main memory of choice for computers, the Universal Selector has an interesting side effect: It should make the memory immune to Row Hammer. This vulnerability occurs when a row of DRAM cells is rapidly charged and discharged (basically, flipping the bits at an extremely high rate). Stray charge from this action can migrate to a neighboring row of cells, corrupting the bits there.
“Row hammer is one of the leading issues in DRAM reliability and security, and has long been a frustrating plague on the memory industry. As DRAM’s longstanding major disturb problem, row hammering is only becoming more of a problem as cells shrink,” Charles Slayman, a device reliability expert at Cisco Systems, said in a press release.
According to Lewis, the new device is immune to this problem because the transistor channel is outside of the bulk of the silicon, and so it’s isolated from the wandering charge. “This is a root-cause fix for row hammer,” he says.
For use in DRAM, the device would have to be shrunk down considerably, which is possible. But improving MRAM is the immediate goal. That will involve optimizing the strength of the drive current and other aspects of the device. Spin Memory engineers will present details of the Universal Selector next week at the 31st Magnetic Recording Conference.
The regular scaling down in the size of transistors has always been accompanied by a similar scaling down in the size of the vertical metal contacts that bridge the devices themselves to the wiring that links them up to form logic gates.
But in the last few generations the resistance of those tungsten contacts has become a drag on performance, and chip makers have been eyeing a move to alternative materials for future generations. Chip equipment supplier Applied Materials says it’s come up with a machine that reverses this resistance trend, boosting the performance of today’s chips and allowing fabs to continue using tungsten into the future.
For devices on today’s most advanced chips “resistance is your key issue,” says Zhebo Chen, global product manager. “With the transistor you’ve taken an economy car and turned it into a race car, but if the roads are congested it doesn’t matter.”
The heart of the problem is that in the existing manufacturing process, tungsten contacts must be clad in a layer of titanium nitride. The process involves first forming a hole in a layer of dielectric to contact the transistor, then adding a layer of titanium nitride to line that hole and the surface of the dielectric. The next step uses a process called chemical vapor deposition to put tungsten on all the surfaces at once, growing from the nitride layer inwards within the holes until the hole is filled. Finally, the surface layer of tungsten is removed, leaving just the nitride-clad contacts.
The purpose of the nitride is two-fold. First, it helps the tungsten stick to the walls as the contact grows, preventing flaking. Second, it blocks fluorine used in the growth process from fouling the chip.
The problem is that even as the diameter of the contact has been shrunk down, the thickness of the cladding has not. In 7-nanometer chips today, contacts are only 20 nanometers wide, and only 25 percent of their volume is tungsten, explains Chen. The rest is cladding.
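The geometry behind that 25 percent figure is easy to check. Treating the contact as a clad cylinder (a simplifying assumption; real contacts taper), the numbers imply a liner roughly 5 nanometers thick:

```python
def tungsten_fraction(contact_diameter_nm, liner_thickness_nm):
    """Cross-sectional area fraction occupied by tungsten in a
    cylindrical contact clad with a liner of uniform thickness."""
    core = contact_diameter_nm - 2 * liner_thickness_nm
    if core <= 0:
        return 0.0  # liner fills the whole hole
    return (core / contact_diameter_nm) ** 2

# A 20 nm contact with a 5 nm titanium nitride liner leaves a
# 10 nm tungsten core: (10/20)**2 = 25% tungsten.
print(tungsten_fraction(20, 5))  # → 0.25
```

Because the liner thickness stays fixed while the hole shrinks, the tungsten fraction collapses rapidly at smaller nodes, which is why removing the liner altogether pays off so much.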
In July, Applied Materials released a machine that can make tungsten contacts with no cladding at all, reducing resistance by 40 percent. This “selective gapfill process” deposits tungsten from the bottom of the contact hole up instead of on all the surfaces at once. Because it uses a different chemistry than the previous process, there’s no need for a liner’s adhesion enhancement nor its fluorine-blocking ability. However, the process does need to be accomplished completely in a vacuum, so the company built it around a sealed system capable of moving wafers through multiple process steps without exposing them to air.
Although the new machine, called the Endura Volta Selective Tungsten CVD system, was introduced in July, Chen says it’s already being used in high-volume manufacturing by leading manufacturers.
“There’s more than 100 kilometers of tungsten contact on a [300-millimeter] wafer,” says Chen. “Doing this right in high-volume manufacturing is exceedingly difficult.”
The most broadly accepted suite of seven standard tests for AI systems released its newest rankings Wednesday, and GPU maker Nvidia swept all the categories for commercially available systems with its new A100 GPU-based computers, breaking 16 records. It was, however, the only entrant in some of them.
The rankings are by MLPerf, a consortium with membership from both AI powerhouses like Facebook, Tencent, and Google and startups like Cerebras, Mythic, and Sambanova. MLPerf’s tests measure the time it takes a computer to train a particular set of neural networks to an agreed upon accuracy. Since the previous round of results, released in July 2019, the fastest systems improved by an average of 2.7x, according to MLPerf.
Engineers have been chasing a form of AI that could drastically lower the energy required to do typical AI things like recognize words and images. This analog form of machine learning does one of the key mathematical operations of neural networks using the physics of a circuit instead of digital logic. But one of the main things limiting this approach is that deep learning’s training algorithm, back propagation, has to be done by GPUs or other separate digital systems.
Now University of Montreal AI expert Yoshua Bengio, his student Benjamin Scellier, and colleagues at startup Rain Neuromorphics have come up with a way for analog AIs to train themselves. That method, called equilibrium propagation, could lead to continuously learning, low-power analog systems of a far greater computational ability than most in the industry now consider possible, according to Rain CTO Jack Kendall.
Analog circuits could save power in neural networks in part because they can efficiently perform a key calculation, called multiply and accumulate. That calculation multiplies values from inputs according to various weights, and then it sums all those values up. Two fundamental laws of electrical engineering can basically do that, too. Ohm’s Law multiplies voltage and conductance to give current, and Kirchhoff’s Current Law sums the currents entering a point. By storing a neural network’s weights in resistive memory devices, such as memristors, multiply-and-accumulate can happen completely in analog, potentially reducing power consumption by orders of magnitude.
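As a sketch of the idea, the two circuit laws map directly onto a dot product. The voltages and conductances below are arbitrary illustrative values, not figures from any real device:

```python
import numpy as np

# Inputs applied as voltages; weights stored as conductances
# (e.g., memristor states).
voltages = np.array([0.3, -0.1, 0.5])        # volts
conductances = np.array([2e-3, 5e-3, 1e-3])  # siemens

# Ohm's law: each branch contributes I = G * V (the multiply)...
branch_currents = conductances * voltages
# ...and Kirchhoff's current law sums the branch currents at the
# output node (the accumulate). The physics does the dot product.
output_current = branch_currents.sum()

assert np.isclose(output_current, conductances @ voltages)
```

No clocked arithmetic units are involved: the answer appears as a current the moment the voltages are applied.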
The reason analog AI systems can’t train themselves today has a lot to do with the variability of their components. Just like real neurons, those in analog neural networks don’t all behave exactly alike. To do back propagation with analog components, you must build two separate circuit pathways: one going forward to come up with an answer (called inferencing), and one going backward to do the learning that makes the answer more accurate. But because of the variability of analog components, the pathways don’t match up.
“You end up accumulating error as you go backwards through the network,” says Bengio. To compensate, a network would need lots of power-hungry analog-to-digital and digital-to-analog circuits, defeating the point of going analog.
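A toy simulation makes the mismatch problem concrete. Here the backward pathway holds an imperfect copy of each weight matrix (5 percent Gaussian device variation, an arbitrary illustrative figure), and the error signal drifts away from the true gradient direction as it propagates back through the layers:

```python
import numpy as np

rng = np.random.default_rng(0)
depth, dim, mismatch = 8, 16, 0.05

# Forward pathway: a deep linear chain of weight matrices.
Ws = [rng.normal(0, 1 / np.sqrt(dim), (dim, dim)) for _ in range(depth)]
x = rng.normal(size=dim)

h = x
for W in Ws:
    h = W @ h
err = h  # error signal for a zero target: gradient of 0.5*||h||^2

def backward(signal, mats):
    """Propagate an error signal back through the chain."""
    g = signal
    for W in reversed(mats):
        g = W.T @ g
    return g

g_exact = backward(err, Ws)                      # ideal backprop
g_noisy = backward(err, [W + mismatch * rng.normal(size=W.shape)
                         for W in Ws])           # separate analog pathway

# Alignment between the true and mismatched gradient signals.
cos = g_exact @ g_noisy / (np.linalg.norm(g_exact) * np.linalg.norm(g_noisy))
print(f"gradient alignment after {depth} layers: {cos:.3f}")
```

Each extra layer compounds the device variation, which is why the error Bengio describes accumulates with depth.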
Equilibrium propagation allows learning and inferencing to happen on the same network, partly by adjusting the behavior of the network as a whole. “What [equilibrium propagation] allows us to do is to say how we should modify each of these devices so that the overall circuit performs the right thing,” he says. “We turn the physical computation that is happening in the analog devices directly to our advantage.”
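The core of equilibrium propagation fits in a one-dimensional toy example (a drastic simplification; Scellier and Bengio’s algorithm runs on full networks). A single “neuron” relaxes to the minimum of an energy function, is then nudged toward the target, and the weight update is read off from how the equilibrium shifts, with no separate backward circuit required:

```python
def eqprop_step(w, x, y, beta=0.01, lr=0.5):
    """One equilibrium-propagation update for the scalar energy
    E(s) = 0.5*s**2 - w*x*s, nudged by the cost beta * 0.5*(s - y)**2."""
    s_free = w * x                               # free-phase equilibrium (the inference)
    s_nudged = (w * x + beta * y) / (1 + beta)   # nudged-phase equilibrium
    # Since dE/dw = -x*s, contrasting the two equilibria gives the update:
    dw = (x / beta) * (s_nudged - s_free)
    return w + lr * dw

w, x, y = 0.0, 1.0, 2.0
for _ in range(50):
    w = eqprop_step(w, x, y)
print(w)  # converges to y/x = 2.0
```

As `beta` shrinks, the contrast between the two equilibria approaches the true loss gradient, so the same physical relaxation that does inference also does the learning.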
Right now, equilibrium propagation is only working in simulation. But Rain plans to have a hardware proof-of-principle in late 2021, according to CEO and cofounder Gordon Wilson. “We are really trying to fundamentally reimagine the hardware computational substrate for artificial intelligence, find the right clues from the brain, and use those to inform the design of this,” he says. The result could be what they call end-to-end analog AI systems capable of running sophisticated robots or even playing a role in data centers. Both of those are currently considered beyond the capabilities of analog AI, which is now focused only on adding inferencing abilities to sensors and other low-power “edge” devices, while leaving the learning to GPUs.