Nothing computes more efficiently than a brain, which is why scientists are working hard to create artificial neural networks that mimic the organ as closely as possible. Conventional approaches use artificial neurons that work together to learn different tasks and analyze data; however, these artificial neurons do not have the ability to actually “fire” like real neurons, releasing bursts of electricity that connect them to other neurons in the network. The third generation of this computing tech aims to capture this real-life process more accurately – but achieving such a feat is hard to do efficiently.
The public-private partnership of Fujitsu and the national research institute RIKEN put Japan on top of the world supercomputer rankings nine long years ago with the K computer. They’ve done it again, and in spades, with their jointly developed Fugaku supercomputer.
Fugaku, another name for Mount Fuji, sits at the summit of the TOP500 list announced on 22 June. It earned the top spot with an extraordinary performance of 415 Linpack petaflops. This is nearly triple that of the runner-up and previous No. 1, Oak Ridge National Lab’s Summit supercomputer in Tennessee, built by IBM. Fugaku achieved this using 396 racks employing 152,064 A64FX Arm nodes. The Arm components comprise approximately 95 percent of the computer’s almost 159,000 nodes.
In addition to demonstrating world-beating speed, Fugaku beat the competition in: the High Performance Conjugate Gradients (HPCG) benchmark used to test real-world application performance; the Graph500, a rating for data-intensive loads; and HPL-AI, a benchmark for rating artificial intelligence workloads. A Fugaku prototype also took the top spot for the most energy-efficient system on the Green500 list last November, achieving an outstanding 16.9 gigaflops per watt during a 2.0-petaflop Linpack run.
Driving Fugaku’s success is Fujitsu’s 48-core Arm v8.2-A A64FX CPU, which the company is billing as the world’s first CPU to adopt the Scalable Vector Extension—an instruction-set extension of the Arm v8-A architecture for supercomputers. The 2.2-GHz CPU features 512-bit vector units, employs 3D-stacked memory delivering 1,024 gigabytes per second, and can handle the half-precision arithmetic and multiply-add operations that reduce memory loads in AI and deep-learning applications where lower precision is admissible. The CPUs are directly linked by the Tofu D interconnect, which provides 6.8 gigabytes per second of bandwidth over a six-dimensional mesh/torus topology.
Planning for the computer began in 2011, and over the following three years a number of designs and architectures were considered. “Our guiding strategy was to build a science-driven, low-powered machine that was easy to use and could run science and engineering applications efficiently,” says Toshiyuki Shimizu, Principal Engineer of Fujitsu’s Platform Development Unit.
Independent observers say they succeeded in every element of the goal. “Fugaku is very impressive with over 7 million cores,” says Jack Dongarra, director of the Innovative Computing Lab, University of Tennessee, Knoxville. “The machine was designed for doing computational science problems from the ground up. It’s a first.”
As for the choice of Arm architecture, Shimizu notes the large number of application developers supporting Arm. “Fugaku also supports Red Hat Enterprise Linux 8.x, a de facto standard operating system widely used by commercial servers,” he points out.
Another plus for Fugaku is that it follows the K computer by maintaining an all-CPU design. Shimizu says this makes memory access and CPU interconnectivity more efficient. Most other supercomputers rely on graphic processing units (GPUs) to accelerate performance.
Dongarra points out an additional benefit here. “A CPU-only system simplifies the programming. Just one program is needed, not two: one for the CPU and one for the GPU.”
Designing and building a computer that, from the ground up, was intended to be Japan’s national flagship didn’t come cheap, of course. The government’s estimated budget for the project’s R&D, acquisitions, and application development is 110 billion yen (roughly US $1 billion).
Fujitsu dispatched the first units of Fugaku to the RIKEN Center for Computational Science (R-CCS) in Kobe last December and shipments were completed last month.
Speaking at the ISC 2020 conference in June, Satoshi Matsuoka, Director of R-CCS, said that although Fugaku was scheduled to start up next year, Japan’s government decided it should be deployed now to help combat Covid-19. He said it was being used to study how the virus behaves, which existing drugs might be repurposed to counter it, and how a vaccine could be made.
Other government-targeted application areas given high priority include: disaster-prevention simulations of earthquakes and tsunami; development of fundamental technologies for energy creation, conversion, and storage; creation of new materials to support next-generation industries; and development of new design and production processes for the manufacturing industry.
Fugaku will also be used to realize the creation of a smarter society—dubbed Society 5.0—“that balances economic advancement with the resolution of social problems by a system that highly integrates cyberspace and physical space.”
But the supercomputer industry is nothing if not a game of technology leapfrog, with one country or enterprise providing machines with the highest performance only to be outpaced a short time later. Just how long will Fugaku stay No. 1?
Shimizu doesn’t claim to know, but he says there is room for further improvement of Fugaku’s performance. “The TOP500 result was only 81 percent of peak performance, whereas the efficiency of silicon is higher. We believe we can improve the performance in all the categories.”
But even that might not be enough to keep it on top for long. As Dongarra says, “The U.S. will have exascale machines in 2021.”
Post Syndicated from Samuel K. Moore original https://spectrum.ieee.org/tech-talk/computing/hardware/honeywell-claims-it-has-most-powerful-quantum-computer
“We expect within the next three months we will be releasing the world’s most powerful quantum computer,” Tony Uttley, president of Honeywell Quantum Solutions, told IEEE Spectrum in March. Right on cue, last week the company claimed it had reached that mark. The benchmark measurement, called quantum volume, is essentially a combined measure of the number of physical qubits, how connected they are, and how error prone they are. For Honeywell’s system, which has six qubits, that number is now 64, beating a 53-qubit IBM system that had a quantum volume of 32.
The quantum volume measure isn’t a universally accepted benchmark and has an unclear relationship to the “quantum supremacy” goal Google claimed in 2019, which compares a quantum computer to the theoretical peak performance of classical computers. But Uttley says it’s the best measure so far. “It takes into account more than just how many physical qubits you have,” he says. Just going by the number of qubits doesn’t work because, “you don’t necessarily get all or even any of the benefits of physical qubits” in real computations.
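The relationship Uttley describes can be sketched in a few lines. Quantum volume is defined as 2 raised to the side length of the largest "square" random circuit (n qubits by n layers of gates) a machine completes with acceptable fidelity. The depth figures below are illustrative assumptions chosen to reproduce the published scores, not measured values:

```python
def quantum_volume(num_qubits: int, max_reliable_depth: int) -> int:
    """Quantum volume is 2**n, where n is the side of the largest 'square'
    random circuit (n qubits, n gate layers) the machine runs reliably.
    Noisy machines hit a depth limit below their qubit count."""
    n = min(num_qubits, max_reliable_depth)
    return 2 ** n

# Honeywell: six qubits, with errors low enough to use all of them.
print(quantum_volume(6, 6))    # 64
# A 53-qubit machine whose noise limits reliable depth to 5 layers
# (a depth chosen here only to reproduce IBM's reported score of 32).
print(quantum_volume(53, 5))   # 32
```

This is why, as Uttley says, counting physical qubits alone is misleading: the 53-qubit machine has almost nine times as many qubits but half the quantum volume.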
Honeywell’s computer uses ytterbium ions trapped by an electromagnetic field within a narrow groove built in a chip. The qubit is represented by the spin state of the ion’s outermost electron and that of its nucleus. The qubits are manipulated using lasers and can be moved around the trap to carry out algorithms. Much of the quantum volume advantage of this system comes from the length of time the qubits can hold their state before noise corrupts them and crashes the computation. In the ion trap, they last for seconds, as opposed to the microseconds of many other systems. Uttley says this long “coherence time” allows for mid-circuit measurements—a quantum version of if/then programming statements—in quantum algorithms.
Because of COVID-19, most of the United States went into lockdown within weeks of Honeywell’s March prediction. So hitting the mark took a different path than expected. “We had to redesign physical layout of labs to keep social distance,” including adding plexiglass dividers, explains Uttley. And only 30 percent of the project team worked on site. “We pulled in a tremendous amount of automation,” he says.
The quantum computer itself is designed to be accessed remotely, Uttley explains. The company plans to offer it as a cloud service. And partners, such as the bank JPMorgan Chase, are already running algorithms on it. The latter firm is interested in quantum algorithms for fraud detection, optimization for trading strategies, and security. More broadly, customers want to explore problems in optimization, machine learning, and chemistry and materials science.
Uttley predicts 10-fold boosts in quantum volume per year going forward. His confidence comes from the nature of the ion trap system his team has developed. “It’s like we built a stadium, but right now we’re only occupying a handful of seats.”
Post Syndicated from Amy Nordrum original https://spectrum.ieee.org/computing/hardware/lasers-write-data-into-glass
Magnetic tape and hard disk drives hold much of the world’s archival data. Compared with other memory and storage technologies, tape and disk drives cost less and are more reliable. They’re also nonvolatile, meaning they don’t require a constant power supply to preserve data. Cultural institutions, financial firms, government agencies, and film companies have relied on these technologies for decades, and will continue to do so far into the future.
But archivists may soon have another option—using an extremely fast laser to write data into a 2-millimeter-thick piece of glass, roughly the size of a Post-it note, where that information can remain essentially forever.
This experimental form of optical data storage was demonstrated in 2013 by researchers at the University of Southampton in England. Soon after, that group began working with engineers at Microsoft Research in an effort called Project Silica. Last November, Microsoft completed its first proof of concept by writing the 1978 film Superman on a single small piece of glass and retrieving it.
With this method, researchers could theoretically store up to 360 terabytes of data on a disc the size of a DVD. For comparison, Panasonic aims to someday fit 1 TB on conventional optical discs, while Seagate and Western Digital are shooting for 50- to 60-TB hard disk drives by 2026.
International Data Corp. expects the world to produce 175 zettabytes of data by 2025—up from 33 ZB in 2018. Though only a fraction of that data will be stored, today’s methods may no longer suffice. “We believe people’s appetite for storage will force scientists to look into other kinds of materials,” says Waguih Ishak, chief technologist at Corning Research and Development Corp.
Microsoft’s work is part of a broader company initiative to improve cloud storage through optics. “I think they see it as potentially a distinguishing technology from something like [Amazon Web Services] and other cloud providers,” says James Byron, a Ph.D. candidate in computer science at the University of California, Santa Cruz, who studies storage methods.
Microsoft isn’t alone—John Morris, chief technology officer at Seagate, says researchers there are also focused on understanding the potential of optical data storage in glass. “The challenge is to develop systems that can read and write with reasonable throughput,” he says.
Writing data to glass involves focusing a femtosecond laser, which pulses very quickly, on a point within the glass. The glass itself is a sort known as fused silica. It’s the same type of extremely pure glass used for the Hubble Space Telescope’s mirror as well as the windows on the International Space Station.
The laser’s pulse deforms the glass at its focal point, forming a tiny 3D structure called a voxel. Two properties that measure how the voxel interacts with polarized light—retardance and change in the light’s polarization angle—can together represent several bits of data per voxel.
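A rough sketch of how two measured voxel properties can together carry several bits: quantize each property into a few distinguishable levels and pack the pair into one symbol. The four-level quantization below is a hypothetical choice for illustration, not Microsoft's actual encoding:

```python
def voxel_to_symbol(retardance_level: int, angle_level: int,
                    levels_per_property: int = 4) -> int:
    """Pack two quantized voxel measurements into one data symbol.
    With 4 distinguishable levels per property, each voxel carries
    log2(4 * 4) = 4 bits.  The level counts are illustrative only."""
    assert 0 <= retardance_level < levels_per_property
    assert 0 <= angle_level < levels_per_property
    return retardance_level * levels_per_property + angle_level

# A voxel measured at retardance level 2 and polarization-angle level 3
# encodes symbol 11 (binary 1011), i.e. four bits of data.
print(voxel_to_symbol(2, 3))
```

More distinguishable levels per property would mean more bits per voxel, which is one lever (alongside layering voxels in depth) for raising capacity.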
Microsoft can currently write hundreds of layers of voxels into each piece of glass. The glass can be written to once and read back many times. “This is data in glass, not on glass,” says Ant Rowstron, a principal researcher and deputy lab director at Microsoft Research Lab in Cambridge, England.
Reading data from the glass requires an entirely different setup, which is one potential drawback of this method. Researchers shine different kinds of polarized light—in which light waves all oscillate in the same direction, rather than every which way—onto specific voxels. They capture the results with a camera. Then, machine-learning algorithms analyze those images and translate their measurements into data.
Ishak, who is also an adjunct professor of electrical engineering at Stanford University, is optimistic about the approach. “I’m sure that in the matter of a decade, we’ll see a whole new kind of storage that eclipses and dwarfs everything that we have today,” he says. “And I firmly believe that those pure materials like fused silica will definitely play a major role there.”
But many scientific and engineering challenges remain. “The writing process is hard to make reliable and repeatable, and [it’s hard] to minimize the time it takes to create a voxel,” says Rowstron. “The read process has been a challenge in figuring out how to read the data from the glass using the minimum signal possible from the glass.”
The Microsoft group has added error-correcting codes to improve the system’s accuracy and continues to refine its machine-learning algorithms to automate the read-back process. Already, the team has improved writing speeds by several orders of magnitude from when they began, though Rowstron declined to share absolute speeds.
The team is also considering what it means to store data for such a long time. “We are working on thinking what a Rosetta Stone for glass could look like to help people decode it in the future,” Rowstron says.
This article appears in the June 2020 print issue as “Storing Data in Glass.”
Post Syndicated from Fahmida Y Rashid original https://spectrum.ieee.org/computing/hardware/what-is-confidential-computing
A handful of major technology companies are going all in on a new security model they’re calling confidential computing in an effort to better protect data in all its forms.
The three pillars of data security involve protecting data at rest, in transit, and in use. Protecting data at rest means using methods such as encryption or tokenization so that even if data is copied from a server or database, a thief can’t access the information. Protecting data in transit means making sure unauthorized parties can’t see information as it moves between servers and applications. There are well-established ways to provide both kinds of protection.
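As a minimal illustration of one at-rest technique, here is a toy tokenization sketch. The `TokenVault` class and its in-memory mapping are invented for this example; a production system would keep the vault in a separate, hardened service so a stolen database of tokens reveals nothing:

```python
import secrets

class TokenVault:
    """Toy tokenization: swap a sensitive value for a random token.
    Only the vault can map tokens back to the original values."""
    def __init__(self):
        self._vault = {}

    def tokenize(self, value: str) -> str:
        token = secrets.token_hex(8)   # random, carries no information
        self._vault[token] = value
        return token

    def detokenize(self, token: str) -> str:
        return self._vault[token]

vault = TokenVault()
token = vault.tokenize("4111-1111-1111-1111")
# A thief who copies the tokenized record learns nothing; only the
# vault can reverse the mapping.
assert vault.detokenize(token) == "4111-1111-1111-1111"
```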
Protecting data while in use, though, is especially tough because applications need to have data in the clear—not encrypted or otherwise protected—in order to compute. But that means malware can dump the contents of memory to steal information. It doesn’t really matter if the data was encrypted on a server’s hard drive if it’s stolen while exposed in memory.
Proponents of confidential computing hope to change that. “We’re trying to evangelize there are actually practical solutions” to protect data while it’s in use, said Dave Thaler, a software architect from Microsoft and chair of the Confidential Computing Consortium’s Technical Advisory Council.
The consortium, launched last August under the Linux Foundation, aims to define standards for confidential computing and support the development and adoption of open-source tools. Members include technology heavyweights such as Alibaba, AMD, Arm, Facebook, Fortanix, Google, Huawei, IBM (through its subsidiary Red Hat), Intel, Microsoft, Oracle, Swisscom, Tencent, and VMware. Several already have confidential computing products and services for sale.
Confidential computing uses hardware-based techniques to isolate data, specific functions, or an entire application from the operating system, hypervisor or virtual machine manager, and other privileged processes. Data is stored in the trusted execution environment (TEE), where it’s impossible to view the data or operations performed on it from outside, even with a debugger. The TEE ensures that only authorized code can access the data. If the code is altered or tampered with, the TEE denies the operation.
Many organizations have declined to migrate some of their most sensitive applications to the cloud because of concerns about potential data exposure. Confidential computing makes it possible for different organizations to combine data sets for analysis without accessing each other’s data, said Seth Knox, vice president of marketing at Fortanix and the outreach chair for the Confidential Computing Consortium. For example, a retailer and credit card company could cross-check customer and transaction data for potential fraud without giving the other party access to the original data.
Confidential computing may have other benefits unrelated to security. An image-processing application, for example, could store files in the TEE instead of sending a video stream to the cloud, saving bandwidth and reducing latency. The application may even divide up such tasks on the processor level, with the main CPU handling most of the processing, but relying on a TEE on the network interface card for sensitive computations.
Such techniques can also protect algorithms. A machine-learning algorithm, or an analytics application such as a stock trading platform, can live inside the TEE. “You don’t want me to know what stocks you’re trading, and I don’t want you to know the algorithm,” said Martin Reynolds, a technology analyst at Gartner. “In this case, you wouldn’t get my code, and I wouldn’t get your data.”
Confidential computing requires extensive collaboration between hardware and software vendors so that applications and data can work with TEEs. Most confidential computing performed today runs on Intel servers (like the Xeon line) with Intel Software Guard Extension (SGX), which isolates specific application code and data to run in private regions of memory. However, recent security research has shown that Intel SGX can be vulnerable to side-channel and timing attacks.
Fortunately, TEEs aren’t available only in Intel hardware. OP-TEE is a TEE for nonsecure Linux kernels running on Arm Cortex-A cores. Microsoft’s Virtual Secure Mode is a software-based TEE implemented by Hyper-V (the hypervisor for Windows systems) in Windows 10 and Windows Server 2016.
The Confidential Computing Consortium currently supports a handful of open-source projects, including the Intel SGX SDK for Linux, Microsoft’s Open Enclave SDK, and Red Hat’s Enarx. Projects don’t have to be accepted by the consortium to be considered confidential computing: For example, Google’s Asylo is similar to Enarx, and Microsoft Azure’s confidential computing services support both Intel SGX and Microsoft’s Virtual Secure Mode.
Hardware-based TEEs can supplement other security techniques, Thaler said, including homomorphic encryption and secure element chips such as the Trusted Platform Module. “You can combine these technologies because they are not necessarily competing,” he said. “Are you looking at the cloud or looking at the edge? You can pick which techniques to use.”
This article appears in the June 2020 print issue as “The Rise of Confidential Computing.”
Post Syndicated from Charles Q. Choi original https://spectrum.ieee.org/tech-talk/computing/hardware/qubit-supremacy
Quantum computers can theoretically prove more powerful than any supercomputer, and now scientists have calculated just what quantum computers need to attain such “quantum supremacy,” and whether Google achieved it last year as the company claimed.
Whereas classical computers switch transistors either on or off to symbolize data as ones and zeroes, quantum computers use quantum bits or qubits that, because of the bizarre nature of quantum physics, can be in a state of superposition where they are both 1 and 0 simultaneously.
Superposition lets one qubit perform two calculations at once, and if two qubits are linked through a quantum effect known as entanglement, they can help perform 2², or four, calculations simultaneously; three qubits, 2³, or eight; and so on. In principle, a quantum computer with 300 qubits could perform more calculations in an instant than there are atoms in the visible universe.
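The exponential scaling above is easy to check: describing n entangled qubits classically requires tracking 2 to the power n amplitudes. The helper below is a throwaway, and the 10^80 figure is the commonly cited rough estimate of atoms in the visible universe:

```python
def state_space(n_qubits: int) -> int:
    """Number of classical amplitudes needed to fully describe
    an n-qubit entangled state."""
    return 2 ** n_qubits

print(state_space(2))                 # 4
print(state_space(3))                 # 8
# 300 qubits already outstrip the ~10**80 atoms in the visible universe.
print(state_space(300) > 10 ** 80)    # True
```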
It remains controversial how many qubits are needed to achieve quantum supremacy over standard computers. Last year, Google claimed to achieve quantum supremacy with just 53 qubits, performing a calculation in 200 seconds that the company estimated would take the world’s most powerful supercomputer 10,000 years, but IBM researchers argued in a blog post “that an ideal simulation of the same task can be performed on a classical system in 2.5 days and with far greater fidelity.”
To see what quantum supremacy might actually demand, the researchers analyzed three different types of quantum circuits that might solve problems conventional computers theoretically find intractable. Instantaneous Quantum Polynomial-Time (IQP) circuits are an especially simple way to connect qubits into quantum circuits. Quantum Approximate Optimization Algorithm (QAOA) circuits are more advanced, using qubits to find good solutions to optimization problems. Finally, boson sampling circuits use photons instead of qubits, analyzing the paths such photons take after interacting with one another.
Assuming these quantum circuits were competing against supercomputers capable of up to a quintillion (10¹⁸) floating-point operations per second (FLOPS), the researchers calculated that quantum supremacy could be reached with 208 qubits with IQP circuits, 420 qubits with QAOA circuits, and 98 photons with boson sampling circuits.
“I’m a little bit surprised that we were ultimately able to produce a number that is not so far from the kinds of numbers we see in devices that already exist,” says study lead author Alexander Dalzell, a quantum physicist at the California Institute of Technology in Pasadena. “The first approach we had suggested 10,000 or more qubits would be necessary, and the second approach still suggested almost 2,000. Finally, on the third approach we were able to eliminate a lot of the overhead in our analysis and reduce the numbers to the mere hundreds of qubits that we quote.”
The scientists add quantum supremacy might be possible with even fewer qubits. “In general, we make a lot of worst-case assumptions that might not be necessary,” Dalzell says.
When it comes to Google, the researchers note the company’s claims are challenging to analyze because Google chose a quantum computing task that was difficult to compare to any known algorithm in classical computation.
“I think the claim that they did something with a quantum device that we don’t know how to do on a classical device, without immense resources, is basically accurate as far as I can tell,” Dalzell says. “I’m less confident that there isn’t some yet-undiscovered classical simulation algorithm that, if we only knew about it, would allow us to replicate Google’s experiment, or even a somewhat larger version of their experiment, on a realistic classical device. To be clear, I’m not saying I think such an algorithm exists. I’m just saying that if it did exist, it wouldn’t be completely and totally surprising.”
In the end, “have we reached quantum computational supremacy when we’ve done something that we don’t know how to do with a classical device? Or do we really want to be confident that it’s impossible even using algorithms we might have not yet discovered?” Dalzell asks. “Google seems to be pretty clearly taking the former position, even acknowledging that they expect algorithmic innovations to bring down the cost of classical simulation, but that they also expect the improvement of quantum devices to be sufficient to maintain a state of quantum computational supremacy. They rely on arguments from complexity theory only to suggest that extreme improvements in classical simulation are unlikely. This is definitely a defensible interpretation.”
Future research can analyze how quantum supremacy estimates deal with noise in quantum circuits. “When there’s no noise, the quantum computational supremacy arguments are on pretty solid footing,” Dalzell says. “But add in noise, and you give something that a classical algorithm might be able to exploit.”
Post Syndicated from Harry Goldstein original https://spectrum.ieee.org/tech-talk/computing/hardware/mit-media-lab-food-computer-project-shut-down
MIT Media Lab’s Open Agriculture Initiative led by principal scientist Caleb Harper was permanently shuttered by the university on 30 April 2020.
“Caleb Harper’s last day of employment with the Institute was April 30, and as he led the Open Agriculture Initiative at the MIT Media Lab, it is closed at MIT,” Kimberly Allen, Director of Media Relations told Spectrum in an email.
As for the fate of OpenAg’s Github repository and the OpenAg Forum (which is no longer reachable), Allen said only, “Any legacy digital properties that may be hosted on Media Lab servers will either be closed or moved in time.”
The OpenAg initiative came under scrutiny following the departure in September 2019 of Media Lab director Joichi Ito after revelations that he had solicited and accepted donations for the Media Lab from convicted child sex offender Jeffrey Epstein. In a report [PDF] commissioned by MIT and released in January, the law firm Goodwin Procter LLP found that Ito had “worked in 2018 to obtain $1.5 million from Epstein to support research by Caleb Harper, a Principal Research Scientist at the Media Lab.” The report states that the donation was never made. The report also notes that Harper, Ito, and Professor Ed Boyden met with Epstein at the Media Lab on 15 April 2017, just days before 15 of 17 employees at Harper’s startup Fenome were dismissed.
OpenAg and Fenome designed, developed and fabricated personal food computers, enclosed chambers the size of a mini-fridge packed with LEDs, sensors, pumps, fans, control electronics, and a hydroponic tray for growing plants. Harper mesmerized audiences and investors around the globe with a vision of “nerd farmers” growing Tuscan tomatoes in portable boxes with recipes optimized by machine learning algorithms. But the food computers never lived up to the hype, though they did make an appearance at the Cooper Hewitt Museum’s Design Triennial in the fall of 2019, where the photos for this post were taken.
Maria T. Zuber, vice president for research at MIT, led an internal investigation following allegations that Harper told MIT staff to demonstrate food computers with plants not grown in them, and that fertilizer solution used by OpenAg was discharged into a well on the grounds of the Bates Research and Engineering Center in Middleton, Mass., in amounts that exceeded limits permitted by the state of Massachusetts. While that investigation was being conducted, OpenAg’s activities were restricted.
The Massachusetts Department of Environmental Protection (MassDEP) concluded its review of OpenAg activities at Bates on 22 April, according to a letter dated 11 May [PDF] to the Bates community from Boleslaw Wyslouch, director of the Laboratory for Nuclear Science and the Bates Research and Engineering Center. MassDEP, Wyslouch said, “fined MIT for discharging spent plant growing solution and dilute cleaning fluids into an Underground Injection Control (UIC) well in violation of the conditions of the well registration terms.”
MassDEP originally fined MIT $25,125 but according to a post on the Bates website detailing the MassDEP review, upon the permanent closure of the OpenAg Initiative, MassDEP “suspended payment of a $10,125 portion of the fine, leaving MIT responsible for paying $15,000.”
The discharge was brought to light by a scientist formerly associated with OpenAg, Babak Babakinejad, who in addition to blowing the whistle on the chemical discharge at Bates, also alleged, in an email to Ito on 5 May 2018, that Harper had taken credit for the deployment of food computers to schools as well as to “a refugee camp in Amman despite the fact that they have never been validated, tested for functionality and up to now we could never make it work i.e. to grow anything consistently, for an experiment beyond prototyping stage.”
A subsequent investigation by Spectrum substantiated Babakinejad’s claims and found that Harper had lied about the supposed refugee camp deployment to potential investors and in several public appearances between 2017 and 2019.
Harper, who for years had been actively promoting the food computer on social media, has been mostly silent since the MIT investigation started last September. His LinkedIn profile now states that he is Executive Director of the Dairy Scale for Good (DS4G) Initiative “working to help US Dairies pilot and integrate new technology and management practices to reach net zero emissions or better while increasing farmer livelihood.”
Post Syndicated from David Schneider original https://spectrum.ieee.org/tech-talk/computing/hardware/quantum-inspire-launches
IEEE Spectrum spoke with Richard Versluis of QuTech in the Netherlands, which last week launched “Inspire,” Europe’s first public-access quantum-computing platform. Versluis, who is the system architect at QuTech, wrote about the challenges of building practical quantum computers in Spectrum’s April issue.
- What “Inspire” is, exactly
- Programming a quantum computer
- What it takes to simulate 31 qubits
- How to make a spin-qubit chip
- Spin qubits vs other kinds
- Solving problems of practical value
Spectrum: Tell me about QuTech. Is it a company, a government agency, a university, or some combination of the above?
Versluis: QuTech is the advanced research center for Quantum Computing and Quantum Internet, a collaboration founded in 2014 by Delft University of Technology (TU Delft) and the Netherlands Organisation for Applied Scientific Research (TNO). TNO is an independent research institute. About 70 percent of our funding is from other businesses. But because we receive considerable base funding from the government, we are not allowed to make a profit. Our status is very much like the national labs in the United States.
Post Syndicated from Charles Q. Choi original https://spectrum.ieee.org/nanoclast/computing/hardware/terahertz-chip
Novel materials known as photonic topological insulators could one day help terahertz waves send data across chips at unprecedented speeds of a trillion bits per second, a new study finds.
Terahertz waves fall between optical waves and microwaves on the electromagnetic spectrum. Ranging in frequency from 0.1 to 10 terahertz, terahertz waves could be key to future 6G wireless networks. With those networks, engineers aim to transmit data at terabits (trillions of bits) per second.
Such data links could also greatly boost intra-chip and inter-chip communication to support artificial intelligence (AI) and cloud-based technologies, such as autonomous driving.
“Artificial intelligence and cloud-based applications require high volumes of data to be transmitted to a connected device with ultra-high-speed and low latency,” says Ranjan Singh, a photonics researcher at Nanyang Technological University in Singapore and coauthor of the new work. “Take for example, an autonomous vehicle that uses AI to make decisions. In order to increase the efficiency of decision-making tasks, the AI-sensors need to receive data from neighboring vehicles at ultra-high speed to perform the actions in real time.”
Two research groups say they’ve independently built quantum devices that can operate at temperatures above 1 Kelvin—15 times hotter than rival technologies can withstand.
The ability to work at higher temperatures is key to scaling up to the many qubits thought to be required for future commercial-grade quantum computers.
A team led by Andrew Dzurak and Henry Yang from the University of New South Wales in Australia performed a single-qubit operation on a quantum processor at 1.5 Kelvin. Separately, a team led by Menno Veldhorst of Delft University of Technology performed a two-qubit operation at 1.1 Kelvin. Jim Clarke, director of quantum hardware at Intel, is a co-author on the Delft paper. Both groups published descriptions of their devices today in Nature.
HongWen Jiang, a physicist at UCLA and a peer reviewer for both papers, described the research as “a technological breakthrough for semiconductor-based quantum computing.”
Post Syndicated from John Boyd original https://spectrum.ieee.org/tech-talk/computing/hardware/japanese-researchers-develop-a-novel-annealing-processor-thats-the-fastest-technology-yet-at-solving-combinatorial-optimization-problems
During the past two years, IEEE Spectrum has spotlighted several new approaches to solving combinatorial optimization problems, particularly Fujitsu’s Digital Annealer and more recently Toshiba’s Simulated Bifurcation Algorithm. Now, researchers at the Tokyo Institute of Technology, with help from colleagues at Hitachi, Hokkaido University, and the University of Tokyo, have engineered a new annealer architecture to handle this kind of task, which has proven too taxing for conventional computers.
Dubbed STATICA (Stochastic Cellular Automata Annealer Architecture), the processor is designed to take on challenges such as portfolio, logistic, and traffic flow optimization when they are expressed in the form of Ising models.
Originally used to describe the spins of interacting magnets, Ising models can also be used to solve optimization problems. That’s because the evolving magnetic interactions in a system progress towards the lowest-energy state, which conveniently mirrors how an optimization algorithm searches for the best—i.e. ground state—solution. In other words, the answer to a particular optimization question becomes the equivalent of searching for the lowest energy state of the Ising model.
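To make that correspondence concrete, here is a minimal serial annealer in Python, run on a tiny, made-up ferromagnetic instance; the couplings, cooling schedule, and problem size are illustrative assumptions, not drawn from any of the systems discussed here.

```python
import math
import random

def ising_energy(spins, J):
    """Ising energy E = -sum over pairs of J[i][j] * s_i * s_j."""
    n = len(spins)
    return -sum(J[i][j] * spins[i] * spins[j]
                for i in range(n) for j in range(i + 1, n))

def anneal(J, steps=20000, t_start=5.0, t_end=0.01, seed=0):
    """Serial single-spin-flip annealing: one spin is updated per step,
    and uphill moves are accepted with Boltzmann probability."""
    rng = random.Random(seed)
    n = len(J)
    spins = [rng.choice([-1, 1]) for _ in range(n)]
    energy = ising_energy(spins, J)
    for step in range(steps):
        t = t_start * (t_end / t_start) ** (step / steps)  # cooling schedule
        i = rng.randrange(n)
        # The energy change from flipping spin i depends only on its couplings.
        delta = 2 * spins[i] * sum(J[i][j] * spins[j] for j in range(n) if j != i)
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            spins[i] = -spins[i]
            energy += delta
    return spins, energy

# A tiny all-to-all ferromagnet: the ground state is all spins aligned.
n = 8
J = [[1 if i != j else 0 for j in range(n)] for i in range(n)]
spins, energy = anneal(J)
print(spins, energy)
```

Hardware annealers replace this serial inner loop with massive parallelism; the point of the sketch is only the mapping from "lowest energy" to "best solution."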
Current annealers such as D-Wave’s quantum annealer computer and Fujitsu’s Digital Annealer calculate spin-evolutions serially, points out Professor Masato Motomura at Tokyo Tech’s Institute of Innovative Research and leader of the STATICA project. As one spin affects all the other spins in a given iteration, spin switchings are calculated one by one, making it a serial process. But in STATICA, he notes, that updating is performed in parallel using stochastic cellular automata (SCA). That is a means of simulating complex systems using the interactions of a large number of neighboring “cells” (spins in STATICA) with simple updating rules and some stochasticity (randomness).
In conventional annealing systems, if one spin flips, it affects all of the connected spins and therefore all the spins must be processed in the next iteration. But in STATICA, SCA introduces copies (replicas) of the original spins into the process. All original spin-spin interactions are redirected to their individual replica spins.
“In this method, all the replica spins are updated in parallel using these spin-spin interactions,” explains Motomura. “If one original spin flips, it affects its replica spin but not any of the other original spins, because there is no interaction between them, unlike conventional annealing. And in the next iteration, the replica spins are interpreted as original spins and the parallel spin-update is repeated.”
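A toy Python rendering captures the flavor of that replica scheme. This is only the stochastic-cellular-automata idea, not STATICA's hardware logic; the self-coupling penalty `q`, which discourages a replica from disagreeing with its original spin, and the temperature schedule are illustrative assumptions.

```python
import math
import random

def sca_step(spins, J, temperature, q, rng):
    """One parallel step: every replica spin is computed from the *current*
    original spins only, so all updates are independent; the replicas then
    serve as the originals for the next iteration."""
    n = len(spins)
    new = list(spins)
    for i in range(n):  # independent loop iterations: parallel in hardware
        field = sum(J[i][j] * spins[j] for j in range(n)) + q * spins[i]
        # Logistic acceptance: probability that replica i settles to +1.
        p_up = 1.0 / (1.0 + math.exp(-2.0 * field / temperature))
        new[i] = 1 if rng.random() < p_up else -1
    return new

rng = random.Random(1)
n = 8
J = [[1 if i != j else 0 for j in range(n)] for i in range(n)]
spins = [rng.choice([-1, 1]) for _ in range(n)]
for t in (3.0, 2.0, 1.0, 0.5, 0.2, 0.1):  # simple cooling schedule
    for _ in range(20):
        spins = sca_step(spins, J, temperature=t, q=1.0, rng=rng)
print(spins)  # for this ferromagnet the system settles into an aligned state
```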
As well as enabling parallel processing, STATICA also uses pre-computed results to reduce computation. “So if there is no spin-flip, there is nothing to compute,” says Motomura. “And if the influence of a flipped spin has already been computed, that result is reused.”
For proof of concept, the researchers had a 3-by-4-mm STATICA chip fabricated using a 65-nm CMOS process operating at a frequency of 320 megahertz and running on 649 milliwatts. Memory comprises a 1.3 megabit SRAM. This enabled an Ising model of 512 spins, equivalent to 262,000 connections, to be tested.
“Scaling by at least two orders of magnitude is possible,” notes Motomura. And the chip can be fabricated using the same process as standard processors and can easily be added to a PC as a co-processor, for instance, or added to its motherboard.
“At the ISSCC Conference in February, where we presented a paper on STATICA, we mounted the chip on a circuit board with a USB connection,” he says, “and demonstrated it connected to a laptop PC as proof of concept.”
To compare STATICA’s performance against existing annealing technologies (using results given in published papers), the researchers employed a Maxcut benchmark test of 2,000 connections. STATICA came out on top in processing speed, accuracy, and energy efficiency. Compared with its nearest competitor, Toshiba’s Simulated Bifurcation Algorithm, STATICA took 0.13 milliseconds to complete the test, versus 0.5 ms for SBA. In energy efficiency, STATICA ran on an estimated 2 watts of power, far below the 40 watts for SBA. And in histogram comparisons of accuracy, STATICA also came out ahead, according to Motomura.
For the next step, he says the team will scale up the processor and test it out using realistic problems.
Other than that, there are no more technology hurdles to overcome.
“STATICA is ready,” states Motomura. “The only question is whether there is sufficient market demand for such an annealing processor. We hope to see interest, for instance, from ride-sharing companies like Uber, and product distributors such as Amazon. Local governments wanting to control problems such as traffic congestion might also be interested. These are just a few examples of how STATICA might be used besides more obvious applications like portfolio optimization and drug discovery.”
Post Syndicated from Mark Anderson original https://spectrum.ieee.org/tech-talk/computing/hardware/can-quantum-computing-help-us-respond-to-the-coronavirus
D-Wave Systems has offered free cloud computing time on its quantum computer to COVID-19 researchers. The offer, unveiled last week, applies to work toward vaccines and therapies as well as epidemiology, supply distribution, hospital logistics, and diagnostics.
“We have opened up our service for free, unlimited use—for businesses, for governments, for researchers—working on solving problems associated with the pandemic,” said Alan Baratz, CEO of D-Wave, based in Burnaby, British Columbia. “We also recognize that many of these companies may not have experience with quantum computers. So we’ve also reached out to our customers and partners who do have experience with using our systems to ask if they would be willing to help.”
The free quantum computing consulting services D-Wave is arranging include quantum programming expertise in scientific computing as well as in planning, management, and operations for front-line workers.
Post Syndicated from Charles Q. Choi original https://spectrum.ieee.org/tech-talk/computing/hardware/programmable-pics
Reprogrammable photonic circuits based on a novel programmable material might speed the rate at which engineers can develop working photonic devices, researchers say.
Electronic integrated circuits (ICs) are nowadays key to many technologies, but their light-based counterparts, photonic integrated circuits (PICs), may offer many advantages, such as lower energy consumption and faster operation. However, current fabrication methods for PICs suffer a great deal of variability, such that many of the resulting devices deviate slightly from the desired specifications, limiting yields.
Post Syndicated from Richard Versluis original https://spectrum.ieee.org/computing/hardware/heres-a-blueprint-for-a-practical-quantum-computer
The classic Rubik’s Cube has 43,252,003,274,489,856,000 different states. You might well wonder how people are able to take a scrambled cube and bring it back to its original configuration, with just one color showing on each side. Some people are even able to do this blindfolded after viewing the scrambled cube once. Such feats are possible because there’s a basic set of rules that always allow someone to restore the cube to its original state in 20 moves or fewer.
Controlling a quantum computer is a lot like solving a Rubik’s Cube blindfolded: The initial state is well known, and there is a limited set of basic elements (qubits) that can be manipulated by a simple set of rules—rotations of the vector that represents the quantum state. But observing the system during those manipulations comes with a severe penalty: If you take a look too soon, the computation will fail. That’s because you are allowed to view only the machine’s final state.
The power of a quantum computer lies in the fact that the system can be put in a combination of a very large number of states. Sometimes this fact is used to argue that it will be impossible to build or control a quantum computer: The gist of the argument is that the number of parameters needed to describe its state would simply be too high. Yes, it will be quite an engineering challenge to control a quantum computer and to make sure that its state will not be affected by various sources of error. However, the difficulty does not lie in its complex quantum state but in making sure that the basic set of control signals do what they should do and that the qubits behave as you expect them to.
If engineers can figure out how to do that, quantum computers could one day solve problems that are beyond the reach of classical computers. Quantum computers might be able to break codes that were thought to be unbreakable. And they could contribute to the discovery of new drugs, improve machine-learning systems, solve fiendishly complex logistics problems, and so on.
The expectations are indeed high, and tech companies and governments alike are betting on quantum computers to the tune of billions of dollars. But it’s still a gamble, because the same quantum-mechanical effects that promise so much power also cause these machines to be very sensitive and difficult to control.
Must it always be so? The main difference between a classical supercomputer and a quantum computer is that the latter makes use of certain quantum mechanical effects to manipulate data in a way that defies intuition. Here I will briefly touch on just some of these effects. But that description should be enough to help you understand the engineering hurdles—and some possible strategies for overcoming them.
Whereas ordinary classical computers manipulate bits (binary digits), each of which must be either 0 or 1, quantum computers operate on quantum bits, or qubits. Unlike classical bits, qubits can take advantage of a quantum mechanical effect called superposition, allowing a qubit to be in a state where it has a certain amount of zero-ness to it and a certain amount of one-ness to it. The coefficients that describe how much one-ness and how much zero-ness a qubit has are complex numbers, meaning that they have both real and imaginary parts.
In a machine with multiple qubits, you can create those qubits in a very special way, such that the state of one qubit cannot be described independently of the state of the others. This phenomenon is called entanglement. The states that are possible for multiple entangled qubits are more complicated than those for a single qubit.
While two classical bits can be set only to 00, 01, 10, or 11, two entangled qubits can be put into a superposition of these four fundamental states. That is, the entangled pair of qubits can have a certain amount of 00-ness, a certain amount of 01-ness, a certain amount of 10-ness, and a certain amount of 11-ness. Three entangled qubits can be in a superposition of eight fundamental states. And n qubits can be in a superposition of 2ⁿ states. When you perform operations on these n entangled qubits, it’s as though you were operating on 2ⁿ bits of information at the same time.
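The exponential bookkeeping is easy to see in a short Python sketch, which represents an n-qubit register as an explicit list of complex amplitudes; this classical simulation is precisely what becomes infeasible as n grows.

```python
def kron(a, b):
    """Kronecker product of two amplitude vectors: the joint state."""
    return [x * y for x in a for y in b]

# One qubit: |0>, and the equal superposition (|0> + |1>) / sqrt(2).
zero = [1 + 0j, 0 + 0j]
plus = [2 ** -0.5 + 0j, 2 ** -0.5 + 0j]

# Build a register of n qubits, each in the "plus" superposition.
n = 10
state = [1 + 0j]
for _ in range(n):
    state = kron(state, plus)

print(len(state))  # 1024 amplitudes: 2**n numbers for just n = 10 qubits
norm = sum(abs(amp) ** 2 for amp in state)
print(norm)        # the probabilities still sum to (numerically) 1
```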
The operations you do on a qubit are akin to the rotations done to a Rubik’s Cube. A big difference is that the quantum rotations are never perfect. Because of certain limitations in the quality of the control signals and the sensitivity of the qubits, an operation intended to rotate a qubit by 90 degrees may end up rotating it by 90.1 degrees or by 89.9 degrees, say. Such errors might seem small but they quickly add up, resulting in an output that is completely incorrect.
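A toy single-qubit simulation shows how quickly a 0.1-degree systematic over-rotation ruins a computation; the real amplitudes, single rotation axis, gate count, and error size here are all illustrative.

```python
import math

def rotate(state, angle_deg):
    """Rotate a qubit's (real) amplitudes; qubit gates act on half-angles."""
    a = math.radians(angle_deg) / 2
    c, s = math.cos(a), math.sin(a)
    return [c * state[0] - s * state[1], s * state[0] + c * state[1]]

ideal, noisy = [1.0, 0.0], [1.0, 0.0]
for _ in range(200):            # 200 gates, each over-rotating by 0.1 degree
    ideal = rotate(ideal, 90.0)
    noisy = rotate(noisy, 90.1)

# Fidelity: squared overlap between the intended and the actual state.
overlap = (ideal[0] * noisy[0] + ideal[1] * noisy[1]) ** 2
print(overlap)  # about 0.97: roughly a 3 percent error after only 200 gates
```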
Another source of error is decoherence: Left by themselves, the qubits will gradually lose the information they contain and also lose their entanglement. This happens because the qubits interact with their environment to some degree, even though the physical substrate used to store them has been engineered to keep them isolated. You can compensate for the effects of control inaccuracy and decoherence using what’s known as quantum error correction, but doing so comes at great cost in terms of the number of physical qubits required and the amount of processing that needs to be done with them.
Once these technical challenges are overcome, quantum computers will be valuable for certain special kinds of calculations. After executing a quantum algorithm, the machine will measure its final state. This measurement, in theory, will yield with high probability the solution to a mathematical problem that a classical computer could not solve in a reasonable period of time.
So how do you begin designing a quantum computer? In engineering, it’s good practice to break down the main function of a machine into groups containing subfunctions that are similar in nature or required performance. These functional groups then can be more easily mapped onto hardware. My colleagues and I at QuTech in the Netherlands have found that the functions needed for a quantum computer can naturally be divided into five such groups, conceptually represented by five layers of control. Researchers at IBM, Google, Intel, and elsewhere are following a similar strategy, although other approaches to building a quantum computer are also possible.
Let me describe that five-layer cake, starting at the top, the highest level of abstraction from the nitty-gritty details of what’s going on deep inside the hardware.
At the top of the pile is the application layer, which is not part of the quantum computer itself but is nevertheless a key part of the overall system. It represents all that’s needed to compose the relevant algorithms: a programming environment, an operating system for the quantum computer, a user interface, and so forth. The algorithms composed using this layer can be fully quantum, but they may also involve a combination of classical and quantum parts. The application layer should not depend on the type of hardware used in the layers under it.
Directly below the application layer is the classical-processing layer, which has three basic functions. First, it optimizes the quantum algorithm being run and compiles it into microinstructions. That’s analogous to what goes on in a classical computer’s CPU, which processes many microinstructions for each machine-code instruction it must carry out. This layer also processes the quantum-state measurements returned by the hardware in the layers below, which may be fed back into a classical algorithm to produce final results. The classical-processing layer will also take care of the calibration and tuning needed for the layers below.
Underneath the classical layer are the digital-, analog-, and quantum-processing layers, which together make up a quantum processing unit (QPU). There is a tight connection between the three layers of the QPU, and the design of one will depend strongly on that of the other two. Let me describe more fully now the three layers that make up the QPU, moving from the top downward.
The digital-processing layer translates microinstructions into pulses, the kinds of signals needed to manipulate qubits, allowing them to act as quantum logic gates. More precisely, this layer provides digital definitions of what those analog pulses should be. The analog pulses themselves are generated in the QPU’s analog-processing layer. The digital layer also feeds back the measurement results of the quantum calculation to the classical-processing layer above it, so that the quantum solution can be combined with results computed classically.
Right now, personal computers or field-programmable gate arrays can handle these tasks. But when error correction is added to quantum computers, the digital-processing layer will have to become much more complicated.
The analog-processing layer creates the various kinds of signals sent to the qubits, one layer below. These are mainly voltage steps and sweeps and bursts of microwave pulses, which are phase and amplitude modulated so as to execute the required qubit operations. Those operations involve qubits connected together to form quantum logic gates, which are used in concert to carry out the overall computation according to the particular quantum algorithm that is being run.
Although it’s not technically difficult to generate such a signal, there are significant hurdles here when it comes to managing the many signals that would be needed for a practical quantum computer. For one, the signals sent to the different qubits would need to be synchronized at picosecond timescales. And you need some way to convey these different signals to the different qubits so as to be able to make them do different things. That’s a big stumbling block.
In today’s small-scale systems, with just a few dozen qubits, each qubit is tuned to a different frequency—think of it as a radio receiver locked to one channel. You can select which qubit to address on a shared signal line by transmitting at its special frequency. That works, but this strategy doesn’t scale. You see, the signals sent to a qubit must have a reasonable bandwidth, say, 10 megahertz. And if the computer contains a million qubits, such a signaling system would need a bandwidth of 10 terahertz, which of course isn’t feasible. Nor would it be possible to build in a million separate signal lines so that you could attach one to each qubit directly.
The solution will probably involve a combination of frequency and spatial multiplexing. Qubits would be fabricated in groups, with each qubit in the group being tuned to a different frequency. The computer would contain many such groups, all attached to an analog communications network that allows the signal generated in the analog layer to be connected only to a selected subset of groups. By arranging the frequency of the signal and the network connections correctly, you can then manipulate the targeted qubit or set of qubits without affecting the others.
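A hypothetical addressing scheme along these lines can be sketched in Python; the group size and the flat-index-to-(group, channel) mapping are purely illustrative assumptions.

```python
GROUP_SIZE = 8  # qubits per group, each qubit on its own frequency channel

def address(qubit_index):
    """Map a flat qubit index to (group to route to, frequency channel)."""
    return qubit_index // GROUP_SIZE, qubit_index % GROUP_SIZE

def control_pulse(qubit_index):
    group, channel = address(qubit_index)
    return {"route_to_group": group, "frequency_channel": channel}

# Qubits in different groups may reuse the same frequency channel without
# interference, because the switch network connects the signal only to the
# selected group; the channel count stays fixed as the machine grows.
print(control_pulse(3))   # group 0, channel 3
print(control_pulse(11))  # group 1, channel 3
```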
That approach should do the job, but such multiplexing comes with a cost: inaccuracies in control. It remains to be determined how such inaccuracies can be overcome.
In current systems, the digital- and analog-processing layers operate mainly at room temperature. Only the quantum-processing layer beneath them, the layer holding the qubits, is kept near absolute zero temperature. But as the number of qubits increases in future systems, the electronics making up all three of these layers will no doubt have to be integrated into one packaged cryogenic chip.
Some companies are currently building what you might call pre-prototype systems, based mainly on superconducting qubits. These machines contain a maximum of a few dozen qubits and are capable of executing tens to hundreds of coherent quantum operations. The companies pursuing this approach include tech giants Google, IBM, and Intel.
By extending the number of control lines, engineers could expand current architectures to a few hundred qubits, but that’s the very most. And the short time that these qubits remain coherent—today, roughly 50 microseconds—will limit the number of quantum instructions that can be executed before the calculation is consumed by errors.
Given these limitations, the main application I anticipate for systems with a few hundred qubits will be as an accelerator for conventional supercomputers. Specific tasks for which the quantum computer runs faster will be sent from a supercomputer to the quantum computer, with the results then returned to the supercomputer for further processing. The quantum computer will in a sense act like the GPU in your laptop, doing certain specific tasks, like matrix inversion or optimization of initial conditions, a lot faster than the CPU alone ever could.
During this next phase in the development of quantum computers, the application layer will be fairly straightforward to build. The digital-processing layer will also be relatively simple. But building the three layers that make up the QPU will be tricky.
Current fabrication techniques cannot produce completely uniform qubits. So different qubits have slightly different properties. That heterogeneity in turn requires the analog layer of the QPU to be tailored to the specific qubits it controls. The need for customization makes the process of building a QPU difficult to scale. Much greater uniformity in the fabrication of qubits would remove the need to customize what goes on in the analog layer and would allow for the multiplexing of control and measurement signals.
Multiplexing will be required for the large numbers of qubits that researchers will probably start introducing in 5 to 10 years so that they can add error correction to their machines. The basic idea behind such error correction is simple enough: Instead of storing the data in one physical qubit, multiple physical qubits are combined into one error-corrected, logical qubit.
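The flavor of that trade-off can be shown with the classical repetition code, the simplest possible analogy; real quantum codes are far subtler, not least because quantum states cannot simply be copied.

```python
from collections import Counter

def encode(bit, copies=3):
    """One logical bit becomes several physical copies."""
    return [bit] * copies

def decode(physical):
    """Majority vote corrects any minority of flipped copies."""
    return Counter(physical).most_common(1)[0][0]

word = encode(1)
word[0] ^= 1         # an error flips one physical copy
print(decode(word))  # prints 1: the logical bit survives the error
```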
Quantum error correction could solve the fundamental problem of decoherence, but it would require anywhere from 100 to 10,000 physical qubits per logical qubit. And that’s not the only hurdle. Implementing error correction will require a low-latency, high-throughput feedback loop that spans all three layers of the QPU.
It remains to be seen which of the many types of qubits being experimented with now—superconducting circuits, spin qubits, photonic systems, ion traps, nitrogen-vacancy centers, and so forth—will prove to be the most suitable for creating the large numbers of qubits needed for error correction. Regardless of which one proves best, it’s clear that success will require packaging and controlling millions of qubits if not more.
Which brings us to the big question: Can that really be done? The millions of qubits would have to be controlled by continuous analog signals. That’s hard but by no means impossible. I and other researchers have calculated that if device quality could be improved by a few orders of magnitude, the control signals used to perform error correction could be multiplexed and the design of the analog layer would become straightforward, with the digital layer managing the multiplexing scheme. These future QPUs would not require millions of digital connections, just some hundreds or thousands, which could be built using current techniques for IC design and fabrication.
The bigger challenge could well prove to be the measurement side of things: Many thousands of measurements per second would need to be performed on the chip. These measurements would be designed so that they do not disturb the quantum information (which remains unknown until the end of the calculation) while at the same time revealing and correcting any errors that arise along the way. Measuring millions of qubits at this frequency will require a drastic change in measurement philosophy.
The current way of measuring qubits requires the demodulation and digitization of an analog signal. At the measurement rate of many kilohertz, and with millions of qubits in a machine, the total digital throughput would be petabytes per second. That’s far too much data to handle using today’s techniques, which involve room-temperature electronics connected to the chip holding the qubits at temperatures near absolute zero.
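A back-of-the-envelope estimate shows where a petabyte-per-second figure can come from; every number below is an illustrative assumption, not a measured value.

```python
qubits = 1_000_000            # a machine in the error-corrected regime
measurement_rate_hz = 10_000  # "many kilohertz" of error-correction cycles
samples_per_readout = 50_000  # e.g. a 10-microsecond trace at 5 GS/s
bytes_per_sample = 2          # raw digitizer resolution

throughput = qubits * measurement_rate_hz * samples_per_readout * bytes_per_sample
print(throughput / 1e15, "PB/s")  # 1.0 PB/s of raw digitized readout data
```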
Clearly, the analog and digital layers of the QPU will have to be integrated with the quantum-processing layer on the same chip, with some clever schemes implemented there for preprocessing and multiplexing the measurements. Fortunately, for the processing that is done to correct errors, not all qubit measurements would have to be passed up to the digital layer. That only needs to be done when local circuitry detects an error, which drastically reduces the required digital bandwidth.
What goes on in the quantum layer will fundamentally determine how well the computer will operate. Imperfections in the qubits mean that you’ll need more of them for error correction, and as those imperfections get worse, the requirements for your quantum computer explode beyond what is feasible. But the converse is also true: Improvements in the quality of the qubits might be costly to engineer, but they would very quickly pay for themselves.
In the current pre-prototyping phase of quantum computing, individual qubit control is still unavoidable: It’s required to get the most out of the few qubits that we now have. Soon, though, as the number of qubits available increases, researchers will have to work out systems for multiplexing control signals and the measurements of the qubits.
The next significant step will be the introduction of rudimentary forms of error correction. Initially, there will be two parallel development paths, one with error correction and the other without, but error-corrected quantum computers will ultimately dominate. There’s simply no other route to a machine that can perform useful, real-world tasks.
To prepare for these developments, chip designers, chip-fabrication-process engineers, cryogenic-control specialists, experts in mass data handling, quantum-algorithm developers, and others will need to work together closely.
Such a complex collaboration would benefit from an international quantum-engineering road map. The various tasks required could then be assigned to the different sets of specialists involved, with the publishers of the road map managing communication between groups. By combining the efforts of academic institutions, research institutes, and commercial companies, we can and will succeed in building practical quantum computers, unleashing immense computing power for the future.
This article appears in the April 2020 print issue as “Quantum Computers Scale Up.”
About the Author
Richard Versluis is the system architect at QuTech, a quantum-computing collaboration between Delft University of Technology and the Netherlands Organization for Applied Scientific Research.
When you hear the words “data center” and “games,” you probably think of massive multiplayer online games like World of Warcraft. But there’s another kind of game going on in data centers, one meant to hog resources from the shared mass of computers and storage systems.
Even employees of Google, the company with perhaps the most massive data footprint, once played these games. When asked to submit a job’s computing requirements, some employees inflated their requests for resources in order to reduce the amount of sharing they’d have to do with others. Interestingly, some other employees deflated their resource requests to pretend that their tasks could easily fit within any computer. Once their tasks were slipped into a machine, those operations would then use up all the resources available on it and squeeze out their colleagues’ tasks.
Such trickery might seem a little comical, but it actually points to a real problem—inefficiency.
Globally, data centers consumed 205 billion kilowatt-hours of electricity in 2018. That’s not much less than all of Australia used, and about 1 percent of the world total. A lot of that energy is wasted because servers are not used to their full capacity. An idle server dissipates as much as 50 percent of the power it consumes when running at its peak; as the server takes on work, its fixed power costs are amortized over that work. Because a user running a single task typically takes up only 20 to 30 percent of the server’s resources, multiple users must share the server to boost its utilization and consequently its energy efficiency. Sharing also reduces capital, operating, and infrastructure costs. Not everybody is rich enough to build their own data centers, after all.
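A simple linear power model, using the figures above (roughly 50 percent idle power, 20 to 30 percent single-tenant utilization), shows why sharing pays; the 400-watt peak is an illustrative server, not a measured one.

```python
IDLE_FRACTION = 0.5  # idle draw as a fraction of peak power
PEAK_WATTS = 400     # illustrative server

def watts(utilization):
    """Fixed idle cost plus a part proportional to load (0 <= load <= 1)."""
    return PEAK_WATTS * (IDLE_FRACTION + (1 - IDLE_FRACTION) * utilization)

for u in (0.25, 0.50, 0.90):
    print(f"{u:.0%} utilized: {watts(u):.0f} W total, "
          f"{watts(u) / u:.0f} W per unit of work")
```

In this model, going from 25 percent to 50 percent utilization cuts the power spent per unit of work from 1,000 W to 600 W, which is exactly the amortization of fixed costs the text describes.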
To allocate shared resources, data centers deploy resource-management systems, which divide up available processor cores, memory capacity, and network resources according to users’ needs and the system’s own objectives. At first glance, this task should be straightforward because users often have complementary demands. But in truth, it’s not. Sharing creates competition among users, as we saw with those crafty Googlers, and that can distort the use of resources.
So we have pursued a series of projects using game theory, the mathematical models that describe strategic interactions among rational decision makers, to manage the allocation of resources among self-interested users while maximizing data-center efficiency. In this situation, playing the game makes all the difference.
Helping a group of rational and self-interested users share resources efficiently is not just a product of the big-data age. Economists have been doing it for decades. In economics, market mechanisms set prices for resources based on supply and demand. Indeed, many of these mechanisms are currently deployed in public data centers, such as Amazon EC2 and Microsoft Azure. There, the transfer of real money acts as a tool to align users’ incentives (performance) with the provider’s objectives (efficiency). However, there are many situations where the exchange of money is not useful.
Let’s consider a simple example. Suppose that you are given a ticket to an opera on the day of your best friend’s wedding, and you decide to give the ticket to someone who will best appreciate the event. So you run what’s called a second-price auction: You ask your friends to bid for the ticket, stipulating that the winner pay you the amount of the second-highest bid. It has been mathematically proven that your friends have no incentives to misrepresent how much they value the opera ticket in this kind of auction.
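The mechanism itself is only a few lines of code (the names and bid amounts below are, of course, made up):

```python
def second_price_auction(bids):
    """Vickrey auction: the highest bidder wins but pays only the
    second-highest bid, which makes truthful bidding the best strategy."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0
    return winner, price

bids = {"alice": 120, "bob": 80, "carol": 95}
winner, price = second_price_auction(bids)
print(winner, price)  # alice wins the ticket but pays carol's bid of 95
```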
If you do not want money or cannot make your friends pay you any, your options become very limited. If you ask your friends how much they would love to go to the opera, nothing stops them from exaggerating their desire for the ticket. The opera ticket is just a simple example, but there are plenty of places—such as Google’s private data centers or an academic computer cluster—where money either can’t or shouldn’t change hands to decide who gets what.
Game theory provides practical solutions for just such a problem, and indeed it has been adapted for use in both computer networks and computer systems. We drew inspiration from those two fields, but we also had to address their limitations. In computer networks, there has been much work in designing mechanisms to manage self-interested and uncoordinated routers to avoid congestion. But these models consider contention over only a single resource—network bandwidth. In data-center computer clusters and servers, there is a wide range of resources to fight over.
In computer systems, there’s been a surge of interest in resource-allocation mechanisms that consider multiple resources, notably one called dominant resource fairness [PDF]. However, this and similar work is restricted to performance models and to ratios of processors and memory that don’t always reflect what goes on in a data center.
To come up with game theory models that would work in the data center, we delved into the details of hardware architecture, starting at the smallest level: the transistor.
Transistors were long made to dissipate ever less power as they scaled down in size, in part by lowering the operating voltage. By the mid-2000s, however, that trend, known as Dennard scaling, had broken down. As a result, for a fixed power budget, processors stopped getting faster at the rate to which we had become accustomed. A temporary solution was to put multiple processor cores on the same chip, so that the enormous number of transistors could still be cooled economically. However, it soon became apparent that you cannot turn on all the cores and run them at full speed for very long without melting the chip.
In 2012, computer architects proposed a workaround called computational sprinting. The concept was that processor cores could safely push past their power budget for short intervals called sprints. After a sprint, the processor has to cool down before the next sprint; otherwise the chip is destroyed. If done correctly, sprinting could make a system more responsive to changes in its workload. Computational sprinting was originally proposed for processors in mobile devices like smartphones, which must limit power usage both to conserve charge and to avoid burning the user. But sprinting soon found its way into data centers, which use the trick to cope with bursts of computational demand.
Here’s where the problem arises. Suppose that self-interested users own sprinting-enabled servers, and those servers all share a power supply in a data center. Users could sprint to increase the computational power of their processors, but if a large fraction of them sprint simultaneously, the power load will spike. The circuit breaker is then tripped. This forces the batteries in the uninterruptible power supply (UPS) to provide power while the system recovers. After such a power emergency, all the servers on that power supply are forced to operate on a nominal power budget—no sprinting allowed—while the batteries recharge.
This scenario is a version of the classic “tragedy of the commons,” first identified by British economist William Forster Lloyd in an 1833 essay. He described the following situation: Suppose that cattle herders share a common parcel of land to graze their cows. If an individual herder puts more than the allotted number of cattle on the common, that herder could achieve marginal benefits. But if many herders do that, the overgrazing will damage the land, hurting everyone.
Together with Songchun Fan, then a Duke University doctoral candidate, we studied sprinting strategies as a tragedy of the commons. We built a model of the system that focused on the two main physical constraints. First, for a server processor, a sprint restricts future action by requiring the processor to wait while the chip dissipates heat. Second, for a server cluster, if the circuit breaker trips, then all the server processors must wait while the UPS batteries recharge.
We formulated a sprinting game in which users, in each round, could be in one of three states: active, cooling after a sprint, or recovering after a power emergency. In each epoch, or round of the game, a user’s only decision is whether or not to sprint when their processor is active. Users want to optimize their sprinting to gain benefits, such as improved throughput or reduction in execution time. You should note that these benefits vary according to when the sprint happens. For instance, sprinting is more beneficial when demand is high.
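The round structure of the sprinting game can be sketched as a small state machine. The three states follow the text; the cooling and recovery durations below are illustrative assumptions, not the authors' actual parameters.

```python
# Toy state machine for one user in the sprinting game. Each epoch the
# user is active, cooling after a sprint, or recovering after a power
# emergency. The durations are assumed values for illustration.
ACTIVE, COOLING, RECOVERY = "active", "cooling", "recovery"
COOL_ROUNDS = 2      # assumed rounds to dissipate heat after a sprint
RECOVERY_ROUNDS = 3  # assumed rounds for UPS batteries to recharge

def step(state, rounds_left, wants_to_sprint, emergency):
    """Advance one epoch; returns (new_state, rounds_left, sprinted)."""
    if emergency:
        # breaker tripped: every user is forced into recovery
        return RECOVERY, RECOVERY_ROUNDS, False
    if state == ACTIVE:
        if wants_to_sprint:
            return COOLING, COOL_ROUNDS, True  # sprint now, cool afterward
        return ACTIVE, 0, False
    # cooling or recovering: count down until active again
    if rounds_left > 1:
        return state, rounds_left - 1, False
    return ACTIVE, 0, False
```

A user who sprints in round 5 is thus locked out of rounds 6 and 7, which is exactly the trade-off driving the strategic decision described next.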
Consider a simple example. You are at round 5, and you know that if you sprint, you will gain 10 units of benefit. However, you’d have to let your processor cool down for a couple of rounds before you can sprint again. But now, say you sprint, and then it turns out that if you had instead waited for round 6 to sprint, you could have gained 20 units. Alternatively, suppose that you save your sprint for a future round instead of using it in round 5. But it turns out that all the other users decided to sprint at round 5, causing a power emergency that prevents you from sprinting for several rounds. Worse, by then your gains won’t be nearly as high.
All users must make these kinds of decisions based on how much utility they gain and on other users’ sprinting strategies. While it might be fun to play against a few users, making these decisions becomes intractable as the number of competitors grows to data-center scale. Fortunately, we found a way to optimize each user’s strategy in large systems by using what’s called mean field game analysis. This method avoids the complexity of scrutinizing individual competitors’ strategies by instead describing their behavior as a population. Key to this statistical approach is the assumption that any individual user’s actions do not change the average system behavior significantly. Because of that assumption, we can approximate the effect of all the other users on any given user with a single averaged effect.
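The self-consistency at the heart of mean field analysis can be illustrated with a toy fixed-point iteration. The response functions below are invented stand-ins for the authors' model; only the overall shape of the argument is faithful: each user best-responds to the population average, and iteration continues until that average stops changing.

```python
# Hedged sketch of the mean-field idea: summarize all competitors by
# the average fraction of users who sprint each round, then iterate
# until that fraction is consistent with everyone's best response.
# Both functions below are illustrative, not the paper's actual model.
def trip_probability(sprint_fraction, trip_threshold=0.3):
    """Chance of a power emergency, rising once many users sprint at once."""
    return max(0.0, min(1.0, (sprint_fraction - trip_threshold) / (1 - trip_threshold)))

def best_response(trip_prob):
    """A user sprints less often as an emergency becomes more likely."""
    return 0.5 * (1.0 - trip_prob)

def mean_field_fixed_point(iters=100):
    frac = 0.5  # initial guess at the population's sprint fraction
    for _ in range(iters):
        frac = best_response(trip_probability(frac))
    return frac  # converges to 5/12 with these assumed functions
```

No user's individual choice moves the average, which is what makes the single averaged effect a valid substitute for tracking every competitor.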
It’s kind of analogous to the way millions of commuters try to optimize their daily travel. An individual commuter, call her Alice, cannot possibly reason about every other person on the road. Instead she formulates some expectation about the population of commuters as a whole, their desired arrival times on a given day, and how their travel plans will contribute to congestion.
Mean field analysis allows us to find the “mean field equilibrium” of the sprinting game. Users optimize their responses to the population, and, in equilibrium, no user benefits by deviating from their best responses to the population.
In the traffic analogy, Alice optimizes her commute according to her understanding of the commuting population’s average behavior. If that optimized plan does not produce the expected traffic pattern, she revises her expectations and rethinks her plan. With every commuter optimizing at once, over a few days, traffic converges to some recurring pattern and commuters’ independent actions produce an equilibrium.
Using the mean field equilibrium, we formulated the optimal strategy for the sprinting game, which boils down to this: A user should sprint when the performance gains exceed a certain threshold, which varies depending on the user. We can compute this threshold using the data center’s workloads and its physical characteristics.
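The equilibrium strategy is simple enough to state in one line of code. The threshold value here is an assumed constant; in practice it would be computed from the data center's workloads and physical characteristics, as the text describes.

```python
# The equilibrium sprinting strategy: sprint whenever this round's
# expected gain exceeds the user's precomputed threshold. The threshold
# of 12 units is an assumed value for illustration.
def should_sprint(expected_gain, threshold=12.0):
    return expected_gain > threshold

# Revisiting the earlier example: a 10-unit round is skipped,
# while a 20-unit round triggers a sprint.
```

The triviality of this rule is the point: all the game-theoretic complexity is folded into computing the threshold once, offline, leaving each server a comparison it can evaluate in nanoseconds.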
When everybody operates with their optimal threshold at the mean field equilibrium, the system gets a number of benefits. First, the data center’s power management can be distributed, as users implement their own strategies without having to request permission from a centralized manager to sprint. Such independence makes power control more responsive, saving energy. Users can modulate their processor’s power draw in microseconds or less. That wouldn’t be possible if they had to wait tens of milliseconds for permission requests and answers to wind their way across the data center’s network. Second, the equilibrium gets more computing done, because users optimize strategies for timely sprints that reflect their own workload demands. And finally, a user’s strategy becomes straightforward—sprinting whenever the gain exceeds a threshold. That’s extremely easy to implement and trivial to execute.
The sprinting power-management project is just one in a series of data-center management systems we’ve been working on over the past five years. In each, we use key details of the hardware architecture and system to formulate the games. The results have led to practical management mechanisms that provide guarantees of acceptable system behavior when participants act selfishly. Such guarantees, we believe, will only encourage participation in shared systems and establish solid foundations for energy-efficient and scalable data centers.
Although we’ve managed to address the resource-allocation problem at the levels of server multiprocessors, server racks, and server clusters, putting them to use in large data centers will require more work. For one thing, you have to be able to generate a profile of the data center’s performance. Data centers must therefore deploy the infrastructure necessary to monitor hardware activity, assess performance outcomes, and infer preferences for resources.
Most game theory solutions for such systems require the profiling stage to happen off-line. It might be less intrusive instead to construct online mechanisms that can start with some prior knowledge and then update their parameters during execution as characteristics become clearer. Online mechanisms might even improve the game as it’s being played, using reinforcement learning or another form of artificial intelligence.
There’s also the fact that in a data center, users may arrive and depart from the system at any time; jobs may enter and exit distinct phases of a computation; servers may fail and restart. All of these events require the reallocation of resources, yet these reallocations may disrupt computation throughout the system and require that data be shunted about, using up resources. Juggling all these changes while still keeping everyone playing fairly will surely require a lot more work, but we’re confident that game theory will play a part.
This article appears in the April 2020 print issue as “A Win for Game Theory in the Data Center.”
About the Authors
Benjamin C. Lee, an associate professor of electrical and computer engineering at Duke University, and Seyed Majid Zahedi, an assistant professor at the University of Waterloo, in Ont., Canada, describe a game they developed that can make data centers more efficient. While there’s a large volume of literature on game theory’s use in computer networking, Lee says, computing on the scale of a data center is a very different problem. “For every 10 papers we read, we got maybe half an idea,” he says.
Post syndicated from Charles Q. Choi; original: https://spectrum.ieee.org/tech-talk/computing/hardware/image-neural
A new ultra-fast machine-vision device can process images thousands of times faster than conventional techniques with an image sensor that is also an artificial neural network.
Machine vision technology often uses artificial neural networks to analyze images. In artificial neural networks, components dubbed “neurons” are fed data and cooperate to solve a problem, such as recognizing images. The neural net repeatedly adjusts the strength of the connections or “synapses” between its neurons and sees if the resulting patterns of behavior are better at solving the problem. Over time, the network discovers which patterns are best at computing solutions. It then adopts these as defaults, mimicking the process of learning in the human brain.
Election security experts will be carefully watching the Democratic primaries and caucuses in 14 states and one U.S. territory on Super Tuesday for signs of irregularities that may prevent accurate and timely reporting of voting results. Of particular interest will be Los Angeles County, where election officials are debuting brand-new custom voting machines to improve how residents vote.
Los Angeles County officials spent US $300 million over the past 10 years to make it easier and more convenient for people to vote—by expanding voting schedules, redesigning ballots, and building 31,000 new ballot-marking machines. Because it is the nation’s largest county in terms of the number of residents, the geographic area it covers, and the number of languages that must be supported, officials decided to commission a brand-new system built from scratch instead of trying to customize existing systems to meet their requirements.
Post syndicated from Samuel K. Moore; original: https://spectrum.ieee.org/tech-talk/computing/hardware/honeywells-ion-trap-quantum-computer-makes-big-leap
Honeywell may be a giant industrial technology firm, but it’s definitely not synonymous with advanced computing. Yet the company has made a ten-year commitment to developing quantum computing in-house, and that commitment is about to start paying off.
“We expect that within the next three months we will be releasing the world’s most powerful quantum computer,” says Tony Uttley, president of Honeywell Quantum Solutions. It’s the kind of claim competitors like IBM and Google have made periodically, but with Honeywell there’s a difference. Those others, using superconducting components chilled to near absolute zero, have been racing to cram more and more qubits onto a chip; Google reached its “quantum supremacy” milestone with 53 qubits. Uttley says Honeywell can beat it with a handful of its trapped-ion qubits.
Uttley measures this success using a relatively new metric, pushed by IBM, called quantum volume. It’s essentially a measure of the number of physical qubits, how connected they are, and how error prone they are. IBM claimed a leading quantum volume of 32 using a 28-qubit system in early January. Honeywell’s four-qubit system reached 16, and it will hit 64 in the coming months, says Uttley.
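The relationship between quantum volume and qubit quality is easiest to see in its exponential form: roughly, a machine that successfully runs a square circuit of width and depth n (in qubits) earns a quantum volume of 2^n. The helper below just expresses that relationship; the "largest passing n" must be established experimentally through IBM's benchmark protocol, which this sketch does not attempt to model.

```python
# Rough sketch of the quantum-volume scale: QV = 2**n, where n is the
# largest "square" circuit (equal width and depth) the machine runs
# successfully. Determining n requires the full benchmark protocol;
# this function only converts n into the headline figure.
def quantum_volume(largest_passing_n):
    return 2 ** largest_passing_n

# The numbers in the text fit this form: IBM's quantum volume of 32
# corresponds to n = 5, Honeywell's 16 to n = 4, and 64 to n = 6 --
# which is why a 4-qubit machine with very clean qubits can out-score
# a larger but noisier one.
```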
The company has an ambitious path toward rapid expansion after that. “We expect to be on a trajectory to increase quantum volume 10-fold every year for the next five years,” he says. IBM is planning to double its figure every year.
Honeywell’s computer uses ytterbium ions trapped in an electromagnetic field in a narrow groove built in a chip. The qubit relies on the spin state of the ion’s outermost electron and that of its nucleus. This can be manipulated by lasers and can hold its state—remain coherent—for a fairly long time compared with other types of qubits. Importantly, the qubits can be moved around on the trap chip, allowing them to interact in the ways needed to run quantum algorithms.
“We chose trapped ions because we believe in these early days of quantum computing, quality of qubit is going to matter most,” says Uttley.
Honeywell is claiming qubits that are so free from corruption that they’ve achieved a first: a “mid-circuit” measurement. That is, the system can interrogate the state of a qubit during a computation without damaging the states of the others, and, based on that observed qubit, it can change what the rest of the computation does. “It’s equivalent to an ‘if’ statement,” explains Uttley. Mid-circuit measurements are not currently possible in other technologies. “It’s theoretically possible,” he says. “But practically speaking, it will be a point of differentiation [for Honeywell] for a while.”
Ion-trap quantum systems were first developed at the U.S. National Institute of Standards and Technology in the 1990s. In 2015, Chris Monroe, a veteran of that group, cofounded the ion-trap quantum computer company IonQ. IonQ has already fit 160 ytterbium-based qubits in its system and performed operations on 79 of them. The startup has published several tests of its system, but not a quantum volume measure.
Two mysterious components of quantum technology came together in a lab at Rice University in Houston recently. Quantum entanglement—the key to quantum computing—and quantum criticality—an essential ingredient for high-temperature superconductors—have now been linked in a single experiment.
The preliminary results suggest something approaching the same physics is behind these two essential but previously distinct quantum technologies. The temptation, then, is to imagine a future in which a sort of grand unified theory of entanglement and superconductivity might be developed, where breakthroughs in one field could be translated into the other.
Post syndicated from Samuel K. Moore; original: https://spectrum.ieee.org/computing/hardware/4-ways-to-make-bigger-quantum-computers
As researchers strive to boost the capacity of quantum computers, they’ve run into a problem that many people have after a big holiday: There’s just not enough room in the fridge.
Today’s quantum-computer processors must operate inside cryogenic enclosures at near absolute zero, but the electronics needed for readout and control don’t work at such temperatures. So those circuits must reside outside the refrigerator. For today’s sub-100-qubit systems, there’s still enough space for specialized cabling to make the connection. But for future million-qubit systems, there just won’t be enough room. Such systems will need ultralow-power control chips that can operate inside the refrigerator. Engineers unveiled some potential solutions in December during the IEEE International Electron Devices Meeting (IEDM), in San Francisco. They ranged from the familiar to the truly exotic.
Perhaps the most straightforward way to make cryogenic controls for quantum computers is to modify CMOS technology. Unsurprisingly, that’s Intel’s solution. The company unveiled a cryogenic CMOS chip called Horse Ridge that translates quantum-computer instructions into basic qubit operations, which it delivers to the processor as microwave signals.
Horse Ridge is designed to work at 4 kelvins, a slightly higher temperature than the qubit chip itself, but low enough to sit inside the refrigerator with it. The company used its 22-nanometer FinFET manufacturing process to build the chip, but the transistors that make up the control circuitry needed substantial reengineering.
“If you take a transistor and cool it to 4 K, it’s not a foregone conclusion that it will work,” says Jim Clarke, director of quantum hardware at Intel. “There are a lot of fundamental characteristics of devices that are temperature dependent.”
Others are working along the same lines. Google presented a cryogenic CMOS control circuit earlier in 2019. In research that was not yet peer-reviewed at press time, Microsoft and its collaborators say they have built a 100,000-transistor CMOS control chip that operates at 100 millikelvins.
In logic circuits, transistors act as switches, but they aren’t the only devices that do so. Engineers in Tsu-Jae King Liu’s laboratory at the University of California, Berkeley, have developed micrometer-scale electromechanical relays as ultralow-power alternatives to transistors. They were surprised to discover that their devices operate better at 4 K than at room temperature.
At room temperature, the devices suffer some mechanical peculiarities. First, ambient oxygen can react with the relay’s electrode surfaces. Over time, this reaction can form a high-resistance layer, limiting the device’s ability to conduct current. But at cryogenic temperatures, oxygen freezes out of the air, so that problem doesn’t exist.
Second, the contacts in microscale relays tend to stick together. This shows up as a hysteresis effect: The relay opens at a slightly different voltage than the one at which it closes. But because the adhesive forces are weaker at cryogenic temperatures, the hysteresis is less than 5 percent of what it is at room temperature.
“We didn’t suspect ahead of time that these devices would operate so well at cryogenic temperatures,” says Liu, who led the research presented at IEDM by her graduate student Xiaoer Hu. “In retrospect, we should have.”
Single-flux quantum logic
Hypres, in Elmsford, N.Y., has been commercializing cryogenic ICs for several years. Seeking to steer its rapid single-flux quantum (RSFQ) logic tech into the realm of quantum computing, the company recently spun out a startup called Seeqc.
In RSFQ and its quantum version, SFQuClass logic, quantized pulses of voltage are blocked, passed, or routed by Josephson junctions, the same type of superconducting devices that make up most of today’s quantum computer chips. In 2014, physicists at the University of Wisconsin–Madison first suggested that these pulses could be used to program qubits, and Seeqc’s scientists have been collaborating with them and Syracuse University scientists since 2016.
Seeqc is now designing an entire system using the technology: a digital-control, error-correction, and readout chip designed to work at 3 to 4 K and a separate chip designed to work at 20 millikelvins to interface with the quantum processor.
Quantum computing is already strange, but it might take some even stranger tech to make it work. Scientists at Lund University, in Sweden, and at IBM Research–Zurich have designed a new device called a Weyl semimetal amplifier that they say could bring readout electronics closer to the qubits. Don’t worry if you don’t know what a Weyl semimetal is. There are things about these materials that even the scientists trying to make devices from them don’t fully understand.
What they do know is that these materials, such as tungsten diphosphide, exhibit extremely strong, temperature-dependent magnetoresistance when chilled to below about 50 K. The device they simulated has a gate electrode that produces a magnetic field inside the Weyl semimetal, causing its resistance to go from tiny to huge in a matter of picoseconds. Connecting the input from a qubit to the device could make a high-gain amplifier that dissipates a mere 40 microwatts. That could be low enough for the amplifier to live in the part of the fridge close to where the qubits themselves reside.
This article appears in the February 2020 print issue as “4 Ways to Handle More Qubits.”