In 1997 the IBM Deep Blue supercomputer defeated world chess champion Garry Kasparov. It was a groundbreaking demonstration of supercomputer technology and a first glimpse into how high-performance computing might one day overtake human-level intelligence. In the 10 years that followed, we began to use artificial intelligence for many practical tasks, such as facial recognition, language translation, and recommending movies and merchandise.
Fast-forward another decade and a half and artificial intelligence has advanced to the point where it can "synthesize knowledge." Generative AI, such as ChatGPT and Stable Diffusion, can compose poems, create artwork, diagnose disease, write summary reports and computer code, and even design integrated circuits that rival those made by humans.
Great opportunities lie ahead for artificial intelligence to become a digital assistant to all human endeavors. ChatGPT is a good example of how AI has democratized the use of high-performance computing, providing benefits to every individual in society.
All those marvelous AI applications have been made possible by three factors: innovations in efficient machine-learning algorithms, the availability of massive amounts of data on which to train neural networks, and progress in energy-efficient computing through the advancement of semiconductor technology. This last contribution to the generative AI revolution has received less than its fair share of credit, despite its ubiquity.
Over the last three decades, the major milestones in AI were all enabled by the leading-edge semiconductor technology of the time and would have been impossible without it. Deep Blue was implemented with a mix of 0.6- and 0.35-micrometer-node chip-manufacturing technology. The deep neural network that won the ImageNet competition, kicking off the current era of machine learning, was implemented with 40-nanometer technology. AlphaGo conquered the game of Go using 28-nm technology, and the initial version of ChatGPT was trained on computers built with 5-nm technology. The most recent incarnation of ChatGPT is powered by servers using even more advanced 4-nm technology. Each layer of the computer systems involved, from software and algorithms down to the architecture, circuit design, and device technology, acts as a multiplier for the performance of AI. But it's fair to say that the foundational transistor-device technology is what has enabled the advancement of the layers above.
If the AI revolution is to continue at its current pace, it's going to need even more from the semiconductor industry. Within a decade, it will need a 1-trillion-transistor GPU, that is, a GPU with 10 times as many devices as is typical today.
Advances in semiconductor technology [top line], including new materials, advances in lithography, new types of transistors, and advanced packaging, have driven the development of more capable AI systems [bottom line].
Relentless Growth in AI Model Sizes
The computation and memory access required for AI training have increased by orders of magnitude in the past five years. Training GPT-3, for example, requires the equivalent of more than 5 billion billion operations per second of computation for an entire day (that's 5,000 petaflop/s-days), and 3 trillion bytes (3 terabytes) of memory capacity.
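The petaflop/s-day unit can be unpacked with a little arithmetic. The sketch below is a minimal illustration that assumes the usual definition of a petaflop/s-day, 10^15 operations per second sustained for 86,400 seconds; it converts the 5,000 petaflop/s-day figure into a total operation count and checks that it matches the "5 billion billion operations per second for a day" description. The function names are hypothetical helpers, not part of any library.

```python
# Unit conversion for the GPT-3 training-compute figure quoted above.
# Assumption: one petaflop/s-day = 1e15 operations per second sustained for one day.
PETA = 1e15
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 s


def pflops_days_to_ops(pflops_days: float) -> float:
    """Total operations represented by a compute budget given in petaflop/s-days."""
    return pflops_days * PETA * SECONDS_PER_DAY


def sustained_rate_for_one_day(pflops_days: float) -> float:
    """Operations per second needed to spend that whole budget in exactly one day."""
    return pflops_days_to_ops(pflops_days) / SECONDS_PER_DAY


budget = 5_000  # petaflop/s-days, the figure given in the text for GPT-3
print(f"total operations: {pflops_days_to_ops(budget):.3e}")
print(f"rate for one day: {sustained_rate_for_one_day(budget):.1e} ops/s")
```

Run as written, it reports roughly 4.3 × 10^23 operations in total and a rate of about 5 × 10^18 operations per second, the "5 billion billion" in the text.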
Both the computing power and the memory access needed for new generative AI applications continue to grow rapidly. We now need to answer a pressing question: How can semiconductor technology keep pace?
From Integrated Devices to Integrated Chiplets
Since the invention of the integrated circuit, semiconductor technology has been about scaling down feature size so that we can cram more transistors into a thumbnail-size chip. Today, integration has risen one level higher; we are going beyond 2D scaling into 3D system integration. We are now putting together many chips into a tightly integrated, massively interconnected system. This is a paradigm shift in semiconductor-technology integration.
In the era of AI, the capability of a system is directly proportional to the number of transistors integrated into that system. One of the main limitations is that lithographic chipmaking tools have been designed to make ICs of no more than about 800 square millimeters, what's called the reticle limit. But we can now extend the size of the integrated system beyond lithography's reticle limit. By attaching several chips onto a larger interposer (a piece of silicon into which interconnects are built), we can integrate a system that contains a much larger number of devices than is possible on a single chip. For example, TSMC's chip-on-wafer-on-substrate (CoWoS) technology can accommodate up to six reticle fields' worth of compute chips, along with a dozen high-bandwidth-memory (HBM) chips.
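The relationship between the reticle limit and the interposer can be expressed as two separate checks. Each die on its own must be printable within one reticle field, while the interposer sets a larger budget for the total. The following sketch is a deliberately simplified model under stated assumptions: it uses the approximate 800 mm² limit from the text, treats "six reticle fields' worth" as a total compute-silicon budget of six times that area, and ignores the HBM stacks, spacing between dies, and routing. The function name and die sizes are hypothetical.

```python
# Simplified model (an assumption, not a real design rule deck): each die must fit
# within one reticle field, and the interposer allows the total compute area to
# span several reticle fields. HBM stacks and die spacing are ignored.
RETICLE_LIMIT_MM2 = 800.0  # approximate reticle limit quoted in the text


def fits_on_interposer(die_areas_mm2, max_reticle_fields):
    """Return True if every die is printable and the total fits the interposer budget."""
    if any(area > RETICLE_LIMIT_MM2 for area in die_areas_mm2):
        return False  # a die larger than the reticle cannot be exposed in one shot
    return sum(die_areas_mm2) <= max_reticle_fields * RETICLE_LIMIT_MM2


# Hypothetical dies checked against a six-field compute budget (4,800 mm^2).
print(fits_on_interposer([780.0] * 4, max_reticle_fields=6))  # True: 3,120 mm^2 total
print(fits_on_interposer([1200.0], max_reticle_fields=6))     # False: die exceeds reticle
print(fits_on_interposer([790.0] * 7, max_reticle_fields=6))  # False: 5,530 mm^2 total
```

The point of splitting the check in two is that the reticle limit constrains each die, not the system: an interposer lifts the second limit while leaving the first in place.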
HBMs are an example of the other key semiconductor technology that is increasingly important for AI: the ability to integrate systems by stacking chips atop one another, what we at TSMC call system-on-integrated-chips (SoIC). An HBM consists of a stack of vertically interconnected DRAM chips atop a control-logic IC. It uses vertical interconnects called through-silicon vias (TSVs) to carry signals through each chip, and solder bumps to form the connections between the memory chips. Today, high-performance GPUs use HBM extensively.
Going forward, 3D SoIC technology can provide a "bumpless alternative" to the conventional HBM technology of today, delivering far denser vertical interconnection between the stacked chips. Recent advances have shown HBM test structures with 12 layers of chips stacked using hybrid bonding, a copper-to-copper connection with a higher density than solder bumps can provide. Bonded at low temperature on top of a larger base logic chip, this memory system has a total thickness of just 600 µm.
With a high-performance computing system composed of a large number of dies running large AI models, high-speed wired communication may quickly limit the computation speed. Today, optical interconnects are already being used to connect server racks in data centers. We will soon need optical interfaces based on silicon photonics that are packaged together with GPUs and CPUs. This will allow the scaling up of energy- and area-efficient bandwidths for direct, optical GPU-to-GPU communication, such that hundreds of servers can behave as a single giant GPU with a unified memory. Because of the demand from AI applications, silicon photonics will become one of the semiconductor industry's most important enabling technologies.
Toward a Trillion Transistor GPU
As noted already, typical GPU chips used for AI training have already reached the reticle field limit, and their transistor count is about 100 billion devices. Continuing the trend of increasing transistor count will require multiple chips, interconnected with 2.5D or 3D integration, to perform the computation. Integrating multiple chips, by CoWoS or SoIC and related advanced packaging technologies, allows for a much larger total transistor count per system than can be squeezed into a single chip. We forecast that within a decade a multichiplet GPU will have more than 1 trillion transistors.
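At its simplest, the step from about 100 billion to 1 trillion transistors is a division. The sketch below assumes, purely for illustration, that every chiplet carries the same transistor count as a reticle-limited GPU die today; real multichiplet designs would mix die sizes and process nodes, so treat the result as an order-of-magnitude estimate. The helper name is hypothetical.

```python
import math


def chiplets_needed(target_transistors: float, transistors_per_chiplet: float) -> int:
    """Smallest whole number of equal chiplets whose combined count reaches the target."""
    return math.ceil(target_transistors / transistors_per_chiplet)


TODAY_PER_DIE = 100e9  # ~100 billion transistors per reticle-limited GPU die (from the text)
TARGET = 1e12          # the 1-trillion-transistor forecast

print(chiplets_needed(TARGET, TODAY_PER_DIE))      # 10 at today's per-die count
print(chiplets_needed(TARGET, 2 * TODAY_PER_DIE))  # 5 if per-die count doubled (hypothetical)
```

Under that assumption, about ten reticle-sized compute dies would be needed, and any gain in per-die transistor density reduces the count proportionally.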
We'll need to link all these chiplets together in a 3D stack, but fortunately, the industry has been able to rapidly scale down the pitch of vertical interconnects, increasing the density of connections. And there is plenty of room for more. We see no reason why the interconnect density can't grow by an order of magnitude, and even beyond.
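Why a modest pitch reduction buys a large density gain follows from geometry: if vertical connections sit on a square grid with pitch p, the number per unit area scales as 1/p². The sketch below assumes such a grid; the 9 µm starting pitch is a hypothetical value chosen only for illustration, not a figure from the text or from any specific process. It shows that a tenfold increase in interconnect density requires shrinking the pitch by a factor of √10, or about 3.2.

```python
import math


def connections_per_mm2(pitch_um: float) -> float:
    """Vertical connections per mm^2 for a square grid with the given pitch in micrometers."""
    pitch_mm = pitch_um / 1000.0
    return 1.0 / (pitch_mm ** 2)


def pitch_for_density_gain(pitch_um: float, gain: float) -> float:
    """Pitch needed to multiply connection density by `gain`, since density ~ 1/pitch^2."""
    return pitch_um / math.sqrt(gain)


start_pitch = 9.0  # hypothetical starting pitch in um, for illustration only
new_pitch = pitch_for_density_gain(start_pitch, 10.0)
print(f"pitch {start_pitch:.2f} um -> {new_pitch:.2f} um")
print(f"density {connections_per_mm2(start_pitch):,.0f} -> "
      f"{connections_per_mm2(new_pitch):,.0f} connections per mm^2")
```

Because density goes with the inverse square of pitch, every halving of the pitch quadruples the number of connections in the same footprint.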
Energy-Efficient Performance Trend for GPUs
So, how do all these innovative hardware technologies contribute to the performance of a system?
We can see the trend already in server GPUs if we look at the steady improvement in a metric called energy-efficient performance (EEP). EEP is a combined measure of the energy efficiency and speed of a system. Over the past 15 years, the semiconductor industry has increased energy-efficient performance about threefold every two years. We believe this trend will continue at historical rates. It will be driven by innovations from many sources, including new materials, device and integration technology, extreme ultraviolet (EUV) lithography, circuit design, system architecture design, and the co-optimization of all these technology elements, among other things.
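Compounding makes the "threefold every two years" trend easier to grasp. The sketch below assumes, as an idealization, that the rate holds exactly and continuously; the text says "about threefold," so the outputs are indicative only. The function name is hypothetical.

```python
def eep_gain(years: float, factor: float = 3.0, period_years: float = 2.0) -> float:
    """Idealized cumulative EEP multiplier if EEP grows by `factor` every `period_years`."""
    return factor ** (years / period_years)


# Annualized rate and cumulative gains implied by "threefold every two years".
print(f"per year: {eep_gain(1):.2f}x")    # 3 ** 0.5, about 1.73x
print(f"15 years: {eep_gain(15):,.0f}x")  # 3 ** 7.5, about 3,788x
print(f"10 years: {eep_gain(10):,.0f}x")  # 3 ** 5 = 243x
```

Under that idealization the trend works out to roughly 1.7 times per year, close to a 3,800-fold gain over 15 years, and a 243-fold gain over a decade.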
In particular, the EEP increase will be enabled by the advanced packaging technologies we've been discussing here. Additionally, concepts such as system-technology co-optimization (STCO), where the different functional parts of a GPU are separated onto their own chiplets and each is built using the best-performing and most economical technology for it, will become increasingly critical.
A Mead-Conway Moment for 3D Integrated Circuits
In 1978, Carver Mead, a professor at the California Institute of Technology, and Lynn Conway at Xerox PARC invented a computer-aided design methodology for integrated circuits. They used a set of design rules to describe chip scaling so that engineers could easily design very-large-scale integration (VLSI) circuits without much knowledge of process technology.
That same kind of capability is needed for 3D chip design. Today, designers need to know chip design, system-architecture design, and hardware and software optimization. Manufacturers need to know chip technology, 3D IC technology, and advanced packaging technology. As we did in 1978, we again need a common language to describe these technologies in a way that electronic design tools understand. Such a hardware description language gives designers a free hand to work on a 3D IC system design, regardless of the underlying technology. It's on the way: An open-source standard, called 3Dblox, has already been embraced by most of today's technology companies and electronic design automation (EDA) companies.
The Future Beyond the Tunnel
In the era of artificial intelligence, semiconductor technology is a key enabler for new AI capabilities and applications. A new GPU is no longer restricted by the standard sizes and form factors of the past. New semiconductor technology is no longer limited to scaling down next-generation transistors on a two-dimensional plane. An integrated AI system can be composed of as many energy-efficient transistors as is practical, an efficient system architecture for specialized compute workloads, and an optimized relationship between software and hardware.
For the past 50 years, semiconductor-technology development has felt like walking inside a tunnel. The road ahead was clear, as there was a well-defined path. And everyone knew what needed to be done: shrink the transistor.
Now, we have reached the end of the tunnel. From here, semiconductor technology will get harder to develop. Yet, beyond the tunnel, many more possibilities lie ahead. We are no longer bound by the confines of the past.