While the Top500 list ranks the five hundred most powerful high-performance computing systems (HPCs) in the world, its cousin the Green500 re-ranks the same 500 supercomputers according to their energy efficiency. For the last three iterations of the list, Henri, a small supercomputer operated by the Flatiron Institute in New York, has been named the world's most energy-efficient high-performance computer. Built in the fall of 2022, Henri was the first system to use Nvidia's H100 GPUs, aka Hopper.
To learn the secrets of building, and staying ahead with, the most energy-efficient supercomputer, we caught up with Henri's architect,
Ian Fisk, who is co-director of the Scientific Computing Core at the Flatiron Institute. Flatiron is an internal research division of the Simons Foundation that brings together researchers using modern computational tools to advance our understanding of science.
The Flatiron Institute's Ian Fisk on…
- Building Henri
- Getting and keeping the top Green500 spot
- The future of Henri and its successors
- Powering science
IEEE Spectrum: Where did the name Henri come from?
Ian Fisk: The name came about for a silly reason. Our previous machine was called Rusty. So, when asked by the vendor what the machine name was going to be, we said, 'Well, by our naming convention, it'll be Rusty, and it's using [Nvidia's] H100 chip, so it'd be Rusty Hopper.' But Rusty Hopper sounds like a country singer from the 1980s, so they didn't want to name it that. And one of the Nvidia engineers who had decided it would be possible to actually build a machine that would make the Top500 and be at the top of the Green500 had just had a son named Henri. So, we were asked by the vendor if we would consider naming it after that person, which we thought was sweet.
Since the Green500 measures performance per watt, it doesn't matter how fast you are; it matters how fast you are for how many watts you used. —Ian Fisk, Flatiron Institute
Did you set out to build the world's greenest supercomputer?
Fisk: Nvidia sold us that gear at an academic discount rate in part because we were aiming for this benchmark. It was good for us because it gave us some exposure, but we really wanted the hardware for the scientists, and it was one way for us to get access to H100s very early. But to do that, we had to run the test in November 2022. So the equipment came to the loading dock in October, and it was assembled into a computer and then tested in record time. If there were an award for the fastest 500, we would be the winner.
The numbers in the first test run [November 2022] weren't as good as the second time [June 2023]. The second time, when there was a little more time to breathe, we upgraded the machine. It was bigger: it was 80 GPUs the first time and 144 the second time. It's 2.7 petaflops, which for two racks of equipment is a reasonable size. It's around 250 on the Top500 list of the biggest supercomputers. And then number one on the Green500 list.
Can you explain your design decisions when building Henri? Why Nvidia's H100s?
Fisk: Our experience with Nvidia, which goes all the way back to
the K40s, was that each generation was about two to three times faster than its predecessor. And that was certainly true of everything that led up to this, like the V100 and the A100. It's about two and a half times better. We already had two racks of A100s, and when it came time to upgrade the facility, H100s were the thing to buy.
The H100s at the time were only available in the PCI-connected model; they didn't have the NVLink option yet. And they didn't have any water-cooled ones, so we were using air-cooled systems again. The GPUs before that machine and after have all been water-cooled systems, because they're just a little bit more efficient, and easier to operate because you can carry away much more heat. But we chose it because we were expecting very nice performance numbers. And we got them, eventually. With Nvidia, the software and the hardware sort of come out at the same time. And the performance tends to get better over time as things get properly optimized.
The thing that separates a computer from a supercomputer is the low-latency fabric. And on almost all systems right now, that low-latency fabric is InfiniBand. The only people who provide it are Mellanox [Technologies], which was recently acquired by the Nvidia Corporation, so they own the entire stack.
[What] has allowed us to stay on top has been that technology has evolved to use more power rather than be more efficient. We didn't expect to win more than once. —Ian Fisk, Flatiron Institute
There was one design decision that was sort of thrust upon us that we're revisiting right now. When we bought the system, the only chassis you could buy were PCIe Gen 4, and the H100s use PCIe Gen 5. Because it was Gen 4, we were limited by the communication speed to the GPUs and to the InfiniBand cards. When we started, we had HDR cards at 100 gigabits each. And we suddenly found that that wasn't going to be enough to do a good test for the Green500. So, we upgraded to 400 gigabits of InfiniBand on every node, and that helped some. Had we had PCIe Gen 5, we could have had two times 400 gigabits, and that would have been even better.
What optimizations did you have to make for the Green500 test?
Fisk: I think doing the Green500 run is a little bit like being a hypermiler. You have a Honda Civic and you drive across the country getting sixty miles per gallon with the windows closed, AC off, and accelerating very slowly, but that's not exactly the way you'd drive it in a race to get somewhere. For example, when you do the Green500 run, everything that doesn't generate performance is turned off. There are big solid-state drives on all of the systems of this type when you're running in production, because you have to feed training samples to machine-learning applications. But they use power, and they don't give you any performance, so those get turned off. It's a little bit like a hypermiler taking the spare tire out of their car because they wanted to get better mileage, but it's not how they would actually drive all the time.
How have you been able to keep the #1 spot for almost two years?
Fisk: Really, the thing that will knock Henri off its perch will probably be the next generation of hardware. But I think the thing that has allowed us to stay on top has been that technology has evolved to use more power rather than be more efficient. We didn't expect to win more than once; we were expecting that people would come along with the water-cooled version of H100s and be more efficient than us, but that hasn't happened so far.
The H100 comes in two models: the PCI version that plugs into the board as a card, and the motherboard mount, which is called an SXM5. And the SXM5 is the NVLinked version. The big difference is that the SXM5 has a communication protocol between the GPUs that allows them to talk to one another at 900 gigabytes per second. It's dramatically better than anything on InfiniBand. It's really what allows them to solve problems like large language models, because when you're doing those kinds of calculations, at every epoch, there can be a huge amount of data that has to flow back and forth. So those communication links are essential, but they also use more electricity. The LINPACK benchmark that they do for the Green500 test benefits from a good communication layer, but not at that level.
The reason why no one has beaten the machine yet is that the SXM5s really use much more electricity (they use 700 watts per GPU while ours only use 350), and the performance [on things like the LINPACK test] is not a factor of two different. Since the Green500 measures performance per watt, it doesn't matter how fast you are; it matters how fast you are for how many watts you used. And that's the thing that we see with those PCI-connected H100s: they are very hard to beat, because they don't use a lot of electricity and they have similar performance to the much higher wattage parts on these kinds of calculations.
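Fisk's argument can be sketched numerically. The wattage figures (700 W per SXM5, 350 W per PCIe H100) come from the interview; the absolute and relative LINPACK throughputs below are hypothetical illustrative values, chosen only to satisfy his "not a factor of two different" observation.

```python
# Sketch of the Green500 performance-per-watt trade-off.
# Wattages are from the interview; throughput numbers are assumed,
# not measured (SXM5 taken as 1.5x the PCIe card, i.e. faster but
# "not a factor of two different").

def gflops_per_watt(gflops: float, watts: float) -> float:
    """Green500-style efficiency metric: sustained GFLOPS per watt."""
    return gflops / watts

pcie_perf = 50_000           # hypothetical sustained GFLOPS, PCIe H100
sxm5_perf = pcie_perf * 1.5  # hypothetical sustained GFLOPS, SXM5 H100

pcie_eff = gflops_per_watt(pcie_perf, 350)
sxm5_eff = gflops_per_watt(sxm5_perf, 700)

# The slower, lower-power card wins on the efficiency metric.
assert pcie_eff > sxm5_eff
print(f"PCIe: {pcie_eff:.1f} GFLOPS/W, SXM5: {sxm5_eff:.1f} GFLOPS/W")
```

Under these assumptions the PCIe card comes out well ahead per watt despite being slower in absolute terms, which is the effect Fisk describes.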
Do you expect to be the greenest supercomputer again in May?
Fisk: Well, we are building a new machine with 96 GPUs; these will be the SXM5s, water-cooled NVLinked units. We will know soon whether they're going to have better performance. As I mentioned, they may be faster, but they may not be more efficient. But one thing we found with our A100s was that most of the performance is available in the first half of the wattage, so you get 90 percent of the performance in the first 225 watts. So, one of the things that we're going to try with the water-cooled system is to run it in power-capped mode and see what kind of performance we get.
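The power-capping arithmetic Fisk sketches is worth making explicit. The 90 percent and 225 W figures are his; the 450 W total budget is an assumption, inferred from "225 watts" being "the first half of the wattage."

```python
# Back-of-envelope for running a GPU in power-capped mode, using the
# A100 figures Fisk quotes: ~90 percent of peak performance within the
# first 225 W. The 450 W uncapped budget is an inferred assumption.

def efficiency_gain(perf_fraction: float, power_fraction: float) -> float:
    """Relative perf-per-watt of a capped GPU vs. running it uncapped."""
    return perf_fraction / power_fraction

gain = efficiency_gain(perf_fraction=0.90, power_fraction=225 / 450)
print(f"Power-capped efficiency gain: {gain:.2f}x")  # 1.80x
```

In other words, giving up the last 10 percent of performance nearly doubles performance per watt, which is why a capped run is attractive for a Green500 submission.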
The future is going to be expensive. And the future is going to be very high powered. —Ian Fisk, Flatiron Institute
One nice thing about the water-cooled version is that it doesn't need fans, because the fans count against your wattage. When these units are running, it's about four kilowatts of power per three rack units (3U) of space. So it's like forty 100-watt light bulbs in a small box. Cooling that down requires blowing a huge amount of air through it, so you can have a few hundred watts of fans. And with water cooling, you just have a central pump, which means significant savings. The heat capacity of water is about 4,000 times the heat capacity of air by volume, so you need to use a lot less of it.
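The roughly 4,000× figure is easy to sanity-check from textbook room-temperature values for density and specific heat; the calculation below lands near 3,500×, the same ballpark Fisk cites.

```python
# Rough check of the water-vs-air volumetric heat capacity ratio.
# Approximate textbook values near room temperature and 1 atm;
# variation with temperature and pressure is ignored.
WATER_DENSITY = 997.0        # kg/m^3
WATER_SPECIFIC_HEAT = 4186.0 # J/(kg*K)
AIR_DENSITY = 1.2            # kg/m^3
AIR_SPECIFIC_HEAT = 1005.0   # J/(kg*K)

water_vol = WATER_DENSITY * WATER_SPECIFIC_HEAT  # J/(m^3*K)
air_vol = AIR_DENSITY * AIR_SPECIFIC_HEAT        # J/(m^3*K)

ratio = water_vol / air_vol
print(f"Water stores ~{ratio:.0f}x more heat per unit volume than air")
```

This is why a thin loop of water can replace a wall of fans: moving far less coolant volume carries away the same heat.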
It's going to be interesting to watch the next Green500 list in May of this year. We'll see who comes along, and whether nobody beats us, or someone beats us, or we beat ourselves. It's all possible.
What does the future look like for Henri or its successor?
Fisk: The future is going to be expensive. And the future is going to be very high powered.
When we started, the GPU was a very capable resource that was very good for machine learning and certain kinds of linear algebra calculations. At the beginning, everyone used a single GPU. Then they started using them together in groups, where they would fit their computation across several nodes, up to eight nodes. Now, we're seeing more and more people who want to do tightly coupled large language models, where it requires 100 GPUs or several hundreds of GPUs connected in ways that we never would have imagined.
For the next set of resources we're buying, the network connectivity is 16 times better than what came before. It's a similar set of equipment, but these ones have 1.6 terabits of communication per node, compared to 100 gigabits. And it makes the machines very expensive, because suddenly the network fabric is a big factor in the purchase price, since you need lots and lots of InfiniBand switches and lots of cables. And these are 800-gigabit, very high performance cables.
With tightly coupled GPUs you can build models that have 10 to the power of 10 parameters. And that's what's really driving this particular technology. —Ian Fisk, Flatiron Institute
We expect there will be plenty of people who are running traditional high-performance computing codes. But now there's this new community that wants to use big chunks of very capable resources, and we're trying to support those people. It's complicated, in part because we are competing with industries that do this, too. All these resources are very hard to buy; they have long lead times; they're very expensive, in part because of the AI gold rush that is going on right now. We're trying to figure out our place in that, and so we're buying a medium-scale machine. And we don't know what happens after that.
What is the Flatiron Institute using Henri for?
Fisk: It's a mix. I'd say, still, 75 or 80 percent is what I'd consider canned machine-learning applications. This is PyTorch mostly, where people are building models to do either simulation or prediction of lots of things, finding correlations. This runs across the entire spectrum. We've got people who are looking at how to understand the AI and build better models. We also have people who are working on things like structural systems biology, looking for correlations in the gut microbiome. We have people working on protein structure and gene function, looking at gene sequences, and using machine-learning techniques to figure out what's going on.
The most recent project is called
Polymathic AI. A simplistic summary would be something like ChatGPT for science. The idea is to build a large enough foundation model for science, where you teach the AI algorithms a lot about physical processes, and then ask them to do things like fluid dynamics simulations. It's a very ambitious project. And they're trying to figure out how to get bigger, how to scale up their work. And the idea behind this is that with tightly coupled GPUs you can build models that have 10 to the power of 10 parameters. And that's what's really driving this particular technology.
Henri is a workhorse machine. If you go into the queue right now, it's completely full. If I had to run another Green500 test and say, 'I'm going to take this thing offline for two weeks,' I'd have a riot on my hands. There would be pitchforks outside my office. So yes, it's a very green, efficient computer. But at the end of the day, its legacy is all of the amazing science it enables.