HPE doubles controller nodes, boosts capacity with denser drives, cuts caches and builds in 100Gbps links to deliver denser, faster storage aimed at all stages of the AI/ML pipeline
By
-
Yann Serra,
LeMagIT -
Antony Adshead,
Storage Editor
Published: 26 Mar 2024 16:30
In a push towards artificial intelligence (AI) workloads, HPE has upgraded its Alletra MP storage arrays to connect to double the number of servers and provide four times the capacity in the same rackspace.
A year after its initial launch, Alletra MP now has four controller nodes per chassis instead of two, each with an AMD Epyc CPU of eight, 16 or 32 cores. Its 2U data nodes have also shrunk to 1U, each holding 10 SSDs of 61.44TB for a maximum capacity of 1.2PB in 2U. Previously, Alletra MP data nodes contained 20 SSDs of 7.68TB or 15.36TB (making up to 300TB per node).
That increase in node numbers means Alletra MP can now connect to twice as many servers and offer them four times the capacity in comparable datacentre rackspace, with the same power consumption, according to HPE.
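Those figures make the 4x density claim easy to verify. Here is a quick back-of-envelope check in Python (a sketch; the 2U-versus-2U framing is an assumption based on the drive counts and sizes quoted above, not an HPE-published comparison):

```python
# Density check using the figures from the article.
OLD_SSD_TB = 15.36        # largest drive in the previous 2U data node
OLD_SSDS_PER_NODE = 20    # previous data node: 20 SSDs in 2U
NEW_SSD_TB = 61.44        # new denser drive
NEW_SSDS_PER_NODE = 10    # new data node: 10 SSDs in 1U

old_tb_per_2u = OLD_SSDS_PER_NODE * OLD_SSD_TB      # 307.2 TB, the ~300TB quoted
new_tb_per_2u = 2 * NEW_SSDS_PER_NODE * NEW_SSD_TB  # two 1U nodes: 1,228.8 TB (~1.2PB)

print(f"previous: {old_tb_per_2u:.1f} TB per 2U")
print(f"new: {new_tb_per_2u:.1f} TB per 2U ({new_tb_per_2u / old_tb_per_2u:.0f}x)")
```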
“We call this new generation Alletra MP for AI,” said Olivier Tant, HPE Alletra MP NAS specialist.
“That’s because we believe it is perfectly suited to replacing storage solutions based on GPFS or BtrFS, which are complicated to deploy but often used for AI workloads. We also think our arrays are more efficient than those from DDN in HPC or Isilon in media workloads.”
The SAN block access version works like the NAS file access version, with 100Gbps RoCE switches that allow any controller node to access any data node.
“The great benefit of our solution over the competition is that all the nodes in the cluster talk to each other,” said Tant. “In other words, our competitors are limited to, for example, 16 storage nodes, of which three would be used for redundant data for erasure coding. That’s 15% to 20% of capacity. We can deploy a cluster of 140 nodes, of which three are used for redundancy via erasure coding. We lose only about 2% of capacity, and that’s a real economic advantage.”
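The percentages in Tant’s comparison fall straight out of the ratio of redundancy nodes to cluster size; a minimal check of the arithmetic:

```python
def redundancy_overhead(total_nodes: int, parity_nodes: int) -> float:
    """Fraction of raw capacity consumed by erasure-coding redundancy."""
    return parity_nodes / total_nodes

# The competitor scenario and the HPE scenario from the quote above
print(f"16 nodes, 3 for redundancy:  {redundancy_overhead(16, 3):.1%}")   # 18.8%
print(f"140 nodes, 3 for redundancy: {redundancy_overhead(140, 3):.1%}")  # 2.1%
```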
The secret recipe: 100Gbps RoCE switches between nodes
“Our solution also performs better, which is, paradoxically, because we don’t use cache at controller level,” said Michel Parent, HPE Alletra MP NAS specialist. “With NVMe/RoCE connectivity of 100Gbps across all parts of the array, cache becomes counter-productive.
“Cache doesn’t speed anything up, and actually slows the array down with incessant copy and verification operations,” he added. According to Parent, no other storage array on the market uses NVMe/RoCE at speeds as high as 100Gbps per port.
Hosts use Ethernet or InfiniBand (compatible with Nvidia GPUDirect) to access the controller node closest to them. During writes, that node carries out erasure coding and shards the primary data across the other SSD nodes. From the point of view of network hosts, all controller nodes present the same file volumes and block LUNs.
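A minimal sketch of that write path, assuming a simplified single-parity scheme as a stand-in for Alletra MP’s actual erasure coding (which, per the quote above, tolerates three node losses, not one):

```python
# Sketch: a controller splits an incoming write into shards plus XOR
# parity before distributing them to SSD nodes. Illustrative only; not
# HPE's actual erasure code.
def shard_with_parity(payload: bytes, data_shards: int) -> list[bytes]:
    """Split payload into equal-sized shards plus one XOR parity shard."""
    size = -(-len(payload) // data_shards)            # ceiling division
    padded = payload.ljust(size * data_shards, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(data_shards)]
    parity = bytearray(size)
    for shard in shards:
        for i, byte in enumerate(shard):
            parity[i] ^= byte
    return shards + [bytes(parity)]

# Each shard would then go to a different SSD node over the RoCE fabric
shards = shard_with_parity(b"model checkpoint bytes...", data_shards=4)
```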
In NAS mode – in which Alletra MP uses VAST Data’s file access system – there is a cache made up of fast storage-class memory (SCM) flash from Kioxia. This buffer serves as a workspace in which to deduplicate and compress file data.
“Our data reduction system is among the best performing, according to various benchmarks,” said Tant. “All duplicates in the data are eliminated. Then an algorithm finds the blocks that most resemble each other and compresses them together, and it’s very efficient.”
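A minimal sketch of that two-stage reduction, assuming a deliberately naive similarity key (the real system’s similarity hashing is not detailed in the article):

```python
import hashlib
import zlib
from collections import defaultdict

def reduce_blocks(blocks: list[bytes]) -> dict[bytes, bytes]:
    """Stage 1: drop exact duplicates. Stage 2: group resembling blocks and
    compress each group as one unit so near-duplicates share compression
    context. The group key (first 8 bytes) is a stand-in for a real
    similarity hash."""
    unique: dict[str, bytes] = {}
    for block in blocks:                 # stage 1: exact deduplication
        unique.setdefault(hashlib.sha256(block).hexdigest(), block)

    groups: defaultdict[bytes, list[bytes]] = defaultdict(list)
    for block in unique.values():        # stage 2: group similar blocks
        groups[block[:8]].append(block)

    return {key: zlib.compress(b"".join(group)) for key, group in groups.items()}
```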
The only pieces of data shared between several nodes are those that result from erasure coding. By preference, a file is re-read from the SSD that contains the whole of it.
More precisely, during a read, the controller passes the request to the first SSD node reached via the most available switch. Every data node holds an index of the entire cluster’s contents. If that node does not hold the data to be read, it forwards the controller’s request to the node that does.
In the SAN version, the mechanism is similar, except that it works at block rather than file level.
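A sketch of that routing, under the assumption stated above that every data node carries a full index of the cluster’s contents (class and method names are illustrative, not HPE’s implementation):

```python
class DataNode:
    """Illustrative data node: answers reads locally when it can, otherwise
    forwards the request to the owning node via the cluster-wide index."""

    def __init__(self, node_id: int, cluster_index: dict[str, int]):
        self.node_id = node_id
        self.cluster_index = cluster_index        # item id -> owning node id
        self.local_store: dict[str, bytes] = {}
        self.peers: dict[int, "DataNode"] = {}    # populated at cluster join

    def read(self, item_id: str) -> bytes:
        if item_id in self.local_store:           # fast path: data held locally
            return self.local_store[item_id]
        owner_id = self.cluster_index[item_id]    # every node knows the owner
        return self.peers[owner_id].read(item_id) # forward, don't bounce back
```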
With such an architecture, which depends more on the speed of the switches than of the controllers, it is simple to switch from one node to another if one does not respond quickly enough on its Ethernet port.
One array for several types of storage
NVMe SSDs are quickest at rebuilding a file from blocks of data, because every 100Gbps link in Alletra MP is as fast as or faster than the network connection between the array and the application server. In competitors’ arrays that don’t use switches between the controller and dedicated SSD nodes, it is usual to try to optimise for specific use cases.
“I’m convinced of the economic advantage of Alletra MP over its competitors,” said Tant. “In an AI project, an enterprise generally has to put a data pipeline in place. That means a storage array with very high write performance to receive the output of those workloads. Then you copy its contents to a storage array with high read performance to train the model for machine learning. Then you store the resulting model on a hybrid array to use it.”
“With Alletra MP, you have just one array that is as fast for writes as it is for ML and for use of the model,” he said.
Read extra on AI and storage
-
HPE GreenLake for File Storage update aimed at AI
By: Adam Armstrong
-
HPE’s GreenLake for Block Storage goes GA
By: Adam Armstrong
-
Storage suppliers’ market share and strategy 2023
By: Antony Adshead
-
Huawei launches OceanStor Pacific 9920 entry-level NAS
By: Yann Serra