Heterogeneous computing is about optimizing resources

By Gary Hilson

CXL is one example of heterogeneous computing. It's about composability and flexibility without over-provisioning.

The rapid emergence of the Compute Express Link (CXL) specification is an excellent example of heterogeneous computing — but not all heterogeneous computing is necessarily CXL. Rather, it’s about connecting to whatever mix of compute, memory, and storage will best tackle a given workload, without the need to over-provision.

While the new protocol has quickly gained traction as a way to provide more efficient access to resources including memory, CXL is part of a broader trend in computing overall as data becomes less centralized and gets pushed to the edge to be used in more diverse workloads and a wider variety of devices.

It may sound flashy, said Ryan Baxter, senior director of Micron Technology’s cloud, computing, and networking business unit, but heterogeneous ultimately means it’s not monolithic and no longer just standard memory connected to an x86 CPU. “You can take that tool to battle, but it may not be the most efficient anymore.” While it’s possible to do machine learning training using x86 servers, he said, “it’s not an architecture well-suited to be able to tackle that kind of problem.”

A key value proposition of heterogeneous computing is the ability to optimize compute, memory, and resources such as SSDs through purpose-built accelerators. Pliops’ offering, for example, is a key-value (KV)–based storage hardware accelerator that can work with any SSD. (Source: Pliops)

Heterogeneous computing takes a more parallel approach by leveraging many cores and different types of accelerators connected to very high-bandwidth memory. It’s not even about building specialized x86 servers. Baxter said that for more than a decade, data centers have been based on the notion that you can spin up a server to tackle any problem by supplying the necessary hardware and resources without purpose-building anything. CXL, he said, is about leveraging the connectivity that already exists within the data center to deliver the composability needed at the hardware level to tackle tomorrow’s most interesting problems.

Baxter said there will continue to be innovation around x86 architectures. “It’s the standard workhorse for getting more bandwidth, but what’s needed is optionality to take a different tool to the battle.” AI training or video transcoding can benefit from more purpose-built hardware, which will emerge in the cloud first simply because of the sheer volume of workloads that get thrown at data centers. “It’s really an evolution of the workloads and use cases that’s driving the need for this heterogeneity at the hardware level.”

One of the key challenges CXL addresses is that memory has become the bottleneck, and the answer isn’t just to have faster memory or more of it. It’s about getting the data to the right memory for the use case as easily as possible without over-provisioning. Similarly, heterogeneous computing will see data centers and hyperscalers looking to get the most out of their hardware. Baxter said some of the larger cloud customers are already heading in this direction. “They’re keenly aware of how much they’re utilizing underlying hardware.”

Micron’s heterogeneous-memory storage engine (HSE) enables developers using all-flash infrastructure to customize or enhance code for its unique use cases and maximize the capabilities of flash SSDs and other storage-class memory. (Source: Micron)

Not only are x86 servers no longer the only answer, but neither is over-provisioning hardware, such as deploying 20% more flash SSDs than required. Quality-of-service (QoS) levels must still be maintained, but with increased utilization and less over-provisioning, which is expensive. Baxter said a new interface such as CXL enables access to pools of hardware for workloads that require a little more than what the baseline configuration can provide. “You’re still going to have CPUs. You’re still going to have the memory,” he said. “But there’s some important changes coming on the server side that are going to necessitate the move to new interfaces like CXL.”

Besides different pools of memory and storage, accelerators are going to be a key part of making sure workloads have purpose-built hardware. Pliops developed its Extreme Data Processor (XDP) technology because it recognized that the old approach couldn’t keep up with the exponentially growing demand for data storage capacity and the compute requirements for processing it. Company president Steve Fingerhut said the success of GPUs demonstrates the value of hardware built for the purpose of accelerating certain workloads.

Adding more standard servers and drives isn’t the answer, he said. An accelerator such as XDP increases performance, reduces costs, and allows for an overall smaller footprint, said Fingerhut, by working in combination with the CPU. “We could call it a co-processor, but it is something that complements the platform that everybody is buying today.”

Steve Fingerhut

Fingerhut said the Pliops solution addresses storage-stack inefficiencies caused by processors struggling to keep up with ever-growing volumes of data, which are increasingly stored on SSDs. It’s a key-value (KV)–based storage hardware accelerator that works with any SSD to boost workload performance and optimize SSD usage, and it is well-suited to workloads in databases and software-defined storage. It’s also heterogeneous in the sense that it can be a single solution for commonly used database applications, including RocksDB, MySQL, and MongoDB, and it leverages the NVMe KV standard as well as PCIe. It isn’t tied to a specific flash type or server model, he said. “Once you start using XDP, you can use it anywhere you deploy flash. It can work with any SSD and accelerate essentially any flash-based application.”
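Conceptually, a KV-native storage interface lets an application issue put/get/delete operations on variable-length keys and values directly, instead of translating them through a block layer the way a software KV engine on a conventional SSD must. The following is a minimal, in-memory sketch of that interface shape (hypothetical names; this is not Pliops’ actual API or the NVMe KV command set itself):

```python
from typing import Optional


class KVStore:
    """Toy in-memory model of a KV-native storage interface.

    A KV-capable device exposes put/get/delete on variable-length
    keys, sparing the host the sorting, indexing, and compaction work
    that a software key-value engine layered on block SSDs performs.
    """

    def __init__(self) -> None:
        self._data: dict[bytes, bytes] = {}  # stand-in for device-managed media

    def put(self, key: bytes, value: bytes) -> None:
        """Store (or overwrite) a value under the given key."""
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        """Return the value for key, or None if it does not exist."""
        return self._data.get(key)

    def delete(self, key: bytes) -> None:
        """Remove the key if present; deleting a missing key is a no-op."""
        self._data.pop(key, None)


store = KVStore()
store.put(b"user:42", b'{"name": "Ada"}')
assert store.get(b"user:42") == b'{"name": "Ada"}'
store.delete(b"user:42")
assert store.get(b"user:42") is None
```

The point of the sketch is the narrow surface area: an application such as RocksDB that already thinks in key-value terms can map onto such an interface without a block-translation layer in between.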

Pliops is using existing interfaces and protocols to make integration easy with maximum flexibility, much as CXL uses PCIe to pull from a broad pool of resources. “We’re taking full advantage of the NVMe performance latency scaling as well as NVMe over Fabrics or TCP.” Fingerhut said XDP becomes a third processor in the system that frees analytics from inefficient host software, because less data needs to be transferred and operations are as efficient as possible.

Pliops is not alone in developing specialized accelerators that can be part of a resource pool in a more heterogeneous computing environment. Fungible Inc.’s Storage Initiator (SI) cards allow standard servers to access NVMe over TCP (NVMe/TCP) storage targets, while the Fungible Data Processing Unit (DPU) is a processor purpose-built for data-centric workloads that unlocks capacity previously stranded in siloed servers. Last year, Micron unveiled its heterogeneous-memory storage engine (HSE), aimed at getting more from SSDs and other storage-class memory (SCM) by enabling developers using all-flash infrastructure to customize or enhance code for their unique use cases. Similarly, Kioxia America’s Software-Enabled Flash (SEF) combines software flexibility, host control, and flash-native semantics into a flash-native API and purpose-built controller to make flash easier to manage and deploy across a PCIe connection.

Steve Woo

The ubiquity of PCIe and rapid adoption of CXL, already in its second iteration, are key enablers of heterogeneous computing. “[CXL] allows things like accelerators to talk to hosts in more of a peer-to-peer fashion,” said Rambus fellow Steve Woo. CPUs and GPUs sometimes need to communicate back and forth through memory, and CXL provides the necessary coherence, which makes programming models much easier in heterogeneous environments.

With the transition to DDR5 next year, there’s more bandwidth available, but also a greater diversity of workloads, which in turn means a greater diversity of compute environments available on platforms such as Amazon Web Services and Microsoft Azure, said Woo. “They’re so diverse in terms of their compute capabilities, the amount of memory and the amount of disk you’re allowed to have.” It’s now possible to gang together CPUs to solve problems that no longer fit individual servers, he said. “You need to find ways to expand things like the memory bandwidth and the memory capacity to meet these needs.”

For data centers, Woo said, it will be advantageous to scale by disaggregating resources into separate pools of memory, storage, and accelerators. One workload might be more memory-intensive and need only a single CPU’s worth of compute, for example, which is why it makes sense to group resource types together in their own pools. “You just grab what you need and compose the resources based on the workload characteristics,” he said. “When you’re done, you put it all back in the pool. It’s a bit like a library, where you’re just checking out the resources you need and then putting them back when you’re done.”
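The library analogy can be sketched as a simple allocator over disaggregated pools. This is purely illustrative — in real composable infrastructure the composition is done by a fabric manager over CXL or a similar interconnect, not by application code — but it shows the check-out/return pattern Woo describes:

```python
class ResourcePools:
    """Toy model of composable infrastructure: workloads check out
    resources from shared pools and return them when they finish."""

    def __init__(self, memory_gb: int, storage_tb: int, accelerators: int):
        self.free = {
            "memory_gb": memory_gb,
            "storage_tb": storage_tb,
            "accelerators": accelerators,
        }

    def compose(self, **request: int) -> dict:
        """Reserve resources sized to one workload; no over-provisioning."""
        if any(self.free[kind] < amount for kind, amount in request.items()):
            raise RuntimeError("insufficient free resources in pool")
        for kind, amount in request.items():
            self.free[kind] -= amount
        return dict(request)  # the 'grant' handed to the workload

    def release(self, grant: dict) -> None:
        """Return everything to the pool, like returning library books."""
        for kind, amount in grant.items():
            self.free[kind] += amount


pools = ResourcePools(memory_gb=1024, storage_tb=100, accelerators=8)
job = pools.compose(memory_gb=512, accelerators=1)  # memory-heavy workload
assert pools.free["memory_gb"] == 512
pools.release(job)
assert pools.free["memory_gb"] == 1024
```

The design point the sketch captures is that capacity is sized to the pool rather than to each server, so a memory-hungry job can borrow half the memory pool without every server having been built to hold that much.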

Woo said everyone would prefer to do more with the resources they have. “By not over-provisioning things and allowing them to be used on the jobs as they’re needed, it allows your data center to do more work in that volume of space.”

Jeff Janukowicz

Jeff Janukowicz, IDC research vice president for solid-state storage and enabling technologies, said overall there’s a common need to optimize the many compute, storage, and memory resources as much as possible, whether through CXL or heterogeneous computing that takes advantage of standardized interfaces. He said it took a while for NVMe to make inroads in the market because the software and ecosystem around it had to be built; by leveraging existing ecosystems, emerging protocols such as CXL can be adopted more easily and quickly.

It also allows for maximizing the resources in the system and means less over-provisioning. “Optimization is clearly a key factor,” he said. Often resources such as storage are over-provisioned to account for peak workloads, but a more flexible architecture can optimize those resources in both a peak environment and more mainstream environment. “It’s really going to help you to optimize your costs across the stack.”

Part of the optimization that heterogeneous computing delivers comes through interoperability, said Micron’s Baxter. “The degree to which protocols and interfaces are standardized across multiple implementations matters.” But memory is also becoming more critical in any heterogeneous system. “It’s what gives you a certain level of performance and efficiency for tackling a workload,” he said. “Customers really want to understand how they can use that memory in more creative ways. Heterogeneous means there are more tools in the toolbox to tackle the workload.”

This article was originally published on EE Times.

Gary Hilson is a general contributing editor with a focus on memory and flash technologies for EE Times.
