65 ExaFLOP AI Supercomputer being built by AWS and NVIDIA

admin

As the artificial intelligence (AI) boom continues, demand for more advanced AI infrastructure keeps growing. In response, Amazon Web Services (AWS) and NVIDIA have expanded their strategic collaboration on AI infrastructure and services, which includes building a new AI supercomputer capable of 65 exaFLOPS of processing power.

This partnership aims to integrate the latest technologies from both companies. A key part of the collaboration is that AWS will become the first cloud provider to offer NVIDIA GH200 Grace Hopper Superchips with multi-node NVLink technology. Connected this way, the superchips provide up to 20 TB of shared memory, enough to power terabyte-scale workloads that were previously unattainable in the cloud.
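To put the 20 TB figure in perspective, here is a rough back-of-the-envelope sketch of how many model parameters would fit in that much memory at common numeric precisions. It assumes decimal terabytes and counts weights only; the function name `max_parameters` and the precision table are illustrative choices, not anything published by AWS or NVIDIA.

```python
# Rough sketch: how many model weights fit in 20 TB of shared memory?
# Assumptions (not from the article): decimal terabytes, weights only.

SHARED_MEMORY_TB = 20   # figure quoted in the article
BYTES_PER_TB = 10**12   # assumption: 1 TB = 10^12 bytes

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16/bf16": 2, "fp8/int8": 1}


def max_parameters(memory_tb: float, bytes_per_param: int) -> int:
    """Upper bound on the parameter count whose weights alone fit in memory_tb."""
    return int(memory_tb * BYTES_PER_TB) // bytes_per_param


if __name__ == "__main__":
    for precision, size in BYTES_PER_PARAM.items():
        params = max_parameters(SHARED_MEMORY_TB, size)
        print(f"{precision}: about {params / 1e12:.0f} trillion parameters")
```

These numbers are upper bounds. Training also needs memory for gradients, optimizer state, and activations, so the model size that fits in practice is considerably smaller.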

In addition to hardware advancements, the partnership extends to cloud services. The two companies plan to host NVIDIA DGX Cloud, NVIDIA’s AI-training-as-a-service platform, on AWS. This service will feature the GH200 NVL32, providing developers with the largest shared memory in a single instance. Developers will be able to access multi-node supercomputing to train complex AI models quickly, which streamlines the AI development process.

The partnership between AWS and NVIDIA also extends to the ambitious Project Ceiba. This project aims to design the world’s fastest GPU-powered AI supercomputer. AWS will host this supercomputer, which will primarily serve NVIDIA’s research and development team. The integration of the Project Ceiba supercomputer with AWS services will provide NVIDIA with a comprehensive set of AWS capabilities for research and development, potentially leading to significant advancements in AI technology.

Summary of collaboration

To further bolster its AI offerings, AWS is set to introduce three new Amazon EC2 instances powered by NVIDIA GPUs. These include the P5e instances, powered by NVIDIA H200 Tensor Core GPUs, and the G6 and G6e instances, powered by NVIDIA L4 GPUs and NVIDIA L40S GPUs, respectively. These new instances will enable customers to build, train, and deploy their cutting-edge models on AWS, thereby expanding the possibilities for AI development.

Furthermore, AWS will host the NVIDIA DGX Cloud powered by the GH200 NVL32 NVLink infrastructure. This service will provide enterprises with fast access to multi-node supercomputing capabilities, enabling them to train complex AI models efficiently.

To boost generative AI development, NVIDIA has announced software on AWS, including the NVIDIA NeMo Retriever microservice and NVIDIA BioNeMo. These tools will provide developers with the resources they need to explore new frontiers in AI development.

The expanded collaboration between AWS and NVIDIA represents a significant step forward in AI innovation. By integrating their respective technologies, the two companies are set to provide advanced infrastructure, software, and services for generative AI. For AI developers, that means access to larger shared-memory instances, new GPU-powered EC2 options, and managed multi-node training through DGX Cloud.
