The recent publicity around ChatGPT and other large language models (LLMs) has brought attention to the use of foundation models as a fundamental business tool. While the consumer-facing side of generative AI has a vast amount of data to draw from, the reliability of its responses degrades when the models are pushed to their limits. Enterprises, government agencies, and researchers may be unable to use public models, or may hold very private data pools they need to train on. In that case, training unique AI foundation models can be an expensive process. As these AI foundation models have continued to grow, a virtual supercomputer is needed to train these large models. Such systems are now being referred to as AI supercomputers, and Nvidia was one of the earliest companies to describe its systems as such.
The main functional difference between traditional supercomputers and AI supercomputers is often the math format they use for computation. Traditional supercomputers used in HPC and advanced research focus on double-precision (64-bit) floating-point performance (see the TOP500 list). AI supercomputers, on the other hand, emphasize lower-precision math for model training, which may scale down to 8-bit floating point, because neural networks do not require the higher precision. Like their predecessors, AI supercomputers often use GPUs for compute acceleration.
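To make the difference concrete, here is a minimal Python sketch (assuming NumPy is available) comparing the precision and range of these formats. FP8 is not a NumPy type, so figures for the common E4M3 variant used in AI training are given as a comment rather than computed.

```python
import numpy as np

# Compare the floating-point formats discussed above: bits of storage,
# approximate decimal digits of precision, and largest finite value.
for dtype in (np.float64, np.float32, np.float16):
    info = np.finfo(dtype)
    print(f"{info.dtype}: {info.bits} bits, "
          f"~{info.precision} decimal digits, max ~= {info.max:.3g}")

# FP8 (E4M3 variant: 4 exponent bits, 3 mantissa bits) is not in NumPy,
# but it offers under 1 decimal digit of precision and a max finite value
# of 448. That is far coarser than FP64's ~15 digits, yet neural-network
# training often tolerates it.
```

Running this shows FP64 spans roughly 15 decimal digits while FP16 covers only about 3, which is one reason FP64 TOP500 benchmarks and AI training throughput figures are not directly comparable.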
A traditional supercomputer is often a locked-down, on-premises machine with a very specific bare-metal design and unique networking backbones. For example, the OpenAI supercomputer built by Microsoft Azure was “purpose-built” with specialized 400 gigabit-per-second network connectivity for each GPU server. These systems are massive, with thousands of CPUs and GPUs.
IBM Research developed its own AI supercomputer design, called Vela, originally for internal use. Vela was designed to scale, but it is also optimized for cloud architecture, allowing more flexible operation and use. IBM has also kept to a more traditional cloud infrastructure design by using Ethernet, rather than InfiniBand or other specialized networks, to connect the racks. A cloud architecture lets IBM use industry best-practice tools and allows for easier collaboration, more agility, and greater flexibility when running multiple workloads.
Vela is composed of multiple nodes, each with eight Nvidia A100 GPUs (80GB versions) interconnected by NVLink and NVSwitch. Each node also has two 2nd-generation Intel Xeon Scalable processors (Cascade Lake), 1.5TB of DRAM, and four 3.2TB NVMe drives. The compute nodes are connected via multiple 100G network interfaces. IBM has not revealed the total number of nodes, but the system is being used to train models with tens of billions of parameters.
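A quick back-of-the-envelope sketch, using only the per-node figures quoted above, totals the resources of a single Vela node:

```python
# Per-node figures quoted above for IBM's Vela.
GPUS_PER_NODE = 8
GPU_MEM_GB = 80        # Nvidia A100, 80GB variant
DRAM_TB = 1.5
NVME_DRIVES = 4
NVME_TB_EACH = 3.2

gpu_mem_total_gb = GPUS_PER_NODE * GPU_MEM_GB   # 640 GB of GPU memory per node
nvme_total_tb = NVME_DRIVES * NVME_TB_EACH      # 12.8 TB of local NVMe per node

print(f"Per node: {gpu_mem_total_gb} GB GPU memory, "
      f"{DRAM_TB} TB DRAM, {nvme_total_tb} TB NVMe")
```

These totals help explain why training models with tens of billions of parameters spans many such nodes rather than a single machine.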