IBM Research Boosts Capacity of Vela AI Supercomputer
IBM Research has announced that it has doubled the capacity of the Vela AI Supercomputer, a key component of IBM Cloud. The expansion responds to growing demand for watsonx models and underpins the company's plans to extend AI inferencing with its proprietary accelerator, the IBM AIU. The decision to bolster Vela was prompted by rapidly rising AI usage among IBM clients, with hundreds of development projects now built on IBM watsonx. The growing watsonx pipeline, as described by IBM CEO Arvind Krishna, reflects the momentum behind AI projects across the company's clientele.
Upgrade Details and Innovations
The upgrade builds on Vela, a cloud infrastructure engineered for training AI foundation models on NVIDIA A100 GPUs. Vela was designed to contain costs while scaling AI infrastructure, an approach that serves as an example for the industry. IBM Research leveraged standard components, such as Intel Xeon CPUs interconnected with 2x100G Ethernet NICs, achieving near-bare-metal performance at lower capital cost. To absorb the surging workload, the team doubled the number of GPUs per rack within the same available power envelope by capping each GPU's power draw. This was complemented by deploying RDMA over Ethernet and NVIDIA GPUDirect RDMA, significantly increasing GPU-to-GPU bandwidth while lowering latency.
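The power-capping trade-off described above comes down to simple arithmetic: lowering each GPU's power limit reduces per-GPU throughput somewhat, but lets more GPUs share a fixed rack budget. The sketch below illustrates the idea; the wattage figures and the `gpus_per_rack` helper are illustrative assumptions, not IBM's published numbers.

```python
# Illustrative sketch of the rack-level power budget behind GPU power capping.
# All wattage figures below are hypothetical, chosen only to show the arithmetic;
# they are not IBM's actual Vela configuration.

def gpus_per_rack(rack_envelope_w: float, gpu_power_w: float, host_overhead_w: float) -> int:
    """Number of GPUs that fit in a rack's power envelope.

    rack_envelope_w  -- total power available to the rack
    gpu_power_w      -- per-GPU power limit (default TDP or a lower cap)
    host_overhead_w  -- fixed non-GPU draw (CPUs, NICs, fans, etc.)
    """
    return int((rack_envelope_w - host_overhead_w) // gpu_power_w)

# Hypothetical rack: 5.6 kW envelope with 2.4 kW of non-GPU overhead.
ENVELOPE_W = 5_600
OVERHEAD_W = 2_400

uncapped = gpus_per_rack(ENVELOPE_W, 400, OVERHEAD_W)  # GPU at a 400 W default limit
capped = gpus_per_rack(ENVELOPE_W, 200, OVERHEAD_W)    # same GPU capped to 200 W

print(uncapped, capped)  # halving the per-GPU cap doubles the GPUs per rack
```

In practice a cap is applied through the GPU's power-management interface (e.g. NVML or `nvidia-smi -pl`), and the per-GPU performance loss is typically smaller than the power reduction, which is what makes the rack-level trade favorable for throughput-bound training.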
Future Plans and Innovations
As demand for Vela continues to grow, IBM Research is preparing future upgrades, including the possible integration of NVIDIA H100 GPUs or next-generation B100 GPUs. The company is also developing a cost-effective inference infrastructure built around its prototype AIU inference accelerator; early tests show it can speed up inference processing while keeping power consumption low, a potential game-changer for the industry. IBM's internal use of AI across its operations, together with the development of its own foundation models, underscores the company's commitment to AI-driven innovation.

In conclusion, IBM's advances in AI infrastructure and the promising prospects for the Vela AI Supercomputer mark the company's transformation into a major player in the AI landscape. Its proactive innovation and adept internal use of AI not only bode well for continued growth but also position IBM for a competitive advantage in the global AI market.