Elon Musk explains how xAI built and launched a 100,000-GPU training cluster in 122 days

Wallstreetcn
2025.02.19 01:31

Elon Musk hosted the launch event for Grok 3, introducing its core features and the new tool "DeepSearch." The xAI team built what it describes as the world's largest training cluster in 122 days, coordinating 100,000 H100 GPUs for a single training run. A central challenge was keeping all of the GPUs working together, since the failure of a single GPU can introduce errors into training. Musk emphasized the team's engineering achievement in beating the timeline that data center providers had projected.