
Grok 3 ran an experiment for the AI community on 200,000 GPUs: the Scaling Law hasn't hit a wall, but pre-training is no longer a sure bet

Grok 3's training run on 100,000 NVIDIA H100 GPUs shows that the Scaling Law still holds in the pre-training phase, even with the problem of insufficient data. Scaling has not hit a ceiling: making the model larger still improves performance, but the cost-effectiveness is poor. Ranked by cost-effectiveness, the currently effective scaling approaches are: test-time scaling, RL scaling, and pre-training scaling.
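
The article does not spell out the underlying formula, but the claim that larger models still help while cost-effectiveness worsens is usually expressed as a power law in parameter count N and training tokens D. Below is a minimal illustrative sketch in Python, assuming a Chinchilla-style form L(N, D) = E + A/N^alpha + B/D^beta; the constants and the `loss` helper are placeholder assumptions for illustration, not values from the article or from Grok 3's run.

```python
def loss(n_params: float, n_tokens: float,
         E: float = 1.69, A: float = 406.4, B: float = 410.7,
         alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pre-training loss under an assumed Chinchilla-style power law.

    n_params: model size N; n_tokens: training data D. All constants are
    illustrative placeholders, not fitted to any real training run.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# Doubling model size at a fixed data budget keeps lowering the predicted loss,
# but by a smaller amount each time -- the "low cost-effectiveness" the summary
# attributes to further pre-training scaling.
for n in [1e10, 2e10, 4e10, 8e10]:
    print(f"{n:.0e} params -> predicted loss {loss(n, 1e12):.4f}")
```

Each doubling of N shaves progressively less off the predicted loss while roughly doubling compute, which is why the summary ranks pre-training scaling last on cost-effectiveness.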

