In that same post, Meta also touts a “Contemplating” mode, which it says will be “rolling out gradually” and can “orchestrate multiple agents that reason in parallel.” By running up to 16 agents in concert, Meta says, Contemplating mode “enables superior performance with comparable latency.” That “superior performance” includes a reported high-water mark of 58.4 on Humanity’s Last Exam (with the use of external tools), according to Meta.
A Meta graph shows how additional training leads to “compression” of token usage before additional gains in accuracy.
And while previous Llama models faced criticism for not taking advantage of reinforcement learning, Meta says Muse Spark shows “smooth predictable gains” as RL continues after pretraining, “improving model reliability without compromising reasoning diversity.” That reinforcement learning system also uses “thinking time penalties,” which Meta says balance the need to “maximize correctness” against the number of tokens used. In testing on the AIME 2025 benchmark, Meta says it saw a “phase transition” where the model began compressing equally accurate reasoning into “significantly fewer tokens.” After that compression, subsequently trained models slowly increased token usage again to reach even higher accuracy in less overall time than the earlier, uncompressed versions.
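Meta hasn’t published the details of its reward design, but a “thinking time penalty” of the kind described is commonly modeled as a length-penalized reward: correct answers earn full credit minus a small per-token cost, which pushes the policy to compress its reasoning without sacrificing accuracy. A minimal sketch, assuming a simple linear penalty (the function name and `token_cost` hyperparameter are illustrative, not Meta’s):

```python
def length_penalized_reward(is_correct: bool, num_tokens: int,
                            token_cost: float = 1e-4) -> float:
    """Illustrative reward trading off correctness against thinking length.

    token_cost is a hypothetical hyperparameter: too large and the model
    truncates its reasoning; too small and the penalty has no effect.
    """
    correctness = 1.0 if is_correct else 0.0
    return correctness - token_cost * num_tokens

# A correct answer reached with fewer tokens scores strictly higher,
# so RL pressure favors shorter chains of thought at equal accuracy.
short = length_penalized_reward(True, 2_000)   # 0.8
long = length_penalized_reward(True, 6_000)    # 0.4
assert short > long
```

Under a reward like this, the “phase transition” Meta describes would correspond to the point where the policy discovers it can keep correctness near 1.0 while cutting token counts, after which accuracy gains justify spending tokens again.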
The release of Muse Spark comes alongside an update to Meta’s Advanced AI Scaling Framework, which the company says now covers a broader range of potential AI risks. Meta says the model “falls within safe margins across all frontier risk categories we measured,” but that further details will only come in an upcoming Safety & Preparedness Report.
Muse Spark is available now in the Meta AI app and via the meta.ai website, as well as a private preview API for “select partners.” Meta says the model will be available via WhatsApp, Instagram, Facebook, Messenger, and AI glasses “in the coming weeks.”