Mark Zuckerberg is building up Meta’s audio capabilities this year as the social media giant advances its plans to generate revenue from its rapidly developing AI technologies.
Meta plans to introduce improved voice capabilities in Llama 4, its latest open-source large language model expected in the coming weeks, according to people familiar with the matter, as it bets that future so-called AI agents will be conversational rather than text-driven.
The company is particularly focused on making conversations between users and its voice model feel like a natural, two-way dialogue, allowing users to interrupt rather than follow a more rigid question-and-answer format, one of the people said.
The voice push comes as chief executive Zuckerberg outlines his bold plan to make the $1.7tn Silicon Valley company an “AI leader”, calling 2025 a make-or-break year for its AI products as it races to commercialise the technology in competition with rivals such as OpenAI, Microsoft and Google.
This has led the company to consider testing premium subscriptions to its AI assistant, Meta AI, for agentic tasks such as booking reservations and creating videos. It is also considering introducing paid ads or sponsored posts in Meta AI’s search results, one of the people said.
This year, Zuckerberg revealed plans to build an AI engineering agent with the coding and problem-solving capabilities of a mid-level engineer.
Meta declined to comment.
Chris Cox, the group’s chief product officer, highlighted some of the Llama 4 plans on Wednesday, saying it would be an “omni model” in which speech is native, “rather than translating the audio into text, sending the text to the LLM, getting text out and turning that back into speech”.
Speaking at a Morgan Stanley technology, media and telecoms conference, he added: “I think we’re still wrapping our minds around how powerful that is.”
Meta is also discussing whether to loosen the guardrails on what the latest Llama models can output, according to the two people familiar with the matter.
The debate comes amid launches from rivals and warnings from David Sacks, the newly appointed AI tsar.
OpenAI released its voice mode last year, focusing on giving it a distinctive character, while Grok 3, created by Elon Musk’s xAI and available on the X platform, rolled out a voice feature to select users late last month.
The Grok model is deliberately designed with fewer guardrails, including an “unhinged mode” that intentionally responds in ways meant to be “unpleasant, inappropriate, and offensive”, according to the company.
Last year, Meta released a less restrictive version of its AI model in the third Llama iteration, following criticism that Llama 2 refused to answer innocuous questions.
Allowing users to interact with its AI assistant through voice commands is a key feature of Meta’s Ray-Ban smart glasses, which have recently become a hit with consumers. The group has accelerated its plans to build lightweight headsets that could displace smartphones as consumers’ main computing device.
Additional reporting by Melissa Heikkilä in London