YouTube says OpenAI training Sora with its videos would break the rules

Using YouTube videos to train OpenAI's text-to-video generator would be against the terms of service of the platform, says Mohan. — Image by on Freepik

The use of YouTube videos to train OpenAI’s text-to-video generator would be an infraction of the platform's terms of service, YouTube chief executive officer Neal Mohan said.

In his first public remarks on the topic, Mohan said he had no firsthand knowledge of whether OpenAI had, in fact, used YouTube videos to refine its artificial intelligence-powered video creation tool, called Sora. But if that were the case, it would be a “clear violation” of YouTube’s terms of use, he said.

“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations,” Mohan said Thursday in an interview with Emily Chang, host of Bloomberg Originals.

“One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.”

There has been much public debate over what material OpenAI uses to train the AI models underlying popular content creation products such as ChatGPT and DALL-E. Sora and other generative AI tools work by sucking up all sorts of content from around the web and using that data as the foundation from which the tools can generate new content, including videos, photos, narrative text and more.

As companies like OpenAI, Google and others race to develop more powerful artificial intelligence, they are looking to source as much content as possible to train their AI models to get better quality results. Google and YouTube are units of Alphabet Inc.

OpenAI, which is backed by Microsoft Corp., didn’t immediately respond to a request for comment. OpenAI chief technology officer Mira Murati said in an interview with the Wall Street Journal last month that she wasn’t sure whether Sora was trained on user-generated videos from YouTube, Facebook and Instagram.

The Journal reported this week that OpenAI has discussed training its next-generation large language model, GPT-5, on transcriptions of public YouTube videos, citing people familiar with the matter.

Mohan said Google adheres to YouTube’s individual contracts with creators before deciding whether to use videos from the platform in training the company’s own powerful AI model, Gemini.

“Lots of creators have different sorts of licensing contracts in terms of their content on our platform,” Mohan said. Though “some portion of that YouTube corpus may be being used” to train models like Gemini, Google and YouTube ensure that using the videos as training data for Google’s AI is “in concert with whatever the terms of service or the contract that that creator has signed” beforehand, he said. – Bloomberg
