In a recent interview with The Wall Street Journal, Mira Murati, the Chief Technology Officer at OpenAI, remained evasive when asked what data was used to train Sora, the company's upcoming video-generating AI model.
When asked if publicly available data from social media platforms like YouTube, Instagram, or Facebook was used to train Sora, Murati responded vaguely, “I’m actually not sure about that…if they were publicly available — publicly available to use. But I’m not sure. I’m not confident about it.”
Murati, who has led major OpenAI projects like DALL-E 3 and GPT-4, declined to provide specifics, stating, "I'm just not going to go into detail about the data that was used. But it was publicly available or licensed data." She later confirmed that data from Shutterstock was used.
The lack of transparency surrounding AI training data has raised concerns, and OpenAI faces lawsuits alleging copyright infringement and unauthorized use of private user data. In July 2023, for example, authors filed a lawsuit claiming that ChatGPT generated summaries based on their copyrighted works.