Amidst growing scrutiny, Apple and other major corporations find themselves embroiled in a controversy over alleged copyright violations involving YouTube creators' content.
Wired recently reported that third parties had downloaded subtitle files from YouTube videos and used them to train large language models (LLMs). The practice reportedly covered more than 170,000 videos, including content from prominent channels such as MKBHD, Jimmy Kimmel, PewDiePie, and MrBeast.
The controversy centers on the use of these video transcripts for training AI models, a practice that some argue violates YouTube's rules against unauthorized extraction and automated access to content. Apple did not collect the subtitles itself; the dataset came from EleutherAI, a nonprofit organization, which made it available for educational and developmental purposes to support broader academic research.
One of the datasets in question, known as the Pile, has been openly available for academic and developmental use. However, tech giants including Apple have reportedly used such compilations to train their own AI models, such as OpenELM, which launched earlier this year. This has raised concerns about consent, fair use, and the ethics of using copyrighted material for AI training.
As the debate continues to unfold, stakeholders and industry observers await responses from companies like Apple regarding their stance on these controversial practices and their commitment to ethical AI development.