Spotify’s library has become modern piracy’s latest treasure. Pirate activist group and shadow library Anna’s Archive claims to have “backed up” the world’s largest music streamer, and made Spotify’s music catalog publicly available.
The online hacking group, which claims to focus on content preservation, usually of books and papers, allegedly scraped nearly all of Spotify’s metadata (such as album art and titles) and about a third of its music files. Of Spotify’s 256 million tracks, Anna’s Archive scraped 99.9 percent of the metadata and “archived” 86 million music files, prioritised by popularity. The group says it will release the actual music files in stages, also in order of popularity.
“Our mission (preserving humanity’s knowledge and culture) doesn’t distinguish among media types,” wrote Anna’s Archive in a December 20 blog post. “Sometimes an opportunity comes along outside of text. This is such a case. This Spotify scrape is our humble attempt to start such a ‘preservation archive’ for music. Of course Spotify doesn’t have all the music in the world, but it’s a great start.”
On December 22, Spotify told Billboard it had “identified and disabled the nefarious user accounts that engaged in unlawful scraping. We’ve implemented new safeguards for these types of anti-copyright attacks and are actively monitoring for suspicious behavior.” The company’s statement added, “Since day one, we have stood with the artist community against piracy, and we are actively working with our industry partners to protect creators and defend their rights.”
While Spotify reels in the aftermath of the security breach and deals with copyright issues, experts are concerned that AI companies could turn to the soon-to-be-released database of music files to train models without rights holders’ consent.
“Training on pirated material is sadly common in the AI industry, so this stolen music is almost certain to end up training AI models,” artist protection activist Ed Newton-Rex told The Guardian. “This is why governments must insist AI companies reveal the training data they use.”
In a LinkedIn post cited by The Guardian, AI startup CEO Yoav Zimmerman expressed similar concerns, noting that the breach could allow anyone to “create their own personal free version of Spotify,” adding that “[i]t also just became dramatically easier for AI companies to train on modern music at scale. Again, the only thing stopping them is copyright law and the deterrent of enforcement. The uncomfortable truth is that there is no effective way to prevent large-scale data leakage in the modern world. If your media is accessible, even behind a paywall, it should be assumed it can and will be copied.” – Inc./Tribune News Service
