Seven content-licensing sellers of music, images, video and other datasets for use in training artificial intelligence systems have formed the sector’s first trade group, they said on Wednesday.
The Dataset Providers Alliance (DPA) will advocate for “ethical data sourcing” for training AI systems, including rights for people featured in datasets and intellectual property rights protection for content owners, the companies said in a statement.
Founding members include US music dataset company Rightsify, image licensing service vAIsual, Japanese stock photo provider Pixta, and Germany-based data marketplace Datarade.
The arrival of generative AI technologies has raised an outcry from content creators and prompted a string of copyright lawsuits against tech firms such as Google, Meta, and ChatGPT maker OpenAI, which is backed by Microsoft.
Those firms train their models on vast swaths of content, much of it scraped from the internet for free, without the consent of those who created the works or own the rights to them.
Tech companies argue that this use of data is legal. Yet they also buy access to thousands of private collections of content, both to satisfy specific needs for certain types of data and to hedge against legal and regulatory risks.
Anticipating that demand could soar if copyright owners win their legal battles, a host of companies is already packaging content and selling access to it for use by AI systems.
As a result, groups have formed to set ethical standards for that trade, such as Fairly Trained, a non-profit founded this year that certifies models that have not used copyrighted materials without a license.
The DPA targets the content of those transactions, requiring, for example, that its members agree not to sell text data obtained by crawling the web or audio featuring people’s voices without their explicit consent.