AI Training Dataset LAION-5B Withdrawn Over Discovery of Child Abuse Material

admin

According to cointelegraph: LAION-5B, a large artificial intelligence (AI) dataset used to train several widely used text-to-image generators, has been pulled by its creator after a study revealed it contained thousands of suspected instances of child sexual abuse material (CSAM). LAION, the Large-scale Artificial Intelligence Open Network, is a Germany-based non-profit organization known for creating open-source AI models and datasets that serve as the backbone of several renowned text-to-image models.

Researchers at the Stanford Internet Observatory's Cyber Policy Center, in a report published on December 20, identified 3,226 instances of suspected CSAM in the LAION-5B dataset. Many of those suspect entries were verified as CSAM by independent third parties, according to David Thiel, Big Data Architect and Chief Technologist at the Stanford Cyber Policy Center.

Thiel noted that while the CSAM instances detected in the dataset may not drastically alter the outputs of models trained on it, they are likely to exert some influence. The repetition of identical CSAM instances adds a further layer of concern, because it reinforces a model's exposure to images of specific victims.
