Data-Modeling Large Data Set Search

A New Kind of AI Model Lets Data Owners Take Control

A new kind of large language model, developed by researchers at the Allen Institute for AI (Ai2), makes it possible to control how training data is used even after a model has been built.

MIT Technology Review

A major AI training data set contains millions of examples of personal data

Personally identifiable information has been found in DataComp CommonPool, one of the largest open-source data sets used to train image generation models. Millions of images of passports, credit cards ...
