Generative AI Is Making Companies Even More Thirsty for Your Data

Zoom, the company that normalized attending business meetings in your pajama pants, was forced to unmute itself this week to reassure users that it would not use personal data to train artificial intelligence without their consent.

A keen-eyed Hacker News user last week noticed that an update to Zoom’s terms and conditions in March appeared to essentially give the company free rein to slurp up voice, video, and other data, and shovel it into machine learning systems.

The new terms stated that customers “consent to Zoom’s access, use, collection, creation, modification, distribution, processing, sharing, maintenance, and storage of Service Generated Data” for purposes including “machine learning or artificial intelligence (including for training and tuning of algorithms and models).”

The discovery prompted critical news articles and angry posts across social media. Soon, Zoom backtracked. On Monday, Zoom’s chief product officer, Smita Hashim, wrote a blog post stating, “We will not use audio, video, or chat customer content to train our artificial intelligence models without your consent.” The company also updated its terms to say the same.

Those updates seem reassuring enough, but of course many Zoom users or admins for business accounts might click “OK” to the terms without fully realizing what they’re handing over. And employees required to use Zoom may be unaware of the choice their employer has made. One lawyer notes that the terms still permit Zoom to collect a lot of data without consent. A spokesperson for the company, CJ Lin, says that customers get to choose whether to enable generative AI features or share their content with Zoom to help it improve its products.

The kerfuffle shows the lack of meaningful data protections at a time when the generative AI boom has made the tech industry even more hungry for data than it already was. Companies have come to view generative AI as a kind of monster that must be fed at all costs—even if it isn’t always clear what exactly that data is needed for or what those future AI systems might end up doing.

The ascent of AI image generators like DALL-E 2 and Midjourney, followed by ChatGPT and other clever-yet-flawed chatbots, was made possible by huge amounts of training data, much of it copyrighted, that was scraped from the web. And all manner of companies are currently looking to use the data they own, or that is generated by their customers and users, to build generative AI tools.

Will Knight