I keep hearing about privacy concerns around AI training on personal data, and I think those concerns are unfounded and basically nonsense.
What is the threat model exactly? Let’s say your trove of personal data is used by Google for training the next iteration of Gemini. Everything you have: all medical records, financial history, texts and emails, location data, every sin you have committed in a detailed journal. Oops! What then?
Gemini would mix these in with every other recorded fact in the universe, and none of it would get tied to you. Compared to the vast ocean of data it is trained on in general, even the most thorough inventory of data about you personally is just a molecule or two. It probably couldn’t even regurgitate your name.
Google expends a great deal of effort to prevent specific training examples from being regurgitated in answers (memorization, a symptom of overfitting). The whole point of Gemini and the entire field of machine learning is to have the computer generalize, not memorize.
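The essay doesn't say how that effort works, but one standard, publicly documented mitigation (an illustrative assumption here, not a claim about Google's actual pipeline) is deduplicating the training corpus, since records repeated many times are the ones most likely to be memorized verbatim. A minimal sketch:

```python
import hashlib

def dedupe_corpus(docs):
    """Drop exact duplicates (after whitespace/case normalization)
    from a training corpus. Repeated records are the ones a model is
    most likely to regurgitate verbatim, so deduplication reduces the
    risk of memorizing any single person's data."""
    seen = set()
    unique = []
    for doc in docs:
        # Normalize so trivial variants of the same record collapse together.
        key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique

corpus = [
    "Alice's phone number is 555-0100",
    "alice's  phone number is 555-0100",  # trivial variant of the line above
    "A general fact about the world",
]
print(dedupe_corpus(corpus))  # the variant is dropped; two docs remain
```

Real pipelines go further (near-duplicate detection, filtering of likely PII), but the principle is the same: the fewer times a fact about you appears, the less chance the model encodes it as anything but background noise.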
Furthermore, if there were weird facts about someone encoded in Gemini, people would discount them. Without public documentation to point to (at which point you can’t really blame the LLM training), such a claim would be indistinguishable from a hallucination.
Are we actually talking about inference instead?
Large companies, AI companies included, personalize our accounts by keeping data about us, and now that LLM inference is available they use it for that too. If you are worried about databases that tie records to you personally, LLMs don’t change a thing. Just as you had to trust that pre-LLM Gmail would steward your email well (not give it to law enforcement without a warrant, not publish it on the web for all to see), you have to trust that post-LLM Gemini will do the same with the data you give it in each chat session.
Any company you can trust with your email, you can trust with your AI chats. Or, put the other way: if you’d accept a company’s bad behavior because its email product is good enough, you can accept the same bad behavior when its LLM product is good in equal measure.