Short Discussion on OpenAI GDPR Documentation

OpenAI has started publishing documentation on its approach to GDPR. It hints at some limitations, which I discuss below.

Given the design of their LLM, I doubt they can comply with data deletion requests under GDPR and other privacy frameworks. The model breaks words into tokens (you could think of these as syllables) and learns weights that determine the probability of the next token given the current token and the tokens before it. The tokens that serve as building blocks for someone's personal information are most likely also used in thousands, if not millions, of other word combinations in other contexts, so the tokens themselves can't be removed.
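
To make the token-sharing point concrete, here is a toy sketch. The vocabulary, the greedy longest-match splitting, and the names `VOCAB`, `tokenize`, and `token_usage` are all made up for illustration; this is not OpenAI's tokenizer, only a rough picture of how the pieces of a name show up in unrelated words.

```python
from collections import defaultdict

# Hypothetical subword vocabulary, invented for this example.
VOCAB = {"mar", "ia", "lo", "pez", "ket", "gar", "den", "go"}


def tokenize(word, vocab=VOCAB):
    """Split a word into pieces by greedy longest match against vocab.

    Falls back to a single character when no vocabulary piece matches.
    """
    word = word.lower()
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate piece first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # No piece matched: emit one character and move on.
            tokens.append(word[i])
            i += 1
    return tokens


def token_usage(words):
    """Map each token to the set of words that contain it."""
    usage = defaultdict(set)
    for w in words:
        for t in tokenize(w):
            usage[t].add(w)
    return usage


# A made-up name ("maria lopez") next to ordinary words.
usage = token_usage(["maria", "lopez", "market", "gardenia", "logo"])
for token in tokenize("maria") + tokenize("lopez"):
    print(token, "is used in", sorted(usage[token]))
```

In this toy setup, three of the four pieces of the name are also needed for "market", "gardenia", and "logo". Deleting those tokens would break the ordinary words too, which is the problem in miniature.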

Could they remove the weights, or poison the probability of the next token, where it leads to personal information? I doubt it. With their model, they most likely can only suppress the information as the response is generated.
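
For a sense of what suppression at response time could look like, here is a minimal sketch. It assumes a simple blocklist applied to text after it has been generated. The `suppress` function and the placeholder are hypothetical, and I am not claiming this is how OpenAI actually does it.

```python
import re


def suppress(text, blocked_terms, placeholder="[removed]"):
    """Redact blocked terms from already-generated text.

    Matching is case-insensitive and literal (terms are escaped).
    The model itself is untouched; only the output is filtered.
    """
    for term in blocked_terms:
        pattern = re.compile(re.escape(term), re.IGNORECASE)
        # Use a function as the replacement so the placeholder is
        # inserted verbatim, with no backslash interpretation.
        text = pattern.sub(lambda match: placeholder, text)
    return text


# Made-up model output and a made-up deletion request.
response = "Maria Lopez lives in Madrid."
print(suppress(response, ["maria lopez"]))
```

Note that nothing in the model changes here. The weights that produced the name are still there, and the filter only hides the result. That is the gap between suppression and deletion.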

If you read through this documentation, it implies that, upon request, personal information may be removed from training data, but it doesn't mention removal from any model. There may be an attempt here to argue that the model doesn't store personal information. It will be interesting to see where this goes.

