(This is the fifth installment in an ongoing examination of the first principles of data privacy and security. The first installment can be read here. The second installment can be read here. The third installment can be read here. The fourth installment can be read here. These principles, often represented in regulations and privacy practices, form the foundation for how an organization should treat the customer data it collects.)
If everyone owns the data, then no one does. That’s an inevitable conclusion. Only the physical media on which data are stored are real property. In other words, data ownership is beside the point.
This is not as cynical a statement as it might seem. It reflects the real properties of the data points we care most about. The data have a subject: the person they are about. But the data also involve other parties that participate in the real-world transaction that the data represent, even if that transaction is the result of analysis or aggregation.
A hospital takes your blood. A bank lends you money. A store sells you groceries. A person answers your phone call. These sentences each describe at least two parties that can lay claim to owning the digital record of the event, the data. And if we add in the complexity of telephone carriers, payment card processors, health insurance companies and all of their enabling support vendors, it would be difficult to find any transaction that ends up as a data point that is not “owned” by more than just two parties.
Who “owns” the data? It doesn’t matter. What matters are the rights and responsibilities that each entity involved in the collection, disclosure and use of data has.
In many cases, what is a right for one party in the collection, disclosure and use of data represents a responsibility of another party. For example, if we say a subject has the right to formally consent to the collection, disclosure and/or use of data about them, then at least one of the parties that collect, disclose and/or use the data is responsible for soliciting that consent.
There is a pair of inseparable rights and responsibilities that makes up the third principle of data privacy and security. The pair is access and correction. Sometimes they are summed up in the term “participation.”
What is the most important difference between the data point “John bought a book about terrorism” and the data point “He fathered 22 offspring”? The name “John” is almost as vague as the word “he.” John, usually a man’s first name, cannot be identified without more information. The same can be said for the pronoun “he.”
You are sure John is human because only humans buy things whereas many beings can be fathers. In fact, the “he” in the second data point refers to Snowflake, the white gorilla who lived at the Barcelona Zoo. Snowflake’s fatherhood describes acts of creation. John has bought a book that certainly sounds like it deals with destruction.
The number 22, the total number of Snowflake’s offspring, is precise, whereas the phrase “a book about terrorism” is anything but. If the “book about terrorism” is a history book that John, a cadet at West Point, is reading for a class, it is very different from a “how-to” that John, a 28-year-old who just came back from a visit to a foreign country, has bought online.
Am I being facetious? Of course, but I am also making two important points about data that tie to the principle of “participation.”
The first is that context alters the meaning of simple data points. If a subject is not aware of all the data collected about them, they cannot fully understand the context in which those data will be interpreted. The second is that only by having access and being able to make corrections can the subject of a data point meaningfully participate in its collection, disclosure, use and, ultimately, interpretation. This is the reason that participation is one of the first principles of security and privacy.
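For readers who think in code, the pairing of access and correction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `DataHolder` class, its method names, and the subject “alice” with her address are all hypothetical, not a description of any real system or regulation.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    subject: str
    statement: str
    # Earlier versions of the statement, kept so a correction leaves a trail.
    history: list = field(default_factory=list)


class DataHolder:
    """Hypothetical data collector that supports access and correction."""

    def __init__(self):
        self._records = []

    def collect(self, subject, statement):
        self._records.append(Record(subject, statement))

    def access(self, subject):
        """Access: return everything 'on file' about one subject."""
        return [r for r in self._records if r.subject == subject]

    def correct(self, subject, index, new_statement):
        """Correction: replace a statement, keeping the old one in history."""
        record = self.access(subject)[index]
        record.history.append(record.statement)
        record.statement = new_statement
        return record


holder = DataHolder()
holder.collect("alice", "lives at 12 Elm Street")   # hypothetical, wrong on file
holder.collect("bob", "lives at 7 Oak Avenue")      # hypothetical, unrelated subject

on_file = holder.access("alice")                    # the subject sees her own record
holder.correct("alice", 0, "lives at 21 Elm Street")
print(holder.access("alice")[0].statement)          # lives at 21 Elm Street
```

Neither method is useful without the other: a subject who cannot see the record has nothing to correct, and a subject who can see it but cannot change it is still only observed.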
Data points are individual observations. Sometimes the observations are direct transactions, such as a purchase, or an individual fact, such as one’s mother’s maiden name.
Sometimes the observations are the result of transformations of other observations: the count of how many offspring one has, the odds that one will default on a loan (i.e., a credit score), etc. Without the ability to know what observations are recorded and stored as data about them, and without the opportunity to correct those data, the subject is in the same position as Snowflake. The subject can be observed, but cannot participate in the record of that observation.
Data points are observations. They can be inaccurate. The reason that participation is such an essential principle of data privacy and security is that there is often an asymmetric relationship between the stake a data subject has in the accuracy of the data about them and the stake a data collector has. It is a matter of scale. The data collector may experience incorrect data as accounting for 0.01% of its dataset. For the subject of the data, if the only record about them in that dataset is incorrect, the error rate is 100%.
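The arithmetic behind that asymmetry is easy to check. The sketch below assumes a made-up dataset of 10,000 records, one per subject, with exactly one record wrong; the names and numbers are illustrative only.

```python
def error_rate(records):
    """Fraction of the given records that are flagged as incorrect."""
    return sum(1 for r in records if not r["correct"]) / len(records)


# Hypothetical dataset: 10,000 records, one per subject, exactly one of them wrong.
dataset = [{"subject": f"subject-{i}", "correct": True} for i in range(10_000)]
dataset[42]["correct"] = False

# The collector looks at the whole dataset; the subject only at their own record.
collector_view = error_rate(dataset)
subject_view = error_rate([r for r in dataset if r["subject"] == "subject-42"])

print(f"collector: {collector_view:.2%}")  # collector: 0.01%
print(f"subject:   {subject_view:.2%}")    # subject:   100.00%
```

The same single error is a rounding matter from one vantage point and the entire record from the other.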
Let’s look at the most common example: identity theft.
A bank is concerned when someone has their identity stolen and has fraudulent charges on their credit card account. But the victim of the identity theft stands to lose relatively more if the data are not corrected. It is uncommon for an individual to be financially liable for fraudulent charges on their credit card, and in most cases the law limits that liability to a small amount. However, victims of identity theft can have trouble getting a mortgage or renting an apartment, and can suffer other consequences as a result, especially if they end up with a mistakenly bad credit report.
As I have mentioned elsewhere, it is increasingly common that individual data points are aggregated into predictive and descriptive models. Those data points are usually “de-identified.” Technically, they no longer concern the subject whose data combine with others to form a “cohort.” The fact that there are inaccurate data in the dataset used for the model is factored in as part of the model’s “margin of error.”
Should the subjects of such data have any ownership of or visibility into such datasets? Should the accuracy be their concern? I believe the time is coming when the answer to those questions is “yes,” if for no other reason than that these models increasingly influence us and the world around us.
But for now, the principle of “participation” is summed up by the idea that a subject needs to know what is “on file” about them and should have a defined mechanism for getting erroneous data about them corrected. The difference between the two data points mentioned above serves as a very good illustration of this, as well as of why it is important. Consider the data point, “John bought a book about terrorism” and the data point, “He fathered 22 offspring.”
What is the biggest difference between the two data points? The opportunity for the subject to participate in its accuracy. Even if Snowflake knew about a 23rd offspring (imagine an albino gorilla smiling and winking), he would never have had an opportunity to know about and correct the data.
John, on the other hand, should have the opportunity to know this data point exists about him and be able to get it corrected. He did not, in fact, buy a book about “terrorism.” He bought a book about “tourism.” It was the only data entry error the clerk at the book warehouse made out of the thousands of times that the clerk assigned a category to a book. The clerk corrected how the book was categorized within minutes; unfortunately, John’s purchase was recorded in the brief time the book was mis-categorized. The clerk’s accuracy and attention to detail have been noted and he has been promoted. John, on the other hand, has been detained at the airport.