I am Not a Number, I am a Bunch of Numbers

(This is the third installment in an ongoing examination of the first principles of data privacy and security. The first installment can be read here. The second installment can be read here. These principles, often represented in regulations and privacy practices, form the foundation for how an organization should treat the customer data it collects.)

We’ve both been awake for a while now. We’re almost done. I know what she wants. This isn’t my first time. She’s given me almost everything she can. It’s not her first time either. To remember what we’ve done. What I’ve done. To remember for a long time and to look back on it. She looks me in the eyes. She smiles and holds out her hand. I know what she wants. I hand her the plastic card. She scans it and I get five percent off selected items and a coupon good for one dollar off my next purchase of paper towels.

The paragraph above might seem like just a spoof, a shopper’s aubade that describes a mundane interaction at a supermarket as if it were the parting of lovers. But whereas the bittersweet parting of two lovers is an intimate moment, this is anything but that. Not just because a supermarket is a public place or because the transaction is essentially commercial. The scene cannot be intimate because the memory is inanimate. The “memory” to be created is data. Unless I’m a regular, there is no possibility that I will remember the cashier or that the cashier will remember me. Regardless, neither of us will ever look back and think “yes, that was the Saturday that the milk was purchased.” The memory is data and it is created neither by the cashier nor by me. When that card is scanned, each individual item I have bought at the supermarket that day will be remembered by being described, date stamped, time stamped and ascribed to me. It will be written out as a record and persisted in a database.
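To make that “memory” concrete, here is a minimal sketch of what one such record might look like. The schema, the field names and the card number are all hypothetical, invented for illustration; no real loyalty program is being described.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class PurchaseRecord:
    """One purchased item as a loyalty-card system might persist it (hypothetical schema)."""
    card_number: str        # the identifier the purchase is ascribed to
    item_description: str   # the item "being described"
    purchased_at: datetime  # date stamp and time stamp in a single field


def record_basket(card_number, items, when):
    """Turn a scanned basket into one record per item, all ascribed to the same card."""
    return [PurchaseRecord(card_number, item, when) for item in items]


# A Saturday-morning basket, scanned against a made-up card number.
records = record_basket(
    "4400-0001",
    ["milk", "paper towels"],
    datetime(2013, 11, 23, 10, 15),
)
```

Note that nothing in the record refers to the cashier or to me as a person. The only link to the shopper is the card number.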

In the previous article, I discussed the data practice principle referred to as “notice.” Notice can be summed up as telling people what data you collect about them and how you intend to use it. For example, the first sentence of the BBC’s privacy notice is “When you interact with the BBC we sometimes receive or collect personal information about you.” And they proceed to tell you what they might do with what they collect. As I mentioned at the end of the last article, there are roles played in this transaction that should be defined and understood. These roles describe who does what in the exchange of data. While sometimes one individual or entity performs more than one role, any piece of data always has:

A subject (who it is about)
A receiver (who captures it)
Collectors (those who store and aggregate it)
Users (those who get value from accessing it)
Regulators (those who govern)

A notice tells you that “we receive, collect and use information about you according to some rules that regulate/govern this.” To fully understand what a given notice is telling you, you need to identify who is performing all these roles. For the Information Security professional to ensure data confidentiality and integrity, understanding these roles is crucial. Below, I will discuss each, beginning with regulators and working my way up the list.
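One way to read a notice against these five roles is to write down who fills each one and see which are left unnamed. The sketch below assumes a hypothetical supermarket loyalty program; the class, its fields and the example values are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DataRoles:
    """Who plays each role for a given piece of data, as far as the notice says."""
    subject: Optional[str] = None                         # who the data is about
    receiver: Optional[str] = None                        # who captures it
    collectors: List[str] = field(default_factory=list)   # who stores and aggregates it
    users: List[str] = field(default_factory=list)        # who gets value from accessing it
    regulators: List[str] = field(default_factory=list)   # who governs

    def unnamed_roles(self):
        """Return the roles the notice leaves unidentified (None or an empty list)."""
        return [name for name, value in vars(self).items() if not value]


# Hypothetical reading of a notice that names only three of the five roles.
supermarket = DataRoles(
    subject="the shopper",
    receiver="the point-of-sale system",
    collectors=["the loyalty program database"],
)
print(supermarket.unnamed_roles())
```

In this made-up reading, the roles the notice never names are the users and the regulators.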

Those who govern

A regulator can be any of the following, alone or in combination: government agencies, a board of directors, a company’s general counsel, and entities brought together for the purpose of governance, such as a board audit committee, a policy committee, or an Institutional Review Board (IRB), a committee that research institutions put in place to ensure that research is conducted in an ethical manner.

The privacy notice itself is the most public demonstration of data regulation/governance.

In reality, the governance statements in a privacy notice range from the most specific to the vaguest part of the notice. In fact, they are sometimes both the vaguest and the most specific in the same notice. For example, NASA.gov’s privacy notice is very specific when it tells you, “We will protect your information consistent with the principles of the Privacy Act, the e-Government act of 2002, the Federal Records Act, and as applicable, the Freedom of Information Act.” But then it has this vague disclaimer: “NASA will only share your information with another government agency if it relates to that agency, or as otherwise required by law.” (http://www.nasa.gov/about/highlights/HP_Privacy.html#privacy)

To be fair, all anyone should expect from a privacy notice in terms of defining the role of regulators and other governors is the promise that the organization will obey the law and has a concept that it refers to as “authorized uses” and will only use your data for those uses. The implication is that some group has the role of determining these “authorized uses.” And that group will govern. If the notice does not inform you about who that group is, then it is safe to assume there are at least two groups. One is the group of government agencies that govern/regulate that industry (the default here is the Federal Trade Commission and your State Attorney General). The other default governing body is whatever group runs the organization you’re dealing with (usually referred to as “the Board”). If you’re the subject of data recording, collection and use, it is good to know who is governing the activities that result in the existence of data about you. These groups not only govern what happens to your data, they also define and enforce your rights once your data is received.

If you are the users, collectors and receivers of data, knowing what entities govern your activities (and by what rules) is crucial.

Those who use, collect and receive

The Security Professional is usually aligned with an entity that has at least one of these three roles. In reality, when it comes to data that is stored and transmitted electronically, it is difficult to find an entity that does not do all three — use, collect and receive — all the time. Here are three examples that illustrate how entities are almost always users, collectors and receivers of data all at once.

Phone companies may have once been thought to have only transported data — the phone calls they enable. However, when the phone companies facilitate wiretaps and surveillance, they are receivers and collectors, but some other entity is the user. In order to operate as a business, they have to collect and use data about all the data they receive.

When you use your ATM card at an ATM that is not operated by your bank, the transaction is between you and your bank. Nonetheless, the entity operating the machine has to save data to reconcile its transactions, use data to calculate fee income, and analyze the data to know if it’s worth maintaining that machine at that location.

Consider how credit card data flows. For simplicity’s sake, let’s say a bank issues the credit card. The card is used at a store and the transaction is received by a point of service device which holds on to it at least long enough to transmit it to an entity that approves or declines the transaction. If the purchase is approved, the information goes to a payment processor. The payment processor keeps track of these transactions for the bank and manages the account, calculates interest, mails statements, etc. The payment processor collects the data, adding to it as the account ages, but the bank uses it both by accessing the information at the payment processor and by kicking off another data flow that results in the transfer of funds. Also, data about that credit card account is sent to Credit Bureaus, which are another user. Primarily, the bureaus receive, collect and use descriptive data about the account, such as the account balance, available credit, 24 months’ worth of payment history, and other data.

The Credit Bureau uses the account data in two ways. First, it adds the account data to the information it has on the account holder. It then compares the total collection of the account holder’s details to scoring models and comes up with one or more “scores”: numbers that predict how the account holder will behave in the future and that, by law, are “empirically derived, demonstrably and statistically sound.”[1] In addition, the Bureau may also collect the data in de-identified form and use it regularly to refine and update the scoring models.

This last example, in which data is received, collected, and used a number of times, is not at all unique to the credit card industry. Splitting data use into “what can be learned about you” and “what can be learned from you” is increasingly sought after where large data sets are available. This leaves us with one last role to define: “you.”

The subject of the data

I grew up in the Bronx. A block from my apartment building was a bakery and like most kids, I liked going to bakeries. This was a friendly place with cases of cookies and cakes and shelves of fresh bread. It smelled good in there. And the people behind the counter seemed to like kids. There was one older woman who always smiled. When I came in, it seemed like she singled me out. But she called all the kids over when they came in. She would always call us over and give us each a cookie. She had a kind face and an Eastern European accent and a number tattooed on her arm. I grew up knowing that it was bad to number people.

In the decades immediately following the atrocities of World War II, reducing someone to a number was one of the actions that represented those atrocities. The fact that most of the survivors of those atrocities had those numbers tattooed on their arms was a chilling reminder of that de-humanizing. It was considered so demeaning to reduce someone to a number that the late-1960s TV show, The Prisoner, had the main character regularly repeat “I am not a number, I am a free man.” We never learn his name. The juxtaposition is between being a number and a free man. As if the number itself imprisoned him. In a reversal that emphasizes the perceived sinister importance of numbers as identity, the most famous spy in the movies has, for decades, openly told strangers his name (“Bond. James Bond”) while only a select few know him by his number: “double-oh seven.”

A new layer of meaning has been added to the association of people with numbers. As transactions move away from a world of “in person,” “in kind” and “in cash,” a data-centric layer has been added to them. That layer requires unique identifiers. Given the current state of technology, using unique numbers to represent unique identities is usually the most accurate and efficient method we have. People are generally aware of the inaccuracy of this scheme. They know that if they lend their supermarket rewards card to a friend, they, the registered owner of the card, will have the friend’s purchases ascribed to them.
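The lent-card case can be sketched in a few lines. Everything here is hypothetical (the registry, the card number, the function name); the point is only that the system sees the number, not the person holding the card.

```python
# Hypothetical registry mapping each card number to its registered owner.
REGISTERED_OWNERS = {"4400-0001": "the registered owner"}


def ascribe_purchase(card_number, item, presented_by):
    """Build the stored record for a purchase made with a rewards card.

    `presented_by` is who actually handed over the card. It is accepted here
    only to show that it never reaches the record: the system has no way to
    observe it, so the purchase is ascribed to whoever registered the card.
    """
    return {"ascribed_to": REGISTERED_OWNERS[card_number], "item": item}


record = ascribe_purchase("4400-0001", "cookies", presented_by="a friend")
print(record["ascribed_to"])
```

The friend bought the cookies, but the record says the registered owner did.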

Most importantly, the accuracy of identifying people with the data about them is at once an illusion and an agreement. Subjects of data are usually most interested in two aspects of data identifiers. They are interested in the agreement because when a subject of data identifies themselves using that number, they are agreeing to have their actions received, collected, used and ascribed to them. Also, they are interested in the illusion because someone else can assume that identity so easily (knowing the number, having the card, etc.). We will look more at this agreement in the article on consent and we will look more at this illusion in an upcoming article on individual access/correction.

The subject of the data worries about identity theft, unauthorized disclosure and sometimes about how their data is used to market to them. The information security professional is charged with protecting the subject’s privacy by safeguarding the data. And the entity, in the notice it shares with the subject, commits to making sure the subject’s information is used appropriately. Laws and policies regulate and govern all this activity. Except when the subject of the data is no longer the subject of the data.

De-identified data, data that has actually been stripped of identifying markers, represents the greatest value that “big data” has to offer. And because it is de-identified, the sources of the data are assumed to not care that their data contributed to it. Millions of individuals’ de-identified credit account performance has gone into the development of credit scoring. Predictive models for everything from voting behavior to shopping patterns are developed with increasing accuracy because the quantity and quality of the data going into them keeps increasing. I do not mean to imply that these activities are necessarily good or bad. However, advances in the study of medicine, accident prevention and the effectiveness of certain diets all owe their acceleration to statistical modeling based on the accumulation of this data. It would be hard to argue against increasing the safety of prescription drug use so it would be hard to object to articles with titles like this one: “Drug safety surveillance using de-identified EMR [electronic medical records] and claims data: issues and challenges.”[2]

And I have nothing against marketing per se or other kinds of activities that might use big data analytics to be more effective.

Like many people, I freely provide my identity at the moment when my behavior is predicted by a model. I apply for a credit card and give the credit card company “permissible purpose” to pull my credit bureau report and learn how I compare to a model developed with de-identified data. I provide my family history, my birthdate, my height and weight, and my doctor determines my risk factors by comparing me to those who have similar profiles. When the cashier swipes my card at the supermarket, coupons that are tailored to predictions of my future buying patterns are printed with my receipt. Personally Identifiable Information (PII) is increasingly matched up with models derived from de-identified data and that makes the PII more meaningful than it is by itself.
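The two steps described above, learning from de-identified records and then comparing an identified person to what was learned, could be sketched as follows. This is a toy illustration under stated assumptions: the field names, the cohort definition and the “model” (a simple per-cohort average) are all invented, and real scoring models are far more elaborate. Dropping named fields is also only the simplest form of de-identification; it is used here to keep the example short.

```python
from statistics import mean

# Assumption: these are the only fields that identify a person in our toy data.
IDENTIFYING_FIELDS = {"name", "account_number"}


def de_identify(record):
    """Return a copy of the record with the identifying fields removed."""
    return {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}


def build_cohort_model(records, cohort_key, outcome_key):
    """'What can be learned from you': average outcome per cohort, from de-identified data."""
    outcomes = {}
    for record in map(de_identify, records):
        outcomes.setdefault(record[cohort_key], []).append(record[outcome_key])
    return {cohort: mean(values) for cohort, values in outcomes.items()}


def predict(model, person, cohort_key):
    """'What can be learned about you': look up the identified person's cohort."""
    return model.get(person[cohort_key])  # None if the cohort was never seen


# Made-up account histories.
history = [
    {"name": "A", "account_number": "001", "age_band": "30-39", "late_payments": 1},
    {"name": "B", "account_number": "002", "age_band": "30-39", "late_payments": 3},
    {"name": "C", "account_number": "003", "age_band": "40-49", "late_payments": 0},
]

model = build_cohort_model(history, "age_band", "late_payments")

# An identified applicant is compared to a model that contains no identities.
applicant = {"name": "New Applicant", "age_band": "30-39"}
expected_late_payments = predict(model, applicant, "age_band")
```

The model itself holds no names or account numbers, yet the moment the applicant identifies themselves, the cohort's history becomes a statement about them.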

I think it is time to have the conversation about who owns the information regardless of how it is identified or de-identified. As statistical modeling using big data becomes increasingly common, perhaps individuals should be given more comprehensive notice of how their de-identified data is being used as part of large datasets.

We have mostly gotten used to the idea that our identities can be represented by numbers and that this will not reduce us to “just a number.” Some of our decreasing sensitivity to being associated with one number stems from just how many numbers represent us. But while we may have gotten past thinking we could be reduced to just a number, we might want to start coming to grips with the increasing reality that we are each known as a part of a cohort.

[1] Regulation B. Sec. 202.2(p)

[2] Journal of the American Medical Informatics Association 2010;17:671-674 doi:10.1136/jamia.2010.008607

By: David Sheidlower

CISO

Turner Construction

November 25, 2013
