In August of 2010, Huping Zhou, who had worked as a researcher at the UCLA School of Medicine until he was terminated, was sentenced to jail time for inappropriately accessing the medical records of his immediate supervisor and several celebrities, including Drew Barrymore, Arnold Schwarzenegger, Tom Hanks, and Leonardo DiCaprio.
He had violated the privacy of individuals not under his care; he had abused his legitimate access to the electronic medical record system; and he had violated Federal privacy law.
Recently, the Bernie Sanders 2016 presidential campaign fired Josh Uretsky, its national data director. The campaign has said that the immediate cause of the termination was his inappropriate access to data owned by the Hillary Clinton campaign.
Uretsky claims he was documenting a bug in the software, the same bug that had given him access to the data in the first place, and that he viewed the data to illustrate the extent of the problem. The Clinton campaign claims he was exploiting a temporary flaw in the separation between the two campaigns’ data.
At first glance, this looks like another case of a worker abusing legitimate access to records. But the difference between Uretsky’s case and Zhou’s shows how big data, massive collections of individual records used for precise analytics, has created a new paradigm in personal data.
The Democratic National Committee, through a third-party vendor, maintains a comprehensive voter file and licenses its use to campaigns for a fee. Both the Clinton and Sanders campaigns subscribe to this service.
This information is so essential to modern-day campaigns that when the DNC temporarily suspended the Sanders campaign’s access to the system, the campaign filed suit in Federal Court to have the access restored.
In the suit, the campaign illustrates the file’s importance with an example: “In a fundraising drive conducted between December 14, 2015 and December 16, 2015, the Campaign raised more than $2,400,000.00 – or more than $800,000.00 per day. Most of this money came from individual donors identified through, inter alia, the strategic use of Voter Data.” (https://berniesanders.com/wp-content/uploads/2015/12/Bernie2016vDNCComplaint.pdf )
In the case of “celebrity snooping” in medical records, patient privacy is clearly breached. But in the case of the campaign data, individual voter privacy was not necessarily compromised.
According to the suit, by contract with the DNC, the campaigns have access to “demographic and geographic data for registered voters (such as name, address and jurisdiction); email addresses; voter registration status; telephone numbers; vote history; commercially acquired consumer data; ethnicity information; political party preference or affiliation, if any; candidate preference data, if any; and other key analytic metrics selected by the DNC.”
What was inappropriately accessed was the proprietary information that the Clinton campaign had, at its own expense, appended to the individual records. These derived attributes account for the real value of the data to the campaign.
That value lies in the analysis these attributes enable, which allows the campaign to plan and execute strategies. The privacy expert Sara Degli Esposti describes this value as being contained in “actionable insights” that lead to “interventions.” In her article “When big data meets dataveillance: The hidden side of analytics,” which appeared in the journal Surveillance & Society, she writes:
“the term ‘actionable insights’ indicates a form of discernment generated to produce an action, rather than a theoretical description or comprehension of a phenomenon. Accordingly, the term ‘intervention’ gives emphasis not only to the active role played by analysts in creating the new knowledge, but also to the potential for change embedded in the knowledge created.” (Degli Esposti, S. 2014. When big data meets dataveillance: The hidden side of analytics. Surveillance & Society 12(2): 209-225. http://www.surveillance-and-society.org | ISSN: 1477-7487)
When big data is analyzed, actions can be taken based on that analysis. So if the Sanders campaign data analysts (three besides Uretsky appear to have accessed the data) were intentionally accessing the Clinton data to learn about the opposing campaign’s strategies or to develop new ones of their own, they were essentially doing what they were hired to do: run queries against the data and extract actionable insights from it.
Perhaps the most famous actionable insight derived from data analysis was also one of the earliest. In London in 1854, John Snow mapped the data on the cholera outbreak centered around Broad Street and pinpointed the contaminated water supply causing the disease: a single public pump on Broad Street. Data analysts (now also known as “data scientists”) take great pride in the “aha” moments when their insights lead to breakthroughs. And the desire to find a game-changing insight in the data may cloud an analyst’s judgment.
Whatever his motivation, Uretsky’s access of the data illustrates that, when it comes to data access, there is a difference between “can” and “should.” Because no individual’s privacy was compromised, this breach did not trigger any of the breach notification requirements in Federal and State law. It came to light because of the very public reaction of the DNC, which suspended the Sanders campaign’s access to the entire system.
Increasingly, in the world of big data, there is data about you in which you have a stake. For some types of data, your rights are spelled out in laws such as the Health Insurance Portability and Accountability Act (HIPAA) and the Equal Credit Opportunity Act (ECOA). Other data about you is not regulated at all.
These data attributes, derived from your personal data, group you into cohorts and allow the organizations that create and use them to act on them. You are aware of some of the cohorts you belong to (e.g., “female between the ages of 30 and 34”), but others you can only guess at, in this case “most likely to vote for Clinton” or “most likely to donate $100 or more.”
While it is a generally accepted principle of privacy among regulators that individuals should have the right to know what data is collected about them and be able to correct it, that right does not extend to the cohorts a data collector puts the individual in.
The US no-fly list is perhaps the best-known example of such a cohort. If a public record search showed that someone with a name identical to yours had been arrested on suspicion of terrorism, you could challenge the appearance of that record in searches about you. But that still would not necessarily allow you to learn whether your name is on a watch list, or to have it removed.
When identity is abstracted from privacy, data is no longer under the control of its subjects. This is both a loophole in regulatory frameworks and a necessary protection that ensures the proper handling of large datasets.