Aggregating is the inverse of broadcasting. What complicates this is that many technologies are now used for both. Cell phones are the best example: devices originally designed for transmitting information between individuals, they have evolved into devices that can broadcast that information via social media. It is ironic that the US Supreme Court has now recognized them for their ability to aggregate data, regardless of whether the individual device ever transmits it:
“Cell phones differ in both a quantitative and a qualitative sense from other objects that might be carried on an arrestee’s person. Notably, modern cell phones have an immense storage capacity. Before cell phones, a search of a person was limited by physical realities and generally constituted only a narrow intrusion on privacy. But cell phones can store millions of pages of text, thousands of pictures, or hundreds of videos.” (Riley v. California, No. 13–132, argued April 29, 2014, decided June 25, 2014)
The Court goes on to make the understated observation that “This has several interrelated privacy consequences.”
Broadcasts have a focused point of origin: a single source. Broadcasting works when information, the broadcast, is made available from that source and received by large numbers of people. Aggregations, by definition, have many sources but converge on a single storage point.[1]
Broadcasting has a bias towards celebrity. Aggregating’s bias is towards anonymity. The bias of an information-processing medium like aggregating or broadcasting is generally recognized, exploited, and enhanced by those who use it. It is no coincidence that the individuals who described this best, Harold Innis and Marshall McLuhan, had experienced the use of radio as propaganda leading up to and during World War II and, at least in McLuhan’s case, studied the rise of broadcast television as it grew in popularity and influence from the 1950s to the 1970s.
What the US Supreme Court acknowledged in the decision quoted above, and what the EU Court of Justice also recognized in its decision regarding the right to be forgotten, is that when such powerful information processing names a single individual, that individual’s privacy may be compromised.
In other words, sometimes that anonymity is not assured: for example, when the aggregation is focused on an individual, as in the case of a device (like a cell phone) or of specific algorithms (like a search result). Even when data are depersonalized, anonymity cannot be taken for granted once the focus is on the individual. Dr. Latanya Sweeney demonstrated this almost 20 years ago when she singled out and identified a single individual’s health record in a statewide database of depersonalized records, by linking those records to a public voter roll on ZIP code, birth date, and sex.
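To make the mechanics concrete, here is a minimal sketch of that kind of linkage attack. All of the records, names, and values below are invented for illustration; only the technique, joining a “depersonalized” dataset to a public one on the quasi-identifiers ZIP code, birth date, and sex, follows Sweeney’s demonstration.

```python
# A hypothetical linkage attack: every record below is invented.

# "Depersonalized" medical records: names removed, quasi-identifiers kept.
medical = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "..."},
    {"zip": "02139", "dob": "1962-01-15", "sex": "F", "diagnosis": "..."},
]

# A public voter roll: names present, same quasi-identifiers.
voters = [
    {"name": "J. Smith", "zip": "02138", "dob": "1945-07-31", "sex": "M"},
]

# Re-identification is nothing more than a join on the shared attributes.
for m in medical:
    for v in voters:
        if (m["zip"], m["dob"], m["sex"]) == (v["zip"], v["dob"], v["sex"]):
            print(f'{v["name"]} -> {m["diagnosis"]}')
```

No name was ever stored in the medical dataset; the aggregation itself, once pointed at an individual, is what breaks the anonymity.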
There are many privacy efforts aimed at addressing this. There are the court decisions above that focused on individual devices or data points. There are laws, policies, and even data generalization and suppression solutions like those described in Dr. Sweeney’s k-anonymity approach. They are all generally seen as privacy protections.
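For a sense of how generalization and suppression work, here is a minimal sketch in the spirit of k-anonymity. It applies one fixed generalization step (ZIP code truncated to a prefix, birth date reduced to a year) and then suppresses any record whose quasi-identifier combination is shared by fewer than k records. Sweeney’s actual approach searches over whole hierarchies of generalizations, so treat this as an illustration of the two moves, not her algorithm.

```python
from collections import Counter

def generalize(record):
    """One fixed generalization step: coarsen the quasi-identifiers."""
    return {
        "zip": record["zip"][:3] + "**",   # 5-digit ZIP -> 3-digit prefix
        "dob": record["dob"][:4],          # full birth date -> birth year
        "sex": record["sex"],
        "diagnosis": record["diagnosis"],  # the sensitive value, untouched
    }

def k_anonymize(records, k=2):
    """Keep only records whose generalized quasi-identifiers are shared
    by at least k records; suppress the rest."""
    coarse = [generalize(r) for r in records]
    key = lambda r: (r["zip"], r["dob"], r["sex"])
    counts = Counter(key(r) for r in coarse)
    return [r for r in coarse if counts[key(r)] >= k]

# Two invented records that become indistinguishable once generalized:
records = [
    {"zip": "02138", "dob": "1945-07-31", "sex": "M", "diagnosis": "..."},
    {"zip": "02139", "dob": "1945-01-15", "sex": "M", "diagnosis": "..."},
]
print(k_anonymize(records, k=2))  # both kept; each hides among k records
```

The linkage join from the earlier sketch now returns k candidates instead of one, which is exactly the point: the protection works by restoring the anonymity that aggregation is biased towards.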
In fact, these efforts are protections of an individual’s identity. But they also inadvertently serve to reinforce the bias of aggregation, the bias towards anonymity. In other words, the fact that aggregation tends to work against information being individually identifiable is part of the reason why, when it does not, it stands out as a threat to privacy and has received such attention.
(The other, less understood reason that aggregating has a bias towards anonymity is the power of aggregation to take individuals and form them into a cohort. Aggregating is used here to mean more than mass storage, however. It is more than, for example, a production transaction database serving a front-end application; it is the bringing together of data in order to focus on it. The relationship between identity and cohorts will be looked at in depth in my next article.)
It may seem counterintuitive to suggest that privacy protections, the controls that help keep sensitive data from being identified, actually strengthen the argument that handling those very data is biased towards anonymity. Understanding this requires a closer look at those protections.
The right to be forgotten involves an individual’s right to suppress data points about them in an identifiable context. The Court’s decision on this and the EU Directive on Data Protection contain the details of how the right to be forgotten supports the argument that the bias of aggregation is anonymity, and that privacy protections reinforce it.
The EU Court of Justice acknowledged that the data point itself might be accurate. The right to have a data point forgotten therefore rests on the finding that the data point can sometimes “appear to be inadequate, irrelevant or no longer relevant or excessive in the light of the time that had elapsed” with respect to the subject who seeks to have it forgotten.
In the case of Google Spain SL, Google Inc. v Agencia Española de Protección de Datos (AEPD), Mario Costeja González, the Court did not order that the data point in question be deleted. Nor did it order that the data be unavailable for all search results. It specifically found that:
“…following a request by the data subject pursuant to Article 12(b) of Directive 95/46, that the inclusion in the list of results displayed following a search made on the basis of his name of the links to web pages published lawfully by third parties and containing true information relating to him personally is, at this point in time, incompatible with Article 6(1)(c) to (e) of the directive because that information appears, having regard to all the circumstances of the case, to be inadequate, irrelevant or no longer relevant, or excessive in relation to the purposes of the processing at issue carried out by the operator of the search engine, the information and links concerned in the list of results must be erased.” (EU decision, emphasis mine)
In other words, provided that the search engine does not include the data point in results for searches that explicitly name Mr. González, the data point is fair game to appear in any other search results.
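As a thought experiment, the distinction the Court drew can be expressed in a few lines. This is in no way how a real search engine implements delisting; the delist table, names, and URLs below are invented, and the sketch only illustrates that the suppression is keyed to the query, not to the index.

```python
# Hypothetical delist table: data subject's name -> URLs to suppress.
DELISTED = {
    "mario costeja gonzález": {"https://example.com/1998-auction-notice"},
}

def filter_results(query, results):
    """Drop a delisted URL only when the query names the data subject."""
    q = query.lower()
    suppressed = set()
    for name, urls in DELISTED.items():
        if name in q:
            suppressed |= urls
    return [url for url in results if url not in suppressed]

hits = ["https://example.com/1998-auction-notice",
        "https://example.com/other-page"]
print(filter_results("mario costeja gonzález", hits))    # notice suppressed
print(filter_results("spanish real estate 1998", hits))  # notice still listed
```

The data point is never erased; it simply stops being returned when the processing is “made on the basis of his name.”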
Consider the EU Data Directive’s definition of “processing”:
(b) “processing of personal data” (“processing”) shall mean any operation or set of operations which is performed upon personal data, whether or not by automatic means, such as collection, recording, organization, storage, adaptation or alteration, retrieval, consultation, use, disclosure by transmission, dissemination or otherwise making available, alignment or combination, blocking, erasure or destruction; (EU Directive 95/46/EC – The Data Protection Directive)
In other words, the processing of data that involves aggregating (returning the results of a search) is not defined as the processing of personal data, provided the initial query does not identify the individual. The resulting dataset returned by the search may not be depersonalized, but since the search was, the Court found it did not meet the definition of processing personal data.
This case and its impact on privacy law and practice will no doubt develop and evolve over time. Regardless, it stands as an exceedingly precise example of the acknowledged non-personal nature of aggregation. Anonymity is the effect of aggregation unless the aggregation is specifically focused on the individual in some predetermined way. The two predetermined foci recognized above are the individual’s ownership of the aggregating technology (e.g., a cell phone) and the individual being named in the parameters of a search query.
The data point in the EU case, a 1998 report of a real estate transaction in Spain, is, on the one hand, personal; it is history; and, by order of the Court, it is to be forgotten.
On the other hand, the Court recognized that there were valid reasons to include that data point in a query. For example, no analysis of the state of the Spanish real estate market in the late 1990s would be truly complete without it, and the Court’s decision in no way prohibits the data point from appearing in such an analysis. In the next article, I’ll look at why the person performing that analysis couldn’t care less.
[1] Nothing related to distributed broadcasts, high availability, data replication, or other supporting technologies materially affects these observations, because they are accurate descriptions of how broadcasting and aggregating function for their users even if the “plumbing” behind the scenes differs.