Scraping Away at Computer “Crime” – Federal Appeals Court Rules Against LinkedIn in online “scraping” case

Your domain is your domain. Your website is your website. You decide who can access your site, who can access your data, and how they can do that. You make those decisions through both technology (e.g., code, access control, userIDs, passwords, multifactor authentication) and contracts (terms of use, terms of service, privacy policies, software license agreements, etc.).

If a person or entity accesses your website or data without your consent, or in a way that you haven’t consented to, is that a “trespass”? Is it a crime?

In the “real” world, if you own a store (like, say, Walmart), you can decide – to some extent – the terms and conditions under which a person may access – that is, enter – the store. You can decide that they are not permitted to “open carry” firearms in the store, or that they can’t enter before 7AM or after 9PM. If someone violates these “terms,” they are accessing the building without authorization (or in excess of their authorization) and therefore, in theory at least, “trespassing.”

So, once you post a “policy” on what people can and cannot do in your store, any violation of that policy subjects the violator to both ejectment and arrest. For example, in 1996 a man named Ronald Kahlow walked into a Best Buy in Reston, Virginia, and entered the prices being charged for TVs on his laptop so he could comparison shop. He refused to stop when requested by store personnel, who said that they don’t permit recording of prices “for competitive reasons,” and he was arrested for trespass. Kahlow was tried by the Commonwealth of Virginia, but acquitted by a judge.

In the virtual world, it’s much the same way. Access to websites and other electronic “locations” is conditioned upon agreement to myriad contracts and policies. In theory, a violation of any of these policies makes the access “without authorization” or “in excess of authorization” and therefore subjects the violator not only to ejectment and arrest, but also to civil liability. Thus, when a woman named Lori Drew created a social media profile in the name of a fictitious person in violation of the social media site’s terms of service, she was similarly arrested and tried for computer trespass. She, too, was ultimately acquitted.

On September 9, 2019, the United States Court of Appeals for the Ninth Circuit considered the case of hiQ, a web analysis company that used automated bots to collect publicly accessible data from users’ LinkedIn profiles.

Although the user data is (or can be made) publicly accessible by the users, LinkedIn restricts access to that data by, for example, prohibiting access by automated bots in its “robots.txt” file, and by seeking out and blocking automated access to its site. LinkedIn’s “Sentinel” system identifies automated access and throttles it, blocks what it deems to be “suspicious” IP addresses, and collects data on and blocks about 95 million automated attempts to scrape data from its website every day.
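For readers unfamiliar with how a “robots.txt” file works, here is a minimal sketch in Python. The file contents, bot names, and URL below are hypothetical and are not LinkedIn’s actual rules. The sketch only shows how a well-behaved bot might check such a file before requesting a page, using Python’s standard `urllib.robotparser` module.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only;
# this is NOT LinkedIn's actual file.
EXAMPLE_ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: *
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical profile URL on a placeholder domain.
    url = "https://www.example.com/in/some-profile"
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "ExampleSearchBot", url))  # named bot
    print(is_allowed(EXAMPLE_ROBOTS_TXT, "SomeScraperBot", url))    # any other bot
```

Note that the check happens entirely on the bot’s side. A robots.txt file states the site’s wishes, but nothing in the file itself stops a bot that chooses to ignore it, which is presumably why a system like Sentinel exists alongside it.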

Using automated bots, hiQ scrapes information that LinkedIn users have included on public LinkedIn profiles. LinkedIn sent hiQ a cease-and-desist letter, demanding that hiQ stop accessing and copying data from LinkedIn’s server.

HiQ filed suit, seeking injunctive relief based on California law and a declaratory judgment that LinkedIn could not lawfully invoke the Computer Fraud and Abuse Act (“CFAA”), the Digital Millennium Copyright Act, California Penal Code § 502(c), or the common law of trespass against it.

LinkedIn’s user agreement, which regulates access to its website and the data contained therein, provides:

users agree not to “[s]crape or copy profiles and information of others through any means (including crawlers, browser plugins and add-ons, and any other technology or manual work),” “[c]opy or use the information, content or data on LinkedIn in connection with a competitive service (as determined by LinkedIn),” “[u]se manual or automated software, devices, scripts robots, other means or processes to access, ‘scrape,’ ‘crawl’ or ‘spider’ the Services or any related data or information,” or “[u]se bots or other automated methods to access the Services.”

LinkedIn wanted to restrict access to “its” user data because it was using the data for its own analytics, and selling that analyzed data to its customers. It had invested substantial sums in creating and marketing the platform and encouraging users to log in and provide the data. It was seeking an injunction preventing hiQ from accessing the site, and the data, and had sent a “cease and desist” letter to hiQ to that effect. The Court enjoined LinkedIn from enforcing that “cease and desist” letter.

Whose Data is it, Anyway?

First, the Court noted that, while LinkedIn collected the data, it wasn’t LinkedIn’s data – it was data that LinkedIn’s users generated. LinkedIn argued that restricting analysis of the users’ data (and sale of that analysis) to LinkedIn alone, and not to any competitor, was intended to protect the privacy of that user data. The Court was unconvinced, noting that LinkedIn allows its paying customers to access data of registrants, including to “follow” them, to be notified when their profiles change, and to connect with them for recruiting, marketing and sales. The Court also rejected the argument that LinkedIn has a recognizable “property” interest in the consumer data – the data belonged to the consumer, who decided to post it and make it accessible to the public.

Forgive Me My Trespasses

The more difficult part of the case involves the question of whether LinkedIn, like Walmart, can restrict access to otherwise “public” websites or data to competitors who wish to use that access and that data for things that are not in LinkedIn’s interest. It’s not that hiQ’s scraping is dangerous or harmful to LinkedIn’s networks. There was no argument that the scraping slowed down the network, deleted data, or impeded others’ access to the network.

No. LinkedIn wanted to keep hiQ from scraping the site because it didn’t want a competitor to have access to its users’ data, because LinkedIn wanted to be able to sell analysis of the data, and keep hiQ from doing so. And perhaps LinkedIn has a right to do so.

But is accessing the website in a way that may violate the Terms of Service a per se violation of the trespass laws? This is where it gets sticky.

The federal (and most state) computer crime laws prohibit someone from “intentionally accessing” a computer “without authorization” or “in excess of authorization”. So the Court had to decide whether hiQ’s accessing what was essentially publicly accessible data, in a manner that was prohibited by the website’s terms of use, constituted “accessing without authorization.” The Court concluded that it did not, noting that “[t]he [law] was enacted to prevent intentional intrusion onto someone else’s computer – specifically, computer hacking.”

The Court rejected what is sometimes called the “contract” theory, or “misappropriation” theory of computer trespass that has been adopted by other courts. Under the “contract” theory, a violation of terms of service renders the access which caused the violation into an “unauthorized” access, and therefore a trespass. Under the “misappropriation” theory, when someone uses access to a computer or computer network to “take” something that doesn’t belong to them (e.g., an employee taking customer data in anticipation of using that data to compete with their current employer) they again access the computer “without authorization” and violate the trespass statute.

The Ninth Circuit instead stated that “[w]e therefore look to whether the conduct at issue is analogous to ‘breaking and entering.’” The Court also restricted the statute to “unauthorized access” to “private information,” quoting my colleague Orin Kerr for the point that “an authentication requirement, such as a password gate, is needed to create the necessary barrier that divides open spaces from closed spaces on the Web.”

Because the LinkedIn data was “public” and because there was no barrier to access it (other than the legal barrier of the contract) the access to the data was not “without authorization,” and therefore there was no computer “trespass”. The court observed that:

the CFAA contemplates the existence of three kinds of computer information: (1) information for which access is open to the general public and permission is not required, (2) information for which authorization is required and has been given, and (3) information for which authorization is required but has not been given (or, in the case of the prohibition on exceeding authorized access, has not been given for the part of the system accessed). Public LinkedIn profiles, available to anyone with an Internet connection, fall into the first category.
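The three categories can be restated as a small decision rule. The sketch below is only an illustrative simplification of the passage quoted above, not a legal test; the names `AccessCategory`, `classify`, and `is_without_authorization` are invented for this example.

```python
from enum import Enum


class AccessCategory(Enum):
    """The three kinds of computer information described in the quoted passage."""
    OPEN_TO_PUBLIC = 1   # (1) open to the general public, no permission required
    AUTHORIZED = 2       # (2) authorization required and given
    NOT_AUTHORIZED = 3   # (3) authorization required but not given


def classify(requires_authorization: bool, authorization_given: bool) -> AccessCategory:
    """Map the two questions in the quoted passage onto a category."""
    if not requires_authorization:
        # If no permission is needed in the first place, whether it was
        # "given" does not matter under this reading.
        return AccessCategory.OPEN_TO_PUBLIC
    if authorization_given:
        return AccessCategory.AUTHORIZED
    return AccessCategory.NOT_AUTHORIZED


def is_without_authorization(category: AccessCategory) -> bool:
    """Only the third category amounts to access 'without authorization' in this sketch."""
    return category is AccessCategory.NOT_AUTHORIZED


if __name__ == "__main__":
    # A public profile: no authorization required, so it lands in category (1).
    public_profile = classify(requires_authorization=False, authorization_given=False)
    print(public_profile, is_without_authorization(public_profile))
```

On this reading, the first question does all the work for public profiles: once the answer is that no authorization is required, the sketch never reaches the question of whether permission was granted or withdrawn.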

Really? Are LinkedIn profiles any more “public” than, say, Facebook profiles? Are they still “public” when LinkedIn sends a cease and desist letter demanding that hiQ no longer access them? In a previous case, the same Ninth Circuit Court of Appeals decided a dispute between Facebook and a company that sought to use automated tools to scrape “public” user profile data for marketing purposes. The Court ruled that, after Facebook sent the company a cease and desist letter, the company’s access to Facebook – at least for the purposes of scraping data – was no longer authorized, in part because “‘Facebook has tried to limit and control access to its website’ as to the purposes for which [the company] sought to use it. Indeed, Facebook requires its users to register with a unique username and password, and [the company] required that Facebook users provide their Facebook username and password to access their Facebook data on [the company’s] platform.”

OK. But “public” data is public, right? Not clear. The Court summarized its position by stating:

it appears that the CFAA’s prohibition on accessing a computer “without authorization” is violated when a person circumvents a computer’s generally applicable rules regarding access permissions, such as username and password requirements, to gain access to a computer. It is likely that when a computer network generally permits public access to its data, a user’s accessing that publicly available data will not constitute access without authorization under the CFAA.

The Court also rejected challenges based on hiQ’s alleged violations of the Stored Communications Act (limiting unauthorized access to communications), and common law trespass, but intimated that LinkedIn might have better luck if it sued hiQ under a common law “trespass to chattels” theory, or copyright infringement, misappropriation, unjust enrichment, conversion, breach of contract, or breach of privacy. Or not. The opinion merely suggests, but does not decide.

Lesson for Companies

The law of computer trespass, like the law of trespass generally, is muddy. It’s particularly difficult to prevent people and companies from using “legitimate” access to your website and data for what you decide are “illegitimate” purposes. Interestingly, the Court ruled that LinkedIn itself might be liable to hiQ for “tortious interference” with hiQ’s contracts with third parties – in other words, that even by kicking hiQ off its servers and networks with technical tools (rather than legal threats) LinkedIn might be doing something wrong.

Companies need to better define what is and is not permitted, and what data does and does not belong to them. While consumers’ data (the stuff they input) may not “belong” to the company, the analysis, compilation and structure of that data may be proprietary, copyrighted, or otherwise protected. Companies need to balance making their customer data public – or allowing the customers to make it public (like LinkedIn, Facebook and others do) – against their corporate interests in keeping some of it secret. And privacy remains a problem. While LinkedIn users know they are sharing their data with LinkedIn and others with access to the site, do they REALLY know who is accessing the data, how, and for what purposes? I think not.

The law on computer trespass continues to evolve. Magic 8 ball says, “situation murky, ask again later.”

Mark Rasch is an attorney and author of computer security, Internet law, and electronic privacy-related articles. He created the Computer Crime Unit at the United States Department of Justice, where he led efforts aimed at investigating and prosecuting cyber, high-technology, and white-collar crime.

By: Mark Rasch

Cyber Law Editor

Security Current

September 11, 2019