Context on the recent US decision to legalize LinkedIn data scraping

Updated: Apr 21, 2022

Debbie "The Data Diva" Reynolds discusses the US Computer Fraud and Abuse Act (CFAA) and the LinkedIn vs. HiQ case, which is currently making headlines for making data scraping legal in the US.

Many of our members and partners will have heard the news that a US appeals court has ruled that the practice of web scraping is LEGAL, dealing a blow to Microsoft-owned social networking site LinkedIn, which claimed the practice endangers its users' privacy.


LinkedIn post draws in expert opinions


Somewhat ironically, this decision kicked off a lively discussion thread on LinkedIn (where else?) that drew feedback from a host of leading privacy professionals from around the globe.


One of them was Debbie Reynolds (aka "The Data Diva") of Debbie Reynolds Consulting, who offered some excellent context on the topic:

"For anyone who did not understand the difference between privacy as a human versus a consumer right, this ruling is a great example of the distinction. We lack the right to "not share" our data in the US as these rights are business-focused, not human-focused." - Debbie Reynolds on LinkedIn

Debbie shared a previously recorded YouTube video (posted above with permission) that provides a cogent breakdown of the LinkedIn case and a related case involving Clearview AI, along with her expert insights on their implications.


Another comment on the LinkedIn post came from privacy expert Derek A. Lackey of Newport Thompson:

"The court is not looking at this through a personal privacy lens as many of us privacy professionals would expect them to. They are looking at it through the lens of fair business practices. Typical of the marketplace for the past 2 decades they seem to forget there are real people behind the "scraped data" that may be affected down the road. This is a very narrow viewpoint and I believe LinkedIn must continue the fight. Could this be why the US is becoming an island from the rest of the world? Their lack of "giving a damn" about personal privacy is staggering." - Derek Lackey on LinkedIn

Are all data copies created equal?


The debate over whether web scraping should be viewed as a scalable form of data collection that's essential to research, or banned as a watered-down version of identity theft, is not going away anytime soon.


Interestingly, the LinkedIn comments included a sub-thread on the nature of data copying itself, and on which kinds of copies constitute a genuine threat to our collective right to privacy.


For example:

  • Does auto-saved versioning of a data table represent a copy?

  • Is a temporary file cache a copy?

  • Is virtualized data (distributed query) a copy?

Chris McLellan, Director of Operations at the Data Collaboration Alliance (and the original poster), noted:

"There are no silver bullets. I suspect humanity will be chasing down and stamping out unwarranted data copies until the end of time. But that's not a good reason not to start now. 🤣" - Chris McLellan on LinkedIn

What's 'stealth scraping'?


Did you know that there's an approach to scraping data that goes virtually undetected? It's called "stealth scraping" and uses Google's cached copies of pages rather than the publisher's own web pages as its source.


But as Ido Safruti, co-founder and CTO at PerimeterX, pointed out waaaaay back in 2018, there may be a reasonably quick fix for this:

"If protecting your content on a specific page is more important than making that page’s content available on Google when your site is down, you should instruct crawlers not to display a cached version of that page. The instruction could be for crawlers in general or for Googlebot specifically." - Ido Safruti in Forbes
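
As a rough illustration of that instruction, the sketch below shows one way a site might send the `noarchive` rule for selected pages using the `X-Robots-Tag` HTTP response header. The WSGI wrapper, the `PROTECTED_PATHS` set and the `/members` path are hypothetical names picked for this example. They are not taken from Safruti's article or from any particular site's setup.

```python
# Minimal sketch: add a "noarchive" rule to selected pages via the
# X-Robots-Tag response header. All names here are illustrative assumptions.

PROTECTED_PATHS = {"/members"}  # hypothetical pages that should not be cached

# "noarchive" on its own addresses crawlers in general; a value of
# "googlebot: noarchive" would address Googlebot specifically.
NOARCHIVE_HEADER = ("X-Robots-Tag", "noarchive")


def add_noarchive(app):
    """Wrap a WSGI app so that protected pages carry the noarchive header."""
    def wrapped(environ, start_response):
        def patched_start(status, headers, exc_info=None):
            if environ.get("PATH_INFO") in PROTECTED_PATHS:
                headers = list(headers) + [NOARCHIVE_HEADER]
            return start_response(status, headers, exc_info)
        return app(environ, patched_start)
    return wrapped


def demo_app(environ, start_response):
    """Stand-in application that returns a plain-text page."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"hello"]


def response_headers(app, path):
    """Call a WSGI app once and return the headers it sent, as a dict."""
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured.update(headers)

    list(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured
```

Wrapping an application as `add_noarchive(demo_app)` leaves other pages untouched and adds the header only to the listed paths. The same rule can also be placed in a page's markup as `<meta name="robots" content="noarchive">`. Either way it is a request that crawlers choose to honor, not an enforcement mechanism.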