Context on the recent US decision to legalize LinkedIn data scraping
Updated: Apr 21
Debbie "The Data Diva" Reynolds discusses the US Computer Fraud and Abuse Act (CFAA), including the LinkedIn vs. HiQ case, which is currently making headlines for making data scraping legal in the US.
Many of our members and partners will have heard the news that a US appeals court has ruled that the practice of web scraping is LEGAL, dealing a blow to Microsoft-owned social networking site LinkedIn, which claimed the practice endangers its users' privacy.
LinkedIn post draws in expert opinions
Somewhat ironically, this decision kicked off a lively discussion thread on LinkedIn (where else?) that drew feedback from a host of leading privacy professionals from around the globe.
One of them was Debbie Reynolds (aka "The Data Diva") of Debbie Reynolds Consulting, who offered some excellent context on the topic:
"For anyone who did not understand the difference between privacy as a human versus a consumer right, this ruling is a great example of the distinction. We lack the right to "not share" our data in the US as these rights are business-focused, not human-focused." - Debbie Reynolds on LinkedIn
Debbie shared a previously recorded YouTube video (posted above with permission) that provides a cogent breakdown of the LinkedIn case and a related case involving Clearview AI, along with her expert insights on their implications.
Another comment on the LinkedIn post came from privacy expert Derek A. Lackey of Newport Thompson:
"The court is not looking at this through a personal privacy lens as many of us privacy professionals would expect them to. They are looking at it through the lens of fair business practices. Typical of the marketplace for the past 2 decades they seem to forget there are real people behind the "scraped data" that may be affected down the road. This is a very narrow viewpoint and I believe LinkedIn must continue the fight. Could this be why the US is becoming an island from the rest of the world? Their lack of "giving a damn" about personal privacy is staggering." - Derek Lackey on LinkedIn
Are all data copies created equal?
The debate on whether web scraping should be viewed as a scalable form of data collection that's essential to research, or banned as a watered-down version of identity theft, is not going away anytime soon.
Interestingly, the LinkedIn comments included a sub-thread on the nature of data copying itself, and on which kinds of copies constitute a genuine threat to our collective right to privacy.
Does auto-saved versioning of a data table represent a copy?
Is a temporary file cache a copy?
Is virtualized data (distributed query) a copy?
Chris McLellan, Director of Operations at the Data Collaboration Alliance (and the original poster), noted:
"There are no silver bullets. I suspect humanity will be chasing down and stamping out unwarranted data copies until the end of time. But that's not a good reason not to start now. 🤣" - Chris McLellan on LinkedIn
What's 'stealth scraping'?
Did you know that there's an approach to scraping data that goes virtually undetected? It's called "stealth scraping" and uses Google's cached copies of pages, rather than the publisher's own web pages, as its source.
But as Ido Safruti, co-founder and CTO at PerimeterX, pointed out waaaaay back in 2018, there may be a reasonably quick fix for this:
"If protecting your content on a specific page is more important than making that page’s content available on Google when your site is down, you should instruct crawlers not to display a cached version of that page. The instruction could be for crawlers in general or for Googlebot specifically." - Ido Safruti in Forbes
Join a "privacy-first" community
In early Summer 2022, the Data Collaboration Alliance is launching a major upgrade to the web app that hosts our Data Collaboration Community - it's a bit like LinkedIn, but with one crucial difference:
In this community, members and partners are able to fully control access to their data.
It will be a free space where members can collaborate on open datasets for important causes and get access to a growing library of professional tools that have been built by privacy pros, for privacy pros.
We're also introducing a pioneering "Data Owner Bill of Rights", which will offer guarantees related to data access, data portability, and data deletion. These guarantees are made possible by the Zero-Copy Integration framework on which the new community experience has been developed.
If you're a professional with an interest in data privacy and data-centric innovation, then we invite you to join us today and earn Founding Member status! 🤗