From cavemen to clay tablets to databases and dataware, this 10-minute overview provides historical context on how we arrived at today's troubling state of data privacy, and why there is still hope for a better future.
Watch the impact that successive data management technologies have had on people's ability to control their personal information.
The History of Data Ownership
On some level, data has always existed. It was in the chemical composition of the Big Bang, and it has persisted through the electron counts of the first atoms all the way through our DNA today. But all of this data was functionally unobservable until there was something to interpret it - the first mind.
So let’s start there.
The Age of Grunting
How did data exist? As thoughts and non-verbal sounds only.
How easy was it to copy data? Impossible.
How much control did people have over their data? Not applicable.
521,000,000 BCE - The first rudimentary brain appears in the fossil record, and the world is about to get a whole lot more interesting.
252,000,000 BCE - This is the start of the Mesozoic Era, the Age of Dinosaurs. From what modern science knows about life on Earth during this period, it’s reasonable to assume data exchanges involved non-verbal mating displays and territorial markings, and the dinosaur equivalent of, “Back off or I’ll eat you.”
65,000,000 BCE - The Cretaceous-Tertiary extinction event ends the reign of dinosaurs, and kills off some 70% of all life on Earth. No one knows exactly what happened, because the dinosaurs never got around to inventing long-term data storage.
18,000,000 BCE - The earliest ancestors of the great apes show up in fossil evidence around this time, meaning that it’s reasonable to assume data was being communicated through grunts and gestures that started forming the basis of true language as we know it.
200,000 BCE - Homo Sapiens appear on the scene, bringing with them the first true verbal language. But it would be over 150,000 years before they figured out a way to record these words as written data.
The Age of Symbols
How did data exist? Primarily as spoken language, but with the added recording abilities of symbols and pictographs.
How easy was it to copy data? Extremely difficult.
How much control did people have over their data? Almost total control.
40,000 BCE - Humans begin recording data as pictures through cave paintings and petroglyphs, but it would take a long time for these early images to evolve into a true written language.
7000 BCE - So-called “proto writing” dates back to this time, with runes and symbols that convey meaning but don’t fit into a full written language system.
5000 BCE - Some cultures start using parchment made from animal hides as their written medium of choice.
3500 BCE - Sumeria and Egypt have full-on written languages by this time, with thoughts carved into tablets of stone and clay.
3000 BCE - Papyrus, an early precursor to paper as we know it, becomes popular in Egypt and other societies throughout the Mediterranean.
2050 BCE - A brewer named Alulu in ancient Sumeria sells some beer to a customer. We know this because the small, clay receipt survives to this day.
1600 BCE - In China, bone, silk, and bamboo are popular choices for writing. Bone and bamboo are fragile, while silk is prohibitively expensive. Still, it takes over 2000 years to find a better solution.
The Age of Paper
How did data exist? Writing was much more commonplace, but literacy rates were extremely low.
How easy was it to copy data? Very difficult - all copies were made by hand, and someone had to have a unique set of skills in order to make a copy.
How much control did people have over their data? Still almost total control, although the relative portability of paper meant that control was indeed slipping away.
179 BCE - The earliest known paper fragment - a piece of a map - dates back to approximately this year.
105 CE - Han Dynasty court official Cai Lun formalizes the recipe for paper, combining plant fibers, rags, and other materials. Over the next few hundred years, Cai Lun’s paper recipe would spread throughout Asia, and paper would become an important trade item.
1100 CE - The techniques for making paper spread to Egypt, and from there quickly make it to the rest of the world. Paper would rule written data for nearly 1000 years.
The Age of Printing
How did data exist? Written language was fast becoming the dominant data form.
How easy was it to copy data? Not that hard. While printing presses were expensive and relatively hard to come by, once you had access to one it became easy to create mass copies of data - at least in comparison to the time and effort it took to create copies in the past.
How much control did people have over their data? Lots, although the proliferation of paper started making recordkeeping more difficult.
1440 - The invention of the printing press revolutionizes written data, making it possible to mass produce written documents for the first time ever. Previously, all writing needed to be copied by hand.
1605 - The world’s first weekly newspaper is published.
1650 - Estimates indicate that over 50% of the population in England could read, making it one of the first societies to break that mark.
1801 - Carbon paper is invented, making data easier than ever before to copy.
1804 - Punched cards are used to automate looms in the textile industry, laying the groundwork for computers to come.
1822 - Charles Babbage introduces the difference engine, an intricate mechanism for carrying out complex calculations.
1832 - Russian inventor Semyon Korsakov uses punch cards to store information for the first time, and invents the concept of using machines to search through data.
1936 - Alan Turing describes a theoretical machine capable of reading and following directions entered via tape, an early predecessor of the modern computer.
1949 - The first Xerox photocopiers hit the market, and the era of mass data copying begins in earnest.
The Age of Punch Cards
How did data exist? Paper was joined by punch cards and eventually the digital database.
How easy was it to copy data? Easy. Especially once digital data enters the picture.
How much control did people have over their data? Not as much. Copy machines made it very easy to replicate data, and as paper and digital mediums started to collide it created the problem of having multiple sources and formats of data.
1958 CE - The first modern computer is developed. It would take less than 40 years for this new technology to displace paper as the king of data.
1959 CE - A patent is filed for the first microchip, and the world is about to change forever.
1969 CE - ARPANET goes live in the United States, serving as a sort of prototype of the modern internet.
1979 CE - Oracle creates a relational database that becomes the standard for data storage for the next 40-plus years.
1991 CE - AOL for DOS is introduced, and will soon bring millions of Americans online.
1996 CE - Digital data storage becomes more cost-effective than paper data storage, signaling the end of paper’s dominance over data.
The Age of Databases
How did data exist? As paper fell out of favor, data existed primarily in digital format.
How easy was it to copy data? Extremely! The advent of online communications made data easier to copy than ever before, and data proliferation started to run rampant.
How much control did people have over their data? Very little.
1998 CE - Google Search launches, and an empire is born.
2006 CE - The first Amazon Web Services product is launched, paving the way for
2008 CE - Hadoop provides a framework for storing and processing unstructured data on a massive scale, opening the door for Big Data research.
2010 CE - The term Data Lake is coined. Data proliferation is at an all-time high.
20XX CE - Google Docs are launched, and hint at the shift in collaboration soon to come to data itself
201X CE - Data Virtualization
201X CE - Data Warehouse
201X CE - Data Integration
201X CE - Blockchain
201X CE - Homomorphic Encryption
The Age of Collaborative Intelligence (future)
How does data exist? Data is digital, but instead of databases and copy-based data integration, new digital solutions can be powered by ‘Zero-Copy’ systems and data access grants.
How easy is it to copy data? It’s extremely difficult (particularly in large volumes) without an access grant from the rightful data owner.
How much control do people have over their data? A huge amount - down to a single cell of data. Meaningful ownership is restored for the first time since people started scratching images into a piece of rock.
2018 CE - [Dataware] New network-based architectures modeled on the human brain are introduced, representing the first major evolution in data management after 40 years of data silos and copies.
2021 CE - [CIN] The Data Collaboration Alliance launches the Collaborative Intelligence Network to provide a global infrastructure of controlled data.
2022 CE - The Zero-Copy Integration framework is established.
Transcript
Data clearly has a lot of value—especially sensitive personal data.
You’d think that we’d want to own and protect it…
But instead, we copy it like crazy...
Making it almost impossible for anyone to own data in a meaningful way.
It wasn’t always like that. Data used to be highly protected - but for a lot of reasons, we lost our way.
Let’s explore the history of data ownership and see how we got to where we are today… and where we’re going.
Meet Ug.
Even though he lives in a cave and thinks fire is magic, his brain holds more data than any of the biggest companies on earth today.
He’ll always own his data, because it’s all in his head.
The only way he can share information is by grunting and gesturing, which might be mimicked, but can’t be copied.
Ug and his friends didn’t have a lot to say, but they were in total control of their information.
That changes about 150,000 years later, when humans start scratching and painting information onto stone.
In this form, data has left the human brain and exists physically for the first time.
This makes the first cracks in data ownership—someone could carve some prehistoric graffiti over the painted information long after the creator left the scene.
Things get trickier when humans start using abstract symbols to convey information as codified, copyable data - aka writing.
Just look at poor Alulu, a real brewer who sold some beer to a customer in ancient Sumeria.
We know this because a small clay receipt that recorded the transaction survives to this day.
Alulu lost control of his data because it was recorded in a copyable and portable format.
However, while written data was now portable, so few people could copy - or even understand - the writing on these tablets that data ownership was still easy to preserve.
When people swap clay and stone for papyrus and paper, data becomes even harder to control.
Paper as we know it today originated in China, and from there spread around the globe.
The earliest surviving scrap of paper - a piece of a map - dates to 179 BCE.
Low literacy rates are still helping to maintain control over data, as any copying must be done by hand by trained scribes and most of the world still can’t read or write… but not for much longer.
In 1440 Johannes Gutenberg invents the printing press and makes it possible to mass produce data for the first time ever.
200 years later, over 50% of the population in England could read, making it one of the first societies to break that mark.
In 1949, the first Xerox photocopiers hit the market and the era of mass data copying begins in full.
Xerox is so good at data copying it becomes a synonym for it. Airplanes make it easy to mail data around the globe.
Advances in technology mean that data is quickly getting out of control. In the span of just 600 years, humanity has achieved the ability to make copies with minimal effort. Advances in literacy mean these copies can be understood by millions of people.
But it’s still paper-based, meaning it could be physically locked away, and not everyone had a photocopier.
New and more advanced machines now enter our history of data ownership.
Punch cards and early computers make technology and data intertwined.
The first modern computer is developed in 1958, and the first microchip is right around the corner.
In the late 60s, ARPANET - an early internet used by researchers and scientists - hints at what is to come in terms of large-scale data copying.
Things start to change fast, as data slips further out of our control.
Data has become extremely easy to copy, and data proliferation starts to become a real problem with major consequences for control and ownership.
Copying isn’t quite as easy as a mouse click yet, and not everyone has access to this technology, but the digital seeds have been sown.
Oracle launches the first commercial relational database in 1979, setting a new standard for data storage… and new depths for data control.
Applications and app-specific data silos follow close behind.
Just a dozen years later, America Online introduces millions of everyday people to the web - and the world of large-scale cut-and-paste data copying.
In 1994, 23-year-old Netscape engineer Lou Montulli invents the browser cookie.
Real cookies are good, but browser cookies invisibly collect and copy data for advertisers, setting the stage for the user tracking and targeted marketing we all know so well today.
By 1996, digital data storage is more cost-effective than paper, officially ending paper’s thousand-year reign as king of data.
Data is fast becoming impossible to control. It exists almost entirely digitally, and can not only be copied but transported around the world instantly and with little effort.
When the Apple App Store launches in 2008 and introduces the world to “an app for everything,” the era of data silos is in full swing.
There’s a database for every app, and copies of data for every database.
New technologies like Data Lakes and Data Warehouses emerge to connect the silos through a process known as data integration.
Which really involves making more copies of data than ever before.
Like the loss of control created by Lou’s cookies, this loss of ownership is largely unseen by human eyes, happening without anyone even knowing.
It’s been over 200-thousand years since Ug first interpreted data with his powerful brain, and over 40-thousand years since data was first recorded in any meaningful way.
Data copies are now rampant and people have next to no control over their information.
But a new innovation aims to change all that by digitally replicating what worked so well in the first place - Ug’s beautiful brain.
After 40 years of Web 2.0, defined by data silos and copies, new technologies that square the circle of integration and control are entering the scene.
Innovations like Blockchain, data fabrics, and dataware are reshaping the data management landscape by doing things like decentralizing data into nodes and using data links instead of copies.
In 2022, the Zero-Copy Integration framework is established to provide innovators with an approach to build new technology without creating new databases or copy-based data integration.
This is the start of the elimination of silos and copies that have been the main threats to data ownership in the digital era.
Giving control of data back to the people and organizations who create it - just the way Ug would have wanted.
If you’re ready to join the zero-copy revolution, visit datacollaboration.org
About the Data Collaboration Alliance
The Data Collaboration Alliance is a nonprofit that’s dedicated to establishing CONTROL as the foundation of meaningful data ownership and global collaborative intelligence. Our approach of eliminating copies to achieve this goal is similar to how societies already protect the value of currency, identity, and intellectual property - and it works for data, too. Our advisors include the Executive Director of the Mozilla Foundation and the ex-CIO of Dropbox.
In concert with our partners, we're accelerating the establishment of new technologies, standards, and methodologies in data management and application development that support a future for technology that's more controlled, collaborative, and efficient.
Key programs:
Community - our members access free privacy apps and data crowdsourcing
Partner Success - amplifying products, content, and leaders
Advocacy - we support 'zero copy' technologies, standards, and methodologies
Research - we support proofs of concept for web3 interoperability
Software For Good - our partners support data-centric research teams
Speakers Bureau - thought leaders available for media and events
Stay up-to-date with our progress by subscribing to our newsletter and podcast and following us on LinkedIn and Twitter.