As you know, every second of every day, we’re generating and acquiring new data. As you read this, someone is collecting data on the fact that you are reading it. Someone is collecting data on your device type, your browsing habits, where you fit within a larger pattern. They’re predicting your interests, values, maybe even political leanings based on these behaviors.
But the person who collects your data may only have one use for it. Others may be able to extract completely different insights. Or perhaps the person collecting the data can only get half the picture from what they observe on their own.
Only by trading the information they collect can they each draw out maximum value. And that’s where the concept of data marketplaces began.
What are data marketplaces?
A data marketplace is essentially an online store or platform where you can buy various types of data from different sources. That could include market data, research, demographic and psychographic data, personal information and data from advertising. The vendor may offer this in specific formats for a particular client or may vary the way they mix and structure the data.
As big data has got, well, bigger (and more complex), data marketplaces have become a key feature in the data economy. Today, all types of organizations and clients buy from data marketplaces, from individual data scientists and analysts through to governments and market intelligence agencies. Companies increasingly understand that the data they collect is an asset not only to themselves but to others.
How do they benefit data scientists?
Data marketplaces are particularly valuable to data scientists because they have the potential to overcome many common pitfalls that arise when you’re relying too much on internally sourced or generated data.
Part of this, as we’ll see in a moment, is because the interests and objectives of data sellers and data buyers are more closely aligned.
Some of the most pertinent benefits are:
- Access to the broader data economy
First, using a data marketplace means you are, in essence, opening yourself up to the wider data economy. By accessing a vast library of crowdsourced data, you avoid the risk of basing your models on single-source data. If a single source contains errors or reflects too small a data pool to give you the big-picture insights you need, this can seriously distort your prediction models. Far better to broaden your perspective with data from the broader marketplace.
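To make the single-source risk concrete, here is a minimal sketch. The source names and figures are invented for illustration, and taking the median is just one simple way of pooling sources, not a recommendation from any particular marketplace.

```python
from statistics import median

# Hypothetical average-order-value estimates (in dollars) for the same
# customer segment, as reported by five independent sources.
# "source_c" contains an error (imagine cents recorded as dollars).
estimates = {
    "source_a": 52.0,
    "source_b": 48.5,
    "source_c": 5100.0,
    "source_d": 50.5,
    "source_e": 49.0,
}


def single_source_estimate(data, source):
    """Trust one source completely, errors and all."""
    return data[source]


def pooled_estimate(data):
    """Take the median across sources, which one bad source cannot drag far."""
    return median(data.values())


only_c = single_source_estimate(estimates, "source_c")
pooled = pooled_estimate(estimates)

print(f"Relying on source_c alone: {only_c}")
print(f"Pooling all five sources:  {pooled}")
```

A model fed only the faulty source inherits its error in full, while the pooled figure stays close to what the other four sources report.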
- Cleaner, neater datasets
Secondly, we’re all guilty of letting things fall into disarray when we’re the only ones who will see them. There’s a big difference between preparing any kind of product for personal use and creating something you intend to sell. When you buy data from a data marketplace, you know that sellers have an incentive to keep their data properly organized and accessible.
- Standardized data delivery
On a similar note, even if sellers of data don’t feel personally motivated to clean up data properly before they put it on the market, they might have to do so. Exchanging data on a common platform inevitably requires a level of standardization in order to function.
If datasets and models are already aligned for the purpose of sharing them between buyers and sellers, this potentially saves you a lot of hard work when you first come to combine data from multiple sources, all purchased through the data marketplace. However, bear in mind that many emerging marketplaces are still finding their feet — unstructured data is still a challenge for now.
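As a rough illustration of the work that standardization saves, the sketch below maps records from two sellers onto one shared schema before combining them. The seller names, field names, and mappings are all hypothetical; a real marketplace would define its own schema.

```python
# Hypothetical mappings from each seller's field names to a shared schema.
FIELD_MAPS = {
    "seller_a": {
        "cust_id": "customer_id",
        "zip": "postal_code",
        "spend_usd": "spend",
    },
    "seller_b": {
        "CustomerID": "customer_id",
        "PostCode": "postal_code",
        "TotalSpend": "spend",
    },
}


def to_common_schema(record, seller):
    """Rename known fields to the shared schema and drop everything else."""
    mapping = FIELD_MAPS[seller]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


# Illustrative records; "internal_flag" has no place in the shared schema.
records_a = [{"cust_id": "001", "zip": "10001", "spend_usd": 120.0}]
records_b = [
    {"CustomerID": "002", "PostCode": "SW1A 1AA", "TotalSpend": 75.5, "internal_flag": True}
]

combined = [to_common_schema(r, "seller_a") for r in records_a]
combined += [to_common_schema(r, "seller_b") for r in records_b]

for row in combined:
    print(row)
```

When a marketplace already enforces a common schema, this mapping step is done for you. When it doesn't, you end up maintaining one mapping per seller.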
- A more equitable exchange
Finally, one common concern when it comes to centralized repositories is that those who own the data have little control over pricing and sales, while consumers are limited in who they can buy from. Smaller, self-service data marketplaces instead give providers and buyers more choice, both in terms of setting prices and choosing who to buy from.
This model also addresses some issues of consent. Some approaches to data collection, such as web scraping, mean collecting masses of data in ways that bring no rewards to the original owners of the data. This is increasingly viewed as unfair and unethical. Data marketplaces are a good way to keep everything above board and ensure that data owners are fairly compensated.
- Better use of resources
Collecting data manually is a time-consuming, fiddly, and often expensive task. Many data scientists would prefer to dedicate their valuable time to tasks that better reflect their unique skillsets, like building prediction models that deliver critical insights, rather than the grunt work of trawling through badly presented data. If you can purchase the data you need from a data marketplace for a reasonable sum, this can save you a ton of time and energy. However, buying data doesn’t completely remove the time-consuming and resource-heavy acquisition process. Plus, there’s still no guarantee that the data you buy will improve your model.
In some cases, like the web scraping example described above, by the time you’ve harmonized your data so that it’s fit for purpose, you may find that it would have been more cost-effective to simply buy properly standardized data from the owner in the first place.
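A back-of-the-envelope comparison shows how that calculation might go. Every figure below (hours, hourly rate, infrastructure cost, purchase price) is a made-up assumption; substitute your own numbers.

```python
def scraping_cost(collection_hours, cleaning_hours, hourly_rate, infra_cost):
    """Total cost of collecting and harmonizing the data yourself."""
    return (collection_hours + cleaning_hours) * hourly_rate + infra_cost


def cheaper_option(build_cost, purchase_price):
    """Return 'buy' if the marketplace price is lower than the in-house cost."""
    return "buy" if purchase_price < build_cost else "build"


# Hypothetical inputs: 20 hours of scraping, 60 hours of cleaning,
# $75 per hour of analyst time, and $500 of infrastructure.
build = scraping_cost(
    collection_hours=20, cleaning_hours=60, hourly_rate=75.0, infra_cost=500.0
)
choice = cheaper_option(build, purchase_price=4000.0)

print(f"In-house cost: ${build:,.2f}")
print(f"Cheaper option: {choice}")
```

In this invented scenario, cleaning accounts for most of the in-house cost, which is exactly the part that properly standardized data lets you skip.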
- Jump straight to the good stuff
When you’re buying a particular machine learning dataset from a third party, you can pick and choose the data that is actually going to be valuable to you. You aren’t diving into a vast data lake searching for what you need. You aren’t burning through computing power simply to process swathes of irrelevant information in order to track down the little bit that you need right now. You cut straight to the valuable bits.
Data providers have good reason to isolate the kind of high-quality, valuable data you need because they make more money on sales that way. From your point of view, that’s a ton of hassle averted.
Final thoughts: the role of data discovery tools
While data marketplaces are emerging as a vital component of the data economy, they’re still evolving. To get the most out of the information, you need an arsenal of reliable data discovery tools.
That’s because much of the data out there for sale is still being offered in unstructured or unrefined formats, in silos or incompatible data models. For now, you need to be able to harmonize and enrich this data to make it truly useful.
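As a minimal sketch of what harmonizing and enriching can involve, the example below joins in-house records with a purchased dataset whose customer IDs are formatted differently. The field names, ID formats, and the `segment` attribute are hypothetical, and a dedicated tool would handle far messier cases than this.

```python
def normalize_key(raw):
    """Put IDs into one format: no surrounding spaces, upper case, no hyphens."""
    return raw.strip().upper().replace("-", "")


def enrich(internal_rows, purchased_rows):
    """Attach the purchased 'segment' to each internal row, or None if unmatched."""
    lookup = {normalize_key(row["CustomerID"]): row for row in purchased_rows}
    enriched = []
    for row in internal_rows:
        match = lookup.get(normalize_key(row["customer_id"]))
        enriched.append({**row, "segment": match["segment"] if match else None})
    return enriched


# Illustrative in-house records and a purchased dataset with a different ID style.
internal = [
    {"customer_id": "ab-123", "spend": 120.0},
    {"customer_id": "CD456", "spend": 75.5},
]
purchased = [
    {"CustomerID": " AB123 ", "segment": "urban professional"},
]

for row in enrich(internal, purchased):
    print(row)
```

Without the normalization step, the first record would fail to match even though both datasets describe the same customer.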
In fact, even as data marketplaces become more established and sophisticated, you will still face the challenges that come with connecting to specific marketplaces and combining information from multiple sources. Investing now in the right kind of platform, one that can automate much of this for you, will put you in a great position to maximize the opportunities as they come.