In(credible): Uncovering Useful Data for Your Next Content Piece

In September 2015, Harvard Business Review published an article proclaiming data to be the next big thing in content marketing. By then, we at Distilled had been producing data-driven content for at least three years: from static infographics and survey-driven PR campaigns to data-driven quizzes and complex interactives.

In this post, I’m going to talk about where to find great data for your content: from national statistics and APIs to newsletters and podcasts. Before we get into it, it’s important to answer one question.

Why get data-driven in the first place?

For one reason or another, when producing and promoting content for our clients at Distilled, we rarely compete with their business competitors. Instead, we are competing with their content competitors who are often publishers – publishers with hundreds of editorial staff paid to come up with stories to win the attention of their ever-demanding audience.

So, in order to get our client the placements and exposure they need, we have to stand out from the noise and deliver something more insightful, interesting and compelling than journalists can come up with. One way to do this is with data.

Using data to tell stories shows an aspiration to the truth. Leaving opinion pieces to the journalists, a data-driven piece lets you talk about a given subject in a way that reveals something previously unknown about it, reaffirms existing beliefs, or presents the topic in a completely new light. Doing one of the above successfully is what has allowed us to get our clients lots of coverage and our work featured in some of the top publications in the world.

So, how do you get started? Where do you begin?

The journey to data-driven content doesn’t always start with data. More often than not, I start with a hunch for an idea that leads me to questions like “Can we find data on X?”, “Is there any data out there that would show Y?”, or “Where can we find comprehensive information on Z?”. This is when I start eagerly googling for answers to those questions and looking for data to substantiate my ideas.

First as a Data Journalist, and then as a Creative Lead, I have been doing this for four years now, and in this post I’m going to tell you about some of my favourite places to find great data to give our ideas legs.

Data can be the key to creative content - but it can be hard to know where to start.

My go-to sources

These sources are available free of charge, easy to access and provide a tonne of data collated in an easily digestible way. All you need to make these work is an internet connection and spreadsheet software.

National statistics

Probably the most easily accessible and therefore most used source of data is that collated by governments. In the UK, it’s the Office for National Statistics, and in the US, it’s the Census Bureau, along with a whole set of other statistical agencies detailed on this page.

Both countries have made considerable efforts to make their data available online and have developed very handy search engines to find it. Sifting through datasets using these search engines isn’t as straightforward as browsing the more neatly organised websites. However, it gives you an opportunity to go from search to datasets quicker, provided you know what you’re looking for.

Global statistics

Statistics on the European Union are collected and published by Eurostat, which collects the data you’d normally find on websites of national statistics authorities of individual countries in one place, and in English.

For all sorts of global stats, there are websites like Gapminder, World Bank Data, and Google Public Data Explorer that allow you to compare the countries of the world on a variety of metrics.

Finally, there’s also OECD Better Life Index – a visual comparison of what life is like in 35 of the most developed countries of the world in terms of housing, education, and well-being. Data behind it can be downloaded from the FAQ page.

Open sources

First of all, there’s Wikipedia. I can hear your guffaws already - who uses Wikipedia as a credible source? Well, we never use it as our primary source, but it’s a great place to discover other, more credible sources. The great thing about Wikipedia is that it’s full of lists: people, airports, places, brands – and these lists can well serve as a seed dataset for your next piece of content.

In our recent piece called The Ventures of the PayPal Mafia, for example, we started with a list on Wikipedia and worked through the sources cited in the References section of the page.

Ventures of the PayPal Mafia was inspired by a simple Wikipedia list.

In this category are also open sources aimed at a particular topic like MusicBrainz and SportReference.

Industry reports

Depending on what industry your client is in, there will often be professional associations or organisations that conduct research and collect data on the industry in question.

Other than industry-specific organisations, top accounting firms like Deloitte, KPMG, and PwC are known for their extensive research into a range of industry sectors and markets. Although their reports are not data-rich in terms of ready-made spreadsheets with stats, they offer reliable insight into an array of industries, with tables of key stats to substantiate your content creation efforts.

More advanced sources

These sources require a little bit more effort to use: they either come in massive datasets that will exceed the row limit in Excel or require a bit of tech savvy to access or parse the data.

Government surveys

Beyond the neatly collated spreadsheets that live on the national statistics sites, are larger datasets that underpin some of the core data you find on the likes of ONS and Census Bureau. They are datasets from regular government-conducted surveys on matters like family and well-being, jobs and employment, earnings and income.

In the UK, you can access these datasets via UK Data Service. Unfortunately, you will have to pay £450 for any commercial use of data, but that’s worth the money if your idea hinges on the dataset.

In the US, it’s a lot more open and you can access the datasets of the big government surveys free of charge.

Some of the most popular surveys include:

  • American Time Use Survey – detailing time Americans spent studying, taking care of others, sleeping, working, exercising, shopping, commuting, etc.

American news publications use these sources to produce some of the best pieces of journalism, like this Wall Street Journal Pay Gap interactive or the New York Times piece on Middle-Class Jobs. The data is there for the taking. Granted, the data isn’t as easily parseable as the succinctly collated national statistics spreadsheets you’d find online, but the insights to be gained from it are definitely worth the effort.
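To give a feel for what working with a file too big for a spreadsheet looks like, here is a minimal sketch in Python. It reads a CSV one row at a time and keeps only running totals, so the file never has to fit into Excel (a worksheet tops out at 1,048,576 rows). The column names and the three sample rows are hypothetical and only stand in for a real survey extract, which will have its own layout and codebook.

```python
import csv
import io
from collections import defaultdict


def total_minutes_by_activity(lines):
    """Sum the `minutes` column per `activity`, reading one row at a time."""
    totals = defaultdict(int)
    for row in csv.DictReader(lines):
        totals[row["activity"]] += int(row["minutes"])
    return dict(totals)


# Hypothetical three-row extract. With a real file you would pass
# open("your_file.csv", newline="") instead of this in-memory sample.
sample = io.StringIO(
    "respondent_id,activity,minutes\n"
    "1,sleeping,480\n"
    "2,working,450\n"
    "3,sleeping,510\n"
)

totals = total_minutes_by_activity(sample)
print(totals)
```

Because only the totals are held in memory, the same function should cope with millions of rows; the trade-off is that you need to decide what to aggregate before you start reading.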

Data repositories

There aren’t too many of these left, but there are a few worth mentioning. Freebase was long the database behind Google’s Knowledge Graph. It has since been shut down, but all the information Google has on films, music, art, people, places, and things is still out there, all in a hefty 22GB download.

Another entrant on this list is Kaggle. Kaggle is an online platform for competitions in data science and machine learning, which has accumulated a vast array of datasets over the years and made them available to the public. With a varied selection of topics covered, Kaggle is the place to find data on everything from YCombinator companies to Shark Attacks.


APIs

Disclaimer: always check the terms of use of an API before using it. Some explicitly prohibit commercial use, which often means you’re not allowed to use them without permission.

Most web services you use on a daily basis, such as Google, Facebook, Instagram, and Spotify, offer access to the data that underpins them via APIs. The same goes for more topical resources like RottenTomatoes or Flickr. The data is offered mostly for the purposes of building applications and integrations with the aforementioned services, but it can equally be used for data mining.

One of our most successful pieces of content to date, the Food Capitals of Instagram, was based on the data on hashtagged photos we got from the Instagram API.

Accessing data from APIs tends to require some basic scripting. A lot of them return data in XML or JSON, so if you are comfortable with XPath scraping or writing a few lines of code in JavaScript – great. If not, don’t worry. Google Sheets and our Guide to ImportXML can help, along with Paul Gambill’s excellent write-up of ImportJSON. Google Sheets can only get you so far, though, so if you’re thinking about large-scale data mining, it might be worth doing it, or getting someone to do it for you, in Python, Ruby, or JavaScript.
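For a rough idea of what the scripting route involves, here is a small Python sketch that turns a JSON response into spreadsheet-ready CSV. The response body, the "data" key and the field names are all made up for illustration; every real API has its own structure, so check its documentation (and its terms of use) first. The download step is left out on purpose, so the example runs without an API key or a network connection.

```python
import csv
import io
import json

# Hypothetical response body; real APIs define their own field names and nesting.
response_body = """
{
  "data": [
    {"tag": "pizza", "city": "Naples", "photo_count": 1200},
    {"tag": "sushi", "city": "Tokyo", "photo_count": 3400}
  ]
}
"""


def json_to_rows(body, fields):
    """Pick the named fields out of each record under the (assumed) "data" key."""
    records = json.loads(body)["data"]
    return [[record.get(field, "") for field in fields] for record in records]


fields = ["tag", "city", "photo_count"]
rows = json_to_rows(response_body, fields)

# Write a header plus the rows as CSV text that a spreadsheet can import.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(fields)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

Using `record.get(field, "")` rather than `record[field]` means a record with a missing field produces an empty cell instead of stopping the script, which tends to be the friendlier behaviour when you are pulling thousands of records.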

Unlikely sources

These aren’t data sources as such, but I still feel compelled to share them with you because I find them increasingly useful in my day-to-day job.


Newsletters

Yes, seriously. I’m sure you’ve noticed that newsletters have made a comeback recently and, sure enough, there’s one for data junkies like you and me. Data is Plural is a mailing list that will send you 5-6 datasets every week. The selection of datasets you get is incredibly varied, covering topics such as coups, technology questions, millions of fires, world heritage sites, and paperwork.

Other newsletters you should consider subscribing to include Data Elixir and Data Science Weekly. These are data science newsletters in which, amongst updates and news from this emerging field, you will occasionally find interesting datasets to explore.


Podcasts

If you’re into podcasts already – great, these are just a few more to add to your favourites. If you’re not, but you’re into data – you will love these. Now, obviously, you don’t get the data from the podcast as such. However, if you listen to these two, you can hear about cool datasets being used and where they were taken from.

Data Stories is a podcast about data and data visualisation led by Moritz Stefaner and Enrico Bertini – two prominent figures in the dataviz world. Every episode is an interview with either a data journalist or a dataviz designer talking about one of their recent projects. They discuss the ideas behind it, how it was created, what data they used, and how they got it to work.

Partially Derivative is a data science podcast where the hosts discuss the role of data in a multitude of fields of human endeavour: from art, science, and journalism to politics, sport, and sex. Other than hearing fascinating stories about how people use data to advance our understanding of the world, you’ll catch references to the types of data they use and where they find it.

Sources from other infographics

This feels a bit like cheating, but trust me, it’s fine. If you’re not convinced by Picasso, who said that good artists copy and great artists steal, read this book about how creativity is all about bringing together different sources of inspiration and information.

If you search for <topic> + infographic in Google Images you will find a lot of, well, badly designed infographics. However, if you’re not on the hunt for visual inspiration, these could prove quite useful in figuring out what sources of data are available on a given subject.

Some of our best work has started this way: we found infographics on the web, sussed out their sources, and produced great pieces of content that offered a much better user experience as well as bringing something new to the existing narrative.

Daily Routines of Famous Creatives, one of my all-time favourites, was inspired by this static infographic. We were looking to do a piece of content on habits and their role in creativity and found the static piece that, if nothing else, led us to the data source that made our piece possible – the book called Daily Rituals.

The Daily Routines piece was inspired by an existing infographic.


The list is by no means exhaustive; it’s a mere collection of resources that I have accumulated over the years. Every time we start on a new piece of content, I have to go away and do research from scratch. However, the resources above are the sites and databases I find myself returning to most often.

It wouldn’t be a data post without a spreadsheet, so here’s a GSheet with all the sources mentioned above.

If I had to leave you with one thought at the end of this round-up post, it’s this. The more unique your dataset, the more likely your story will be original. This doesn’t mean you can’t find original stories in the stats readily available from the ONS or the Census Bureau. But if the dataset is readily available, chances are a keen journalist has already got to it and potentially written about it. So, use different sources, mash your datasets together and dig out insights.
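Mashing datasets together usually boils down to joining them on something they share, such as a country or a date. Here is a minimal Python sketch of that idea. The two datasets, their country names and every figure in them are invented purely for illustration, and `join_on_key` is a helper written for this example rather than a function from any library.

```python
# Two hypothetical datasets keyed by country; all figures are invented.
population = {"Country A": 1_000_000, "Country B": 200_000}
bookshops = {"Country A": 50, "Country B": 30, "Country C": 12}


def join_on_key(left, right):
    """Inner join: keep only the keys that appear in both datasets."""
    return {key: (left[key], right[key]) for key in left if key in right}


merged = join_on_key(population, bookshops)

# Derive a metric neither source offers on its own: bookshops per 100,000 people.
per_100k = {
    country: round(shops / people * 100_000, 2)
    for country, (people, shops) in merged.items()
}
print(per_100k)
```

Note that "Country C" drops out because it has no population figure. Checking which rows fall out of a join like this is a good habit, as mismatched names between sources are a common reason for missing data.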

Now, go on and find some data for your new piece of content! And if you have any questions at all about any of the above, do drop me a line in the comments.
