You’ve probably summarized data with a number — like an average. For instance, the *median* is an average. It tells you what number is at the middle of your data, if you were to sort all the numbers from smallest to biggest. But an average says nothing about how those numbers are spread out.

Think about the average transaction value for an online store. Is the average high because there are many sales near that price? Or is it that a few large sales make the average look bigger? A *histogram* will show you which is the truth.

A histogram is an image summarizing how numbers are spread out. It breaks data into *buckets*, and shows how many numbers are in each bucket. Usually, each bucket has the same *width*. The width of a bucket is which numbers go in that bucket. A bucket that would hold any of the numbers 1 2 3 4 5 has the same width as a bucket that holds the numbers 6 7 8 9 10. Both have a width of 5.

Let’s try splitting numbers into these buckets. Call the buckets [1, 5] and [6, 10]. We’ll use the numbers (1, 2, 3, 4, 4, 4, 9). Now count how many numbers are in each bucket: [1, 5] ← 6 and [6, 10] ← 1. As a table:

Bucketing (1, 2, 3, 4, 4, 4, 9) |
||

Bucket | Count | Numbers |

[1, 5] | 6 | 1 2 3 4 4 4 |

[6, 10] | 1 | 9 |

median = 4 |

The median and our buckets say different things about the same data. The median says, “the number 4 is at the middle of this data. ” Which is true! The histogram says, “most of the data is in [1, 5]... only a single value is in [6, 10].” Which is also true! Bucketing gives us facts about the spread of numbers that averages cannot.

That’s all the theory for today! Let’s walk through the steps necessary to build a histogram from scratch. Then, we’ll analyze the example histogram to understand what it reveals.

*These steps are all replicable in Excel, too — right down the function names.*

>> See the example Sheet <<

## First, choose how many buckets you need

*Follow along on the “Buckets” tab of the example sheet.*

A histogram lumps similar numbers into buckets, and shows how many fall into each bucket. First, we need to know how many buckets to use. You can choose any number. A good choice will be small enough to summarize the data, but large enough to show interesting facts.

Here’s one approach:

- Count how many numbers you have.
- Take the square root of the count.
- Round the square root up to the nearest whole number. You can’t have half a bucket!

Say you have 48 data points. Then, √48 ≈ 6.93, which rounds up to 7. That’s how many buckets to use.

## Choose the width of the buckets

*Follow along on the “Buckets” tab of the example sheet.*

Once you know the number of buckets, you must find a width for the buckets that covers all your data. If you choose a width that is too small, then your buckets might not include every number in your data.

This is an important choice. The point of the histogram is to make the spread of numbers easier for people to see. So, the choice of bucket width should make the data easier to understand. In general, it’s best to use multiples of 2, 5, or 10 for this. You’ll see what I mean in the example.

Here’s the process:

- Find the range of your data. The range is the biggest number minus the smallest.
- Divide the range by the number of buckets.
- Round up to the nearest “nice” number.

In the example sheet, the biggest number is $5,473. The smallest is $20. So the range is $5,453.

The number of buckets is 7. $5,453 / 7 = $779. This rounds up to $800, which is the nearest “nice” multiple of 2 and 10. We’ll use $800 as our bucket width. If we chose a width of, say, $750, then the total range of our buckets would be 7 ⨉ $750 = $5250. Since $5250 is smaller than $5,453 some of our numbers would fall outside of any bucket!

Regardless of how you wind up drawing the chart, you should choose the width of the buckets yourself. Don’t let a spreadsheet app choose for you.

## Sort your data into buckets

*Follow along on the “Buckets” tab of the example sheet.*

Depending on how you chart your data, you may not need this step. But it’s a useful skill, so try it anyway!

First, you need to determine the smallest value to chart. In the example, we’ve chosen to start at zero. Often (at least in marketing) this will be a reasonable and easy choice. Transaction amounts start from free ($0) and get bigger from there. If you need the chart to start above zero, start at the nearest “nice” number below the smallest number in your data.

Now, make a list with the biggest number contained in each bucket. This is easy with a spreadsheet. Start with the first number (the first “top end” of a bucket). Get the rest of the rows by taking the number in the row before it and adding whatever the width of a bucket is:

*Note that there is no “0” row in our list of buckets. Each number is the upper bound of a bucket.*

Now, let’s count how many numbers are in each bucket. Fortunately, Sheets will do this for us! We’ll use the FREQUENCY function. The result looks like this:

This cell is where we’re using FREQUENCY.

FREQUENCY is counting the number of data points at or below each boundary. So $800 exactly would actually be counted in the $1,600 bucket. Don’t worry about this. But if anyone asks — now you know!

## Create the histogram as a bar chart

*Follow along on the “Buckets” tab of the example sheet.*

Sheets has a built-in histogram tool. Don’t use it. It’s hard to control what charts you get.

The good thing is, you’ve already done all the work you need to draw the histogram without fancy tools. The built-in bar chart is good enough for you! It’s dead simple, and much easier to get a good chart.

Start by selecting the buckets and count:

Then click **Insert > Chart**:

Sheets should decide to insert a bar chart. If it doesn’t, select “Column” as your chart type in the editor:

...and you’re done! You should have something that looks like this chart:

### This is a histogram. You’ve done it!

## Now, inspect your data

Follow along on the “Histogram” tab of the example sheet.

That was a lot of work for a simple chart. Here’s the payoff — and it’s a good one. What do you see when you look at this histogram?

Here’s what I see, in plain English:

- There are two kinds of transactions. A bunch of smaller transactions are in the smallest bucket, and a few bigger ones that are more spread out. (In our business, smaller amounts tend to be subscription payments. Larger amounts are for conference tickets.)
- The thicker bar at $1,600 says “lots of folks are buying one ticket for Distilled’s conference”...
- ...but about half of the “ticket” transactions are for multiple tickets.

If we had used “average transaction value” instead of a histogram, we wouldn’t see that there are two types of transactions!

>> See the example Sheet <<

## Further reading

- Here’s a great summary of how to create a histogram — this is where I started when I wrote this post: http://www.oswego.edu/~srp/stats/hist_con.htm
- Ilya Grigorik writes about why page weight can’t be described with averages. He uses a handful of histograms in this post to explain: https://www.igvita.com/2016/01/12/the-average-page-is-a-myth/

### Get blog posts via email

##### About the author

Ben is a Principal Consultant who joined Distilled in 2010. Now he focuses on leveling up our team. Through group training and internal consultation, he guides team members as they effect change for our clients. read more