Sketching Sketchy Bar Charts

People systematically misinterpret bar charts of averages. In a pair of clever drawing experiments, Wellesley researchers highlight three surprisingly common bar-driven fallacies.

Three hand-drawn bar charts showing underestimated variability.

A fun thing that happens with dataviz: You spend hours and hours researching a story, poring over the data, and lovingly crafting a perfect chart. Then you present it to your client or boss or followers, and they point out a crucial, fatal flaw:

“OMG, why isn’t this a bar chart?!”

You take a deep breath. You try not to reveal that a little piece of your soul has died, leaving a smudge of black ash, just to the left of your heart.

But you’re a professional! You also know there’s merit to their feedback. Bar charts are known for being low-fuss and straightforward, and they’re often a better choice than something needlessly fancy.

But are bar charts really as straightforward as they seem? Let’s test it out. 

A bar chart showing selected results from Wilmer & Kerns' study.
This is a bar chart, where each bar represents responses to a different bar chart. Overall it shows that people severely misinterpret bar charts (just like this one). So there’s a good chance you’ll misinterpret this bar chart, just like Wilmer and Kerns’ participants misinterpreted their bar charts. You can find out by taking the test below. Then send us your results, so we can make yet another bar chart about people misinterpreting bar charts of misinterpreted bar charts. And we can recursively create misunderstood bar charts from now until infinity.

Above is a bar chart about misinterpreting bar charts (Ooo, meta!). It shows how accurately people interpreted four different bar charts, covering four different topics from Wilmer & Kerns’ experiment: gender, social science, clinical, and aging. Each bar represents the average OLI interpretation score across 133 participants. OLI is an accuracy measure we’ll unpack later.

How would you interpret this chart? Even without understanding “OLI,” you can still read some of the basic facts: 

  • The “Aging” bar chart did the best (27), with a nine-point gap to the next-best “Clinical” chart (18).
  • The “Gender” bar chart did the worst, with the lowest average score (8).  
  • All four charts had average scores below 50, indicating inaccurate interpretations. 

But there’s more to dataviz than parroting these basic facts. Visualizations also create a mental impression about the shape of the underlying data, which influences our interpretations in important ways.

When you imagine the data behind this chart, what does your mental image look like? If you were to draw out the data points behind these averages, where would they go? 

If you’d like to check your interpretation, pause here, grab some paper and a pen, then:

  1. First, draw a quick outline of the chart above. 
  2. Then imagine the individual test scores for the people who took Wilmer & Kerns’ test. 
  3. Then for each of the four bars, draw 20 dots to represent the scores for 20 random participants. 
  4. Then read on to find out how to interpret your interpretation.

Collage of 134 bar chart interpretation drawings, submitted for Jeremy Wilmer and Sarah Kerns’ study: “What’s really wrong with bar graphs of mean values: variable and inaccurate communication of evidence on three key dimensions.”

Jeremy Wilmer and Sarah Kerns, researchers at Wellesley College, ran a pair of drawing experiments where they asked participants to redraw a set of four different bar charts, then annotate the bars with dots representing where they imagined the underlying data might be in the original dataset. Notably, they did not make participants suffer through “bar charts about bar charts,” but we won’t hold that against them.

In their 2022 study, 134 participants responded with a stack of 536 hand-drawn bar charts. At 20 dots per bar, they ended up with 40,000+ dots that they could scan, analyze, and compare, to understand how people imagine the underlying distributions behind the charts.

Their headline result: 76% of those 536 sketches showed at least one major misinterpretation of the data behind the stimuli charts. That is, most people misinterpreted most of the bar charts.

Top left: a realistic distribution of data behind averages. Top right: a distribution showing too little variability. Bottom left: a distribution showing the data below the average. Bottom right: a distribution that's suspiciously uniform.
Four plots showing four different ways people might interpret a bar chart of averages. The dots represent viewers’ imagined distributions for the data points underlying the bar charts. The green plot represents an “accurate” interpretation; the three orange plots represent common biased interpretations found in Wilmer 2022.

For the topics in the study (as well as our meta bar chart), a realistic interpretation would look like the green plot above, showing distributions that are centered(-ish) around the average, widely dispersed, overlapping between categories, and probably with some normal(-ish) shape to them.   
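
To get a feel for what “widely dispersed” and “overlapping” mean in practice, here is a minimal Python sketch. It is not from the study: the two group averages, the spread, the sample size, and the normal shape are all hypothetical values picked for illustration.

```python
import random
import statistics

def simulate_group(mean, sd, n, rng):
    """Draw n normally distributed scores around a group average."""
    return [rng.gauss(mean, sd) for _ in range(n)]

def share_above(scores, threshold):
    """Fraction of scores strictly above a threshold."""
    return sum(s > threshold for s in scores) / len(scores)

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical values, not taken from Wilmer & Kerns' data.
group_a = simulate_group(mean=60, sd=15, n=1000, rng=rng)
group_b = simulate_group(mean=50, sd=15, n=1000, rng=rng)

mean_a = statistics.mean(group_a)
mean_b = statistics.mean(group_b)

print("average of A:", round(mean_a, 1))
print("average of B:", round(mean_b, 1))
print("share of B above A's average:", round(share_above(group_b, mean_a), 2))
```

Under these assumptions, a bar chart would show A clearly ahead of B, yet roughly a quarter of the individuals in B still land above A's average. That is the overlap a realistic sketch should show.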

But only 24% of the 536 sketches looked like the green plot. Instead, when they imagined the underlying data, most participants showed one of three common misinterpretations: 

  1. Underestimating variability: 217 (40%) drawings underestimated the amount of variability in the distribution. So even though these distributions should overlap across categories, people assumed there was little-or-no overlap. Wilmer and Kerns refer to this as the “dichotomization fallacy” because, in effect, it leads to viewers falsely assuming there are clear divisions between categories. The “OLI” score in the meta bar chart refers to the “overlap index” in the study.
  2. “Within the bar” bias: 129 (24%) drawings showed the data points falling inside the area of the bar, rather than balanced around the bar’s end point which represents the average. 
  3. False uniformity: 125 (23%) drawings showed the data as evenly distributed and ignored the shape of the distribution. 
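
If you drew your own dots earlier, the three patterns above can be roughed out as simple checks. The Python sketch below is an illustration only: the function names and the measures are hypothetical stand-ins, not the scoring method or the overlap index used by Wilmer and Kerns. Each function takes the vertical positions of the dots drawn for a bar.

```python
def within_bar_share(dots, bar_height):
    """Share of dots drawn at or below the bar's tip (the average)."""
    return sum(d <= bar_height for d in dots) / len(dots)

def overlap_share(dots_a, dots_b):
    """Share of all dots that sit in the range covered by both groups."""
    low = max(min(dots_a), min(dots_b))
    high = min(max(dots_a), max(dots_b))
    if low > high:
        return 0.0  # the two clouds of dots never meet
    both = list(dots_a) + list(dots_b)
    return sum(low <= d <= high for d in both) / len(both)

def middle_third_share(dots):
    """Share of dots in the middle third of the drawn range."""
    low, high = min(dots), max(dots)
    span = high - low
    if span == 0:
        return 1.0  # every dot sits at the same height
    lower_cut = low + span / 3
    upper_cut = high - span / 3
    return sum(lower_cut <= d <= upper_cut for d in dots) / len(dots)

# Hypothetical drawing: every dot tucked inside a bar of height 10.
inside = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(within_bar_share(inside, bar_height=10))  # 1.0
print(overlap_share([1, 2, 3], [7, 8, 9]))      # 0.0
print(middle_third_share(inside))
```

As a loose reading guide: a balanced drawing puts about half its dots below the bar's tip, so a `within_bar_share` near 1.0 hints at the “within the bar” bias. An `overlap_share` of zero between two bars hints at underestimated variability. Evenly spread dots leave only about a third of them in the middle third of the range, while a bell-like shape packs in more. Any cutoffs you choose for these are judgment calls.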

Not only are these misinterpretations common, they can also have serious consequences. For example, underestimating variability can lead to overestimating the differences between chart categories, with downstream impacts like:

  • Mistaking coincidence for causality.
  • Nudging business leaders into overpaying for ineffective equipment.
  • Misleading patients into accepting medical treatments they might otherwise decline.
  • Reinforcing harmful stereotypes about people from marginalized communities. 

What does this mean for dataviz?

Many of us have heard the critique “that could have been a bar chart.” But despite their perceived simplicity, bar charts of averages can be surprisingly misleading. 
