Tuesday, April 28, 2015

The truth behind bars: Representations and misrepresentations of data sets in research articles

Figure 1. Multiple data sets can be presented as the same bar graph (Weissgerber et al. 2015).
One of the most common graphical representations of data in research papers is the bar graph. If you read my recent post, you saw that I used one to show the percentages of survey respondents in different groups. However, a bar graph is not the best way to display every type of data collected and analyzed in experimental studies. For a single number, such as a percentage, a bar graph is sufficient. But when you have multiple data points from individuals within a group, representing continuous data, a bar whose height (or length) shows the average, with an error bar showing a measure of variation such as the standard deviation, is likely to hide the true nature of the data. In other words, bar graphs hide the spread of the individual data points, and the full data can lead to altogether different interpretations. It is generally recommended that authors use scatter plots that show every data point, especially in studies with small sample sizes. Very different data sets can be represented by exactly the same bar graph (see Figure 1 from the linked paper). The open-access article published last week in PLOS Biology by Weissgerber et al. gives several informative examples of why this is the case.
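To make the point concrete, here is a minimal Python sketch, assuming a bar graph that plots the mean with a standard-deviation error bar. The three groups are invented numbers (not data from Weissgerber et al.), each chosen to have n = 6, a mean of 5 and a sample SD of about 3.29, so they would produce identical bars and error bars even though one is bimodal, one has a single high value, and one is spread out. The helper name `bar_summary` is hypothetical.

```python
import statistics

# Hypothetical groups (made-up values, not from the paper), n = 6 each,
# constructed so that their mean and sample SD are exactly equal.
groups = {
    "bimodal": [2, 2, 2, 8, 8, 8],
    "skewed": [2, 3, 3, 5, 6, 11],
    "spread": [0, 4, 4, 6, 6, 10],
}


def bar_summary(values):
    """Return the two numbers a mean +/- SD bar graph actually draws."""
    return statistics.mean(values), statistics.stdev(values)


for name, values in groups.items():
    mean, sd = bar_summary(values)
    # Every group gets the same bar height and the same error bar...
    print(f"{name:>8}: mean = {mean:.2f}, SD = {sd:.2f}")
    # ...but the raw points, which a scatter plot would show, differ.
    print(f"{'':>8}  points = {sorted(values)}")
```

Printing the sorted points next to the summary is, in effect, what a scatter plot does visually: the bimodal clusters and the single high value only become visible once every point is shown.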

In this study, the authors performed a systematic review of approximately 700 research articles published in top-tier physiology journals and assessed which types of graphs were used to represent different kinds of data. The authors state in the abstract:


"Papers rarely included scatterplots, box plots, and histograms that allow readers to critically evaluate continuous data. Most papers presented continuous data in bar and line graphs. This is problematic, as many different data distributions can lead to the same bar or line graph. The full data may suggest different conclusions from the summary statistics."
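Box plots address part of the problem by drawing quartiles rather than only a mean and an error bar. The minimal sketch below reuses the same invented groups (hypothetical values, not the paper's data) and computes the five-number summary that a basic box plot is built from. The function name `five_number_summary` is an assumption for illustration. Unlike the mean and SD, these summaries differ between the groups.

```python
import statistics

# Hypothetical groups (made-up values, not from the paper), n = 6 each;
# all three share a mean of 5 and a sample SD of about 3.29.
groups = {
    "bimodal": [2, 2, 2, 8, 8, 8],
    "skewed": [2, 3, 3, 5, 6, 11],
    "spread": [0, 4, 4, 6, 6, 10],
}


def five_number_summary(values):
    """Min, Q1, median, Q3, max: the numbers a basic box plot draws.

    Quartiles come from statistics.quantiles with its default
    'exclusive' method; plotting software may interpolate differently.
    """
    q1, median, q3 = statistics.quantiles(values, n=4)
    return min(values), q1, median, q3, max(values)


for name, values in groups.items():
    lo, q1, med, q3, hi = five_number_summary(values)
    print(f"{name:>8}: min={lo} Q1={q1} median={med} Q3={q3} max={hi}")
```

With only six points per group, even a box plot compresses a lot of information, which is why showing every point remains the safer choice for small samples.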


This should be taken as a nudge for scientists to display their data with more appropriate graphical representations. Most of us are guilty of using bar graphs this way. Better graphs may improve how both researchers and readers interpret and discuss the data, because showing every data point makes the collected data more transparent, pulling back the curtain and exposing the data as it should be.


Weissgerber TL, Milic NM, Winham SJ, Garovic VD (2015) Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm. PLoS Biol 13(4): e1002128. doi:10.1371/journal.pbio.1002128. http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1002128
