“Damned to be concrete”: Considering productive uncertainty in data visualization

Marx, V. (2013). “Data visualization: Ambiguity as a fellow traveler.” Nature Methods, 10(7), 613-615.

In their musings on the role of uncertainty in social networks and educational attainment, Jordan and McDaniel (in press) bring to the forefront the interesting concept of “productive uncertainty” (p. 5). The idea is that while uncertainty is not always pleasant, and while learners will often seek to minimize it, the experience is not without value. Marx (2013), in discussing the complexities and shortcomings common among data visualizations, expands on this concept: uncertainty, particularly in a statistical context, can illuminate new characteristics of the data, or point to new methodologies that address shortcomings in collection or analysis. Data visualizations themselves, however, can obscure or outright hide this level of detail. So how do we visualize data in a way that is both simple and transparent?

“[With visuals], we are damned to be concrete” (Marx, 2013, p. 613).

Marx (2013), using examples from genomic and biomedical research, raises an uncomfortable tension: in discussing scientific results, researchers often feel compelled to gloss over, if not outright obscure, uncertainty in their data. That uncertainty can arise from inconsistent collection, imperfect aggregation, or even unexpected results. These “unloved fellow travelers of science” (p. 613), however, cannot live visually in the kind of “grey area” that Marx contends they often occupy in text. When creating an honest visualization, then, researchers must decide to what extent they will account for study uncertainty. In explaining the potential impact of this decision, Marx urges researchers to strongly consider two points: first, that uncertainty may have implications for the data itself; and second, that a transparent treatment of uncertainty strongly shapes “what comes next.”

Thus, Marx (2013) is explicitly pushing a productive, rather than negative, framing of uncertainty in the data or the wider study; at the same time, she acknowledges that even in the specific context of biomedical research, the pull to minimize uncertainty when broadly discussing results persists.

Down the rabbit hole: Analysis can create uncertainty too

One should also consider the process, largely mathematical in this context, of moving from a raw dataset to a clean visualization. Common steps in creating data visualizations, particularly in genomics and the biomedical sciences, include aggregating data from different sources (and thus different methods of collection) and summarizing large, complex markers into something more easily digestible. By forcing disparate collection methods into a uniform shape, or by collapsing distinct study groups or grouped variables into summaries, an important level of detail is lost. These processes can themselves obscure the data, which in turn obscures uncertainty for the end audience, whose exposure to the study may lie wholly in the visualization. Going somewhat down the rabbit hole, the analysis itself can therefore create new uncertainty.
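
To make that loss of detail concrete, here is a minimal sketch with invented data (nothing below comes from Marx’s article): two collection sites with very different spreads collapse into near-identical summary values once only the mean is reported.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "site": ["A"] * 100 + ["B"] * 100,
    "value": np.concatenate([
        rng.normal(10.0, 0.5, 100),   # site A: a tightly controlled collection method
        rng.normal(10.0, 5.0, 100),   # site B: a much noisier collection method
    ]),
})

# Reporting only the mean makes the two sites look interchangeable...
print(df.groupby("site")["value"].mean())
# ...while the spread that distinguishes them, and the uncertainty it
# implies, disappears from the summary unless we ask for it explicitly.
print(df.groupby("site")["value"].std())
```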

Certainly, simplicity is important in a data visualization; however, as Marx argues, researchers also have an obligation to consider that by glossing over details of uncertainty, or by creating new sources of uncertainty through their analyses, they risk leaving the wider community with a poorer understanding of their work, or with unfounded assumptions about their findings.

Missing data, in particular, presents a complex dilemma. Marx (2013) gives the example of a genomic sequencing experiment seeking to map a stretch of genetic material that contains 60 million bases:

“The scientists obtain results and a statistical distribution of sequencing coverage across the genome.  Some stretches might be sequenced 100-fold, whereas other stretches have lower sequencing depths or no coverage at all…But after an alignment, scientists might find that they have aligned only 50 million of the sought-after 60 million bases…This ambiguity in large data sets due to missing data—in this example, 10 million bases—is a big challenge” (p. 614).

Unlike data that is statistically uncertain, or uncertain by virtue of its collection methods, missing data is a true gap whose effect is difficult to express and explain truthfully.
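
The arithmetic behind the example is worth making explicit. In the minimal sketch below, only the quoted figures (60 million bases sought, 50 million aligned) come from the article; the function and its name are illustrative, not anything Marx describes.

```python
def missing_fraction(bases_sought: int, bases_aligned: int) -> float:
    """Fraction of the target region left with no aligned coverage."""
    return (bases_sought - bases_aligned) / bases_sought

sought = 60_000_000   # bases the experiment set out to map (from Marx's example)
aligned = 50_000_000  # bases actually aligned (from Marx's example)

print(f"{sought - aligned:,} bases missing "
      f"({missing_fraction(sought, aligned):.0%} of the target)")
# -> 10,000,000 bases missing (17% of the target)
```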

So how do we show uncertainty visually?

Marx suggests several methods for including uncertainty visually when presenting data. Broadly, a representation of uncertainty can be layered on top of the visualized data itself, for example by using color coding or varying levels of transparency to mark more and less certain values. A visualization can also account for uncertainty separately from the data, for example by adding a symbol that denotes certainty or its absence. She also discusses presenting contrasting analyses of similar (or the same) data that have reached differing conclusions; considered alongside their methods of analysis, this inclusion of multiple viewpoints can also round out a discussion of uncertainty.
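
As one concrete illustration of the first approach, the sketch below maps each point’s certainty to its transparency, so less certain observations literally fade. The data is invented, and this is only one of many possible encodings, not code from the article.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = np.arange(20)
y = rng.normal(5.0, 1.0, size=20)
certainty = rng.uniform(0.2, 1.0, size=20)  # 1.0 = fully certain (invented scale)

# Encode certainty in the alpha channel: less certain points fade out.
colors = np.zeros((20, 4))
colors[:, 2] = 1.0          # blue channel
colors[:, 3] = certainty    # per-point alpha carries the uncertainty layer

fig, ax = plt.subplots()
ax.scatter(x, y, c=colors, s=60)
ax.set_xlabel("measurement")
ax.set_ylabel("value")
ax.set_title("Less certain points drawn more transparently")
plt.show()
```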

In addition to understanding how to represent uncertainty visually, however, one should also consider how and when (during a study or its analysis) one should tabulate uncertainty. One platform looking to incorporate uncertainty holistically into data visualization is Refinery. In particular, Marx notes that this system seeks to find “ways to highlight for scientists what might be missing in their data and their analysis steps” (p. 614), addressing uncertainty situated in both the data and the analysis. As shown below, the system considers uncertainty at every step of the data analysis, rather than only at the end, giving a more rounded picture of how uncertainty has influenced the study at every level.

“The team developing the visualization platform Refinery (top row) is testing how to let users track uncertainty levels (orange) that arise in each data analysis step” (Marx, 2013, p. 615).

In the graphic, the blue boxes represent data at different stages of the analysis. In the top row, orange represents the uncertainty that may arise during each analytical step, culminating in the orange error bars on the bar graph at the far right, which are far more comprehensive in what they capture. The light blue bar in the bottom row shows the theoretical disparity when error is only taken into account at the end of an analysis. Even if the difference in magnitude is not as dramatic as the graphic suggests, researchers working as in the top row can better account for what causes, or has caused, their error; they are better able to situate their uncertainty.
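
The article does not describe Refinery’s internals at this level of detail, but the bookkeeping idea can be sketched. In the hypothetical example below, every step name and uncertainty value is invented, and combining relative errors in quadrature is simply one standard choice; the point is that a running estimate at each step shows where uncertainty accumulates, rather than presenting a single figure at the end.

```python
import math

# (analysis step, relative uncertainty that step is assumed to contribute)
steps = [
    ("collection",    0.04),
    ("alignment",     0.06),
    ("aggregation",   0.03),
    ("summarization", 0.02),
]

running = 0.0
for name, u in steps:
    # Combine independent relative errors in quadrature, step by step,
    # so the analyst can see where the uncertainty actually accumulates.
    running = math.sqrt(running**2 + u**2)
    print(f"after {name:<13} cumulative uncertainty ~ {running:.1%}")
```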

A picture may be worth a thousand words, but does it have to tell one story?

Analyzing data is often a narrative process; however, as Marx (2013) suggests, there can be consequences to how one tells the story. Washing over uncertainty, both in preparing and in discussing results, can be misleading, limiting both a researcher’s true understanding of their own data and the collaborations or theories that use the data as a foundation for further study. Marx, however, is not disparaging researchers who fail to consider uncertainty as dishonest; she is promoting the idea that treating uncertainty as positive, or productive, can lead research in novel directions.

Sources

Jordan, M. E., & McDaniel, R. R. (in press). “Managing uncertainty during collaborative problem solving in elementary school teams: The role of peer influences in robotics engineering activity.” The Journal of the Learning Sciences, 1-49.
