The Periodic Table of Data Visualizations

Lengler, R. & Eppler, M. (2007). Towards a periodic table of visualization methods for management. IASTED Proceedings of the Conference on Graphics and Visualization in Engineering. Lecture conducted from Clearwater, FL.

In this article, Lengler and Eppler (2007) discuss the current state of data visualization as an area of academic inquiry, define their focus in terms of visualization type and usage, and develop an infomap that groups similar visualization methods for the convenience of researchers and educators. For the purposes of this article, Lengler and Eppler define a visualization as

“a systematic, rule-based, external, permanent and graphic representation that depicts information in a way that is conducive to acquiring insights, developing an elaborate understanding, or communicating experiences” (p. 1).

Data visualization as a fractured field

Lengler and Eppler (2007) open with a reflection upon the current state of data visualization literature. Described as an “emergent” (p. 1) field, work on data visualization is fragmented across multiple, disparate disciplines, from computer programming to education. The danger, the authors note, is that scholars may pursue theoretical work or breakthrough ideas in parallel, rather than building collaboratively on one another’s work; this, in turn, could impede the development of data visualization research as a distinct field. This characterization of a highly fragmented body of literature strongly reminds me of Dr. Jordan’s (2014) thoughts on her research into educational “uncertainty.” Much of the literature foundational to her thesis comes from disciplines focused on organization or management; likewise, this discussion of categorizing data visualizations is heavily rooted in management research, perhaps owing to Eppler and Lengler’s management backgrounds.

Overlap between management and education

Lengler and Eppler home in on visualization methods that are easily applicable within the field of management; that is, methods that are outcome oriented and strongly focused on problem solving. Because of this problem-solving focus, most if not all of the visualization methods presented translate easily to an educational (or, more specifically, classroom) environment. As the authors put it, the “key for better execution is to engage employees” (p. 2). Through an educational lens, the same could be said of educators’ need to engage their students. Howard (2003) would certainly agree on the importance of considering what Lengler and Eppler term the cognitive, social and emotional challenges facing managers. Visualization methods, to that end, are tools, or “advantages” (p. 2), for better understanding and incorporating the perspectives of employees, and should either help to simplify a discussion or foment new ideas and innovations. This, of course, is also true in reverse: a good visualization will give employees as much insight into their managers as vice versa.

The data visualization of data visualizations

In order to walk the walk, so to speak, the authors create a visualization—specifically, an infomap—to categorize and explore relationships between particular methods of displaying or interacting with data.  They chose visualization methods that were problem-solving or outcome oriented, per their focus on managerial research.  They also chose visualization methods that are easy to produce (though they may vary in complexity). This infomap is visually based upon the Periodic Table of Elements.  The authors note that the Periodic Table, in particular, is an excellent example of a co-opted visual metaphor; while widely recognized and used within several scientific fields, including chemistry, the Periodic Table is also understood outside of a scientific context as a shorthand to group or describe a complex topic.  Their “Periodic Table of Visualization Methods” is given as one of many examples of nonscientific fields using both the structure and shorthand connotation of the Periodic Table to describe something completely beyond chemical elements.

Figure: The Periodic Table of Visualization Methods (Lengler & Eppler, 2007)

To help guide their discussion, Lengler and Eppler classify visualization methods along several axes, beginning with complexity and application. Complexity is treated as an ordinal characteristic: the authors line up similar methods in columns, from simplest at the top to most complex at the bottom. Application is a bit more complicated. Methods are categorized by color into one of six “groups”:

  • Data visualizations, or “visualizations of quantitative data in schematic form” (p. 3);
  • Information visualizations, or “the use of interactive visual representations of data to amplify cognition” (p. 3);
  • Concept visualizations, or “methods to elaborate (mostly) qualitative concepts…through the help of rule-guided mapping procedures” (pp. 3-4);
  • Metaphor visualizations, or “effective and simple templates to convey complex insights” (p. 4), such as story lines;
  • Strategy visualizations, or the “systematic use of complementary visual representations to improve the analysis, development, formulation, communication and implementation of strategies in organizations” (p. 4); and
  • Compound visualizations, or methods that combine two or more of the preceding groupings or formats.

However, the authors also note that the categories listed above are not mutually exclusive; visualization methods can and do belong to multiple “groups.” They attempted to streamline this classification by considering both a method’s complexity (which helps disambiguate compound visualizations) and its interactive intent.

In addition to grouping similar methods, Lengler and Eppler also systematically classify each method in their chart. They consider interaction, or the strengths of a visualization: does it provide an excellent summary or overview of the data, or is it better at drilling down into the details? The authors also take into account what they term “cognitive processes” (p. 4): is the visualization an aid for simplifying a complex concept (convergent thinking), or is it better at jumpstarting new and innovative ideas (divergent thinking)? To view the full infomap in all its interactive glory, with rollover examples of every visualization method listed, please visit: http://www.visual-literacy.org/periodic_table/periodic_table.pdf

Sources

Jordan, M. (2014, June). Managing uncertainty during collaborative problem solving in elementary school teams.  Lecture conducted from Arizona State University, Phoenix, AZ.

“Damned to be concrete”: Considering productive uncertainty in data visualization

Marx, V. (2013). “Data visualization: Ambiguity as a fellow traveler.” Nature Methods, 10(7), 613-615.

In their musings on the importance of uncertainty with regard to social networks and educational attainment, Jordan and McDaniel (In Press) bring to the forefront the interesting concept of “productive uncertainty” (p. 5). This idea allows that while uncertainty is not always pleasant, and while learners will often seek to minimize it, the experience is not without value. Marx (2013), while discussing the complexities and shortcomings common among data visualizations, expands upon this concept: uncertainty, particularly within a statistical realm, can illuminate new characteristics of the data or new methodologies that address shortcomings in collection or analysis. However, data visualizations themselves can obscure or outright hide this level of detail. So how do we visualize data in a way that is both simple and transparent?

“[With visuals], we are damned to be concrete” (Marx, 2013, p. 613).

Marx (2013), using examples from genomic and biomedical research, raises an interesting issue: in discussing scientific results, researchers often feel compelled to gloss over, if not exactly obscure, uncertainty in their data. Such uncertainty can arise from inconsistent collection, imperfect aggregation, or even unexpected results. However, these “unloved fellow travelers of science” (p. 613) cannot exist visually in the kind of “grey area” that Marx contends they often occupy in text. When creating an honest visualization, then, researchers must decide to what extent they will account for study uncertainty. In explaining the potential impacts of this decision, Marx advocates that researchers strongly consider two points: first, that uncertainty may have implications for the data itself; and second, that a transparent consideration of uncertainty strongly affects “what comes next.”

Thus, Marx (2013) explicitly frames uncertainty in the data, and in the wider study, as productive rather than negative; however, she also acknowledges that even among biomedical researchers, there is a pull to minimize uncertainty when discussing results broadly.

Down the rabbit hole: Analysis can create uncertainty too

One should also consider the process—largely mathematical, in this context—of moving from a raw dataset to a clean visualization.  Common steps for creating data visualizations, particularly in genomics and the biomedical sciences, often include aggregating data from different sources (and thus methods of collection) or summarizing large and complex markers into something more easily digestible.  By attempting to standardize disparate collection methods into something more uniform, or by summarizing disparate study groups or grouped variables, an important level of detail is lost.  These processes themselves can obscure data, which in turn obscures uncertainty for the end audience, whose exposure to this study may wholly lie in the visualization.  Going somewhat down the rabbit hole, this in itself can therefore create new uncertainty.

Certainly, simplicity is important in a data visualization; however, as Marx argues, researchers also have an obligation to consider that by glossing over uncertainty, or by creating new sources of uncertainty through their analyses, they may leave the wider community understanding their work less well, or drawing unfounded conclusions from their findings.

In particular, missing data presents a complex dilemma.  Marx (2013) gives the example of a genomic sequencing experiment, seeking to map a stretch of genetic material that contains 60 million bases:

“The scientists obtain results and a statistical distribution of sequencing coverage across the genome. Some stretches might be sequenced 100-fold, whereas other stretches have lower sequencing depths or no coverage at all…But after an alignment, scientists might find that they have aligned only 50 million of the sought-after 60 million bases…This ambiguity in large data sets due to missing data—in this example, 10 million bases—is a big challenge” (p. 614).

As opposed to data that is statistically uncertain, or uncertain by virtue of its collection methods, missing data is an outright absence of information, whose effect is difficult to express and explain truthfully.
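Marx’s point about missing data can be made concrete with a toy calculation. The Python sketch below computes per-base sequencing depth from read intervals and then reports how many positions have no coverage at all, and where those gaps fall. The 12-base “genome,” the reads, and the function names are hypothetical illustrations, not Marx’s data or any real sequencing pipeline; the difference-array approach is simply one common way to compute depth.

```python
def coverage_depth(genome_length, reads):
    """Per-base sequencing depth from half-open (start, end) read intervals.

    Uses a difference array: +1 where a read starts, -1 where it ends,
    then a running sum gives the depth at each position.
    """
    diff = [0] * (genome_length + 1)
    for start, end in reads:
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for i in range(genome_length):
        running += diff[i]
        depth.append(running)
    return depth


def coverage_summary(depth):
    """Count covered vs. missing (zero-depth) positions and the covered fraction."""
    covered = sum(1 for d in depth if d > 0)
    return {"covered": covered, "missing": len(depth) - covered,
            "breadth": covered / len(depth)}


def zero_coverage_gaps(depth):
    """Return half-open (start, end) runs of positions with zero depth."""
    gaps, start = [], None
    for i, d in enumerate(depth):
        if d == 0 and start is None:
            start = i
        elif d > 0 and start is not None:
            gaps.append((start, i))
            start = None
    if start is not None:
        gaps.append((start, len(depth)))
    return gaps


# Hypothetical toy data: a 12-base "genome" and three reads
depth = coverage_depth(12, [(0, 5), (2, 7), (9, 11)])
print(coverage_summary(depth))
print(zero_coverage_gaps(depth))
```

On the toy input, 9 of 12 positions are covered (breadth 0.75), and the 3 missing positions sit in two separate gaps. In Marx’s example the same summary would be 50 of 60 million bases, or about 83%; that single percentage says nothing about where the missing 10 million bases lie, which is exactly the information a depth track could show.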

So how do we show uncertainty visually?

Marx suggests several methods for including uncertainty visually when discussing data. Broadly, she suggests including some representation of uncertainty within a visualization; this can be layered on top of the data visualized, for example by using color coding or varying levels of transparency to indicate more and less certain data. A visualization can also account for uncertainty separately from the data, for example by using an additional symbol to denote certainty or its absence. She also discusses contrasting analyses of similar (or the same) data that have reached differing conclusions; taking their methods of analysis into account, including multiple viewpoints in this way can also round out a discussion of uncertainty.
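The transparency approach can be sketched as a simple mapping from each value’s uncertainty to an opacity, so that less certain data is drawn more faintly. This Python sketch is a hypothetical illustration: the function name, the linear mapping, the 0.2 minimum opacity, and the sample measurements are all assumptions, not anything Marx specifies.

```python
def uncertainty_to_alpha(error, max_error, min_alpha=0.2):
    """Map an uncertainty to an opacity between min_alpha and 1.0.

    Zero uncertainty -> fully opaque (1.0); the largest uncertainty -> min_alpha.
    The linear mapping and the min_alpha default are illustrative assumptions.
    """
    if max_error <= 0:
        return 1.0  # nothing is uncertain, so draw everything opaque
    frac = min(max(error / max_error, 0.0), 1.0)  # clamp to [0, 1]
    return 1.0 - frac * (1.0 - min_alpha)


# Hypothetical measurements: (label, value, standard error)
measurements = [("A", 12.0, 0.5), ("B", 9.0, 2.0), ("C", 15.0, 4.0)]
max_err = max(err for _, _, err in measurements)
for label, value, err in measurements:
    alpha = uncertainty_to_alpha(err, max_err)
    print(f"{label}: value={value}, alpha={alpha:.2f}")
```

Here the most uncertain value (C) is drawn faintest (alpha 0.20) and the most certain (A) nearly opaque (alpha 0.90). The resulting number could serve as the alpha channel of each mark in most plotting libraries; the separate-symbol approach Marx also mentions would instead keep the marks opaque and add an explicit indicator, such as an error bar.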

In addition to understanding how to represent uncertainty visually, however, one should also consider how and when (during a study or its analysis) to tabulate uncertainty. One platform that aims to incorporate uncertainty holistically into data visualization is Refinery. In particular, Marx notes that this system seeks to find “ways to highlight for scientists what might be missing in their data and their analysis steps” (p. 614), addressing uncertainty situated in both the data and the analysis. As shown below, this system considers uncertainty at every step of the data analysis, rather than only at the end, giving a more rounded picture of how uncertainty has influenced the study at all levels.

“The team developing the visualization platform Refinery (top row) is testing how to let users track uncertainty levels (orange) that arise in each data analysis step” (Marx, 2013, p. 615).

In the graphic, the blue boxes represent data at different stages of analysis. Orange, in the top row, represents the types of uncertainty that may arise during each analytical step; as a result, the orange error bars in the bar graph at the far right are calculated much more comprehensively. The light blue bar in the bottom row shows the theoretical disparity when error is taken into account only at the end of an analysis. While the magnitude of uncertainty may not differ in practice as sharply as the graphic suggests, researchers following the top row are better able to account for what causes or has caused error; they are better able to situate their uncertainty.
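Marx does not describe how Refinery computes its error bars, so the following is only a hypothetical sketch of the general idea of tracking uncertainty step by step. It assumes each analysis step introduces an independent relative error and combines them in quadrature (root-sum-of-squares), a standard propagation rule for independent errors; the step names and error values are invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class Step:
    name: str
    rel_error: float  # hypothetical relative uncertainty added by this step


def track_uncertainty(steps):
    """Return (step name, cumulative relative error, share of total variance)
    for each step, assuming independent errors combined in quadrature."""
    total_var = sum(s.rel_error ** 2 for s in steps)
    report, running_var = [], 0.0
    for s in steps:
        running_var += s.rel_error ** 2
        share = s.rel_error ** 2 / total_var if total_var > 0 else 0.0
        report.append((s.name, math.sqrt(running_var), share))
    return report


pipeline = [
    Step("collection", 0.03),
    Step("alignment", 0.04),
    Step("aggregation", 0.12),
]
for name, cumulative, share in track_uncertainty(pipeline):
    print(f"after {name:<12} cumulative error={cumulative:.3f}  "
          f"share of variance={share:.0%}")
```

Under these invented values the cumulative error grows from 0.03 to 0.05 to 0.13. A single end-of-analysis estimate might land near the same total, but it would not reveal that aggregation contributes roughly 85% of the variance; that per-step breakdown is the kind of situated uncertainty the top row of the figure is meant to expose.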

A picture may be worth a thousand words, but does it have to tell one story?

Analyzing data is often a narrative process; however, as Marx (2013) suggests, there can be consequences to how one tells the story. Glossing over uncertainty, in both preparing and discussing results, can be misleading, limiting both a researcher’s true understanding of their own data and any collaborations or theories that use the data as a foundation for further study. Marx, however, is not disparaging researchers who fail to consider uncertainty as dishonest; she is promoting the idea that treating uncertainty as positive, or productive, can lead research in novel directions.

Sources

Jordan, M.E. & McDaniel, R.R. (In Press). “Managing uncertainty during collaborative problem solving in elementary school teams: The role of peer influences in robotics engineering activity.” The Journal of the Learning Sciences, 1-49.

The wide world of Wordles: Discussion of “Participatory visualizations with Wordle”

Viegas, F.B., Wattenberg, M. & Feinberg, J. (2009). “Participatory visualizations with Wordle.” IEEE Transactions on Visualization and Computer Graphics, 15(6), 1137-1144.

In this article, Viegas et al. (2009) introduce “Wordles,” explain how they differ from similar data visualizations, and describe a methodology for discovering certain characteristics of Wordle users and their wider community.

Wordles are a popular form of tag cloud, a common data visualization generally used to represent word frequency in a text, with more frequent words shown in larger font and less frequent words in smaller font. However, an average tag cloud and a Wordle differ in key ways, in both their calculation and their final appearance. In a Wordle, text size scales linearly with word frequency; that is, a word’s font size increases by the same amount each time the word appears. Many tag clouds instead scale font size by the square root of frequency. Additionally, the Wordle algorithm allows words to appear in any free space not occupied by text: for example, inside the space of an “o” or rotated vertically along the side of an “l.” The authors note that these changes were made for aesthetic reasons; however, particularly regarding how text size is calculated, the side effect may be a more straightforward relationship between size and frequency.
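The difference between the two sizing rules is easiest to see with a few numbers. In this Python sketch, the 72-point maximum font size and the word counts are hypothetical, and real Wordle and tag-cloud implementations add details (minimum sizes, layout, rotation) that are not modeled here.

```python
import math


def linear_size(count, max_count, max_font=72.0):
    """Font size proportional to frequency (the linear rule described for Wordle)."""
    return max_font * count / max_count


def sqrt_size(count, max_count, max_font=72.0):
    """Font size proportional to the square root of frequency (common in tag clouds)."""
    return max_font * math.sqrt(count / max_count)


# Hypothetical word counts from some text
counts = {"data": 16, "visual": 4, "chart": 1}
top = max(counts.values())
for word, n in counts.items():
    print(f"{word:>7}: linear={linear_size(n, top):5.1f}pt  "
          f"sqrt={sqrt_size(n, top):5.1f}pt")
```

With these numbers, “data” appears 16 times as often as “chart.” Under the linear rule its font is 16 times larger (72 versus 4.5 points), so for words of similar length its area is roughly 256 times larger; under the square-root rule its font is 4 times larger (72 versus 18 points), and its area roughly 16 times larger, tracking frequency directly.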

The authors also speak to their expectations of the Wordle community as casual infovis and a participatory culture. Casual infovis refers to situations or communities where lay users depict information in a personally meaningful way. Participatory culture refers to the tenor of conversation between the generator of information (or Wordles) and their audience; this very commonly occurs on the Internet, in the form of website user feedback, fan fiction, or comment boards on news stories or blog posts, to name a few examples.

“Wordles in the wild”: Methods and results

Because Wordle does not collect demographic information (users can make and download a graphic without logging in or creating an account), the site has little data to describe its users beyond the Wordles they create. To learn more about the wider community of Wordle users, the authors use a dual approach: research into “Wordles in the wild,” an Internet search for previously created graphics and how they have been used online; and a survey of current visitors to the Wordle site.

“Wordles in the wild” (p. 1139) were initially identified through a Google search. The authors examined the first 500 sites returned for “Wordle,” and used these “prominent” (p. 1139) examples to guide more specific research. Through this process, the authors identified several major categories of Wordle users and of how Wordle graphics are used, the largest being “education.” While this snowball approach is an ingenious way to gather context when little other data exists on how Wordles have been used, it offers very little control over either the completeness or the quality of the data found.

Wordle also placed a survey link on its homepage, asking users to provide feedback about themselves and their graphics. The survey was first piloted for two days and, following feedback and revisions, reposted for one week; the authors do not note specifically what feedback was given or how the survey changed. During the week it was live, the survey received about 4,300 responses, which (assuming one Wordle per user per day, with no user overlap) represents a response rate of about 11%. Although the authors note a margin of error of about 1%, they also recognize that, given difficulties controlling for demographic variables and self-selection bias, the results should be viewed only as “a general guide” (p. 1140).

The authors do admit a significant selection bias in this data, among both “wild Wordles” and survey respondents; they do not delve deeply into demographic data, beyond sex, age and occupation.

Do Wordles even count as a data visualization?

Given the authors’ results, there is little question that Wordle users represent a participatory culture. The authors outline several ways that users collaborate with not only their data, but also their audience. As one example of professional use, journalists, particularly during the 2008 presidential election, used Wordle to illuminate trends in political texts and speeches. The authors also give many examples of personal or “fun” uses, particularly Wordles given as gifts: for baby showers, church groups, and so on.

The authors, however, do note that categorizing the Wordle community as “casual infovis” does not clearly convey some of the community’s more interesting characteristics. For example, “casual” does not quite express the personal connection many users felt toward their Wordle text; over half indicated that they had written it themselves. Also, not all users identify their graphics, or their use of them, analytical or otherwise, as personally meaningful.

Beyond the characteristics of Wordle users, the strong focus on creating Wordles rather than using them as an analytical tool indicates to the authors that Wordles are not being used as intended, or perhaps as expected. Particularly considering the large number of survey respondents who did not understand the significance of word size within a graphic, does this disqualify Wordles from truly being data visualizations?

This may be true in the wider community of users, particularly when considering the Wordles created as Valentine’s Day cards for spouses, or as bridal gifts and birthday presents. Wordles given as gifts, or created for fun, often lack an analytical context. However, I would argue that within education, Wordle is working as intended, and then some. Educators create Wordles of new vocabulary words or Shakespearean sonnets to illuminate classroom discussion; students, likewise, are asked to create new Wordle graphics as an assignment or classroom activity. Bandeen and Sawain (2012) outline several concrete applications for Wordles in class, including (broadly):

  • Understanding major concepts
  • Identifying and defining unfamiliar terms
  • Connecting current passages with previous readings
  • Pointing out unexpected words
  • Identifying missing words
  • Theorizing connections among words

which draw on all levels of Bloom’s taxonomy. In addition to serving as an analytical tool to guide discussion, Wordles (and tag clouds in general) are used collaboratively to explore texts in unique or unusual ways not always apparent on first reading. Whether students are creating or viewing Wordle graphics, and whether or not the graphics are used in a strictly “analytical” sense, they are actively engaging with the material in a meaningful way, both as casual infovis and as a participatory culture.

Sources

Bandeen, H.M. & Sawain, J.E. (2012). Encourage students to read through the use of data visualizations. College Teaching, 60, 38-39.