62  Data Visualization Principles

Data visualization is the discipline of mapping numbers to marks that the eye and mind can read quickly and honestly. A good chart compresses thousands of records into a shape that a reader grasps in seconds, while a poor chart hides structure, exaggerates noise, or misleads outright. This chapter treats visualization not as decoration but as a formal language with its own grammar, its own perceptual constraints, and its own ethics. We move from theory to practice: the grammar of graphics that underlies modern plotting libraries, the perceptual science that ranks one encoding above another, a decision framework for choosing a chart, a catalogue of common deceptions, and the design philosophies of Edward Tufte and William Cleveland that tie the subject together.

62.1 1. The Grammar of Graphics

62.1.1 1.1 From chart types to a generative system

Early plotting tools offered a fixed menu: bar chart, pie chart, line chart, scatter plot. You picked one and fed it data. The trouble is that real questions rarely fit a named template, and the menu approach gives no principled way to combine, layer, or extend graphics. Leland Wilkinson’s The Grammar of Graphics (1999) replaced the menu with a generative system. Rather than enumerate chart types, it defines the components from which any statistical graphic can be assembled, much as a natural language grammar generates sentences rather than listing them.

The core insight is that a chart is a mapping. Variables in a dataset are mapped to visual properties of geometric objects through a set of well defined transformations. Once you describe a graphic this way, the named chart types fall out as special cases. A bar chart and a stacked area chart are not different species; they are the same grammar with different position adjustments and geometries.

62.1.2 1.2 The layered components

Hadley Wickham’s A Layered Grammar of Graphics (2010), the basis of the widely used ggplot2 library, organizes the grammar into a small set of orthogonal layers. A complete specification names each of the following.

  • Data: the table of observations being displayed.
  • Aesthetic mappings: assignments from variables to visual channels such as x position, y position, color, size, and shape.
  • Geometries: the marks that represent observations, such as points, lines, bars, or polygons.
  • Statistical transformations: computations applied before drawing, such as binning for a histogram or smoothing for a trend line.
  • Scales: functions that translate data values into channel values and that define the legends and axes a reader uses to invert that translation.
  • Coordinate systems: the space in which positions are interpreted, most often Cartesian but sometimes polar or geographic.
  • Facets: rules that split the data into small multiples, one panel per subset.

A pseudo specification makes the orthogonality concrete.

plot(data = sales)
  + map(x = month, y = revenue, color = region)
  + geom_line()
  + scale_y(type = "log")
  + facet_by(product)

Changing one layer leaves the others untouched. Swap geom_line for geom_point and you have a scatter plot of the same mapping. Add a stat_smooth layer and a trend line overlays the raw points. This composability is why the grammar of graphics now underlies ggplot2, Vega-Lite, plotnine, and the Observable Plot library. Understanding the grammar lets you reason about any of these tools rather than memorizing each one’s syntax.
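The orthogonality described above can be sketched in plain Python. The `Spec` class below is a hypothetical stand-in for a plot specification, not the API of ggplot2, plotnine, or Vega-Lite; its field names are assumptions chosen to mirror the pseudo specification. It only illustrates that replacing one component leaves the others intact.

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass(frozen=True)
class Spec:
    """Hypothetical plot specification: one field per grammar component."""
    data: str                      # name of the table being displayed
    mapping: tuple                 # (channel, variable) pairs
    geom: str = "line"             # mark type
    y_scale: str = "linear"        # scale applied to the y channel
    facet: Optional[str] = None    # variable that splits the panels

line_plot = Spec(
    data="sales",
    mapping=(("x", "month"), ("y", "revenue"), ("color", "region")),
    geom="line",
    y_scale="log",
    facet="product",
)

# Swapping the geometry changes exactly one component.
scatter_plot = replace(line_plot, geom="point")

changed = [name for name in ("data", "mapping", "geom", "y_scale", "facet")
           if getattr(line_plot, name) != getattr(scatter_plot, name)]
print(changed)  # ['geom']
```

A real library does far more work per layer, but the shape of the specification is the same: independent fields that can be changed one at a time.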

62.1.3 1.3 Why the grammar matters for analysis

The grammar disciplines thinking. Before you draw anything, you must decide which variable occupies which channel, and that decision is the substance of the chart. A frequent analytical mistake is to reach for a chart type and then bend the data to fit it. The grammar reverses the order: state the mapping that answers your question, and the geometry follows. It also exposes hidden choices. A pie chart, for example, is a stacked bar in polar coordinates with angle as the position channel, and naming it that way immediately raises the perceptual question of whether angle is a good channel at all.
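The pie chart claim can be made concrete with a minimal sketch. The function names and the share values below are illustrative assumptions. The code stacks values into intervals along one bar, then reinterprets those same intervals as angles, which is all the change of coordinate system amounts to.

```python
import math

def stack(values):
    """Return (start, end) intervals of a single stacked bar."""
    intervals, running = [], 0.0
    for v in values:
        intervals.append((running, running + v))
        running += v
    return intervals

def to_polar(intervals):
    """Reinterpret stacked positions as angles: the full bar spans 2*pi."""
    total = intervals[-1][1]
    return [(2 * math.pi * a / total, 2 * math.pi * b / total)
            for a, b in intervals]

shares = [50, 30, 20]              # illustrative values
bar = stack(shares)                # [(0.0, 50.0), (50.0, 80.0), (80.0, 100.0)]
pie = to_polar(bar)
for a, b in pie:
    print(round(math.degrees(b - a)))   # 180, then 108, then 72
```

The data and the stacking are identical in both forms; only the channel the reader must decode changes, from length to angle.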

62.2 2. Visual Encoding and Human Perception

62.2.1 2.1 Marks and channels

Jacques Bertin’s Semiology of Graphics (1967) first systematized the visual variables available to a designer: position, size, shape, value (lightness), color hue, orientation, and texture. In modern terms we distinguish marks, the geometric primitives such as points, lines, and areas, from channels, the properties of those marks that encode data such as position, length, area, hue, and saturation. Every chart is a choice of marks and an assignment of data variables to channels.

Not all channels are equal. The central empirical result of the field is that channels form a perceptual ranking, and that ranking should drive design.

62.2.2 2.2 The Cleveland and McGill ranking

William Cleveland and Robert McGill, in a landmark 1984 study, asked subjects to judge quantities encoded by different channels and measured the error. Their results, later replicated and extended in a crowdsourced study by Heer and Bostock (2010), give an approximate accuracy ordering for encoding a quantitative variable, from most to least accurate.

  1. Position along a common scale (aligned axes).
  2. Position along non-aligned but identical scales.
  3. Length.
  4. Angle and slope.
  5. Area.
  6. Volume and curvature.
  7. Color saturation and color hue.

The practical consequence is direct. When precise comparison matters, encode the quantity as position or length. This is why a dot plot or bar chart almost always beats a pie chart: the pie asks the reader to compare angles and areas, channels that sit low on the ranking, while the bar asks for a length comparison along a common scale. It is also why bubble charts, which encode magnitude as area, should be reserved for cases where rough magnitude is enough and precise reading is not required. People systematically underestimate area differences, perceiving area as roughly its actual value raised to a power below one.
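The underestimation of area is commonly modeled as a power law, with perceived size equal to actual size raised to an exponent below one. The sketch below assumes an exponent of 0.7 purely for illustration; reported values vary across studies and this number is not taken from the text. It also shows the usual sizing rule for bubbles, in which radius grows with the square root of the value so that area, not radius, tracks the data.

```python
import math

AREA_EXPONENT = 0.7   # assumption: illustrative value, not a measured constant

def perceived_ratio(actual_ratio, exponent=AREA_EXPONENT):
    """Perceived size ratio under a power-law model of perception."""
    return actual_ratio ** exponent

def bubble_radius(value, radius_per_sqrt_unit=1.0):
    """Radius that makes circle AREA, not radius, proportional to value."""
    return radius_per_sqrt_unit * math.sqrt(value)

# A bubble with four times the area reads as well under four times as large.
print(round(perceived_ratio(4.0), 2))   # 2.64
```

Under this assumed exponent a fourfold difference looks closer to two and a half fold, which is why bubble charts suit rough magnitude rather than precise reading.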

62.2.3 2.3 Preattentive processing and Gestalt grouping

Some visual distinctions are processed in parallel, before conscious attention, typically within about 200 to 250 milliseconds. A single red dot among blue dots, or one long bar among short ones, pops out almost regardless of how many distractors are present. These preattentive attributes, including hue, orientation, size, and motion, are the mechanism behind effective highlighting. Encode the one series you want the reader to notice in a distinct hue and the eye finds it instantly; rely on a text label alone and the reader must search serially.

The Gestalt principles describe how the visual system groups marks into wholes. Proximity, similarity, common enclosure, and connectedness all cause marks to be read as belonging together. Connectedness is the strongest grouping cue, which is the deep reason a line chart communicates a time series so well: the connecting line asserts that the points form one continuous quantity evolving over time. Designers exploit these principles deliberately, for instance using a shared background tint to bind a group of bars, and violate them at their peril, for instance placing a legend far from the marks it explains and forcing the reader to bridge the gap.

62.2.4 2.4 Color used well and badly

Color carries the most cognitive baggage of any channel. Three rules carry most of the practical weight. First, match the color scale to the data type: a sequential scale (light to dark of one hue) for ordered magnitudes, a diverging scale (two hues meeting at a neutral midpoint) for data with a meaningful center such as profit and loss, and a categorical palette of distinct hues for unordered groups. Second, ensure perceptual uniformity, so that equal steps in data produce equal steps in perceived color; the rainbow or jet colormap fails this badly, inventing false boundaries where its lightness jumps and hiding real structure where its lightness is flat. Perceptually uniform maps such as viridis solve this. Third, design for the roughly eight percent of men with color vision deficiency by never relying on red versus green alone and by checking palettes against a color blindness simulator.
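One crude check on a sequential scale is whether its lightness changes in a single direction. The sketch below uses the standard sRGB relative luminance formula as a proxy for perceived lightness. The two palettes are illustrative assumptions and are not taken from any particular library. Passing this check does not establish perceptual uniformity; it only catches the lightness reversals that make rainbow maps misleading.

```python
def relative_luminance(rgb):
    """Relative luminance of an sRGB color given as 0-255 integers."""
    def linearize(c):
        c = c / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_monotonic(values):
    """True if the sequence only rises or only falls."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

# Illustrative palettes (assumed for this sketch, not taken from a library).
single_hue = [(222, 235, 247), (158, 202, 225), (49, 130, 189)]
rainbow = [(0, 0, 255), (0, 255, 255), (0, 255, 0), (255, 255, 0), (255, 0, 0)]

print(is_monotonic([relative_luminance(c) for c in single_hue]))  # True
print(is_monotonic([relative_luminance(c) for c in rainbow]))     # False
```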

62.3 3. Choosing the Right Chart

62.3.1 3.1 Start from the question and the data type

Chart selection follows from two inputs: the analytical question and the measurement types of the variables involved. The question typically falls into one of a few intents.

  • Comparison of values across categories.
  • Trend of a quantity over an ordered dimension, usually time.
  • Distribution of a single variable.
  • Relationship between two or more variables.
  • Part to whole decomposition of a total.

Cross this intent with whether each variable is quantitative, ordinal, or nominal, and a small set of appropriate geometries emerges.

62.3.2 3.2 A working decision guide

The following table captures the common cases. It is a starting point, not a cage.

Intent          Variables                         Recommended chart
-------------   -------------------------------   ------------------------------------
Comparison      1 nominal, 1 quantitative         Sorted bar chart or dot plot
Trend           1 time, 1 quantitative            Line chart
Distribution    1 quantitative                    Histogram or density plot
Distribution    1 nominal, 1 quantitative         Box plot, violin plot, or strip plot
Relationship    2 quantitative                    Scatter plot
Relationship    2 quantitative, 1 nominal group   Scatter plot with color or facets
Part to whole   1 nominal, shares of one total    Stacked bar; rarely a pie

A few notes earn their place. For comparison, sort the bars by value unless the categories have an inherent order such as days of the week; an unsorted bar chart wastes the reader’s effort on a ranking they could have been handed. For distribution, a histogram’s message depends heavily on bin width, so try several. For part to whole, prefer a single stacked bar or a set of bars over a pie whenever there are more than two or three slices, because the angle comparison degrades rapidly with slice count.
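The bin width point is easy to demonstrate. The sketch below bins one made-up sample at three widths; the sample values and the `histogram` helper are illustrative assumptions, and the helper expects every value to be at or above `start`.

```python
def histogram(values, bin_width, start=0.0):
    """Count values into left-closed bins [start + k*w, start + (k+1)*w)."""
    counts = {}
    for v in values:
        k = int((v - start) // bin_width)
        counts[k] = counts.get(k, 0) + 1
    return [counts.get(k, 0) for k in range(max(counts) + 1)]

# Illustrative sample with two clusters, around 2 and around 8.
sample = [1.5, 2.0, 2.2, 2.5, 2.8, 3.0, 7.2, 7.8, 8.0, 8.1, 8.5, 9.0]

print(histogram(sample, bin_width=1))   # [0, 1, 4, 1, 0, 0, 0, 2, 3, 1]
print(histogram(sample, bin_width=5))   # [6, 6]
print(histogram(sample, bin_width=10))  # [12]
```

With a width of 1 the two clusters and the gap between them are plain; at 5 the sample looks flat, and at 10 it collapses into a single bar.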

62.3.3 3.3 Small multiples over overloaded single charts

When a relationship varies across many groups, the instinct to cram every group into one chart with a dozen colored lines produces a tangle often called a spaghetti chart. The better tool is the small multiple, a term Tufte popularized: a grid of identical small charts, one per group, sharing scales and axes. The reader learns the chart once and then scans the grid, with comparison reduced to noticing how the shape shifts from panel to panel. Small multiples scale to dozens of groups where a single overlaid chart collapses at four or five.
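The data side of a small multiple is simple: split the rows by group, but compute the axis limits once from all rows so that every panel shares them. The sketch below assumes rows stored as dictionaries; the field names and figures are made up for illustration.

```python
from collections import defaultdict

def facet(rows, by):
    """Split rows (dicts) into one list per value of the faceting variable."""
    panels = defaultdict(list)
    for row in rows:
        panels[row[by]].append(row)
    return dict(panels)

def shared_limits(rows, field):
    """Axis limits computed once from ALL rows, so every panel shares them."""
    values = [row[field] for row in rows]
    return min(values), max(values)

# Illustrative records; region names and figures are made up.
rows = [
    {"region": "north", "month": 1, "revenue": 10},
    {"region": "north", "month": 2, "revenue": 14},
    {"region": "south", "month": 1, "revenue": 40},
    {"region": "south", "month": 2, "revenue": 35},
]
panels = facet(rows, by="region")
y_limits = shared_limits(rows, "revenue")
print(sorted(panels), y_limits)   # ['north', 'south'] (10, 40)
```

Computing limits per panel instead would let each panel fill its own frame and silently destroy the comparison across panels.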

62.4 4. Avoiding Misleading Visuals

62.4.1 4.1 The axis: truncation and dual scales

The most common deception is the truncated bar axis. Bars encode magnitude through length measured from a baseline, so the baseline must be zero. Starting the axis at a nonzero value multiplies small differences into apparent chasms and is a textbook way to manufacture alarm or excitement from trivial change. The zero baseline rule applies specifically to length encodings. Line charts, which encode value through position rather than length, may legitimately use a nonzero baseline to reveal fine variation, provided the axis is clearly labeled, but even then a truncated line can mislead a careless reader and should be used with disclosure.
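The arithmetic of truncation is worth seeing once. In the sketch below, `apparent_ratio` is a hypothetical helper and the values are illustrative; it compares the drawn lengths of two bars when the axis starts at a given baseline.

```python
def apparent_ratio(a, b, baseline=0.0):
    """Ratio of drawn bar lengths when the axis starts at `baseline`."""
    return (b - baseline) / (a - baseline)

# Illustrative values: a 4 percent real difference.
a, b = 50.0, 52.0
print(apparent_ratio(a, b))               # 1.04  (honest, baseline 0)
print(apparent_ratio(a, b, baseline=49))  # 3.0   (truncated axis)
```

Starting the axis at 49 makes one bar three times as long as the other, although the underlying values differ by only 4 percent.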

The dual y axis chart, plotting two series against two differently scaled vertical axes, is nearly always a trap. The crossing point and relative slopes of the two lines depend entirely on the arbitrary choice of the two scales, so the chart can suggest any correlation the author wishes. Prefer indexing both series to a common base of 100 and plotting them on one axis, or use two stacked panels.
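Indexing to a common base is a one-line transformation. The sketch below uses made-up series in incompatible units; the names and figures are illustrative assumptions.

```python
def index_to_base(series, base=100.0):
    """Rescale a series so its first observation equals `base`."""
    first = series[0]
    return [base * value / first for value in series]

# Illustrative series in incompatible units (values are made up).
revenue_musd = [200.0, 220.0, 250.0]
headcount = [40.0, 42.0, 44.0]

print(index_to_base(revenue_musd))  # [100.0, 110.0, 125.0]
print(index_to_base(headcount))     # [100.0, 105.0, 110.0]
```

Both series now start at 100 and can share one axis, so their relative growth no longer depends on an arbitrary pair of scales. The choice of base period still shapes the picture and should be stated.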

62.4.2 4.2 Area, 3D, and the lie factor

Tufte defines the lie factor as the ratio of the size of an effect shown in the graphic to the size of the effect in the data. An honest graphic has a lie factor near one. Two practices routinely inflate it. The first is encoding a one dimensional quantity as the width and height of an icon at once, so that doubling a value quadruples the displayed area and overstates the change. The second is gratuitous three dimensional rendering of two dimensional data, where perspective makes nearer elements loom larger than their values warrant and occlusion hides smaller ones. A 3D pie chart is the canonical offender, distorting the very angles it asks the reader to compare.
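The lie factor of the scaled icon case can be computed directly. The sketch below measures each effect as a relative change from the first value to the second; the function names and numbers are illustrative assumptions.

```python
def effect_size(first, second):
    """Relative change from the first value to the second."""
    return abs(second - first) / abs(first)

def lie_factor(shown_first, shown_second, data_first, data_second):
    """Effect shown in the graphic divided by effect in the data."""
    return (effect_size(shown_first, shown_second)
            / effect_size(data_first, data_second))

# Illustrative case: the value doubles, and an icon is scaled in both
# width and height, so its drawn area quadruples.
data_before, data_after = 10.0, 20.0
area_before, area_after = 1.0, 4.0

print(lie_factor(area_before, area_after, data_before, data_after))  # 3.0
```

The data grew by 100 percent while the drawn area grew by 300 percent, so the graphic overstates the change by a factor of three.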

62.4.3 4.3 Cherry picking, aggregation, and missing context

Not all distortion lives in the axes. Selecting a flattering time window, a practice visible in many financial charts, can reverse the apparent trend. Aggregation can hide reversal: Simpson’s paradox occurs when a trend present within every subgroup vanishes or flips when the groups are pooled, and only a disaggregated or faceted view reveals it. Omitting the denominator, plotting raw counts where a rate is the meaningful quantity, makes larger populations look like hotspots merely because they are larger. The defenses are disclosure of the full range, presentation of rates alongside counts, and skepticism toward any single number stripped of its base.
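Simpson's paradox is easiest to believe after watching it happen. The counts below are hypothetical, chosen only so that option A has the higher rate in each subgroup yet the lower rate once the subgroups are pooled.

```python
def rate(successes, trials):
    return successes / trials

# Hypothetical counts: (successes, trials) per option and subgroup.
counts = {
    "A": {"easy": (90, 100), "hard": (300, 1000)},
    "B": {"easy": (800, 1000), "hard": (20, 100)},
}

def pooled(option):
    """Rate after summing successes and trials across subgroups."""
    s = sum(c[0] for c in counts[option].values())
    n = sum(c[1] for c in counts[option].values())
    return rate(s, n)

for group in ("easy", "hard"):
    # True for both subgroups: A beats B within each one.
    print(group, rate(*counts["A"][group]) > rate(*counts["B"][group]))
print("pooled", pooled("A") > pooled("B"))  # pooled False
```

The reversal arises because A was mostly applied to the hard cases and B to the easy ones, which is exactly the structure a faceted view would expose.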

62.4.4 4.4 An integrity checklist

Before publishing, audit the chart against a short list. Does every length encoding start at zero? Is the data ink ratio high, or is the message buried under decoration? Are the axes labeled with units? Is the source cited and the sample size visible? Would a reader reach the same conclusion you did, or only the one you wanted them to reach? The last question is the ethical core of the field.

62.5 5. The Principles of Tufte and Cleveland

62.5.1 5.1 Tufte: maximize the data ink ratio

Edward Tufte’s The Visual Display of Quantitative Information (1983) is the field’s most cited text, and its arguments are mostly about subtraction. Tufte defines chartjunk as visual elements that carry no information: heavy gridlines, decorative backgrounds, redundant borders, drop shadows, and the moiré patterns of cross hatching. He proposes the data ink ratio, the proportion of a graphic’s ink devoted to representing data rather than to ornament, and urges that designers maximize it within reason. The practical program is to erase: remove every mark that, if deleted, would cost the reader no information. The result is a sparse, high contrast graphic in which the data, not the frame, draws the eye.

Tufte’s related contributions include the sparkline, a small word sized line chart embedded inline in text to show a trend without breaking the prose, and a sustained insistence on graphical integrity through the lie factor. His critique of decoration can be taken too far, and later researchers have shown that a small amount of memorable visual embellishment can aid recall, but the core discipline of removing the meaningless remains sound.
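A rough text approximation of a sparkline can be built from Unicode block characters. This is a sketch of the idea only: Tufte's sparklines are drawn lines, and the `sparkline` helper and its eight-level quantization are assumptions made here for illustration.

```python
BLOCKS = "▁▂▃▄▅▆▇█"

def sparkline(values):
    """Map each value to one of eight block heights, scaled to the data range."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1          # avoid division by zero for flat series
    last = len(BLOCKS) - 1
    return "".join(BLOCKS[round((v - lo) / span * last)] for v in values)

print(sparkline([1, 2, 3, 4, 5, 6, 7, 8]))   # ▁▂▃▄▅▆▇█
```

Because the output is ordinary text, it can sit inside a sentence or a table cell, which is the property that makes the sparkline useful.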

62.5.2 5.2 Cleveland: graphical perception as an empirical science

Where Tufte argues from taste and principle, William Cleveland argues from experiment. His books The Elements of Graphing Data (1985) and Visualizing Data (1993), together with the 1984 paper with McGill, recast chart design as a question to be answered by measuring human accuracy. The channel ranking of section 2.2 is his most influential result and converts design choices into testable claims.

Cleveland’s constructive contributions are as important as his rankings. He championed the dot plot as a replacement for the bar chart and the pie chart, because it uses the high accuracy position channel while avoiding the heavy ink of bars. He introduced trellis displays, the rigorous form of small multiples with shared scales, into mainstream statistical practice. And he developed loess, locally weighted scatter plot smoothing, so that an analyst could lay an honest, assumption light trend curve over a cloud of points and let the data speak. The thread uniting his work is that visualization is a tool for seeing structure in data, and that the design that lets a person see most accurately is empirically the better design.
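The core step of loess is a weighted straight-line fit in a neighborhood of each point. The sketch below is a simplified version under stated assumptions: it uses tricube weights and a local linear fit, sets the bandwidth just beyond the k-th nearest point, and omits the robustness iterations of Cleveland's full procedure. The function names and the `span` default are illustrative.

```python
def tricube(u):
    """Tricube kernel: weight 1 at distance 0, falling to 0 at distance 1."""
    return (1 - abs(u) ** 3) ** 3 if abs(u) < 1 else 0.0

def local_linear_fit(xs, ys, x0, span=0.5):
    """Weighted least-squares line through the nearest points, evaluated at x0."""
    k = max(2, int(round(span * len(xs))))          # neighbours used
    dists = sorted(abs(x - x0) for x in xs)
    # Bandwidth slightly beyond the k-th nearest point (assumed convention here).
    h = dists[k - 1] * 1.0001 or 1.0
    w = [tricube((x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx else 0.0
    return my + slope * (x0 - mx)

xs = [float(i) for i in range(10)]
ys = [2 * x + 1 for x in xs]          # exact line, so the fit should recover it
print(round(local_linear_fit(xs, ys, 4.5), 6))   # 10.0
```

Evaluating `local_linear_fit` over a grid of `x0` values traces the smooth curve. A smaller `span` follows the data more closely, and a larger one smooths more heavily.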

62.5.3 5.3 Reconciling the two

Tufte and Cleveland are complementary rather than rival. Tufte tells you what to remove; Cleveland tells you which encodings to keep. A graphic that honors both is sparse in ornament and built on high accuracy channels: a sorted dot plot with light reference lines, faceted into small multiples where the data vary by group, with no chartjunk and a lie factor of one. That description doubles as a default recommendation for most analytical charts, and it is no accident: the perceptual ranking and the two design philosophies converge on it independently, and the grammar of graphics expresses it in a handful of layers.

62.6 6. Summary

Visualization is a language whose grammar is the mapping of variables to marks and channels, whose physics is the perceptual accuracy of those channels, and whose ethics is the faithful preservation of the effect sizes in the data. Choose the chart from the question and the data types, prefer position and length to angle and area, subtract everything that is not data, and never let the axes or the aggregation tell a story the numbers do not. These principles outlast any particular tool, and they turn a chart from a picture into an argument that a skeptical reader can trust.

62.7 References

  1. Wilkinson, L. The Grammar of Graphics, 2nd ed. Springer, 2005. https://link.springer.com/book/10.1007/0-387-28695-0
  2. Wickham, H. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics, 2010. https://vita.had.co.nz/papers/layered-grammar.html
  3. Bertin, J. Semiology of Graphics: Diagrams, Networks, Maps. ESRI Press, 2010 (orig. 1967). https://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=185
  4. Cleveland, W. S., and McGill, R. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association, 1984. https://www.jstor.org/stable/2288400
  5. Cleveland, W. S. The Elements of Graphing Data, rev. ed. Hobart Press, 1994. https://www.stat.purdue.edu/~wsc/
  6. Tufte, E. R. The Visual Display of Quantitative Information, 2nd ed. Graphics Press, 2001. https://www.edwardtufte.com/book/the-visual-display-of-quantitative-information/
  7. Heer, J., and Bostock, M. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design.” CHI, 2010. https://idl.uw.edu/papers/crowdsourcing-graphical-perception
  8. Munzner, T. Visualization Analysis and Design. CRC Press, 2014. https://www.cs.ubc.ca/~tmm/vadbook/
  9. Healy, K. Data Visualization: A Practical Introduction. Princeton University Press, 2018. https://socviz.co/
  10. Satyanarayan, A., et al. “Vega-Lite: A Grammar of Interactive Graphics.” IEEE TVCG, 2017. https://idl.uw.edu/papers/vega-lite