62 Data Visualization Principles

Data visualization is the discipline of mapping numbers to marks that the eye and mind can read quickly and honestly. A good chart compresses thousands of records into a shape that a reader grasps in seconds, while a poor chart hides structure, exaggerates noise, or misleads outright. This chapter treats visualization not as decoration but as a formal language with its own grammar, its own perceptual constraints, and its own ethics. We move from theory to practice: the grammar of graphics that underlies modern plotting libraries, the perceptual science that ranks one encoding above another, a decision framework for choosing a chart, a catalogue of common deceptions, and the design philosophies of Edward Tufte and William Cleveland that tie the subject together.

1. The Grammar of Graphics

1.1 From chart types to a generative system

Early plotting tools offered a fixed menu: bar chart, pie chart, line chart, scatter plot. You picked one and fed it data. The trouble is that real questions rarely fit a named template, and the menu approach gives no principled way to combine, layer, or extend graphics. Leland Wilkinson’s The Grammar of Graphics (1999) replaced the menu with a generative system. Rather than enumerate chart types, it defines the components from which any statistical graphic can be assembled, much as a natural language grammar generates sentences rather than listing them.

The core insight is that a chart is a mapping. Variables in a dataset are mapped to visual properties of geometric objects through a set of well defined transformations. Once you describe a graphic this way, the named chart types fall out as special cases. A bar chart and a stacked area chart are not different species; they are the same grammar with different position adjustments and geometries.

1.2 The layered components

Hadley Wickham’s A Layered Grammar of Graphics (2010), the basis of the widely used ggplot2 library, organizes the grammar into a small set of orthogonal layers. A complete specification names each of the following.

Data: the table of observations being displayed.
Aesthetic mappings: assignments from variables to visual channels such as x position, y position, color, size, and shape.
Geometries: the marks that represent observations, such as points, lines, bars, or polygons.
Statistical transformations: computations applied before drawing, such as binning for a histogram or smoothing for a trend line.
Scales: functions that translate data values into channel values and that define the legends and axes a reader uses to invert that translation.
Coordinate systems: the space in which positions are interpreted, most often Cartesian but sometimes polar or geographic.
Facets: rules that split the data into small multiples, one panel per subset.

A pseudo specification makes the orthogonality concrete.

plot(data = sales)
  + map(x = month, y = revenue, color = region)
  + geom_line()
  + scale_y(type = "log")
  + facet_by(product)

Changing one layer leaves the others untouched. Swap geom_line for geom_point and you have a scatter plot of the same mapping. Add a stat_smooth layer and a trend line overlays the raw points. This composability is why the grammar of graphics now underlies ggplot2, Vega-Lite, plotnine, and the Observable Plot library. Understanding the grammar lets you reason about any of these tools rather than memorizing each one’s syntax.
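To make the orthogonality claim concrete without tying it to one library, here is a minimal Python sketch that records a specification as an immutable dataclass. The class `PlotSpec`, its field names, and the helper `changed_fields` are hypothetical. They are not the API of ggplot2, Vega-Lite, or plotnine. The sketch only shows that replacing one layer copies every other layer unchanged.

```python
from dataclasses import dataclass, fields, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class PlotSpec:
    """Hypothetical, library-neutral record of the layers in one graphic."""
    data: str                                # name of the table being plotted
    mapping: Tuple[Tuple[str, str], ...]     # (channel, variable) pairs
    layers: Tuple[Tuple[str, str], ...]      # (geometry, statistic) pairs, drawn in order
    scale_y: str = "linear"                  # y scale type
    coord: str = "cartesian"                 # coordinate system
    facet: Optional[str] = None              # variable that splits the small multiples


def changed_fields(a: PlotSpec, b: PlotSpec) -> list:
    """Names of the fields that differ between two specifications."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


line_chart = PlotSpec(
    data="sales",
    mapping=(("x", "month"), ("y", "revenue"), ("color", "region")),
    layers=(("line", "identity"),),
    scale_y="log",
    facet="product",
)

# Swap the geometry: mapping, scale, and facet are carried over untouched.
scatter = replace(line_chart, layers=(("point", "identity"),))

# Add a smoothed trend line drawn on top of the raw points.
with_trend = replace(scatter, layers=scatter.layers + (("line", "smooth"),))

print(changed_fields(line_chart, scatter))   # ['layers']
print(changed_fields(scatter, with_trend))   # ['layers']
```

Because the record is frozen, every variant is a new object, so the original specification can never be mutated by accident while you experiment with layers.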

1.3 Why the grammar matters for analysis

The grammar disciplines thinking. Before you draw anything, you must decide which variable occupies which channel, and that decision is the substance of the chart. A frequent analytical mistake is to reach for a chart type and then bend the data to fit it. The grammar reverses the order: state the mapping that answers your question, and the geometry follows. It also exposes hidden choices. A pie chart, for example, is a stacked bar in polar coordinates with angle as the position channel, and naming it that way immediately raises the perceptual question of whether angle is a good channel at all.

1.4 The grammar as a formal mapping

It is worth stating the grammar in slightly more formal terms, because doing so clarifies exactly which functions a chart composes. Let a dataset be a table of $n$ observations on $p$ variables, so observation $i$ is a tuple $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})$. A statistical graphic is the composition of three stages.

First, a statistical transformation $S$ maps the raw table to a derived table. For a scatter plot $S$ is the identity. For a histogram $S$ assigns each observation to a bin and returns bin counts. For a boxplot $S$ returns the five-number summary per group. Formally $S$ is any function from the data table to a derived table, each of whose rows will become one mark.

Second, for each derived row $r$ and each visual channel $c$ in use, a scale $f_c$ maps the data value to a channel value, \[ f_c : \text{domain}(v_c) \to \text{range}(c), \] where $v_c$ is the variable assigned to channel $c$. A linear position scale is $f_x(v) = a + b\,v$; a logarithmic scale is $f_x(v) = a + b\,\log v$; a sequential color scale maps an ordered magnitude to a path through a perceptually uniform color space. The legend and the axis are simply the visual presentation of $f_c^{-1}$, the inverse that lets a reader recover the data value from the mark.
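A minimal Python sketch of two position scales, each returned together with its inverse, follows. The names `linear_scale` and `log_scale` are illustrative assumptions. The inverse functions stand in for what an axis or legend lets a reader do by eye.

```python
import math


def linear_scale(d0, d1, r0, r1):
    """Return (f, f_inv) for f(v) = a + b*v, mapping [d0, d1] onto [r0, r1]."""
    b = (r1 - r0) / (d1 - d0)
    a = r0 - b * d0

    def f(v):
        return a + b * v

    def f_inv(c):
        # What an axis lets the reader do: recover v from a position c.
        return (c - a) / b

    return f, f_inv


def log_scale(d0, d1, r0, r1):
    """Return (f, f_inv) for f(v) = a + b*log(v); the domain must be positive."""
    b = (r1 - r0) / (math.log(d1) - math.log(d0))
    a = r0 - b * math.log(d0)

    def f(v):
        return a + b * math.log(v)

    def f_inv(c):
        return math.exp((c - a) / b)

    return f, f_inv


x, x_inv = linear_scale(0, 100, 0, 400)   # data units -> pixels
y, y_inv = log_scale(1, 1000, 0, 300)
print(x(25), x_inv(100))                  # 100.0 25.0
```

Here `y(10)` lands about one third of the way up the 300-unit range, since 10 sits one third of the way from 1 to 1000 on a logarithmic scale.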

Third, the coordinate system $C$ places the channel-valued marks into the plane. Cartesian coordinates apply the identity, $C(x, y) = (x, y)$. Polar coordinates apply $C(\theta, \rho) = (\rho \cos\theta, \rho \sin\theta)$, which is exactly the transformation that turns a stacked bar into a pie. Writing the pie this way makes its cost explicit: the quantity the reader cares about now lives in the angle $\theta$, a low-accuracy channel, rather than in a Cartesian length.

The full graphic is therefore the composition $C \circ f \circ S$ applied to the data, where $f$ denotes the family of scales acting channel by channel and a geometry decides what kind of mark each derived row becomes. Two charts that share $S$, $f$, and $C$ can still differ in geometry and position adjustment. This is the precise sense in which a bar chart and a stacked area chart are one grammar with different geometries rather than two unrelated chart types.
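The composition can be sketched in a few lines of Python. This is an illustrative assumption, not any library's internals. `stat_stack` plays the role of $S$ and `scale` plays $f$. Swapping `cartesian` for `polar` as $C$ turns the far edges of stacked-bar segments into points on a circle, which is the pie.

```python
import math


def stat_stack(values):
    """S: turn category values into the (start, end) intervals of one stacked bar."""
    intervals, running = [], 0.0
    for v in values:
        intervals.append((running, running + v))
        running += v
    return intervals, running


def scale(intervals, total, r_max):
    """f: map each interval linearly from [0, total] onto [0, r_max]."""
    return [(s * r_max / total, e * r_max / total) for s, e in intervals]


def cartesian(a, r):
    """C for a bar: the identity, so a scaled value is already a position."""
    return (a, r)


def polar(theta, rho):
    """C for a pie: the scaled value is read as an angle theta at radius rho."""
    return (rho * math.cos(theta), rho * math.sin(theta))


def graphic(values, r_max, coord):
    """C . f . S: place the far edge of each stacked segment in the plane.

    The second coordinate passed to `coord` is fixed at 1.0: the bar's height
    under cartesian, the pie's radius under polar.
    """
    intervals, total = stat_stack(values)
    scaled = scale(intervals, total, r_max)
    return [coord(end, 1.0) for _, end in scaled]


shares = [50, 30, 20]                              # hypothetical category totals
bar_edges = graphic(shares, 100.0, cartesian)      # edges along a 100-unit bar
pie_edges = graphic(shares, 2 * math.pi, polar)    # the same edges on a unit circle
print(bar_edges)                                   # [(50.0, 1.0), (80.0, 1.0), (100.0, 1.0)]
```

Only the coordinate system (and the range of the scale) changed between the two calls. The stacking and the proportions are identical, which is why the pie hands its quantities to the angle channel.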

2. Visual Encoding and Human Perception

2.1 Marks and channels

Jacques Bertin’s Semiology of Graphics (1967) first systematized the visual variables available to a designer: position, size, shape, value (lightness), color hue, orientation, and texture. In modern terms we distinguish marks, the geometric primitives such as points, lines, and areas, from channels, the properties of those marks that encode data such as position, length, area, hue, and saturation. Every chart is a choice of marks and an assignment of data variables to channels.

Not all channels are equal. The central empirical result of the field is that channels form a perceptual ranking, and that ranking should drive design.

2.2 The Cleveland and McGill ranking

William Cleveland and Robert McGill, in a landmark 1984 experiment, asked subjects to judge quantities encoded by different channels and measured the error. To make the comparison rigorous, they summarized each judgment with a log absolute error. A subject judges what percentage the smaller of two quantities is of the larger, producing a judged value $\hat{p}$ against a true value $p$ (both in percent), and the error for one trial is \[ \text{error} = \log_2\!\bigl(|\hat{p} - p| + \tfrac{1}{8}\bigr). \] The additive constant of one eighth keeps the logarithm finite for perfect judgments, where $\log_2 0$ would be $-\infty$, and stops near-exact trials from dominating the average. Averaging this quantity over many subjects and stimuli gives a single accuracy score per channel, lower being better, and these scores produce the ranking. Their results, refined by the later crowdsourced replication of Heer and Bostock (2010), give an approximate accuracy ordering for encoding a quantitative variable, from most to least accurate.

Position along a common scale (aligned axes).
Position along non-aligned but identical scales.
Length.
Angle and slope.
Area.
Volume and curvature.
Color saturation and color hue.
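The error measure and the per-channel averaging described above can be sketched in Python as follows. The trial data are invented purely to exercise the functions. They are not results from Cleveland and McGill or from Heer and Bostock.

```python
import math
from statistics import mean


def cm_error(judged, true):
    """Cleveland-McGill log absolute error for one trial, both values in percent."""
    return math.log2(abs(judged - true) + 1 / 8)


def channel_score(trials):
    """Mean log error over (judged, true) pairs; lower means more accurate."""
    return mean(cm_error(j, t) for j, t in trials)


# Invented trials, for illustration only (not data from any published study).
position_trials = [(48, 50), (61, 60), (29, 30)]
angle_trials = [(40, 50), (70, 60), (22, 30)]

print(cm_error(50, 50))   # -3.0: a perfect judgment stays finite
print(channel_score(position_trials) < channel_score(angle_trials))   # True
```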

The practical consequence is direct. When precise comparison matters, encode the quantity as position or length. This is why a dot plot or bar chart almost always beats a pie chart: the pie asks the reader to compare angles and areas, channels that sit low on the ranking, while the bar asks for a length comparison along a common scale. It is also why bubble charts, which encode magnitude as area, should be reserved for cases where rough magnitude is enough and precise reading is not required.

The underestimation of area is not a vague tendency but follows a measured law. Stevens’ power law states that perceived magnitude $\psi$ relates to physical stimulus magnitude $\phi$ by \[ \psi = k\,\phi^{\,\beta}, \] where the exponent $\beta$ depends on the channel. For length and position $\beta \approx 1$, so perception tracks the data faithfully. For area the exponent measured by Stevens falls near $0.7$, and for volume nearer $0.5$. Consider two regions whose true areas are in a ten to one ratio, $\phi_2 / \phi_1 = 10$. The perceived ratio is \[ \frac{\psi_2}{\psi_1} = \left(\frac{\phi_2}{\phi_1}\right)^{\beta} = 10^{0.7} \approx 5.0, \] so a reader judging by area sees only about half the true difference. The same tenfold contrast read as length would be perceived as tenfold. This single calculation explains why area and volume encodings compress large differences and why they sit far down the Cleveland and McGill ranking.
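The perceived-ratio calculation is a one-line function. This Python sketch uses the approximate exponents quoted in the text. The table `BETA` is an illustration, not a set of authoritative constants.

```python
# Exponents as quoted in the text above; treat them as approximate.
BETA = {"length": 1.0, "area": 0.7, "volume": 0.5}


def perceived_ratio(true_ratio, beta):
    """Stevens' power law: psi2/psi1 = (phi2/phi1) ** beta; the constant k cancels."""
    return true_ratio ** beta


for channel, beta in BETA.items():
    print(f"{channel:>6}: a 10x difference looks like {perceived_ratio(10, beta):.1f}x")
```

It should print roughly 10.0, 5.0, and 3.2, reproducing the compression described above.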

2.3 Preattentive processing and Gestalt grouping

Some visual distinctions are processed in parallel, before conscious attention, typically within about 200 to 250 milliseconds. A single red dot among blue dots, or one long bar among short ones, pops out regardless of how many distractors are present. These preattentive attributes, including hue, orientation, size, and motion, are the mechanism behind effective highlighting. Encode the one series you want the reader to notice in a distinct hue and the eye finds it instantly; rely on a text label alone and the reader must search serially.

The Gestalt principles describe how the visual system groups marks into wholes. Proximity, similarity, common enclosure, and connectedness all cause marks to be read as belonging together. Connectedness is the strongest grouping cue, which is the deep reason a line chart communicates a time series so well: the connecting line asserts that the points form one continuous quantity evolving over time. Designers exploit these principles deliberately, for instance using a shared background tint to bind a group of bars, and violate them at their peril, for instance placing a legend far from the marks it explains and forcing the reader to bridge the gap.

2.4 Color used well and badly

Color carries the most cognitive baggage of any channel. Three rules carry most of the practical weight. First, match the color scale to the data type: a sequential scale (light to dark of one hue) for ordered magnitudes, a diverging scale (two hues meeting at a neutral midpoint) for data with a meaningful center such as profit and loss, and a categorical palette of distinct hues for unordered groups. Second, ensure perceptual uniformity, so that equal steps in data produce equal steps in perceived color; the rainbow or jet colormap fails this badly, inventing false boundaries where its lightness jumps and hiding real structure where its lightness is flat. Perceptually uniform maps such as viridis solve this. Third, design for the roughly eight percent of men with color vision deficiency by never relying on red versus green alone and by checking palettes against a color blindness simulator.
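One quick, partial test of perceptual uniformity is to check that lightness changes monotonically along a sequential colormap. The Python sketch below converts sRGB hex colors to CIE L* using the standard sRGB transfer curve and a D65 white point. The hex samples for viridis and jet are approximate anchor colors assumed for illustration, not exact tables from any library. A monotonic L* is necessary but not sufficient for uniformity.

```python
def srgb_to_lightness(hex_color):
    """CIE L* (0 to 100) of an sRGB color such as '#21918c' (D65 white)."""
    def linearize(c8):
        c = c8 / 255
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

    r, g, b = (linearize(int(hex_color[i:i + 2], 16)) for i in (1, 3, 5))
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b      # relative luminance Y
    if y > (6 / 29) ** 3:
        return 116 * y ** (1 / 3) - 16
    return y * (29 / 3) ** 3                      # linear segment near black


def is_monotonic(xs):
    """True if the sequence only rises or only falls."""
    steps = [b - a for a, b in zip(xs, xs[1:])]
    return all(s > 0 for s in steps) or all(s < 0 for s in steps)


# Approximate sample colors, assumed here for illustration.
viridis = ["#440154", "#3b528b", "#21918c", "#5ec962", "#fde725"]
jet = ["#00007f", "#0000ff", "#00ffff", "#ffff00", "#ff0000", "#7f0000"]

for name, palette in (("viridis", viridis), ("jet", jet)):
    lightness = [round(srgb_to_lightness(c), 1) for c in palette]
    print(name, lightness, "monotonic" if is_monotonic(lightness) else "NOT monotonic")
```

Viridis should rise steadily from dark to light. Jet climbs to a peak near cyan and yellow and then falls again, which is the lightness jump that paints false boundaries.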

3. Choosing the Right Chart

3.1 Start from the question and the data type

Chart selection follows from two inputs: the analytical question and the measurement types of the variables involved. The question typically falls into one of a few intents.

Comparison of values across categories.
Trend of a quantity over an ordered dimension, usually time.
Distribution of a single variable.
Relationship between two or more variables.
Part to whole decomposition of a total.

Cross this intent with whether each variable is quantitative, ordinal, or nominal, and a small set of appropriate geometries emerges. The following diagram traces the path from intent to geometry.

flowchart TD
    Q["What is the question"] --> CMP["Compare categories"]
    Q --> TRD["Show a trend over time"]
    Q --> DST["Show a distribution"]
    Q --> REL["Show a relationship"]
    Q --> PTW["Show part to whole"]
    CMP --> BAR["Sorted bar or dot plot"]
    TRD --> LIN["Line chart"]
    DST --> HIST["Histogram or density"]
    DST --> BOX["Box or violin by group"]
    REL --> SCAT["Scatter plot"]
    PTW --> STK["Stacked bar, rarely a pie"]

3.2 A working decision guide

The following table captures the common cases. It is a starting point, not a cage.

Intent          Variables                      Recommended chart
-------------   ----------------------------   ----------------------------------------
Comparison      1 nominal, 1 quantity          Sorted bar chart or dot plot
Trend           1 time, 1 quantity             Line chart
Distribution    1 quantity                     Histogram or density plot
Distribution    1 nominal, 1 quantity          Box plot, violin, or strip plot
Relationship    2 quantities                   Scatter plot
Relationship    2 quantities, 1 group          Scatter plot colored or faceted by group
Part to whole   1 nominal, shares of a total   Stacked bar; rarely a pie
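The table can be encoded as a small lookup. This Python sketch restates the rows above and is not a complete decision procedure. The type labels, the name `recommend`, and the fallback message are assumptions made for illustration.

```python
# The rules restate the table above; variable types are written as
# "nominal", "quantitative", or "time".
RULES = [
    ("comparison", ("nominal", "quantitative"), "Sorted bar chart or dot plot"),
    ("trend", ("time", "quantitative"), "Line chart"),
    ("distribution", ("quantitative",), "Histogram or density plot"),
    ("distribution", ("nominal", "quantitative"), "Box plot, violin, or strip plot"),
    ("relationship", ("quantitative", "quantitative"), "Scatter plot"),
    ("relationship", ("nominal", "quantitative", "quantitative"),
     "Scatter plot colored or faceted by group"),
    ("part to whole", ("nominal", "quantitative"), "Stacked bar; rarely a pie"),
]

# Sort the types so the caller may list variables in any order.
LOOKUP = {(intent, tuple(sorted(types))): chart for intent, types, chart in RULES}


def recommend(intent, *variable_types):
    """Look up a default chart for an intent and a set of variable types."""
    key = (intent, tuple(sorted(variable_types)))
    return LOOKUP.get(key, "No default: write out the mapping and choose by hand")


print(recommend("trend", "quantitative", "time"))   # Line chart
```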

A few notes earn their place. For comparison, sort the bars by value unless the categories have an inherent order such as days of the week; an unsorted bar chart wastes the reader’s effort on a ranking they could have been handed. For distribution, a histogram’s message depends heavily on bin width, so try several. For part to whole, prefer a single stacked bar or a set of bars over a pie whenever there are more than two or three slices, because the angle comparison degrades rapidly with slice count.
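The bin-width warning is easy to demonstrate. In this Python sketch, `histogram` is an illustrative helper and the sample is hypothetical data with two clusters. A narrow bin shows the gap between them, while a wide bin merges everything into one lump.

```python
from collections import Counter


def histogram(values, width, origin=0.0):
    """Counts per bin [origin + k*width, origin + (k+1)*width), empty bins included."""
    counts = Counter(int((v - origin) // width) for v in values)
    lo, hi = min(counts), max(counts)
    return {origin + k * width: counts[k] for k in range(lo, hi + 1)}


# Hypothetical sample with two clusters, near 1.5 and near 4.
sample = [1.0, 1.2, 1.4, 1.5, 1.7, 3.6, 3.8, 4.0, 4.1, 4.3]

print(histogram(sample, width=1.0))   # {1.0: 5, 2.0: 0, 3.0: 2, 4.0: 3}
print(histogram(sample, width=5.0))   # {0.0: 10}
```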

3.3 Small multiples over overloaded single charts

When a relationship varies across many groups, the instinct to cram every group into one chart with a dozen colored lines produces a tangle that Tufte calls spaghetti. The better tool is the small multiple: a grid of identical small charts, one per group, sharing scales and axes. The reader learns the chart once and then scans the grid, with comparison reduced to noticing how the shape shifts from panel to panel. Small multiples scale to dozens of groups where a single overlaid chart collapses at four or five.

4. Avoiding Misleading Visuals

4.1 The axis: truncation and dual scales

The most common deception is the truncated bar axis. Bars encode magnitude through length measured from a baseline, so the baseline must be zero. Starting the axis at a nonzero value multiplies small differences into apparent chasms and is a textbook way to manufacture alarm or excitement from trivial change. The zero baseline rule applies specifically to length encodings. Line charts, which encode value through position rather than length, may legitimately use a nonzero baseline to reveal fine variation, provided the axis is clearly labeled, but even then a truncated line can mislead a careless reader and should be used with disclosure.
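A two-bar example shows how quickly a raised baseline inflates an apparent difference. The values are hypothetical and `bar_lengths` is an illustrative helper.

```python
def bar_lengths(values, baseline):
    """Drawn length of each bar when the axis starts at `baseline`."""
    return [v - baseline for v in values]


values = [50, 52]                      # hypothetical: a 4 percent difference
honest = bar_lengths(values, 0)        # [50, 52]
truncated = bar_lengths(values, 48)    # [2, 4]

print(honest[1] / honest[0])           # 1.04: the second bar looks 4 percent longer
print(truncated[1] / truncated[0])     # 2.0: the second bar looks twice as long
```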

The dual y axis chart, plotting two series against two differently scaled vertical axes, is nearly always a trap. The crossing point and relative slopes of the two lines depend entirely on the arbitrary choice of the two scales, so the chart can suggest any correlation the author wishes. Prefer indexing both series to a common base of 100 and plotting them on one axis, or use two stacked panels.
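Indexing to a common base takes a single division per point. In this Python sketch the two series are hypothetical and `index_to_base` is an illustrative helper. Once both series start at 100, they can share one axis and their relative growth is read directly.

```python
def index_to_base(series, base=100.0):
    """Rescale a series so its first value equals `base` (100 by convention)."""
    first = series[0]
    return [base * v / first for v in series]


revenue = [200, 210, 230]        # hypothetical, in millions
headcount = [1000, 1100, 1150]   # hypothetical, in people

print(index_to_base(revenue))    # [100.0, 105.0, 115.0]
print(index_to_base(headcount))  # [100.0, 110.0, 115.0]
```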

4.2 Area, 3D, and the lie factor

Tufte defines the lie factor precisely as \[ \text{lie factor} = \frac{\text{size of effect shown in the graphic}}{\text{size of effect in the data}}, \] where the size of an effect is the relative change, $|v_2 - v_1| / v_1$. An honest graphic has a lie factor near one.

A concrete case shows how fast the factor inflates. Suppose a value grows from $100$ to $200$, a true effect of $1.0$, or one hundred percent. An honest length encoding doubles the bar, also a one hundred percent change, giving a lie factor of one. Now encode the same value as the height of a pictogram while scaling its width proportionally, so the icon’s area carries the quantity. Doubling the linear dimension multiplies the area by $2^2 = 4$, a displayed effect of $3.0$, or three hundred percent. The lie factor is \[ \frac{3.0}{1.0} = 3.0, \] meaning the graphic overstates the change threefold. This is exactly the area trap that Stevens’ power law in section 2.2 already warned against, seen now from the author’s side rather than the reader’s. The second routine inflation is gratuitous three dimensional rendering of two dimensional data, where perspective makes nearer elements loom larger than their values warrant and occlusion hides smaller ones. A 3D pie chart is the canonical offender, distorting the very angles it asks the reader to compare.
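Tufte’s formula and the pictogram case above can be checked with a short Python sketch. The helper names `effect` and `lie_factor` are illustrative. The sketch assumes the displayed quantity is the mark’s length for the bar and its area for the pictogram.

```python
def effect(v1, v2):
    """Tufte's size of effect: relative change from v1 to v2."""
    return abs(v2 - v1) / v1


def lie_factor(shown1, shown2, data1, data2):
    """Effect shown in the graphic divided by the effect in the data."""
    return effect(shown1, shown2) / effect(data1, data2)


data = (100, 200)                     # the value doubles: effect 1.0

bar = (1.0, 2.0)                      # bar length scales with the value
side = (1.0, 2.0)                     # pictogram side scales with the value...
area = tuple(s ** 2 for s in side)    # ...so its area goes from 1 to 4

print(lie_factor(*bar, *data))        # 1.0
print(lie_factor(*area, *data))       # 3.0
```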

4.3 Cherry picking, aggregation, and missing context

Not all distortion lives in the axes. Selecting a flattering time window, a practice visible in many financial charts, can reverse the apparent trend. Aggregation can hide reversal: Simpson’s paradox occurs when a trend present within every subgroup vanishes or flips when the groups are pooled, and only a disaggregated or faceted view reveals it.

The arithmetic is worth seeing once, because it shows the paradox is not a curiosity but a direct consequence of how weighted averages behave. In a documented kidney-stone study, treatment A succeeds in $81$ of $87$ small-stone cases, a rate of $0.93$, and in $192$ of $263$ large-stone cases, a rate of $0.73$. Treatment B succeeds in $234$ of $270$ small-stone cases, $0.87$, and in $55$ of $80$ large-stone cases, $0.69$. Within each stone size A beats B. Yet the pooled rates are \[ \text{A} = \frac{81 + 192}{87 + 263} = \frac{273}{350} \approx 0.78, \qquad \text{B} = \frac{234 + 55}{270 + 80} = \frac{289}{350} \approx 0.83, \] so the pooled comparison reverses and favors B. The reversal happens because A was applied mostly to the hard large-stone cases while B was applied mostly to the easy small-stone cases, so the pooled order is decided by how the case counts are distributed across the strata rather than by the within-stratum rates. A single faceted chart, one panel per stone size, makes the consistent within-group advantage of A visible where the pooled bar would hide it.

Omitting the denominator, plotting raw counts where a rate is the meaningful quantity, makes larger populations look like hotspots merely because they are larger. The defenses are disclosure of the full range, presentation of rates alongside counts, faceting before pooling, and skepticism toward any single number stripped of its base.
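The kidney-stone arithmetic can be verified with exact fractions. This Python sketch uses only the counts quoted in the text. `pooled_rate` adds successes and cases across strata, and `weighted_average` shows that the same pooled rate equals the stratum rates weighted by each treatment’s share of cases, which is where the reversal comes from. Both helper names are illustrative.

```python
from fractions import Fraction

# (successes, cases) by stone size, as quoted in the text.
results = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def stratum_rate(successes, cases):
    return Fraction(successes, cases)


def pooled_rate(strata):
    """Pool by adding successes and cases across strata."""
    successes = sum(s for s, _ in strata.values())
    cases = sum(n for _, n in strata.values())
    return Fraction(successes, cases)


def weighted_average(strata):
    """The same pooled rate, written as stratum rates weighted by case share."""
    total = sum(n for _, n in strata.values())
    return sum(Fraction(n, total) * stratum_rate(s, n) for s, n in strata.values())


for size in ("small", "large"):
    a, b = (stratum_rate(*results[t][size]) for t in ("A", "B"))
    print(f"{size}: A {float(a):.3f}  B {float(b):.3f}  ->", "A" if a > b else "B")

pa, pb = pooled_rate(results["A"]), pooled_rate(results["B"])
print(f"pooled: A {float(pa):.3f}  B {float(pb):.3f}  ->", "A" if pa > pb else "B")
```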

4.4 An integrity checklist

Before publishing, audit the chart against a short list. Does every length encoding start at zero? Is the data-ink ratio high, or is the message buried under decoration? Are the axes labeled with units? Is the source cited and the sample size visible? Would a reader reach the same conclusion you did, or only the one you wanted them to reach? The last question is the ethical core of the field.

5. The Principles of Tufte and Cleveland

5.1 Tufte: maximize the data-ink ratio

Edward Tufte’s The Visual Display of Quantitative Information (1983) is the field’s most cited text, and its arguments are mostly about subtraction. Tufte defines chartjunk as visual elements that carry no information: heavy gridlines, decorative backgrounds, redundant borders, drop shadows, and the moiré patterns of cross hatching. He proposes the data-ink ratio, \[ \text{data-ink ratio} = \frac{\text{ink used to represent data}}{\text{total ink used to print the graphic}}, \] the proportion of a graphic’s ink devoted to data rather than to ornament, and urges designers to maximize it within reason. The ratio lies between zero and one, and the practical program is to push it toward one by erasing: remove every mark that, if deleted, would cost the reader no information. Removing a heavy gridline shrinks the denominator without touching the data, so the ratio rises. Replacing a filled bar with a single dot removes the bar’s interior ink while preserving the one position that carries meaning, which is the direct argument for Cleveland’s dot plot. The result is a sparse, high contrast graphic in which the data, not the frame, draws the eye.

Tufte’s related contributions include the sparkline, a small word sized line chart embedded inline in text to show a trend without breaking the prose, and a sustained insistence on graphical integrity through the lie factor. His critique of decoration can be taken too far, and later researchers have shown that a small amount of memorable visual embellishment can aid recall, but the core discipline of removing the meaningless remains sound.

5.2 Cleveland: graphical perception as an empirical science

Where Tufte argues from taste and principle, William Cleveland argues from experiment. His books The Elements of Graphing Data (1985) and Visualizing Data (1993), together with the 1984 paper with McGill, recast chart design as a question to be answered by measuring human accuracy. The channel ranking of section 2.2 is his most influential result and converts design choices into testable claims.

Cleveland’s constructive contributions are as important as his rankings. He championed the dot plot as a replacement for the bar chart and the pie chart, because it uses the high accuracy position channel while avoiding the heavy ink of bars. He introduced trellis displays, the rigorous form of small multiples with shared scales, into mainstream statistical practice. And he developed loess, locally weighted scatter plot smoothing, so that an analyst could lay an honest, assumption light trend curve over a cloud of points and let the data speak. The thread uniting his work is that visualization is a tool for seeing structure in data, and that the design that lets a person see most accurately is empirically the better design.
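To make the idea of locally weighted smoothing concrete, here is a minimal Python sketch of one local-linear pass with tricube weights. It is a simplification, not Cleveland’s full procedure: it omits the robustness iterations that downweight outliers. The function name `loess` and the default `frac=0.5` (the share of points in each window) are assumptions.

```python
def loess(xs, ys, frac=0.5):
    """Minimal locally weighted linear smoother (tricube weights, no robustness passes).

    For each x0, fit a weighted least-squares line to its nearest neighbours
    and return that line's value at x0.
    """
    n = len(xs)
    k = min(n, max(2, round(frac * n)))   # neighbours that define each window
    fitted = []
    for x0 in xs:
        # Window half-width: distance to the k-th nearest point (guard against zero).
        h = sorted(abs(x - x0) for x in xs)[k - 1] or 1e-12
        # Tricube weights; points at or beyond the half-width get zero weight.
        w = [(1 - (abs(x - x0) / h) ** 3) ** 3 if abs(x - x0) < h else 0.0 for x in xs]
        sw = sum(w)                       # > 0, since x0 itself has weight 1
        mx = sum(wi * xi for wi, xi in zip(w, xs)) / sw
        my = sum(wi * yi for wi, yi in zip(w, ys)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, xs))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, xs, ys))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))   # local line evaluated at x0
    return fitted


# Hypothetical noisy points around a straight line.
x_obs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
y_obs = [0.9, 3.2, 4.8, 7.1, 9.0, 10.8, 13.2, 15.0]
print([round(v, 2) for v in loess(x_obs, y_obs, frac=0.6)])
```

A larger `frac` gives a smoother curve; a smaller one follows the data more closely.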

5.3 Reconciling the two

Tufte and Cleveland are complementary rather than rival. Tufte tells you what to remove; Cleveland tells you which encodings to keep. A graphic that honors both is sparse in ornament and built on high accuracy channels: a sorted dot plot with light reference lines, faceted into small multiples where the data vary by group, with no chartjunk and a lie factor of one. That description doubles as a default recommendation for most analytical charts. It is no accident that the perceptual ranking and both design philosophies converge on it, and the grammar of graphics makes each of its ingredients (the dot geometry, the shared scales, and the facets) a single layer that is easy to specify.

6. When to Use What, and the Recurring Pitfalls

The principles above compress into a short field guide. The defaults below hold for analytical work where accuracy and honesty matter more than novelty.

Reach for position and length first. A sorted bar chart or dot plot answers comparison questions with the highest-accuracy channels, and a line chart answers trend questions because connectedness is the strongest grouping cue. Use these unless you have a specific reason not to.
Reserve area, angle, and color magnitude for cases where rough reading suffices. A bubble chart or a heatmap conveys broad pattern, not precise value, because these channels sit low in the Cleveland and McGill ranking and, for area, Stevens’ exponent is well below one.
Prefer small multiples to overloaded single charts the moment more than three or four groups appear. The reader learns one panel and scans the rest.
Match the color scale to the data type: sequential for ordered magnitude, diverging for data with a meaningful center, categorical for unordered groups, and always a perceptually uniform map rather than the rainbow.

The pitfalls recur with such regularity that they form a checklist of their own: starting a length axis above zero, encoding one quantity as two-dimensional area, rendering flat data in fake three dimensions, juxtaposing two arbitrarily scaled y axes, pooling across strata in a way that hides a Simpson reversal, and plotting counts where a rate is meant. Each of these has appeared above with the mathematics that explains why it deceives. An analyst who can name the mechanism (a lie factor above one, a low Stevens exponent, a pooled average whose weights differ across strata) is far better defended than one who merely memorizes a list of forbidden charts.

7. Summary

Visualization is a language whose grammar is the mapping of variables to marks and channels, whose physics is the perceptual accuracy of those channels, and whose ethics is the faithful preservation of the effect sizes in the data. Choose the chart from the question and the data types, prefer position and length to angle and area, subtract everything that is not data, and never let the axes or the aggregation tell a story the numbers do not. These principles outlast any particular tool, and they turn a chart from a picture into an argument that a skeptical reader can trust.

References

Wilkinson, L. The Grammar of Graphics, 2nd ed. Springer, 2005. https://link.springer.com/book/10.1007/0-387-28695-0
Wickham, H. “A Layered Grammar of Graphics.” Journal of Computational and Graphical Statistics, 2010. https://vita.had.co.nz/papers/layered-grammar.html
Bertin, J. Semiology of Graphics: Diagrams, Networks, Maps. ESRI Press, 2010 (orig. 1967). https://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=185
Cleveland, W. S., and McGill, R. “Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods.” Journal of the American Statistical Association, 1984. https://www.jstor.org/stable/2288400
Cleveland, W. S. The Elements of Graphing Data. Hobart Press, 1994. https://www.stat.purdue.edu/~wsc/
Tufte, E. R. The Visual Display of Quantitative Information, 2nd ed. Graphics Press, 2001. https://www.edwardtufte.com/book/the-visual-display-of-quantitative-information/
Heer, J., and Bostock, M. “Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design.” CHI, 2010. https://idl.uw.edu/papers/crowdsourcing-graphical-perception
Munzner, T. Visualization Analysis and Design. CRC Press, 2014. https://www.cs.ubc.ca/~tmm/vadbook/
Healy, K. Data Visualization: A Practical Introduction. Princeton University Press, 2018. https://socviz.co/
Satyanarayan, A., et al. “Vega-Lite: A Grammar of Interactive Graphics.” IEEE TVCG, 2017. https://idl.uw.edu/papers/vega-lite
Stevens, S. S. “On the Psychophysical Law.” Psychological Review, 64(3), 153-181, 1957. https://doi.org/10.1037/h0046162
Charig, C. R., Webb, D. R., Payne, S. R., and Wickham, J. E. “Comparison of Treatment of Renal Calculi by Open Surgery, Percutaneous Nephrolithotomy, and Extracorporeal Shockwave Lithotripsy.” British Medical Journal, 292(6524), 879-882, 1986. https://doi.org/10.1136/bmj.292.6524.879

# Data Visualization Principles Data visualization is the discipline of mapping numbers to marks that the eye and mind can read quickly and honestly. A good chart compresses thousands of records into a shape that a reader grasps in seconds, while a poor chart hides structure, exaggerates noise, or misleads outright. This chapter treats visualization not as decoration but as a formal language with its own grammar, its own perceptual constraints, and its own ethics. We move from theory to practice: the grammar of graphics that underlies modern plotting libraries, the perceptual science that ranks one encoding above another, a decision framework for choosing a chart, a catalogue of common deceptions, and the design philosophies of Edward Tufte and William Cleveland that tie the subject together. ## 1. The Grammar of Graphics ### 1.1 From chart types to a generative system Early plotting tools offered a fixed menu: bar chart, pie chart, line chart, scatter plot. You picked one and fed it data. The trouble is that real questions rarely fit a named template, and the menu approach gives no principled way to combine, layer, or extend graphics. Leland Wilkinson's *The Grammar of Graphics* (1999) replaced the menu with a generative system. Rather than enumerate chart types, it defines the components from which any statistical graphic can be assembled, much as a natural language grammar generates sentences rather than listing them. The core insight is that a chart is a mapping. Variables in a dataset are mapped to visual properties of geometric objects through a set of well defined transformations. Once you describe a graphic this way, the named chart types fall out as special cases. A bar chart and a stacked area chart are not different species; they are the same grammar with different position adjustments and geometries. 
### 1.2 The layered components Hadley Wickham's *A Layered Grammar of Graphics* (2010), the basis of the widely used ggplot2 library, organizes the grammar into a small set of orthogonal layers. A complete specification names each of the following. - **Data**: the table of observations being displayed. - **Aesthetic mappings**: assignments from variables to visual channels such as x position, y position, color, size, and shape. - **Geometries**: the marks that represent observations, such as points, lines, bars, or polygons. - **Statistical transformations**: computations applied before drawing, such as binning for a histogram or smoothing for a trend line. - **Scales**: functions that translate data values into channel values and that define the legends and axes a reader uses to invert that translation. - **Coordinate systems**: the space in which positions are interpreted, most often Cartesian but sometimes polar or geographic. - **Facets**: rules that split the data into small multiples, one panel per subset. A pseudo specification makes the orthogonality concrete. ```text plot(data = sales) + map(x = month, y = revenue, color = region) + geom_line() + scale_y(type = "log") + facet_by(product) ``` Changing one layer leaves the others untouched. Swap `geom_line` for `geom_point` and you have a scatter plot of the same mapping. Add a `stat_smooth` layer and a trend line overlays the raw points. This composability is why the grammar of graphics now underlies ggplot2, Vega-Lite, plotnine, and the Observable Plot library. Understanding the grammar lets you reason about any of these tools rather than memorizing each one's syntax. ### 1.3 Why the grammar matters for analysis The grammar disciplines thinking. Before you draw anything, you must decide which variable occupies which channel, and that decision is the substance of the chart. A frequent analytical mistake is to reach for a chart type and then bend the data to fit it. 
The grammar reverses the order: state the mapping that answers your question, and the geometry follows. It also exposes hidden choices. A pie chart, for example, is a stacked bar in polar coordinates with angle as the position channel, and naming it that way immediately raises the perceptual question of whether angle is a good channel at all. ### 1.4 The grammar as a formal mapping It is worth stating the grammar in slightly more formal terms, because doing so clarifies exactly which functions a chart composes. Let a dataset be a table of $n$ observations on $p$ variables, so observation $i$ is a tuple $\mathbf{x}_i = (x_{i1}, \dots, x_{ip})$. A statistical graphic is the composition of three stages. First, a statistical transformation $S$ maps the raw table to a derived table. For a scatter plot $S$ is the identity. For a histogram $S$ assigns each observation to a bin and returns bin counts. For a boxplot $S$ returns the five number summary per group. Formally $S$ is any function from the data table to a table of marks. Second, for each derived row $r$ and each visual channel $c$ in use, a scale $f_c$ maps the data value to a channel value, $$ f_c : \text{domain}(v_c) \to \text{range}(c), $$ where $v_c$ is the variable assigned to channel $c$. A linear position scale is $f_x(v) = a + b\,v$; a logarithmic scale is $f_x(v) = a + b\,\log v$; a sequential color scale maps an ordered magnitude to a path through a perceptually uniform color space. The legend and the axis are simply the visual presentation of $f_c^{-1}$, the inverse that lets a reader recover the data value from the mark. Third, the coordinate system $C$ places the channel-valued marks into the plane. Cartesian coordinates apply the identity, $C(x, y) = (x, y)$. Polar coordinates apply $C(\theta, \rho) = (\rho \cos\theta, \rho \sin\theta)$, which is exactly the transformation that turns a stacked bar into a pie. 
Writing the pie this way makes its cost explicit: the quantity the reader cares about now lives in the angle $\theta$, a low-accuracy channel, rather than in a Cartesian length.

The full graphic is therefore the composition $C \circ f \circ S$ applied to the data, where $f$ denotes the family of scales acting channel by channel. Charts that compose the same three functions encode the data identically and differ only in the geometry that draws each mark and in how overlapping marks are positioned. This is the precise sense in which a bar chart and a stacked area chart are one grammar rather than two chart types.

## 2. Visual Encoding and Human Perception

### 2.1 Marks and channels

Jacques Bertin's *Semiology of Graphics* (1967) first systematized the visual variables available to a designer: position, size, shape, value (lightness), color hue, orientation, and texture. In modern terms we distinguish **marks**, the geometric primitives such as points, lines, and areas, from **channels**, the properties of those marks that encode data such as position, length, area, hue, and saturation. Every chart is a choice of marks and an assignment of data variables to channels.

Not all channels are equal. The central empirical result of the field is that channels form a perceptual ranking, and that ranking should drive design.

### 2.2 The Cleveland and McGill ranking

William Cleveland and Robert McGill, in a landmark 1984 experiment, asked subjects to judge quantities encoded by different channels and measured the error. To make the comparison rigorous they reported the log absolute error. On each trial a subject judged what percentage the smaller of two marked quantities is of the larger; writing $\hat{p}$ for the judged percentage and $p$ for the true one, the error for that trial is

$$
\text{error} = \log_2\!\bigl(|\hat{p} - p| + \tfrac{1}{8}\bigr).
$$

The additive constant of one eighth keeps the logarithm finite for perfect judgments and keeps near-exact trials from dominating the average with large negative values.
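A minimal Python sketch of the per-trial error follows, assuming, as in the experiment, that judged and true values are percentages on a 0 to 100 scale; the name `cm_error` is ours, not from the paper.

```python
import math

def cm_error(judged: float, true: float) -> float:
    """Per-trial error: log2(|judged - true| + 1/8), both arguments in percent."""
    return math.log2(abs(judged - true) + 1 / 8)

# A perfect judgment stays finite because of the 1/8 offset: log2(1/8) = -3.
print(cm_error(50.0, 50.0))    # -3.0
# Each doubling of (absolute miss + 1/8) adds exactly one unit of error.
print(cm_error(57.875, 50.0))  # 3.0, since 7.875 + 0.125 = 8
print(cm_error(65.875, 50.0))  # 4.0, since 15.875 + 0.125 = 16
```

The logarithm compresses large misses, so a single wild guess cannot swamp the average the way it would with raw absolute error.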
Averaging this quantity over many subjects and stimuli gives a single accuracy score per channel, lower being better, and these scores are what produce the ranking. Cleveland and McGill's results, refined by later crowdsourced replication from Heer and Bostock, give an approximate accuracy ordering for encoding a quantitative variable, from most to least accurate.

1. Position along a common scale (aligned axes).
2. Position along non-aligned but identical scales.
3. Length.
4. Angle and slope.
5. Area.
6. Volume and curvature.
7. Color saturation and color hue.

The practical consequence is direct. When precise comparison matters, encode the quantity as position or length. This is why a dot plot or bar chart almost always beats a pie chart: the pie asks the reader to compare angles and areas, channels that sit low on the ranking, while the bar asks for a length comparison along a common scale. It is also why bubble charts, which encode magnitude as area, should be reserved for cases where rough magnitude is enough and precise reading is not required.

Readers systematically underestimate differences in area, and this is not a vague tendency but a measured law. Stevens' power law states that perceived magnitude $\psi$ relates to physical stimulus magnitude $\phi$ by

$$
\psi = k\,\phi^{\,\beta},
$$

where the exponent $\beta$ depends on the channel. For length and position $\beta \approx 1$, so perception tracks the data faithfully. For area the exponent measured by Stevens falls near $0.7$, and for volume nearer $0.5$. Consider two regions whose true areas are in a ten to one ratio, $\phi_2 / \phi_1 = 10$. The constant $k$ cancels, and the perceived ratio is

$$
\frac{\psi_2}{\psi_1} = \left(\frac{\phi_2}{\phi_1}\right)^{\beta} = 10^{0.7} \approx 5.0,
$$

so a reader judging by area perceives the larger region as only about five times the smaller, half the true ratio. The same tenfold contrast read as length would be perceived as tenfold. This single calculation explains why area and volume encodings compress large differences and why they sit far down the Cleveland and McGill ranking.
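The power-law arithmetic is easy to reproduce. The Python sketch below assumes the approximate exponents quoted above (1.0 for length, 0.7 for area, 0.5 for volume); the helper names are ours, and measured exponents vary across studies and observers.

```python
def perceived_ratio(true_ratio: float, beta: float) -> float:
    """psi2/psi1 = (phi2/phi1) ** beta; the constant k cancels in the ratio."""
    return true_ratio ** beta

def true_ratio_needed(perceived: float, beta: float) -> float:
    """Invert the law: the physical ratio required to look `perceived` times larger."""
    return perceived ** (1.0 / beta)

# Approximate Stevens exponents from the text (assumed, not precise constants).
EXPONENTS = {"length": 1.0, "area": 0.7, "volume": 0.5}

for channel, beta in EXPONENTS.items():
    print(f"{channel:>6}: a true 10x difference looks like {perceived_ratio(10, beta):.1f}x")
# length: a true 10x difference looks like 10.0x
#   area: a true 10x difference looks like 5.0x
# volume: a true 10x difference looks like 3.2x
```

Inverting the law shows the flip side: for one area to look ten times larger than another, `true_ratio_needed(10, 0.7)` says it must actually be roughly 27 times larger.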
### 2.3 Preattentive processing and Gestalt grouping

Some visual distinctions are processed in parallel, before conscious attention, in well under a quarter of a second. A single red dot among blue dots, or one long bar among short ones, pops out regardless of how many distractors are present. These **preattentive** attributes, including hue, orientation, size, and motion, are the mechanism behind effective highlighting. Encode the one series you want the reader to notice in a distinct hue and the eye finds it instantly; rely on a text label alone and the reader must search serially.

The Gestalt principles describe how the visual system groups marks into wholes. Proximity, similarity, common enclosure, and connectedness all cause marks to be read as belonging together. Connectedness is the strongest grouping cue, which is the deep reason a line chart communicates a time series so well: the connecting line asserts that the points form one continuous quantity evolving over time. Designers exploit these principles deliberately, for instance using a shared background tint to bind a group of bars, and violate them at their peril, for instance placing a legend far from the marks it explains and forcing the reader to bridge the gap.

### 2.4 Color used well and badly

Color carries the most cognitive baggage of any channel. Three rules carry most of the practical weight. First, match the color scale to the data type: a sequential scale (light to dark of one hue) for ordered magnitudes, a diverging scale (two hues meeting at a neutral midpoint) for data with a meaningful center such as profit and loss, and a categorical palette of distinct hues for unordered groups. Second, ensure perceptual uniformity, so that equal steps in data produce equal steps in perceived color; the rainbow or jet colormap fails this badly, inventing false boundaries where its lightness jumps and hiding real structure where its lightness is flat. Perceptually uniform maps such as viridis solve this.
Third, design for the roughly eight percent of men with color vision deficiency by never relying on red versus green alone and by checking palettes against a color blindness simulator.

## 3. Choosing the Right Chart

### 3.1 Start from the question and the data type

Chart selection follows from two inputs: the analytical question and the measurement types of the variables involved. The question typically falls into one of a few intents.

- **Comparison** of values across categories.
- **Trend** of a quantity over an ordered dimension, usually time.
- **Distribution** of a single variable.
- **Relationship** between two or more variables.
- **Part to whole** decomposition of a total.

Cross this intent with whether each variable is quantitative, ordinal, or nominal, and a small set of appropriate geometries emerges. The following diagram traces the path from intent to geometry.

```{mermaid}
flowchart TD
    Q["What is the question"] --> CMP["Compare categories"]
    Q --> TRD["Show a trend over time"]
    Q --> DST["Show a distribution"]
    Q --> REL["Show a relationship"]
    Q --> PTW["Show part to whole"]
    CMP --> BAR["Sorted bar or dot plot"]
    TRD --> LIN["Line chart"]
    DST --> HIST["Histogram or density"]
    DST --> BOX["Box or violin by group"]
    REL --> SCAT["Scatter plot"]
    PTW --> STK["Stacked bar, rarely a pie"]
```

### 3.2 A working decision guide

The following table captures the common cases. It is a starting point, not a cage.

```text
Intent         Variables                          Recommended chart
-------------  ---------------------------------  -------------------------------
Comparison     1 nominal, 1 quantity              Bar chart (sorted) or dot plot
Trend          1 time, 1 quantity                 Line chart
Distribution   1 quantity                         Histogram or density plot
Distribution   1 nominal, 1 quantity              Box plot, violin, or strip plot
Relationship   2 quantities                       Scatter plot
Relationship   2 quantities + 1 group             Scatter with color or facets
Part to whole  1 nominal, shares summing to 100%  Stacked bar; rarely a pie
```

A few notes earn their place.
For comparison, sort the bars by value unless the categories have an inherent order such as days of the week; an unsorted bar chart wastes the reader's effort on a ranking they could have been handed. For distribution, a histogram's message depends heavily on bin width, so try several. For part to whole, prefer a single stacked bar or a set of bars over a pie whenever there are more than two or three slices, because the angle comparison degrades rapidly with slice count.

### 3.3 Small multiples over overloaded single charts

When a relationship varies across many groups, the instinct to cram every group into one chart with a dozen colored lines produces a tangle often called a spaghetti chart. The better tool is the **small multiple**: a grid of identical small charts, one per group, sharing scales and axes. The reader learns the chart once and then scans the grid, with comparison reduced to noticing how the shape shifts from panel to panel. Small multiples scale to dozens of groups where a single overlaid chart collapses at four or five.

## 4. Avoiding Misleading Visuals

### 4.1 The axis: truncation and dual scales

The most common deception is the truncated bar axis. Bars encode magnitude through length measured from a baseline, so the baseline must be zero. Starting the axis at a nonzero value inflates small differences into apparent chasms and is a textbook way to manufacture alarm or excitement from trivial change.

The zero baseline rule applies specifically to length encodings. Line charts, which encode value through position rather than length, may legitimately use a nonzero baseline to reveal fine variation, provided the axis is clearly labeled, but even then a truncated line can mislead a careless reader and should be used with disclosure.

The dual y axis chart, plotting two series against two differently scaled vertical axes, is nearly always a trap.
The crossing point and relative slopes of the two lines depend entirely on the arbitrary choice of the two scales, so the chart can suggest any correlation the author wishes. Prefer indexing both series to a common base of 100 and plotting them on one axis, or use two stacked panels.

### 4.2 Area, 3D, and the lie factor

Tufte defines the **lie factor** precisely as

$$
\text{lie factor} = \frac{\text{size of effect shown in the graphic}}{\text{size of effect in the data}},
$$

where the size of an effect is the relative change, $|v_2 - v_1| / v_1$. An honest graphic has a lie factor near one.

A concrete case shows how fast the factor inflates. Suppose a value grows from $100$ to $200$, a true effect of $1.0$, or one hundred percent. An honest length encoding doubles the bar, also a one hundred percent change, giving a lie factor of one. Now encode the same value as the height of a pictogram while scaling its width proportionally, so the icon's area carries the quantity. Doubling the linear dimension multiplies the area by $2^2 = 4$, a displayed effect of $3.0$, or three hundred percent. The lie factor is

$$
\frac{3.0}{1.0} = 3.0,
$$

meaning the graphic overstates the change threefold. This is the area problem of section 2.2 seen from the author's side rather than the reader's: there a reader misjudges an area drawn faithfully, while here the drawn area itself misstates the data.

The second common source of inflation is gratuitous three-dimensional rendering of two-dimensional data, where perspective makes nearer elements loom larger than their values warrant and occlusion hides smaller ones. A 3D pie chart is the canonical offender, distorting the very angles it asks the reader to compare.

### 4.3 Cherry picking, aggregation, and missing context

Not all distortion lives in the axes. Selecting a flattering time window, a practice visible in many financial charts, can reverse the apparent trend.
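A toy Python example shows how the choice of window alone can flip the sign of the apparent trend. The monthly values are invented for illustration, endpoint percent change stands in as a deliberately crude measure of "trend", and the helper `percent_change` is hypothetical.

```python
def percent_change(series: list[float], start: int, end: int) -> float:
    """Relative change from series[start] to series[end], in percent."""
    return 100.0 * (series[end] - series[start]) / series[start]

# Invented monthly values: a long rise followed by a short dip.
values = [100, 110, 120, 130, 140, 150, 160, 150, 140]

full = percent_change(values, 0, len(values) - 1)    # the whole record
recent = percent_change(values, 6, len(values) - 1)  # only the last three months
print(f"whole record: {full:+.1f}%, last three months: {recent:+.1f}%")
# whole record: +40.0%, last three months: -12.5%
```

Publishing either number alone tells only the story its window was chosen to tell; showing the full series, with any window of interest highlighted, is the honest alternative.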
Aggregation can hide reversal: Simpson's paradox occurs when a trend present within every subgroup vanishes or flips when the groups are pooled, and only a disaggregated or faceted view reveals it.

The arithmetic is worth seeing once, because it shows the paradox is not a curiosity but a direct consequence of how weighted averages combine. In a kidney-stone study by Charig and colleagues (1986), treatment A succeeds in $81$ of $87$ small-stone cases, a rate of $0.93$, and in $192$ of $263$ large-stone cases, a rate of $0.73$. Treatment B succeeds in $234$ of $270$ small-stone cases, $0.87$, and in $55$ of $80$ large-stone cases, $0.69$. Within each stone size A beats B. Yet the pooled rates are

$$
\text{A} = \frac{81 + 192}{87 + 263} = \frac{273}{350} \approx 0.78,
\qquad
\text{B} = \frac{234 + 55}{270 + 80} = \frac{289}{350} \approx 0.83,
$$

so the pooled comparison reverses and favors B. The reversal happens because A was applied mostly to the hard large-stone cases while B was applied mostly to the easy small-stone cases, so the pooled order is decided by how the case counts are distributed across the strata rather than by the within-stratum rates. A single faceted chart, one panel per stone size, makes the consistent within-group advantage of A visible where the pooled bar would hide it.

Omitting the denominator, plotting raw counts where a rate is the meaningful quantity, makes larger populations look like hotspots merely because they are larger. The defenses are disclosure of the full range, presentation of rates alongside counts, faceting before pooling, and skepticism toward any single number stripped of its base.

### 4.4 An integrity checklist

Before publishing, audit the chart against a short list. Does every length encoding start at zero? Is the data-ink ratio high, or is the message buried under decoration? Are the axes labeled with units? Is the source cited and the sample size visible? Would a reader reach the same conclusion you did, or only the one you wanted them to reach?
The last question is the ethical core of the field.

## 5. The Principles of Tufte and Cleveland

### 5.1 Tufte: maximize the data-ink ratio

Edward Tufte's *The Visual Display of Quantitative Information* (1983) is the field's most cited text, and its arguments are mostly about subtraction. Tufte defines **chartjunk** as visual elements that carry no information: heavy gridlines, decorative backgrounds, redundant borders, drop shadows, and the moiré patterns of cross hatching. He proposes the **data-ink ratio**,

$$
\text{data-ink ratio} = \frac{\text{ink used to represent data}}{\text{total ink used to print the graphic}},
$$

the proportion of a graphic's ink devoted to representing data rather than to ornament, and urges that designers maximize it within reason. The ratio lies between zero and one.

The practical program is to erase: remove every mark that, if deleted, would cost the reader no information, which drives the ratio toward one. Removing a heavy gridline shrinks the denominator while leaving the numerator untouched; replacing a filled bar with a single dot removes the bar's interior ink while preserving the one position that carries meaning, which is the direct argument for Cleveland's dot plot. The result is a sparse, high-contrast graphic in which the data, not the frame, draws the eye.

Tufte's related contributions include the **sparkline**, a small word-sized line chart embedded inline in text to show a trend without breaking the prose, and a sustained insistence on graphical integrity through the lie factor. His critique of decoration can be taken too far, and later researchers have shown that a small amount of memorable visual embellishment can aid recall, but the core discipline of removing the meaningless remains sound.

### 5.2 Cleveland: graphical perception as an empirical science

Where Tufte argues from taste and principle, William Cleveland argues from experiment.
His books *The Elements of Graphing Data* (1985) and *Visualizing Data* (1993), together with the 1984 paper with McGill, recast chart design as a question to be answered by measuring human accuracy. The channel ranking of section 2.2 is his most influential result and converts design choices into testable claims.

Cleveland's constructive contributions are as important as his rankings. He championed the **dot plot** as a replacement for the bar chart and the pie chart, because it uses the high-accuracy position channel while avoiding the heavy ink of bars. He introduced **trellis displays**, the rigorous form of small multiples with shared scales, into mainstream statistical practice. And he developed **loess**, locally weighted scatter plot smoothing, so that an analyst could lay an honest, assumption-light trend curve over a cloud of points and let the data speak. The thread uniting his work is that visualization is a tool for seeing structure in data, and that the design that lets a person see most accurately is empirically the better design.

### 5.3 Reconciling the two

Tufte and Cleveland are complementary rather than rival. Tufte tells you what to remove; Cleveland tells you which encodings to keep. A graphic that honors both is sparse in ornament and built on high-accuracy channels: a sorted dot plot with light reference lines, faceted into small multiples where the data vary by group, with no chartjunk and a lie factor of one. That description doubles as a default recommendation for most analytical charts, and it is no accident that the grammar of graphics, the perceptual ranking, and the two design philosophies independently arrive at it.

## 6. When to Use What, and the Recurring Pitfalls

The principles above compress into a short field guide. The defaults below hold for analytical work where accuracy and honesty matter more than novelty.

- Reach for **position and length** first.
  A sorted bar chart or dot plot answers comparison questions with the highest-accuracy channels, and a line chart answers trend questions because connectedness is the strongest grouping cue. Use these unless you have a specific reason not to.
- Reserve **area, angle, and color magnitude** for cases where rough reading suffices. A bubble chart or a heatmap conveys broad pattern, not precise value, because these channels sit low in the Cleveland and McGill ranking and, for area, Stevens' exponent is well below one.
- Prefer **small multiples** to overloaded single charts the moment more than three or four groups appear. The reader learns one panel and scans the rest.
- Match the **color scale to the data type**: sequential for ordered magnitude, diverging for data with a meaningful center, categorical for unordered groups, and always a perceptually uniform map rather than the rainbow.

The pitfalls recur with such regularity that they form a checklist of their own: starting a length axis above zero, encoding one quantity as two-dimensional area, rendering flat data in fake three dimensions, juxtaposing two arbitrarily scaled y axes, pooling across strata that hide a Simpson reversal, and plotting counts where a rate is meant. Each has appeared above, most with the mathematics that explains why it deceives. An analyst who can name the mechanism (a lie factor above one, a low Stevens exponent, a pooled rate driven by how cases are distributed across strata) is far better defended than one who merely memorizes a list of forbidden charts.

## 7. Summary

Visualization is a language whose grammar is the mapping of variables to marks and channels, whose physics is the perceptual accuracy of those channels, and whose ethics is the faithful preservation of the effect sizes in the data. Choose the chart from the question and the data types, prefer position and length to angle and area, subtract everything that is not data, and never let the axes or the aggregation tell a story the numbers do not.
These principles outlast any particular tool, and they turn a chart from a picture into an argument that a skeptical reader can trust.

## References

1. Wilkinson, L. *The Grammar of Graphics*, 2nd ed. Springer, 2005. https://link.springer.com/book/10.1007/0-387-28695-0
2. Wickham, H. "A Layered Grammar of Graphics." *Journal of Computational and Graphical Statistics*, 2010. https://vita.had.co.nz/papers/layered-grammar.html
3. Bertin, J. *Semiology of Graphics: Diagrams, Networks, Maps*. ESRI Press, 2010 (orig. 1967). https://esripress.esri.com/display/index.cfm?fuseaction=display&websiteID=185
4. Cleveland, W. S., and McGill, R. "Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods." *Journal of the American Statistical Association*, 1984. https://www.jstor.org/stable/2288400
5. Cleveland, W. S. *The Elements of Graphing Data*. Hobart Press, 1994. https://www.stat.purdue.edu/~wsc/
6. Tufte, E. R. *The Visual Display of Quantitative Information*, 2nd ed. Graphics Press, 2001. https://www.edwardtufte.com/book/the-visual-display-of-quantitative-information/
7. Heer, J., and Bostock, M. "Crowdsourcing Graphical Perception: Using Mechanical Turk to Assess Visualization Design." *CHI*, 2010. https://idl.uw.edu/papers/crowdsourcing-graphical-perception
8. Munzner, T. *Visualization Analysis and Design*. CRC Press, 2014. https://www.cs.ubc.ca/~tmm/vadbook/
9. Healy, K. *Data Visualization: A Practical Introduction*. Princeton University Press, 2018. https://socviz.co/
10. Satyanarayan, A., et al. "Vega-Lite: A Grammar of Interactive Graphics." *IEEE TVCG*, 2017. https://idl.uw.edu/papers/vega-lite
11. Stevens, S. S. "On the Psychophysical Law." *Psychological Review*, 64(3), 153-181, 1957. https://doi.org/10.1037/h0046162
12. Charig, C. R., Webb, D. R., Payne, S. R., and Wickham, J. E. "Comparison of Treatment of Renal Calculi by Open Surgery, Percutaneous Nephrolithotomy, and Extracorporeal Shockwave Lithotripsy." *British Medical Journal*, 292(6524), 879-882, 1986. https://doi.org/10.1136/bmj.292.6524.879