64 Interactive Visualization Tools
Static charts answer questions you already know to ask. A bar chart of model accuracy by class tells you which classes are hard, but it cannot tell you why a particular image was misclassified, how confidence shifts as you slide a decision threshold, or whether a cluster of errors hides inside a feature subspace you never plotted. Interactivity closes that gap. It turns a single rendered answer into a surface you can probe, where the next view is one hover, brush, or filter away. For machine learning work, where the interesting structure lives in high dimensional spaces and long tails, that capability is not a luxury. This chapter surveys the modern Python ecosystem for interactive visualization, explains when interactivity earns its cost, and shows how to build both exploratory and explanatory interactive views for ML.
64.1 1. When Interactivity Helps
64.1.1 1.1 The two jobs of a visualization
Visualizations do one of two jobs, and confusing them is the most common reason interactive tools get misused. An exploratory visualization helps you, the analyst, discover something you did not already know. An explanatory visualization communicates something you already understand to an audience that does not. The audience, the level of polish, and the acceptable complexity differ sharply between the two.
Exploration rewards breadth and speed. You want to slice, zoom, recolor, and recompute quickly, accepting rough edges because you are the only consumer. Explanation rewards focus and restraint. You have found the insight, and now every interactive control you add is a question you are asking your reader to answer for themselves. Sometimes that is exactly right, because the reader genuinely has different questions. Often it is a sign you have not finished thinking.
64.1.2 1.2 What interactivity actually buys you
Interactivity pays off when the data has more structure than a single static frame can hold, and when the viewer benefits from steering. Three patterns recur in ML work.
The first is high cardinality. A scatter plot of ten thousand embeddings is an ink blob until you can zoom into a region and hover to read the underlying record. The second is conditional structure, where the relationship you care about only appears after you filter or facet, such as a calibration curve that looks fine overall but falls apart for one customer segment. The third is parameter sensitivity, where the question is how an output moves as an input changes, such as precision and recall as a function of a threshold, or a partial dependence curve as you change which feature is being examined.
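The parameter sensitivity pattern can be made concrete with a small threshold sweep. The sketch below is illustrative only: the scores and labels are made-up toy data, and `precision_recall_at` is a hypothetical helper name, not part of any library. It computes the numbers a threshold slider would redraw on every move.

```python
def precision_recall_at(scores, labels, threshold):
    """Return (precision, recall) when predicting positive for score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    # Convention (an assumption of this sketch): precision is 1.0 when nothing is predicted positive.
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical toy data: model scores and true binary labels.
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.30, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]

# One row per slider position: (threshold, precision, recall).
sweep = [(t / 10, *precision_recall_at(scores, labels, t / 10)) for t in range(1, 10)]
for threshold, precision, recall in sweep:
    print(f"threshold={threshold:.1f} precision={precision:.2f} recall={recall:.2f}")
```

A static table of these rows answers the question for nine thresholds; a slider answers it for whichever threshold the viewer cares about.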
64.1.3 1.3 When to stay static
Interactivity has real costs. It adds JavaScript payload, slows page loads, complicates reproducibility, and can fail silently when a dependency drifts. If the message is a single comparison, a static figure is faster to make, faster to read, and trivially embeddable in a paper or slide. A good rule is that interactivity should remove ambiguity, not add decoration. If a reader can extract the full message without touching a control, the controls are noise. Reserve interaction for the moments where the reader genuinely needs to ask a question you cannot answer for them in advance.
64.2 2. The Library Landscape
Three plotting libraries dominate interactive work in Python, and they embody three different philosophies. Understanding the philosophy matters more than memorizing the API, because it predicts where each tool will feel natural and where it will fight you.
64.2.1 2.1 Plotly: imperative and batteries included
Plotly builds figures imperatively. You construct traces, attach them to a figure, and tune a layout dictionary. Its plotly.express module offers a high level interface that produces a richly interactive chart from a single call, complete with hover tooltips, zoom, pan, and a legend that toggles series on click.
import plotly.express as px

fig = px.scatter(
    df, x="pc1", y="pc2", color="label",
    hover_data=["sample_id", "confidence"],
    title="Embedding projection by predicted label",
)
fig.update_layout(legend_title_text="Class")

Plotly’s strengths are coverage and polish. It handles 3D surfaces, geographic maps, and animation, and its output works in notebooks, exported HTML, and dashboards without changes. Its cost is that complex customization means navigating a large and sometimes inconsistent configuration surface, and large datasets can produce heavy HTML unless you downsample or switch to its WebGL backends.
64.2.2 2.2 Bokeh: a model for building interactive applications
Bokeh is less a chart library than a framework for browser based interactive graphics. Its core abstraction is the ColumnDataSource, a shared data model that glyphs render from and that widgets and callbacks mutate. Because the data source is explicit and shared, Bokeh excels when multiple views must stay linked, such as a brush on one plot that highlights the same points on three others.
from bokeh.plotting import figure
from bokeh.models import ColumnDataSource

# df is assumed to be a DataFrame with "fitted" and "residual" columns
source = ColumnDataSource(df)
p = figure(title="Residuals vs fitted", tools="box_select,lasso_select,reset")
p.scatter("fitted", "residual", source=source, size=6, alpha=0.5)

Bokeh can run with a live Python server, which means callbacks execute real Python rather than JavaScript in the browser. That unlocks interactions backed by arbitrary computation, including rerunning a model on the selected subset. The tradeoff is that a server adds deployment weight, and standalone HTML output limits you to CustomJS callbacks, which must be written in JavaScript.
64.2.3 2.3 Altair: declarative grammar of graphics
Altair takes the opposite stance from Plotly. You do not describe how to draw; you declare what the data means by mapping columns to visual channels such as x, y, color, and size. It compiles to Vega-Lite, a JSON specification that a JavaScript runtime renders. The declarative style makes Altair concise and composable, and its selection and binding primitives express linked filtering and cross highlighting with remarkable economy.
import altair as alt

brush = alt.selection_interval()
base = alt.Chart(df)
# The brush is attached to the scatter plot only; the bars just read from it.
points = base.mark_circle().encode(
    x="feature_x", y="feature_y",
    color=alt.condition(brush, "label:N", alt.value("lightgray")),
).add_params(brush)
bars = base.mark_bar().encode(x="count()", y="label:N").transform_filter(brush)
points | bars

Altair’s discipline is its gift and its limit. Composed views and linked selections that would take pages of callback code elsewhere fall out of a few operators. But by default Altair embeds the data in the Vega-Lite spec and raises an error above 5,000 rows, so very large datasets need aggregation, sampling, or an external data URL before they render comfortably.
64.2.4 2.4 Choosing among them
For fast exploratory plotting and presentation ready figures with minimal fuss, reach for Plotly. For applications where linked views, custom widgets, and server side computation are central, reach for Bokeh. For analysis where the visualization is a precise statement about data relationships and you value reproducible, composable specifications, reach for Altair. None is wrong; they optimize for different parts of the workflow, and mature teams often use more than one.
64.3 3. Building Exploratory Interactive Views
64.3.1 3.1 The exploratory mindset
Exploratory interaction is a conversation with your data, and the goal is to lower the cost of each question to near zero. You are looking for surprises: a cluster that should not exist, an outlier that breaks a trend, a subgroup where the model behaves differently. The right tool here is whatever lets you go from hypothesis to picture fastest, which usually means a notebook, a dataframe, and one of the express style APIs.
64.3.2 3.2 Exploring embeddings and high dimensional structure
ML systems generate embeddings everywhere, from word vectors to image features to user representations. A two dimensional projection from UMAP or t-SNE turns those vectors into something you can look at, and interactivity turns looking into investigating. Hovering reveals the source record, color encodes a label or cluster id, and zoom lets you separate a dense region into its constituents.
import plotly.express as px

fig = px.scatter(
    proj_df, x="x", y="y", color="cluster",
    hover_data={"text": True, "x": False, "y": False},
    opacity=0.6,
)
fig.update_traces(marker=dict(size=4))

The decisive feature is the tooltip. Seeing that a tight cluster of “neutral” sentiment points all contain sarcasm, or that an embedding outlier is a corrupted record, is the kind of discovery that static plots cannot surface because the identity of each point is invisible until you ask.
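For readers who want to see where a table of projected points might come from, here is a minimal sketch that builds one. It swaps in plain PCA (via NumPy's SVD) for UMAP or t-SNE purely to avoid extra dependencies, and it uses synthetic embeddings; the function name `pca_project_2d` and the column names `x`, `y`, `cluster`, and `text` are assumptions chosen to match the scatter plot in this section.

```python
import numpy as np


def pca_project_2d(embeddings):
    """Project the rows of `embeddings` onto their top two principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(0)
# Synthetic stand-in for real embeddings: two clusters in 16 dimensions.
cluster_a = rng.normal(loc=0.0, scale=0.5, size=(50, 16))
cluster_b = rng.normal(loc=3.0, scale=0.5, size=(50, 16))
embeddings = np.vstack([cluster_a, cluster_b])

coords = pca_project_2d(embeddings)

# One record per point, carrying the fields a tooltip would display.
proj_rows = [
    {"x": float(x), "y": float(y), "cluster": "a" if i < 50 else "b", "text": f"record {i}"}
    for i, (x, y) in enumerate(coords)
]
```

Wrapping `proj_rows` in a pandas DataFrame would give a table with the same columns the scatter call expects. PCA preserves only linear structure, so a nonlinear method may separate real embeddings better.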
64.3.3 3.3 Linked views and brushing
The single most powerful exploratory technique is the linked selection, where selecting points in one view filters or highlights them in every other view. This lets you pose conditional questions directly. Brush the high error region of a residual plot, and watch which feature values, which classes, and which time periods light up across the other panels. Altair expresses this with a shared selection parameter, and Bokeh with a shared ColumnDataSource. Either way, the analyst is no longer reading separate charts but interrogating one dataset from several angles at once.
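Underneath any library, a linked selection reduces to one shared predicate applied to every view. The following library-free sketch shows that idea under stated assumptions: the `Brush` class, the `linked_counts` helper, and the records are all hypothetical, and a real tool would redraw charts instead of printing dictionaries.

```python
from dataclasses import dataclass


@dataclass
class Brush:
    """A rectangular selection in the (fitted, residual) plane."""
    x_min: float
    x_max: float
    y_min: float
    y_max: float

    def contains(self, row):
        return (self.x_min <= row["fitted"] <= self.x_max
                and self.y_min <= row["residual"] <= self.y_max)


# Hypothetical records shared by every view.
rows = [
    {"fitted": 1.0, "residual": 0.1, "segment": "web", "month": "Jan"},
    {"fitted": 2.0, "residual": 2.5, "segment": "mobile", "month": "Feb"},
    {"fitted": 3.0, "residual": -0.2, "segment": "web", "month": "Feb"},
    {"fitted": 4.0, "residual": 3.1, "segment": "mobile", "month": "Mar"},
    {"fitted": 5.0, "residual": 2.8, "segment": "web", "month": "Mar"},
]


def linked_counts(rows, brush, field):
    """Count brushed rows per category of `field`, i.e. what a linked bar chart would show."""
    counts = {}
    for row in rows:
        if brush.contains(row):
            counts[row[field]] = counts.get(row[field], 0) + 1
    return counts


# Brush the high-residual band of the residual plot.
high_error = Brush(x_min=0.0, x_max=6.0, y_min=2.0, y_max=4.0)
print(linked_counts(rows, high_error, "segment"))
print(linked_counts(rows, high_error, "month"))
```

The same `high_error` object drives both summaries, which is the property that a shared selection parameter or a shared data source provides in the libraries above.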
64.3.4 3.4 Interactive error analysis
Error analysis is where interactivity earns its keep for model builders. Build a confusion matrix where clicking a cell lists the examples that fall in that cell, each shown with its true label, predicted label, and predicted probability. Add a threshold slider and watch the matrix recompute live. Add filters for metadata such as image source, text length, or acquisition date, and you can isolate the conditions under which the model fails. This workflow converts an aggregate metric into a stack of concrete, inspectable mistakes, which is what actually drives the next modeling decision.
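The data structure behind such a drill-down can be quite small. As a rough sketch, assuming binary labels and a made-up list of scores, the hypothetical helper `confusion_cells` below keeps the example indices for each cell instead of only the counts, so a click handler has something to list and a slider has something to recompute.

```python
from collections import defaultdict


def confusion_cells(y_true, scores, threshold):
    """Map (true_label, predicted_label) to the list of example indices in that cell."""
    cells = defaultdict(list)
    for index, (label, score) in enumerate(zip(y_true, scores)):
        predicted = 1 if score >= threshold else 0
        cells[(label, predicted)].append(index)
    return cells


# Hypothetical toy data.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.2, 0.55, 0.3, 0.1]

cells = confusion_cells(y_true, scores, threshold=0.5)
print("false positives:", cells[(0, 1)])
print("false negatives:", cells[(1, 0)])

# Moving the slider simply rebuilds the mapping at the new threshold.
stricter = confusion_cells(y_true, scores, threshold=0.65)
print("false positives at 0.65:", stricter[(0, 1)])
```

The counts in the matrix are the lengths of these lists, so the aggregate view and the per-example view cannot drift apart.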
64.3.5 3.5 Keeping exploration honest
Exploratory freedom invites overfitting your eyes. When you slice a dataset many ways, some apparent pattern will look striking by chance. Treat exploratory findings as hypotheses, not conclusions, and confirm anything important on a held out split or with a statistical test before you act on it. Interactivity makes it trivial to manufacture a compelling but spurious story, so the discipline of confirmation matters more, not less.
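One way to apply that discipline is a permutation test on the slice you found. The sketch below is one possible check, not the only valid one: `permutation_p_value` is a hypothetical helper, the data are synthetic, and the test does not correct for how many slices you inspected before finding this one.

```python
import random


def permutation_p_value(errors, in_subgroup, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in error rate
    between a subgroup and the rest of the data."""
    def rate_gap(flags):
        inside = [e for e, f in zip(errors, flags) if f]
        outside = [e for e, f in zip(errors, flags) if not f]
        return sum(inside) / len(inside) - sum(outside) / len(outside)

    observed = abs(rate_gap(in_subgroup))
    rng = random.Random(seed)
    flags = list(in_subgroup)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(flags)  # break any real link between membership and errors
        if abs(rate_gap(flags)) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly positive.
    return (hits + 1) / (n_permutations + 1)


# Synthetic case 1: subgroup error rate 75% versus 10% elsewhere.
errors_strong = [1] * 30 + [0] * 10 + [1] * 16 + [0] * 144
# Synthetic case 2: both groups have a 10% error rate.
errors_null = [1] * 4 + [0] * 36 + [1] * 16 + [0] * 144
membership = [True] * 40 + [False] * 160

p_strong = permutation_p_value(errors_strong, membership)
p_null = permutation_p_value(errors_null, membership)
print(f"strong gap p={p_strong:.4f}, no gap p={p_null:.4f}")
```

A small p-value says the gap is unlikely under random relabeling; confirming the finding on a held out split remains the stronger check.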
64.4 4. Building Explanatory Interactive Views
64.4.1 4.1 From discovery to communication
Once you know the message, the design problem inverts. You are no longer minimizing the cost of asking questions; you are guiding a reader to a conclusion while letting them verify it. The best explanatory interactives are mostly static. They present a clear default view that carries the message on its own, then offer a small number of controls for the questions you anticipate the reader will have.
64.4.2 4.2 Annotation and guided defaults
A reader landing on your chart has none of your context, so the default state must do the heavy lifting. Title the chart with the takeaway rather than the variables. Annotate the specific points that matter, such as the threshold you chose and why. Set the initial zoom, the default filter, and the highlighted series so that the message is visible before any interaction. Every control you expose should answer a question a thoughtful reader would actually ask, not merely a question the tool makes easy to enable.
64.4.3 4.3 Explaining model behavior interactively
Interactive explanation shines for model behavior that is inherently conditional. A partial dependence explorer that lets a stakeholder pick a feature and see its modeled effect makes a black box legible. A threshold widget that shows precision, recall, and the resulting count of false positives and false negatives lets a product owner feel the tradeoff in business terms rather than reading it off a table. A local explanation view, where clicking a prediction reveals the feature attributions that drove it, turns “the model said no” into “the model said no because these three inputs pushed it over the line.” In each case the interactivity is tightly scoped to the one degree of freedom the audience cares about.
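The computation behind a partial dependence explorer is short enough to write by hand. The following is a minimal sketch under stated assumptions: `toy_model` stands in for any fitted predictor, the rows are invented, and `partial_dependence` is a hypothetical helper, not a library function.

```python
def partial_dependence(predict, rows, feature, grid):
    """Average prediction over `rows` with `feature` forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = dict(row)       # copy so the original data is untouched
            modified[feature] = value  # override only the feature being examined
            total += predict(modified)
        curve.append((value, total / len(rows)))
    return curve


def toy_model(row):
    """Hypothetical model: the effect of income doubles when the applicant has a loan."""
    return row["income"] * (1 + row["has_loan"])


# Hypothetical dataset; the other features keep their observed values.
rows = [
    {"income": 12, "has_loan": 0},
    {"income": 30, "has_loan": 0},
    {"income": 18, "has_loan": 1},
    {"income": 25, "has_loan": 1},
]

curve = partial_dependence(toy_model, rows, feature="income", grid=[0, 10, 20])
for value, average in curve:
    print(f"income={value} average prediction={average}")
```

In an interactive view, the feature picker changes the `feature` argument and the chart redraws `curve`; precomputing the curves for each feature keeps that interaction instant.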
64.4.4 4.4 Performance and accessibility
Explanatory views reach people on varied hardware and networks, so weight matters. Aggregate or sample before rendering, prefer WebGL or canvas backends for large point counts, and lazy load below the fold. Accessibility matters too. Color must not be the only channel carrying meaning, hover only information must have a non hover fallback for keyboard and touch users, and text should remain legible at the sizes your audience will actually use. An interactive chart that excludes part of its audience has failed at the one job, communication, that justified building it.
64.5 5. Dashboards
When several linked views, controls, and live computations belong together as a tool rather than a single figure, you have a dashboard. Two Python frameworks dominate, and they differ in how much control they hand you.
64.5.1 5.1 Streamlit: scripts that become apps
Streamlit turns a plain Python script into a web app by rerunning the whole script top to bottom on every interaction. A widget call returns its current value, and you use that value as an ordinary variable. The mental model is delightfully simple: there are no callbacks, just a script that reads its inputs and draws its outputs.
import streamlit as st

# scores, y, precision, recall, and confusion_figure are assumed to be defined earlier in the script
threshold = st.slider("Decision threshold", 0.0, 1.0, 0.5)
preds = (scores >= threshold).astype(int)
st.metric("Precision", f"{precision(y, preds):.3f}")
st.metric("Recall", f"{recall(y, preds):.3f}")
st.plotly_chart(confusion_figure(y, preds))

This model makes Streamlit the fastest way to wrap a model or analysis in a shareable interface, which is why it dominates internal ML demos and prototypes. The cost of rerunning everything is managed with caching decorators, st.cache_data and st.cache_resource, that memoize expensive steps such as loading data or running inference. Streamlit’s simplicity becomes a limit when you need fine grained layout control or complex stateful interactions that resist the rerun model.
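To see why caching makes the rerun model affordable, it can help to simulate it without Streamlit at all. The sketch below is an analogy only: it uses the standard library's `functools.lru_cache` in place of Streamlit's caching decorators, and `load_scores`, `run_script`, and the file name are hypothetical.

```python
from functools import lru_cache

load_calls = 0  # counts how often the expensive step really runs


@lru_cache(maxsize=None)
def load_scores(path):
    """Stand-in for an expensive step such as loading data or running inference."""
    global load_calls
    load_calls += 1
    return (0.9, 0.7, 0.4, 0.2)  # hypothetical model scores


def run_script(threshold):
    """One top-to-bottom rerun, as would be triggered by moving a slider."""
    scores = load_scores("scores.parquet")         # served from cache after the first run
    return [int(s >= threshold) for s in scores]   # cheap, recomputed on every run


for threshold in (0.3, 0.5, 0.8):
    print(threshold, run_script(threshold))
print("expensive loads:", load_calls)
```

Three reruns trigger a single load because the cached function is keyed on its arguments, which is the same contract the Streamlit decorators rely on.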
64.5.2 5.2 Dash: declarative apps with explicit callbacks
Dash, built on Plotly and Flask, takes the callback approach. You declare a layout of components, each with an id, then write callback functions that name their inputs and outputs explicitly. Only the affected components recompute when an input changes, which gives precise control over what updates and when.
from dash import Dash, Input, Output

app = Dash(__name__)
# app.layout is assumed to define components with ids "model-dropdown" and "roc";
# results and roc_figure are assumed to be defined elsewhere.

@app.callback(
    Output("roc", "figure"),
    Input("model-dropdown", "value"),
)
def update_roc(model_name):
    return roc_figure(results[model_name])

That explicitness is more verbose than Streamlit but scales better to large, multi page applications with intricate dependencies between controls. Dash is the stronger choice when a dashboard becomes a maintained product with many users rather than a quick internal tool, and when you need the layout and update behavior to be exactly as specified.
64.5.3 5.3 Choosing a framework
Pick Streamlit when speed of construction and a simple mental model matter most, which covers the majority of internal ML tooling and rapid prototypes. Pick Dash when you need production grade structure, granular update control, and multi page complexity. Both can render the same underlying Plotly figures, so the visualization skills transfer; what differs is the application scaffolding around them.
64.5.4 5.4 Dashboards as ML interfaces
For ML teams, dashboards become the connective tissue between models and humans. A monitoring dashboard tracks prediction distributions, input drift, and live performance against a baseline, alerting when the world shifts away from the training data. An evaluation dashboard compares candidate models across slices so a reviewer can see not just which model wins on average but where each one wins and loses. A human in the loop labeling or review interface surfaces low confidence predictions for a person to correct, feeding the corrections back into the training set. In each case the dashboard is not a report but a workplace, and the same principles apply: a strong default view, scoped interactivity, and ruthless attention to load time.
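As an illustration of what a monitoring panel might compute for input drift, here is a sketch of the population stability index, one commonly used drift statistic. The choice of statistic, the bin edges, the `eps` floor, and the sample values are all assumptions of this example; a production monitor would pick bins and alert thresholds to suit its own data.

```python
import bisect
import math


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of `values` in each bin defined by sorted `edges`.
    Values outside the edges are ignored; empty bins are floored at `eps`."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        if edges[0] <= v <= edges[-1]:
            # Bins are half-open [lo, hi), except the last, which includes its upper edge.
            i = min(bisect.bisect_right(edges, v) - 1, len(counts) - 1)
            counts[i] += 1
    return [max(c / len(values), eps) for c in counts]


def population_stability_index(baseline, live, edges):
    """Sum over bins of (live - baseline) * ln(live / baseline)."""
    expected = bin_fractions(baseline, edges)
    actual = bin_fractions(live, edges)
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


edges = [0.0, 0.25, 0.5, 0.75, 1.0]
# Hypothetical feature values at training time and in production.
baseline = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
live = [0.1, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9, 0.95]

print(f"PSI vs itself: {population_stability_index(baseline, baseline, edges):.4f}")
print(f"PSI vs live:   {population_stability_index(baseline, live, edges):.4f}")
```

Each term in the sum is non-negative, so the index is zero only when the binned distributions match and grows as they diverge.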
64.6 6. Practical Guidance
64.6.1 6.1 A decision checklist
Before adding interactivity, ask whether a static figure conveys the message. If it does, stop. If the data has high cardinality, conditional structure, or parameter sensitivity that a single frame cannot hold, choose a tool by the dominant need: express plotting for exploration, linked views for relational analysis, and a dashboard framework when controls and computation must live together. Match polish to audience, keeping exploratory views rough and fast and explanatory views focused and annotated.
64.6.2 6.2 Performance habits that scale
Most interactive performance problems come from sending too much data to the browser. Aggregate or sample before plotting, since a reader cannot perceive a million overlapping points anyway. Use WebGL backends, available in Plotly and Bokeh, when you genuinely need tens of thousands of marks. Cache expensive computations so interaction recomputes only what changed. Precompute projections and summaries offline rather than on every page load. These habits keep an interactive view responsive, and responsiveness is what makes interaction feel like thinking rather than waiting.
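Aggregating before plotting can be as simple as binning points into a grid and sending one row per cell. The sketch below assumes synthetic Gaussian points and a hypothetical helper named `bin_points`; the cell size is an arbitrary choice that trades detail for payload.

```python
import math
import random
from collections import Counter


def bin_points(points, cell_size):
    """Aggregate (x, y) points into square grid cells, one output row per non-empty cell."""
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size)) for x, y in points
    )
    return [
        # Report each cell by its center so it can be drawn as a single mark.
        {"x": (ix + 0.5) * cell_size, "y": (iy + 0.5) * cell_size, "count": n}
        for (ix, iy), n in sorted(counts.items())
    ]


rng = random.Random(0)
# Synthetic stand-in for a large scatter: 100,000 Gaussian points.
points = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100_000)]

cells = bin_points(points, cell_size=0.25)
print(f"{len(points)} points reduced to {len(cells)} cells")
```

The resulting rows can be drawn as a heatmap or as sized markers, and the browser receives the cell count instead of the point count.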
64.6.3 6.3 Reproducibility and embedding
Interactive artifacts complicate reproducibility because they bundle data, code, and a JavaScript runtime whose versions can drift. Pin library versions, and for figures destined for a paper or a Quarto book, export a self contained HTML file or fall back to a static image so the artifact survives independent of a running server. A figure that renders today but breaks on a dependency bump next quarter has a short and frustrating life, so treat the export format as part of the design, not an afterthought.
64.7 References
- Plotly Python Open Source Graphing Library. https://plotly.com/python/
- Bokeh Documentation. https://docs.bokeh.org/en/latest/
- Vega-Altair: Declarative Visualization in Python. https://altair-viz.github.io/
- Vega-Lite: A Grammar of Interactive Graphics. https://vega.github.io/vega-lite/
- Streamlit Documentation. https://docs.streamlit.io/
- Dash Documentation by Plotly. https://dash.plotly.com/
- Satyanarayan, A., Moritz, D., Wongsuphasawat, K., Heer, J. Vega-Lite: A Grammar of Interactive Graphics. IEEE Transactions on Visualization and Computer Graphics, 2017. https://ieeexplore.ieee.org/document/7539624
- Wilke, C. O. Fundamentals of Data Visualization. https://clauswilke.com/dataviz/
- McInnes, L., Healy, J., Melville, J. UMAP: Uniform Manifold Approximation and Projection. https://arxiv.org/abs/1802.03426
- Wexler, J., et al. The What-If Tool: Interactive Probing of Machine Learning Models. https://arxiv.org/abs/1907.04135