15  A Preview of the Road Ahead

This chapter closes Part I (Foundations) by laying out a map of the territory that follows. You have now seen why artificial intelligence matters, how the field arrived at its present moment, and what conceptual vocabulary the rest of the book assumes. The purpose here is different. Rather than introduce new technical material, this chapter explains how the book is organized, why the parts are ordered the way they are, and how you can navigate them according to your background and goals. Think of it as a trailhead sign: a description of the trails, their difficulty, where they connect, and what view waits at the summit.

15.1 1. The Shape of the Book

“AI in Action” is built around a single conviction: that modern artificial intelligence is best understood as a layered stack, where each layer rests on the one beneath it and exposes a cleaner interface to the one above. You cannot reason carefully about transformers without linear algebra and probability. You cannot build a retrieval system without understanding embeddings, and you cannot understand embeddings without first understanding how data is represented. Agents, in turn, orchestrate language models, retrieval, and tools into systems that act in the world. The book follows this dependency structure deliberately.

The material is divided into eight parts.

15.1.1 1.1 Part I: Foundations

This part, which you are completing now, sets the intellectual and historical context. It motivates the field, surveys its trajectory, and establishes shared terminology. Its job is orientation rather than mastery. By design it is the part you can read fastest, because it asks you to absorb ideas rather than derive them.

15.1.2 1.2 Part II: Mathematical Foundations

Here the book becomes concrete and quantitative. We develop the linear algebra (vectors, matrices, norms, decompositions), the calculus of optimization (gradients, the chain rule, convexity), and the probability and statistics (distributions, expectation, estimation, information theory) that the rest of the book uses constantly. The treatment is practical: every concept is connected to a place where it later appears, so that the mathematics never feels like an arbitrary detour.

15.1.3 1.3 Part III: Data

No model is better than the data it learns from. This part covers data representation, collection, annotation, cleaning, handling of missing values, and the treatment of time and other structured signals. It also confronts the ethical and practical realities of privacy, consent, and bias. These chapters are where the abstractions of Part II meet the messy texture of the real world.

15.1.4 1.4 Part IV: Classical Machine Learning

Before neural networks dominated, a rich body of methods established the core ideas of learning from data: linear and logistic regression, decision trees, support vector machines, clustering, dimensionality reduction, and the discipline of evaluation. These methods remain the right tool for many problems, and they teach the concepts (overfitting, regularization, the bias-variance tradeoff, cross-validation) that carry forward into deep learning.

15.1.5 1.5 Part V: Neural Networks

This part builds the deep learning toolkit from first principles: the perceptron, the multilayer network, backpropagation, and the practical machinery of training (initialization, optimizers, normalization, regularization). It then surveys the major architectural families, including convolutional networks for images and recurrent networks for sequences. By the end you understand not only how these models work but why they are trained the way they are.

15.1.6 1.6 Part VI: The Transformer and Language Models

At the center of the book sits the transformer, the architecture that reshaped the field. We derive attention carefully, build up the full encoder and decoder stacks, and then trace the path from pretraining to large language models, instruction tuning, and alignment. This part connects the mathematics, the data practices, and the neural network foundations into the systems that power today’s most capable AI.

15.1.7 1.7 Part VII: Retrieval and Knowledge Graphs

Language models are powerful but bounded by what they internalized during training. This part shows how to extend them with external memory: vector search, retrieval-augmented generation, and knowledge graphs that encode structured relationships. Here embeddings become infrastructure, and the reader learns how to ground a model’s outputs in verifiable sources.

15.1.8 1.8 Part VIII: Agents and Applications

The final part moves from models to systems that act. We cover agent architectures, planning, tool use, multi-agent coordination, and the interoperability protocols that let agents communicate. We close with applications that integrate everything: building, evaluating, and deploying real systems responsibly. This is where the book delivers on its title, turning understanding into action.

15.2 2. How the Parts Build on Each Other

The ordering is not arbitrary, and it is worth seeing the dependencies explicitly so that you can decide what to read and what to defer.

15.2.1 2.1 The Vertical Spine

There is a vertical spine running from mathematics upward: Part II supplies the language of vectors, gradients, and probability; Part IV uses that language to define learning and evaluation; Part V generalizes learning to deep networks; Part VI specializes deep networks into transformers and language models. Each step assumes fluency with the previous one. A reader who skips the gradient material in Part II will find backpropagation in Part V opaque, and a reader who skips backpropagation will find the training dynamics of language models in Part VI mysterious.

15.2.2 2.2 The Horizontal Supports

Two parts act as horizontal supports that touch every layer. Part III (Data) is relevant whether you are fitting a logistic regression or pretraining a transformer, because the questions of representation, quality, and bias never go away. Part VII (Retrieval and Knowledge Graphs) draws on embeddings, which first appear in Part V and mature in Part VI, but it serves the application goals of Part VIII. You can think of data and retrieval as the connective tissue between the vertical spine and the systems built on top of it.

15.2.3 2.3 The Convergence Point

Part VIII is where the threads converge. An agent that answers a customer’s question may call a language model (Part VI), retrieve grounding documents (Part VII), apply a classifier to route the request (Part IV), and depend on clean, well-represented data throughout (Part III). The applications are deliberately placed last because they presuppose the rest. Reaching them is the payoff for the climb.
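The dependencies described in this section can be written down as a small graph. The sketch below is one reading of the text, not a listing from a later chapter: the names DEPENDS_ON and prerequisites are hypothetical, and the exact edges (for example, treating Part III as a direct input to Parts IV, V, and VI) are illustrative assumptions. Part I is omitted because it provides orientation rather than technical prerequisites.

```python
# Hypothetical encoding of the dependencies described in this section.
# Each key is a part; its value lists the parts it draws on directly.
# The edges are an illustrative reading of the text, not an official map.
DEPENDS_ON = {
    "II": [],
    "III": [],
    "IV": ["II", "III"],
    "V": ["IV", "III"],
    "VI": ["V", "III"],
    "VII": ["V", "VI"],
    "VIII": ["III", "IV", "VI", "VII"],
}


def prerequisites(part):
    """Return every part that `part` relies on, directly or indirectly."""
    seen = set()
    stack = list(DEPENDS_ON[part])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(DEPENDS_ON[current])
    # Dicts preserve insertion order, so this reports results in book order.
    return [p for p in DEPENDS_ON if p in seen]


print(prerequisites("V"))     # ['II', 'III', 'IV']
print(prerequisites("VIII"))  # every earlier technical part
```

Asking for the prerequisites of a part gives a rough estimate of what a detour into earlier material would involve before you commit to it.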

15.3 3. Suggested Learning Paths

Readers come to this book with different backgrounds and different goals. The chapters are written to support a linear reading, but you should not feel obligated to read every section in order. Below are three paths, each tuned to a particular kind of reader. Treat them as starting points to adapt, not prescriptions.

15.3.1 3.1 The Practitioner

Suppose you are an engineer who needs to ship a working system soon, and who is comfortable treating some mathematics as a tool to be trusted rather than derived. Your path emphasizes building and evaluating.

Begin with Part I for context, then read Part III (Data) carefully, because data problems will dominate your real work. Skim Part II, returning to specific topics (matrix multiplication, the softmax, gradient descent) only when a later chapter sends you back. Read the practical chapters of Part V on training and the major architectures, then go deep on Part VI, Part VII, and Part VIII, which contain most of what you will use day to day. Run every code example. Your aim is fluency with the systems, and you can acquire the underlying theory opportunistically as questions arise.

15.3.2 3.2 The Researcher

Suppose you intend to extend the field rather than only apply it, and you want to understand methods well enough to question them. Your path emphasizes derivation and the boundaries of current knowledge.

Read Part II thoroughly and do the proofs and derivations; they are the foundation of any original contribution. Work through Part IV completely, because the classical methods clarify ideas (generalization, regularization, the structure of optimization) that deep learning inherits but rarely re-derives. Study Part V and Part VI line by line, paying attention to the design choices and the alternatives that were not taken. In Part VII and Part VIII, read with a critical eye toward open problems: evaluation, robustness, interpretability, and coordination remain active research frontiers. Follow the references at the end of each chapter into the primary literature.

15.3.3 3.3 The Student

Suppose you are learning the field systematically, perhaps alongside a course, and you have time to build understanding from the ground up. Your path is the linear one, and it is the path the book was structured to support.

Read the parts in order. Do the exercises in each chapter before moving on, because the concepts compound and gaps left early become obstacles later. Resist the temptation to jump ahead to the exciting material in Part VI; the excitement is far greater when you can see exactly why attention works and where it came from. Keep a notebook of questions, and revisit it as you advance, since many early puzzles resolve themselves once you have the later machinery. The reward for patience is a genuinely connected mental model rather than a collection of disconnected recipes.

15.3.4 3.4 Choosing and Mixing Paths

These paths are not mutually exclusive. A practitioner who hits a wall will often benefit from briefly adopting the researcher’s path through a single topic. A student preparing for a job may add the practitioner’s emphasis on running code. The honest advice is to read with a purpose in mind, to be willing to backtrack, and to use the dependency map in Section 2 (How the Parts Build on Each Other) to decide when a detour into earlier material is worth the time.

15.4 4. How to Use the Code and Exercises

This book is called “AI in Action” because understanding AI requires doing AI. Every concept that can be made concrete is accompanied by runnable code, and most chapters end with exercises that ask you to extend, break, or apply that code.

15.4.1 4.1 The Role of Code

The code is not decoration, and it is not a finished library to be imported and forgotten. It is written to be read. Implementations favor clarity over performance, so that you can trace exactly how a gradient flows or how an attention score is computed. When you encounter a code listing, the most valuable thing you can do is run it, change a number, and predict the result before you observe it. The gap between your prediction and the outcome is where learning happens.
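As a taste of that practice, here is a minimal sketch of how attention weights can be computed with scaled dot-product attention, written in plain Python. It is an illustration prepared for this preview under simplifying assumptions (a single query, no learned projections, hypothetical function names), not one of the listings from Part VI. Before running it, predict which key receives the largest weight; then change a number in the query and predict again.

```python
import math


def softmax(values):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(values)  # subtracting the max avoids overflow in exp
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(query, keys):
    """Scaled dot-product attention weights for one query over several keys."""
    scale = math.sqrt(len(query))
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in keys
    ]
    return softmax(scores)


# Toy vectors chosen for illustration only.
query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

weights = attention_weights(query, keys)
print([round(w, 3) for w in weights])
```

The key that points in the same direction as the query gets the largest weight, and the key that points the opposite way gets the smallest. Setting the query to [0.0, 1.0] shifts the largest weight to the second key.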

15.4.2 4.2 Kinds of Exercises

The exercises fall into three rough categories. Some are confirmatory: they ask you to verify a claim made in the text, such as reproducing a derivation numerically. Some are exploratory: they ask you to vary a setting (a learning rate, a dataset size, a model depth) and characterize the effect. Some are constructive: they ask you to build something new from the components introduced in the chapter. The categories grow in difficulty and in value. The constructive exercises, though the most demanding, are the ones that most reliably turn knowledge into skill.
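A confirmatory exercise might look like the following sketch, which checks a hand-derived derivative against a central finite difference. The function f, the evaluation point, and the step size h are arbitrary choices made for illustration, not taken from a specific exercise in the book.

```python
def f(x):
    """Example function: f(x) = x^3 - 2x."""
    return x ** 3 - 2.0 * x


def analytic_derivative(x):
    """Derivative worked out by hand: f'(x) = 3x^2 - 2."""
    return 3.0 * x ** 2 - 2.0


def numeric_derivative(func, x, h=1e-5):
    """Central finite-difference approximation of func'(x)."""
    return (func(x + h) - func(x - h)) / (2.0 * h)


x0 = 1.5
exact = analytic_derivative(x0)
approx = numeric_derivative(f, x0)

print("analytic:", exact)
print("numeric: ", approx)
print("gap:     ", abs(exact - approx))
```

If the gap is not tiny, either the derivation or the code contains a mistake. Turning this into an exploratory exercise is a small step: vary h and observe how the gap changes.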

15.4.3 4.3 A Suggested Practice

A reliable habit is to read a chapter once for the ideas, then return with an editor open and reconstruct the key code from a blank file, consulting the text only when stuck. This active recall is uncomfortable and slow, and it is far more effective than passive rereading. Where a chapter provides a dataset, spend a few minutes simply looking at the data before modeling it, because intuition about the input is worth more than any single technique applied to it.

15.5 5. What You Will Be Able to Do by the End

It is worth being concrete about the destination, because a clear goal makes the climb easier.

15.5.1 5.1 Conceptual Fluency

By the end of the book you will be able to read a contemporary research paper or system description and follow its argument. You will recognize the mathematical objects, understand the design choices, and locate the work within the broader stack. The vocabulary that may feel like jargon now will become a language you think in.

15.5.2 5.2 Practical Capability

You will be able to take a real problem, frame it as a learning or reasoning task, choose appropriate methods, prepare and inspect the data, implement and train a model or assemble a system, and evaluate it honestly. You will know how to extend a language model with retrieval, how to give an agent tools and let it plan, and how to tell whether the result actually works. These are the capabilities that turn a reader into a builder.

15.5.3 5.3 Critical Judgment

Perhaps most importantly, you will be able to judge. You will know the failure modes of these systems, the ways evaluation can mislead, and the ethical stakes of deploying models that affect people. You will be equipped to ask whether a given approach is appropriate, what it might be hiding, and what could go wrong. In a field that moves quickly and is prone to hype, this judgment is the most durable thing the book can offer.

15.6 6. A Note on the Pace of the Field

Artificial intelligence advances quickly, and any book risks describing a moving target. The response of this book is to emphasize the foundations that change slowly: the mathematics, the principles of learning, the structure of the transformer, the logic of retrieval, and the architecture of agents. Specific models and benchmarks will be superseded, but the conceptual scaffolding will remain. If you understand why attention works, you will understand its successors faster. If you understand how to evaluate a system honestly, no new model will fool you for long. The goal is not to memorize the present state of the art but to develop the understanding that lets you absorb whatever comes next.

With the map in hand, you are ready to begin the climb. Part II awaits, where the abstract motivation of these opening chapters gives way to the precise mathematics on which everything else is built.

15.7 Further Reading

  • Bishop, C. M., and Bishop, H. (2024). Deep Learning: Foundations and Concepts. Springer. A modern, mathematically grounded survey that complements the spine of this book.
  • Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press. The standard reference for the neural network foundations developed in Part V.
  • Murphy, K. P. (2022, 2023). Probabilistic Machine Learning (two volumes). MIT Press. A thorough treatment of the probabilistic perspective that underlies Parts II and IV.
  • Jurafsky, D., and Martin, J. H. (2025 draft). Speech and Language Processing (3rd ed.). A comprehensive companion for the language modeling material of Part VI.