7 Project Management

7.1 Introduction

Python project management has evolved significantly over the years, with tools like pip, virtualenv, conda, and poetry each attempting to solve different aspects of dependency management and environment isolation. In 2024, a tool called uv emerged from Astral, the team behind ruff, aiming to consolidate Python package management with a focus on speed and simplicity. Written in Rust, uv combines the functionality of several previously separate tools into a single, cohesive experience.

At its heart, project management answers three coupled questions. First, which versions of which packages should be installed so that every package’s stated requirements are simultaneously satisfied. Second, how to record that answer so it can be reproduced exactly on another machine months or years later. Third, where to place the resulting packages so that one project’s choices do not corrupt another’s. The first question is a constraint-satisfaction problem (dependency resolution), the second is the lock-file mechanism, and the third is environment isolation. The rest of this chapter treats each in turn, then layers on the surrounding toolchain (linting, type checking, testing, and documentation) that turns a reproducible environment into a disciplined workflow.

For machine learning and AI work, where managing deep dependency graphs and ensuring bit-for-bit reproducibility across environments is critical, getting these three questions right is the difference between an experiment that replicates and one that silently drifts. The chapter is organized to be read linearly the first time and used as a reference thereafter, moving from installation, through the formal resolution problem, to concrete ML workflows and the full development stack.

All of the tooling discussed here is mature, free, and open source. uv, ruff, pytest, mypy, and quarto are permissively licensed and develop in the open, so nothing in this workflow depends on a paid service or a proprietary registry.

7.2 Why `uv` Matters for Machine Learning and AI

Before diving into the technical details, it’s worth understanding why uv is particularly valuable for machine learning and AI development:

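Reproducibility: ML models must be reproducible. With uv, you can lock exact versions of all dependencies, so that the same software stack is installed when your trained neural network or fine-tuned LLM is deployed or shared with collaborators months or years later. A pinned environment is a precondition for identical results, not a guarantee of them: random seeds, hardware, and nondeterministic GPU kernels still matter.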

Speed: Installing ML frameworks like PyTorch, TensorFlow, or transformers with all their dependencies is notoriously slow. Astral’s published benchmarks put uv at 10-100x faster than pip, depending on the workload and the state of the cache, meaning you spend less time waiting for environments to set up and more time training models.

Simplicity: Modern ML projects require complex dependency graphs, deep learning frameworks, data processing libraries, visualization tools, and more. uv simplifies this complexity with intuitive commands and clear error messages, reducing cognitive overhead.

Isolation: Different ML projects often require different versions of frameworks. uv makes it trivial to create isolated environments, preventing version conflicts between your PyTorch 2.0 computer vision project and your TensorFlow 2.15 NLP project.

7.3 Installation

Installing uv is straightforward. The recommended method varies by operating system:

7.3.1 macOS and Linux

On Unix-based systems, the simplest installation method uses the official installer script:

curl -LsSf https://astral.sh/uv/install.sh | sh

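This downloads uv and adds its install directory to your PATH. After installation, restart your terminal or source the env file that the installer names in its final message. The location depends on the uv release: recent installers use $HOME/.local/bin, while older ones (including the 0.4 series shown in this chapter) used $HOME/.cargo/bin:

source $HOME/.cargo/env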

7.3.2 Windows

On Windows, you can use PowerShell:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Alternatively, if you have Python already installed, you can use pip:

pip install uv

However, the standalone installer is preferred as it doesn’t depend on an existing Python installation.

7.3.3 Verifying Installation

After installation, verify that uv is working correctly:

uv --version

You should see output showing the installed version, such as:

uv 0.4.18 (Homebrew 2024-11-05)
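The same check can be scripted, for example at the top of a project setup script. The following is a minimal sketch using only the standard library; the helper name `uv_version` is made up for illustration and is not part of uv.

```python
from __future__ import annotations

import shutil
import subprocess


def uv_version() -> str | None:
    """Return the output of `uv --version`, or None if uv is not on PATH."""
    exe = shutil.which("uv")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = uv_version()
    print(version if version is not None else "uv not found on PATH")
```

If the function returns None right after installation, the usual cause is that the shell has not yet picked up the updated PATH.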

7.4 Understanding `uv`’s Architecture

To use uv effectively, it helps to understand its core concepts and how it differs from traditional Python tools.

7.4.1 The Tool Chain Analogy

Think of uv as a complete tool chain rather than a single tool. It replaces multiple tools in the Python ecosystem:

pip: Package installation
pip-tools: Dependency resolution and locking
virtualenv/venv: Environment creation
pyenv: Python version management
pipx: Tool installation

Where you previously needed to coordinate these separate tools, uv provides a unified interface. This integration eliminates common pain points like ensuring your virtual environment uses the correct Python version or manually compiling requirements files.

7.4.2 Key Design Principles

uv is built on several core principles:

Speed First: Written in Rust and using parallel downloads, uv prioritizes performance without sacrificing correctness.
Correctness: uv uses a backtracking dependency resolver that handles complex version constraints, avoiding the inconsistent installs that pip’s legacy resolver (the default before pip 20.3) could produce.
Batteries Included: Unlike tools that require plugins or additional configuration, uv works out of the box for common workflows.
Standards Compliant: uv follows Python packaging standards (PEP 517, PEP 621, etc.), ensuring compatibility with the broader ecosystem.

7.5 The Dependency Resolution Problem

The single hardest thing a package manager does is resolution: choosing one concrete version for every package such that every constraint is satisfied at once. The chapter’s claims that uv uses “a proper resolver” and that pip historically had “resolver issues” only make sense once we state the problem precisely. This section does that, then explains why the problem is genuinely hard and how modern resolvers cope.

7.5.1 A precise statement

Let $P = \{p_1, \dots, p_n\}$ be the set of packages reachable from your project’s direct requirements. For each package $p_i$ let $V_i = \{v_{i,1}, v_{i,2}, \dots\}$ be the finite set of versions available on the index, and let $\bot$ be a distinguished symbol meaning “not installed.” A solution is an assignment

\[ \sigma : P \to \Bigl(\bigcup_{i=1}^{n} V_i\Bigr) \cup \{\bot\}, \qquad \sigma(p_i) \in V_i \cup \{\bot\}, \]

that selects at most one version of each package. Dependencies are constraints between these choices. A typical requirement, “package a at version $v$ depends on numpy >= 1.24, < 2.0,” becomes the logical implication

\[ \bigl(\sigma(\texttt{a}) = v\bigr) \;\Rightarrow\; \sigma(\texttt{numpy}) \in [1.24,\, 2.0). \]

The project’s own direct dependencies are unconditional constraints of the same form. A solution is valid when every such implication holds and every selected version’s own dependencies are themselves satisfied (the constraint set is closed under reachability). Resolution is the search for a valid $\sigma$ in which every package required by the project is assigned a real version rather than $\bot$.

Definition: the version-selection constraint problem

Given packages with versioned, conditional dependency constraints, decide whether a valid assignment $\sigma$ exists, and if so return one. Optional refinements ask for the assignment that is newest under some preference order, for example lexicographically maximal versions, which is what users usually want.
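To make the definition concrete, here is a deliberately naive sketch that enumerates every assignment $\sigma$ and returns the first valid one. The index contents, the `app` root package, and the function names are all invented for illustration; real resolvers, uv included, do not enumerate assignments this way.

```python
from itertools import product

# Hypothetical toy index: available versions per package, newest first.
INDEX = {
    "app": [(1, 0)],
    "vision": [(1, 0)],
    "numpy": [(2, 1), (2, 0), (1, 26), (1, 24)],
}

# Conditional constraints: (package, version) -> {dependency: predicate on its version}.
REQUIRES = {
    ("app", (1, 0)): {
        "vision": lambda v: v >= (1, 0),
        "numpy": lambda v: v >= (1, 24),
    },
    ("vision", (1, 0)): {"numpy": lambda v: (1, 24) <= v < (2, 0)},
}


def is_valid(sigma):
    """Check every implication: a selected version's requirements must hold."""
    for pkg, ver in sigma.items():
        if ver is None:  # None plays the role of "not installed"
            continue
        for dep, accepts in REQUIRES.get((pkg, ver), {}).items():
            dep_ver = sigma.get(dep)
            if dep_ver is None or not accepts(dep_ver):
                return False
    return True


def resolve(root):
    """Try assignments in newest-first order; return the first valid one or None."""
    names = list(INDEX)
    choices = [INDEX[name] + [None] for name in names]
    for combo in product(*choices):
        sigma = dict(zip(names, combo))
        if sigma[root] is not None and is_valid(sigma):
            return sigma
    return None


print(resolve("app"))
```

The loop visits up to $\prod_i (|V_i| + 1)$ assignments, which is why this approach collapses on realistic dependency graphs and why the search strategy matters.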

7.5.2 Why it is hard

Each package’s “which version” choice is a discrete variable, and the implications above are exactly propositional clauses. Encode “package $p$ is at version $v$” as a Boolean variable $x_{p,v}$, add clauses enforcing that at most one version per package is chosen, and translate each dependency implication into a disjunction. The result is a Boolean satisfiability (SAT) instance, and conversely SAT instances can be encoded as dependency graphs. Boolean satisfiability is the canonical NP-complete problem (Cook 1971), so version selection inherits that worst-case hardness: no known algorithm solves every instance in time polynomial in the number of packages and versions. The same reduction underlies the formal study of package managers in the Linux distribution world (Abate et al. 2015), where the problem was analyzed long before it reached the Python ecosystem.
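The encoding described above can be written out for a tiny instance. This sketch builds the clauses by hand for one package `a` and three numpy versions (all names and versions are illustrative), then checks satisfiability by brute force. A real SAT solver would replace the exhaustive loop.

```python
from itertools import combinations, product

# One Boolean variable x_{p,v} per (package, version) pair.
VERSIONS = {"a": ["1"], "numpy": ["1.24", "1.26", "2.0"]}
variables = [(p, v) for p, vs in VERSIONS.items() for v in vs]

# A clause is a list of literals; a literal is (variable, required truth value).
clauses = []

# At most one version per package: for each pair, not both.
for p, vs in VERSIONS.items():
    for v1, v2 in combinations(vs, 2):
        clauses.append([((p, v1), False), ((p, v2), False)])

# The project requires some version of package a.
clauses.append([(("a", v), True) for v in VERSIONS["a"]])

# a==1 implies numpy in [1.24, 2.0):  not x_{a,1}  or  x_{numpy,1.24}  or  x_{numpy,1.26}
clauses.append(
    [(("a", "1"), False), (("numpy", "1.24"), True), (("numpy", "1.26"), True)]
)


def satisfying_assignments():
    """Yield each model as the set of (package, version) variables set to True."""
    for bits in product([False, True], repeat=len(variables)):
        model = dict(zip(variables, bits))
        if all(any(model[var] == want for var, want in clause) for clause in clauses):
            yield {var for var, on in model.items() if on}


solutions = list(satisfying_assignments())
for s in solutions:
    print(sorted(s))
```

The exhaustive loop is exponential in the number of variables, which is the worst-case behavior the NP-completeness result says cannot be avoided in general.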

Two ecosystem realities make the worst case bite in practice:

Diamond dependencies. Project A depends on B and C; both B and C depend on D, each with its own version range. The resolver must find a version of D acceptable to both, and when the two ranges are disjoint the feasible window is empty.
Backtracking blowups. A naive resolver picks versions greedily, discovers a conflict deep in the tree, and must undo many choices. Without learning from the conflict it can revisit the same dead end repeatedly, which is the behavior older pip resolvers were criticized for.

7.5.3 How modern resolvers cope

uv (like Dart’s pub and the poetry resolver) uses the PubGrub algorithm, an adaptation of conflict-driven clause learning (CDCL) from the SAT-solving literature. The key idea is that when the search hits an incompatibility, it does not merely backtrack one step. It derives a new, more general incompatibility (a learned clause) that summarizes why the dead end occurred, then uses that clause to prune the rest of the search and to explain the failure to the user. The clear “Because package-a depends on numpy<2.0 and package-b depends on numpy>=2.0, …” messages shown later in this chapter are these learned incompatibilities rendered as English.

Three properties matter in practice and are worth stating as expectations rather than guarantees of speed:

Completeness. If a valid assignment exists, the resolver finds one; if none exists, it reports a genuine conflict rather than installing a broken set. This is the property pip’s historical resolver lacked.
Determinism. Given the same inputs (the same pyproject.toml, the same index state), resolution returns the same assignment. Determinism is what makes the lock file meaningful: it is the serialized $\sigma$.
Best-effort optimality. Among valid assignments the resolver prefers the newest versions consistent with your constraints, so you get bug fixes “for free” without manual pinning.

flowchart TD
    start["Direct requirements from pyproject.toml"]
    start --> pick["Pick a candidate version for the next package"]
    pick --> check["Check it against all active constraints"]
    check -->|"compatible"| more{"More packages to assign"}
    check -->|"conflict"| learn["Derive a learned incompatibility"]
    learn --> back["Backjump past the real cause"]
    back --> pick
    more -->|"yes"| pick
    more -->|"no"| done["Valid assignment, write uv.lock"]

7.5.4 Worked example: an unsatisfiable diamond

Suppose your project requires two packages with the following published constraints.

Package	Version	Requires
`vision`	1.0	`numpy >= 1.24, < 2.0`
`fastmath`	3.0	`numpy >= 2.0`

There is no single numpy version in both $[1.24, 2.0)$ and $[2.0, \infty)$, since those intervals are disjoint. The resolver assigns vision==1.0, propagates numpy < 2.0, then tries fastmath==3.0, which propagates numpy >= 2.0. The two clauses about numpy are jointly unsatisfiable, so the resolver records the incompatibility “vision==1.0 and fastmath==3.0 cannot coexist” and reports it. The fix is not a resolver flag but a real engineering decision: relax a bound (does fastmath have an older release that accepts numpy < 2.0?), drop one dependency, or wait for an upstream release that widens its range. The resolver’s value is that it tells you exactly which two requirements collide, instead of installing whichever it reached last and leaving you to debug an import error at runtime.
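The emptiness of the feasible window can be checked mechanically. The sketch below models each requirement as a half-open interval of version tuples; the `intersect` helper is illustrative, and the relaxed `fastmath` bound in the second call is a hypothetical older release, not a real one.

```python
def intersect(a, b):
    """Intersect two half-open intervals [lo, hi); hi=None means unbounded above.

    Returns the intersection, or None when it is empty.
    """
    lo = max(a[0], b[0])
    uppers = [hi for hi in (a[1], b[1]) if hi is not None]
    hi = min(uppers) if uppers else None
    if hi is not None and lo >= hi:
        return None
    return (lo, hi)


vision_numpy = ((1, 24), (2, 0))   # numpy >= 1.24, < 2.0
fastmath_numpy = ((2, 0), None)    # numpy >= 2.0

print(intersect(vision_numpy, fastmath_numpy))

# Hypothetical older fastmath release that accepts numpy >= 1.26.
older_fastmath_numpy = ((1, 26), None)
print(intersect(vision_numpy, older_fastmath_numpy))
```

The first call returns None, which is the conflict the resolver reports. The second shows how relaxing one bound reopens a window, here $[1.26, 2.0)$.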

7.5.5 When to worry, and when not to

For the overwhelming majority of projects, resolution completes in well under a second and you never think about any of this. The theory matters in the rare moments when that stops being true: when a uv sync suddenly cannot find a solution, when adding one package forces a cascade of downgrades, or when CI resolves differently from your laptop. In those moments the right mental model is the one above. The resolver is searching a constraint problem, conflicts are facts about your dependency graph rather than bugs in the tool, and the lock file is the artifact that freezes a known-good solution so the search never has to run again on the deployment path.

7.6 Basic Project Workflow

Let’s walk through creating and managing a Python project with uv. We’ll build a small machine learning project to demonstrate practical usage.

7.6.1 Creating a New Project

To create a new project, use the uv init command:

uv init image-classifier
cd image-classifier

This creates a new directory with a basic project structure:

flowchart TD
    root["image-classifier/"]
    root --> v[".python-version"]
    root --> readme["README.md"]
    root --> pyproject["pyproject.toml"]
    root --> hello["hello.py"]

Let’s examine each file:

.python-version: Specifies the Python version for this project. uv uses this to automatically download and use the correct Python version.

pyproject.toml: The modern Python project configuration file, following PEP 621. This is where dependencies, metadata, and build configuration live.

hello.py: A simple starter script that uv creates as an example (newer uv releases name it main.py).

7.6.2 Understanding `pyproject.toml`

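The pyproject.toml file is central to modern Python projects. Here’s what uv init generates for a packaged project; the exact contents vary by uv version and flags, and a plain application project may omit the [build-system] table: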

[project]
name = "image-classifier"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = []

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

Let’s break this down:

[project]: Metadata about your project, following PEP 621
name: The package name (important if you plan to distribute it)
version: Semantic version number
requires-python: Minimum Python version requirement
dependencies: List of required packages (initially empty)
[build-system]: Configuration for building the package (uses hatchling by default)

For an ML project, you might not care about building a distributable package, but the structure remains useful for dependency management.

7.6.3 Adding Dependencies

There are two main ways to add dependencies: directly editing pyproject.toml or using the command line.

7.6.3.1 Method 1: Command Line (Recommended)

To add a package, use uv add:

uv add torch torchvision numpy pillow matplotlib

This does several things automatically:

Resolves the dependencies and their sub-dependencies
Updates pyproject.toml with the new dependencies
Creates or updates uv.lock with exact versions
Installs the packages in your project environment

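After running this command, your pyproject.toml will list the new dependencies. uv normally records each one with a lower bound at the version it resolved (for example "torch>=2.3.1"); the bounds are omitted here for brevity: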

dependencies = [
    "torch",
    "torchvision", 
    "numpy",
    "pillow",
    "matplotlib",
]

And uv has created a uv.lock file that pins exact versions of these packages and all their dependencies.

7.6.3.2 Method 2: Manual Editing

You can also edit pyproject.toml directly:

dependencies = [
    "torch>=2.0.0",
    "torchvision>=0.15.0",
    "numpy>=1.24.0",
    "pillow>=10.0.0",
    "matplotlib>=3.7.0",
]

Then synchronize your environment:

uv sync

This reads the pyproject.toml, resolves dependencies, updates uv.lock, and installs everything.

7.6.4 Version Constraints

When specifying dependencies, you can use various version constraint operators:

dependencies = [
    "torch",                        # Any version (not recommended)
    "torch>=2.0.0",                # Greater than or equal to 2.0.0
    "torch>=2.0.0,<3.0.0",         # Between 2.0.0 and 3.0.0
    "torch~=2.0.0",                # Compatible release (2.0.x)
    "torch==2.0.1",                # Exact version (very restrictive)
]

For ML projects, I recommend using lower bounds with conservative upper bounds:

dependencies = [
    "torch>=2.0.0,<3.0.0",
    "numpy>=1.24.0,<2.0.0",
    "transformers>=4.30.0,<5.0.0",
]

This gives you bug fixes and minor updates while protecting against breaking changes. This is especially important for deep learning frameworks where major versions can introduce significant API changes.
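To see what the operators mean operationally, here is a small sketch that checks a version against a specifier string. It handles only dotted numeric versions and the operators listed above, so it is a simplification of the full version-specifier rules (no pre-releases, wildcards, or epochs). The function names are illustrative; in real code the `packaging` library's `SpecifierSet` is the tool to use.

```python
import operator

OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}


def parse(version):
    """Turn '2.0.1' into (2, 0, 1)."""
    return tuple(int(part) for part in version.strip().split("."))


def _pad(a, b):
    """Pad the shorter tuple with zeros so that 2.0 compares equal to 2.0.0."""
    n = max(len(a), len(b))
    return a + (0,) * (n - len(a)), b + (0,) * (n - len(b))


def matches(version, spec):
    """Return True if version satisfies every comma-separated clause in spec."""
    v = parse(version)
    for clause in (c.strip() for c in spec.split(",")):
        if clause.startswith("~="):
            bound = parse(clause[2:])
            if len(bound) < 2:
                raise ValueError("~= needs at least two version components")
            pv, pb = _pad(v, bound)
            # ~=X.Y.Z means >=X.Y.Z and the same X.Y prefix.
            if not (pv >= pb and pv[: len(bound) - 1] == bound[:-1]):
                return False
            continue
        # Two-character operators are tested before their one-character prefixes.
        for symbol in (">=", "<=", "==", "<", ">"):
            if clause.startswith(symbol):
                pv, pb = _pad(v, parse(clause[len(symbol):]))
                if not OPS[symbol](pv, pb):
                    return False
                break
        else:
            raise ValueError(f"unsupported specifier: {clause!r}")
    return True


print(matches("2.3.1", ">=2.0.0,<3.0.0"))
print(matches("2.1.0", "~=2.0.0"))
```

The second call shows why the compatible-release operator is stricter than it looks: `~=2.0.0` admits 2.0.5 but rejects 2.1.0.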

7.6.5 The Lock File: `uv.lock`

The uv.lock file is critical for reproducibility. It contains the exact resolved versions of every package in your dependency tree. Here’s a snippet:

[[package]]
name = "torch"
version = "2.3.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "filelock" },
    { name = "typing-extensions" },
    { name = "sympy" },
    { name = "networkx" },
    { name = "jinja2" },
    { name = "fsspec" },
]
wheels = [
    { url = "https://files.pythonhosted.org/packages/...", hash = "sha256:..." },
]

This tells us:

Exactly which version of PyTorch is installed (2.3.1)
Where it came from (PyPI)
Its direct dependencies
The specific wheel file and its hash for verification

Important: You should commit uv.lock to version control. This ensures anyone cloning your repository can recreate your exact environment, which is critical when sharing trained models or reproducing experimental results.

7.6.6 What the lock file does and does not guarantee

It helps to be precise about the kind of reproducibility a lock file buys, because it is often overstated. The lock file is the serialized resolution $\sigma$ from the previous section, together with a cryptographic hash of every artifact. It pins three things and leaves a fourth to you.

Version reproducibility. Every transitive package is fixed to one exact version. Two uv sync runs from the same lock file install the same version numbers. This is the property that protects against silent drift when an upstream package publishes a new release.
Artifact integrity. Each wheel is recorded with a SHA-256 hash. On install, uv verifies the downloaded bytes against the recorded hash, so a corrupted download or a tampered mirror is detected rather than installed. Formally, if $h$ is the recorded hash and $\hat{h}$ is the hash of the bytes actually fetched, installation proceeds only when $h = \hat{h}$.
Source provenance. The index URL and, for Git or local dependencies, the exact commit or path are recorded, so you know precisely where each artifact came from.
What it does not pin: the platform. A lock file selects wheels for a target platform and Python version. The same lock can resolve to different wheels on Linux versus macOS, or on CPython 3.11 versus 3.12, because those are genuinely different binaries (a manylinux wheel is not a macOS wheel). For deep learning this is the usual source of “works on my machine” surprises: a CUDA-enabled torch wheel and a CPU-only one are different artifacts. The lock file makes the choice deterministic per platform; it does not make a GPU appear on a CPU-only host. The later sections on CUDA variants address this directly.

A useful way to think about it: the lock file removes nondeterminism from the resolver and from the network, leaving only the deliberate, declared variation across platforms and hardware.
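The integrity condition $h = \hat{h}$ is easy to reproduce with the standard library. This is a sketch of the idea only, not uv's implementation; the `verify` helper and the sample bytes are made up, and the `sha256:<hex>` format follows the lock-file snippet shown earlier.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    """Hex digest of the given bytes."""
    return hashlib.sha256(data).hexdigest()


def verify(data: bytes, recorded: str) -> bool:
    """Return True only if the bytes hash to the recorded 'sha256:<hex>' value."""
    algorithm, _, expected = recorded.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    return hmac.compare_digest(sha256_hex(data), expected)


payload = b"example wheel bytes"           # stand-in for a downloaded artifact
recorded = "sha256:" + sha256_hex(payload)  # what the lock file would store

print(verify(payload, recorded))            # untouched download
print(verify(payload + b"!", recorded))     # a single changed byte
```

Any change to the bytes, however small, produces a different digest, so the second check fails and the artifact would be rejected rather than installed.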

7.7 Running Python with `uv`

7.7.1 The `uv run` Command

Instead of activating a virtual environment and then running Python, uv provides the uv run command:

uv run python script.py

This automatically:

Ensures the project environment exists
Installs any missing dependencies
Runs the Python script in that environment
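A quick way to confirm which environment a script actually ran in is to have it report its own interpreter. The following sketch uses only the standard library; the `environment_report` name is illustrative.

```python
import sys


def environment_report():
    """Describe the interpreter and environment the script is running in."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        # In a virtual environment, prefix differs from the base installation.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "python": ".".join(str(part) for part in sys.version_info[:3]),
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```

Saved as script.py and launched with uv run python script.py, this should report an executable inside the project's .venv directory and in_virtualenv as True, whereas the same script run with a system interpreter reports the system paths.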

You can also run Python interactively:

uv run python

Or execute inline code:

uv run python -c "import numpy; print(numpy.__version__)"

7.7.2 Running Installed Tools

For tools like jupyter, pytest, or black, use uv run as well:

uv run jupyter notebook
uv run pytest tests/
uv run black src/

This is cleaner than traditional workflows where you’d activate an environment first.

7.8 Development Dependencies

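ML projects often need development tools (testing, formatting, documentation, experiment tracking) that aren’t required for running the actual training or inference. uv supports two mechanisms for this: development dependency groups, which are local to the project and never published with it, and optional dependencies (extras), which are part of the package’s public metadata.

7.8.1 Adding Development Dependencies

Add development dependencies with the --dev flag:

uv add --dev pytest black mypy jupyter tensorboard wandb

This updates pyproject.toml with a dev group. Recent uv releases write the standard [dependency-groups] table (PEP 735); older releases wrote the same list under [tool.uv] as dev-dependencies:

[dependency-groups]
dev = [
    "pytest",
    "black",
    "mypy",
    "jupyter",
    "tensorboard",
    "wandb",
]

Optional dependencies (extras) are added with --optional and the name of the extra. Here torch-cuda is a placeholder name rather than a real package:

uv add --optional gpu torch-cuda

[project.optional-dependencies]
gpu = [
    "torch-cuda",
]

7.8.2 Installing Optional Dependencies

The dev group is installed by default, so a plain sync gives you the development tools:

uv sync

To leave them out, for example in a production image:

uv sync --no-dev

To include a specific extra:

uv sync --extra gpu

Or all extras:

uv sync --all-extras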

7.9 Python Version Management

One of uv’s most powerful features is built-in Python version management, eliminating the need for pyenv or similar tools.

7.9.1 Specifying Python Versions

You can specify the Python version in multiple ways:

1. Project-level (recommended):

uv python pin 3.12

This creates a .python-version file:

3.12

2. In pyproject.toml:

requires-python = ">=3.11"

7.9.2 Installing Python Versions

If the required Python version isn’t available, uv can install it:

uv python install 3.12

This downloads and installs Python 3.12, managed by uv. You can install multiple versions:

uv python install 3.11 3.12 3.13

7.9.3 Listing Available Pythons

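To see the Python versions uv knows about (installed ones, plus versions it can download; add --only-installed to list just the former):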

uv python list

Output might look like:

cpython-3.13.0-macos-aarch64-none    /Users/mike/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/bin/python3
cpython-3.12.7-macos-aarch64-none    /Users/mike/.local/share/uv/python/cpython-3.12.7-macos-aarch64-none/bin/python3
cpython-3.11.10-macos-aarch64-none   /Users/mike/.local/share/uv/python/cpython-3.11.10-macos-aarch64-none/bin/python3

7.9.4 Using Specific Python Versions

For a one-off command with a specific Python version:

uv run --python 3.11 python script.py

Or create a project with a specific version:

uv init --python 3.11 my-project

7.10 Advanced Dependency Management

7.10.1 Installing from Git Repositories

Sometimes you need bleeding-edge code or a forked version of a package. uv makes this straightforward:

uv add "package @ git+https://github.com/user/package.git"

For a specific branch:

uv add "package @ git+https://github.com/user/package.git@dev-branch"

For a specific commit:

uv add "package @ git+https://github.com/user/package.git@abc123"

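In pyproject.toml, uv by default records the package by name and stores the Git location in a [tool.uv.sources] table:

dependencies = [
    "package",
]

[tool.uv.sources]
package = { git = "https://github.com/user/package.git", rev = "abc123" }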

7.10.2 Installing from Local Paths

For packages you’re developing locally:

uv add --editable ../my-local-package

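For a regular (non-editable) dependency on a local directory, you can also write a direct reference in pyproject.toml:

dependencies = [
    "my-package @ file:///path/to/my-package",
]

The --editable flag makes the package editable, so changes to the source are immediately reflected without reinstalling. uv records an editable dependency in [tool.uv.sources] with editable = true rather than in the dependency string.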

7.10.3 Platform-Specific Dependencies

Some packages are only needed on certain platforms. You can specify this in pyproject.toml:

dependencies = [
    "pandas",
    "pywin32; platform_system == 'Windows'",
    "python-magic; platform_system != 'Windows'",
]
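The names on the left of a marker expression are defined in terms of standard-library calls, so you can inspect the values your own machine would produce. This sketch prints a few of them and evaluates the two conditions from the example above by hand; the helper names are illustrative, and a real marker evaluator (such as the one in the `packaging` library) handles the full expression grammar.

```python
import platform
import sys


def marker_environment():
    """Values that common environment markers are compared against."""
    return {
        "platform_system": platform.system(),    # e.g. "Linux", "Darwin", "Windows"
        "sys_platform": sys.platform,            # e.g. "linux", "darwin", "win32"
        "platform_machine": platform.machine(),  # e.g. "x86_64", "arm64"
        "python_version": ".".join(platform.python_version_tuple()[:2]),
    }


def wants_pywin32(env):
    """Hand-written equivalent of: platform_system == 'Windows'."""
    return env["platform_system"] == "Windows"


def wants_python_magic(env):
    """Hand-written equivalent of: platform_system != 'Windows'."""
    return env["platform_system"] != "Windows"


env = marker_environment()
for name, value in env.items():
    print(f"{name} = {value!r}")
print("install pywin32:", wants_pywin32(env))
print("install python-magic:", wants_python_magic(env))
```

Exactly one of the two conditions holds on any given machine, which is what lets a single dependency list serve every platform.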

7.10.4 Resolving Dependency Conflicts

When dependencies conflict, uv provides clear error messages. These messages are the learned incompatibilities from the resolution algorithm discussed earlier rendered in English: the resolver is reporting a fact about your dependency graph, not failing arbitrarily. For example, if package A requires numpy<2.0 but package B requires numpy>=2.0, uv will report:

error: No solution found when resolving dependencies:
  Because package-a depends on numpy<2.0
    and package-b depends on numpy>=2.0,
    we can conclude that package-a and package-b are incompatible.

To resolve conflicts:

Check if updates fix it: Update packages with uv sync --upgrade
Use version constraints: Manually specify compatible versions
Report upstream: File issues with package maintainers
Fork if necessary: Maintain a patched version

7.11 Scripts and Entry Points

For distributable packages, you can define console scripts in pyproject.toml:

[project.scripts]
causal-analyze = "causal_analysis.main:cli"
did-estimate = "causal_analysis.did:main"

Then run them with:

uv run causal-analyze data.csv

This is useful for creating reproducible analysis pipelines that others can run.
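Each entry maps a command name to a function given as module:function. As a hedged sketch, the cli function behind the causal-analyze entry might look like the following; the argument name and the printed message are assumptions for illustration, since the chapter does not show the module's contents.

```python
from __future__ import annotations

import argparse


def cli(argv: list[str] | None = None) -> int:
    """Entry point referenced as causal_analysis.main:cli (hypothetical body)."""
    parser = argparse.ArgumentParser(prog="causal-analyze")
    parser.add_argument("data", help="path to the input CSV file")
    args = parser.parse_args(argv)
    print(f"Analyzing {args.data}")
    return 0  # the generated console script passes this to sys.exit


if __name__ == "__main__":
    raise SystemExit(cli())
```

Accepting an optional argv list keeps the function testable: a test can call cli(["data.csv"]) directly without touching sys.argv.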

7.12 Working with Jupyter Notebooks

Jupyter notebooks are common in research. Here’s how to use them with uv:

7.12.1 Adding Jupyter

uv add --dev jupyter ipykernel

7.12.2 Running Jupyter

uv run jupyter notebook

Or for JupyterLab:

uv run jupyter lab

7.12.3 Creating a Kernel

To make your project available as a Jupyter kernel:

uv run python -m ipykernel install --user --name=causal-analysis

Now you can select the “causal-analysis” kernel in any Jupyter notebook.

7.12.4 Inline Scripts in Notebooks

uv supports inline script metadata (PEP 723) in standalone Python files, which makes a single script as easy to share as a notebook. At the top of a script, you can specify dependencies:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "torch",
#   "torchvision",
#   "matplotlib",
# ]
# ///

import torch
import torchvision
import matplotlib.pyplot as plt

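# Your model training or inference here
# weights= replaces the deprecated pretrained=True argument (torchvision >= 0.13)
weights = torchvision.models.ResNet18_Weights.DEFAULT
model = torchvision.models.resnet18(weights=weights)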

Then run it with:

uv run script.py

uv automatically creates a temporary environment with the specified dependencies. This is perfect for one-off experiments or sharing standalone training scripts.

7.13 Reproducible ML Workflows

Let’s put everything together with a complete workflow for a machine learning project.

7.13.1 Project Structure

A well-organized ML project might look like:

flowchart TD
    root["image-classifier/"]
    root --> pyver[".python-version, Python 3.11"]
    root --> pyproject["pyproject.toml, config"]
    root --> lock["uv.lock, locked deps"]
    root --> readme["README.md, docs"]
    root --> gitignore[".gitignore"]
    root --> data["data"]
    root --> models["models"]
    root --> notebooks["notebooks"]
    root --> src["src"]
    root --> tests["tests"]
    root --> scripts["scripts"]
    root --> experiments["experiments, logs and results"]
    data --> raw["raw, original datasets"]
    data --> processed["processed, preprocessed data"]
    data --> splits["splits, train val test"]
    models --> checkpoints["checkpoints/"]
    models --> configs["configs/"]
    notebooks --> eda["01-eda.ipynb"]
    notebooks --> prep["02-preprocessing.ipynb"]
    notebooks --> trainnb["03-training.ipynb"]
    src --> classifier["classifier/"]
    classifier --> init["__init__.py"]
    classifier --> datapy["data.py"]
    classifier --> modelspy["models.py"]
    classifier --> trainpy["train.py"]
    classifier --> evalpy["evaluate.py"]
    tests --> tdata["test_data.py"]
    tests --> tmodels["test_models.py"]
    tests --> ttrain["test_train.py"]
    scripts --> strain["train.py"]
    scripts --> seval["evaluate.py"]
    experiments --> exp001["exp_001/"]

7.13.2 Complete `pyproject.toml`

Here’s a comprehensive configuration for an image classification project:

[project]
name = "image-classifier"
version = "0.1.0"
description = "Deep learning image classifier using PyTorch"
readme = "README.md"
requires-python = ">=3.11"
authors = [
    {name = "Mike", email = "mike@email.com"}
]
license = {text = "MIT"}

dependencies = [
    "torch>=2.0.0,<3.0.0",
    "torchvision>=0.15.0,<1.0.0",
    "numpy>=1.24.0,<2.0.0",
    "pillow>=10.0.0",
    "matplotlib>=3.7.0",
    "scikit-learn>=1.3.0",
    "tqdm>=4.65.0",
    "pyyaml>=6.0",
    "tensorboard>=2.13.0",
]

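[project.optional-dependencies]
experiment-tracking = [
    "wandb>=0.15.0",
    "mlflow>=2.5.0",
]

# "torch-cuda" is a placeholder, not a real PyPI package; CUDA builds of
# PyTorch are normally selected through the package index configuration.
gpu = [
    "torch-cuda>=2.0.0",
]

[dependency-groups]
dev = [
    "jupyter>=1.0.0",
    "ipykernel>=6.25.0",
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.5.0",
]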

[project.scripts]
train = "classifier.train:main"
evaluate = "classifier.evaluate:main"
infer = "classifier.inference:main"

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]

[tool.black]
line-length = 100
target-version = ['py311']

[tool.ruff]
line-length = 100
target-version = "py311"

7.14 The Modern Python Toolchain for ML

Just as R developers rely on devtools, usethis, styler, lintr, and testthat, Python ML developers need a comprehensive toolchain. For Python projects, we recommend:

uv: Package and environment management
Ruff: Code formatting and linting
mypy: Static type checking
pytest: Unit testing framework
Quarto: Documentation and reproducible reports

Think of it as: uv = renv + pak + devtools, Ruff = styler + lintr, pytest = testthat, mypy = (no direct R equivalent).

All these tools are installed as development dependencies and configured through pyproject.toml, creating a unified, reproducible development environment.

7.15 Ruff: Fast Formatting and Linting

Ruff is a fast linter and formatter written in Rust. It replaces multiple separate tools (Black, isort, Flake8, pyupgrade, autoflake) with a single, consistent interface, and its authors report it running 10-100x faster than the tools it replaces.

7.15.1 Why Ruff Matters for ML

In ML projects, code quality is crucial:

Readability: ML code involves complex transformations and mathematical operations that must be clear
Consistency: Team collaboration requires consistent style
Correctness: Linting catches bugs like unused imports, undefined variables, and common mistakes
Speed: Fast feedback loops keep you in flow state

7.15.2 Installation

Add Ruff as a development dependency:

uv add --dev ruff

7.15.3 Code Formatting

Format your entire codebase:

uv run ruff format

Or format specific files:

uv run ruff format src/classifier/train.py

Or using uvx (without installation):

uvx ruff format

Ruff’s formatter:

Enforces consistent style: Similar to Black, with opinionated defaults
Leaves import order alone: Sorting imports into standard library, third-party, and local groups is done by the linter’s isort rules (uv run ruff check --select I --fix), not by ruff format
Removes trailing whitespace: Cleans up formatting inconsistencies
Ensures consistent line lengths: Wraps code to the configured line length where it can
Handles string quotes: Normalizes quote usage across your codebase

Example transformation:

# Before formatting
import torch
import numpy as np
from  pathlib   import Path
import sys
from   torch import nn

def train_model(model,data,epochs=100):
    for epoch in range(  epochs ):
        loss=model.train_step( data )
        print( f"Epoch {epoch}: {loss}" )

After uv run ruff check --select I --fix followed by uv run ruff format:

# After import sorting and formatting
import sys
from pathlib import Path

import numpy as np
import torch
from torch import nn


def train_model(model, data, epochs=100):
    for epoch in range(epochs):
        loss = model.train_step(data)
        print(f"Epoch {epoch}: {loss}")

7.15.4 Linting

Check for linting issues:

uv run ruff check

Fix auto-fixable issues:

uv run ruff check --fix

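List the fixes that were applied:

uv run ruff check --fix --show-fixes

Ruff implements hundreds of rules. Only a small default set (Pyflakes F plus a subset of pycodestyle E) is enabled out of the box; the other families are opt-in through the select setting covered in the configuration section. The categories include: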

Common Errors:

Unused imports and variables (catching dead code)
Undefined names (typos and missing imports)
Syntax errors and deprecated syntax

Style Violations:

PEP 8 violations (spacing, naming conventions)
Import organization issues
Docstring style problems

Code Quality Issues:

Overly complex functions
Redundant code
Mutable default arguments (a common Python pitfall)
Bare except clauses (catching exceptions too broadly)

Security Issues:

Hardcoded passwords or secrets
Use of eval() or exec()
SQL injection vulnerabilities
Insecure temporary file usage

Example linting output:

src/classifier/train.py:15:8: F841 Local variable `lr` is assigned to but never used
src/classifier/train.py:23:1: E302 Expected 2 blank lines, found 1
src/classifier/models.py:45:9: B006 Do not use mutable data structures for argument defaults
src/classifier/data.py:12:1: I001 Import block is un-sorted or un-formatted
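The B006 warning in that output is worth seeing in action, because the bug it flags is silent. The sketch below reproduces it with a made-up helper that collects layer sizes; the function names are illustrative and not taken from the project.

```python
def add_layer_buggy(size, layers=[]):  # B006: the default list is created once
    """Append a layer size; the shared default leaks state between calls."""
    layers.append(size)
    return layers


def add_layer_fixed(size, layers=None):
    """Same behavior, but each call without an argument gets a fresh list."""
    if layers is None:
        layers = []
    layers.append(size)
    return layers


first = add_layer_buggy(64)
second = add_layer_buggy(128)
print(first is second, second)  # both names refer to the same growing list

print(add_layer_fixed(64), add_layer_fixed(128))  # independent lists
```

In the buggy version the second call returns a list that still contains the size from the first call, which in an ML codebase typically shows up as a model or config that mysteriously accumulates entries across runs.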

7.15.5 Configuration

Add Ruff configuration to pyproject.toml:

[tool.ruff]
# Core settings
line-length = 100  # Slightly longer than Black's 88 for ML code
target-version = "py311"
src = ["src"]
exclude = [
    ".git",
    ".venv",
    "__pycache__",
    "build",
    "dist",
]

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
skip-magic-trailing-comma = false
line-ending = "auto"

[tool.ruff.lint]
# Enable rule groups
select = [
    "E",    # pycodestyle errors
    "W",    # pycodestyle warnings
    "F",    # Pyflakes
    "UP",   # pyupgrade (modernize Python code)
    "B",    # flake8-bugbear (find likely bugs)
    "SIM",  # flake8-simplify (suggest simplifications)
    "I",    # isort (import sorting)
    "N",    # pep8-naming (enforce naming conventions)
    "C4",   # flake8-comprehensions (better list/dict/set comprehensions)
    "PTH",  # flake8-use-pathlib (prefer pathlib over os.path)
    "RET",  # flake8-return (improve return statements)
    "TRY",  # tryceratops (exception handling best practices)
]

# Ignore specific rules
ignore = [
    "E501",   # Line too long (handled by formatter)
    "TRY003", # Avoid specifying long messages outside exception class
]

# Allow autofix for all enabled rules
fixable = ["ALL"]
unfixable = []

# Ignore specific rules for specific files
[tool.ruff.lint.per-file-ignores]
"__init__.py" = ["F401"]  # Allow unused imports in __init__.py
"tests/*" = ["S101"]      # Allow assert in tests

[tool.ruff.lint.isort]
known-first-party = ["classifier"]

Line Length Philosophy:

The default of 88 characters comes from Black and is based on:

Being roughly 10% over the traditional 80 columns, which Black's authors found produces noticeably shorter files
Fitting two files side-by-side on modern monitors
Reducing git diff noise

For ML code with long tensor operations, 100 characters is a reasonable compromise.

7.15.6 Ruff in Your Workflow

Integrate Ruff into your daily workflow:

During development:

# Format before committing
uv run ruff format

# Check for issues
uv run ruff check --fix

# Review remaining issues
uv run ruff check

In CI/CD:

# .github/workflows/lint.yml
- name: Lint with Ruff
  run: |
    uv run ruff format --check
    uv run ruff check

VS Code integration:

{
  "editor.formatOnSave": true,
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff"
  }
}

7.16 Type Checking with mypy

Python supports optional type annotations through PEP 484. While Python remains dynamically typed at runtime, type annotations provide static analysis benefits that are invaluable for ML projects.
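
A small sketch of what "dynamically typed at runtime" means in practice. The scale function is a hypothetical example, not part of the project; it only illustrates that annotations are metadata that the interpreter records but does not enforce.

```python
from typing import get_type_hints


def scale(value: float, factor: int = 2) -> float:
    """Multiply a value by an integer factor."""
    return value * factor


# The annotations are stored as metadata on the function...
hints = get_type_hints(scale)

# ...but nothing enforces them at runtime: a str slips through and is
# repeated instead of multiplied. A static checker such as mypy would
# reject this call before the program runs.
result = scale("ab")

print(hints)
print(repr(result))
```

The call succeeds and silently produces the wrong kind of value, which is exactly the class of mistake a static checker is meant to surface before a long training job starts.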

7.16.1 Why Type Checking Matters for ML

Machine learning code involves complex data transformations, tensor operations, and model architectures. Type checking helps:

Catch Errors Early:

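Detect shape mismatches before running expensive training (this needs shape-aware annotations plus a runtime checker; mypy alone does not track tensor dimensions)
Find wrong argument and return types in tensor operations
Identify incorrect data types in transformations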

Improve Code Clarity:

Document expected tensor shapes (e.g., Tensor[B, C, H, W])
Specify DataFrame column types
Make function contracts explicit

Better IDE Support:

Accurate autocomplete for model methods
Jump-to-definition for complex hierarchies
Refactoring with confidence

Team Collaboration:

Self-documenting interfaces
Catch integration issues early
Reduce onboarding time

7.16.2 Installation

Add mypy as a development dependency:

uv add --dev mypy

For libraries that need type stubs:

uv add --dev types-PyYAML types-tqdm

7.16.3 Basic Usage

Check types in your entire project:

uv run mypy .

Check specific files:

uv run mypy src/classifier/models.py

Check with verbose output:

uv run mypy --pretty --show-error-context .

7.16.4 Type Annotation Examples

Without types (unclear and error-prone):

def create_model(arch, num_classes, pretrained):
    if arch == "resnet18":
        model = models.resnet18(pretrained=pretrained)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def train_epoch(model, loader, optimizer):
    for batch in loader:
        images, labels = batch
        outputs = model(images)
        # ... training logic

With types (clear and verifiable):

import torch
import torch.nn as nn
from torch.optim import Optimizer
from torch.utils.data import DataLoader
from torchvision import models


def create_model(
    arch: str,
    num_classes: int,
    pretrained: bool = True,
) -> nn.Module:
    """
    Create a model with specified architecture.

    Parameters
    ----------
    arch : str
        Model architecture name
    num_classes : int
        Number of output classes
    pretrained : bool
        Use pretrained weights

    Returns
    -------
    nn.Module
        Initialized model
    """
    if arch != "resnet18":
        # Fail loudly instead of returning an unbound variable
        raise ValueError(f"Unknown architecture: {arch}")

    # torchvision >= 0.13 takes `weights=`; `pretrained=` is deprecated
    weights = models.ResNet18_Weights.DEFAULT if pretrained else None
    model = models.resnet18(weights=weights)
    num_features = model.fc.in_features
    model.fc = nn.Linear(num_features, num_classes)
    return model


def train_epoch(
    model: nn.Module,
    loader: DataLoader,
    optimizer: Optimizer,
    device: torch.device,
) -> tuple[float, float]:
    """
    Train for one epoch.

    Returns
    -------
    tuple
        (average_loss, average_accuracy)
    """
    total_loss = 0.0
    correct = 0
    total = 0

    for images, labels in loader:
        images = images.to(device)
        labels = labels.to(device)

        outputs = model(images)
        # ... training logic (must update total_loss, correct, and total)

    return total_loss / len(loader), correct / total

Advanced: Tensor shape annotations

Using jaxtyping for shape-aware type hints:

uv add jaxtyping  # imported by library code, so a regular (not dev) dependency

from jaxtyping import Float, Int
import torch
from torch import Tensor

def forward(
    x: Float[Tensor, "batch channels height width"],
    labels: Int[Tensor, "batch"]
) -> Float[Tensor, "batch num_classes"]:
    """
    Forward pass with explicit shape annotations.
    
    jaxtyping checks these shapes at runtime when paired with a runtime
    type checker such as beartype; mypy only sees the Tensor type.
    """
    # x shape: [batch, channels, height, width]
    # Returns: [batch, num_classes]
    pass

7.16.5 The Type Checker Verifies

When you run mypy, it checks:

Argument types: Are you passing the right types?
Return types: Does the function return what it claims?
Attribute access: Does that object have that attribute?
Operations: Are operations valid for those types?

Example errors caught by mypy:

# error: Argument 2 to "create_model" has incompatible type "str"; expected "int"
model = create_model("resnet18", "10", True)

# error: Missing return statement
def train_epoch(model: nn.Module, loader: DataLoader) -> tuple[float, float]:
    print("Training...")
    # Forgot to return!

# error: "Optimizer" has no attribute "zero_gard" (typo)
# Note: attribute typos on nn.Module itself usually slip through,
# because nn.Module defines __getattr__.
optimizer.zero_gard()

# error: Unsupported operand types for + ("int" and "str")
epochs = 100 + "50"

7.16.6 Configuration

Add mypy settings to pyproject.toml:

[tool.mypy]
python_version = "3.11"
warn_return_any = true
warn_unused_configs = true
warn_redundant_casts = true
warn_unused_ignores = true

# Start lenient, then tighten
disallow_untyped_defs = false        # Set to true eventually
disallow_incomplete_defs = true
check_untyped_defs = true
no_implicit_optional = true

# Show more information
show_error_codes = true
show_error_context = true
pretty = true

# Strictness per module
[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false

[[tool.mypy.overrides]]
module = "classifier.models"
disallow_untyped_defs = true

Progressive typing strategy:

Start with disallow_untyped_defs = false
Add type hints to new code
Gradually annotate existing code
Enable disallow_untyped_defs = true for completed modules
Eventually enable strict mode globally

7.16.7 Type Stubs for ML Libraries

Libraries with built-in types:

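torch ✓
numpy ✓ (numpy>=1.20)
transformers ✓

scikit-learn does not ship inline type information, so mypy reports it as untyped; treat it as a library without stubs (see the options below).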

Libraries needing stubs:

uv add --dev types-tqdm types-PyYAML types-requests  # recent Pillow releases ship their own type hints, so types-Pillow is no longer needed

Libraries without stubs:

For libraries without type stubs, you can:

Ignore them:

[tool.mypy]
ignore_missing_imports = true

Create stub files:

# stubs/some_library.pyi
def some_function(x: int) -> str: ...
class SomeClass:
    def method(self) -> None: ...

Use type: ignore comments:

from some_untyped_library import something  # type: ignore

7.16.8 Mypy in Your Workflow

Development cycle:

# Check types while developing
uv run mypy src/

# Check specific module
uv run mypy src/classifier/models.py

# Generate HTML report (requires the lxml package: uv add --dev lxml)
uv run mypy --html-report mypy-report/ src/

CI/CD:

- name: Type check with mypy
  run: uv run mypy src/

VS Code integration:

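Install Microsoft's Mypy Type Checker extension (ms-python.mypy-type-checker) to see mypy diagnostics in the editor. Pylance, Microsoft's Python language server, performs its own type analysis with Pyright and does not run mypy.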

7.17 Testing with pytest

pytest is Python’s de facto standard testing framework. For ML projects, testing is crucial for ensuring data pipelines, model architectures, and training loops work correctly.

7.17.1 Why Testing Matters for ML

Machine learning projects have unique testing challenges:

Data Pipeline Testing:

Verify data loading and preprocessing
Check tensor shapes and types
Validate data augmentation
Test batching and sampling

Model Testing:

Verify model architectures
Check forward/backward passes
Test with different input shapes
Validate output dimensions

Training Logic Testing:

Test loss computation
Verify optimizer updates
Check gradient flow
Test checkpoint saving/loading

End-to-End Testing:

Test complete training pipeline
Verify inference works
Test model export formats

7.17.2 Installation

Add pytest and useful plugins:

uv add --dev pytest pytest-cov pytest-xdist pytest-timeout

pytest: Core testing framework
pytest-cov: Code coverage reporting
pytest-xdist: Parallel test execution
pytest-timeout: Prevent hanging tests

7.17.3 Project Structure

flowchart TD
    root["image-classifier/"]
    root --> src["src/"]
    root --> tests["tests/"]
    src --> classifier["classifier/"]
    classifier --> init["__init__.py"]
    classifier --> datapy["data.py"]
    classifier --> modelspy["models.py"]
    classifier --> trainpy["train.py"]
    tests --> conftest["conftest.py, shared fixtures"]
    tests --> tdata["test_data.py, data pipeline tests"]
    tests --> tmodels["test_models.py, model tests"]
    tests --> ttrain["test_train.py, training tests"]

7.17.4 Writing Tests

Basic test structure:

# tests/test_models.py
import pytest
import torch
import torch.nn as nn
from classifier.models import create_model

def test_resnet18_creation():
    """Test ResNet18 model creation."""
    model = create_model(
        arch='resnet18',
        num_classes=10,
        pretrained=False,
    )
    
    assert model is not None
    assert isinstance(model, nn.Module)
    
    # Test forward pass
    x = torch.randn(2, 3, 224, 224)
    output = model(x)
    
    assert output.shape == (2, 10)
    assert not torch.isnan(output).any()


def test_model_with_frozen_backbone():
    """Test model with frozen backbone."""
    model = create_model(
        arch='resnet18',
        num_classes=10,
        pretrained=True,
        freeze_backbone=True,
    )
    
    # Check that backbone is frozen
    trainable_params = sum(
        p.numel() for p in model.parameters() if p.requires_grad
    )
    
    # Only classifier should be trainable (~5000 params)
    assert trainable_params < 10000


@pytest.mark.parametrize('architecture', ['resnet18', 'resnet50', 'efficientnet_b0'])
def test_different_architectures(architecture):
    """Test different model architectures."""
    model = create_model(
        arch=architecture,
        num_classes=100,
        pretrained=False,
    )
    
    x = torch.randn(4, 3, 224, 224)
    output = model(x)
    
    assert output.shape == (4, 100)
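
The "~5000 params" figure in test_model_with_frozen_backbone can be checked by hand. The sketch below assumes the only trainable layer is the final nn.Linear head and that it sits on torchvision's ResNet18, whose final layer takes 512 input features; the helper name is hypothetical.

```python
def linear_param_count(in_features: int, out_features: int, bias: bool = True) -> int:
    """Number of trainable parameters in a fully connected layer."""
    # One weight per (input, output) pair, plus one bias per output.
    return in_features * out_features + (out_features if bias else 0)


RESNET18_FC_INPUTS = 512  # input width of ResNet18's final layer

head_10 = linear_param_count(RESNET18_FC_INPUTS, 10)
head_100 = linear_param_count(RESNET18_FC_INPUTS, 100)

print(f"10-class head:  {head_10:,} parameters")
print(f"100-class head: {head_100:,} parameters")
```

Note that the 10,000-parameter threshold in the test only holds for small heads: with 100 classes the same layer already exceeds it, so the bound should scale with num_classes if the test is parametrized.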

Testing data pipelines:

# tests/test_data.py
import numpy as np
import pytest
import torch
from PIL import Image

from classifier.data import get_transforms, create_dataloaders


def test_train_transforms():
    """Test training data transforms."""
    transform = get_transforms(train=True)

    # Create dummy image (randint's upper bound is exclusive, so use 256)
    img = Image.fromarray(np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8))
    
    # Apply transform
    tensor = transform(img)
    
    assert isinstance(tensor, torch.Tensor)
    assert tensor.shape == (3, 224, 224)
    assert tensor.min() >= -3.0  # Normalized
    assert tensor.max() <= 3.0


def test_dataloader_shapes(tmp_path):
    """Test dataloader output shapes."""
    # Create dummy dataset structure
    train_dir = tmp_path / "train"
    for class_name in ["class1", "class2"]:
        class_dir = train_dir / class_name
        class_dir.mkdir(parents=True)
        
        # Create dummy images
        for i in range(10):
            img = Image.new('RGB', (224, 224))
            img.save(class_dir / f"img_{i}.jpg")
    
    # Create dataloaders
    train_loader, val_loader, _ = create_dataloaders(
        data_dir=str(tmp_path),
        batch_size=4,
        num_workers=0,
    )
    
    # Test batch shape
    images, labels = next(iter(train_loader))
    assert images.shape[0] <= 4  # Batch size
    assert images.shape[1:] == (3, 224, 224)
    assert labels.shape[0] <= 4
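
The ±3.0 bounds asserted in test_train_transforms can be derived rather than guessed. The sketch below assumes get_transforms scales pixels to [0, 1] and then applies the widely used ImageNet channel statistics; the chapter does not show get_transforms, so both the constants and the helper name are assumptions.

```python
# Commonly used ImageNet per-channel statistics (assumed, not taken from the project).
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalized_range(mean, std):
    """Smallest and largest value (x - mean) / std can take for x in [0, 1]."""
    lows = [(0.0 - m) / s for m, s in zip(mean, std)]
    highs = [(1.0 - m) / s for m, s in zip(mean, std)]
    return min(lows), max(highs)


low, high = normalized_range(IMAGENET_MEAN, IMAGENET_STD)
print(f"normalized values fall in [{low:.3f}, {high:.3f}]")
```

Under these assumptions every normalized value stays inside (-3, 3), so the test's bounds are loose but safe. If the project normalizes with different statistics, the bounds should be recomputed the same way.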

Testing training logic:

# tests/test_train.py
import pytest
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
from classifier.train import train_epoch, validate

@pytest.fixture
def dummy_model():
    """Create a simple model for testing."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 224 * 224, 10)
    )

@pytest.fixture
def dummy_dataloader():
    """Create a dummy dataloader."""
    images = torch.randn(20, 3, 224, 224)
    labels = torch.randint(0, 10, (20,))
    dataset = TensorDataset(images, labels)
    return DataLoader(dataset, batch_size=4)

def test_train_epoch(dummy_model, dummy_dataloader):
    """Test training for one epoch."""
    model = dummy_model
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters())
    device = torch.device('cpu')
    
    loss, acc = train_epoch(
        model, dummy_dataloader, criterion, optimizer, device, epoch=1
    )
    
    assert isinstance(loss, float)
    assert isinstance(acc, float)
    assert 0 <= acc <= 1
    assert loss >= 0

def test_validate(dummy_model, dummy_dataloader):
    """Test validation."""
    model = dummy_model
    criterion = nn.CrossEntropyLoss()
    device = torch.device('cpu')
    
    val_loss, val_acc = validate(model, dummy_dataloader, criterion, device)
    
    assert isinstance(val_loss, float)
    assert isinstance(val_acc, float)
    assert 0 <= val_acc <= 1

7.17.5 Fixtures for Reusability

Use conftest.py for shared fixtures:

# tests/conftest.py
import pytest
import torch
import tempfile
from pathlib import Path

@pytest.fixture
def device():
    """Get device for testing."""
    return torch.device('cuda' if torch.cuda.is_available() else 'cpu')

@pytest.fixture
def temp_checkpoint_dir():
    """Create temporary directory for checkpoints."""
    with tempfile.TemporaryDirectory() as tmpdir:
        yield Path(tmpdir)

@pytest.fixture
def sample_config():
    """Create sample configuration."""
    return {
        'model': {
            'architecture': 'resnet18',
            'num_classes': 10,
            'pretrained': False,
        },
        'training': {
            'batch_size': 32,
            'learning_rate': 0.001,
            'epochs': 1,
        }
    }
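
In a yield-style fixture such as temp_checkpoint_dir, everything after the yield runs as teardown once the test finishes. The sketch below mimics that lifecycle with contextlib.contextmanager so it can run without pytest; the file name and contents are placeholders.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temp_checkpoint_dir():
    # Same body as the fixture: setup before the yield, cleanup after it.
    with tempfile.TemporaryDirectory() as tmpdir:
        yield Path(tmpdir)


with temp_checkpoint_dir() as ckpt_dir:
    checkpoint = ckpt_dir / "epoch_1.txt"
    checkpoint.write_text("loss=0.42")
    existed_inside = checkpoint.exists()

# Leaving the block triggers the cleanup, just as finishing a test does.
exists_after = checkpoint.exists()
print(existed_inside, exists_after)
```

For plain temporary directories pytest's built-in tmp_path fixture does the same job with less code; a custom fixture is mainly worthwhile when extra setup or teardown is needed.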

7.17.6 Running Tests

Run all tests:

uv run pytest

Run with verbose output:

uv run pytest -v

Run specific test file:

uv run pytest tests/test_models.py

Run specific test:

uv run pytest tests/test_models.py::test_resnet18_creation

Run tests matching pattern:

uv run pytest -k "model"  # Runs all tests with "model" in name

Run in parallel (faster):

uv run pytest -n auto  # Use all CPUs

Stop on first failure:

uv run pytest -x

Show local variables on failure:

uv run pytest -l

7.17.7 Code Coverage

Generate coverage report:

uv run pytest --cov=classifier --cov-report=term

Output:

---------- coverage: platform linux, python 3.11.9 -----------
Name                        Stmts   Miss  Cover
-----------------------------------------------
src/classifier/__init__.py      2      0   100%
src/classifier/data.py         45      3    93%
src/classifier/models.py       67      5    93%
src/classifier/train.py        89     12    87%
-----------------------------------------------
TOTAL                         203     20    90%
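
The Cover column is simply the share of statements that were executed. As a quick sanity check, the following sketch recomputes the percentages from the Stmts and Miss columns of the report above; it is plain arithmetic and does not reproduce coverage.py's internal rounding rules.

```python
# (statements, missed) per file, copied from the report above.
report = {
    "src/classifier/__init__.py": (2, 0),
    "src/classifier/data.py": (45, 3),
    "src/classifier/models.py": (67, 5),
    "src/classifier/train.py": (89, 12),
}


def cover_percent(stmts: int, miss: int) -> float:
    """Percentage of statements executed at least once."""
    return 100.0 * (stmts - miss) / stmts


total_stmts = sum(stmts for stmts, _ in report.values())
total_miss = sum(miss for _, miss in report.values())

for name, (stmts, miss) in report.items():
    print(f"{name:<28}{stmts:>6}{miss:>6}{cover_percent(stmts, miss):>6.0f}%")
print(f"{'TOTAL':<28}{total_stmts:>6}{total_miss:>6}"
      f"{cover_percent(total_stmts, total_miss):>6.0f}%")
```

The total is weighted by statement count rather than being an average of the per-file percentages, which is why one large, poorly covered module can pull the headline number down quickly.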

Generate HTML coverage report:

uv run pytest --cov=classifier --cov-report=html

This creates htmlcov/index.html showing:

Which lines are covered
Which branches are taken (only when branch measurement is enabled, for example with --cov-branch)
Which functions are tested

Coverage requirements:

For production ML code, aim for:

>80% coverage for data pipelines (data loading, preprocessing)
>90% coverage for model architectures
>70% coverage for training loops (some branches hard to test)
100% coverage for utility functions

Coverage in CI:

uv run pytest --cov=classifier --cov-report=term --cov-fail-under=80

7.17.8 pytest Configuration

Add pytest settings to pyproject.toml:

[tool.pytest.ini_options]
# Test discovery
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
python_classes = ["Test*"]

# Output options
addopts = [
    "--strict-markers",
    "--strict-config",
    "-ra",                    # Show summary of all test outcomes
    "--showlocals",          # Show local variables on failure
    "--tb=short",            # Shorter traceback format
]

# Markers
markers = [
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "gpu: marks tests as requiring GPU",
    "integration: marks tests as integration tests",
]

# Coverage options
[tool.coverage.run]
source = ["src"]
omit = ["tests/*", "*/site-packages/*"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise AssertionError",
    "raise NotImplementedError",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
]

7.17.9 Advanced Testing Patterns

Parameterized tests:

@pytest.mark.parametrize('batch_size,expected_batches', [
    (32, 4),
    (16, 8),
    (8, 16),
])
def test_dataloader_batching(batch_size, expected_batches, tmp_dataset):
    loader = DataLoader(tmp_dataset, batch_size=batch_size)
    assert len(loader) == expected_batches
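
The tmp_dataset fixture is not shown, but the expected counts are consistent with a dataset of 128 samples, which is the assumption made here. With the default drop_last=False, a DataLoader's length is the sample count divided by the batch size, rounded up. A minimal sketch of that arithmetic, using a hypothetical helper:

```python
import math


def expected_batches(num_samples: int, batch_size: int, drop_last: bool = False) -> int:
    """Number of batches a loader yields for a dataset of num_samples items."""
    if drop_last:
        # An incomplete final batch is discarded.
        return num_samples // batch_size
    # An incomplete final batch still counts as a batch.
    return math.ceil(num_samples / batch_size)


for batch_size in (32, 16, 8):
    print(batch_size, expected_batches(128, batch_size))
```

Choosing a dataset size that is not a multiple of the batch size (130, say) is a useful extra case, because it exercises the final partial batch that the three parameter sets above never produce.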

Testing exceptions:

def test_invalid_architecture():
    with pytest.raises(ValueError, match="Unknown architecture"):
        create_model(arch="invalid", num_classes=10)

Skipping tests conditionally:

@pytest.mark.skipif(not torch.cuda.is_available(), reason="Requires GPU")
def test_gpu_training():
    model = create_model(arch="resnet18", num_classes=10, pretrained=False).cuda()
    # ... GPU-specific test

Slow test marker:

@pytest.mark.slow
def test_full_training_run():
    # This test takes 5 minutes
    pass

# Run fast tests only: pytest -m "not slow"

7.18 Documentation with Quarto

For ML projects, documentation serves multiple purposes:

Code documentation: API docs for functions and classes
Experiment reports: Document training runs and results
Model cards: Document model architecture, performance, limitations
Tutorials: Show how to use your models

7.18.1 Quarto for ML Reports

Quarto is well suited to ML documentation because it supports:

Executable code: Run training scripts and show results
Multiple languages: Python, R, Julia in same document
Rich outputs: Plots, tables, interactive visualizations
Multiple formats: HTML, PDF, presentations, websites

Installation:

# Install Quarto (not via uv)
# macOS
brew install quarto

# Linux
wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.4.549/quarto-1.4.549-linux-amd64.deb
sudo dpkg -i quarto-1.4.549-linux-amd64.deb

# Windows - download from https://quarto.org

Example experiment report (reports/experiment_001.qmd):

---
title: "ResNet18 Image Classification"
author: "Mike"
date: "2024-11-09"
format:
  html:
    code-fold: true
    toc: true
---

## Objective

Train ResNet18 on CIFAR-10 dataset to achieve >90% accuracy.

## Environment Setup

```{python}
import sys
sys.path.insert(0, '../src')

import torch
from classifier.models import create_model
from classifier.train import train_model
import matplotlib.pyplot as plt
import pandas as pd
```

## Model Architecture

```{python}
model = create_model('resnet18', num_classes=10, pretrained=False)
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")
```

## Training Configuration

```{python}
config = {
    'model': {'architecture': 'resnet18', 'num_classes': 10},
    'training': {
        'batch_size': 128,
        'learning_rate': 0.001,
        'epochs': 50,
        'optimizer': 'Adam',
    }
}

pd.DataFrame([config['training']]).T
```

## Training Results

```{python}
# Load training logs
logs = pd.read_csv('../experiments/exp_001/metrics.csv')

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Plot loss
ax1.plot(logs['epoch'], logs['train_loss'], label='Train')
ax1.plot(logs['epoch'], logs['val_loss'], label='Validation')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Training and Validation Loss')
ax1.legend()
ax1.grid(True)

# Plot accuracy
ax2.plot(logs['epoch'], logs['train_acc'], label='Train')
ax2.plot(logs['epoch'], logs['val_acc'], label='Validation')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('Training and Validation Accuracy')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
plt.show()
```

## Final Performance

```{python}
best_epoch = logs.loc[logs['val_acc'].idxmax()]
print(f"Best validation accuracy: {best_epoch['val_acc']:.2%}")
print(f"Achieved at epoch: {int(best_epoch['epoch'])}")
print(f"Test accuracy: {best_epoch['test_acc']:.2%}")
```

## Confusion Matrix

```{python}
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Load predictions
y_true = ...  # Load true labels
y_pred = ...  # Load predictions

cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
```

## Conclusion

- Achieved validation accuracy
- Model converged after specified epochs
- Ready for deployment

Render the report:

quarto render reports/experiment_001.qmd

This generates reports/experiment_001.html with all results embedded.

7.18.2 Model Cards

Document your models with Quarto model cards:

---
title: "ResNet18 CIFAR-10 Classifier"
subtitle: "Model Card"
format:
  html:
    toc: true
---

## Model Details

- **Model Name**: ResNet18 CIFAR-10 Classifier
- **Version**: 1.0.0
- **Date**: 2024-11-09
- **Architecture**: ResNet18
- **Framework**: PyTorch 2.3.1

## Intended Use

This model classifies images into 10 CIFAR-10 categories:
airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

**Primary uses:**
- Educational demonstrations
- Baseline for computer vision research
- Image classification API

**Out-of-scope uses:**
- Medical diagnosis
- Safety-critical applications
- Real-world deployment without validation

## Training Data

- **Dataset**: CIFAR-10
- **Size**: 50,000 training images, 10,000 test images
- **Resolution**: 32×32 RGB images
- **Splits**: 45,000 train / 5,000 validation / 10,000 test

## Performance

| Split | Accuracy |
|-------|----------|
| Train | 98.5% |
| Validation | 92.3% |
| Test | 91.8% |

## Limitations

- Only works on 32×32 images
- Performance degrades on images outside CIFAR-10 distribution
- No adversarial robustness
- Bias towards training distribution

## Ethical Considerations

- Dataset contains potential biases in category representation
- Should not be used for surveillance applications
- Consider privacy implications when deploying
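
The model card above reports a 45,000 / 5,000 train and validation split of CIFAR-10's 50,000 training images but does not say how it was drawn. For the split to be reproducible it has to be derived from a recorded seed. The following is one possible sketch; the function name and the seed value are assumptions, not the project's actual procedure.

```python
import random


def split_indices(num_samples: int, num_val: int, seed: int):
    """Shuffle indices with a dedicated, seeded generator and split them."""
    indices = list(range(num_samples))
    # A private Random instance avoids disturbing the global random state.
    random.Random(seed).shuffle(indices)
    return indices[num_val:], indices[:num_val]


train_idx, val_idx = split_indices(50_000, 5_000, seed=42)
print(len(train_idx), len(val_idx))

# The same seed always yields the same split.
again_train, again_val = split_indices(50_000, 5_000, seed=42)
print(again_val == val_idx)
```

Recording the seed (or the index lists themselves) next to the model card lets a reader rebuild exactly the validation set that the reported accuracy refers to.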

7.19 Complete Development Workflow

Putting it all together, here’s a complete development cycle:

7.19.1 Daily Development Cycle

# 1. Pull latest changes
git pull

# 2. Sync environment
uv sync --all-extras

# 3. Make changes to code
# ... edit files ...

# 4. Format code
uv run ruff format

# 5. Fix linting issues
uv run ruff check --fix

# 6. Verify remaining issues
uv run ruff check

# 7. Type check
uv run mypy src/

# 8. Run tests
uv run pytest

# 9. Check coverage
uv run pytest --cov=classifier --cov-report=term

# 10. Commit changes
git add .
git commit -m "Add feature X"
git push

7.19.2 Before Committing Checklist

Create a Makefile to automate checks:

.PHONY: format lint typecheck test coverage check all clean

format:
    uv run ruff format

lint:
    uv run ruff check --fix
    uv run ruff check

typecheck:
    uv run mypy src/

test:
    uv run pytest -v

coverage:
    uv run pytest --cov=classifier --cov-report=html --cov-report=term

check: format lint typecheck test

all: check coverage

clean:
    rm -rf .venv
    rm -rf htmlcov/
    rm -rf .mypy_cache/
    rm -rf .pytest_cache/
    rm -rf .ruff_cache/
    find . -type d -name __pycache__ -exec rm -rf {} +
    find . -type f -name "*.pyc" -delete

Usage:

# Run all checks before committing
make check

# Generate coverage report
make coverage

# Clean up artifacts
make clean

7.19.3 Pre-commit Hooks (Optional)

For automatic checking, install pre-commit:

uv add --dev pre-commit

Create .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: ruff-format
        name: Ruff Format
        entry: uv run ruff format
        language: system
        types: [python]
        
      - id: ruff-check
        name: Ruff Check
        entry: uv run ruff check --fix
        language: system
        types: [python]
        
      - id: mypy
        name: mypy
        entry: uv run mypy
        language: system
        types: [python]
        pass_filenames: false
        args: [src/]
        
      - id: pytest-fast
        name: pytest (fast tests only)
        entry: uv run pytest -m "not slow"
        language: system
        pass_filenames: false
        always_run: true

Install hooks:

uv run pre-commit install

Now checks run automatically on git commit.

7.19.4 CI/CD Pipeline

Create .github/workflows/test.yml:

name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh
        
      # Recent installers place uv in ~/.local/bin (older releases used ~/.cargo/bin)
      - name: Add uv to PATH
        run: echo "$HOME/.local/bin" >> "$GITHUB_PATH"
      
      - name: Sync dependencies
        run: uv sync --all-extras
      
      - name: Format check
        run: uv run ruff format --check
      
      - name: Lint
        run: uv run ruff check
      
      - name: Type check
        run: uv run mypy src/
      
      - name: Test
        run: uv run pytest --cov=classifier --cov-report=xml
      
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml

7.19.5 Project Structure Best Practices

For ML projects that might integrate with R workflows or require cross-language collaboration:

flowchart TD
    root["ml-project/"]
    root --> github[".github/"]
    root --> configs["configs, training configs"]
    root --> data["data, not in git"]
    root --> docs["docs, documentation"]
    root --> experiments["experiments, tracking"]
    root --> models["models, saved models"]
    root --> notebooks["notebooks, Jupyter"]
    root --> reports["reports, Quarto reports"]
    root --> scripts["scripts, utility scripts"]
    root --> src["src, source code"]
    root --> tests["tests/"]
    root --> gitignore[".gitignore"]
    root --> pyver[".python-version"]
    root --> makefile["Makefile"]
    root --> pyproject["pyproject.toml"]
    root --> readme["README.md"]
    root --> lock["uv.lock"]
    github --> workflows["workflows/"]
    workflows --> testyml["test.yml"]
    workflows --> deployyml["deploy.yml"]
    configs --> r18["resnet18.yaml"]
    configs --> r50["resnet50.yaml"]
    data --> raw["raw/"]
    data --> processed["processed/"]
    data --> splits["splits/"]
    docs --> modelcard["model_card.qmd"]
    docs --> apidoc["api.qmd"]
    experiments --> exp001["exp_001/"]
    experiments --> exp002["exp_002/"]
    exp001 --> cfg["config.yaml"]
    exp001 --> metrics["metrics.csv"]
    exp001 --> ckpt["checkpoints/"]
    models --> prod["production/"]
    models --> staging["staging/"]
    notebooks --> eda["01-eda.ipynb"]
    notebooks --> analysis["02-analysis.ipynb"]
    reports --> exprep["experiment_001.qmd"]
    scripts --> strain["train.py"]
    scripts --> seval["evaluate.py"]
    src --> classifier["classifier/"]
    classifier --> init["__init__.py"]
    classifier --> datapy["data.py"]
    classifier --> modelspy["models.py"]
    classifier --> trainpy["train.py"]
    classifier --> evalpy["evaluate.py"]
    tests --> conftest["conftest.py"]
    tests --> tdata["test_data.py"]
    tests --> tmodels["test_models.py"]
    tests --> ttrain["test_train.py"]

.gitignore for ML projects:

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
.venv/
venv/
ENV/
env/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# Testing
.pytest_cache/
.coverage
htmlcov/
.mypy_cache/
.ruff_cache/

# Jupyter
.ipynb_checkpoints/

# Data (large files)
data/raw/*.jpg
data/raw/*.png
data/raw/*.zip
data/processed/*.npy
data/processed/*.h5

# Models (use Git LFS or external storage)
models/*.pth
models/*.ckpt
models/*.h5
*.onnx

# Experiment tracking
wandb/
mlruns/
.neptune/
experiments/*/checkpoints/

# Logs
logs/
*.log

# OS
.DS_Store
Thumbs.db

7.20 Summary: The Complete ML Development Stack

With uv and the modern Python toolchain, you have:

Environment Management (uv):

Fast, reliable package installation
Reproducible environments with lock files
Python version management
GPU/CPU dependency variants

Code Quality (Ruff):

Consistent formatting
Automated linting
Fast feedback loops
Catches common bugs

Type Safety (mypy):

Early error detection
Self-documenting code
Better IDE support
Refactoring confidence

Testing (pytest):

Unit and integration tests
Code coverage tracking
Parallel test execution
CI/CD integration

Documentation (Quarto):

Executable reports
Model cards
API documentation
Reproducible analyses

This toolchain creates a professional development workflow that:

Catches errors early (before training expensive models)
Ensures reproducibility (lock files + versioning)
Improves collaboration (consistent style + documentation)
Speeds up development (fast tools + automation)

The investment in setting up this infrastructure pays dividends throughout your ML project lifecycle, from initial prototyping through production deployment.

7.20.1 Setting Up the Project

Clone and set up:

# Clone repository
git clone https://github.com/user/image-classifier.git
cd image-classifier

# Install dependencies (uv reads uv.lock for exact versions)
uv sync --all-extras

# Run tests
uv run pytest

# Start training
uv run train --config configs/resnet18.yaml

The beauty of this workflow: a single uv sync command installs everything exactly as specified in the lock file. No version mismatches, no dependency conflicts, no environment inconsistencies when deploying your trained model.

7.20.2 Updating Dependencies

When you need to update packages (e.g., new PyTorch release with bug fixes):

# Upgrade every package to the latest compatible version and rewrite uv.lock
uv lock --upgrade

# Upgrade a single package, leaving the rest of the lock file untouched
uv lock --upgrade-package torch

# Install the updated lock file into the environment
uv sync

After updating, test your code thoroughly and commit the new uv.lock:

uv run pytest
git add uv.lock
git commit -m "Update dependencies - PyTorch 2.3.0"

Important for ML: When updating deep learning frameworks, always retrain key models and validate that performance hasn’t degraded. Minor version updates can sometimes change numerical precision or default behaviors.
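
One way to make "validate that performance hasn't degraded" mechanical is to compare metrics recorded before and after the upgrade against a tolerance. The sketch below is hypothetical: the helper name and the tolerance are assumptions, the baseline numbers reuse the model card's accuracies, and the candidate numbers are invented placeholders for illustration only.

```python
def find_regressions(baseline: dict, candidate: dict, tolerance: float = 0.005) -> dict:
    """Return metrics whose value dropped by more than `tolerance`."""
    return {
        name: (baseline[name], candidate[name])
        for name in baseline
        if name in candidate and baseline[name] - candidate[name] > tolerance
    }


# Metrics recorded before the dependency update.
baseline = {"val_acc": 0.923, "test_acc": 0.918}
# Placeholder metrics after retraining on the updated environment.
candidate = {"val_acc": 0.921, "test_acc": 0.902}

regressions = find_regressions(baseline, candidate)
for name, (before, after) in regressions.items():
    print(f"{name}: {before:.3f} -> {after:.3f}")
```

The tolerance should reflect the run-to-run noise of the model; a threshold tighter than seed-to-seed variation will flag every update as a regression.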

7.21 Tools and Global Packages

Beyond project dependencies, you often need command-line tools such as ruff, black, or mypy available outside any single project, the role pipx traditionally fills. uv handles these with uv tool.

7.21.1 Installing Global Tools

uv tool install ruff
uv tool install black
uv tool install mypy

These are installed in isolated environments but available globally. You can then use them anywhere:

ruff check .
black src/
mypy src/

7.21.2 Listing Installed Tools

uv tool list

7.21.3 Upgrading Tools

uv tool upgrade ruff
uv tool upgrade-all  # Upgrade all tools

7.21.4 Running Tools Without Installing

For one-off uses:

uv tool run ruff check .

This downloads ruff if needed and runs it in a temporary environment, without adding it to your installed tools. The shorthand uvx ruff check . is equivalent.

7.22 Migration from Other Tools

7.22.1 From `pip` and `requirements.txt`

If you have a requirements.txt:

# Create new project
uv init my-project
cd my-project

# Import requirements (adjust the path to where requirements.txt lives)
uv add -r requirements.txt

Or convert to pyproject.toml manually:

dependencies = [
    "pandas==2.0.3",
    "numpy==1.24.4",
    # ... etc
]

Then:

uv sync
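
If you take the manual route, the conversion can be scripted. The following is a minimal sketch, not a complete requirements parser: the helper name requirements_to_toml is hypothetical, and pip option lines (such as -r, -e, or --index-url) are skipped and must be handled by hand.

```python
"""Turn requirements.txt lines into a pyproject.toml dependencies array."""


def requirements_to_toml(lines):
    """Return a TOML `dependencies = [...]` snippet for plain requirement lines."""
    requirements = []
    for raw in lines:
        line = raw.split(" #", 1)[0].strip()  # drop trailing comments
        if not line or line.startswith("#"):
            continue  # blank line or full-line comment
        if line.startswith("-"):
            continue  # pip options need manual handling
        requirements.append(line.replace('"', '\\"'))  # escape quotes for TOML
    body = "".join(f'    "{req}",\n' for req in requirements)
    return f"dependencies = [\n{body}]"


example = [
    "# data stack",
    "pandas==2.0.3",
    "numpy==1.24.4  # pinned for reproducibility",
    "",
    "--index-url https://pypi.org/simple",
]
print(requirements_to_toml(example))
```

The printed snippet can be pasted under the [project] table of pyproject.toml before running uv sync.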

7.22.2 From `poetry`

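If migrating from poetry, you already have a pyproject.toml, but uv reads dependencies from the standard [project] table. If your file still declares them under [tool.poetry.dependencies], move them into [project] dependencies first (rewriting poetry-specific operators such as ^ into standard version specifiers) and remove the poetry-specific sections. Running uv init is not needed, and it refuses to run in a directory that already contains a pyproject.toml. Then:

# Remove poetry.lock
rm poetry.lock

# Resolve, write uv.lock, and install
uv sync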

7.22.3 From `conda`

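For conda users, export the packages you explicitly requested:

conda env export --from-history > environment.yml

The export is a YAML environment file, not a requirements file. Copy the entries under dependencies into a requirements.txt, one per line, rewriting conda's name=version pins as name==version and removing conda-specific packages, then:

uv init my-project
cd my-project
# Adjust the path to where requirements.txt lives
uv add -r requirements.txt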

Some packages (especially scientific ones like cudatoolkit) are conda-specific and may need alternatives or system-level installation.
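
The hand-editing step can be partly automated. The sketch below is an assumption-laden illustration: the CONDA_ONLY set is deliberately incomplete, the conda_to_pip helper is hypothetical, and conda package names do not always match PyPI names (for example pytorch versus torch), so the output still needs review.

```python
"""Filter a conda package list down to pip-style requirements."""
import re

# Assumed, incomplete set of packages that conda provides but PyPI does not
CONDA_ONLY = {"python", "pip", "cudatoolkit", "cudnn", "mkl"}


def conda_to_pip(specs):
    """Convert conda specs ('numpy=1.24.4') to pip requirements ('numpy==1.24.4')."""
    requirements = []
    for spec in specs:
        spec = spec.strip()
        name = re.split(r"[<>=!]", spec, maxsplit=1)[0]
        if name.lower() in CONDA_ONLY:
            continue
        if re.search(r"[<>!]|==", spec):
            requirements.append(spec)  # already a pip-compatible specifier
            continue
        _, _, version = spec.partition("=")
        version = version.split("=")[0]  # drop a trailing conda build string
        # conda's single "=" is a prefix match, so "==" is a tighter pin
        requirements.append(f"{name}=={version}" if version else name)
    return requirements


print(conda_to_pip(["python=3.11", "numpy=1.24.4", "pandas", "scipy>=1.11", "cudatoolkit=11.8"]))
```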

7.23 Continuous Integration

Using uv in CI/CD pipelines is straightforward and fast.

7.23.1 GitHub Actions Example

name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh
      
      - name: Sync dependencies
        run: uv sync --all-extras
      
      - name: Run tests
        run: uv run pytest --cov=src tests/
      
      - name: Run type checking
        run: uv run mypy src/

This is much faster than traditional pip install approaches, often reducing CI times by 50% or more.

7.23.2 GitLab CI Example

test:
  image: python:3.12
  before_script:
    - curl -LsSf https://astral.sh/uv/install.sh | sh
    - export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"  # installer location varies by uv version
  script:
    - uv sync --all-extras
    - uv run pytest

7.24 Performance Considerations

The speed of uv is one of its defining features. Here’s why it’s fast and how to maximize performance:

7.24.1 Parallel Downloads

uv downloads packages in parallel, using all available network bandwidth. Traditional pip downloads serially, which wastes time.

7.24.2 Caching

uv aggressively caches downloaded wheels. Once you’ve installed pandas==2.2.2, it’s cached globally. Installing it in another project is nearly instant.

Cache location:

# macOS/Linux
~/.cache/uv/

# Windows  
%LOCALAPPDATA%\uv\cache\

7.24.3 Benchmark Comparisons

In real-world testing, uv shows dramatic speedups:

Tool	Time to install torch+torchvision+numpy
pip	185 seconds
poetry	145 seconds
uv	12 seconds

For larger dependency trees (e.g., installing transformers with all its dependencies, or a complete data science stack), the difference is even more pronounced. This matters especially in ML workflows where you frequently create new environments for experiments or CI/CD pipelines.

7.24.4 Tips for Maximum Performance

Use the lock file: uv sync with a lock file is faster than resolving dependencies from scratch
Cache in CI: Cache ~/.cache/uv in CI pipelines
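Cache the dependency layer: Use uv sync --no-install-project to install the dependencies without the project itself, so that (for example) a Docker layer holding them can be reused when only your source code changes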
Use wheels: Avoid source distributions when possible; wheels install much faster

7.25 Troubleshooting Common Issues

7.25.1 Problem: Package Not Found

error: Failed to download `package-name`

Solution: Check package name spelling. Verify it exists on PyPI. Try updating the index:

uv sync --refresh

7.25.2 Problem: Version Conflicts

error: No solution found when resolving dependencies

Solution: Relax version constraints. Check which packages are conflicting and update them:

uv tree  # See dependency tree
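
To see why such an error can arise even when each requirement looks reasonable on its own, consider the toy model below. It is a simplified sketch, not uv's resolver: versions are compared as integer tuples rather than by the full version-specifier rules, and the package names and constraints are hypothetical.

```python
"""Show why two reasonable-looking constraints can have no common solution."""
import operator

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "==": operator.eq}


def parse(version):
    """Turn '1.26.4' into (1, 26, 4) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def allowed(candidates, constraints):
    """Keep the candidate versions that satisfy every (operator, version) pair."""
    return [
        v for v in candidates
        if all(OPS[op](parse(v), parse(bound)) for op, bound in constraints)
    ]


# Hypothetical scenario: two dependencies disagree about numpy
numpy_versions = ["1.24.4", "1.26.4", "2.0.2"]
package_a = [(">=", "2.0")]   # package A needs numpy 2.x
package_b = [("<", "2.0")]    # package B has not been ported yet

print(allowed(numpy_versions, package_a))              # ['2.0.2']
print(allowed(numpy_versions, package_b))              # ['1.24.4', '1.26.4']
print(allowed(numpy_versions, package_a + package_b))  # []
```

The empty list in the last line is the situation uv reports as "No solution found": the fix is to relax one constraint or to move one of the packages to a release that accepts the other's range.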

7.25.3 Problem: Python Version Not Available

error: No interpreter found for Python 3.12

Solution: Install the Python version:

uv python install 3.12

7.25.4 Problem: Import Fails in Script

ModuleNotFoundError: No module named 'torch'

Solution: Ensure you’re running with uv run:

uv run python train.py

Or sync dependencies:

uv sync

7.25.5 Problem: Wrong Package Version

Solution: Check what’s installed:

uv pip list

Lock and sync to fix:

uv lock
uv sync

7.26 Best Practices for ML Projects

Based on years of machine learning development, here are recommended practices:

7.26.1 1. Always Use Lock Files

Commit uv.lock to git. This is non-negotiable for reproducible ML research and production deployments.

git add uv.lock pyproject.toml
git commit -m "Lock dependencies"

7.26.2 2. Pin Python Versions

Use .python-version to specify the exact Python version:

uv python pin 3.11.9

This prevents subtle bugs from Python version differences that can affect model training or inference.
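
A training script can also verify the pin at start-up, which helps when it is launched outside uv run. This is a minimal sketch that assumes .python-version holds a plain numeric version such as 3.11 or 3.11.9; the matches_pin helper is hypothetical.

```python
"""Check that the running interpreter matches the project's .python-version pin."""
import sys


def matches_pin(pin, version_info=sys.version_info):
    """Return True if the interpreter version starts with the pinned components."""
    pinned = tuple(int(part) for part in pin.strip().split("."))
    return tuple(version_info[: len(pinned)]) == pinned


print(matches_pin("3.11.9", (3, 11, 9)))  # True
print(matches_pin("3.11", (3, 11, 4)))    # True
print(matches_pin("3.11.9", (3, 12, 0)))  # False
```

In a real script, the pin would be read with Path(".python-version").read_text() and the check would raise an error on a mismatch.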

7.26.3 3. Separate Development Dependencies

Keep development tools separate from training/inference dependencies:

[project.optional-dependencies]
dev = [
    "pytest",
    "jupyter",
    "black",
]

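Extras are installed only when requested (for example with uv sync --extra dev), so a plain uv sync in a production image skips them and keeps the image lean.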

7.26.4 4. Document Environment Setup

Include clear instructions in README.md:

## Setup

1. Install uv: `curl -LsSf https://astral.sh/uv/install.sh | sh`
2. Sync environment: `uv sync --all-extras`
3. Train model: `uv run train --config configs/resnet50.yaml`
4. Evaluate: `uv run evaluate --checkpoint models/best.pth`

7.26.5 5. Use Scripts for Reproducibility

Define scripts in pyproject.toml:

[project.scripts]
preprocess = "classifier.data:preprocess"
train = "classifier.train:main"
evaluate = "classifier.evaluate:main"
infer = "classifier.inference:predict"

Then document the ML pipeline:

uv run preprocess --data data/raw/
uv run train --epochs 100 --lr 0.001
uv run evaluate --model models/checkpoint.pth
uv run infer --image test.jpg
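
Each [project.scripts] entry maps a command name to a module:function pair; the installed command calls that function with no arguments and uses its return value as the exit status. A sketch of what classifier.data:preprocess might look like follows. The --data flag matches the usage above, but the function body (counting files) is a placeholder assumption, not the project's actual preprocessing.

```python
"""Sketch of a console-script entry point such as classifier.data:preprocess."""
import argparse
from pathlib import Path


def preprocess(argv=None):
    """Parse command-line flags and return a process exit code."""
    parser = argparse.ArgumentParser(description="Preprocess raw images")
    parser.add_argument("--data", type=Path, required=True, help="Raw data directory")
    args = parser.parse_args(argv)  # argv=None means "read sys.argv"

    if not args.data.is_dir():
        print(f"Not a directory: {args.data}")
        return 1  # a non-zero return value becomes the command's exit status

    # Placeholder for the real work: count the files that would be processed
    n_files = sum(1 for path in args.data.rglob("*") if path.is_file())
    print(f"Found {n_files} files under {args.data}")
    return 0
```

Accepting an optional argv list keeps the function callable from tests without touching sys.argv.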

7.26.6 6. Version Control Configuration

Create a .gitignore:

# Python
__pycache__/
*.py[cod]
.ipynb_checkpoints/

# uv
.venv/

# Data (don't commit large datasets)
data/raw/*.jpg
data/raw/*.png
data/processed/

# Models (use Git LFS or external storage)
models/*.pth
models/*.ckpt
*.h5

# Experiment tracking
wandb/
mlruns/
.neptune/

# Results
results/
experiments/*/outputs/

7.26.7 7. Regular Dependency Audits

Periodically upgrade to the latest compatible versions and confirm that nothing broke:

uv sync --upgrade
uv run pytest  # Ensure tests still pass
# Re-run key training experiments to validate

7.26.8 8. Use Inline Scripts for Quick Experiments

For quick exploratory work or prototyping:

# /// script
# dependencies = [
#   "torch",
#   "torchvision",
#   "matplotlib",
# ]
# ///

import torch
import torchvision.models as models
import matplotlib.pyplot as plt

# Quick model prototyping
model = models.resnet18(weights="IMAGENET1K_V1")  # the pretrained= argument is deprecated
# ... experiment code ...

Run with:

uv run experiment.py

7.26.9 9. GPU Environment Management

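For projects requiring CUDA, create a separate extra for each hardware target. PyTorch does not define a cuda extra; CPU and CUDA builds share the same package name and differ in the index they are downloaded from (see Section 7.27.1), so the extras only name the two configurations:

[project.optional-dependencies]
gpu = [
    "torch>=2.0.0",
]

cpu = [
    "torch>=2.0.0",
]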

Then install based on your environment:

# On GPU machine
uv sync --extra gpu

# On CPU-only machine
uv sync --extra cpu

7.27 Working with Deep Learning Frameworks and GPUs

One of the most common pain points in ML development is managing deep learning frameworks, especially when dealing with CUDA and GPU support. uv simplifies this process significantly.

7.27.1 PyTorch with CUDA Support

PyTorch offers different packages for CPU-only and CUDA-enabled versions. With uv, you can manage these elegantly:

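Option 1: Platform-specific dependencies

Listing the same requirements under two extras is not enough on its own, because both would resolve to the same build. Recent uv releases let you bind each extra to a PyTorch index and declare the extras as mutually exclusive:

[project]
dependencies = [
    "numpy>=1.24.0",
    "pillow>=10.0.0",
]

[project.optional-dependencies]
cuda = ["torch>=2.0.0", "torchvision>=0.15.0"]
cpu = ["torch>=2.0.0", "torchvision>=0.15.0"]

[tool.uv]
# The two extras select different builds of the same packages,
# so they can never be installed together
conflicts = [[{ extra = "cuda" }, { extra = "cpu" }]]

[tool.uv.sources]
torch = [
    { index = "pytorch-cu121", extra = "cuda" },
    { index = "pytorch-cpu", extra = "cpu" },
]
torchvision = [
    { index = "pytorch-cu121", extra = "cuda" },
    { index = "pytorch-cpu", extra = "cpu" },
]

[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true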

Then install based on your hardware:

# On GPU machine
uv sync --extra cuda

# On CPU-only machine
uv sync --extra cpu

Option 2: Using PyTorch index for CUDA versions

PyTorch hosts CUDA-specific builds on their own index:

# Add PyTorch with CUDA 12.1 support
uv add torch torchvision --index-url https://download.pytorch.org/whl/cu121

Or in pyproject.toml:

[tool.uv]
extra-index-url = ["https://download.pytorch.org/whl/cu121"]

[project]
dependencies = [
    "torch>=2.0.0",
    "torchvision>=0.15.0",
]

7.27.2 TensorFlow with GPU Support

TensorFlow 2.x simplifies GPU support:

# TensorFlow with GPU support (works with CUDA)
# Quote the requirement so the shell does not treat > as a redirect
uv add "tensorflow[and-cuda]>=2.15.0"

Or for CPU-only:

uv add "tensorflow>=2.15.0"

7.27.3 JAX with GPU Support

JAX requires specific CUDA/cuDNN versions:

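# JAX with CUDA 12 support
# Recent JAX releases publish these wheels on PyPI, so the -f option
# is only needed for older versions
uv add "jax[cuda12]>=0.4.20" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html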

7.27.4 Verifying GPU Access

Create a simple verification script:

# /// script
# dependencies = [
#   "torch",
# ]
# ///

import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

Run with:

uv run verify_gpu.py

7.27.5 Managing Multiple Framework Versions

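For projects that need to test across different framework versions, define one extra per version and declare the extras as conflicting, so that uv resolves each one separately instead of failing on the incompatible pins:

[project.optional-dependencies]
torch-2-0 = ["torch==2.0.1", "torchvision==0.15.2"]
torch-2-1 = ["torch==2.1.2", "torchvision==0.16.2"]
torch-2-3 = ["torch==2.3.1", "torchvision==0.18.1"]

[tool.uv]
conflicts = [[{ extra = "torch-2-0" }, { extra = "torch-2-1" }, { extra = "torch-2-3" }]]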

Then test with different versions:

uv sync --extra torch-2-0
uv run pytest

uv sync --extra torch-2-1
uv run pytest
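
Repeating these commands by hand becomes tedious as the list of versions grows. The sketch below generates and runs them in a loop; the extra names come from the configuration above, while the commands_for and run_matrix helpers are hypothetical. Passing --extra to uv run as well is a deliberate choice here, so that the test run uses the same extra as the preceding sync.

```python
"""Build and run the uv commands for testing against several framework versions."""
import subprocess

EXTRAS = ["torch-2-0", "torch-2-1", "torch-2-3"]  # extras defined in pyproject.toml


def commands_for(extra):
    """Return the sync and test commands for one extra."""
    return [
        ["uv", "sync", "--extra", extra],
        ["uv", "run", "--extra", extra, "pytest"],
    ]


def run_matrix(extras, runner=subprocess.run):
    """Run the test suite once per extra; return the extras whose run failed."""
    failed = []
    for extra in extras:
        for command in commands_for(extra):
            if runner(command).returncode != 0:
                failed.append(extra)
                break  # skip the remaining commands for this extra
    return failed


# Dry run: print the commands instead of executing them
for extra in EXTRAS:
    for command in commands_for(extra):
        print(" ".join(command))
```

Calling run_matrix(EXTRAS) executes the commands for real and returns the list of extras whose sync or test run exited with a non-zero status.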

7.27.6 Hugging Face Transformers

For NLP tasks with transformers:

uv add transformers datasets tokenizers accelerate

For training large models with optimizations:

uv add "transformers[torch]" datasets accelerate bitsandbytes

7.27.7 Common ML Stack

Here’s a comprehensive ML dependency setup:

[project]
name = "ml-project"
version = "0.1.0"
requires-python = ">=3.11"

dependencies = [
    # Core scientific computing
    "numpy>=1.24.0,<2.0.0",
    "scipy>=1.11.0",
    "pandas>=2.0.0",
    
    # Visualization
    "matplotlib>=3.7.0",
    "seaborn>=0.12.0",
    "plotly>=5.14.0",
    
    # ML utilities
    "scikit-learn>=1.3.0",
    "tqdm>=4.65.0",
    "pyyaml>=6.0",
]

[project.optional-dependencies]
# Deep learning
pytorch = [
    "torch>=2.0.0,<3.0.0",
    "torchvision>=0.15.0",
    "torchaudio>=2.0.0",
    "lightning>=2.0.0",
]

tensorflow = [
    "tensorflow[and-cuda]>=2.15.0",
    "tensorboard>=2.15.0",
]

# NLP
nlp = [
    "transformers>=4.30.0",
    "datasets>=2.12.0",
    "tokenizers>=0.13.0",
    "sentencepiece>=0.1.99",
]

# Computer vision
cv = [
    "opencv-python>=4.8.0",
    "albumentations>=1.3.1",
    "timm>=0.9.0",
]

# Experiment tracking
tracking = [
    "wandb>=0.15.0",
    "mlflow>=2.5.0",
    "tensorboard>=2.13.0",
]

# Optimization
optimization = [
    "optuna>=3.2.0",
    "ray[tune]>=2.5.0",
]

# Development
dev = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.5.0",
    "jupyter>=1.0.0",
    "ipykernel>=6.25.0",
]

Install what you need:

# Full PyTorch stack with NLP
uv sync --extra pytorch --extra nlp --extra tracking --extra dev

# TensorFlow with computer vision
uv sync --extra tensorflow --extra cv --extra tracking --extra dev

7.27.8 Docker Integration

Create a Dockerfile that uses uv:

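FROM nvidia/cuda:12.1.0-base-ubuntu22.04

# Install Python and uv
RUN apt-get update && apt-get install -y python3.11 python3-pip curl \
    && rm -rf /var/lib/apt/lists/*
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
# The installer location depends on the uv version, so cover both
ENV PATH="/root/.local/bin:/root/.cargo/bin:$PATH"

# Copy project metadata first so the dependency layer is cached
WORKDIR /app
COPY pyproject.toml uv.lock ./

# Install dependencies only (the source code is not copied yet)
RUN uv sync --frozen --no-dev --no-install-project

# Copy source code and configuration, then install the project itself
# (README.md is needed because pyproject.toml references it)
COPY README.md ./
COPY src/ ./src/
COPY configs/ ./configs/
RUN uv sync --frozen --no-dev

# Run training
CMD ["uv", "run", "train", "--config", "configs/production.yaml"]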

Build and run:

docker build -t ml-model:latest .
docker run --gpus all ml-model:latest

7.27.9 CUDA Version Management

Different projects might need different CUDA versions. Document clearly:

# pyproject.toml
[tool.uv]
# PyTorch with CUDA 12.1
extra-index-url = ["https://download.pytorch.org/whl/cu121"]

[project]
dependencies = [
    "torch>=2.3.0",
    "torchvision>=0.18.0",
]

In README:

## Requirements

- CUDA 12.1 or later
- NVIDIA driver 530 or later
- 8GB+ GPU memory (recommended)

## Installation

```bash
# Verify CUDA version
nvidia-smi

# Install dependencies
uv sync --all-extras
```

7.27.10 Mixed Precision Training

For models using mixed precision (crucial for large models):

uv add torch torchvision
# Apex for older PyTorch versions
uv add git+https://github.com/NVIDIA/apex.git

Or use native PyTorch AMP (already included in torch>=1.6).

7.27.11 Memory Optimization Libraries

For large models that don’t fit in GPU memory:

# DeepSpeed for distributed training
uv add deepspeed

# bitsandbytes for quantization
uv add bitsandbytes

# Flash Attention for efficient attention
uv add flash-attn --no-build-isolation

7.27.12 Troubleshooting GPU Issues

Problem: CUDA not detected

# Check PyTorch installation
uv run python -c "import torch; print(torch.cuda.is_available())"

Solution: Ensure you installed CUDA-enabled PyTorch:

uv add torch --index-url https://download.pytorch.org/whl/cu121

Problem: Out of memory errors

Add gradient checkpointing and mixed precision:

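# Enable gradient checkpointing (a Hugging Face Transformers model method;
# for plain PyTorch modules use torch.utils.checkpoint instead)
model.gradient_checkpointing_enable()

# Use automatic mixed precision
# (newer PyTorch releases also expose these under torch.amp)
from torch.cuda.amp import autocast, GradScaler

scaler = GradScaler()

optimizer.zero_grad()
with autocast():
    outputs = model(inputs)
    loss = criterion(outputs, labels)

# Scale the loss so that small float16 gradients do not underflow
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()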

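Problem: Different CUDA versions on different machines

uv writes a single, universal uv.lock; uv lock has no option for writing alternative lock files, and uv sync --locked takes no file argument. Instead, declare one extra per CUDA configuration (the names cu121 and cu118 below are illustrative), bind each to the matching PyTorch index (see Section 7.27.1), and select the configuration at sync time:

# Resolve every configuration into the one lock file
uv lock

# On GPU machine with CUDA 12.1
uv sync --locked --extra cu121

# On GPU machine with CUDA 11.8
uv sync --locked --extra cu118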

7.28 Integration with Other Tools

7.28.1 Pre-commit Hooks

Use uv with pre-commit for code quality:

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: ruff
        name: ruff
        entry: uv run ruff check --fix
        language: system
        types: [python]
      
      - id: black
        name: black
        entry: uv run black
        language: system
        types: [python]
      
      - id: mypy
        name: mypy
        entry: uv run mypy
        language: system
        types: [python]

7.28.2 VS Code Configuration

Configure VS Code to use uv:

{
  "python.defaultInterpreterPath": ".venv/bin/python",
  "python.terminal.activateEnvironment": false,
  "python.testing.pytestEnabled": true,
  "python.testing.pytestArgs": ["tests"],
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.organizeImports": true
    }
  }
}

7.28.3 Make-based ML Workflows

Combine with Make for complex ML workflows:

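# Note: every recipe line must be indented with a tab character, not spaces
.PHONY: install data train train-debug evaluate tensorboard test format type-check notebook clean pipeline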

install:
    uv sync --all-extras

data:
    uv run python scripts/download_data.py
    uv run python scripts/preprocess.py

train:
    uv run train --config configs/resnet50.yaml --epochs 100

train-debug:
    uv run train --config configs/debug.yaml --epochs 1

evaluate:
    uv run evaluate --checkpoint models/best.pth --data data/test/

tensorboard:
    uv run tensorboard --logdir experiments/

test:
    uv run pytest tests/ -v --cov=src

format:
    uv run black src/ tests/
    uv run ruff check --fix src/ tests/

type-check:
    uv run mypy src/

notebook:
    uv run jupyter lab

clean:
    rm -rf .venv
    find . -type d -name __pycache__ -exec rm -rf {} +
    rm -rf experiments/*/checkpoints/*.pth

# Complete pipeline
pipeline: data train evaluate

Usage:

# Setup and train
make install
make pipeline

# Development
make train-debug
make test
make format

7.29 Advanced Topics

7.29.1 Custom Package Indexes

If your organization has a private PyPI server:

uv add --index-url https://pypi.company.com/simple/ company-package

Or in pyproject.toml:

[tool.uv]
index-url = "https://pypi.company.com/simple/"
extra-index-url = ["https://pypi.org/simple/"]

7.29.2 Building and Publishing Packages

To build a distribution:

uv build

This creates wheel and source distributions in dist/.

To publish to PyPI:

uv publish

7.29.3 Workspaces

For monorepos with multiple packages:

# Root pyproject.toml
[tool.uv.workspace]
members = ["packages/*"]

Then each subdirectory in packages/ can have its own pyproject.toml.

7.29.4 Environment Variables

Control uv behavior with environment variables:

# Specify cache location
export UV_CACHE_DIR=/custom/cache

# Use different PyPI mirror
export UV_INDEX_URL=https://mirror.pypi.org/simple/

# Increase log detail (uv also accepts -v/--verbose on the command line)
export RUST_LOG=uv=debug

7.30 Comparison with Other Tools

7.30.1 `uv` vs `pip`

Feature	pip	uv
Speed	Baseline	10-100x faster
Resolver	Backtracking	PubGrub (conflict-driven)
Lock files	Manual (pip-tools)	Built-in
Python management	No	Yes
Virtual envs	Manual	Automatic

7.30.2 `uv` vs `poetry`

Feature	poetry	uv
Speed	Slow	Very fast
Maturity	Mature	New (but stable)
Plugin system	Yes	No
Publishing	Excellent	Good
Learning curve	Moderate	Low

7.30.3 `uv` vs `conda`

Feature	conda	uv
Binary packages	Yes	Wheels only
Non-Python deps	Yes	No
Speed	Slow	Very fast
Environment size	Large	Small
Scientific stack	Excellent	Good

For pure Python projects, uv is superior. For projects requiring system libraries (CUDA, MKL, etc.), conda may still be necessary.

7.31 Real-World Example: Complete ML Project

Let’s walk through setting up a complete image classification project using PyTorch and modern best practices.

7.31.1 Step 1: Initialize Project

uv init image-classifier
cd image-classifier
uv python pin 3.11

7.31.2 Step 2: Configure `pyproject.toml`

[project]
name = "image-classifier"
version = "0.1.0"
description = "Deep learning image classifier using ResNet architecture"
readme = "README.md"
requires-python = ">=3.11"
authors = [
    {name = "Mike", email = "mike@marshall.usc.edu"}
]

dependencies = [
    "torch>=2.0.0,<3.0.0",
    "torchvision>=0.15.0,<1.0.0",
    "numpy>=1.24.0,<2.0.0",
    "pillow>=10.0.0",
    "matplotlib>=3.7.0",
    "scikit-learn>=1.3.0",
    "tqdm>=4.65.0",
    "pyyaml>=6.0",
    "tensorboard>=2.13.0",
]

[project.optional-dependencies]
dev = [
    "jupyter>=1.0.0",
    "ipykernel>=6.25.0",
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.5.0",
]

experiment = [
    "wandb>=0.15.0",
]

[project.scripts]
train = "classifier.train:main"
evaluate = "classifier.evaluate:main"
infer = "classifier.inference:predict"

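[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

# The package directory (classifier) differs from the project name,
# so tell hatchling where to find it
[tool.hatch.build.targets.wheel]
packages = ["src/classifier"]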

7.31.3 Step 3: Install Dependencies

uv sync --all-extras

7.31.4 Step 4: Create Project Structure

mkdir -p data/{raw,processed,splits}
mkdir -p models/checkpoints
mkdir -p src/classifier
mkdir -p notebooks
mkdir -p tests
mkdir -p configs
mkdir -p experiments

7.31.5 Step 5: Write Core Code

Create src/classifier/models.py:

"""Neural network architectures for image classification."""

import torch
import torch.nn as nn
import torchvision.models as models
from typing import Optional


def create_model(
    architecture: str = "resnet18",
    num_classes: int = 10,
    pretrained: bool = True,
    freeze_backbone: bool = False,
) -> nn.Module:
    """
    Create a model with specified architecture.
    
    Parameters
    ----------
    architecture : str
        Model architecture ('resnet18', 'resnet50', 'efficientnet_b0')
    num_classes : int
        Number of output classes
    pretrained : bool
        Use ImageNet pretrained weights
    freeze_backbone : bool
        Freeze backbone layers for transfer learning
        
    Returns
    -------
    nn.Module
        Initialized model
    """
    if architecture == "resnet18":
        model = models.resnet18(weights='IMAGENET1K_V1' if pretrained else None)
        num_features = model.fc.in_features
        model.fc = nn.Linear(num_features, num_classes)
    elif architecture == "resnet50":
        model = models.resnet50(weights='IMAGENET1K_V1' if pretrained else None)
        num_features = model.fc.in_features
        model.fc = nn.Linear(num_features, num_classes)
    elif architecture == "efficientnet_b0":
        model = models.efficientnet_b0(
            weights='IMAGENET1K_V1' if pretrained else None
        )
        num_features = model.classifier[1].in_features
        model.classifier[1] = nn.Linear(num_features, num_classes)
    else:
        raise ValueError(f"Unknown architecture: {architecture}")
    
    if freeze_backbone:
        # Freeze all layers except the final classifier
        for param in model.parameters():
            param.requires_grad = False
        
        # Unfreeze classifier
        if architecture in ["resnet18", "resnet50"]:
            for param in model.fc.parameters():
                param.requires_grad = True
        elif architecture == "efficientnet_b0":
            for param in model.classifier.parameters():
                param.requires_grad = True
    
    return model


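class Classifier(nn.Module):
    """
    Wrapper that adds a dropout-regularized linear head to a feature backbone.

    The backbone is expected to return a 2-D tensor of shape (batch, features).
    """

    def __init__(
        self,
        backbone: nn.Module,
        num_classes: int,
        dropout: float = 0.5,
    ):
        super().__init__()
        self.backbone = backbone
        self.dropout = nn.Dropout(dropout)
        # LazyLinear infers the input width on the first forward pass
        self.head = nn.LazyLinear(num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.backbone(x)
        # Apply dropout to the features, not to the output logits
        return self.head(self.dropout(features))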

Create src/classifier/train.py:

"""Training loop for image classification."""

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from pathlib import Path
from tqdm import tqdm
from typing import Dict, Tuple
import yaml

from .models import create_model
from .data import create_dataloaders
from .utils import save_checkpoint, AverageMeter


def train_epoch(
    model: nn.Module,
    dataloader: DataLoader,
    criterion: nn.Module,
    optimizer: optim.Optimizer,
    device: torch.device,
    epoch: int,
) -> Tuple[float, float]:
    """
    Train for one epoch.
    
    Returns
    -------
    tuple
        Average loss and accuracy for the epoch
    """
    model.train()
    losses = AverageMeter()
    accuracies = AverageMeter()
    
    pbar = tqdm(dataloader, desc=f"Epoch {epoch}")
    
    for images, labels in pbar:
        images = images.to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # Calculate accuracy
        _, predicted = outputs.max(1)
        accuracy = (predicted == labels).float().mean()
        
        # Update metrics
        losses.update(loss.item(), images.size(0))
        accuracies.update(accuracy.item(), images.size(0))
        
        pbar.set_postfix({
            'loss': f'{losses.avg:.4f}',
            'acc': f'{accuracies.avg:.4f}'
        })
    
    return losses.avg, accuracies.avg


def validate(
    model: nn.Module,
    dataloader: DataLoader,
    criterion: nn.Module,
    device: torch.device,
) -> Tuple[float, float]:
    """
    Validate the model.
    
    Returns
    -------
    tuple
        Average loss and accuracy
    """
    model.eval()
    losses = AverageMeter()
    accuracies = AverageMeter()
    
    with torch.no_grad():
        for images, labels in tqdm(dataloader, desc="Validation"):
            images = images.to(device)
            labels = labels.to(device)
            
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            _, predicted = outputs.max(1)
            accuracy = (predicted == labels).float().mean()
            
            losses.update(loss.item(), images.size(0))
            accuracies.update(accuracy.item(), images.size(0))
    
    return losses.avg, accuracies.avg


def train_model(config: Dict) -> None:
    """
    Main training function.
    
    Parameters
    ----------
    config : dict
        Training configuration
    """
    # Setup
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")
    
    # Create dataloaders
    train_loader, val_loader, _ = create_dataloaders(
        data_dir=config['data']['path'],
        batch_size=config['training']['batch_size'],
        num_workers=config['training']['num_workers'],
    )
    
    # Create model
    model = create_model(
        architecture=config['model']['architecture'],
        num_classes=config['model']['num_classes'],
        pretrained=config['model']['pretrained'],
        freeze_backbone=config['model'].get('freeze_backbone', False),
    )
    model = model.to(device)
    
    # Loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(
        model.parameters(),
        lr=config['training']['learning_rate'],
        weight_decay=config['training']['weight_decay'],
    )
    
    # Learning rate scheduler
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer,
        mode='min',
        factor=0.5,
        patience=5,
    )
    
    # Tensorboard
    writer = SummaryWriter(config['training']['log_dir'])
    
    # Training loop
    best_val_acc = 0.0
    
    for epoch in range(1, config['training']['epochs'] + 1):
        # Train
        train_loss, train_acc = train_epoch(
            model, train_loader, criterion, optimizer, device, epoch
        )
        
        # Validate
        val_loss, val_acc = validate(model, val_loader, criterion, device)
        
        # Update learning rate
        scheduler.step(val_loss)
        
        # Log metrics
        writer.add_scalar('Loss/train', train_loss, epoch)
        writer.add_scalar('Loss/val', val_loss, epoch)
        writer.add_scalar('Accuracy/train', train_acc, epoch)
        writer.add_scalar('Accuracy/val', val_acc, epoch)
        writer.add_scalar('LR', optimizer.param_groups[0]['lr'], epoch)
        
        print(f"\nEpoch {epoch}:")
        print(f"  Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}")
        print(f"  Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
        
        # Save checkpoint
        is_best = val_acc > best_val_acc
        best_val_acc = max(val_acc, best_val_acc)
        
        save_checkpoint(
            {
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'val_acc': val_acc,
                'config': config,
            },
            is_best=is_best,
            checkpoint_dir=config['training']['checkpoint_dir'],
        )
    
    writer.close()
    print(f"\nTraining completed. Best validation accuracy: {best_val_acc:.4f}")


def main():
    """Entry point for training script."""
    import argparse
    
    parser = argparse.ArgumentParser(description='Train image classifier')
    parser.add_argument(
        '--config',
        type=str,
        required=True,
        help='Path to config file'
    )
    args = parser.parse_args()
    
    # Load config
    with open(args.config, 'r') as f:
        config = yaml.safe_load(f)
    
    # Train
    train_model(config)


if __name__ == '__main__':
    main()
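
train.py imports save_checkpoint and AverageMeter from a utils module that is not shown in this walkthrough. A minimal sketch of what src/classifier/utils.py could contain follows. The signatures are inferred from how train.py calls them; the last.pth file name is an assumption, while best.pth matches the checkpoint path used elsewhere in this chapter.

```python
"""Helper utilities assumed by train.py (a sketch of src/classifier/utils.py)."""
import shutil
from pathlib import Path


class AverageMeter:
    """Track a running, sample-weighted average of a metric."""

    def __init__(self):
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, value, n=1):
        """Add `value`, the mean over a batch of `n` samples."""
        self.sum += value * n
        self.count += n
        self.avg = self.sum / self.count


def save_checkpoint(state, is_best, checkpoint_dir):
    """Save `state` as last.pth and copy it to best.pth when `is_best` is true."""
    import torch  # imported here so AverageMeter stays usable without PyTorch

    directory = Path(checkpoint_dir)
    directory.mkdir(parents=True, exist_ok=True)
    last_path = directory / "last.pth"
    torch.save(state, last_path)
    if is_best:
        shutil.copyfile(last_path, directory / "best.pth")


meter = AverageMeter()
meter.update(0.5, n=10)   # batch of 10 samples with mean value 0.5
meter.update(1.0, n=30)   # batch of 30 samples with mean value 1.0
print(meter.avg)          # 0.875
```

Weighting each update by the batch size keeps the epoch average correct even when the final batch is smaller than the others.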

Create src/classifier/data.py:

"""Data loading and preprocessing utilities."""

import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
from pathlib import Path
from typing import Tuple


def get_transforms(
    train: bool = True,
    image_size: int = 224,
) -> transforms.Compose:
    """
    Get data transforms for training or validation.
    
    Parameters
    ----------
    train : bool
        If True, return training transforms with augmentation
    image_size : int
        Target image size
        
    Returns
    -------
    transforms.Compose
        Composed transforms
    """
    if train:
        return transforms.Compose([
            transforms.RandomResizedCrop(image_size),
            transforms.RandomHorizontalFlip(),
            transforms.ColorJitter(
                brightness=0.2,
                contrast=0.2,
                saturation=0.2,
            ),
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225],
            ),
        ])
    else:
        return transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(image_size),
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225],
            ),
        ])


def create_dataloaders(
    data_dir: str,
    batch_size: int = 32,
    num_workers: int = 4,
    val_split: float = 0.2,
) -> Tuple[DataLoader, DataLoader, DataLoader]:
    """
    Create train, validation, and test dataloaders.
    
    Parameters
    ----------
    data_dir : str
        Path to data directory
    batch_size : int
        Batch size
    num_workers : int
        Number of workers for data loading
    val_split : float
        Validation split ratio
        
    Returns
    -------
    tuple
        Train, validation, and test dataloaders
    """
    data_path = Path(data_dir)
    
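    # Load datasets; the training directory is loaded twice so that the
    # validation subset can use deterministic (non-augmented) transforms
    train_dataset = datasets.ImageFolder(
        data_path / 'train',
        transform=get_transforms(train=True)
    )

    eval_dataset = datasets.ImageFolder(
        data_path / 'train',
        transform=get_transforms(train=False)
    )

    test_dataset = datasets.ImageFolder(
        data_path / 'test',
        transform=get_transforms(train=False)
    )

    # Split train into train and validation
    val_size = int(len(train_dataset) * val_split)
    train_size = len(train_dataset) - val_size

    train_subset, val_indices = random_split(
        train_dataset,
        [train_size, val_size],
        generator=torch.Generator().manual_seed(42)
    )

    # Re-index the validation samples into the copy with evaluation transforms
    val_subset = torch.utils.data.Subset(eval_dataset, val_indices.indices)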
    
    # Create dataloaders
    train_loader = DataLoader(
        train_subset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    
    val_loader = DataLoader(
        val_subset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,
    )
    
    test_loader = DataLoader(
        test_dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,
    )
    
    return train_loader, val_loader, test_loader

7.31.6 Step 6: Write Tests

Create tests/test_models.py:

"""Tests for model architectures."""

import pytest
import torch
from classifier.models import create_model


def test_resnet18_creation():
    """Test ResNet18 model creation."""
    model = create_model(
        architecture='resnet18',
        num_classes=10,
        pretrained=False,
    )
    
    assert model is not None
    
    # Test forward pass
    x = torch.randn(2, 3, 224, 224)
    output = model(x)
    
    assert output.shape == (2, 10)


def test_model_with_frozen_backbone():
    """Test model with frozen backbone."""
    model = create_model(
        architecture='resnet18',
        num_classes=10,
        pretrained=False,  # freezing does not depend on the weights; avoids a download
        freeze_backbone=True,
    )

    # Count the parameters that still receive gradients
    trainable_params = sum(
        p.numel() for p in model.parameters() if p.requires_grad
    )

    # Only the final layer should be trainable: 512 weights per class plus biases
    assert trainable_params == 512 * 10 + 10


@pytest.mark.parametrize('architecture', ['resnet18', 'resnet50'])
def test_different_architectures(architecture):
    """Test different model architectures."""
    model = create_model(
        architecture=architecture,
        num_classes=100,
        pretrained=False,
    )
    
    x = torch.randn(4, 3, 224, 224)
    output = model(x)
    
    assert output.shape == (4, 100)
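
The loose threshold in `test_model_with_frozen_backbone` can be replaced by an exact expectation once the classifier head is known. As a sketch, assume `create_model` swaps the final layer for a single fully connected layer on top of the backbone's pooled features (512 for torchvision's ResNet18, 2048 for ResNet50). That is an assumption about `create_model`, whose actual head may differ, and `linear_head_params` is a hypothetical helper.

```python
def linear_head_params(in_features: int, num_classes: int, bias: bool = True) -> int:
    """Parameter count of one fully connected layer: weights plus optional biases."""
    return in_features * num_classes + (num_classes if bias else 0)


# Feature widths of torchvision's ResNet backbones just before the final layer.
BACKBONE_FEATURES = {"resnet18": 512, "resnet50": 2048}

expected = {
    arch: linear_head_params(width, num_classes=10)
    for arch, width in BACKBONE_FEATURES.items()
}
print(expected)
```

Under that assumption, a frozen ResNet18 with ten classes would have exactly 5130 trainable parameters, which is a tighter check than an arbitrary upper bound.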

7.31.7 Step 7: Create Configuration

Create configs/resnet18.yaml:

# Model configuration
model:
  architecture: resnet18
  num_classes: 10
  pretrained: true
  freeze_backbone: false

# Data configuration
data:
  path: data/
  image_size: 224

# Training configuration
training:
  batch_size: 32
  epochs: 50
  learning_rate: 0.001
  weight_decay: 0.0001
  num_workers: 4
  checkpoint_dir: models/checkpoints/
  log_dir: experiments/resnet18/
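
A configuration file is only useful if typos in it fail early. One way to get that, sketched below, is to load the `training` section into a frozen dataclass so that a misspelled or missing key raises immediately. The class and function names are hypothetical, and the dictionary stands in for what `yaml.safe_load` would return for the file above, so the sketch runs without third-party packages.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    batch_size: int
    epochs: int
    learning_rate: float
    weight_decay: float
    num_workers: int
    checkpoint_dir: str
    log_dir: str

    def __post_init__(self) -> None:
        # Reject values that would only fail much later, inside the training loop.
        if self.batch_size <= 0 or self.epochs <= 0:
            raise ValueError("batch_size and epochs must be positive")
        if self.learning_rate <= 0:
            raise ValueError("learning_rate must be positive")


def parse_training(raw: dict) -> TrainingConfig:
    """Build a TrainingConfig; unknown or missing keys raise TypeError."""
    return TrainingConfig(**raw["training"])


# Stand-in for the parsed contents of configs/resnet18.yaml.
raw_config = {
    "training": {
        "batch_size": 32,
        "epochs": 50,
        "learning_rate": 0.001,
        "weight_decay": 0.0001,
        "num_workers": 4,
        "checkpoint_dir": "models/checkpoints/",
        "log_dir": "experiments/resnet18/",
    }
}

training = parse_training(raw_config)
print(training.batch_size, training.learning_rate)
```

The same pattern extends to the `model` and `data` sections; whether the project's own `train.py` validates its configuration this way is not shown here.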

7.31.8 Step 8: Run Training

# Run tests first
uv run pytest tests/ -v

# Start training
uv run train --config configs/resnet18.yaml

# Monitor with tensorboard
uv run tensorboard --logdir experiments/
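
The `uv run train --config ...` command works because `train` is declared as a console script that points at a `main` function. The real `classifier.train.main` is not reproduced here; the following is a minimal sketch of what such an entry point could look like, using only `argparse` from the standard library. The function names and messages are illustrative assumptions.

```python
import argparse
from pathlib import Path
from typing import Optional, Sequence


def build_parser() -> argparse.ArgumentParser:
    """Command-line interface for the hypothetical train entry point."""
    parser = argparse.ArgumentParser(prog="train", description="Train an image classifier.")
    parser.add_argument(
        "--config", type=Path, required=True, help="Path to a YAML configuration file"
    )
    return parser


def main(argv: Optional[Sequence[str]] = None) -> int:
    """Parse arguments and return a process exit code (0 on success)."""
    args = build_parser().parse_args(argv)
    if not args.config.is_file():
        print(f"Config not found: {args.config}")
        return 1
    print(f"Training with {args.config}")
    # ... load the configuration and start the training loop here ...
    return 0
```

Accepting `argv` as a parameter, rather than reading `sys.argv` directly, keeps the entry point easy to call from tests.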

7.31.9 Step 9: Create Analysis Notebook

Create notebooks/01-analysis.ipynb:

# /// script
# dependencies = [
#   "torch",
#   "torchvision",
#   "matplotlib",
#   "seaborn",
# ]
# ///

import sys
sys.path.insert(0, '../src')

from classifier.models import create_model
from classifier.data import create_dataloaders
import torch
import matplotlib.pyplot as plt
import seaborn as sns

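# Load trained model (pretrained weights are not needed; the checkpoint overwrites them)
model = create_model('resnet18', num_classes=10, pretrained=False)
# map_location lets a checkpoint saved on a GPU load on a CPU-only machine
checkpoint = torch.load('../models/checkpoints/best.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()  # switch dropout and batch norm to inference behavior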

# Analyze results
_, _, test_loader = create_dataloaders('../data', batch_size=32)

# Evaluate and visualize
# ... evaluation code ...
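
The evaluation step is left open in the notebook. Whatever form it takes, it usually reduces to comparing predicted labels with true labels. The sketch below shows that comparison with plain Python lists; in the notebook the two lists would presumably be collected by looping over `test_loader` and taking the arg-max of the model output, which is an assumption here. The function names and the sample labels are illustrative, not results.

```python
from collections import Counter


def accuracy(y_true: list[int], y_pred: list[int]) -> float:
    """Fraction of predictions that match the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_accuracy(y_true: list[int], y_pred: list[int]) -> dict[int, float]:
    """Accuracy computed separately for each true class."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {label: correct[label] / totals[label] for label in sorted(totals)}


# Made-up labels purely to exercise the functions.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
print(accuracy(y_true, y_pred))
print(per_class_accuracy(y_true, y_pred))
```

Per-class accuracy is often more informative than the overall figure when classes are imbalanced, and it maps directly onto a bar chart with `matplotlib` or `seaborn`.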

7.31.10 Step 10: Document

Create a comprehensive README.md:

# Image Classifier

Deep learning image classifier using PyTorch and ResNet architectures.

## Features

- Multiple architecture support (ResNet18, ResNet50, EfficientNet)
- Transfer learning with pretrained weights
- Data augmentation
- TensorBoard logging
- Comprehensive testing

## Setup

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/user/image-classifier.git
cd image-classifier
uv sync --all-extras
```

## Usage

### Training

```bash
uv run train --config configs/resnet18.yaml
```

### Evaluation

```bash
uv run evaluate --checkpoint models/checkpoints/best.pth --data data/test/
```

### Inference

```bash
uv run infer --checkpoint models/checkpoints/best.pth --image path/to/image.jpg
```

### Monitoring

```bash
uv run tensorboard --logdir experiments/
```

## Project Structure

```
image-classifier/
├── src/classifier/      # Source code
├── tests/              # Unit tests
├── configs/            # Training configurations
├── data/               # Datasets
├── models/             # Model checkpoints
├── notebooks/          # Jupyter notebooks
└── experiments/        # Experiment logs
```

## Results

| Model | Accuracy | Parameters |
|-------|----------|-----------|
| ResNet18 | 92.3% | 11.7M |
| ResNet50 | 94.1% | 25.6M |

## Citation

If you use this code, please cite...

7.32 Conclusion

uv represents a significant step forward in Python package management. Its speed, simplicity, and reliability make it well suited to machine learning and AI development, where managing complex dependencies and ensuring reproducibility are critical. By combining package management, environment isolation, and Python version management into a single tool, uv removes much of the friction that has historically plagued Python ML development.

For ML practitioners, the benefits are clear:

Faster iteration: Less time waiting for packages means more time training models and experimenting
Better reproducibility: Lock files ensure your trained models can be deployed with the exact environment they were trained in
Simpler workflows: One tool instead of many reduces cognitive overhead
Production-ready: Fast, reliable dependency management makes deployment smoother

As you continue through this book, many examples will benefit from using uv for environment management. The patterns we’ve established here, using pyproject.toml, locking dependencies, and running code with uv run, will serve you well throughout your machine learning journey, from prototyping to production deployment.

7.33 Summary

In this chapter, we’ve covered:

Installing and configuring uv across different platforms
Creating and managing ML projects with proper structure
Handling dependencies, version constraints, and lock files
Managing Python versions for consistency
Integrating with Jupyter notebooks for experimentation
Building reproducible ML workflows for training and deployment
Troubleshooting common issues in ML environments
Best practices for ML/AI projects including GPU environment management

With uv in your toolkit, you’re well-equipped to manage the technical infrastructure of your ML projects, allowing you to focus on what matters most: building, training, and deploying effective machine learning models. The speed and reliability of uv mean less time fighting with dependencies and more time on actual model development and experimentation.

# Project Management {#sec-uv} ## Introduction Python project management has evolved significantly over the years, with tools like `pip`, `virtualenv`, `conda`, and `poetry` each attempting to solve different aspects of dependency management and environment isolation. In 2024, a tool called `uv` emerged from Astral, the team behind `ruff`, aiming to consolidate Python package management with a focus on speed and simplicity. Written in Rust, `uv` combines the functionality of several previously separate tools into a single, cohesive experience. At its heart, project management answers three coupled questions. First, *which* versions of which packages should be installed so that every package's stated requirements are simultaneously satisfied. Second, *how* to record that answer so it can be reproduced exactly on another machine months or years later. Third, *where* to place the resulting packages so that one project's choices do not corrupt another's. The first question is a constraint-satisfaction problem (dependency resolution), the second is the lock-file mechanism, and the third is environment isolation. The rest of this chapter treats each in turn, then layers on the surrounding toolchain (linting, type checking, testing, and documentation) that turns a reproducible environment into a disciplined workflow. For machine learning and AI work, where managing deep dependency graphs and ensuring bit-for-bit reproducibility across environments is critical, getting these three questions right is the difference between an experiment that replicates and one that silently drifts. The chapter is organized to be read linearly the first time and used as a reference thereafter, moving from installation, through the formal resolution problem, to concrete ML workflows and the full development stack. All of the tooling discussed here is mature, free, and open source. 
`uv`, `ruff`, `pytest`, `mypy`, and `quarto` are permissively licensed and develop in the open, so nothing in this workflow depends on a paid service or a proprietary registry. ## Why `uv` Matters for Machine Learning and AI Before diving into the technical details, it's worth understanding why `uv` is particularly valuable for machine learning and AI development: **Reproducibility**: ML models must be reproducible. With `uv`, you can lock exact versions of all dependencies, ensuring that your trained neural network or fine-tuned LLM produces identical results when deployed or shared with collaborators months or years later. **Speed**: Installing ML frameworks like PyTorch, TensorFlow, or transformers with all their dependencies is notoriously slow. `uv` is 10-100x faster than `pip`, meaning you spend less time waiting for environments to set up and more time training models. **Simplicity**: Modern ML projects require complex dependency graphs, deep learning frameworks, data processing libraries, visualization tools, and more. `uv` simplifies this complexity with intuitive commands and clear error messages, reducing cognitive overhead. **Isolation**: Different ML projects often require different versions of frameworks. `uv` makes it trivial to create isolated environments, preventing version conflicts between your PyTorch 2.0 computer vision project and your TensorFlow 2.15 NLP project. ## Installation Installing `uv` is straightforward. The recommended method varies by operating system: ### macOS and Linux On Unix-based systems, the simplest installation method uses the official installer script: ``` bash curl -LsSf https://astral.sh/uv/install.sh | sh ``` This downloads and installs `uv` to your system, adding it to your PATH automatically. 
After installation, restart your terminal or source your shell configuration file: ``` bash source $HOME/.cargo/env ``` ### Windows On Windows, you can use PowerShell: ``` powershell powershell -c "irm https://astral.sh/uv/install.ps1 | iex" ``` Alternatively, if you have Python already installed, you can use `pip`: ``` bash pip install uv ``` However, the standalone installer is preferred as it doesn't depend on an existing Python installation. ### Verifying Installation After installation, verify that `uv` is working correctly: ``` bash uv --version ``` You should see output showing the installed version, such as: ``` uv 0.4.18 (Homebrew 2024-11-05) ``` ## Understanding `uv`'s Architecture To use `uv` effectively, it helps to understand its core concepts and how it differs from traditional Python tools. ### The Tool Chain Analogy Think of `uv` as a complete tool chain rather than a single tool. It replaces multiple tools in the Python ecosystem: - **`pip`**: Package installation - **`pip-tools`**: Dependency resolution and locking - **`virtualenv`/`venv`**: Environment creation - **`pyenv`**: Python version management - **`pipx`**: Tool installation Where you previously needed to coordinate these separate tools, `uv` provides a unified interface. This integration eliminates common pain points like ensuring your virtual environment uses the correct Python version or manually compiling requirements files. ### Key Design Principles `uv` is built on several core principles: 1. **Speed First**: Written in Rust and using parallel downloads, `uv` prioritizes performance without sacrificing correctness. 2. **Correctness**: `uv` uses a proper dependency resolver that can handle complex version constraints, unlike `pip`'s historical resolver issues. 3. **Batteries Included**: Unlike tools that require plugins or additional configuration, `uv` works out of the box for common workflows. 4. 
**Standards Compliant**: `uv` follows Python packaging standards (PEP 517, PEP 621, etc.), ensuring compatibility with the broader ecosystem. ## The Dependency Resolution Problem The single hardest thing a package manager does is *resolution*: choosing one concrete version for every package such that every constraint is satisfied at once. The chapter's claims that `uv` uses "a proper resolver" and that `pip` historically had "resolver issues" only make sense once we state the problem precisely. This section does that, then explains why the problem is genuinely hard and how modern resolvers cope. ### A precise statement Let $P = \{p_1, \dots, p_n\}$ be the set of packages reachable from your project's direct requirements. For each package $p_i$ let $V_i = \{v_{i,1}, v_{i,2}, \dots\}$ be the finite set of versions available on the index, plus a distinguished symbol $\bot$ meaning "not installed." A *solution* is an assignment $$ \sigma : P \to V_i \cup \{\bot\}, \qquad \sigma(p_i) \in V_i \cup \{\bot\}, $$ that selects at most one version of each package. Dependencies are constraints between these choices. A typical requirement, "package `a` at version $v$ depends on `numpy >= 1.24, < 2.0`," becomes the logical implication $$ \bigl(\sigma(\texttt{a}) = v\bigr) \;\Rightarrow\; \sigma(\texttt{numpy}) \in [1.24,\, 2.0). $$ The project's own direct dependencies are unconditional constraints of the same form. A solution is *valid* when every such implication holds and every selected version's own dependencies are themselves satisfied (the constraint set is closed under reachability). Resolution is the search for a valid $\sigma$ in which every package required by the project is assigned a real version rather than $\bot$. ::: {.callout-note} ## Definition: the version-selection constraint problem Given packages with versioned, conditional dependency constraints, decide whether a valid assignment $\sigma$ exists, and if so return one. 
Optional refinements ask for the assignment that is *newest* under some preference order, for example lexicographically maximal versions, which is what users usually want. ::: ### Why it is hard Each package's "which version" choice is a discrete variable, and the implications above are exactly propositional clauses. Encode "package $p$ is at version $v$" as a Boolean variable $x_{p,v}$, add clauses enforcing that at most one version per package is chosen, and translate each dependency implication into a disjunction. The result is a Boolean satisfiability (SAT) instance, and conversely SAT instances can be encoded as dependency graphs. Boolean satisfiability is the canonical NP-complete problem [@cook1971complexity], so version selection inherits that worst-case hardness: no known algorithm solves every instance in time polynomial in the number of packages and versions. The same reduction underlies the formal study of package managers in the Linux distribution world [@abate2015modular], where the problem was analyzed long before it reached the Python ecosystem. Two ecosystem realities make the worst case bite in practice: - **Diamond dependencies.** Project `A` depends on `B` and `C`; both `B` and `C` depend on `D`, but with overlapping-or-disjoint version ranges. The resolver must find a version of `D` acceptable to both, and the feasible window can be empty. - **Backtracking blowups.** A naive resolver picks versions greedily, discovers a conflict deep in the tree, and must undo many choices. Without learning from the conflict it can revisit the same dead end repeatedly, which is the behavior older `pip` resolvers were criticized for. ### How modern resolvers cope `uv` (like Dart's `pub` and the `poetry` resolver) uses the **PubGrub** algorithm, an adaptation of conflict-driven clause learning (CDCL) from the SAT-solving literature. The key idea is that when the search hits an incompatibility, it does not merely backtrack one step. 
It derives a new, more general *incompatibility* (a learned clause) that summarizes *why* the dead end occurred, then uses that clause to prune the rest of the search and to explain the failure to the user. The clear "Because package-a depends on numpy<2.0 and package-b depends on numpy>=2.0, ..." messages shown later in this chapter are these learned incompatibilities rendered as English. Three properties matter in practice and are worth stating as expectations rather than guarantees of speed: 1. **Completeness.** If a valid assignment exists, the resolver finds one; if none exists, it reports a genuine conflict rather than installing a broken set. This is the property `pip`'s historical resolver lacked. 2. **Determinism.** Given the same inputs (the same `pyproject.toml`, the same index state), resolution returns the same assignment. Determinism is what makes the lock file meaningful: it is the serialized $\sigma$. 3. **Best-effort optimality.** Among valid assignments the resolver prefers the newest versions consistent with your constraints, so you get bug fixes "for free" without manual pinning. ```{mermaid} flowchart TD start["Direct requirements from pyproject.toml"] start --> pick["Pick a candidate version for the next package"] pick --> check["Check it against all active constraints"] check -->|"compatible"| more{"More packages to assign"} check -->|"conflict"| learn["Derive a learned incompatibility"] learn --> back["Backjump past the real cause"] back --> pick more -->|"yes"| pick more -->|"no"| done["Valid assignment, write uv.lock"] ``` ### Worked example: an unsatisfiable diamond Suppose your project requires two packages with the following published constraints. | Package | Version | Requires | |---------|---------|----------| | `vision` | 1.0 | `numpy >= 1.24, < 2.0` | | `fastmath` | 3.0 | `numpy >= 2.0` | There is no single `numpy` version in both $[1.24, 2.0)$ and $[2.0, \infty)$, since those intervals are disjoint. 
The resolver assigns `vision==1.0`, propagates `numpy < 2.0`, then tries `fastmath==3.0`, which propagates `numpy >= 2.0`. The two clauses about `numpy` are jointly unsatisfiable, so the resolver records the incompatibility "`vision==1.0` and `fastmath==3.0` cannot coexist" and reports it. The fix is not a resolver flag but a real engineering decision: relax a bound (does `fastmath` have an older release that accepts `numpy < 2.0`?), drop one dependency, or wait for an upstream release that widens its range. The resolver's value is that it tells you exactly which two requirements collide, instead of installing whichever it reached last and leaving you to debug an import error at runtime. ### When to worry, and when not to For the overwhelming majority of projects, resolution completes in well under a second and you never think about any of this. The theory matters at the moments when it does not: when a `uv sync` suddenly cannot find a solution, when adding one package forces a cascade of downgrades, or when CI resolves differently from your laptop. In those moments the right mental model is the one above. The resolver is searching a constraint problem, conflicts are facts about your dependency graph rather than bugs in the tool, and the lock file is the artifact that freezes a known-good solution so the search never has to run again on the deployment path. ## Basic Project Workflow Let's walk through creating and managing a Python project with `uv`. We'll build a small machine learning project to demonstrate practical usage. 
### Creating a New Project To create a new project, use the `uv init` command: ``` bash uv init image-classifier cd image-classifier ``` This creates a new directory with a basic project structure: ```{mermaid} flowchart TD root["image-classifier/"] root --> v[".python-version"] root --> readme["README.md"] root --> pyproject["pyproject.toml"] root --> hello["hello.py"] ``` Let's examine each file: **`.python-version`**: Specifies the Python version for this project. `uv` uses this to automatically download and use the correct Python version. **`pyproject.toml`**: The modern Python project configuration file, following PEP 621. This is where dependencies, metadata, and build configuration live. **`hello.py`**: A simple starter script that `uv` creates as an example. ### Understanding `pyproject.toml` The `pyproject.toml` file is central to modern Python projects. Here's what `uv init` generates: ``` toml [project] name = "image-classifier" version = "0.1.0" description = "Add your description here" readme = "README.md" requires-python = ">=3.12" dependencies = [] [build-system] requires = ["hatchling"] build-backend = "hatchling.build" ``` Let's break this down: - **`[project]`**: Metadata about your project, following PEP 621 - **`name`**: The package name (important if you plan to distribute it) - **`version`**: Semantic version number - **`requires-python`**: Minimum Python version requirement - **`dependencies`**: List of required packages (initially empty) - **`[build-system]`**: Configuration for building the package (uses `hatchling` by default) For an ML project, you might not care about building a distributable package, but the structure remains useful for dependency management. ### Adding Dependencies There are two main ways to add dependencies: directly editing `pyproject.toml` or using the command line. 
#### Method 1: Command Line (Recommended) To add a package, use `uv add`: ``` bash uv add torch torchvision numpy pillow matplotlib ``` This does several things automatically: 1. Resolves the dependencies and their sub-dependencies 2. Updates `pyproject.toml` with the new dependencies 3. Creates or updates `uv.lock` with exact versions 4. Installs the packages in your project environment After running this command, your `pyproject.toml` will show: ``` toml dependencies = [ "torch", "torchvision", "numpy", "pillow", "matplotlib", ] ``` And `uv` has created a `uv.lock` file that pins exact versions of these packages and all their dependencies. #### Method 2: Manual Editing You can also edit `pyproject.toml` directly: ``` toml dependencies = [ "torch>=2.0.0", "torchvision>=0.15.0", "numpy>=1.24.0", "pillow>=10.0.0", "matplotlib>=3.7.0", ] ``` Then synchronize your environment: ``` bash uv sync ``` This reads the `pyproject.toml`, resolves dependencies, and installs everything. ### Version Constraints When specifying dependencies, you can use various version constraint operators: ``` toml dependencies = [ "torch", # Any version (not recommended) "torch>=2.0.0", # Greater than or equal to 2.0.0 "torch>=2.0.0,<3.0.0", # Between 2.0.0 and 3.0.0 "torch~=2.0.0", # Compatible release (2.0.x) "torch==2.0.1", # Exact version (very restrictive) ] ``` For ML projects, I recommend using lower bounds with conservative upper bounds: ``` toml dependencies = [ "torch>=2.0.0,<3.0.0", "numpy>=1.24.0,<2.0.0", "transformers>=4.30.0,<5.0.0", ] ``` This gives you bug fixes and minor updates while protecting against breaking changes. This is especially important for deep learning frameworks where major versions can introduce significant API changes. ### The Lock File: `uv.lock` The `uv.lock` file is critical for reproducibility. It contains the exact resolved versions of every package in your dependency tree. 
Here's a snippet: ``` toml [[package]] name = "torch" version = "2.3.1" source = { registry = "https://pypi.org/simple" } dependencies = [ { name = "filelock" }, { name = "typing-extensions" }, { name = "sympy" }, { name = "networkx" }, { name = "jinja2" }, { name = "fsspec" }, ] wheels = [ { url = "https://files.pythonhosted.org/packages/...", hash = "sha256:..." }, ] ``` This tells us: - Exactly which version of PyTorch is installed (2.3.1) - Where it came from (PyPI) - Its direct dependencies - The specific wheel file and its hash for verification **Important**: You should commit `uv.lock` to version control. This ensures anyone cloning your repository can recreate your exact environment, which is critical when sharing trained models or reproducing experimental results. ### What the lock file does and does not guarantee It helps to be precise about the kind of reproducibility a lock file buys, because it is often overstated. The lock file is the serialized resolution $\sigma$ from the previous section, together with a cryptographic hash of every artifact. It pins three things and leaves a fourth to you. - **Version reproducibility.** Every transitive package is fixed to one exact version. Two `uv sync` runs from the same lock file install the same version numbers. This is the property that protects against silent drift when an upstream package publishes a new release. - **Artifact integrity.** Each wheel is recorded with a SHA-256 hash. On install, `uv` verifies the downloaded bytes against the recorded hash, so a corrupted download or a tampered mirror is detected rather than installed. Formally, if $h$ is the recorded hash and $\hat{h}$ is the hash of the bytes actually fetched, installation proceeds only when $h = \hat{h}$. - **Source provenance.** The index URL and, for Git or local dependencies, the exact commit or path are recorded, so you know precisely where each artifact came from. 
- **What it does not pin: the platform.** A lock file selects wheels for a target platform and Python version. The same lock can resolve to different wheels on Linux versus macOS, or on CPython 3.11 versus 3.12, because those are genuinely different binaries (a manylinux wheel is not a macOS wheel). For deep learning this is the usual source of "works on my machine" surprises: a CUDA-enabled `torch` wheel and a CPU-only one are different artifacts. The lock file makes the *choice* deterministic per platform; it does not make a GPU appear on a CPU-only host. The later sections on CUDA variants address this directly. A useful way to think about it: the lock file removes nondeterminism from the resolver and from the network, leaving only the deliberate, declared variation across platforms and hardware. ## Running Python with `uv` ### The `uv run` Command Instead of activating a virtual environment and then running Python, `uv` provides the `uv run` command: ``` bash uv run python script.py ``` This automatically: 1. Ensures the project environment exists 2. Installs any missing dependencies 3. Runs the Python script in that environment You can also run Python interactively: ``` bash uv run python ``` Or execute inline code: ``` bash uv run python -c "import pandas; print(pandas.__version__)" ``` ### Running Installed Tools For tools like `jupyter`, `pytest`, or `black`, use `uv run` as well: ``` bash uv run jupyter notebook uv run pytest tests/ uv run black src/ ``` This is cleaner than traditional workflows where you'd activate an environment first. ## Development Dependencies ML projects often need development tools (testing, formatting, documentation, experiment tracking) that aren't required for running the actual training or inference. `uv` supports optional dependency groups for this. 
### Adding Development Dependencies Add development dependencies with the `--dev` flag: ``` bash uv add --dev pytest black mypy jupyter tensorboard wandb ``` This updates `pyproject.toml` with a new section: ``` toml [project.optional-dependencies] dev = [ "pytest", "black", "mypy", "jupyter", "tensorboard", "wandb", ] ``` Or you can create custom groups: ``` bash uv add --optional gpu torch-cuda ``` ``` toml [project.optional-dependencies] gpu = [ "torch-cuda", ] ``` ### Installing Optional Dependencies To install the project with development dependencies: ``` bash uv sync --extra dev ``` Or all optional groups: ``` bash uv sync --all-extras ``` ## Python Version Management One of `uv`'s most powerful features is built-in Python version management, eliminating the need for `pyenv` or similar tools. ### Specifying Python Versions You can specify the Python version in multiple ways: **1. Project-level (recommended):** ``` bash uv python pin 3.12 ``` This creates a `.python-version` file: ``` 3.12 ``` **2. In pyproject.toml:** ``` toml requires-python = ">=3.11" ``` ### Installing Python Versions If the required Python version isn't available, `uv` can install it: ``` bash uv python install 3.12 ``` This downloads and installs Python 3.12, managed by `uv`. 
You can install multiple versions: ``` bash uv python install 3.11 3.12 3.13 ``` ### Listing Available Pythons To see installed Python versions: ``` bash uv python list ``` Output might look like: ``` cpython-3.13.0-macos-aarch64-none /Users/mike/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/bin/python3 cpython-3.12.7-macos-aarch64-none /Users/mike/.local/share/uv/python/cpython-3.12.7-macos-aarch64-none/bin/python3 cpython-3.11.10-macos-aarch64-none /Users/mike/.local/share/uv/python/cpython-3.11.10-macos-aarch64-none/bin/python3 ``` ### Using Specific Python Versions For a one-off command with a specific Python version: ``` bash uv run --python 3.11 python script.py ``` Or create a project with a specific version: ``` bash uv init --python 3.11 my-project ``` ## Advanced Dependency Management ### Installing from Git Repositories Sometimes you need bleeding-edge code or a forked version of a package. `uv` makes this straightforward: ``` bash uv add "package @ git+https://github.com/user/package.git" ``` For a specific branch: ``` bash uv add "package @ git+https://github.com/user/package.git@dev-branch" ``` For a specific commit: ``` bash uv add "package @ git+https://github.com/user/package.git@abc123" ``` In `pyproject.toml`, this appears as: ``` toml dependencies = [ "package @ git+https://github.com/user/package.git@abc123", ] ``` ### Installing from Local Paths For packages you're developing locally: ``` bash uv add --editable ../my-local-package ``` Or in `pyproject.toml`: ``` toml dependencies = [ "my-package @ file:///path/to/my-package", ] ``` The `--editable` flag (or `-e`) makes the package editable, so changes to the source are immediately reflected without reinstalling. ### Platform-Specific Dependencies Some packages are only needed on certain platforms. 
You can specify this in `pyproject.toml`: ``` toml dependencies = [ "pandas", "pywin32; platform_system == 'Windows'", "python-magic; platform_system != 'Windows'", ] ``` ### Resolving Dependency Conflicts When dependencies conflict, `uv` provides clear error messages. These messages are the learned incompatibilities from the resolution algorithm discussed earlier rendered in English: the resolver is reporting a fact about your dependency graph, not failing arbitrarily. For example, if package A requires `numpy<2.0` but package B requires `numpy>=2.0`, `uv` will report: ``` error: No solution found when resolving dependencies: Because package-a depends on numpy<2.0 and package-b depends on numpy>=2.0, we can conclude that package-a and package-b are incompatible. ``` To resolve conflicts: 1. **Check if updates fix it**: Update packages with `uv sync --upgrade` 2. **Use version constraints**: Manually specify compatible versions 3. **Report upstream**: File issues with package maintainers 4. **Fork if necessary**: Maintain a patched version ## Scripts and Entry Points For distributable packages, you can define console scripts in `pyproject.toml`: ``` toml [project.scripts] causal-analyze = "causal_analysis.main:cli" did-estimate = "causal_analysis.did:main" ``` Then run them with: ``` bash uv run causal-analyze data.csv ``` This is useful for creating reproducible analysis pipelines that others can run. ## Working with Jupyter Notebooks Jupyter notebooks are common in research. Here's how to use them with `uv`: ### Adding Jupyter ``` bash uv add --dev jupyter ipykernel ``` ### Running Jupyter ``` bash uv run jupyter notebook ``` Or for JupyterLab: ``` bash uv run jupyter lab ``` ### Creating a Kernel To make your project available as a Jupyter kernel: ``` bash uv run python -m ipykernel install --user --name=causal-analysis ``` Now you can select the "causal-analysis" kernel in any Jupyter notebook. 
### Inline Scripts in Notebooks `uv` supports inline script metadata in Python files and notebooks. At the top of a script, you can specify dependencies: ``` python # /// script # requires-python = ">=3.11" # dependencies = [ # "torch", # "torchvision", # "matplotlib", # ] # /// import torch import torchvision import matplotlib.pyplot as plt # Your model training or inference here model = torchvision.models.resnet18(pretrained=True) ``` Then run it with: ``` bash uv run script.py ``` `uv` automatically creates a temporary environment with the specified dependencies. This is perfect for one-off experiments or sharing standalone training scripts. ## Reproducible ML Workflows Let's put everything together with a complete workflow for a machine learning project. ### Project Structure A well-organized ML project might look like: ```{mermaid} flowchart TD root["image-classifier/"] root --> pyver[".python-version, Python 3.11"] root --> pyproject["pyproject.toml, config"] root --> lock["uv.lock, locked deps"] root --> readme["README.md, docs"] root --> gitignore[".gitignore"] root --> data["data"] root --> models["models"] root --> notebooks["notebooks"] root --> src["src"] root --> tests["tests"] root --> scripts["scripts"] root --> experiments["experiments, logs and results"] data --> raw["raw, original datasets"] data --> processed["processed, preprocessed data"] data --> splits["splits, train val test"] models --> checkpoints["checkpoints/"] models --> configs["configs/"] notebooks --> eda["01-eda.ipynb"] notebooks --> prep["02-preprocessing.ipynb"] notebooks --> trainnb["03-training.ipynb"] src --> classifier["classifier/"] classifier --> init["__init__.py"] classifier --> datapy["data.py"] classifier --> modelspy["models.py"] classifier --> trainpy["train.py"] classifier --> evalpy["evaluate.py"] tests --> tdata["test_data.py"] tests --> tmodels["test_models.py"] tests --> ttrain["test_train.py"] scripts --> strain["train.py"] scripts --> seval["evaluate.py"] 
    experiments --> exp001["exp_001/"]
```

### Complete `pyproject.toml`

Here's a comprehensive configuration for an image classification project:

``` toml
[project]
name = "image-classifier"
version = "0.1.0"
description = "Deep learning image classifier using PyTorch"
readme = "README.md"
requires-python = ">=3.11"
authors = [
    {name = "Mike", email = "mike@email.com"}
]
license = {text = "MIT"}

dependencies = [
    "torch>=2.0.0,<3.0.0",
    "torchvision>=0.15.0,<1.0.0",
    "numpy>=1.24.0,<2.0.0",
    "pillow>=10.0.0",
    "matplotlib>=3.7.0",
    "scikit-learn>=1.3.0",
    "tqdm>=4.65.0",
    "pyyaml>=6.0",
    "tensorboard>=2.13.0",
]

[project.optional-dependencies]
dev = [
    "jupyter>=1.0.0",
    "ipykernel>=6.25.0",
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.5.0",
]
experiment-tracking = [
    "wandb>=0.15.0",
    "mlflow>=2.5.0",
]
# Note: PyTorch has no separate GPU package. CUDA builds are selected by
# pointing uv at the matching PyTorch package index (for example through
# [[tool.uv.index]] and [tool.uv.sources]), not through an extra.

[project.scripts]
train = "classifier.train:main"
evaluate = "classifier.evaluate:main"
infer = "classifier.inference:main"  # assumes a classifier/inference.py module

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]

[tool.black]
line-length = 100
target-version = ['py311']

[tool.ruff]
line-length = 100
target-version = "py311"
```

## The Modern Python Toolchain for ML

Just as R developers rely on `devtools`, `usethis`, `styler`, `lintr`, and `testthat`, Python ML developers need a comprehensive toolchain. For a Python ML project, we recommend:

- **uv**: Package and environment management
- **Ruff**: Code formatting and linting
- **mypy**: Static type checking
- **pytest**: Unit testing framework
- **Quarto**: Documentation and reproducible reports

Think of it as: `uv` = `renv` + `pak` + `devtools`, `Ruff` = `styler` + `lintr`, `pytest` = `testthat`, `mypy` = (no direct R equivalent).

All of these tools except Quarto (a standalone command-line program installed separately) are installed as development dependencies and configured through `pyproject.toml`, creating a unified, reproducible development environment.
## Ruff: Fast Formatting and Linting

Ruff is a linter and formatter written in Rust. It replaces several older tools (Black, isort, Flake8, pyupgrade, autoflake) with a single, consistent interface, and its maintainers report speedups of 10-100x over those tools.

### Why Ruff Matters for ML

In ML projects, code quality is crucial:

- **Readability**: ML code involves complex transformations and mathematical operations that must be clear
- **Consistency**: Team collaboration requires consistent style
- **Correctness**: Linting catches bugs like unused imports, undefined variables, and common mistakes
- **Speed**: Fast feedback loops keep you in flow state

### Installation

Add Ruff as a development dependency:

``` bash
uv add --dev ruff
```

### Code Formatting

Format your entire codebase:

``` bash
uv run ruff format
```

Or format specific files:

``` bash
uv run ruff format src/classifier/train.py
```

Or using `uvx` (without installation):

``` bash
uvx ruff format
```

Ruff's formatter:

- **Enforces consistent style**: Similar to Black, with opinionated defaults
- **Removes trailing whitespace**: Cleans up formatting inconsistencies
- **Wraps lines to the configured length**: Makes code readable on all screens
- **Handles string quotes**: Normalizes quote usage across your codebase

Import sorting is not part of `ruff format`. It is handled by the linter's `I` (isort) rules, which group imports into standard library, third-party, and local sections when you run `uv run ruff check --select I --fix`.

**Example transformation:**

``` python
# Before formatting
import torch
import numpy as np
from pathlib import Path
import sys
from torch import nn
def train_model(model,data,epochs=100):
    for epoch in range( epochs ):
        loss=model.train_step( data )
        print( f"Epoch {epoch}: {loss}" )
```

After `ruff check --select I --fix` followed by `ruff format`:

``` python
# After import sorting and formatting
import sys
from pathlib import Path

import numpy as np
import torch
from torch import nn


def train_model(model, data, epochs=100):
    for epoch in range(epochs):
        loss = model.train_step(data)
        print(f"Epoch {epoch}: {loss}")
```

### Linting

Check for linting issues:

``` bash
uv run ruff check
```

Fix auto-fixable issues:

``` bash
uv run ruff check --fix
```

Apply fixes and list what was changed:

``` bash
uv run ruff check --fix --show-fixes
```

Ruff implements hundreds of rules, but only a small default set (a subset of the `E` and `F` groups) is active until you enable more rule groups in `select`. With the relevant groups enabled, it detects:

**Common Errors:**

- Unused imports and variables (catching dead code)
- Undefined names (typos and missing imports)
- Syntax errors and deprecated syntax

**Style Violations:**

- PEP 8 violations (spacing, naming conventions)
- Import organization issues
- Docstring style problems

**Code Quality Issues:**

- Overly complex functions
- Redundant code
- Mutable default arguments (a common Python pitfall)
- Bare except clauses (catching exceptions too broadly)

**Security Issues** (the `S` group, ported from flake8-bandit):

- Hardcoded passwords or secrets
- Use of `eval()` or `exec()`
- SQL injection vulnerabilities
- Insecure temporary file usage

**Example linting output:**

```
src/classifier/train.py:15:8: F841 Local variable `lr` is assigned to but never used
src/classifier/train.py:23:1: E302 Expected 2 blank lines, found 1
src/classifier/models.py:45:9: B006 Do not use mutable data structures for argument defaults
src/classifier/data.py:12:1: I001 Import block is un-sorted or un-formatted
```

### Configuration

Add Ruff configuration to `pyproject.toml`:

``` toml
[tool.ruff]
# Core settings
line-length = 100  # Slightly longer than Black's 88 for ML code
target-version = "py311"
src = ["src"]
exclude = [
    ".git",
    ".venv",
    "__pycache__",
    "build",
    "dist",
]

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
skip-magic-trailing-comma = false
line-ending = "auto"

[tool.ruff.lint]
# Enable rule groups
select = [
    "E",    # pycodestyle errors
    "W",    # pycodestyle warnings
    "F",    # Pyflakes
    "UP",   # pyupgrade (modernize Python code)
    "B",    # flake8-bugbear (find likely bugs)
    "SIM",  # flake8-simplify (suggest simplifications)
    "I",    # isort (import sorting)
    "N",    # pep8-naming (enforce naming conventions)
    "C4",   # flake8-comprehensions (better list/dict/set comprehensions)
    "S",    # flake8-bandit (security checks)
    "PTH",  #
flake8-use-pathlib (prefer pathlib over os.path) "RET", # flake8-return (improve return statements) "TRY", # tryceratops (exception handling best practices) ] # Ignore specific rules ignore = [ "E501", # Line too long (handled by formatter) "TRY003", # Avoid specifying long messages outside exception class ] # Allow autofix for all enabled rules fixable = ["ALL"] unfixable = [] # Ignore specific rules for specific files [tool.ruff.lint.per-file-ignores] "__init__.py" = ["F401"] # Allow unused imports in __init__.py "tests/*" = ["S101"] # Allow assert in tests [tool.ruff.lint.isort] known-first-party = ["classifier"] ``` **Line Length Philosophy:** The default of 88 characters comes from Black and is based on: - Readability research showing optimal line length - Fitting two files side-by-side on modern monitors - Reducing git diff noise For ML code with long tensor operations, 100 characters is a reasonable compromise. ### Ruff in Your Workflow Integrate Ruff into your daily workflow: **During development:** ``` bash # Format before committing uv run ruff format # Check for issues uv run ruff check --fix # Review remaining issues uv run ruff check ``` **In CI/CD:** ``` yaml # .github/workflows/lint.yml - name: Lint with Ruff run: | uv run ruff format --check uv run ruff check ``` **VS Code integration:** ``` json { "editor.formatOnSave": true, "[python]": { "editor.defaultFormatter": "charliermarsh.ruff" } } ``` ## Type Checking with mypy Python supports optional type annotations through PEP 484. While Python remains dynamically typed at runtime, type annotations provide static analysis benefits that are invaluable for ML projects. ### Why Type Checking Matters for ML Machine learning code involves complex data transformations, tensor operations, and model architectures. 
Type checking helps:

**Catch Errors Early:**

- Detect wrong argument and return types before launching expensive training
- Find `None` values and missing attributes that would otherwise surface mid-run
- Identify incorrect data types in transformations

Note that mypy checks Python types, not tensor shapes. Shape and dimension errors need shape-aware annotations combined with runtime checking, as in the `jaxtyping` example later in this section.

**Improve Code Clarity:**

- Document expected tensor shapes (e.g., `Float[Tensor, "B C H W"]` with `jaxtyping`)
- Specify DataFrame column types
- Make function contracts explicit

**Better IDE Support:**

- Accurate autocomplete for model methods
- Jump-to-definition for complex hierarchies
- Refactoring with confidence

**Team Collaboration:**

- Self-documenting interfaces
- Catch integration issues early
- Reduce onboarding time

### Installation

Add mypy as a development dependency:

``` bash
uv add --dev mypy
```

For libraries that need type stubs:

``` bash
uv add --dev types-PyYAML types-tqdm
```

### Basic Usage

Check types in your entire project:

``` bash
uv run mypy .
```

Check specific files:

``` bash
uv run mypy src/classifier/models.py
```

Check with verbose output:

``` bash
uv run mypy --pretty --show-error-context .
```

### Type Annotation Examples

**Without types (unclear and error-prone):**

``` python
def create_model(arch, num_classes, pretrained):
    if arch == "resnet18":
        model = models.resnet18(pretrained=pretrained)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def train_epoch(model, loader, optimizer):
    for batch in loader:
        images, labels = batch
        outputs = model(images)
        # ... training logic
```

**With types (clear and verifiable):**

``` python
from typing import Tuple

import torch
import torch.nn as nn
from torch.optim import Optimizer
from torch.utils.data import DataLoader
from torchvision import models

def create_model(
    arch: str,
    num_classes: int,
    pretrained: bool = True
) -> nn.Module:
    """
    Create a model with specified architecture.

    Parameters
    ----------
    arch : str
        Model architecture name
    num_classes : int
        Number of output classes
    pretrained : bool
        Use pretrained weights

    Returns
    -------
    nn.Module
        Initialized model
    """
    if arch == "resnet18":
        # `weights=` replaces the deprecated `pretrained=` flag (torchvision >= 0.13)
        weights = models.ResNet18_Weights.DEFAULT if pretrained else None
        model = models.resnet18(weights=weights)
        num_features = model.fc.in_features
        model.fc = nn.Linear(num_features, num_classes)
    else:
        # Fail loudly instead of returning an undefined variable
        raise ValueError(f"Unknown architecture: {arch}")
    return model

def train_epoch(
    model: nn.Module,
    loader: DataLoader,
    criterion: nn.Module,
    optimizer: Optimizer,
    device: torch.device,
    epoch: int = 0,
) -> Tuple[float, float]:
    """
    Train for one epoch.

    Returns
    -------
    tuple
        (average_loss, average_accuracy)
    """
    total_loss = 0.0
    correct = 0
    total = 0

    for images, labels in loader:
        images = images.to(device)
        labels = labels.to(device)
        outputs = model(images)
        # ... training logic (updates total_loss, correct, and total)

    return total_loss / len(loader), correct / total
```

**Advanced: Tensor shape annotations**

Using `jaxtyping` for shape-aware type hints (added as a regular dependency, because the annotations are imported by your source code):

``` bash
uv add jaxtyping
```

``` python
from jaxtyping import Float, Int
import torch
from torch import Tensor

def forward(
    x: Float[Tensor, "batch channels height width"],
    labels: Int[Tensor, "batch"]
) -> Float[Tensor, "batch num_classes"]:
    """
    Forward pass with explicit shape annotations.

    mypy treats these as ordinary Tensor types. The shapes are verified
    only at runtime, and only when the function is decorated with
    jaxtyping's `jaxtyped` and a runtime type checker such as beartype.
    """
    # x shape: [batch, channels, height, width]
    # Returns: [batch, num_classes]
    raise NotImplementedError
```

### What the Type Checker Verifies

When you run `mypy`, it checks:

1. **Argument types**: Are you passing the right types?
2. **Return types**: Does the function return what it claims?
3. **Attribute access**: Does that object have that attribute?
4. **Operations**: Are operations valid for those types?

**Example errors caught by mypy:**

``` python
# Error: Argument has incompatible type "str"; expected "int"
model = create_model("resnet18", "10", True)

# Error: Missing return statement
def train_epoch(...) -> Tuple[float, float]:
    print("Training...")
    # Forgot to return!

# Error: "Optimizer" has no attribute "zero_gard" (typo)
# (typos on nn.Module attributes are not caught, because nn.Module defines __getattr__)
optimizer.zero_gard()

# Error: Unsupported operand types for + ("int" and "str")
epochs = 100 + "50"
```

### Configuration

Add mypy settings to `pyproject.toml`:

``` toml
[tool.mypy]
python_version = "3.11"
warn_return_any = true
warn_unused_configs = true
warn_redundant_casts = true
warn_unused_ignores = true

# Start lenient, then tighten
disallow_untyped_defs = false  # Set to true eventually
disallow_incomplete_defs = true
check_untyped_defs = true
no_implicit_optional = true

# Show more information
show_error_codes = true
show_error_context = true
pretty = true

# Strictness per module
[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false

[[tool.mypy.overrides]]
module = "classifier.models"
disallow_untyped_defs = true
```

**Progressive typing strategy:**

1. Start with `disallow_untyped_defs = false`
2. Add type hints to new code
3. Gradually annotate existing code
4. Enable `disallow_untyped_defs = true` for completed modules
5. Eventually enable strict mode globally

### Type Stubs for ML Libraries

**Libraries with built-in types:**

- `torch` ✓
- `numpy` ✓ (numpy>=1.20)
- `transformers` ✓

`scikit-learn` does not currently ship inline type information, so treat it as a library without stubs (see the options below).

**Libraries needing stubs:**

``` bash
uv add --dev types-Pillow types-tqdm types-PyYAML types-requests
```

Recent Pillow releases ship their own type hints, so `types-Pillow` is only needed with older versions.

**Libraries without stubs:**

For libraries without type stubs, you can:

1.  **Ignore them:**

``` toml
[tool.mypy]
ignore_missing_imports = true
```

2.  **Create stub files:**

``` python
# stubs/some_library.pyi
def some_function(x: int) -> str: ...

class SomeClass:
    def method(self) -> None: ...
```

3.  
**Use type: ignore comments:**

``` python
from some_untyped_library import something  # type: ignore
```

### Mypy in Your Workflow

**Development cycle:**

``` bash
# Check types while developing
uv run mypy src/

# Check specific module
uv run mypy src/classifier/models.py

# Generate HTML report (requires lxml: uv add --dev lxml)
uv run mypy --html-report mypy-report/ src/
```

**CI/CD:**

``` yaml
- name: Type check with mypy
  run: uv run mypy src/
```

**VS Code integration:**

Install Microsoft's Mypy Type Checker extension (`ms-python.mypy-type-checker`) to surface mypy errors in the editor. Pylance, Microsoft's language server, runs its own Pyright-based checker rather than mypy.

## Testing with pytest

`pytest` is Python's de facto standard testing framework. For ML projects, testing is crucial for ensuring data pipelines, model architectures, and training loops work correctly.

### Why Testing Matters for ML

Machine learning projects have unique testing challenges:

**Data Pipeline Testing:**

- Verify data loading and preprocessing
- Check tensor shapes and types
- Validate data augmentation
- Test batching and sampling

**Model Testing:**

- Verify model architectures
- Check forward/backward passes
- Test with different input shapes
- Validate output dimensions

**Training Logic Testing:**

- Test loss computation
- Verify optimizer updates
- Check gradient flow
- Test checkpoint saving/loading

**End-to-End Testing:**

- Test complete training pipeline
- Verify inference works
- Test model export formats

### Installation

Add pytest and useful plugins:

``` bash
uv add --dev pytest pytest-cov pytest-xdist pytest-timeout
```

- **pytest**: Core testing framework
- **pytest-cov**: Code coverage reporting
- **pytest-xdist**: Parallel test execution
- **pytest-timeout**: Prevent hanging tests

### Project Structure

```{mermaid}
flowchart TD
    root["image-classifier/"]
    root --> src["src/"]
    root --> tests["tests/"]
    src --> classifier["classifier/"]
    classifier --> init["__init__.py"]
    classifier --> datapy["data.py"]
    classifier --> modelspy["models.py"]
    classifier --> trainpy["train.py"]
    tests --> conftest["conftest.py, shared fixtures"]
    tests --> tdata["test_data.py, data pipeline tests"]
    tests --> tmodels["test_models.py, model tests"]
    tests --> ttrain["test_train.py, training tests"]
```

### Writing Tests

**Basic test structure:**

``` python
# tests/test_models.py
import pytest
import torch
import torch.nn as nn
from classifier.models import create_model

def test_resnet18_creation():
    """Test ResNet18 model creation."""
    model = create_model(
        arch='resnet18',
        num_classes=10,
        pretrained=False,
    )

    assert model is not None
    assert isinstance(model, nn.Module)

    # Test forward pass
    x = torch.randn(2, 3, 224, 224)
    output = model(x)

    assert output.shape == (2, 10)
    assert not torch.isnan(output).any()

def test_model_with_frozen_backbone():
    """Test model with frozen backbone."""
    # Assumes create_model has been extended with a `freeze_backbone` flag
    # (not shown in the earlier listing). pretrained=False avoids a download.
    model = create_model(
        arch='resnet18',
        num_classes=10,
        pretrained=False,
        freeze_backbone=True,
    )

    # Check that backbone is frozen
    trainable_params = sum(
        p.numel() for p in model.parameters() if p.requires_grad
    )

    # Only classifier should be trainable (~5000 params)
    assert trainable_params < 10000

# Assumes create_model has been extended to support every listed architecture
@pytest.mark.parametrize('architecture', ['resnet18', 'resnet50', 'efficientnet_b0'])
def test_different_architectures(architecture):
    """Test different model architectures."""
    model = create_model(
        arch=architecture,
        num_classes=100,
        pretrained=False,
    )

    x = torch.randn(4, 3, 224, 224)
    output = model(x)

    assert output.shape == (4, 100)
```

**Testing data pipelines:**

``` python
# tests/test_data.py
import numpy as np
import pytest
import torch
from pathlib import Path
from PIL import Image
from classifier.data import get_transforms, create_dataloaders

def test_train_transforms():
    """Test training data transforms."""
    transform = get_transforms(train=True)

    # Create dummy image (upper bound is exclusive, so 256 covers 0..255)
    img = Image.fromarray(np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8))

    # Apply transform
    tensor = transform(img)

    assert isinstance(tensor, torch.Tensor)
    assert tensor.shape == (3, 224, 224)
    assert tensor.min() >= -3.0  # Normalized
    assert tensor.max() <= 3.0

def test_dataloader_shapes(tmp_path):
    """Test dataloader output shapes."""
    # Create dummy dataset structure
    train_dir = tmp_path / "train"
    for class_name in ["class1", "class2"]:
        class_dir = train_dir / class_name
        class_dir.mkdir(parents=True)

        # Create dummy images
        for i in range(10):
            img = Image.new('RGB', (224, 224))
            img.save(class_dir / f"img_{i}.jpg")

    # Create dataloaders
    train_loader, val_loader, _ = create_dataloaders(
        data_dir=str(tmp_path),
        batch_size=4,
        num_workers=0,
    )

    # Test batch shape
    images, labels = next(iter(train_loader))
    assert images.shape[0] <= 4  # Batch size
    assert images.shape[1:] == (3, 224, 224)
    assert labels.shape[0] <= 4
```

**Testing training logic:**

``` python
# tests/test_train.py
import pytest
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
from classifier.train import train_epoch, validate

@pytest.fixture
def dummy_model():
    """Create a simple model for testing."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 224 * 224, 10)
    )

@pytest.fixture
def dummy_dataloader():
    """Create a dummy dataloader."""
    images = torch.randn(20, 3, 224, 224)
    labels = torch.randint(0, 10, (20,))
    dataset = TensorDataset(images, labels)
    return DataLoader(dataset, batch_size=4)

def test_train_epoch(dummy_model, dummy_dataloader):
    """Test training for one epoch."""
    model = dummy_model
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters())
    device = torch.device('cpu')

    loss, acc = train_epoch(
        model, dummy_dataloader, criterion, optimizer, device, epoch=1
    )

    assert isinstance(loss, float)
    assert isinstance(acc, float)
    assert 0 <= acc <= 1
    assert loss >= 0

def test_validate(dummy_model, dummy_dataloader):
    """Test validation."""
    model = dummy_model
    criterion = nn.CrossEntropyLoss()
    device = torch.device('cpu')

    val_loss, val_acc = validate(model, dummy_dataloader, criterion, device)

    assert isinstance(val_loss, float)
    assert isinstance(val_acc, float)
    assert 0 <= val_acc <= 1
``` ### Fixtures for Reusability Use `conftest.py` for shared fixtures: ``` python # tests/conftest.py import pytest import torch import tempfile from pathlib import Path @pytest.fixture def device(): """Get device for testing.""" return torch.device('cuda' if torch.cuda.is_available() else 'cpu') @pytest.fixture def temp_checkpoint_dir(): """Create temporary directory for checkpoints.""" with tempfile.TemporaryDirectory() as tmpdir: yield Path(tmpdir) @pytest.fixture def sample_config(): """Create sample configuration.""" return { 'model': { 'architecture': 'resnet18', 'num_classes': 10, 'pretrained': False, }, 'training': { 'batch_size': 32, 'learning_rate': 0.001, 'epochs': 1, } } ``` ### Running Tests **Run all tests:** ``` bash uv run pytest ``` **Run with verbose output:** ``` bash uv run pytest -v ``` **Run specific test file:** ``` bash uv run pytest tests/test_models.py ``` **Run specific test:** ``` bash uv run pytest tests/test_models.py::test_resnet18_creation ``` **Run tests matching pattern:** ``` bash uv run pytest -k "model" # Runs all tests with "model" in name ``` **Run in parallel (faster):** ``` bash uv run pytest -n auto # Use all CPUs ``` **Stop on first failure:** ``` bash uv run pytest -x ``` **Show local variables on failure:** ``` bash uv run pytest -l ``` ### Code Coverage **Generate coverage report:** ``` bash uv run pytest --cov=classifier --cov-report=term ``` **Output:** ``` ---------- coverage: platform linux, python 3.11.9 ----------- Name Stmts Miss Cover ----------------------------------------------- src/classifier/__init__.py 2 0 100% src/classifier/data.py 45 3 93% src/classifier/models.py 67 5 93% src/classifier/train.py 89 12 87% ----------------------------------------------- TOTAL 203 20 90% ``` **Generate HTML coverage report:** ``` bash uv run pytest --cov=classifier --cov-report=html ``` This creates `htmlcov/index.html` showing: - Which lines are covered - Which branches are taken - Which functions are tested **Coverage 
requirements:**

For production ML code, aim for:

- **\>80%** coverage for data pipelines (data loading, preprocessing)
- **\>90%** coverage for model architectures
- **\>70%** coverage for training loops (some branches hard to test)
- **100%** coverage for utility functions

**Coverage in CI:**

``` bash
uv run pytest --cov=classifier --cov-report=term --cov-fail-under=80
```

### pytest Configuration

Add pytest settings to `pyproject.toml`:

``` toml
[tool.pytest.ini_options]
# Test discovery
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
python_classes = ["Test*"]

# Output options
addopts = [
    "--strict-markers",
    "--strict-config",
    "-ra",           # Show summary of all test outcomes
    "--showlocals",  # Show local variables on failure
    "--tb=short",    # Shorter traceback format
]

# Markers
markers = [
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "gpu: marks tests as requiring GPU",
    "integration: marks tests as integration tests",
]

# Coverage options
[tool.coverage.run]
source = ["src"]
omit = ["tests/*", "*/site-packages/*"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise AssertionError",
    "raise NotImplementedError",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
]
```

### Advanced Testing Patterns

**Parameterized tests:**

``` python
# `tmp_dataset` is assumed to be a fixture (e.g. in conftest.py)
# that provides a dataset with 128 samples
@pytest.mark.parametrize('batch_size,expected_batches', [
    (32, 4),
    (16, 8),
    (8, 16),
])
def test_dataloader_batching(batch_size, expected_batches, tmp_dataset):
    loader = DataLoader(tmp_dataset, batch_size=batch_size)
    assert len(loader) == expected_batches
```

**Testing exceptions:**

``` python
def test_invalid_architecture():
    with pytest.raises(ValueError, match="Unknown architecture"):
        create_model(arch="invalid", num_classes=10)
```

**Skipping tests conditionally:**

``` python
@pytest.mark.skipif(not torch.cuda.is_available(), reason="Requires GPU")
def test_gpu_training():
    model = create_model(arch="resnet18", num_classes=10, pretrained=False).cuda()
    # ...
GPU-specific test ``` **Slow test marker:** ``` python @pytest.mark.slow def test_full_training_run(): # This test takes 5 minutes pass # Run fast tests only: pytest -m "not slow" ``` ## Documentation with Quarto For ML projects, documentation serves multiple purposes: 1. **Code documentation**: API docs for functions and classes 2. **Experiment reports**: Document training runs and results 3. **Model cards**: Document model architecture, performance, limitations 4. **Tutorials**: Show how to use your models ### Quarto for ML Reports Quarto is perfect for ML documentation because it supports: - **Executable code**: Run training scripts and show results - **Multiple languages**: Python, R, Julia in same document - **Rich outputs**: Plots, tables, interactive visualizations - **Multiple formats**: HTML, PDF, presentations, websites **Installation:** ``` bash # Install Quarto (not via uv) # macOS brew install quarto # Linux sudo wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.4.549/quarto-1.4.549-linux-amd64.deb sudo dpkg -i quarto-1.4.549-linux-amd64.deb # Windows - download from https://quarto.org ``` **Example experiment report** (`reports/experiment_001.qmd`): ```` --- title: "ResNet18 Image Classification" author: "Mike" date: "2024-11-09" format: html: code-fold: true toc: true --- ## Objective Train ResNet18 on CIFAR-10 dataset to achieve >90% accuracy. 
## Environment Setup ```{python} import sys sys.path.insert(0, '../src') import torch from classifier.models import create_model from classifier.train import train_model import matplotlib.pyplot as plt import pandas as pd ``` ## Model Architecture ```{python} model = create_model('resnet18', num_classes=10, pretrained=False) print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}") ``` ## Training Configuration ```{python} config = { 'model': {'architecture': 'resnet18', 'num_classes': 10}, 'training': { 'batch_size': 128, 'learning_rate': 0.001, 'epochs': 50, 'optimizer': 'Adam', } } pd.DataFrame([config['training']]).T ``` ## Training Results ```{python} # Load training logs logs = pd.read_csv('../experiments/exp_001/metrics.csv') fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4)) # Plot loss ax1.plot(logs['epoch'], logs['train_loss'], label='Train') ax1.plot(logs['epoch'], logs['val_loss'], label='Validation') ax1.set_xlabel('Epoch') ax1.set_ylabel('Loss') ax1.set_title('Training and Validation Loss') ax1.legend() ax1.grid(True) # Plot accuracy ax2.plot(logs['epoch'], logs['train_acc'], label='Train') ax2.plot(logs['epoch'], logs['val_acc'], label='Validation') ax2.set_xlabel('Epoch') ax2.set_ylabel('Accuracy') ax2.set_title('Training and Validation Accuracy') ax2.legend() ax2.grid(True) plt.tight_layout() plt.show() ``` ## Final Performance ```{python} best_epoch = logs.loc[logs['val_acc'].idxmax()] print(f"Best validation accuracy: {best_epoch['val_acc']:.2%}") print(f"Achieved at epoch: {int(best_epoch['epoch'])}") print(f"Test accuracy: {best_epoch['test_acc']:.2%}") ``` ## Confusion Matrix ```{python} from sklearn.metrics import confusion_matrix import seaborn as sns # Load predictions y_true = ... # Load true labels y_pred = ... 
# Load predictions cm = confusion_matrix(y_true, y_pred) plt.figure(figsize=(10, 8)) sns.heatmap(cm, annot=True, fmt='d', cmap='Blues') plt.title('Confusion Matrix') plt.ylabel('True Label') plt.xlabel('Predicted Label') plt.show() ``` ## Conclusion - Achieved validation accuracy - Model converged after specified epochs - Ready for deployment ```` **Render the report:** ``` bash quarto render reports/experiment_001.qmd ``` This generates `reports/experiment_001.html` with all results embedded. ### Model Cards Document your models with Quarto model cards: ``` --- title: "ResNet18 CIFAR-10 Classifier" subtitle: "Model Card" format: html: toc: true --- ## Model Details - **Model Name**: ResNet18 CIFAR-10 Classifier - **Version**: 1.0.0 - **Date**: 2024-11-09 - **Architecture**: ResNet18 - **Framework**: PyTorch 2.3.1 ## Intended Use This model classifies images into 10 CIFAR-10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck. **Primary uses:** - Educational demonstrations - Baseline for computer vision research - Image classification API **Out-of-scope uses:** - Medical diagnosis - Safety-critical applications - Real-world deployment without validation ## Training Data - **Dataset**: CIFAR-10 - **Size**: 50,000 training images, 10,000 test images - **Resolution**: 32×32 RGB images - **Splits**: 45,000 train / 5,000 validation / 10,000 test ## Performance | Split | Accuracy | |-------|----------| | Train | 98.5% | | Validation | 92.3% | | Test | 91.8% | ## Limitations - Only works on 32×32 images - Performance degrades on images outside CIFAR-10 distribution - No adversarial robustness - Bias towards training distribution ## Ethical Considerations - Dataset contains potential biases in category representation - Should not be used for surveillance applications - Consider privacy implications when deploying ``` ## Complete Development Workflow Putting it all together, here's a complete development cycle: ### Daily Development Cycle ``` 
bash # 1. Pull latest changes git pull # 2. Sync environment uv sync --all-extras # 3. Make changes to code # ... edit files ... # 4. Format code uv run ruff format # 5. Fix linting issues uv run ruff check --fix # 6. Verify remaining issues uv run ruff check # 7. Type check uv run mypy src/ # 8. Run tests uv run pytest # 9. Check coverage uv run pytest --cov=classifier --cov-report=term # 10. Commit changes git add . git commit -m "Add feature X" git push ``` ### Before Committing Checklist Create a `Makefile` to automate checks: ``` makefile .PHONY: format lint typecheck test check all format: uv run ruff format lint: uv run ruff check --fix uv run ruff check typecheck: uv run mypy src/ test: uv run pytest -v coverage: uv run pytest --cov=classifier --cov-report=html --cov-report=term check: format lint typecheck test all: check coverage clean: rm -rf .venv rm -rf htmlcov/ rm -rf .mypy_cache/ rm -rf .pytest_cache/ rm -rf .ruff_cache/ find . -type d -name __pycache__ -exec rm -rf {} + find . -type f -name "*.pyc" -delete ``` **Usage:** ``` bash # Run all checks before committing make check # Generate coverage report make coverage # Clean up artifacts make clean ``` ### Pre-commit Hooks (Optional) For automatic checking, install pre-commit: ``` bash uv add --dev pre-commit ``` Create `.pre-commit-config.yaml`: ``` yaml repos: - repo: local hooks: - id: ruff-format name: Ruff Format entry: uv run ruff format language: system types: [python] - id: ruff-check name: Ruff Check entry: uv run ruff check --fix language: system types: [python] - id: mypy name: mypy entry: uv run mypy language: system types: [python] pass_filenames: false args: [src/] - id: pytest-fast name: pytest (fast tests only) entry: uv run pytest -m "not slow" language: system pass_filenames: false always_run: true ``` Install hooks: ``` bash uv run pre-commit install ``` Now checks run automatically on `git commit`. 
### CI/CD Pipeline

Create `.github/workflows/test.yml`:

``` yaml
name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest

    steps:
      - uses: actions/checkout@v4

      # The official action installs uv and puts it on PATH, which avoids
      # hard-coding the installer's target directory
      - name: Install uv
        uses: astral-sh/setup-uv@v5

      - name: Sync dependencies
        run: uv sync --all-extras

      - name: Format check
        run: uv run ruff format --check

      - name: Lint
        run: uv run ruff check

      - name: Type check
        run: uv run mypy src/

      - name: Test
        run: uv run pytest --cov=classifier --cov-report=xml

      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml
```

### Project Structure Best Practices

For ML projects that might integrate with R workflows or require cross-language collaboration:

```{mermaid}
flowchart TD
    root["ml-project/"]
    root --> github[".github/"]
    root --> configs["configs, training configs"]
    root --> data["data, not in git"]
    root --> docs["docs, documentation"]
    root --> experiments["experiments, tracking"]
    root --> models["models, saved models"]
    root --> notebooks["notebooks, Jupyter"]
    root --> reports["reports, Quarto reports"]
    root --> scripts["scripts, utility scripts"]
    root --> src["src, source code"]
    root --> tests["tests/"]
    root --> gitignore[".gitignore"]
    root --> pyver[".python-version"]
    root --> makefile["Makefile"]
    root --> pyproject["pyproject.toml"]
    root --> readme["README.md"]
    root --> lock["uv.lock"]
    github --> workflows["workflows/"]
    workflows --> testyml["test.yml"]
    workflows --> deployyml["deploy.yml"]
    configs --> r18["resnet18.yaml"]
    configs --> r50["resnet50.yaml"]
    data --> raw["raw/"]
    data --> processed["processed/"]
    data --> splits["splits/"]
    docs --> modelcard["model_card.qmd"]
    docs --> apidoc["api.qmd"]
    experiments --> exp001["exp_001/"]
    experiments --> exp002["exp_002/"]
    exp001 --> cfg["config.yaml"]
    exp001 --> metrics["metrics.csv"]
    exp001 --> ckpt["checkpoints/"]
    models --> prod["production/"]
    models --> staging["staging/"]
    notebooks -->
eda["01-eda.ipynb"] notebooks --> analysis["02-analysis.ipynb"] reports --> exprep["experiment_001.qmd"] scripts --> strain["train.py"] scripts --> seval["evaluate.py"] src --> classifier["classifier/"] classifier --> init["__init__.py"] classifier --> datapy["data.py"] classifier --> modelspy["models.py"] classifier --> trainpy["train.py"] classifier --> evalpy["evaluate.py"] tests --> conftest["conftest.py"] tests --> tdata["test_data.py"] tests --> tmodels["test_models.py"] tests --> ttrain["test_train.py"] ``` **.gitignore for ML projects:** ``` gitignore # Python __pycache__/ *.py[cod] *$py.class *.so .Python build/ develop-eggs/ dist/ downloads/ eggs/ .eggs/ lib/ lib64/ parts/ sdist/ var/ wheels/ *.egg-info/ .installed.cfg *.egg # Virtual environments .venv/ venv/ ENV/ env/ # IDE .vscode/ .idea/ *.swp *.swo *~ # Testing .pytest_cache/ .coverage htmlcov/ .mypy_cache/ .ruff_cache/ # Jupyter .ipynb_checkpoints/ # Data (large files) data/raw/*.jpg data/raw/*.png data/raw/*.zip data/processed/*.npy data/processed/*.h5 # Models (use Git LFS or external storage) models/*.pth models/*.ckpt models/*.h5 *.onnx # Experiment tracking wandb/ mlruns/ .neptune/ experiments/*/checkpoints/ # Logs logs/ *.log # OS .DS_Store Thumbs.db ``` ## Summary: The Complete ML Development Stack With `uv` and the modern Python toolchain, you have: **Environment Management (uv):** - Fast, reliable package installation - Reproducible environments with lock files - Python version management - GPU/CPU dependency variants **Code Quality (Ruff):** - Consistent formatting - Automated linting - Fast feedback loops - Catches common bugs **Type Safety (mypy):** - Early error detection - Self-documenting code - Better IDE support - Refactoring confidence **Testing (pytest):** - Unit and integration tests - Code coverage tracking - Parallel test execution - CI/CD integration **Documentation (Quarto):** - Executable reports - Model cards - API documentation - Reproducible analyses This toolchain 
creates a professional development workflow that: - **Catches errors early** (before training expensive models) - **Ensures reproducibility** (lock files + versioning) - **Improves collaboration** (consistent style + documentation) - **Speeds up development** (fast tools + automation) The investment in setting up this infrastructure pays dividends throughout your ML project lifecycle, from initial prototyping through production deployment. ### Setting Up the Project Clone and set up: ``` bash # Clone repository git clone https://github.com/user/image-classifier.git cd image-classifier # Install dependencies (uv reads uv.lock for exact versions) uv sync --all-extras # Run tests uv run pytest # Start training uv run train --config configs/resnet18.yaml ``` The beauty of this workflow: a single `uv sync` command installs everything exactly as specified in the lock file. No version mismatches, no dependency conflicts, no environment inconsistencies when deploying your trained model. ### Updating Dependencies When you need to update packages (e.g., new PyTorch release with bug fixes): ``` bash # Update all packages to latest compatible versions uv sync --upgrade # Update specific package uv add --upgrade torch # Update and regenerate lock file uv lock --upgrade ``` After updating, test your code thoroughly and commit the new `uv.lock`: ``` bash uv run pytest git add uv.lock git commit -m "Update dependencies - PyTorch 2.3.0" ``` **Important for ML**: When updating deep learning frameworks, always retrain key models and validate that performance hasn't degraded. Minor version updates can sometimes change numerical precision or default behaviors. ## Tools and Global Packages Beyond project dependencies, you often need global tools like `ruff`, `black`, or `pipx` equivalents. `uv` handles these with `uv tool`. 
### Installing Global Tools

``` bash
uv tool install ruff
uv tool install black
uv tool install mypy
```

Each tool is installed into its own isolated environment, and its executable is placed on your `PATH`. You can then use the tools anywhere:

``` bash
ruff check .
black src/
mypy src/
```

### Listing Installed Tools

``` bash
uv tool list
```

### Upgrading Tools

``` bash
uv tool upgrade ruff
uv tool upgrade --all  # Upgrade all tools
```

### Running Tools Without Installing

For one-off uses:

``` bash
uv tool run ruff check .
```

This downloads `ruff` if needed and runs it in a temporary environment without adding it to your `PATH`. The environment lives in the uv cache, so repeated runs are fast. `uvx` is a shorthand for `uv tool run`.

## Migration from Other Tools

### From `pip` and `requirements.txt`

If you have a `requirements.txt`:

``` bash
# Create new project
uv init my-project
cd my-project

# Import requirements (adjust the path to wherever the file lives)
uv add -r ../requirements.txt
```

Or convert to `pyproject.toml` manually:

``` toml
[project]
dependencies = [
    "pandas==2.0.3",
    "numpy==1.24.4",
    # ... etc
]
```

Then:

``` bash
uv sync
```

### From `poetry`

If migrating from `poetry`, you already have a `pyproject.toml`. uv reads the standard `[project]` table, so dependencies that older Poetry versions keep under `[tool.poetry.dependencies]` must first be moved into `[project]`, and the remaining `[tool.poetry]` sections removed. Because the project is already initialized, there is no need to run `uv init`:

``` bash
# Remove poetry.lock
rm poetry.lock

# Resolve and write uv.lock
uv lock

# Sync dependencies
uv sync
```

### From `conda`

For `conda` users, export your environment:

``` bash
conda env export --from-history > environment.yml
```

The export is a YAML file, not a requirements file. Copy the package names from its `dependencies:` list into a `requirements.txt`, one per line, dropping conda-specific entries (including `python` itself) and renaming any package whose PyPI name differs, then:

``` bash
uv init my-project
cd my-project
uv add -r ../requirements.txt
```

Some packages (especially scientific ones like `cudatoolkit`) are conda-specific and may need alternatives or system-level installation.

## Continuous Integration

Using `uv` in CI/CD pipelines is straightforward and fast.
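CI systems usually decide whether to reuse a dependency cache by comparing a key, and a common choice is to derive that key from the lock file so the cache is rebuilt only when dependencies change. The sketch below illustrates the idea with the standard library. It is not how uv or any particular CI service computes its keys; the function name `cache_key` and the `uv` prefix are hypothetical.

``` python
import hashlib
import platform
from pathlib import Path


def cache_key(lock_path: Path, prefix: str = "uv") -> str:
    """Build a cache key that changes whenever the lock file's contents change."""
    digest = hashlib.sha256(lock_path.read_bytes()).hexdigest()
    # Include the operating system so runners on different platforms
    # never restore each other's cached wheels.
    return f"{prefix}-{platform.system().lower()}-{digest[:16]}"
```

Two checkouts with byte-identical `uv.lock` files produce the same key, while any change to a pinned version produces a new one. Most CI services offer a built-in file-hash expression that plays the same role.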
### GitHub Actions Example

``` yaml
name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh

      - name: Sync dependencies
        run: uv sync --all-extras

      - name: Run tests
        run: uv run pytest --cov=src tests/

      - name: Run type checking
        run: uv run mypy src/
```

This is usually much faster than a traditional `pip install` step. How much time it saves depends on the size of the dependency tree and on whether the uv cache is restored between runs.

### GitLab CI Example

``` yaml
test:
  image: python:3.12
  before_script:
    - curl -LsSf https://astral.sh/uv/install.sh | sh
    # Current installers use ~/.local/bin; older releases used ~/.cargo/bin
    - export PATH="$HOME/.local/bin:$HOME/.cargo/bin:$PATH"
  script:
    - uv sync --all-extras
    - uv run pytest
```

## Performance Considerations

The speed of `uv` is one of its defining features. Here's why it's fast and how to maximize performance:

### Parallel Downloads

`uv` downloads packages concurrently, which makes better use of the available network bandwidth. `pip` fetches packages one at a time, so each download waits for the previous one to finish.

### Caching

`uv` aggressively caches downloaded wheels. Once you've installed `pandas==2.2.2`, it's cached globally. Installing it in another project is nearly instant.

Cache location (run `uv cache dir` to print the path on your machine):

``` bash
# macOS/Linux
~/.cache/uv/

# Windows
%LOCALAPPDATA%\uv\cache\
```

### Benchmark Comparisons

In informal testing, `uv` shows dramatic speedups. Exact figures vary with network speed, hardware, and cache state:

| Tool   | Time to install torch + torchvision + numpy |
|--------|---------------------------------------------|
| pip    | 185 seconds                                 |
| poetry | 145 seconds                                 |
| uv     | 12 seconds                                  |

For larger dependency trees (e.g., installing transformers with all its dependencies, or a complete data science stack), the difference is even more pronounced. This matters especially in ML workflows where you frequently create new environments for experiments or CI/CD pipelines.

### Tips for Maximum Performance

1. **Use the lock file**: `uv sync` with a lock file is faster than resolving dependencies from scratch
2. **Cache in CI**: Cache `~/.cache/uv` in CI pipelines
3. **Install dependencies separately from the project**: `uv sync --no-install-project` installs the locked dependencies but skips the project itself, which lets Docker cache the dependency layer independently of your source code
4. **Use wheels**: Avoid source distributions when possible; wheels install much faster

## Troubleshooting Common Issues

### Problem: Package Not Found

```
error: Failed to download `package-name`
```

**Solution**: Check the spelling of the package name and verify that it exists on PyPI. If both are correct, refresh the cached index data:

``` bash
uv sync --refresh
```

### Problem: Version Conflicts

```
error: No solution found when resolving dependencies
```

**Solution**: Read the resolver's explanation to find the packages whose requirements clash, then relax or update the constraints involved. The dependency tree helps locate where each requirement comes from:

``` bash
uv tree  # See dependency tree
```

### Problem: Python Version Not Available

```
error: No interpreter found for Python 3.12
```

**Solution**: Install the Python version:

``` bash
uv python install 3.12
```

### Problem: Import Fails in Script

``` python
ModuleNotFoundError: No module named 'torch'
```

**Solution**: Ensure you're running with `uv run`, so the script uses the project environment rather than the system interpreter:

``` bash
uv run python train.py
```

Or sync dependencies:

``` bash
uv sync
```

### Problem: Wrong Package Version

**Solution**: Check what's installed:

``` bash
uv pip list
```

Lock and sync to fix:

``` bash
uv lock
uv sync
```

## Best Practices for ML Projects

The following practices come from experience running machine learning projects:

### 1. Always Use Lock Files

Commit `uv.lock` to git. This is non-negotiable for reproducible ML research and production deployments.

``` bash
git add uv.lock pyproject.toml
git commit -m "Lock dependencies"
```

### 2. Pin Python Versions

Use `.python-version` to specify the exact Python version:

``` bash
uv python pin 3.11.9
```

This prevents subtle bugs from Python version differences that can affect model training or inference.

### 3. Separate Development Dependencies

Keep development tools separate from training/inference dependencies:

``` toml
[project.optional-dependencies]
dev = [
    "pytest",
    "jupyter",
    "black",
]
```

Extras are installed only when requested with `--extra` or `--all-extras`, so a plain `uv sync` leaves them out. This keeps your production Docker images lean.

### 4. Document Environment Setup

Include clear instructions in `README.md`:

``` markdown
## Setup

1. Install uv: `curl -LsSf https://astral.sh/uv/install.sh | sh`
2. Sync environment: `uv sync --all-extras`
3. Train model: `uv run train --config configs/resnet50.yaml`
4. Evaluate: `uv run evaluate --checkpoint models/best.pth`
```

### 5. Use Scripts for Reproducibility

Define scripts in `pyproject.toml`:

``` toml
[project.scripts]
preprocess = "classifier.data:preprocess"
train = "classifier.train:main"
evaluate = "classifier.evaluate:main"
infer = "classifier.inference:predict"
```

Then document the ML pipeline:

``` bash
uv run preprocess --data data/raw/
uv run train --epochs 100 --lr 0.001
uv run evaluate --model models/checkpoint.pth
uv run infer --image test.jpg
```

### 6. Version Control Configuration

Create a `.gitignore`:

``` gitignore
# Python
__pycache__/
*.py[cod]
.ipynb_checkpoints/

# uv
.venv/

# Data (don't commit large datasets)
data/raw/*.jpg
data/raw/*.png
data/processed/

# Models (use Git LFS or external storage)
models/*.pth
models/*.ckpt
*.h5

# Experiment tracking
wandb/
mlruns/
.neptune/

# Results
results/
experiments/*/outputs/
```

### 7. Regular Dependency Audits

Periodically upgrade to the latest compatible versions and confirm nothing breaks:

``` bash
uv sync --upgrade
uv run pytest  # Ensure tests still pass
# Re-run key training experiments to validate
```

### 8. Use Inline Scripts for Quick Experiments

For quick exploratory work or prototyping:

``` python
# /// script
# dependencies = [
#     "torch",
#     "torchvision",
#     "matplotlib",
# ]
# ///

import torch
import torchvision.models as models
import matplotlib.pyplot as plt

# Quick model prototyping
# (`weights=` replaces the deprecated `pretrained=True` argument)
model = models.resnet18(weights="IMAGENET1K_V1")
# ... experiment code ...
```

Run with:

``` bash
uv run experiment.py
```

### 9. GPU Environment Management

For projects requiring CUDA, create separate extras. PyTorch does not define a `cuda` extra; which build gets installed depends on the package index each extra is mapped to, as shown under "PyTorch with CUDA Support" below. Because the two extras must never be installed together, declare them as conflicting:

``` toml
[project.optional-dependencies]
gpu = ["torch>=2.0.0"]
cpu = ["torch>=2.0.0"]

[tool.uv]
# gpu and cpu are mutually exclusive
conflicts = [[{ extra = "gpu" }, { extra = "cpu" }]]
```

Then install based on your environment:

``` bash
# On GPU machine
uv sync --extra gpu

# On CPU-only machine
uv sync --extra cpu
```

## Working with Deep Learning Frameworks and GPUs

One of the most common pain points in ML development is managing deep learning frameworks, especially when dealing with CUDA and GPU support. `uv` simplifies this process significantly.

### PyTorch with CUDA Support

PyTorch publishes its CPU-only and CUDA-enabled builds under the same package name on separate package indexes. With `uv`, you can manage these in two ways:

**Option 1: Hardware-specific extras**

Two extras that list the same requirements would install the same wheels, so each extra is mapped to its own index through `tool.uv.sources`. This needs a recent uv release, and the index names are arbitrary labels:

``` toml
[project]
dependencies = [
    "numpy>=1.24.0",
    "pillow>=10.0.0",
]

[project.optional-dependencies]
cuda = ["torch>=2.0.0", "torchvision>=0.15.0"]
cpu = ["torch>=2.0.0", "torchvision>=0.15.0"]

[tool.uv]
conflicts = [[{ extra = "cuda" }, { extra = "cpu" }]]

[tool.uv.sources]
torch = [{ index = "pytorch-cu121", extra = "cuda" }, { index = "pytorch-cpu", extra = "cpu" }]
torchvision = [{ index = "pytorch-cu121", extra = "cuda" }, { index = "pytorch-cpu", extra = "cpu" }]

[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[[tool.uv.index]]
name = "pytorch-cpu"
url = "https://download.pytorch.org/whl/cpu"
explicit = true
```

Then install based on your hardware:

``` bash
# On GPU machine
uv sync --extra cuda

# On CPU-only machine
uv sync --extra cpu
```

**Option 2: Using PyTorch index for CUDA versions**

PyTorch hosts CUDA-specific builds on its own index:

``` bash
# Add PyTorch with CUDA 12.1 support
uv add torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

Or in `pyproject.toml`:

``` toml
[tool.uv]
extra-index-url = ["https://download.pytorch.org/whl/cu121"]

[project]
dependencies = [
    "torch>=2.0.0",
    "torchvision>=0.15.0",
]
```

### TensorFlow with GPU Support

TensorFlow 2.x simplifies GPU support. Quote the requirement so the shell does not treat `>` as a redirect or `[...]` as a glob:

``` bash
# TensorFlow with GPU support (works with CUDA)
uv add "tensorflow[and-cuda]>=2.15.0"
```

Or for CPU-only:

``` bash
uv add "tensorflow>=2.15.0"
```

### JAX with GPU Support

JAX requires specific CUDA/cuDNN versions:

``` bash
# JAX with CUDA 12 support
uv add "jax[cuda12]>=0.4.20" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
```

### Verifying GPU Access

Create a simple verification script:

``` python
# /// script
# dependencies = [
#     "torch",
# ]
# ///

import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")

if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")
```

Run with:

``` bash
uv run verify_gpu.py
```

### Managing Multiple Framework Versions

For projects that need to test across different framework versions, define one extra per version. The extras pin incompatible versions of the same package, so they must be declared as conflicting or the lock step cannot find a single solution:

``` toml
[project.optional-dependencies]
torch-2-0 = ["torch==2.0.1", "torchvision==0.15.2"]
torch-2-1 = ["torch==2.1.2", "torchvision==0.16.2"]
torch-2-3 = ["torch==2.3.1", "torchvision==0.18.1"]

[tool.uv]
conflicts = [[{ extra = "torch-2-0" }, { extra = "torch-2-1" }, { extra = "torch-2-3" }]]
```

Then test with different versions:

``` bash
uv sync --extra torch-2-0
uv run pytest

uv sync --extra torch-2-1
uv run pytest
```

### Hugging Face Transformers

For NLP tasks with transformers:

``` bash
uv add transformers datasets tokenizers accelerate
```

For training large models with optimizations:

``` bash
uv add "transformers[torch]" datasets accelerate bitsandbytes
```

### Common ML Stack

Here's a comprehensive ML dependency setup:

``` toml
[project]
name = "ml-project"
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    # Core scientific computing
    "numpy>=1.24.0,<2.0.0",
    "scipy>=1.11.0",
    "pandas>=2.0.0",

    # Visualization
    "matplotlib>=3.7.0",
    "seaborn>=0.12.0",
    "plotly>=5.14.0",

    # ML utilities
    "scikit-learn>=1.3.0",
    "tqdm>=4.65.0",
    "pyyaml>=6.0",
]

[project.optional-dependencies]
# Deep learning
pytorch = [
    "torch>=2.0.0,<3.0.0",
    "torchvision>=0.15.0",
    "torchaudio>=2.0.0",
    "lightning>=2.0.0",
]

tensorflow = [
    "tensorflow[and-cuda]>=2.15.0",
    "tensorboard>=2.15.0",
]

# NLP
nlp = [
    "transformers>=4.30.0",
    "datasets>=2.12.0",
    "tokenizers>=0.13.0",
    "sentencepiece>=0.1.99",
]

# Computer vision
cv = [
    "opencv-python>=4.8.0",
    "albumentations>=1.3.1",
    "timm>=0.9.0",
]

# Experiment tracking
tracking = [
    "wandb>=0.15.0",
    "mlflow>=2.5.0",
    "tensorboard>=2.13.0",
]

# Optimization
optimization = [
    "optuna>=3.2.0",
    "ray[tune]>=2.5.0",
]

# Development
dev = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.5.0",
    "jupyter>=1.0.0",
    "ipykernel>=6.25.0",
]
```

Install what you need:

``` bash
# Full PyTorch stack with NLP
uv sync --extra pytorch --extra nlp --extra tracking --extra dev

# TensorFlow with computer vision
uv sync --extra tensorflow --extra cv --extra tracking --extra dev
```

### Docker Integration

Create a `Dockerfile` that uses `uv`. Dependencies are installed before the source code is copied, so Docker can reuse that layer when only the code changes:

``` dockerfile
FROM nvidia/cuda:12.1.0-base-ubuntu22.04

# Install Python and uv
RUN apt-get update && apt-get install -y python3.11 python3-pip curl
RUN curl -LsSf https://astral.sh/uv/install.sh | sh
# Current installers use ~/.local/bin; older releases used ~/.cargo/bin
ENV PATH="/root/.local/bin:/root/.cargo/bin:$PATH"

# Copy project metadata (README.md is needed if pyproject.toml references it)
WORKDIR /app
COPY pyproject.toml uv.lock README.md ./

# Install locked dependencies only; the project source is not present yet
RUN uv sync --frozen --no-dev --no-install-project

# Copy source code and configs, then install the project itself
COPY src/ ./src/
COPY configs/ ./configs/
RUN uv sync --frozen --no-dev

# Run training without re-syncing the environment at start-up
CMD ["uv", "run", "--no-sync", "train", "--config", "configs/production.yaml"]
```

Build and run:

``` bash
docker build -t ml-model:latest .
docker run --gpus all ml-model:latest
```

### CUDA Version Management

Different projects might need different CUDA versions. Document clearly:

``` toml
# pyproject.toml
[tool.uv]
# PyTorch with CUDA 12.1
extra-index-url = ["https://download.pytorch.org/whl/cu121"]

[project]
dependencies = [
    "torch>=2.3.0",
    "torchvision>=0.18.0",
]
```

In README:

````
## Requirements

- CUDA 12.1 or later
- NVIDIA driver 530 or later
- 8GB+ GPU memory (recommended)

## Installation

```bash
# Verify CUDA version
nvidia-smi

# Install dependencies
uv sync --all-extras
```
````

### Mixed Precision Training

For models using mixed precision (crucial for large models):

``` bash
uv add torch torchvision

# Apex for older PyTorch versions
uv add git+https://github.com/NVIDIA/apex.git
```

Or use native PyTorch AMP (already included in `torch>=1.6`).
### Memory Optimization Libraries

For large models that don't fit in GPU memory:

``` bash
# DeepSpeed for distributed training
uv add deepspeed

# bitsandbytes for quantization
uv add bitsandbytes

# Flash Attention for efficient attention
# (built without isolation, so torch must already be installed)
uv add flash-attn --no-build-isolation
```

### Troubleshooting GPU Issues

**Problem: CUDA not detected**

``` bash
# Check PyTorch installation
uv run python -c "import torch; print(torch.cuda.is_available())"
```

**Solution**: Ensure you installed CUDA-enabled PyTorch:

``` bash
uv add torch --index-url https://download.pytorch.org/whl/cu121
```

**Problem: Out of memory errors**

Add gradient checkpointing and mixed precision. The fragment assumes `model`, `inputs`, `labels`, `criterion`, and `optimizer` are already defined:

``` python
import torch
from torch.cuda.amp import GradScaler  # torch.amp.GradScaler("cuda") on PyTorch 2.3+

# Enable gradient checkpointing (a Hugging Face model method;
# for plain nn.Module code use torch.utils.checkpoint instead)
model.gradient_checkpointing_enable()

# Use automatic mixed precision
scaler = GradScaler()

optimizer.zero_grad()
with torch.autocast(device_type="cuda", dtype=torch.float16):
    outputs = model(inputs)
    loss = criterion(outputs, labels)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```

**Problem: Different CUDA versions on different machines**

`uv lock` maintains a single `uv.lock` per project and has no option for writing alternative lock files. One workaround is to compile a separate pinned requirements file per CUDA version with uv's pip-compatible interface:

``` bash
# Resolve against the CUDA 12.1 wheel index
uv pip compile pyproject.toml \
    --extra-index-url https://download.pytorch.org/whl/cu121 \
    -o requirements-cu121.txt

# Resolve against the CUDA 11.8 wheel index
uv pip compile pyproject.toml \
    --extra-index-url https://download.pytorch.org/whl/cu118 \
    -o requirements-cu118.txt

# Install exactly what one of the files pins
uv pip sync requirements-cu121.txt \
    --extra-index-url https://download.pytorch.org/whl/cu121
```

## Integration with Other Tools

### Pre-commit Hooks

Use `uv` with pre-commit for code quality:

``` yaml
# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: ruff
        name: ruff
        entry: uv run ruff check --fix
        language: system
        types: [python]

      - id: black
        name: black
        entry: uv run black
        language: system
        types: [python]

      - id: mypy
        name: mypy
        entry: uv run mypy
        language: system
        types: [python]
```

### VS Code Configuration

Configure VS Code to use the environment that `uv` creates:

``` json
{
  "python.defaultInterpreterPath": ".venv/bin/python",
  "python.terminal.activateEnvironment": false,
  "python.testing.pytestEnabled": true,
  "python.testing.pytestArgs": ["tests"],
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.organizeImports": "explicit"
    }
  }
}
```

### Make-based ML Workflows

Combine with Make for complex ML workflows. Every target is declared `.PHONY`; this matters for `data` in particular, because a `data/` directory exists and Make would otherwise consider that target already up to date:

``` makefile
.PHONY: install data train train-debug evaluate tensorboard test format type-check notebook clean pipeline

install:
	uv sync --all-extras

data:
	uv run python scripts/download_data.py
	uv run python scripts/preprocess.py

train:
	uv run train --config configs/resnet50.yaml --epochs 100

train-debug:
	uv run train --config configs/debug.yaml --epochs 1

evaluate:
	uv run evaluate --checkpoint models/best.pth --data data/test/

tensorboard:
	uv run tensorboard --logdir experiments/

test:
	uv run pytest tests/ -v --cov=src

format:
	uv run black src/ tests/
	uv run ruff check --fix src/ tests/

type-check:
	uv run mypy src/

notebook:
	uv run jupyter lab

clean:
	rm -rf .venv
	find . -type d -name __pycache__ -exec rm -rf {} +
	rm -rf experiments/*/checkpoints/*.pth

# Complete pipeline
pipeline: data train evaluate
```

Usage:

``` bash
# Setup and train
make install
make pipeline

# Development
make train-debug
make test
make format
```

## Advanced Topics

### Custom Package Indexes

If your organization has a private PyPI server:

``` bash
uv add --index-url https://pypi.company.com/simple/ company-package
```

Or in `pyproject.toml`:

``` toml
[tool.uv]
index-url = "https://pypi.company.com/simple/"
extra-index-url = ["https://pypi.org/simple/"]
```

### Building and Publishing Packages

To build a distribution:

``` bash
uv build
```

This creates wheel and source distributions in `dist/`.

To publish to PyPI:

``` bash
uv publish
```

### Workspaces

For monorepos with multiple packages:

``` toml
# Root pyproject.toml
[tool.uv.workspace]
members = ["packages/*"]
```

Then each subdirectory in `packages/` can have its own `pyproject.toml`.
### Environment Variables

Control `uv` behavior with environment variables:

``` bash
# Specify cache location
export UV_CACHE_DIR=/custom/cache

# Use different PyPI mirror
export UV_INDEX_URL=https://mirror.pypi.org/simple/

# Increase verbosity
export UV_VERBOSE=1
```

## Comparison with Other Tools

### `uv` vs `pip`

| Feature           | pip                | uv                         |
|-------------------|--------------------|----------------------------|
| Speed             | Baseline           | 10-100x faster             |
| Resolver          | Backtracking       | PubGrub (conflict-driven)  |
| Lock files        | Manual (pip-tools) | Built-in                   |
| Python management | No                 | Yes                        |
| Virtual envs      | Manual             | Automatic                  |

### `uv` vs `poetry`

| Feature        | poetry    | uv               |
|----------------|-----------|------------------|
| Speed          | Slow      | Very fast        |
| Maturity       | Mature    | New (but stable) |
| Plugin system  | Yes       | No               |
| Publishing     | Excellent | Good             |
| Learning curve | Moderate  | Low              |

### `uv` vs `conda`

| Feature          | conda     | uv          |
|------------------|-----------|-------------|
| Binary packages  | Yes       | Wheels only |
| Non-Python deps  | Yes       | No          |
| Speed            | Slow      | Very fast   |
| Environment size | Large     | Small       |
| Scientific stack | Excellent | Good        |

For projects whose dependencies are all available as wheels on PyPI, `uv` is generally the faster and simpler choice. For projects requiring system libraries that are not distributed as wheels (CUDA toolkits, MKL, etc.), `conda` may still be necessary.

## Real-World Example: Complete ML Project

Let's walk through setting up a complete image classification project using PyTorch and modern best practices.
### Step 1: Initialize Project ``` bash uv init image-classifier cd image-classifier uv python pin 3.11 ``` ### Step 2: Configure `pyproject.toml` ``` toml [project] name = "image-classifier" version = "0.1.0" description = "Deep learning image classifier using ResNet architecture" readme = "README.md" requires-python = ">=3.11" authors = [ {name = "Mike", email = "mike@marshall.usc.edu"} ] dependencies = [ "torch>=2.0.0,<3.0.0", "torchvision>=0.15.0,<1.0.0", "numpy>=1.24.0,<2.0.0", "pillow>=10.0.0", "matplotlib>=3.7.0", "scikit-learn>=1.3.0", "tqdm>=4.65.0", "pyyaml>=6.0", "tensorboard>=2.13.0", ] [project.optional-dependencies] dev = [ "jupyter>=1.0.0", "ipykernel>=6.25.0", "pytest>=7.4.0", "pytest-cov>=4.1.0", "black>=23.0.0", "ruff>=0.1.0", "mypy>=1.5.0", ] experiment = [ "wandb>=0.15.0", ] [project.scripts] train = "classifier.train:main" evaluate = "classifier.evaluate:main" infer = "classifier.inference:predict" [build-system] requires = ["hatchling"] build-backend = "hatchling.build" ``` ### Step 3: Install Dependencies ``` bash uv sync --all-extras ``` ### Step 4: Create Project Structure ``` bash mkdir -p data/{raw,processed,splits} mkdir -p models/checkpoints mkdir -p src/classifier mkdir -p notebooks mkdir -p tests mkdir -p configs mkdir -p experiments ``` ### Step 5: Write Core Code Create `src/classifier/models.py`: ``` python """Neural network architectures for image classification.""" import torch import torch.nn as nn import torchvision.models as models from typing import Optional def create_model( architecture: str = "resnet18", num_classes: int = 10, pretrained: bool = True, freeze_backbone: bool = False, ) -> nn.Module: """ Create a model with specified architecture. 
Parameters ---------- architecture : str Model architecture ('resnet18', 'resnet50', 'efficientnet_b0') num_classes : int Number of output classes pretrained : bool Use ImageNet pretrained weights freeze_backbone : bool Freeze backbone layers for transfer learning Returns ------- nn.Module Initialized model """ if architecture == "resnet18": model = models.resnet18(weights='IMAGENET1K_V1' if pretrained else None) num_features = model.fc.in_features model.fc = nn.Linear(num_features, num_classes) elif architecture == "resnet50": model = models.resnet50(weights='IMAGENET1K_V1' if pretrained else None) num_features = model.fc.in_features model.fc = nn.Linear(num_features, num_classes) elif architecture == "efficientnet_b0": model = models.efficientnet_b0( weights='IMAGENET1K_V1' if pretrained else None ) num_features = model.classifier[1].in_features model.classifier[1] = nn.Linear(num_features, num_classes) else: raise ValueError(f"Unknown architecture: {architecture}") if freeze_backbone: # Freeze all layers except the final classifier for param in model.parameters(): param.requires_grad = False # Unfreeze classifier if architecture in ["resnet18", "resnet50"]: for param in model.fc.parameters(): param.requires_grad = True elif architecture == "efficientnet_b0": for param in model.classifier.parameters(): param.requires_grad = True return model class Classifier(nn.Module): """ Wrapper for classification models with additional utilities. 
""" def __init__( self, backbone: nn.Module, num_classes: int, dropout: float = 0.5, ): super().__init__() self.backbone = backbone self.dropout = nn.Dropout(dropout) def forward(self, x: torch.Tensor) -> torch.Tensor: features = self.backbone(x) return self.dropout(features) ``` Create `src/classifier/train.py`: ``` python """Training loop for image classification.""" import torch import torch.nn as nn import torch.optim as optim from torch.utils.data import DataLoader from torch.utils.tensorboard import SummaryWriter from pathlib import Path from tqdm import tqdm from typing import Dict, Tuple import yaml from .models import create_model from .data import create_dataloaders from .utils import save_checkpoint, AverageMeter def train_epoch( model: nn.Module, dataloader: DataLoader, criterion: nn.Module, optimizer: optim.Optimizer, device: torch.device, epoch: int, ) -> Tuple[float, float]: """ Train for one epoch. Returns ------- tuple Average loss and accuracy for the epoch """ model.train() losses = AverageMeter() accuracies = AverageMeter() pbar = tqdm(dataloader, desc=f"Epoch {epoch}") for images, labels in pbar: images = images.to(device) labels = labels.to(device) # Forward pass outputs = model(images) loss = criterion(outputs, labels) # Backward pass optimizer.zero_grad() loss.backward() optimizer.step() # Calculate accuracy _, predicted = outputs.max(1) accuracy = (predicted == labels).float().mean() # Update metrics losses.update(loss.item(), images.size(0)) accuracies.update(accuracy.item(), images.size(0)) pbar.set_postfix({ 'loss': f'{losses.avg:.4f}', 'acc': f'{accuracies.avg:.4f}' }) return losses.avg, accuracies.avg def validate( model: nn.Module, dataloader: DataLoader, criterion: nn.Module, device: torch.device, ) -> Tuple[float, float]: """ Validate the model. 
Returns ------- tuple Average loss and accuracy """ model.eval() losses = AverageMeter() accuracies = AverageMeter() with torch.no_grad(): for images, labels in tqdm(dataloader, desc="Validation"): images = images.to(device) labels = labels.to(device) outputs = model(images) loss = criterion(outputs, labels) _, predicted = outputs.max(1) accuracy = (predicted == labels).float().mean() losses.update(loss.item(), images.size(0)) accuracies.update(accuracy.item(), images.size(0)) return losses.avg, accuracies.avg def train_model(config: Dict) -> None: """ Main training function. Parameters ---------- config : dict Training configuration """ # Setup device = torch.device('cuda' if torch.cuda.is_available() else 'cpu') print(f"Using device: {device}") # Create dataloaders train_loader, val_loader, _ = create_dataloaders( data_dir=config['data']['path'], batch_size=config['training']['batch_size'], num_workers=config['training']['num_workers'], ) # Create model model = create_model( architecture=config['model']['architecture'], num_classes=config['model']['num_classes'], pretrained=config['model']['pretrained'], freeze_backbone=config['model'].get('freeze_backbone', False), ) model = model.to(device) # Loss and optimizer criterion = nn.CrossEntropyLoss() optimizer = optim.Adam( model.parameters(), lr=config['training']['learning_rate'], weight_decay=config['training']['weight_decay'], ) # Learning rate scheduler scheduler = optim.lr_scheduler.ReduceLROnPlateau( optimizer, mode='min', factor=0.5, patience=5, ) # Tensorboard writer = SummaryWriter(config['training']['log_dir']) # Training loop best_val_acc = 0.0 for epoch in range(1, config['training']['epochs'] + 1): # Train train_loss, train_acc = train_epoch( model, train_loader, criterion, optimizer, device, epoch ) # Validate val_loss, val_acc = validate(model, val_loader, criterion, device) # Update learning rate scheduler.step(val_loss) # Log metrics writer.add_scalar('Loss/train', train_loss, epoch) 
writer.add_scalar('Loss/val', val_loss, epoch) writer.add_scalar('Accuracy/train', train_acc, epoch) writer.add_scalar('Accuracy/val', val_acc, epoch) writer.add_scalar('LR', optimizer.param_groups[0]['lr'], epoch) print(f"\nEpoch {epoch}:") print(f" Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}") print(f" Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}") # Save checkpoint is_best = val_acc > best_val_acc best_val_acc = max(val_acc, best_val_acc) save_checkpoint( { 'epoch': epoch, 'model_state_dict': model.state_dict(), 'optimizer_state_dict': optimizer.state_dict(), 'val_acc': val_acc, 'config': config, }, is_best=is_best, checkpoint_dir=config['training']['checkpoint_dir'], ) writer.close() print(f"\nTraining completed. Best validation accuracy: {best_val_acc:.4f}") def main(): """Entry point for training script.""" import argparse parser = argparse.ArgumentParser(description='Train image classifier') parser.add_argument( '--config', type=str, required=True, help='Path to config file' ) args = parser.parse_args() # Load config with open(args.config, 'r') as f: config = yaml.safe_load(f) # Train train_model(config) if __name__ == '__main__': main() ``` Create `src/classifier/data.py`: ``` python """Data loading and preprocessing utilities.""" import torch from torch.utils.data import DataLoader, random_split from torchvision import datasets, transforms from pathlib import Path from typing import Tuple def get_transforms( train: bool = True, image_size: int = 224, ) -> transforms.Compose: """ Get data transforms for training or validation. 
Parameters ---------- train : bool If True, return training transforms with augmentation image_size : int Target image size Returns ------- transforms.Compose Composed transforms """ if train: return transforms.Compose([ transforms.RandomResizedCrop(image_size), transforms.RandomHorizontalFlip(), transforms.ColorJitter( brightness=0.2, contrast=0.2, saturation=0.2, ), transforms.ToTensor(), transforms.Normalize( mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], ), ]) else: return transforms.Compose([ transforms.Resize(256), transforms.CenterCrop(image_size), transforms.ToTensor(), transforms.Normalize( mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225], ), ]) def create_dataloaders( data_dir: str, batch_size: int = 32, num_workers: int = 4, val_split: float = 0.2, ) -> Tuple[DataLoader, DataLoader, DataLoader]: """ Create train, validation, and test dataloaders. Parameters ---------- data_dir : str Path to data directory batch_size : int Batch size num_workers : int Number of workers for data loading val_split : float Validation split ratio Returns ------- tuple Train, validation, and test dataloaders """ data_path = Path(data_dir) # Load datasets train_dataset = datasets.ImageFolder( data_path / 'train', transform=get_transforms(train=True) ) test_dataset = datasets.ImageFolder( data_path / 'test', transform=get_transforms(train=False) ) # Split train into train and validation val_size = int(len(train_dataset) * val_split) train_size = len(train_dataset) - val_size train_subset, val_subset = random_split( train_dataset, [train_size, val_size], generator=torch.Generator().manual_seed(42) ) # Create dataloaders train_loader = DataLoader( train_subset, batch_size=batch_size, shuffle=True, num_workers=num_workers, pin_memory=True, ) val_loader = DataLoader( val_subset, batch_size=batch_size, shuffle=False, num_workers=num_workers, pin_memory=True, ) test_loader = DataLoader( test_dataset, batch_size=batch_size, shuffle=False, num_workers=num_workers, 
        pin_memory=True,
    )

    return train_loader, val_loader, test_loader
```

### Step 6: Write Tests

Create `tests/test_models.py`:

``` python
"""Tests for model architectures."""

import pytest
import torch
from classifier.models import create_model


def test_resnet18_creation():
    """Test ResNet18 model creation."""
    model = create_model(
        architecture='resnet18',
        num_classes=10,
        pretrained=False,
    )
    assert model is not None

    # Test forward pass
    x = torch.randn(2, 3, 224, 224)
    output = model(x)
    assert output.shape == (2, 10)


def test_model_with_frozen_backbone():
    """Test model with frozen backbone."""
    model = create_model(
        architecture='resnet18',
        num_classes=10,
        pretrained=True,
        freeze_backbone=True,
    )

    # Check that backbone is frozen
    trainable_params = sum(
        p.numel() for p in model.parameters() if p.requires_grad
    )
    # Only classifier should be trainable
    assert trainable_params < 1000000  # Arbitrary threshold


@pytest.mark.parametrize('architecture', ['resnet18', 'resnet50'])
def test_different_architectures(architecture):
    """Test different model architectures."""
    model = create_model(
        architecture=architecture,
        num_classes=100,
        pretrained=False,
    )
    x = torch.randn(4, 3, 224, 224)
    output = model(x)
    assert output.shape == (4, 100)
```

### Step 7: Create Configuration

Create `configs/resnet18.yaml`:

``` yaml
# Model configuration
model:
  architecture: resnet18
  num_classes: 10
  pretrained: true
  freeze_backbone: false

# Data configuration
data:
  path: data/
  image_size: 224

# Training configuration
training:
  batch_size: 32
  epochs: 50
  learning_rate: 0.001
  weight_decay: 0.0001
  num_workers: 4
  checkpoint_dir: models/checkpoints/
  log_dir: experiments/resnet18/
```

### Step 8: Run Training

``` bash
# Run tests first
uv run pytest tests/ -v

# Start training
uv run train --config configs/resnet18.yaml

# Monitor with tensorboard
uv run tensorboard --logdir experiments/
```

### Step 9: Create Analysis Notebook

Create `notebooks/01-analysis.ipynb`:

``` python
# /// script
# dependencies = [
#   "torch",
#   "torchvision",
#   "matplotlib",
#   "seaborn",
# ]
# ///

import sys
sys.path.insert(0, '../src')

from classifier.models import create_model
from classifier.data import create_dataloaders
import torch
import matplotlib.pyplot as plt
import seaborn as sns

# Load trained model
model = create_model('resnet18', num_classes=10)
checkpoint = torch.load('../models/checkpoints/best.pth')
model.load_state_dict(checkpoint['model_state_dict'])

# Analyze results
_, _, test_loader = create_dataloaders('../data', batch_size=32)

# Evaluate and visualize
# ... evaluation code ...
```

### Step 10: Document

Create comprehensive `README.md`:

```` markdown
# Image Classifier

Deep learning image classifier using PyTorch and ResNet architectures.

## Features

- Multiple architecture support (ResNet18, ResNet50, EfficientNet)
- Transfer learning with pretrained weights
- Data augmentation
- TensorBoard logging
- Comprehensive testing

## Setup

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/user/image-classifier.git
cd image-classifier
uv sync --all-extras
```

## Usage

### Training

```bash
uv run train --config configs/resnet18.yaml
```

### Evaluation

```bash
uv run evaluate --checkpoint models/checkpoints/best.pth --data data/test/
```

### Inference

```bash
uv run infer --checkpoint models/checkpoints/best.pth --image path/to/image.jpg
```

### Monitoring

```bash
uv run tensorboard --logdir experiments/
```

## Project Structure

```
image-classifier/
├── src/classifier/   # Source code
├── tests/            # Unit tests
├── configs/          # Training configurations
├── data/             # Datasets
├── models/           # Model checkpoints
├── notebooks/        # Jupyter notebooks
└── experiments/      # Experiment logs
```

## Results

| Model    | Accuracy | Parameters |
|----------|----------|------------|
| ResNet18 | 92.3%    | 11.7M      |
| ResNet50 | 94.1%    | 25.6M      |

## Citation

If you use this code, please cite...
````

## Conclusion

`uv` represents a significant step forward in Python package management.
Its speed, simplicity, and reliability make it ideal for machine learning and AI development, where managing complex dependencies and ensuring reproducibility are critical. By combining package management, environment isolation, and Python version management into a single tool, `uv` eliminates much of the friction that has historically plagued Python ML development.

For ML practitioners, the benefits are clear:

- **Faster iteration**: Less time waiting for packages means more time training models and experimenting
- **Better reproducibility**: Lock files ensure your trained models can be deployed with the exact environment they were trained in
- **Simpler workflows**: One tool instead of many reduces cognitive overhead
- **Production-ready**: Fast, reliable dependency management makes deployment smoother

As you continue through this book, many examples will benefit from using `uv` for environment management. The patterns we've established here (using `pyproject.toml`, locking dependencies, and running code with `uv run`) will serve you well throughout your machine learning journey, from prototyping to production deployment.

## Summary

In this chapter, we've covered:

- Installing and configuring `uv` across different platforms
- Creating and managing ML projects with proper structure
- Handling dependencies, version constraints, and lock files
- Managing Python versions for consistency
- Integrating with Jupyter notebooks for experimentation
- Building reproducible ML workflows for training and deployment
- Troubleshooting common issues in ML environments
- Best practices for ML/AI projects, including GPU environment management

With `uv` in your toolkit, you're well-equipped to manage the technical infrastructure of your ML projects, allowing you to focus on what matters most: building, training, and deploying effective machine learning models.
The speed and reliability of `uv` mean less time fighting with dependencies and more time on actual model development and experimentation.
