7  Project Management

7.1 Introduction

Python project management has evolved significantly over the years, with tools like pip, virtualenv, conda, and poetry each solving different aspects of dependency management and environment isolation. In 2024, Astral, the team behind the Ruff linter, released uv, a package and project manager that aims to be dramatically faster and simpler than these tools. Written in Rust, uv combines the functionality of several of them into a single, cohesive command-line tool.

In this chapter, we’ll explore uv comprehensively, covering everything from basic installation to advanced workflows for machine learning and AI development. For ML/AI work, where managing complex dependencies and ensuring reproducibility across different environments is critical, uv provides an elegant solution to streamline your development workflow.

7.2 Why uv Matters for Machine Learning and AI

Before diving into the technical details, it’s worth understanding why uv is particularly valuable for machine learning and AI development:

Reproducibility: ML results must be reproducible. With uv, you can lock exact versions of all dependencies, so collaborators and deployment targets install the same packages months or years later. Locking removes one major source of drift. Bit-identical results from a trained neural network or fine-tuned LLM also depend on random seeds, hardware, and nondeterministic GPU kernels.

Speed: Installing ML frameworks like PyTorch, TensorFlow, or transformers with all their dependencies is notoriously slow. Astral reports uv as 10-100x faster than pip (the largest gains come with a warm cache), so you spend less time waiting for environments to set up and more time training models.

Simplicity: Modern ML projects combine deep learning frameworks, data processing libraries, visualization tools, and more into complex dependency graphs. uv manages this complexity with intuitive commands and clear error messages, reducing cognitive overhead.

Isolation: Different ML projects often require different versions of frameworks. uv makes it trivial to create isolated environments, preventing version conflicts between your PyTorch 2.0 computer vision project and your TensorFlow 2.15 NLP project.

7.3 Installation

Installing uv is straightforward. The recommended method varies by operating system:

7.3.1 macOS and Linux

On Unix-based systems, the simplest installation method uses the official installer script:

curl -LsSf https://astral.sh/uv/install.sh | sh

This downloads uv, installs it to ~/.local/bin (older versions of the installer used ~/.cargo/bin), and updates your shell configuration so that directory is on your PATH. Afterwards, restart your terminal or load the updated environment in your current shell:

source $HOME/.local/bin/env

7.3.2 Windows

On Windows, you can use PowerShell:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Alternatively, if you have Python already installed, you can use pip:

pip install uv

However, the standalone installer is preferred as it doesn’t depend on an existing Python installation.

7.3.3 Verifying Installation

After installation, verify that uv is working correctly:

uv --version

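You should see output showing the installed version. The details in parentheses depend on how uv was installed; a Homebrew build, for example, prints something like: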

uv 0.4.18 (Homebrew 2024-11-05)

7.4 Understanding uv’s Architecture

To use uv effectively, it helps to understand its core concepts and how it differs from traditional Python tools.

7.4.1 The Tool Chain Analogy

Think of uv as a complete tool chain rather than a single tool. It replaces multiple tools in the Python ecosystem:

  • pip: Package installation
  • pip-tools: Dependency resolution and locking
  • virtualenv/venv: Environment creation
  • pyenv: Python version management
  • pipx: Tool installation

Where you previously needed to coordinate these separate tools, uv provides a unified interface. This integration eliminates common pain points like ensuring your virtual environment uses the correct Python version or manually compiling requirements files.

7.4.2 Key Design Principles

uv is built on several core principles:

  1. Speed First: Written in Rust and using parallel downloads, uv prioritizes performance without sacrificing correctness.

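  2. Correctness: uv uses a full backtracking dependency resolver (based on the PubGrub algorithm) that handles complex version constraints and explains clearly why resolution fails, a long-standing weak point of pip before its resolver was rewritten in 2020.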

  3. Batteries Included: Unlike tools that require plugins or additional configuration, uv works out of the box for common workflows.

  4. Standards Compliant: uv follows Python packaging standards (PEP 517, PEP 621, etc.), ensuring compatibility with the broader ecosystem.

7.5 Basic Project Workflow

Let’s walk through creating and managing a Python project with uv. We’ll build a small machine learning project to demonstrate practical usage.

7.5.1 Creating a New Project

To create a new project, use the uv init command:

uv init image-classifier
cd image-classifier

This creates a new directory with a basic project structure (depending on your uv version, the starter script may be named main.py instead of hello.py, and uv may also initialize a Git repository with a .gitignore):

image-classifier/
├── .python-version
├── README.md
├── pyproject.toml
└── hello.py

Let’s examine each file:

.python-version: Specifies the Python version for this project. uv uses this to automatically download and use the correct Python version.

pyproject.toml: The modern Python project configuration file, following PEP 621. This is where dependencies, metadata, and build configuration live.

hello.py: A simple starter script that uv creates as an example.

7.5.2 Understanding pyproject.toml

The pyproject.toml file is central to modern Python projects. Here’s what uv init generates:

[project]
name = "image-classifier"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.12"
dependencies = []

[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

Let’s break this down:

  • [project]: Metadata about your project, following PEP 621
  • name: The package name (important if you plan to distribute it)
  • version: Semantic version number
  • requires-python: Minimum Python version requirement
  • dependencies: List of required packages (initially empty)
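  • [build-system]: Configuration for building the package (uses hatchling by default). uv init adds this table only for packaged projects, such as those created with uv init --package or uv init --lib; a plain application project omits it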

For an ML project, you might not care about building a distributable package, but the structure remains useful for dependency management.

7.5.3 Adding Dependencies

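There are two main ways to add dependencies: using the command line or directly editing pyproject.toml.

7.5.3.1 Method 1: Using uv add

The uv add command adds packages to the dependencies list in pyproject.toml, updates uv.lock, and installs them into the project environment in one step:

uv add torch torchvision numpy pillow matplotlib

By default, uv records each package with a lower bound (>=) at the version it resolved.

7.5.3.2 Method 2: Manual Editing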

You can also edit pyproject.toml directly:

dependencies = [
    "torch>=2.0.0",
    "torchvision>=0.15.0",
    "numpy>=1.24.0",
    "pillow>=10.0.0",
    "matplotlib>=3.7.0",
]

Then synchronize your environment:

uv sync

This reads pyproject.toml, resolves the dependencies (creating or updating uv.lock), and installs everything into the project's .venv environment.

7.5.4 Version Constraints

When specifying dependencies, you can use various version constraint operators. Each line below is an alternative way to constrain torch; a real list would contain only one of them:

dependencies = [
    "torch",                # Any version (not recommended)
    "torch>=2.0.0",         # 2.0.0 or newer
    "torch>=2.0.0,<3.0.0",  # At least 2.0.0, but below 3.0.0
    "torch~=2.0.0",         # Compatible release: >=2.0.0, ==2.0.* (any 2.0.x)
    "torch==2.0.1",         # Exact version (very restrictive)
]

For ML projects, I recommend using lower bounds with conservative upper bounds:

dependencies = [
    "torch>=2.0.0,<3.0.0",
    "numpy>=1.24.0,<2.0.0",
    "transformers>=4.30.0,<5.0.0",
]

This gives you bug fixes and minor updates while protecting against breaking changes. This is especially important for deep learning frameworks where major versions can introduce significant API changes.
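To make the compatible-release operator concrete, here is a minimal sketch of what ~= means under PEP 440. It assumes plain numeric versions such as 2.0.1 (no pre-releases, post-releases, or local tags), and the function names are hypothetical. In real code you would rely on a complete implementation such as the packaging library's SpecifierSet.

```python
def parse_release(version: str) -> tuple[int, ...]:
    """Parse a plain numeric release such as "2.0.1" into (2, 0, 1)."""
    return tuple(int(part) for part in version.split("."))


def compatible_release_bounds(base: str) -> tuple[tuple[int, ...], tuple[int, ...]]:
    """Return (inclusive lower, exclusive upper) bounds for "~=base"."""
    release = parse_release(base)
    if len(release) < 2:
        raise ValueError("~= needs at least two release segments, e.g. ~=2.0")
    # Drop the last segment, then bump the new last one:
    # ~=2.0.0 -> [2.0.0, 2.1)   and   ~=2.2 -> [2.2, 3)
    prefix = release[:-1]
    upper = prefix[:-1] + (prefix[-1] + 1,)
    return release, upper


def _pad(release: tuple[int, ...], length: int) -> tuple[int, ...]:
    """Pad with zeros so that 2.1 and 2.1.0 compare as equal."""
    return release + (0,) * (length - len(release))


def satisfies_compatible(version: str, base: str) -> bool:
    """True if `version` satisfies "~=base" (numeric releases only)."""
    lower, upper = compatible_release_bounds(base)
    candidate = parse_release(version)
    n = max(len(candidate), len(lower), len(upper))
    return _pad(lower, n) <= _pad(candidate, n) < _pad(upper, n)


for v in ["2.0.0", "2.0.7", "2.1.0"]:
    print(v, satisfies_compatible(v, "2.0.0"))
```

The same rule explains why "torch~=2.0.0" only accepts 2.0.x releases, while "torch~=2.0" accepts any 2.x release from 2.0 upward.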

7.5.5 The Lock File: uv.lock

The uv.lock file is critical for reproducibility. It contains the exact resolved versions of every package in your dependency tree. Here’s a snippet:

[[package]]
name = "torch"
version = "2.3.1"
source = { registry = "https://pypi.org/simple" }
dependencies = [
    { name = "filelock" },
    { name = "typing-extensions" },
    { name = "sympy" },
    { name = "networkx" },
    { name = "jinja2" },
    { name = "fsspec" },
]
wheels = [
    { url = "https://files.pythonhosted.org/packages/...", hash = "sha256:..." },
]

This tells us:

  • Exactly which version of PyTorch is installed (2.3.1)
  • Where it came from (PyPI)
  • Its direct dependencies
  • The specific wheel file and its hash for verification

Important: You should commit uv.lock to version control. This ensures anyone cloning your repository can recreate your exact environment, which is critical when sharing trained models or reproducing experimental results.

7.6 Running Python with uv

7.6.1 The uv run Command

Instead of activating a virtual environment and then running Python, uv provides the uv run command:

uv run python script.py

This automatically:

  1. Ensures the project environment exists
  2. Installs any missing dependencies
  3. Runs the Python script in that environment

You can also run Python interactively:

uv run python

Or execute inline code:

uv run python -c "import torch; print(torch.__version__)"
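A quick way to confirm that uv run picks up the project environment is a small script like the one below (check_env.py is a hypothetical name). Saved in the project and run with uv run python check_env.py, it should report an interpreter path inside the project's .venv directory.

```python
import sys


def in_virtual_environment() -> bool:
    """Return True when this interpreter runs inside a virtual environment."""
    # In a venv (such as the project's .venv), sys.prefix points at the
    # environment while sys.base_prefix points at the base installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print(f"Interpreter: {sys.executable}")
    print(f"Inside a virtual environment: {in_virtual_environment()}")
```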

7.6.2 Running Installed Tools

For tools installed in the project environment, such as jupyter, pytest, or black added as development dependencies, use uv run as well:

uv run jupyter notebook
uv run pytest tests/
uv run black src/

This is cleaner than traditional workflows where you’d activate an environment first.

7.7 Development Dependencies

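ML projects often need development tools (testing, formatting, documentation, experiment tracking) that aren't required for running the actual training or inference. uv supports two mechanisms for this: development dependency groups for your own tooling, and optional dependencies (extras) that users of your package can opt into.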

7.7.1 Adding Development Dependencies

Add development dependencies with the --dev flag:

uv add --dev pytest black mypy jupyter tensorboard wandb

This adds them to a dev group in pyproject.toml. Current uv versions use the standard [dependency-groups] table (PEP 735); older releases wrote a [tool.uv] dev-dependencies list instead:

[dependency-groups]
dev = [
    "pytest",
    "black",
    "mypy",
    "jupyter",
    "tensorboard",
    "wandb",
]

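For dependencies that users of your project should be able to opt into, create optional dependency groups (extras) with --optional:

uv add --optional gpu cupy-cuda12x

[project.optional-dependencies]
gpu = [
    "cupy-cuda12x",
]

Note that GPU support for PyTorch does not come from a separate "torch-cuda" package: CUDA-enabled wheels of torch itself are served from PyTorch's own package index, which you configure as an additional index in uv.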

7.7.2 Installing Optional Dependencies

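Development dependencies in the dev group are installed by default whenever you run uv sync or uv run. Pass --no-dev to leave them out (for example, in a production image):

uv sync --no-dev

To include an optional extra such as gpu:

uv sync --extra gpu

Or all optional groups:

uv sync --all-extras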
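Because an optional extra may or may not be installed, code that uses it should check before importing. The sketch below uses the standard library's importlib.util.find_spec; the has_module helper and the wandb check are illustrative assumptions, not part of uv.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if the top-level module `name` can be imported here."""
    return importlib.util.find_spec(name) is not None


# Hypothetical usage: enable experiment tracking only when the optional
# wandb package is installed in the current environment.
USE_WANDB = has_module("wandb")
```

Checking once at import time keeps the core training code runnable in environments synced without the extra.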

7.8 Python Version Management

One of uv’s most powerful features is built-in Python version management, eliminating the need for pyenv or similar tools.

7.8.1 Specifying Python Versions

You can specify the Python version in multiple ways:

1. Project-level (recommended):

uv python pin 3.12

This creates a .python-version file:

3.12

2. In pyproject.toml:

requires-python = ">=3.11"

7.8.2 Installing Python Versions

If the required Python version isn’t available, uv can install it:

uv python install 3.12

This downloads and installs Python 3.12, managed by uv. You can install multiple versions:

uv python install 3.11 3.12 3.13

7.8.3 Listing Available Pythons

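To see the Python versions uv knows about, both installed and available for download (add --only-installed to list only installed ones):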

uv python list

Output might look like:

cpython-3.13.0-macos-aarch64-none    /Users/mike/.local/share/uv/python/cpython-3.13.0-macos-aarch64-none/bin/python3
cpython-3.12.7-macos-aarch64-none    /Users/mike/.local/share/uv/python/cpython-3.12.7-macos-aarch64-none/bin/python3
cpython-3.11.10-macos-aarch64-none   /Users/mike/.local/share/uv/python/cpython-3.11.10-macos-aarch64-none/bin/python3

7.8.4 Using Specific Python Versions

For a one-off command with a specific Python version:

uv run --python 3.11 python script.py

Or create a project with a specific version:

uv init --python 3.11 my-project

7.9 Advanced Dependency Management

7.9.1 Installing from Git Repositories

Sometimes you need bleeding-edge code or a forked version of a package. uv makes this straightforward:

uv add "package @ git+https://github.com/user/package.git"

For a specific branch:

uv add "package @ git+https://github.com/user/package.git@dev-branch"

For a specific commit:

uv add "package @ git+https://github.com/user/package.git@abc123"

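By default, uv add does not put the Git URL in the requirement string; it records it in a [tool.uv.sources] table, so for the commit example pyproject.toml contains:

dependencies = [
    "package",
]

[tool.uv.sources]
package = { git = "https://github.com/user/package.git", rev = "abc123" }

To write the standard "package @ git+https://github.com/user/package.git@abc123" form into dependencies instead, pass --raw-sources to uv add.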

7.9.2 Installing from Local Paths

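For packages you’re developing locally:

uv add --editable ../my-local-package

As with Git dependencies, uv records the location in [tool.uv.sources]:

dependencies = [
    "my-local-package",
]

[tool.uv.sources]
my-local-package = { path = "../my-local-package", editable = true }

The --editable flag makes the package editable, so changes to its source are reflected immediately without reinstalling. You can also write a standard direct reference such as "my-package @ file:///path/to/my-package" in dependencies, but that installs a regular, non-editable copy.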

7.9.3 Platform-Specific Dependencies

Some packages are only needed on certain platforms. You can specify this in pyproject.toml:

dependencies = [
    "pandas",
    "pywin32; platform_system == 'Windows'",
    "python-magic; platform_system != 'Windows'",
]

7.9.4 Resolving Dependency Conflicts

When dependencies conflict, uv explains why. For example, if package A requires numpy<2.0 but package B requires numpy>=2.0, uv reports an error along these lines:

error: No solution found when resolving dependencies:
  Because package-a depends on numpy<2.0
    and package-b depends on numpy>=2.0,
    we can conclude that package-a and package-b are incompatible.

To resolve conflicts:

  1. Check if updates fix it: Update packages with uv sync --upgrade
  2. Use version constraints: Manually specify compatible versions
  3. Report upstream: File issues with package maintainers
  4. Fork if necessary: Maintain a patched version

7.10 Scripts and Entry Points

For packaged projects (those with a [build-system] table, such as one created with uv init --package), you can define console scripts in pyproject.toml:

[project.scripts]
train = "classifier.train:main"
evaluate = "classifier.evaluate:main"

uv installs a packaged project into its environment, so these commands are available through uv run:

uv run train

This is useful for creating reproducible training and evaluation pipelines that others can run.

7.11 Working with Jupyter Notebooks

Jupyter notebooks are common in research. Here’s how to use them with uv:

7.11.1 Adding Jupyter

uv add --dev jupyter ipykernel

7.11.2 Running Jupyter

uv run jupyter notebook

Or for JupyterLab:

uv run jupyter lab

7.11.3 Creating a Kernel

To make your project available as a Jupyter kernel:

uv run python -m ipykernel install --user --name=image-classifier

Now you can select the “image-classifier” kernel in any Jupyter notebook.

7.11.4 Inline Script Metadata

uv supports inline script metadata (PEP 723) in standalone Python scripts (not notebooks). At the top of a script, you can specify dependencies:

# /// script
# requires-python = ">=3.11"
# dependencies = [
#   "torch",
#   "torchvision",
#   "matplotlib",
# ]
# ///

import torch
import torchvision
import matplotlib.pyplot as plt

# Your model training or inference here
# (torchvision 0.13+ selects pretrained weights via `weights`; `pretrained=True` is deprecated)
model = torchvision.models.resnet18(weights="DEFAULT")

Then run it with:

uv run script.py

uv automatically creates a temporary environment with the specified dependencies. This is perfect for one-off experiments or sharing standalone training scripts.

7.12 Reproducible ML Workflows

Let’s put everything together with a complete workflow for a machine learning project.

7.12.1 Project Structure

A well-organized ML project might look like:

image-classifier/
├── .python-version        # Python 3.11
├── pyproject.toml         # project configuration
├── uv.lock                # locked dependencies
├── README.md              # documentation
├── .gitignore
├── data/
│   ├── raw/               # original datasets
│   ├── processed/         # preprocessed data
│   └── splits/            # train/val/test splits
├── models/
│   ├── checkpoints/
│   └── configs/
├── notebooks/
│   ├── 01-eda.ipynb
│   ├── 02-preprocessing.ipynb
│   └── 03-training.ipynb
├── src/
│   └── classifier/
│       ├── __init__.py
│       ├── data.py
│       ├── models.py
│       ├── train.py
│       └── evaluate.py
├── tests/
│   ├── test_data.py
│   ├── test_models.py
│   └── test_train.py
├── scripts/
│   ├── train.py
│   └── evaluate.py
└── experiments/           # logs and results
    └── exp_001/

7.12.2 Complete pyproject.toml

Here’s a comprehensive configuration for an image classification project:

[project]
name = "image-classifier"
version = "0.1.0"
description = "Deep learning image classifier using PyTorch"
readme = "README.md"
requires-python = ">=3.11"
authors = [
    {name = "Mike", email = "mike@email.com"}
]
license = {text = "MIT"}

dependencies = [
    "torch>=2.0.0,<3.0.0",
    "torchvision>=0.15.0,<1.0.0",
    "numpy>=1.24.0,<2.0.0",
    "pillow>=10.0.0",
    "matplotlib>=3.7.0",
    "scikit-learn>=1.3.0",
    "tqdm>=4.65.0",
    "pyyaml>=6.0",
    "tensorboard>=2.13.0",
]

[project.optional-dependencies]
experiment-tracking = [
    "wandb>=0.15.0",
    "mlflow>=2.5.0",
]

# GPU extras; CUDA builds of torch itself come from PyTorch's package index
gpu = [
    "cupy-cuda12x>=13.0.0",
]

[dependency-groups]
dev = [
    "jupyter>=1.0.0",
    "ipykernel>=6.25.0",
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.5.0",
]

[project.scripts]
train = "classifier.train:main"
evaluate = "classifier.evaluate:main"
infer = "classifier.inference:main"

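[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

# The import package (src/classifier) differs from the project name
# (image-classifier), so tell hatchling where to find it
[tool.hatch.build.targets.wheel]
packages = ["src/classifier"]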

[tool.pytest.ini_options]
testpaths = ["tests"]
python_files = ["test_*.py"]

[tool.black]
line-length = 100
target-version = ['py311']

[tool.ruff]
line-length = 100
target-version = "py311"

7.13 The Modern Python Toolchain for ML

Just as R developers rely on devtools, usethis, styler, lintr, and testthat, Python ML developers need a comprehensive toolchain. For Python projects, we recommend:

  • uv: Package and environment management
  • Ruff: Code formatting and linting
  • mypy: Static type checking
  • pytest: Unit testing framework
  • Quarto: Documentation and reproducible reports

Think of it as: uv = renv + pak + devtools, Ruff = styler + lintr, pytest = testthat, mypy = (no direct R equivalent).

Apart from Quarto, which is a standalone publishing system installed separately, all these tools are installed as development dependencies and configured through pyproject.toml, creating a unified, reproducible development environment.

7.14 Ruff: Fast Formatting and Linting

Ruff is an extremely fast linter and formatter, also from Astral and written in Rust. It can replace several established tools (Black, isort, Flake8, pyupgrade, autoflake) with a single, consistent interface, and Astral's benchmarks show it running 10-100x faster than the tools it replaces.

7.14.1 Why Ruff Matters for ML

In ML projects, code quality is crucial:

  • Readability: ML code involves complex transformations and mathematical operations that must be clear
  • Consistency: Team collaboration requires consistent style
  • Correctness: Linting catches bugs like unused imports, undefined variables, and common mistakes
  • Speed: Fast feedback loops keep you in flow state

7.14.2 Installation

Add Ruff as a development dependency:

uv add --dev ruff

7.14.3 Code Formatting

Format your entire codebase:

uv run ruff format

Or format specific files:

uv run ruff format src/classifier/train.py

Or run Ruff through uvx, without adding it to the project:

uvx ruff format

Ruff’s formatter:

  • Enforces consistent style: Designed as a near drop-in replacement for Black, with opinionated defaults
  • Normalizes whitespace: Fixes spacing around operators, commas, and brackets, and removes trailing whitespace
  • Wraps long lines: Splits lines that exceed the configured line length wherever it can
  • Handles string quotes: Normalizes quote usage across your codebase

Note that ruff format does not sort imports. Import sorting is a lint rule (I001, from the isort rule set), applied with ruff check --select I --fix.

Example transformation:

# Before formatting
import torch
import numpy as np
from  pathlib   import Path
import sys
from   torch import nn

def train_model(model,data,epochs=100):
    for epoch in range(  epochs ):
        loss=model.train_step( data )
        print( f"Epoch {epoch}: {loss}" )

After sorting the imports with uv run ruff check --select I --fix and then running uv run ruff format:

# After formatting
import sys
from pathlib import Path

import numpy as np
import torch
from torch import nn


def train_model(model, data, epochs=100):
    for epoch in range(epochs):
        loss = model.train_step(data)
        print(f"Epoch {epoch}: {loss}")

7.14.4 Linting

Check for linting issues:

uv run ruff check

Fix auto-fixable issues:

uv run ruff check --fix

List the fixes that were applied, along with any remaining issues:

uv run ruff check --fix --show-fixes

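Ruff implements hundreds of lint rules, organized into rule sets that you opt into (by default it enables only a core subset of the pycodestyle and Pyflakes rules: E4, E7, E9, and F). With the relevant rule sets selected, it can detect issues such as: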

Common Errors:

  • Unused imports and variables (catching dead code)

  • Undefined names (typos and missing imports)

  • Syntax errors and deprecated syntax

Style Violations:

  • PEP 8 violations (spacing, naming conventions)

  • Import organization issues

  • Docstring style problems

Code Quality Issues:

  • Overly complex functions

  • Redundant code

  • Mutable default arguments (a common Python pitfall)

  • Bare except clauses (catching exceptions too broadly)

Security Issues (flake8-bandit S rules, not enabled by default):

  • Hardcoded passwords or secrets

  • Use of eval() or exec()

  • SQL injection vulnerabilities

  • Insecure temporary file usage

Example linting output:

src/classifier/train.py:15:8: F841 Local variable `lr` is assigned to but never used
src/classifier/train.py:23:1: E302 Expected 2 blank lines, found 1
src/classifier/models.py:45:9: B006 Do not use mutable data structures for argument defaults
src/classifier/data.py:12:1: I001 Import block is un-sorted or un-formatted

7.14.5 Configuration

Add Ruff configuration to pyproject.toml:

[tool.ruff]
# Core settings
line-length = 100  # Slightly longer than Black's 88 for ML code
target-version = "py311"
src = ["src"]
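# Note: `exclude` replaces Ruff's default exclusions (which already cover these
# directories); use `extend-exclude` to add paths on top of the defaults
exclude = [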
    ".git",
    ".venv",
    "__pycache__",
    "build",
    "dist",
]

[tool.ruff.format]
quote-style = "double"
indent-style = "space"
skip-magic-trailing-comma = false
line-ending = "auto"

[tool.ruff.lint]
# Enable rule groups
select = [
    "E",    # pycodestyle errors
    "W",    # pycodestyle warnings
    "F",    # Pyflakes
    "UP",   # pyupgrade (modernize Python code)
    "B",    # flake8-bugbear (find likely bugs)
    "SIM",  # flake8-simplify (suggest simplifications)
    "I",    # isort (import sorting)
    "N",    # pep8-naming (enforce naming conventions)
    "C4",   # flake8-comprehensions (better list/dict/set comprehensions)
    "PTH",  # flake8-use-pathlib (prefer pathlib over os.path)
    "RET",  # flake8-return (improve return statements)
    "TRY",  # tryceratops (exception handling best practices)
]

# Ignore specific rules
ignore = [
    "E501",   # Line too long (handled by formatter)
    "TRY003", # Avoid specifying long messages outside exception class
]

# Allow autofix for all enabled rules
fixable = ["ALL"]
unfixable = []

# Ignore specific rules for specific files
[tool.ruff.lint.per-file-ignores]
"__init__.py" = ["F401"]  # Allow unused imports in __init__.py
"tests/*" = ["S101"]      # Allow assert in tests

[tool.ruff.lint.isort]
known-first-party = ["classifier"]

Line Length Philosophy:

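Black's default of 88 characters is 10% over the traditional limit of 80; Black's documentation explains that it was found to produce noticeably shorter files than 80 while keeping lines readable. Keeping lines reasonably short also helps to:

  • Fit two files side-by-side on modern monitors

  • Keep git diffs easy to review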

For ML code with long tensor operations, 100 characters is a reasonable compromise.

7.14.6 Ruff in Your Workflow

Integrate Ruff into your daily workflow:

During development:

# Apply auto-fixable lint fixes first (fixes can leave code unformatted)
uv run ruff check --fix

# Then format before committing
uv run ruff format

# Review remaining issues
uv run ruff check

In CI/CD:

# .github/workflows/lint.yml (steps excerpt)
- uses: actions/checkout@v4
- uses: astral-sh/setup-uv@v5
- name: Lint with Ruff
  run: |
    uv run ruff format --check
    uv run ruff check

VS Code integration:

{
  "editor.formatOnSave": true,
  "[python]": {
    "editor.defaultFormatter": "charliermarsh.ruff"
  }
}

7.15 Type Checking with mypy

Python supports optional type annotations through PEP 484. While Python remains dynamically typed at runtime, type annotations provide static analysis benefits that are invaluable for ML projects.

7.15.1 Why Type Checking Matters for ML

Machine learning code involves complex data transformations, tensor operations, and model architectures. Type checking helps:

Catch Errors Early:

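  • Detect type errors (wrong argument types, missing return values, unhandled None) before running expensive training

  • Catch tensor shape mismatches at call time when combined with runtime shape-checking libraries such as jaxtyping (mypy alone does not check shapes)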

  • Identify incorrect data types in transformations

Improve Code Clarity:

  • Document expected tensor shapes (e.g., with jaxtyping annotations such as Float[Tensor, "batch channels height width"])

  • Specify DataFrame column types

  • Make function contracts explicit

Better IDE Support:

  • Accurate autocomplete for model methods

  • Jump-to-definition for complex hierarchies

  • Refactoring with confidence

Team Collaboration:

  • Self-documenting interfaces

  • Catch integration issues early

  • Reduce onboarding time

7.15.2 Installation

Add mypy as a development dependency:

uv add --dev mypy

For libraries that need type stubs:

uv add --dev types-PyYAML types-tqdm

7.15.3 Basic Usage

Check types in your entire project:

uv run mypy .

Check specific files:

uv run mypy src/classifier/models.py

Check with verbose output:

uv run mypy --pretty --show-error-context .

7.15.4 Type Annotation Examples

Without types (unclear and error-prone):

def create_model(arch, num_classes, pretrained):
    if arch == "resnet18":
        model = models.resnet18(pretrained=pretrained)
        model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model

def train_epoch(model, loader, optimizer):
    for batch in loader:
        images, labels = batch
        outputs = model(images)
        # ... training logic

With types (clear and verifiable):

from typing import Tuple

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.optim import Optimizer
from torch.utils.data import DataLoader
from torchvision import models


def create_model(
    arch: str,
    num_classes: int,
    pretrained: bool = True
) -> nn.Module:
    """
    Create a model with specified architecture.

    Parameters
    ----------
    arch : str
        Model architecture name
    num_classes : int
        Number of output classes
    pretrained : bool
        Use pretrained weights

    Returns
    -------
    nn.Module
        Initialized model
    """
    if arch == "resnet18":
        # torchvision 0.13+ selects pretrained weights via `weights`
        weights = models.ResNet18_Weights.DEFAULT if pretrained else None
        model = models.resnet18(weights=weights)
        num_features = model.fc.in_features
        model.fc = nn.Linear(num_features, num_classes)
        return model
    raise ValueError(f"Unsupported architecture: {arch}")


def train_epoch(
    model: nn.Module,
    loader: DataLoader,
    optimizer: Optimizer,
    device: torch.device
) -> Tuple[float, float]:
    """
    Train for one epoch.

    Returns
    -------
    tuple
        (average_loss, average_accuracy)
    """
    model.train()
    total_loss = 0.0
    correct = 0
    total = 0

    for images, labels in loader:
        images = images.to(device)
        labels = labels.to(device)

        # Standard step: reset gradients, forward, loss, backward, update
        optimizer.zero_grad()
        outputs = model(images)
        loss = F.cross_entropy(outputs, labels)
        loss.backward()
        optimizer.step()

        # Accumulate running statistics for the epoch
        total_loss += loss.item()
        correct += int((outputs.argmax(dim=1) == labels).sum().item())
        total += labels.size(0)

    return total_loss / len(loader), correct / total

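Advanced: Tensor shape annotations

Using jaxtyping for shape-aware type hints. mypy treats these annotations as plain Tensor; the dimensions are checked at runtime when jaxtyping is paired with a runtime type checker such as beartype. Because the annotations are evaluated when your code runs, add both as regular dependencies:

uv add jaxtyping beartype

from beartype import beartype
from jaxtyping import Float, jaxtyped
from torch import Tensor


@jaxtyped(typechecker=beartype)
def forward(
    x: Float[Tensor, "batch channels height width"],
    weight: Float[Tensor, "num_classes channels"],
) -> Float[Tensor, "batch num_classes"]:
    """
    Forward pass with explicit shape annotations.

    On each call, jaxtyping (via beartype) checks that the tensors have the
    declared ranks and that named dimensions such as `channels` agree.
    """
    # Global average pooling over height and width: [batch, channels]
    pooled = x.mean(dim=(2, 3))
    # Linear projection to class scores: [batch, num_classes]
    return pooled @ weight.T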

7.15.5 What the Type Checker Verifies

When you run mypy, it checks:

  1. Argument types: Are you passing the right types?
  2. Return types: Does the function return what it claims?
  3. Attribute access: Does that object have that attribute?
  4. Operations: Are operations valid for those types?

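Example errors caught by mypy:

# Error: Argument 2 to "create_model" has incompatible type "str"; expected "int"
model = create_model("resnet18", "10", True)

# Error: Missing return statement
def train_epoch(model: nn.Module) -> Tuple[float, float]:
    print("Training...")
    # Forgot to return!

# Error: "str" has no attribute "uper" (typo for .upper())
arch_name = "resnet18".uper()

# Error: Unsupported operand types for + ("int" and "str")
epochs = 100 + "50"

Attribute typos on nn.Module instances are a weak spot: because nn.Module defines __getattr__ to look up submodules and parameters, mypy does not report a misspelled name such as model.forwar(x) as a missing attribute.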

7.15.6 Configuration

Add mypy settings to pyproject.toml:

[tool.mypy]
python_version = "3.11"
warn_return_any = true
warn_unused_configs = true
warn_redundant_casts = true
warn_unused_ignores = true

# Start lenient, then tighten
disallow_untyped_defs = false        # Set to true eventually
disallow_incomplete_defs = true
check_untyped_defs = true
no_implicit_optional = true

# Show more information
show_error_codes = true
show_error_context = true
pretty = true

# Strictness per module
[[tool.mypy.overrides]]
module = "tests.*"
disallow_untyped_defs = false

[[tool.mypy.overrides]]
module = "classifier.models"
disallow_untyped_defs = true

Progressive typing strategy:

  1. Start with disallow_untyped_defs = false
  2. Add type hints to new code
  3. Gradually annotate existing code
  4. Enable disallow_untyped_defs = true for completed modules
  5. Eventually enable strict mode globally

7.15.7 Type Stubs for ML Libraries

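Libraries that ship their own type annotations (a py.typed marker):

  • torch

  • numpy (1.20 and later)

Coverage varies for other ML libraries such as scikit-learn and transformers. Check whether an installed package includes a py.typed file; if not, look for a separate stubs package or use one of the options below.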

Libraries needing stubs:

uv add --dev types-Pillow types-tqdm types-PyYAML types-requests

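Libraries without stubs:

For libraries without type stubs, you can:

  1. Ignore them, either globally with ignore_missing_imports = true under [tool.mypy] or, more safely, per module:
[[tool.mypy.overrides]]
module = ["some_untyped_library.*"]
ignore_missing_imports = true
  2. Create stub files, and point mypy at them with mypy_path = "stubs" under [tool.mypy]:
# stubs/some_library.pyi
def some_function(x: int) -> str: ...
class SomeClass:
    def method(self) -> None: ...
  3. Use type: ignore comments:
from some_untyped_library import something  # type: ignore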

7.15.8 Mypy in Your Workflow

Development cycle:

# Check types while developing
uv run mypy src/

# Check specific module
uv run mypy src/classifier/models.py

# Generate HTML report (requires lxml: uv add --dev lxml)
uv run mypy --html-report mypy-report/ src/

CI/CD:

- name: Type check with mypy
  run: uv run mypy src/

VS Code integration:

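Install the Mypy Type Checker extension (ms-python.mypy-type-checker), which runs mypy in the editor. Pylance, Microsoft's default Python language server, uses its own type checker (Pyright) rather than mypy.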

7.16 Testing with pytest

pytest is Python’s de facto standard testing framework. For ML projects, testing is crucial for ensuring data pipelines, model architectures, and training loops work correctly.

7.16.1 Why Testing Matters for ML

Machine learning projects have unique testing challenges:

Data Pipeline Testing:

  • Verify data loading and preprocessing

  • Check tensor shapes and types

  • Validate data augmentation

  • Test batching and sampling

Model Testing:

  • Verify model architectures

  • Check forward/backward passes

  • Test with different input shapes

  • Validate output dimensions

Training Logic Testing:

  • Test loss computation

  • Verify optimizer updates

  • Check gradient flow

  • Test checkpoint saving/loading

End-to-End Testing:

  • Test complete training pipeline

  • Verify inference works

  • Test model export formats
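Many of these checks need no GPU or real data. As a minimal sketch of the "test batching and sampling" idea, here is a hypothetical pure-Python batching helper with pytest-style tests. pytest collects plain test_ functions that use bare assert statements, so the test functions do not need to import pytest.

```python
def make_batches(items: list[int], batch_size: int, drop_last: bool = False) -> list[list[int]]:
    """Split items into consecutive batches of batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    batches = [items[i : i + batch_size] for i in range(0, len(items), batch_size)]
    # Optionally drop a final, smaller batch (like DataLoader's drop_last).
    if drop_last and batches and len(batches[-1]) < batch_size:
        batches.pop()
    return batches


def test_batch_sizes() -> None:
    batches = make_batches(list(range(10)), batch_size=4)
    assert [len(b) for b in batches] == [4, 4, 2]


def test_drop_last_discards_partial_batch() -> None:
    batches = make_batches(list(range(10)), batch_size=4, drop_last=True)
    assert [len(b) for b in batches] == [4, 4]


def test_every_item_appears_once() -> None:
    batches = make_batches(list(range(10)), batch_size=3)
    assert sorted(x for b in batches for x in b) == list(range(10))
```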

7.16.2 Installation

Add pytest and useful plugins:

uv add --dev pytest pytest-cov pytest-xdist pytest-timeout
  • pytest: Core testing framework
  • pytest-cov: Code coverage reporting
  • pytest-xdist: Parallel test execution
  • pytest-timeout: Prevent hanging tests

7.16.3 Project Structure

image-classifier/
├── src/
│   └── classifier/
│       ├── __init__.py
│       ├── data.py
│       ├── models.py
│       └── train.py
└── tests/
    ├── conftest.py        # shared fixtures
    ├── test_data.py       # data pipeline tests
    ├── test_models.py     # model tests
    └── test_train.py      # training tests

7.16.4 Writing Tests

Basic test structure:

# tests/test_models.py
import pytest
import torch
import torch.nn as nn
from classifier.models import create_model

def test_resnet18_creation():
    """Test ResNet18 model creation."""
    model = create_model(
        architecture='resnet18',
        num_classes=10,
        pretrained=False,
    )
    
    assert model is not None
    assert isinstance(model, nn.Module)
    
    # Test forward pass
    x = torch.randn(2, 3, 224, 224)
    output = model(x)
    
    assert output.shape == (2, 10)
    assert not torch.isnan(output).any()


def test_model_with_frozen_backbone():
    """Test model with frozen backbone."""
    model = create_model(
        architecture='resnet18',
        num_classes=10,
        pretrained=True,
        freeze_backbone=True,
    )
    
    # Check that backbone is frozen
    trainable_params = sum(
        p.numel() for p in model.parameters() if p.requires_grad
    )
    
    # Only the final layer should be trainable: 512 * 10 + 10 = 5,130 params
    assert trainable_params < 10000


@pytest.mark.parametrize('architecture', ['resnet18', 'resnet50', 'efficientnet_b0'])
def test_different_architectures(architecture):
    """Test different model architectures."""
    model = create_model(
        architecture=architecture,
        num_classes=100,
        pretrained=False,
    )
    
    x = torch.randn(4, 3, 224, 224)
    output = model(x)
    
    assert output.shape == (4, 100)

Testing data pipelines:

# tests/test_data.py
import numpy as np
import pytest
import torch
from PIL import Image  # module-level so test_dataloader_shapes can use it too
from classifier.data import get_transforms, create_dataloaders

def test_train_transforms():
    """Test training data transforms."""
    transform = get_transforms(train=True)
    
    # Create dummy image (randint excludes the upper bound, so 256 gives 0-255)
    img = Image.fromarray(np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8))
    
    # Apply transform
    tensor = transform(img)
    
    assert isinstance(tensor, torch.Tensor)
    assert tensor.shape == (3, 224, 224)
    assert tensor.min() >= -3.0  # Normalized
    assert tensor.max() <= 3.0


def test_dataloader_shapes(tmp_path):
    """Test dataloader output shapes."""
    # Create dummy dataset structure
    train_dir = tmp_path / "train"
    for class_name in ["class1", "class2"]:
        class_dir = train_dir / class_name
        class_dir.mkdir(parents=True)
        
        # Create dummy images
        for i in range(10):
            img = Image.new('RGB', (224, 224))
            img.save(class_dir / f"img_{i}.jpg")
    
    # Create dataloaders
    train_loader, val_loader, _ = create_dataloaders(
        data_dir=str(tmp_path),
        batch_size=4,
        num_workers=0,
    )
    
    # Test batch shape
    images, labels = next(iter(train_loader))
    assert images.shape[0] <= 4  # Batch size
    assert images.shape[1:] == (3, 224, 224)
    assert labels.shape[0] <= 4

Testing training logic:

# tests/test_train.py
import pytest
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader
from classifier.train import train_epoch, validate

@pytest.fixture
def dummy_model():
    """Create a simple model for testing."""
    return nn.Sequential(
        nn.Flatten(),
        nn.Linear(3 * 224 * 224, 10)
    )

@pytest.fixture
def dummy_dataloader():
    """Create a dummy dataloader."""
    images = torch.randn(20, 3, 224, 224)
    labels = torch.randint(0, 10, (20,))
    dataset = TensorDataset(images, labels)
    return DataLoader(dataset, batch_size=4)

def test_train_epoch(dummy_model, dummy_dataloader):
    """Test training for one epoch."""
    model = dummy_model
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters())
    device = torch.device('cpu')
    
    loss, acc = train_epoch(
        model, dummy_dataloader, criterion, optimizer, device, epoch=1
    )
    
    assert isinstance(loss, float)
    assert isinstance(acc, float)
    assert 0 <= acc <= 1
    assert loss >= 0

def test_validate(dummy_model, dummy_dataloader):
    """Test validation."""
    model = dummy_model
    criterion = nn.CrossEntropyLoss()
    device = torch.device('cpu')
    
    val_loss, val_acc = validate(model, dummy_dataloader, criterion, device)
    
    assert isinstance(val_loss, float)
    assert isinstance(val_acc, float)
    assert 0 <= val_acc <= 1

7.16.5 Fixtures for Reusability

Use conftest.py for shared fixtures:

# tests/conftest.py
import pytest
import torch
import tempfile
from pathlib import Path

@pytest.fixture
def device():
    """Get device for testing."""
    return torch.device('cuda' if torch.cuda.is_available() else 'cpu')

@pytest.fixture
def temp_checkpoint_dir():
    """Create temporary directory for checkpoints."""
    with tempfile.TemporaryDirectory() as tmpdir:
        yield Path(tmpdir)

@pytest.fixture
def sample_config():
    """Create sample configuration."""
    return {
        'model': {
            'architecture': 'resnet18',
            'num_classes': 10,
            'pretrained': False,
        },
        'training': {
            'batch_size': 32,
            'learning_rate': 0.001,
            'epochs': 1,
        }
    }

7.16.6 Running Tests

Run all tests:

uv run pytest

Run with verbose output:

uv run pytest -v

Run specific test file:

uv run pytest tests/test_models.py

Run specific test:

uv run pytest tests/test_models.py::test_resnet18_creation

Run tests matching pattern:

uv run pytest -k "model"  # Runs all tests with "model" in name

Run in parallel (faster):

uv run pytest -n auto  # Use all CPUs

Stop on first failure:

uv run pytest -x

Show local variables on failure:

uv run pytest -l

7.16.7 Code Coverage

Generate coverage report:

uv run pytest --cov=classifier --cov-report=term

Output:

---------- coverage: platform linux, python 3.11.9 -----------
Name                        Stmts   Miss  Cover
-----------------------------------------------
src/classifier/__init__.py      2      0   100%
src/classifier/data.py         45      3    93%
src/classifier/models.py       67      5    93%
src/classifier/train.py        89     12    87%
-----------------------------------------------
TOTAL                         203     20    90%
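Each Cover value is the share of statements that ran, (Stmts - Miss) / Stmts, rounded to a whole percent. coverage.py also avoids rounding a partially covered file up to 100%. Recomputing the rows above makes the arithmetic explicit:

```python
# Recompute the "Cover" column of the report above: (Stmts - Miss) / Stmts
rows = {
    "src/classifier/__init__.py": (2, 0),
    "src/classifier/data.py": (45, 3),
    "src/classifier/models.py": (67, 5),
    "src/classifier/train.py": (89, 12),
}


def cover_percent(stmts: int, miss: int) -> int:
    """Percentage of statements executed, rounded to a whole number."""
    return round(100 * (stmts - miss) / stmts)


for name, (stmts, miss) in rows.items():
    print(f"{name:<28} {cover_percent(stmts, miss):>3}%")

total_stmts = sum(s for s, _ in rows.values())
total_miss = sum(m for _, m in rows.values())
print(f"{'TOTAL':<28} {cover_percent(total_stmts, total_miss):>3}%")
```

The total is computed from summed statement counts (183 of 203), not by averaging the per-file percentages.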

Generate HTML coverage report:

uv run pytest --cov=classifier --cov-report=html

This creates htmlcov/index.html showing:

  • Which lines are covered

  • Which branches are taken (when branch measurement is enabled, e.g. with --cov-branch)

  • Which functions are tested

Coverage requirements:

For production ML code, aim for:

  • >80% coverage for data pipelines (data loading, preprocessing)

  • >90% coverage for model architectures

  • >70% coverage for training loops (some branches hard to test)

  • 100% coverage for utility functions

Coverage in CI:

uv run pytest --cov=classifier --cov-report=term --cov-fail-under=80
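`--cov-fail-under` checks only the total percentage. Per-component targets like the ones listed above need a small script of your own. One option reads the XML report, since `--cov-report=xml` writes coverage.xml in Cobertura format and each `<class>` element there carries a `filename` and a `line-rate`.

The following is a minimal sketch. The threshold table, the match-by-file-name rule and the `main` helper are illustrative assumptions, not a pytest-cov feature:

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath

# Hypothetical per-file targets (fraction of lines covered), keyed by file name
THRESHOLDS = {
    "data.py": 0.80,
    "models.py": 0.90,
    "train.py": 0.70,
}


def check_thresholds(xml_text: str, thresholds: dict[str, float]) -> list[str]:
    """Return one message per file below its target; an empty list means all passed."""
    root = ET.fromstring(xml_text)
    failures = []
    for cls in root.iter("class"):
        filename = cls.get("filename", "")
        minimum = thresholds.get(PurePosixPath(filename).name)
        if minimum is None:
            continue  # no target configured for this file
        rate = float(cls.get("line-rate", "0"))
        if rate < minimum:
            failures.append(f"{filename}: {rate:.0%} < {minimum:.0%}")
    return failures


def main(path: str = "coverage.xml") -> int:
    """Print failures and return a process exit code (1 if any target is missed)."""
    with open(path, encoding="utf-8") as f:
        failures = check_thresholds(f.read(), THRESHOLDS)
    for message in failures:
        print(message)
    return 1 if failures else 0
```

In CI, run `uv run pytest --cov=classifier --cov-report=xml`, then call `sys.exit(main())` from a script so the job fails when any component misses its target.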

7.16.8 pytest Configuration

Add pytest settings to pyproject.toml:

[tool.pytest.ini_options]
# Test discovery
testpaths = ["tests"]
python_files = ["test_*.py"]
python_functions = ["test_*"]
python_classes = ["Test*"]

# Output options
addopts = [
    "--strict-markers",
    "--strict-config",
    "-ra",                    # Show summary of all test outcomes
    "--showlocals",          # Show local variables on failure
    "--tb=short",            # Shorter traceback format
]

# Markers
markers = [
    "slow: marks tests as slow (deselect with '-m \"not slow\"')",
    "gpu: marks tests as requiring GPU",
    "integration: marks tests as integration tests",
]

# Coverage options
[tool.coverage.run]
source = ["src"]
omit = ["tests/*", "*/site-packages/*"]

[tool.coverage.report]
exclude_lines = [
    "pragma: no cover",
    "def __repr__",
    "raise AssertionError",
    "raise NotImplementedError",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
]
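Each `exclude_lines` entry is a regular expression that coverage.py searches for within each source line. If the matching line opens a block, the whole block is excluded. The `.` wildcards in `.__main__.` stand in for either quote character. A quick check with `re` shows how the patterns match (the sample lines are made up):

```python
import re

patterns = [
    "pragma: no cover",
    "if __name__ == .__main__.:",
    "if TYPE_CHECKING:",
]

lines = [
    'if __name__ == "__main__":',
    "if __name__ == '__main__':",
    "    return x  # pragma: no cover",
    "if TYPE_CHECKING:",
    "result = train(model)",
]


def is_excluded(line: str) -> bool:
    """True if any exclusion pattern matches somewhere in the line."""
    return any(re.search(p, line) for p in patterns)


for line in lines:
    print(f"{is_excluded(line)!s:<5} {line}")
```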

7.16.9 Advanced Testing Patterns

Parameterized tests:

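import pytest
import torch
from torch.utils.data import DataLoader, TensorDataset

@pytest.fixture
def tmp_dataset():
    return TensorDataset(torch.randn(128, 3, 32, 32), torch.randint(0, 10, (128,)))  # 128 samples

@pytest.mark.parametrize('batch_size,expected_batches', [
    (32, 4),
    (16, 8),
    (8, 16),
])
def test_dataloader_batching(batch_size, expected_batches, tmp_dataset):
    loader = DataLoader(tmp_dataset, batch_size=batch_size)
    assert len(loader) == expected_batches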

Testing exceptions:

def test_invalid_architecture():
    with pytest.raises(ValueError, match="Unknown architecture"):
        create_model(architecture="invalid", num_classes=10)

Skipping tests conditionally:

@pytest.mark.skipif(not torch.cuda.is_available(), reason="Requires GPU")
def test_gpu_training():
    model = create_model(architecture='resnet18', num_classes=10).cuda()
    # ... GPU-specific test

Slow test marker:

@pytest.mark.slow
def test_full_training_run():
    # This test takes 5 minutes
    pass

# Run fast tests only: uv run pytest -m "not slow"

7.17 Documentation with Quarto

For ML projects, documentation serves multiple purposes:

  1. Code documentation: API docs for functions and classes
  2. Experiment reports: Document training runs and results
  3. Model cards: Document model architecture, performance, limitations
  4. Tutorials: Show how to use your models

7.17.1 Quarto for ML Reports

Quarto suits ML documentation well because it supports:

  • Executable code: Run training scripts and show results
  • Multiple languages: Python, R, and Julia in the same document
  • Rich outputs: Plots, tables, interactive visualizations
  • Multiple formats: HTML, PDF, presentations, websites

Installation:

# Install Quarto (not via uv)
# macOS
brew install quarto

# Linux
wget https://github.com/quarto-dev/quarto-cli/releases/download/v1.4.549/quarto-1.4.549-linux-amd64.deb
sudo dpkg -i quarto-1.4.549-linux-amd64.deb

# Windows - download from https://quarto.org

Example experiment report (reports/experiment_001.qmd):

---
title: "ResNet18 Image Classification"
author: "Mike"
date: "2024-11-09"
format:
  html:
    code-fold: true
    toc: true
---

## Objective

Train ResNet18 on CIFAR-10 dataset to achieve >90% accuracy.

## Environment Setup

```{python}
import sys
sys.path.insert(0, '../src')

import torch
from classifier.models import create_model
from classifier.train import train_model
import matplotlib.pyplot as plt
import pandas as pd
```

## Model Architecture

```{python}
model = create_model(architecture='resnet18', num_classes=10, pretrained=False)
print(f"Total parameters: {sum(p.numel() for p in model.parameters()):,}")
```

## Training Configuration

```{python}
config = {
    'model': {'architecture': 'resnet18', 'num_classes': 10},
    'training': {
        'batch_size': 128,
        'learning_rate': 0.001,
        'epochs': 50,
        'optimizer': 'Adam',
    }
}

pd.DataFrame([config['training']]).T
```

## Training Results

```{python}
# Load training logs
logs = pd.read_csv('../experiments/exp_001/metrics.csv')

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))

# Plot loss
ax1.plot(logs['epoch'], logs['train_loss'], label='Train')
ax1.plot(logs['epoch'], logs['val_loss'], label='Validation')
ax1.set_xlabel('Epoch')
ax1.set_ylabel('Loss')
ax1.set_title('Training and Validation Loss')
ax1.legend()
ax1.grid(True)

# Plot accuracy
ax2.plot(logs['epoch'], logs['train_acc'], label='Train')
ax2.plot(logs['epoch'], logs['val_acc'], label='Validation')
ax2.set_xlabel('Epoch')
ax2.set_ylabel('Accuracy')
ax2.set_title('Training and Validation Accuracy')
ax2.legend()
ax2.grid(True)

plt.tight_layout()
plt.show()
```

## Final Performance

```{python}
best_epoch = logs.loc[logs['val_acc'].idxmax()]
print(f"Best validation accuracy: {best_epoch['val_acc']:.2%}")
print(f"Achieved at epoch: {int(best_epoch['epoch'])}")
print(f"Test accuracy: {best_epoch['test_acc']:.2%}")
```

## Confusion Matrix

```{python}
from sklearn.metrics import confusion_matrix
import seaborn as sns

# Load predictions
y_true = ...  # Load true labels
y_pred = ...  # Load predictions

cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues')
plt.title('Confusion Matrix')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.show()
```

## Conclusion

- Achieved the best validation accuracy reported above
- Model converged within the configured number of epochs
- Ready for deployment

Render the report:

quarto render reports/experiment_001.qmd

This generates reports/experiment_001.html with all results embedded.
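The Final Performance cell picks the best epoch with pandas' `idxmax`. You might need the same selection outside Quarto, for example in a CI step that should not depend on pandas. Here is a sketch using only the standard library, assuming metrics.csv has the `epoch` and `val_acc` columns the report reads. The sample rows are toy values:

```python
import csv
import io


def best_epoch(csv_text: str) -> tuple[int, float]:
    """Return (epoch, val_acc) for the row with the highest validation accuracy."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # max() keeps the first row on ties, like pandas' idxmax
    best = max(rows, key=lambda r: float(r["val_acc"]))
    return int(best["epoch"]), float(best["val_acc"])


# Toy metrics for illustration only
sample = """epoch,train_loss,val_loss,train_acc,val_acc
1,1.20,1.10,0.55,0.58
2,0.80,0.85,0.71,0.70
3,0.60,0.90,0.79,0.68
"""

epoch, acc = best_epoch(sample)
print(f"Best validation accuracy: {acc:.2%} at epoch {epoch}")
```

For a real run, pass the file contents instead, e.g. `best_epoch(open("experiments/exp_001/metrics.csv").read())`.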

7.17.2 Model Cards

Document your models with Quarto model cards:

---
title: "ResNet18 CIFAR-10 Classifier"
subtitle: "Model Card"
format:
  html:
    toc: true
---

## Model Details

- **Model Name**: ResNet18 CIFAR-10 Classifier
- **Version**: 1.0.0
- **Date**: 2024-11-09
- **Architecture**: ResNet18
- **Framework**: PyTorch 2.3.1

## Intended Use

This model classifies images into 10 CIFAR-10 categories:
airplane, automobile, bird, cat, deer, dog, frog, horse, ship, truck.

**Primary uses:**
- Educational demonstrations
- Baseline for computer vision research
- Image classification API

**Out-of-scope uses:**
- Medical diagnosis
- Safety-critical applications
- Real-world deployment without validation

## Training Data

- **Dataset**: CIFAR-10
- **Size**: 50,000 training images, 10,000 test images
- **Resolution**: 32×32 RGB images
- **Splits**: 45,000 train / 5,000 validation / 10,000 test

## Performance

| Split | Accuracy |
|-------|----------|
| Train | 98.5% |
| Validation | 92.3% |
| Test | 91.8% |

## Limitations

- Only works on 32×32 images
- Performance degrades on images outside CIFAR-10 distribution
- No adversarial robustness
- Bias towards training distribution

## Ethical Considerations

- Dataset contains potential biases in category representation
- Should not be used for surveillance applications
- Consider privacy implications when deploying

7.18 Complete Development Workflow

Putting it all together, here’s a complete development cycle:

7.18.1 Daily Development Cycle

# 1. Pull latest changes
git pull

# 2. Sync environment
uv sync --all-extras

# 3. Make changes to code
# ... edit files ...

# 4. Format code
uv run ruff format

# 5. Fix linting issues
uv run ruff check --fix

# 6. Verify remaining issues
uv run ruff check

# 7. Type check
uv run mypy src/

# 8. Run tests
uv run pytest

# 9. Check coverage
uv run pytest --cov=classifier --cov-report=term

# 10. Commit changes
git add .
git commit -m "Add feature X"
git push

7.18.2 Before Committing Checklist

Create a Makefile to automate checks (make requires each recipe line to start with a tab character, not spaces):

.PHONY: format lint typecheck test coverage check all clean

format:
	uv run ruff format

lint:
	uv run ruff check --fix
	uv run ruff check

typecheck:
	uv run mypy src/

test:
	uv run pytest -v

coverage:
	uv run pytest --cov=classifier --cov-report=html --cov-report=term

check: format lint typecheck test

all: check coverage

clean:
	rm -rf .venv
	rm -rf htmlcov/
	rm -rf .mypy_cache/
	rm -rf .pytest_cache/
	rm -rf .ruff_cache/
	find . -type d -name __pycache__ -exec rm -rf {} +
	find . -type f -name "*.pyc" -delete

Usage:

# Run all checks before committing
make check

# Generate coverage report
make coverage

# Clean up artifacts
make clean
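make is not installed by default on Windows. A cross-platform alternative is a small Python runner that executes the same commands in order and stops at the first failure, as `make check` does. This is a sketch; the file name (say, scripts/check.py) and the helper names are hypothetical:

```python
import subprocess
import sys

# Same steps as the `check` target: format, lint, typecheck, test
CHECKS = [
    ["uv", "run", "ruff", "format"],
    ["uv", "run", "ruff", "check", "--fix"],
    ["uv", "run", "ruff", "check"],
    ["uv", "run", "mypy", "src/"],
    ["uv", "run", "pytest", "-v"],
]


def run_checks(commands: list[list[str]]) -> int:
    """Run each command in order; return the first non-zero exit code, or 0."""
    for cmd in commands:
        print("$", " ".join(cmd), flush=True)
        result = subprocess.run(cmd)
        if result.returncode != 0:
            return result.returncode  # stop at the first failing step
    return 0


def main() -> None:
    sys.exit(run_checks(CHECKS))
```

Add an `if __name__ == "__main__": main()` guard at the bottom of the file and run it with `uv run python scripts/check.py`.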

7.18.3 Pre-commit Hooks (Optional)

For automatic checking, install pre-commit:

uv add --dev pre-commit

Create .pre-commit-config.yaml:

repos:
  - repo: local
    hooks:
      - id: ruff-format
        name: Ruff Format
        entry: uv run ruff format
        language: system
        types: [python]
        
      - id: ruff-check
        name: Ruff Check
        entry: uv run ruff check --fix
        language: system
        types: [python]
        
      - id: mypy
        name: mypy
        entry: uv run mypy
        language: system
        types: [python]
        pass_filenames: false
        args: [src/]
        
      - id: pytest-fast
        name: pytest (fast tests only)
        entry: uv run pytest -m "not slow"
        language: system
        pass_filenames: false
        always_run: true

Install hooks:

uv run pre-commit install

Now checks run automatically on git commit.

7.18.4 CI/CD Pipeline

Create .github/workflows/test.yml:

name: Test

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Install uv
        run: curl -LsSf https://astral.sh/uv/install.sh | sh
        
      - name: Add uv to PATH
        run: echo "$HOME/.local/bin" >> $GITHUB_PATH
      
      - name: Sync dependencies
        run: uv sync --all-extras
      
      - name: Format check
        run: uv run ruff format --check
      
      - name: Lint
        run: uv run ruff check
      
      - name: Type check
        run: uv run mypy src/
      
      - name: Test
        run: uv run pytest --cov=classifier --cov-report=xml
      
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage.xml

7.18.5 Project Structure Best Practices

For ML projects that might integrate with R workflows or require cross-language collaboration:

flowchart TD
    root["ml-project/"]
    root --> github[".github/"]
    root --> configs["configs, training configs"]
    root --> data["data, not in git"]
    root --> docs["docs, documentation"]
    root --> experiments["experiments, tracking"]
    root --> models["models, saved models"]
    root --> notebooks["notebooks, Jupyter"]
    root --> reports["reports, Quarto reports"]
    root --> scripts["scripts, utility scripts"]
    root --> src["src, source code"]
    root --> tests["tests/"]
    root --> gitignore[".gitignore"]
    root --> pyver[".python-version"]
    root --> makefile["Makefile"]
    root --> pyproject["pyproject.toml"]
    root --> readme["README.md"]
    root --> lock["uv.lock"]
    github --> workflows["workflows/"]
    workflows --> testyml["test.yml"]
    workflows --> deployyml["deploy.yml"]
    configs --> r18["resnet18.yaml"]
    configs --> r50["resnet50.yaml"]
    data --> raw["raw/"]
    data --> processed["processed/"]
    data --> splits["splits/"]
    docs --> modelcard["model_card.qmd"]
    docs --> apidoc["api.qmd"]
    experiments --> exp001["exp_001/"]
    experiments --> exp002["exp_002/"]
    exp001 --> cfg["config.yaml"]
    exp001 --> metrics["metrics.csv"]
    exp001 --> ckpt["checkpoints/"]
    models --> prod["production/"]
    models --> staging["staging/"]
    notebooks --> eda["01-eda.ipynb"]
    notebooks --> analysis["02-analysis.ipynb"]
    reports --> exprep["experiment_001.qmd"]
    scripts --> strain["train.py"]
    scripts --> seval["evaluate.py"]
    src --> classifier["classifier/"]
    classifier --> init["__init__.py"]
    classifier --> datapy["data.py"]
    classifier --> modelspy["models.py"]
    classifier --> trainpy["train.py"]
    classifier --> evalpy["evaluate.py"]
    tests --> conftest["conftest.py"]
    tests --> tdata["test_data.py"]
    tests --> tmodels["test_models.py"]
    tests --> ttrain["test_train.py"]

.gitignore for ML projects:

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
.installed.cfg
*.egg

# Virtual environments
.venv/
venv/
ENV/
env/

# IDE
.vscode/
.idea/
*.swp
*.swo
*~

# Testing
.pytest_cache/
.coverage
htmlcov/
.mypy_cache/
.ruff_cache/

# Jupyter
.ipynb_checkpoints/

# Data (large files)
data/raw/*.jpg
data/raw/*.png
data/raw/*.zip
data/processed/*.npy
data/processed/*.h5

# Models (use Git LFS or external storage)
models/*.pth
models/*.ckpt
models/*.h5
*.onnx

# Experiment tracking
wandb/
mlruns/
.neptune/
experiments/*/checkpoints/

# Logs
logs/
*.log

# OS
.DS_Store
Thumbs.db

7.19 Summary: The Complete ML Development Stack

With uv and the modern Python toolchain, you have:

Environment Management (uv):

  • Fast, reliable package installation

  • Reproducible environments with lock files

  • Python version management

  • GPU/CPU dependency variants

Code Quality (Ruff):

  • Consistent formatting

  • Automated linting

  • Fast feedback loops

  • Catches common bugs

Type Safety (mypy):

  • Early error detection

  • Self-documenting code

  • Better IDE support

  • Refactoring confidence

Testing (pytest):

  • Unit and integration tests

  • Code coverage tracking

  • Parallel test execution

  • CI/CD integration

Documentation (Quarto):

  • Executable reports

  • Model cards

  • API documentation

  • Reproducible analyses

This toolchain creates a professional development workflow that:

  • Catches errors early (before training expensive models)

  • Ensures reproducibility (lock files + versioning)

  • Improves collaboration (consistent style + documentation)

  • Speeds up development (fast tools + automation)

The investment in setting up this infrastructure pays dividends throughout your ML project lifecycle, from initial prototyping through production deployment.

7.19.1 Setting Up the Project

Clone and set up:

# Clone repository
git clone https://github.com/user/image-classifier.git
cd image-classifier

# Install dependencies (uv reads uv.lock for exact versions)
uv sync --all-extras

# Run tests
uv run pytest

# Start training
uv run train --config configs/resnet18.yaml

The beauty of this workflow: a single uv sync command installs everything exactly as specified in the lock file. No version mismatches, no dependency conflicts, no environment inconsistencies when deploying your trained model.

7.19.2 Updating Dependencies

When you need to update packages (e.g., new PyTorch release with bug fixes):

# Update all packages to latest compatible versions
uv sync --upgrade

# Update a specific package in the lock file, then install it
uv lock --upgrade-package torch
uv sync

# Update and regenerate lock file
uv lock --upgrade

After updating, test your code thoroughly and commit the new uv.lock:

uv run pytest
git add uv.lock
git commit -m "Update dependencies - PyTorch 2.3.0"

Important for ML: When updating deep learning frameworks, always retrain key models and validate that performance hasn’t degraded. Minor version updates can sometimes change numerical precision or default behaviors.

7.20 Tools and Global Packages

Beyond project dependencies, you often want command-line tools such as ruff, black, or mypy available globally, the job pipx has traditionally done. uv covers this with uv tool.

7.20.1 Installing Global Tools

uv tool install ruff
uv tool install black
uv tool install mypy

These are installed in isolated environments but available globally. You can then use them anywhere:

ruff check .
black src/
mypy src/

7.20.2 Listing Installed Tools

uv tool list

7.20.3 Upgrading Tools

uv tool upgrade ruff
uv tool upgrade --all  # Upgrade all tools

7.20.4 Running Tools Without Installing

For one-off uses:

uv tool run ruff check .

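This runs ruff from a temporary environment without installing it as a tool. The environment is cached, so repeat runs start quickly. uvx ruff check . is shorthand for the same command.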

7.21 Migration from Other Tools

7.21.1 From pip and requirements.txt

If you have a requirements.txt:

# Create new project
uv init my-project
cd my-project

# Import requirements
uv add -r requirements.txt

Or convert to pyproject.toml manually:

dependencies = [
    "pandas==2.0.3",
    "numpy==1.24.4",
    # ... etc
]

Then:

uv sync

7.21.2 From poetry

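If migrating from poetry, you already have pyproject.toml. However, Poetry versions before 2.0 keep dependencies under [tool.poetry.dependencies] rather than in the standard [project] table that uv reads. Move them into the [project] dependencies list, rewriting caret constraints such as ^2.0 as >=2.0,<3.0. Then remove the poetry-specific sections:

# Remove poetry.lock
rm poetry.lock

# Create uv.lock and the environment
uv sync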

7.21.3 From conda

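For conda users, export the packages you explicitly requested:

conda env export --from-history > environment.yml

This produces a YAML environment file, not a requirements file. Copy the pip-installable packages from its dependencies list into requirements.txt, and while doing so:

  • drop conda-only packages
  • rename packages whose PyPI name differs (pytorch is torch on PyPI)
  • rewrite conda's single = pins as pip specifiers

Then:

uv init my-project
cd my-project
uv add -r requirements.txt

Some packages (especially system-level libraries like cudatoolkit) are conda-specific and may need alternatives or system-level installation.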

7.22 Continuous Integration

Using uv in CI/CD pipelines is straightforward and fast.

7.22.1 GitHub Actions Example

name: Tests

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Install uv
        run: |
          curl -LsSf https://astral.sh/uv/install.sh | sh
          echo "$HOME/.local/bin" >> "$GITHUB_PATH"
      
      - name: Sync dependencies
        run: uv sync --all-extras
      
      - name: Run tests
        run: uv run pytest --cov=src tests/
      
      - name: Run type checking
        run: uv run mypy src/

Because uv resolves and installs packages much faster than pip, this setup usually shortens the dependency-installation step noticeably, especially when the uv cache is restored between runs.

7.22.2 GitLab CI Example

test:
  image: python:3.12
  before_script:
    - curl -LsSf https://astral.sh/uv/install.sh | sh
    - source $HOME/.local/bin/env
  script:
    - uv sync --all-extras
    - uv run pytest

7.23 Performance Considerations

The speed of uv is one of its defining features. Here’s why it’s fast and how to maximize performance:

7.23.1 Parallel Downloads

uv downloads and installs packages concurrently. pip has traditionally downloaded packages one at a time, so on a large dependency tree much of its install time is spent waiting on the network.
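Why concurrency helps: when each download mostly waits on the network, running them concurrently brings the total time close to the slowest single download rather than the sum of all of them. The toy simulation below uses sleeps as stand-ins for downloads. It illustrates the effect only, not how uv is implemented:

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_download(name: str, seconds: float = 0.1) -> str:
    """Stand-in for a network-bound download: mostly waiting, little CPU."""
    time.sleep(seconds)
    return name


packages = ["numpy", "pandas", "torch", "torchvision", "pillow"]

start = time.perf_counter()
serial = [fake_download(p) for p in packages]
serial_time = time.perf_counter() - start

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(packages)) as pool:
    parallel = list(pool.map(fake_download, packages))
parallel_time = time.perf_counter() - start

print(f"serial:   {serial_time:.2f}s")    # roughly 5 x 0.1 s
print(f"parallel: {parallel_time:.2f}s")  # roughly 0.1 s
```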

7.23.2 Caching

uv aggressively caches downloaded wheels. Once you’ve installed pandas==2.2.2, it’s cached globally. Installing it in another project is nearly instant.

Cache location:

# macOS/Linux
~/.cache/uv/

# Windows  
%LOCALAPPDATA%\uv\cache\

7.23.3 Benchmark Comparisons

In one timing comparison, uv showed dramatic speedups:

| Tool | Time to install torch + torchvision + numpy |
|------|---------------------------------------------|
| pip | 185 seconds |
| poetry | 145 seconds |
| uv | 12 seconds |

For larger dependency trees (e.g., installing transformers with all its dependencies, or a complete data science stack), the difference is even more pronounced. This matters especially in ML workflows where you frequently create new environments for experiments or CI/CD pipelines.
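Expressed as ratios, the timings in the table above mean pip took about 15 times and poetry about 12 times as long as uv:

```python
timings = {"pip": 185, "poetry": 145, "uv": 12}  # seconds, from the table above

ratios = {tool: seconds / timings["uv"] for tool, seconds in timings.items()}

for tool, ratio in ratios.items():
    print(f"{tool:<7} {timings[tool]:>4} s  {ratio:5.1f}x the uv time")
```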

7.23.4 Tips for Maximum Performance

  1. Use the lock file: uv sync with a lock file is faster than resolving dependencies from scratch
  2. Cache in CI: Cache ~/.cache/uv in CI pipelines
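  3. Install dependencies separately from the project: uv sync --no-install-project installs every dependency but not the project itself, so a Docker layer containing your dependencies can be cached independently of source changes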
  4. Use wheels: Avoid source distributions when possible; wheels install much faster

7.24 Troubleshooting Common Issues

7.24.1 Problem: Package Not Found

error: Failed to download `package-name`

Solution: Check package name spelling. Verify it exists on PyPI. Then force uv to refresh its cached package metadata:

uv sync --refresh

7.24.2 Problem: Version Conflicts

error: No solution found when resolving dependencies

Solution: Relax version constraints. Check which packages are conflicting and update them:

uv tree  # See dependency tree

7.24.3 Problem: Python Version Not Available

error: No interpreter found for Python 3.12

Solution: Install the Python version:

uv python install 3.12

7.24.4 Problem: Import Fails in Script

ModuleNotFoundError: No module named 'torch'

Solution: Ensure you’re running with uv run:

uv run python train.py

Or sync dependencies:

uv sync
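When an import fails, first confirm which interpreter is actually running. Inside a virtual environment, such as the .venv that uv creates, `sys.prefix` differs from `sys.base_prefix`. A small diagnostic sketch:

```python
import importlib.util
import sys


def describe_environment(module: str = "torch") -> dict[str, object]:
    """Report which interpreter is running and whether `module` is importable."""
    return {
        "executable": sys.executable,
        "in_virtualenv": sys.prefix != sys.base_prefix,
        "module_found": importlib.util.find_spec(module) is not None,
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Run it once with plain `python` and once with `uv run python`. Under `uv run`, the executable should point into the project's .venv.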

7.24.5 Problem: Wrong Package Version

Solution: Check what’s installed:

uv pip list

Lock and sync to fix:

uv lock
uv sync

7.25 Best Practices for ML Projects

Based on years of machine learning development, here are recommended practices:

7.25.1 1. Always Use Lock Files

Commit uv.lock to git. This is non-negotiable for reproducible ML research and production deployments.

git add uv.lock pyproject.toml
git commit -m "Lock dependencies"

7.25.2 2. Pin Python Versions

Use .python-version to specify the exact Python version:

uv python pin 3.11.9

This prevents subtle bugs from Python version differences that can affect model training or inference.
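The pin takes effect only when the environment is created through uv. If you also want a script to fail fast when launched with a different interpreter, a small guard can compare `sys.version_info` with the pinned version.

This is a sketch. It assumes .python-version holds a plain version such as 3.11.9 and that the script runs from the project root:

```python
import sys
from pathlib import Path


def check_python_pin(pin_file: Path = Path(".python-version")) -> None:
    """Raise RuntimeError if the running interpreter doesn't match the pin."""
    if not pin_file.exists():
        return  # nothing pinned, nothing to check
    pinned = pin_file.read_text().strip()            # e.g. "3.11.9" or "3.11"
    parts = tuple(int(p) for p in pinned.split("."))
    running = tuple(sys.version_info[: len(parts)])  # compare only as many parts as pinned
    if running != parts:
        actual = ".".join(map(str, sys.version_info[:3]))
        raise RuntimeError(f"Expected Python {pinned}, running {actual}")
```

Call `check_python_pin()` at the top of a training entry point to stop early rather than produce results from an unexpected interpreter.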

7.25.3 3. Separate Development Dependencies

Keep development tools separate from training/inference dependencies:

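[dependency-groups]
dev = [
    "pytest",
    "jupyter",
    "black",
]

This is the group that uv add --dev writes to. uv sync installs it by default, while uv sync --no-dev leaves it out, which keeps your production Docker images lean.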

7.25.4 4. Document Environment Setup

Include clear instructions in README.md:

## Setup

1. Install uv: `curl -LsSf https://astral.sh/uv/install.sh | sh`
2. Sync environment: `uv sync --all-extras`
3. Train model: `uv run train --config configs/resnet50.yaml`
4. Evaluate: `uv run evaluate --checkpoint models/best.pth`

7.25.5 5. Use Scripts for Reproducibility

Define scripts in pyproject.toml:

[project.scripts]
preprocess = "classifier.data:preprocess"
train = "classifier.train:main"
evaluate = "classifier.evaluate:main"
infer = "classifier.inference:predict"

Then document the ML pipeline:

uv run preprocess --data data/raw/
uv run train --epochs 100 --lr 0.001
uv run evaluate --model models/checkpoint.pth
uv run infer --image test.jpg

7.25.6 6. Version Control Configuration

Create a .gitignore:

# Python
__pycache__/
*.py[cod]
.ipynb_checkpoints/

# uv
.venv/

# Data (don't commit large datasets)
data/raw/*.jpg
data/raw/*.png
data/processed/

# Models (use Git LFS or external storage)
models/*.pth
models/*.ckpt
*.h5

# Experiment tracking
wandb/
mlruns/
.neptune/

# Results
results/
experiments/*/outputs/

7.25.7 7. Regular Dependency Audits

Periodically upgrade dependencies and re-validate:

uv sync --upgrade
uv run pytest  # Ensure tests still pass
# Re-run key training experiments to validate

7.25.8 8. Use Inline Scripts for Quick Experiments

For quick exploratory work or prototyping:

# /// script
# dependencies = [
#   "torch",
#   "torchvision",
#   "matplotlib",
# ]
# ///

import torch
import torchvision.models as models
import matplotlib.pyplot as plt

# Quick model prototyping
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# ... experiment code ...

Run with:

uv run experiment.py

7.25.9 9. GPU Environment Management

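For projects that need both CUDA and CPU-only builds of PyTorch, define one extra per variant. PyTorch has no cuda extra: the two builds come from different PyTorch package indexes. Route each extra to its own index and declare the extras as mutually exclusive:

[project.optional-dependencies]
gpu = ["torch>=2.0.0"]
cpu = ["torch>=2.0.0"]

[tool.uv]
conflicts = [[{ extra = "cpu" }, { extra = "gpu" }]]
index = [
    { name = "pytorch-cpu", url = "https://download.pytorch.org/whl/cpu", explicit = true },
    { name = "pytorch-cu121", url = "https://download.pytorch.org/whl/cu121", explicit = true },
]

[tool.uv.sources]
torch = [
    { index = "pytorch-cpu", extra = "cpu" },
    { index = "pytorch-cu121", extra = "gpu" },
]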

Then install based on your environment:

# On GPU machine
uv sync --extra gpu

# On CPU-only machine
uv sync --extra cpu

7.26 Working with Deep Learning Frameworks and GPUs

One of the most common pain points in ML development is managing deep learning frameworks, especially when dealing with CUDA and GPU support. uv simplifies this process significantly.

7.26.1 PyTorch with CUDA Support

PyTorch offers different packages for CPU-only and CUDA-enabled versions. With uv, you can manage these elegantly:

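Option 1: One extra per build

PyTorch's CPU-only and CUDA builds share the same package name and differ only in the index they are downloaded from. Listing torch in two extras is therefore not enough on its own. Each extra also has to be routed to its own index, and the two extras declared as conflicting:

[project]
dependencies = [
    "numpy>=1.24.0",
    "pillow>=10.0.0",
]

[project.optional-dependencies]
cuda = ["torch>=2.0.0", "torchvision>=0.15.0"]
cpu = ["torch>=2.0.0", "torchvision>=0.15.0"]

[tool.uv]
conflicts = [[{ extra = "cpu" }, { extra = "cuda" }]]
index = [
    { name = "pytorch-cpu", url = "https://download.pytorch.org/whl/cpu", explicit = true },
    { name = "pytorch-cu121", url = "https://download.pytorch.org/whl/cu121", explicit = true },
]

[tool.uv.sources]
torch = [
    { index = "pytorch-cpu", extra = "cpu" },
    { index = "pytorch-cu121", extra = "cuda" },
]
torchvision = [
    { index = "pytorch-cpu", extra = "cpu" },
    { index = "pytorch-cu121", extra = "cuda" },
]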

Then install based on your hardware:

# On GPU machine
uv sync --extra cuda

# On CPU-only machine
uv sync --extra cpu

Option 2: Using PyTorch index for CUDA versions

PyTorch hosts CUDA-specific builds on their own index:

# Add PyTorch with CUDA 12.1 support
uv add torch torchvision --index-url https://download.pytorch.org/whl/cu121

Or in pyproject.toml:

[tool.uv]
extra-index-url = ["https://download.pytorch.org/whl/cu121"]

[project]
dependencies = [
    "torch>=2.0.0",
    "torchvision>=0.15.0",
]

7.26.2 TensorFlow with GPU Support

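TensorFlow 2.x simplifies GPU support on Linux: the and-cuda extra installs the NVIDIA CUDA libraries as Python packages. Quote the requirement so the shell doesn't treat > as a redirect:

# TensorFlow with GPU support (Linux)
uv add "tensorflow[and-cuda]>=2.15.0"

Or for CPU-only:

uv add "tensorflow>=2.15.0"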

7.26.3 JAX with GPU Support

JAX's cuda12 extra installs CUDA 12 and cuDNN as Python packages (Linux only). Recent JAX releases publish these wheels on PyPI, so no extra index is needed:

# JAX with CUDA 12 support
uv add "jax[cuda12]>=0.4.20"

7.26.4 Verifying GPU Access

Create a simple verification script:

# /// script
# dependencies = [
#   "torch",
# ]
# ///

import torch

print(f"PyTorch version: {torch.__version__}")
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"CUDA version: {torch.version.cuda}")
    print(f"Number of GPUs: {torch.cuda.device_count()}")
    print(f"GPU name: {torch.cuda.get_device_name(0)}")

Run with:

uv run verify_gpu.py

7.26.5 Managing Multiple Framework Versions

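For projects that need to test across different framework versions, define one extra per version. These extras pin incompatible versions, so declare them as conflicting to let uv lock each one separately:

[project.optional-dependencies]
torch-2-0 = ["torch==2.0.1", "torchvision==0.15.2"]
torch-2-1 = ["torch==2.1.2", "torchvision==0.16.2"]
torch-2-3 = ["torch==2.3.1", "torchvision==0.18.1"]

[tool.uv]
conflicts = [[{ extra = "torch-2-0" }, { extra = "torch-2-1" }, { extra = "torch-2-3" }]]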

Then test with different versions:

uv sync --extra torch-2-0
uv run pytest

uv sync --extra torch-2-1
uv run pytest
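Repeating the sync-and-test pair for every extra gets tedious, and a small driver can generate and run the commands for each version. The sketch below builds exactly the command pairs shown above; `matrix_commands` and `run_matrix` are hypothetical helper names:

```python
import subprocess

EXTRAS = ["torch-2-0", "torch-2-1", "torch-2-3"]


def matrix_commands(extras: list[str]) -> list[list[list[str]]]:
    """For each extra, the commands to sync it and then run the test suite."""
    return [
        [["uv", "sync", "--extra", extra], ["uv", "run", "pytest"]]
        for extra in extras
    ]


def run_matrix(extras: list[str]) -> dict[str, bool]:
    """Run every extra's commands; record whether all of them succeeded."""
    results = {}
    for extra, commands in zip(extras, matrix_commands(extras)):
        ok = True
        for cmd in commands:
            if subprocess.run(cmd).returncode != 0:
                ok = False
                break  # skip the tests if the sync itself failed
        results[extra] = ok
    return results
```

`run_matrix(EXTRAS)` returns a mapping such as `{"torch-2-0": True, ...}`, which you can print as a summary at the end of the run.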

7.26.6 Hugging Face Transformers

For NLP tasks with transformers:

uv add transformers datasets tokenizers accelerate

For training large models with optimizations:

uv add "transformers[torch]" datasets accelerate bitsandbytes

7.26.7 Common ML Stack

Here’s a comprehensive ML dependency setup:

[project]
name = "ml-project"
version = "0.1.0"
requires-python = ">=3.11"

dependencies = [
    # Core scientific computing
    "numpy>=1.24.0,<2.0.0",
    "scipy>=1.11.0",
    "pandas>=2.0.0",
    
    # Visualization
    "matplotlib>=3.7.0",
    "seaborn>=0.12.0",
    "plotly>=5.14.0",
    
    # ML utilities
    "scikit-learn>=1.3.0",
    "tqdm>=4.65.0",
    "pyyaml>=6.0",
]

[project.optional-dependencies]
# Deep learning
pytorch = [
    "torch>=2.0.0,<3.0.0",
    "torchvision>=0.15.0",
    "torchaudio>=2.0.0",
    "lightning>=2.0.0",
]

tensorflow = [
    "tensorflow[and-cuda]>=2.15.0",
    "tensorboard>=2.15.0",
]

# NLP
nlp = [
    "transformers>=4.30.0",
    "datasets>=2.12.0",
    "tokenizers>=0.13.0",
    "sentencepiece>=0.1.99",
]

# Computer vision
cv = [
    "opencv-python>=4.8.0",
    "albumentations>=1.3.1",
    "timm>=0.9.0",
]

# Experiment tracking
tracking = [
    "wandb>=0.15.0",
    "mlflow>=2.5.0",
    "tensorboard>=2.13.0",
]

# Optimization
optimization = [
    "optuna>=3.2.0",
    "ray[tune]>=2.5.0",
]

# Development
dev = [
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.5.0",
    "jupyter>=1.0.0",
    "ipykernel>=6.25.0",
]

Install what you need:

# Full PyTorch stack with NLP
uv sync --extra pytorch --extra nlp --extra tracking --extra dev

# TensorFlow with computer vision
uv sync --extra tensorflow --extra cv --extra tracking --extra dev

7.26.8 Docker Integration

Create a Dockerfile that uses uv:

FROM nvidia/cuda:12.1.0-base-ubuntu22.04

# Install uv by copying its static binaries from the official image
COPY --from=ghcr.io/astral-sh/uv:latest /uv /uvx /bin/

WORKDIR /app

# Install dependencies first so this layer stays cached until the lock changes;
# uv downloads a managed Python matching .python-version if none is present
COPY pyproject.toml uv.lock .python-version ./
RUN uv sync --locked --no-dev --no-install-project

# Copy source code and configs, then install the project itself
COPY README.md ./
COPY src/ ./src/
COPY configs/ ./configs/
RUN uv sync --locked --no-dev

# Run training
CMD ["uv", "run", "train", "--config", "configs/production.yaml"]

Build and run (--gpus requires the NVIDIA Container Toolkit on the host):

docker build -t ml-model:latest .
docker run --gpus all ml-model:latest

7.26.9 CUDA Version Management

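Different projects may need different CUDA versions, so record the requirement both in pyproject.toml and in the README:

# pyproject.toml
# PyTorch with CUDA 12.1
[[tool.uv.index]]
name = "pytorch-cu121"
url = "https://download.pytorch.org/whl/cu121"
explicit = true

[tool.uv.sources]
torch = [{ index = "pytorch-cu121" }]
torchvision = [{ index = "pytorch-cu121" }]

[project]
dependencies = [
    "torch>=2.3.0",
    "torchvision>=0.18.0",
]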

In README:

## Requirements

- NVIDIA driver 530 or later (required for CUDA 12.1)
- No separate CUDA toolkit needed: the PyTorch cu121 wheels bundle the CUDA 12.1 runtime
- 8GB+ GPU memory (recommended)

## Installation

```bash
# Check the driver version and the highest CUDA version it supports
nvidia-smi

# Install dependencies
uv sync --all-extras
```

7.26.10 Mixed Precision Training

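For mixed precision training (crucial for large models), native PyTorch automatic mixed precision (AMP) ships with torch itself (since torch 1.6), so no extra package is needed:

uv add torch torchvision

NVIDIA Apex is only needed for older codebases written against its own AMP API:

# Apex for older codebases that still import it
uv add git+https://github.com/NVIDIA/apex.git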

7.26.11 Memory Optimization Libraries

For large models that don’t fit in GPU memory:

# DeepSpeed for distributed training
uv add deepspeed

# bitsandbytes for quantization
uv add bitsandbytes

# Flash Attention for efficient attention (compiles against the installed
# torch, so add torch first and disable build isolation)
uv add flash-attn --no-build-isolation
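Before reaching for these libraries, it helps to estimate whether a model fits at all. A common rule of thumb counts bytes per parameter for the model states alone. Fp32 training with Adam needs 4 (weights) + 4 (gradients) + 8 (two Adam moments) = 16 bytes. Mixed-precision Adam keeps fp16 weights and gradients plus an fp32 master copy and moments, 2 + 2 + 4 + 8 = 16 bytes, the same accounting DeepSpeed's ZeRO work uses. Activations, temporary buffers, and framework overhead come on top, so the function below, a sketch under those assumptions, gives a lower bound:

```python
# Approximate bytes per parameter for model states only; activations,
# temporary buffers, and allocator overhead come on top.
BYTES_PER_PARAM: dict[str, float] = {
    "train_fp32_adam": 4 + 4 + 8,       # weights + gradients + Adam m and v
    "train_mixed_adam": 2 + 2 + 4 + 8,  # fp16 weights/grads + fp32 master + Adam
    "inference_fp32": 4,
    "inference_fp16": 2,
    "inference_int8": 1,
    "inference_4bit": 0.5,              # ignores quantization constants
}


def model_state_gib(num_params: int, mode: str) -> float:
    """Lower-bound GiB for a model's weights (plus optimizer state if training)."""
    if mode not in BYTES_PER_PARAM:
        raise ValueError(f"unknown mode {mode!r}; choose from {sorted(BYTES_PER_PARAM)}")
    return num_params * BYTES_PER_PARAM[mode] / 2**30
```

For example, a 7-billion-parameter model needs roughly 13 GiB just for fp16 weights and about 104 GiB of model states for full Adam training. That gap is exactly what quantization (bitsandbytes) and optimizer-state sharding (DeepSpeed) attack.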

7.26.12 Troubleshooting GPU Issues

Problem: CUDA not detected

# Check PyTorch installation
uv run python -c "import torch; print(torch.cuda.is_available())"

Solution: A CPU-only build of PyTorch reports torch.version.cuda as None. In that case, reinstall a CUDA-enabled build from the PyTorch index:

uv add torch --index-url https://download.pytorch.org/whl/cu121

Problem: Out of memory errors

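Enable gradient checkpointing and mixed precision. The snippet assumes model, optimizer, criterion, and dataloader are already defined:

import torch

# Gradient checkpointing: Hugging Face Transformers models provide this
# method; plain nn.Module code can use torch.utils.checkpoint instead
model.gradient_checkpointing_enable()

# Automatic mixed precision (torch>=2.4 prefers torch.amp.GradScaler("cuda"))
scaler = torch.cuda.amp.GradScaler()

for inputs, labels in dataloader:
    inputs, labels = inputs.cuda(), labels.cuda()
    optimizer.zero_grad()

    # Run the forward pass in float16 where it is numerically safe
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        outputs = model(inputs)
        loss = criterion(outputs, labels)

    # Scale the loss to avoid float16 gradient underflow, then step and update
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()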

Problem: Different CUDA versions on different machines

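uv keeps one uv.lock per project and has no option for writing per-machine lock files. Instead, declare one extra per CUDA version, mark the extras as conflicting so uv resolves them separately, and route torch to the matching PyTorch index. This requires pytorch-cu118 and pytorch-cu121 to be declared as [[tool.uv.index]] entries (explicit = true) pointing at https://download.pytorch.org/whl/cu118 and https://download.pytorch.org/whl/cu121:

[project.optional-dependencies]
cu118 = ["torch>=2.3.0"]
cu121 = ["torch>=2.3.0"]

[tool.uv]
conflicts = [[{ extra = "cu118" }, { extra = "cu121" }]]

[tool.uv.sources]
torch = [
    { index = "pytorch-cu118", extra = "cu118" },
    { index = "pytorch-cu121", extra = "cu121" },
]

uv sync --extra cu121   # on the CUDA 12.1 machine
uv sync --extra cu118   # on the CUDA 11.8 machine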
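nvidia-smi reports the highest CUDA version the installed driver supports. The sketch below is a hypothetical helper, not part of uv. It parses that field from nvidia-smi's text output and conservatively picks the newest extra the driver can run, assuming extras named cu118 and cu121 that route torch to the matching PyTorch index. The header format is assumed from typical nvidia-smi output:

```python
import re

# Extras declared in pyproject.toml, keyed by the CUDA version they target
CUDA_EXTRAS = {(11, 8): "cu118", (12, 1): "cu121"}


def driver_cuda_version(nvidia_smi_output: str) -> tuple[int, int]:
    """Parse the 'CUDA Version: X.Y' field from nvidia-smi's header."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found in nvidia-smi output")
    return int(match.group(1)), int(match.group(2))


def pick_extra(max_cuda: tuple[int, int]) -> str:
    """Choose the newest extra whose CUDA version the driver supports."""
    supported = [v for v in CUDA_EXTRAS if v <= max_cuda]
    if not supported:
        raise ValueError(f"driver supports CUDA {max_cuda}, older than every extra")
    return CUDA_EXTRAS[max(supported)]
```

On a GPU machine, feed it `subprocess.run(["nvidia-smi"], capture_output=True, text=True, check=True).stdout` and pass the result to `uv sync --extra`.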

7.27 Integration with Other Tools

7.27.1 Pre-commit Hooks

Use uv with pre-commit for code quality:

# .pre-commit-config.yaml
repos:
  - repo: local
    hooks:
      - id: ruff
        name: ruff
        entry: uv run ruff check --fix
        language: system
        types: [python]
      
      - id: black
        name: black
        entry: uv run black
        language: system
        types: [python]
      
      - id: mypy
        name: mypy
        entry: uv run mypy
        language: system
        types: [python]

7.27.2 VS Code Configuration

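Point VS Code at the .venv environment that uv creates by adding the following to .vscode/settings.json. On Windows the interpreter path is .venv\Scripts\python.exe, and the formatter setting assumes the Black Formatter extension is installed: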

{
  "python.defaultInterpreterPath": ".venv/bin/python",
  "python.terminal.activateEnvironment": false,
  "python.testing.pytestEnabled": true,
  "python.testing.pytestArgs": ["tests"],
  "[python]": {
    "editor.defaultFormatter": "ms-python.black-formatter",
    "editor.formatOnSave": true,
    "editor.codeActionsOnSave": {
      "source.organizeImports": "explicit"
    }
  }
}

7.27.3 Make-based ML Workflows

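Combine with Make for complex ML workflows. Recipe lines in a Makefile must be indented with a tab character, not spaces:

# Every target is a command, not a file; data must be phony because a data/ directory exists
.PHONY: install data train train-debug evaluate tensorboard test format type-check notebook clean pipeline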

install:
    uv sync --all-extras

data:
    uv run python scripts/download_data.py
    uv run python scripts/preprocess.py

# Epoch counts come from the config files; the train CLI only accepts --config
train:
    uv run train --config configs/resnet50.yaml

train-debug:
    uv run train --config configs/debug.yaml

evaluate:
    uv run evaluate --checkpoint models/best.pth --data data/test/

tensorboard:
    uv run tensorboard --logdir experiments/

test:
    uv run pytest tests/ -v --cov=src

format:
    uv run black src/ tests/
    uv run ruff check --fix src/ tests/

type-check:
    uv run mypy src/

notebook:
    uv run jupyter lab

clean:
    rm -rf .venv
    find . -type d -name __pycache__ -exec rm -rf {} +
    rm -rf experiments/*/checkpoints/*.pth

# Complete pipeline
pipeline: data train evaluate

Usage:

# Setup and train
make install
make pipeline

# Development
make train-debug
make test
make format

7.28 Advanced Topics

7.28.1 Custom Package Indexes

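If your organization has a private PyPI server, register it as a named index:

uv add company-package --index company=https://pypi.company.com/simple/

Or in pyproject.toml, keep PyPI as the default index and pin only the internal packages to the private one:

[[tool.uv.index]]
name = "company"
url = "https://pypi.company.com/simple/"
explicit = true

[tool.uv.sources]
company-package = { index = "company" }

Routing internal packages to the private index by name guards against dependency confusion, where a public package with the same name as an internal one could otherwise be installed from PyPI.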

7.28.2 Building and Publishing Packages

To build a distribution:

uv build

This creates wheel and source distributions in dist/.

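To publish to PyPI, authenticating with a PyPI API token passed via --token or the UV_PUBLISH_TOKEN environment variable: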

uv publish

7.28.3 Workspaces

For monorepos with multiple packages:

# Root pyproject.toml
[tool.uv.workspace]
members = ["packages/*"]

Each directory matched by packages/* must contain its own pyproject.toml, and all members share a single uv.lock at the workspace root.

7.28.4 Environment Variables

Control uv behavior with environment variables:

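# Specify cache location
export UV_CACHE_DIR=/custom/cache

# Use a different package index, such as an internal PyPI mirror (placeholder URL)
export UV_INDEX_URL=https://pypi-mirror.example.com/simple/

# Increase verbosity (a command-line flag rather than an environment variable)
uv sync -v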

7.29 Comparison with Other Tools

7.29.1 uv vs pip

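| Feature | pip | uv |
|---------|-----|----|
| Speed | Baseline | 10-100x faster (per Astral's benchmarks) |
| Resolver | Backtracking (resolvelib) | PubGrub-based |
| Lock files | Manual (via pip-tools) | Built-in (uv.lock) |
| Python version management | No | Yes |
| Virtual environments | Manual | Automatic |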

7.29.2 uv vs poetry

| Feature | poetry | uv |
|---------|--------|----|
| Speed | Slow | Very fast |
| Maturity | Mature | New (but stable) |
| Plugin system | Yes | No |
| Publishing | Excellent | Good |
| Learning curve | Moderate | Low |

7.29.3 uv vs conda

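| Feature | conda | uv |
|---------|-------|----|
| Binary packages | Yes (conda packages) | PyPI wheels (builds source distributions when needed) |
| Non-Python dependencies | Yes | No |
| Speed | Slow | Very fast |
| Environment size | Large | Small |
| Scientific stack | Excellent | Good |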

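For projects whose dependencies are all published on PyPI, including compiled packages such as NumPy and PyTorch, uv is usually the better fit. For projects that depend on system libraries not distributed as wheels (for example, particular MKL or CUDA toolkit setups), conda may still be necessary.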

7.30 Real-World Example: Complete ML Project

Let’s walk through setting up a complete image classification project using PyTorch and modern best practices.

7.30.1 Step 1: Initialize Project

uv init image-classifier
cd image-classifier
uv python pin 3.11

7.30.2 Step 2: Configure pyproject.toml

[project]
name = "image-classifier"
version = "0.1.0"
description = "Deep learning image classifier using ResNet architecture"
readme = "README.md"
requires-python = ">=3.11"
authors = [
    {name = "Mike", email = "mike@marshall.usc.edu"}
]

dependencies = [
    "torch>=2.0.0,<3.0.0",
    "torchvision>=0.15.0,<1.0.0",
    "numpy>=1.24.0,<2.0.0",
    "pillow>=10.0.0",
    "matplotlib>=3.7.0",
    "scikit-learn>=1.3.0",
    "tqdm>=4.65.0",
    "pyyaml>=6.0",
    "tensorboard>=2.13.0",
]

[project.optional-dependencies]
dev = [
    "jupyter>=1.0.0",
    "ipykernel>=6.25.0",
    "pytest>=7.4.0",
    "pytest-cov>=4.1.0",
    "black>=23.0.0",
    "ruff>=0.1.0",
    "mypy>=1.5.0",
]

experiment = [
    "wandb>=0.15.0",
]

[project.scripts]
train = "classifier.train:main"
evaluate = "classifier.evaluate:main"
infer = "classifier.inference:predict"

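[build-system]
requires = ["hatchling"]
build-backend = "hatchling.build"

# The import package (classifier) differs from the project name
# (image-classifier), so tell hatchling where to find it
[tool.hatch.build.targets.wheel]
packages = ["src/classifier"]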

7.30.3 Step 3: Install Dependencies

uv sync --all-extras

7.30.4 Step 4: Create Project Structure

mkdir -p data/{raw,processed,splits}
mkdir -p models/checkpoints
mkdir -p src/classifier
touch src/classifier/__init__.py
mkdir -p notebooks
mkdir -p tests
mkdir -p configs
mkdir -p experiments

7.30.5 Step 5: Write Core Code

Create src/classifier/models.py:

"""Neural network architectures for image classification."""

import torch
import torch.nn as nn
import torchvision.models as models


def create_model(
    architecture: str = "resnet18",
    num_classes: int = 10,
    pretrained: bool = True,
    freeze_backbone: bool = False,
) -> nn.Module:
    """
    Create a model with specified architecture.
    
    Parameters
    ----------
    architecture : str
        Model architecture ('resnet18', 'resnet50', 'efficientnet_b0')
    num_classes : int
        Number of output classes
    pretrained : bool
        Use ImageNet pretrained weights
    freeze_backbone : bool
        Freeze backbone layers for transfer learning
        
    Returns
    -------
    nn.Module
        Initialized model
    """
    if architecture == "resnet18":
        model = models.resnet18(weights='IMAGENET1K_V1' if pretrained else None)
        num_features = model.fc.in_features
        model.fc = nn.Linear(num_features, num_classes)
    elif architecture == "resnet50":
        model = models.resnet50(weights='IMAGENET1K_V1' if pretrained else None)
        num_features = model.fc.in_features
        model.fc = nn.Linear(num_features, num_classes)
    elif architecture == "efficientnet_b0":
        model = models.efficientnet_b0(
            weights='IMAGENET1K_V1' if pretrained else None
        )
        num_features = model.classifier[1].in_features
        model.classifier[1] = nn.Linear(num_features, num_classes)
    else:
        raise ValueError(f"Unknown architecture: {architecture}")
    
    if freeze_backbone:
        # Freeze all layers except the final classifier
        for param in model.parameters():
            param.requires_grad = False
        
        # Unfreeze classifier
        if architecture in ["resnet18", "resnet50"]:
            for param in model.fc.parameters():
                param.requires_grad = True
        elif architecture == "efficientnet_b0":
            for param in model.classifier.parameters():
                param.requires_grad = True
    
    return model


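class Classifier(nn.Module):
    """
    Wrapper that adds dropout and a fresh classification head to a backbone.

    The backbone should return feature vectors rather than class scores,
    e.g. a ResNet whose final layer has been replaced by nn.Identity().
    """

    def __init__(
        self,
        backbone: nn.Module,
        num_classes: int,
        dropout: float = 0.5,
    ):
        super().__init__()
        self.backbone = backbone
        self.dropout = nn.Dropout(dropout)
        # LazyLinear infers the feature width on the first forward pass
        self.head = nn.LazyLinear(num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        features = self.backbone(x)
        return self.head(self.dropout(features))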

Create src/classifier/train.py:

"""Training loop for image classification."""

import torch
import torch.nn as nn
import torch.optim as optim
from torch.utils.data import DataLoader
from torch.utils.tensorboard import SummaryWriter
from pathlib import Path
from tqdm import tqdm
from typing import Dict, Tuple
import yaml

from .models import create_model
from .data import create_dataloaders
from .utils import save_checkpoint, AverageMeter


def train_epoch(
    model: nn.Module,
    dataloader: DataLoader,
    criterion: nn.Module,
    optimizer: optim.Optimizer,
    device: torch.device,
    epoch: int,
) -> Tuple[float, float]:
    """
    Train for one epoch.
    
    Returns
    -------
    tuple
        Average loss and accuracy for the epoch
    """
    model.train()
    losses = AverageMeter()
    accuracies = AverageMeter()
    
    pbar = tqdm(dataloader, desc=f"Epoch {epoch}")
    
    for images, labels in pbar:
        images = images.to(device)
        labels = labels.to(device)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward pass
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        # Calculate accuracy
        _, predicted = outputs.max(1)
        accuracy = (predicted == labels).float().mean()
        
        # Update metrics
        losses.update(loss.item(), images.size(0))
        accuracies.update(accuracy.item(), images.size(0))
        
        pbar.set_postfix({
            'loss': f'{losses.avg:.4f}',
            'acc': f'{accuracies.avg:.4f}'
        })
    
    return losses.avg, accuracies.avg


def validate(
    model: nn.Module,
    dataloader: DataLoader,
    criterion: nn.Module,
    device: torch.device,
) -> Tuple[float, float]:
    """
    Validate the model.
    
    Returns
    -------
    tuple
        Average loss and accuracy
    """
    model.eval()
    losses = AverageMeter()
    accuracies = AverageMeter()
    
    with torch.no_grad():
        for images, labels in tqdm(dataloader, desc="Validation"):
            images = images.to(device)
            labels = labels.to(device)
            
            outputs = model(images)
            loss = criterion(outputs, labels)
            
            _, predicted = outputs.max(1)
            accuracy = (predicted == labels).float().mean()
            
            losses.update(loss.item(), images.size(0))
            accuracies.update(accuracy.item(), images.size(0))
    
    return losses.avg, accuracies.avg


def train_model(config: Dict) -> None:
    """
    Main training function.
    
    Parameters
    ----------
    config : dict
        Training configuration
    """
    # Setup
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    print(f"Using device: {device}")
    
    # Create dataloaders
    train_loader, val_loader, _ = create_dataloaders(
        data_dir=config['data']['path'],
        batch_size=config['training']['batch_size'],
        num_workers=config['training']['num_workers'],
    )
    
    # Create model
    model = create_model(
        architecture=config['model']['architecture'],
        num_classes=config['model']['num_classes'],
        pretrained=config['model']['pretrained'],
        freeze_backbone=config['model'].get('freeze_backbone', False),
    )
    model = model.to(device)
    
    # Loss and optimizer
    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(
        model.parameters(),
        lr=config['training']['learning_rate'],
        weight_decay=config['training']['weight_decay'],
    )
    
    # Learning rate scheduler
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(
        optimizer,
        mode='min',
        factor=0.5,
        patience=5,
    )
    
    # Tensorboard
    writer = SummaryWriter(config['training']['log_dir'])
    
    # Training loop
    best_val_acc = 0.0
    
    for epoch in range(1, config['training']['epochs'] + 1):
        # Train
        train_loss, train_acc = train_epoch(
            model, train_loader, criterion, optimizer, device, epoch
        )
        
        # Validate
        val_loss, val_acc = validate(model, val_loader, criterion, device)
        
        # Update learning rate
        scheduler.step(val_loss)
        
        # Log metrics
        writer.add_scalar('Loss/train', train_loss, epoch)
        writer.add_scalar('Loss/val', val_loss, epoch)
        writer.add_scalar('Accuracy/train', train_acc, epoch)
        writer.add_scalar('Accuracy/val', val_acc, epoch)
        writer.add_scalar('LR', optimizer.param_groups[0]['lr'], epoch)
        
        print(f"\nEpoch {epoch}:")
        print(f"  Train Loss: {train_loss:.4f}, Train Acc: {train_acc:.4f}")
        print(f"  Val Loss: {val_loss:.4f}, Val Acc: {val_acc:.4f}")
        
        # Save checkpoint
        is_best = val_acc > best_val_acc
        best_val_acc = max(val_acc, best_val_acc)
        
        save_checkpoint(
            {
                'epoch': epoch,
                'model_state_dict': model.state_dict(),
                'optimizer_state_dict': optimizer.state_dict(),
                'val_acc': val_acc,
                'config': config,
            },
            is_best=is_best,
            checkpoint_dir=config['training']['checkpoint_dir'],
        )
    
    writer.close()
    print(f"\nTraining completed. Best validation accuracy: {best_val_acc:.4f}")


def main():
    """Entry point for training script."""
    import argparse
    
    parser = argparse.ArgumentParser(description='Train image classifier')
    parser.add_argument(
        '--config',
        type=str,
        required=True,
        help='Path to config file'
    )
    args = parser.parse_args()
    
    # Load config
    with open(args.config, 'r') as f:
        config = yaml.safe_load(f)
    
    # Train
    train_model(config)


if __name__ == '__main__':
    main()
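train.py imports save_checkpoint and AverageMeter from a classifier.utils module that is not shown in this chapter. The following is a minimal sketch of what src/classifier/utils.py could contain. The file names checkpoint.pth and best.pth are assumptions, chosen to match the models/checkpoints/best.pth path the analysis notebook loads:

```python
"""Training utilities (a hypothetical sketch of src/classifier/utils.py)."""

import shutil
from pathlib import Path
from typing import Any


class AverageMeter:
    """Track a running, sample-weighted average of a metric."""

    def __init__(self) -> None:
        self.sum = 0.0
        self.count = 0
        self.avg = 0.0

    def update(self, value: float, n: int = 1) -> None:
        # value is a per-batch mean, so weight it by the batch size n
        self.sum += value * n
        self.count += n
        self.avg = self.sum / self.count


def save_checkpoint(
    state: dict[str, Any],
    is_best: bool,
    checkpoint_dir: str,
    filename: str = "checkpoint.pth",
) -> Path:
    """Save the latest checkpoint and copy it to best.pth when it improves."""
    import torch  # imported here so AverageMeter is usable without torch

    directory = Path(checkpoint_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    torch.save(state, path)
    if is_best:
        shutil.copyfile(path, directory / "best.pth")
    return path
```

With checkpoint_dir set to models/checkpoints/ in the config, the latest epoch is always in checkpoint.pth and the best one in best.pth alongside it.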

Create src/classifier/data.py:

"""Data loading and preprocessing utilities."""

import torch
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms
from pathlib import Path
from typing import Tuple


def get_transforms(
    train: bool = True,
    image_size: int = 224,
) -> transforms.Compose:
    """
    Get data transforms for training or validation.
    
    Parameters
    ----------
    train : bool
        If True, return training transforms with augmentation
    image_size : int
        Target image size
        
    Returns
    -------
    transforms.Compose
        Composed transforms
    """
    if train:
        return transforms.Compose([
            transforms.RandomResizedCrop(image_size),
            transforms.RandomHorizontalFlip(),
            transforms.ColorJitter(
                brightness=0.2,
                contrast=0.2,
                saturation=0.2,
            ),
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225],
            ),
        ])
    else:
        return transforms.Compose([
            transforms.Resize(256),
            transforms.CenterCrop(image_size),
            transforms.ToTensor(),
            transforms.Normalize(
                mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225],
            ),
        ])


def create_dataloaders(
    data_dir: str,
    batch_size: int = 32,
    num_workers: int = 4,
    val_split: float = 0.2,
) -> Tuple[DataLoader, DataLoader, DataLoader]:
    """
    Create train, validation, and test dataloaders.
    
    Parameters
    ----------
    data_dir : str
        Path to data directory
    batch_size : int
        Batch size
    num_workers : int
        Number of workers for data loading
    val_split : float
        Validation split ratio
        
    Returns
    -------
    tuple
        Train, validation, and test dataloaders
    """
    data_path = Path(data_dir)
    
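    # Load the training folder twice: augmented for training,
    # deterministic for validation
    train_full = datasets.ImageFolder(
        data_path / 'train',
        transform=get_transforms(train=True)
    )
    val_full = datasets.ImageFolder(
        data_path / 'train',
        transform=get_transforms(train=False)
    )

    test_dataset = datasets.ImageFolder(
        data_path / 'test',
        transform=get_transforms(train=False)
    )

    # Split indices reproducibly and apply them to both views, so
    # validation images never receive training-time augmentation
    val_size = int(len(train_full) * val_split)
    train_size = len(train_full) - val_size
    indices = torch.randperm(
        len(train_full), generator=torch.Generator().manual_seed(42)
    ).tolist()

    train_subset = torch.utils.data.Subset(train_full, indices[:train_size])
    val_subset = torch.utils.data.Subset(val_full, indices[train_size:])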
    
    # Create dataloaders
    train_loader = DataLoader(
        train_subset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=True,
    )
    
    val_loader = DataLoader(
        val_subset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,
    )
    
    test_loader = DataLoader(
        test_dataset,
        batch_size=batch_size,
        shuffle=False,
        num_workers=num_workers,
        pin_memory=True,
    )
    
    return train_loader, val_loader, test_loader

7.30.6 Step 6: Write Tests

Create tests/test_models.py:

"""Tests for model architectures."""

import pytest
import torch
from classifier.models import create_model


def test_resnet18_creation():
    """Test ResNet18 model creation."""
    model = create_model(
        architecture='resnet18',
        num_classes=10,
        pretrained=False,
    )
    
    assert model is not None
    
    # Test forward pass
    x = torch.randn(2, 3, 224, 224)
    output = model(x)
    
    assert output.shape == (2, 10)


def test_model_with_frozen_backbone():
    """Test model with frozen backbone."""
    model = create_model(
        architecture='resnet18',
        num_classes=10,
        pretrained=False,  # freezing does not depend on the weights; avoids a download
        freeze_backbone=True,
    )

    trainable_params = sum(
        p.numel() for p in model.parameters() if p.requires_grad
    )

    # Only the final fc layer (512 inputs x 10 outputs + 10 biases) should train
    assert trainable_params == 512 * 10 + 10


@pytest.mark.parametrize('architecture', ['resnet18', 'resnet50'])
def test_different_architectures(architecture):
    """Test different model architectures."""
    model = create_model(
        architecture=architecture,
        num_classes=100,
        pretrained=False,
    )
    
    x = torch.randn(4, 3, 224, 224)
    output = model(x)
    
    assert output.shape == (4, 100)

7.30.7 Step 7: Create Configuration

Create configs/resnet18.yaml:

# Model configuration
model:
  architecture: resnet18
  num_classes: 10
  pretrained: true
  freeze_backbone: false

# Data configuration
data:
  path: data/  # ImageFolder layout: data/train/<class>/ and data/test/<class>/
  image_size: 224

# Training configuration
training:
  batch_size: 32
  epochs: 50
  learning_rate: 0.001
  weight_decay: 0.0001
  num_workers: 4
  checkpoint_dir: models/checkpoints/
  log_dir: experiments/resnet18/
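train_model indexes the config directly, so a typo in a key name only surfaces as a KeyError partway through setup. A small check, hypothetical and not part of the chapter's code, can run right after yaml.safe_load in main(). The key list below is read off train_model as written, where freeze_backbone is optional:

```python
from typing import Any

# Keys that train_model() reads with config[...] (freeze_backbone uses .get)
REQUIRED_KEYS = {
    "model": ["architecture", "num_classes", "pretrained"],
    "data": ["path"],
    "training": [
        "batch_size", "epochs", "learning_rate", "weight_decay",
        "num_workers", "checkpoint_dir", "log_dir",
    ],
}


def missing_config_keys(config: dict[str, Any]) -> list[str]:
    """Return dotted names of required keys absent from a loaded config."""
    missing: list[str] = []
    for section, keys in REQUIRED_KEYS.items():
        values = config.get(section) or {}
        missing += [f"{section}.{key}" for key in keys if key not in values]
    return missing
```

In main(), `if missing := missing_config_keys(config): parser.error(f"config is missing {missing}")` stops before any data loading starts.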

7.30.8 Step 8: Run Training

# Run tests first
uv run pytest tests/ -v

# Start training
uv run train --config configs/resnet18.yaml

# Monitor with tensorboard
uv run tensorboard --logdir experiments/

7.30.9 Step 9: Create Analysis Notebook

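Create notebooks/01-analysis.ipynb and start Jupyter with uv run jupyter lab, so the notebook kernel runs in the project environment. Because uv sync installs the classifier package in editable mode, no sys.path changes are needed, and no # /// script metadata block is used, since that only applies to standalone .py scripts. seaborn is not among the Step 2 dependencies, so add it first with uv add --optional dev seaborn. Then, in the notebook:

from classifier.models import create_model
from classifier.data import create_dataloaders
import torch
import matplotlib.pyplot as plt
import seaborn as sns

# Load trained model (pretrained=False: the checkpoint supplies all weights)
model = create_model('resnet18', num_classes=10, pretrained=False)
checkpoint = torch.load('../models/checkpoints/best.pth', map_location='cpu')
model.load_state_dict(checkpoint['model_state_dict'])
model.eval()

# Analyze results
_, _, test_loader = create_dataloaders('../data', batch_size=32)

# Evaluate and visualize
# ... evaluation code ...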

7.30.10 Step 10: Document

Create a comprehensive README.md:

# Image Classifier

Deep learning image classifier using PyTorch and ResNet architectures.

## Features

- Multiple architecture support (ResNet18, ResNet50, EfficientNet)
- Transfer learning with pretrained weights
- Data augmentation
- TensorBoard logging
- Comprehensive testing

## Setup

```bash
# Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/user/image-classifier.git
cd image-classifier
uv sync --all-extras
```

## Usage

### Training

```bash
uv run train --config configs/resnet18.yaml
```

### Evaluation

```bash
uv run evaluate --checkpoint models/best.pth --data data/test/
```

### Inference

```bash
uv run infer --checkpoint models/best.pth --image path/to/image.jpg
```

### Monitoring

```bash
uv run tensorboard --logdir experiments/
```

## Project Structure

```
image-classifier/
├── src/classifier/      # Source code
├── tests/              # Unit tests
├── configs/            # Training configurations
├── data/               # Datasets
├── models/             # Model checkpoints
├── notebooks/          # Jupyter notebooks
└── experiments/        # Experiment logs
```

## Results

| Model | Accuracy | Parameters |
|-------|----------|-----------|
| ResNet18 | 92.3% | 11.7M |
| ResNet50 | 94.1% | 25.6M |

## Citation

If you use this code, please cite...

7.31 Conclusion

uv represents a significant step forward in Python package management. Its speed, simplicity, and reliability make it ideal for machine learning and AI development where managing complex dependencies and ensuring reproducibility is critical. By combining package management, environment isolation, and Python version management into a single tool, uv eliminates much of the friction that has historically plagued Python ML development.

For ML practitioners, the benefits are clear:

  • Faster iteration: Less time waiting for packages means more time training models and experimenting
  • Better reproducibility: Lock files ensure your trained models can be deployed with the exact environment they were trained in
  • Simpler workflows: One tool instead of many reduces cognitive overhead
  • Production-ready: Fast, reliable dependency management makes deployment smoother

As you continue through this book, many examples will benefit from using uv for environment management. The patterns we’ve established here, using pyproject.toml, locking dependencies, and running code with uv run, will serve you well throughout your machine learning journey, from prototyping to production deployment.

7.32 Summary

In this chapter, we’ve covered:

  • Installing and configuring uv across different platforms
  • Creating and managing ML projects with proper structure
  • Handling dependencies, version constraints, and lock files
  • Managing Python versions for consistency
  • Integrating with Jupyter notebooks for experimentation
  • Building reproducible ML workflows for training and deployment
  • Troubleshooting common issues in ML environments
  • Best practices for ML/AI projects including GPU environment management

With uv in your toolkit, you're well-equipped to manage the technical infrastructure of your ML projects, allowing you to focus on what matters most: building, training, and deploying effective machine learning models. The speed and reliability of uv mean less time fighting with dependencies and more time on actual model development and experimentation.