4 Biological versus Artificial Intelligence

4.1 1. Introduction

The artificial neural network is the dominant computational metaphor of our era, and it carries an origin story that is biological. The phrase “neural network” invites the reader to imagine that a trained transformer is, in some meaningful sense, a brain in silicon. This chapter examines that invitation critically. The relationship between biological and artificial intelligence is neither one of identity nor one of total disconnection. It is a relationship of loose inspiration, occasional convergence, and frequent divergence. Understanding precisely where the analogy holds and where it collapses is essential for the practitioner who wants to reason clearly about what current systems can and cannot do, and for the researcher who wants to know whether neuroscience still has anything to offer machine learning.

We proceed from the biological substrate upward. We begin with the neuron and the synapse, the physical units of computation in the brain. We then describe how brains actually learn, which turns out to be quite different from how artificial networks learn. We introduce the artificial neuron as the abstraction it really is, namely a drastic simplification adopted for mathematical convenience rather than biological fidelity. We then catalog the major points of divergence: the credit assignment problem, energy efficiency, and the question of spiking versus rate coding. Neuromorphic computing is treated as the engineering response to some of these gaps. Finally, we survey what artificial intelligence has genuinely borrowed from neuroscience, what it has not, and why brain inspiration has limits as a research strategy.

The argument of the chapter is summarized by the comparison below, which the rest of the text unpacks and qualifies. Each row is a dimension along which the two kinds of system can be measured, and in nearly every case the two columns describe genuinely different mechanisms rather than two implementations of one idea.

Dimension	Biological neuron	Artificial unit
Signal	Discrete spikes in continuous time	Single real number per forward pass
Code	Spike timing and rate	Rate (activation) only
Weight	Dynamic stochastic chemical synapse	Single fixed scalar
Learning rule	Local, online, neuromodulated plasticity	Global, batched, exact backpropagation
Credit assignment	Approximate, local, no weight transport	Exact gradient via chain rule
Power	About twenty watts, whole brain	Kilowatts to megawatts per training run
Activity	Sparse and event driven	Dense and synchronous
Sample efficiency	Few-shot, continual, low forgetting	Data hungry, prone to catastrophic forgetting

A note on scope. This chapter compares mechanisms, not capabilities. It does not argue that one system is superior; it argues that the two are different solutions to overlapping problems, and that conflating them leads to bad intuitions about what artificial systems can do.

4.2 2. The Biological Neuron and Synapse

4.2.1 2.1 Anatomy and electrical behavior

A typical neuron consists of a cell body (the soma), a branching set of input structures (the dendrites), and a single output fiber (the axon) that itself branches to contact many downstream cells. Signals arrive at the dendrites, are integrated in the soma, and, if the integrated signal crosses a threshold, trigger an action potential: a brief, stereotyped electrical spike that propagates down the axon. The human brain contains on the order of eighty-six billion neurons, each forming thousands of connections, which yields a connectome of roughly one hundred trillion synapses [1].

The action potential is not a graded quantity. It is an all-or-nothing event produced by the rapid opening and closing of voltage-gated sodium and potassium channels in the membrane, described quantitatively by the Hodgkin and Huxley model of 1952 [2]. In that model the membrane is a capacitor in parallel with voltage-dependent conductances, and the membrane potential $V$ evolves as

\[ C_m \frac{dV}{dt} = -\,g_{\text{Na}}\, m^3 h\,(V - E_{\text{Na}}) \;-\; g_{\text{K}}\, n^4\,(V - E_{\text{K}}) \;-\; g_L\,(V - E_L) \;+\; I, \]

where $C_m$ is the membrane capacitance, the $E$ terms are reversal potentials, the $g$ terms are maximal conductances, $I$ is the injected current, and $m$, $h$, $n$ are gating variables in $[0,1]$ each obeying first-order kinetics of the form $\dot{x} = \alpha_x(V)(1-x) - \beta_x(V)\,x$. The cubic and quartic powers and the nonlinear voltage dependence of the rate functions $\alpha_x, \beta_x$ together produce the threshold and the stereotyped spike shape. A useful conceptual reduction is the leaky integrate-and-fire (LIF) neuron, which keeps only the subthreshold dynamics

\[ \tau_m \frac{dV}{dt} = -(V - V_{\text{rest}}) + R\,I(t), \]

and adds an explicit rule: when $V$ reaches a threshold $V_{\text{th}}$, the neuron emits a spike and $V$ is reset to $V_{\text{reset}}$. The LIF model is the workhorse of computational neuroscience and of most spiking neural networks precisely because it captures the essential nonlinearity, threshold then reset, at a fraction of the cost. Information in either scheme is carried not by the amplitude of any single spike but by the timing and frequency of spikes. This is a critical point of contrast with artificial networks and we will return to it.
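The LIF equation and its threshold-then-reset rule are simple enough to simulate in a few lines. The sketch below uses forward-Euler integration; the function name `simulate_lif` and all parameter values (a 20 ms time constant, 10 megaohm resistance, threshold at -54 mV) are illustrative assumptions, not values taken from the text.

```python
def simulate_lif(current, dt=1e-4, tau_m=0.02, resistance=1e7,
                 v_rest=-0.070, v_th=-0.054, v_reset=-0.070):
    """Forward-Euler integration of tau_m * dV/dt = -(V - V_rest) + R * I(t).

    `current` holds the injected current in amperes, one value per time step.
    Returns the voltage trace (volts) and the list of spike times (seconds).
    All default parameter values are illustrative assumptions.
    """
    v = v_rest
    trace, spike_times = [], []
    for step, i_t in enumerate(current):
        v += (-(v - v_rest) + resistance * i_t) * (dt / tau_m)
        if v >= v_th:                  # threshold crossing: emit a spike ...
            spike_times.append(step * dt)
            v = v_reset                # ... and reset the membrane potential
        trace.append(v)
    return trace, spike_times


steps = 2000                           # 200 ms at 0.1 ms resolution

# 2 nA drive: the steady-state target V_rest + R*I = -50 mV lies above
# threshold, so the neuron fires repeatedly.
trace, spikes = simulate_lif([2e-9] * steps)

# 1 nA drive: the target is -60 mV, below threshold, so the neuron stays silent.
silent_trace, silent_spikes = simulate_lif([1e-9] * steps)
```

Note that the output of interest is the list of spike times, not the voltage amplitude: the stronger input changes when and how often the neuron fires, while every spike is treated as identical.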

4.2.2 2.2 The synapse as a computational element

Where two neurons meet, the connection is a synapse. At a chemical synapse, an arriving spike triggers the release of neurotransmitter molecules into a narrow cleft. These molecules bind receptors on the receiving cell and produce a small electrical change there, either excitatory (pushing the cell toward firing) or inhibitory (pushing it away). The strength of this effect, the synaptic weight in machine learning language, is set by many physical variables: the number of vesicles released, the quantity of neurotransmitter, the density and type of postsynaptic receptors, and the geometry of the dendritic spine on which the synapse sits.

This is the first place the analogy frays. An artificial weight is a single scalar. A biological synapse is a dynamic, stochastic, multi-factor chemical machine whose effective strength varies on timescales from milliseconds to a lifetime. Dendrites themselves perform nonlinear computation before any signal reaches the soma, so a single biological neuron may be closer in computational power to a small multilayer network than to a single artificial unit [3]. The mapping “one neuron, one unit” is therefore already a serious oversimplification at the level of the single cell.

4.3 3. How Brains Learn

4.3.1 3.1 Synaptic plasticity

Learning in the brain is, to a first approximation, change in synaptic strength. The foundational principle was stated by Donald Hebb in 1949 and is usually paraphrased as “cells that fire together wire together” [4]. When a presynaptic neuron repeatedly contributes to firing a postsynaptic neuron, the synapse between them strengthens. The biological mechanisms include long-term potentiation (LTP) and long-term depression (LTD), durable increases and decreases in synaptic efficacy mediated by receptor trafficking and structural change at the synapse.

A refinement of Hebb’s rule, spike-timing-dependent plasticity (STDP), captures the observation that the precise relative timing of pre- and postsynaptic spikes matters. If the presynaptic spike precedes the postsynaptic one by a few milliseconds, the synapse potentiates; if the order reverses, it depresses [5]. The canonical fit to the data is an exponential window in the timing difference $\Delta t = t_{\text{post}} - t_{\text{pre}}$:

\[ \Delta w = \begin{cases} A_{+}\, e^{-\Delta t / \tau_{+}}, & \Delta t > 0 \quad (\text{pre before post, potentiation}),\\[4pt] -A_{-}\, e^{\,\Delta t / \tau_{-}}, & \Delta t < 0 \quad (\text{post before pre, depression}), \end{cases} \]

with positive amplitudes $A_{+}, A_{-}$ and time constants $\tau_{+}, \tau_{-}$ of order ten to twenty milliseconds. The sign of $\Delta w$ flips with the sign of $\Delta t$, which encodes a crude notion of causality: synapses that plausibly helped cause a spike are strengthened, and synapses that fired too late are weakened. STDP is fundamentally local and causal: each synapse updates using only information physically available at that synapse, namely its own pre and post activity. There is no global error signal threaded backward through the network. This locality is the property that backpropagation, discussed in Section 5, conspicuously lacks.
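The exponential window translates directly into code. The following is a minimal sketch, assuming illustrative amplitudes and 20 ms time constants; the helper names `stdp_dw` and `pairwise_stdp`, the all-to-all pairing of spikes, and the choice of zero change at exactly $\Delta t = 0$ are assumptions made for the example.

```python
import math

def stdp_dw(delta_t, a_plus=0.01, a_minus=0.012, tau_plus=0.020, tau_minus=0.020):
    """Weight change for one spike pair, with delta_t = t_post - t_pre in seconds."""
    if delta_t > 0:       # pre before post: potentiation
        return a_plus * math.exp(-delta_t / tau_plus)
    if delta_t < 0:       # post before pre: depression
        return -a_minus * math.exp(delta_t / tau_minus)
    return 0.0            # the window is undefined at zero; assume no change

def pairwise_stdp(pre_times, post_times, **params):
    """Sum the window over every pre/post pair (an all-to-all pairing assumption)."""
    return sum(stdp_dw(t_post - t_pre, **params)
               for t_pre in pre_times for t_post in post_times)

causal = pairwise_stdp([0.010], [0.015])     # pre leads post by 5 ms
acausal = pairwise_stdp([0.015], [0.010])    # post leads pre by 5 ms
```

The update reads only the spike times of the two neurons the synapse connects, which is the locality property in concrete form.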

4.3.2 3.2 Neuromodulation and the three-factor rule

Pure Hebbian plasticity cannot by itself explain goal-directed learning, because it has no notion of reward or relevance. The brain supplies this through neuromodulation. Diffuse systems releasing dopamine, acetylcholine, serotonin, and norepinephrine broadcast slow, global signals that gate and shape plasticity. Dopamine in particular encodes a reward prediction error, the difference between expected and received reward, a quantity that maps remarkably well onto the temporal difference error of reinforcement learning [6]. This gives rise to the “three-factor” learning rule, which can be written compactly as

\[ \Delta w_{ij} \;=\; \eta\; M(t)\; e_{ij}(t), \qquad \dot{e}_{ij} = -\frac{e_{ij}}{\tau_e} + f\big(x_i^{\text{pre}}, x_j^{\text{post}}\big), \]

where $e_{ij}$ is a local synaptic eligibility trace that accumulates Hebbian coincidence $f$ of pre and post activity and then decays with time constant $\tau_e$, and $M(t)$ is the third factor, a globally broadcast neuromodulatory signal such as dopamine that is roughly proportional to a reward prediction error. The eligibility trace solves the temporal credit assignment problem: the synapse holds a fading record of its recent contribution so that a value signal arriving a moment later can still reach the synapses responsible. Synaptic change therefore depends on presynaptic activity, postsynaptic activity, and the third neuromodulatory term that indicates whether the recent behavior was good or bad. Learning in the brain is thus a hybrid of local correlation and globally broadcast, chemically delivered value signals, operating continuously and online rather than in discrete training epochs.
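A small simulation shows how the eligibility trace bridges the gap between a coincidence and a delayed value signal. This is a sketch under stated assumptions: a single synapse, Euler integration, a weight change accumulated as $\eta\, M(t)\, e(t)\, dt$ over time, and made-up values for `tau_e`, `eta`, and the timing of the reward pulse.

```python
def three_factor_run(coincidence, modulator, dt=0.001, tau_e=0.5, eta=0.1):
    """Euler simulation of one synapse under the three-factor rule.

    coincidence[k] is the Hebbian term f(pre, post) at step k and
    modulator[k] is the broadcast signal M(t) at step k.
    Returns the accumulated weight change and the eligibility trace.
    """
    e, w_change, trace = 0.0, 0.0, []
    for f_k, m_k in zip(coincidence, modulator):
        e += dt * (-e / tau_e + f_k)       # de/dt = -e / tau_e + f(pre, post)
        w_change += eta * m_k * e * dt     # dw = eta * M(t) * e(t), accumulated
        trace.append(e)
    return w_change, trace

steps = 1000                                # one second at 1 ms resolution
coincidence = [1.0 if k < 50 else 0.0 for k in range(steps)]       # early pairing
reward_late = [1.0 if 300 <= k < 350 else 0.0 for k in range(steps)]
no_reward = [0.0] * steps

dw_rewarded, trace = three_factor_run(coincidence, reward_late)
dw_unrewarded, _ = three_factor_run(coincidence, no_reward)
```

The pairing ends at 50 ms and the reward arrives at 300 ms, yet the weight still changes because the trace has not fully decayed. Without the third factor the same pairing leaves the weight untouched.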

4.4 4. The Artificial Neuron as a Loose Abstraction

4.4.1 4.1 From McCulloch and Pitts to the perceptron

The artificial neuron descends from the threshold logic unit of McCulloch and Pitts (1943), who showed that idealized binary neurons could compute logical functions [7], and from Rosenblatt’s perceptron (1958), which added a learning rule for the weights [8]. The modern unit computes a weighted sum of its inputs, adds a bias, and applies a nonlinear activation function. For an input vector $\mathbf{x} \in \mathbb{R}^d$, weights $\mathbf{w} \in \mathbb{R}^d$, and bias $b \in \mathbb{R}$,

\[ a = \mathbf{w}^{\top}\mathbf{x} + b = \sum_{k=1}^{d} w_k x_k + b, \qquad y = g(a), \]

where $g$ is the activation. The diagram makes the same statement visually:

        x1 ----w1----\
                      \
        x2 ----w2-----> [ sum: a = w.x + b ] --> [ g(a) ] --> output
                      /
        x3 ----w3----/

biological loose analogy:
   inputs  ~ dendritic signals
   weights ~ synaptic strengths
   sum     ~ soma integration
   g(.)    ~ thresholded firing

The visual correspondence is real but shallow. The weighted sum stands in for dendritic integration, the weights for synaptic strengths, and the activation function for the spiking threshold. The original choice of a sigmoid activation was loosely motivated by the saturating firing rate of a neuron. Modern networks largely abandoned this for the rectified linear unit (ReLU), which was adopted chiefly because it trains better; its proponents also pointed to the one-sided, sparse firing of real neurons, but that resemblance is not what drove adoption [9]. This is a recurring pattern: where biological fidelity and engineering performance conflict, performance wins, and the field is correct to let it.
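The whole of the artificial unit fits in a few lines, which is itself a measure of how much has been abstracted away. The sketch below implements $y = g(\mathbf{w}^{\top}\mathbf{x} + b)$ with both a sigmoid and a ReLU activation; the input, weight, and bias values are arbitrary illustrations.

```python
import math

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def relu(a):
    return max(0.0, a)

def unit(x, w, b, g):
    """One artificial unit: a = w . x + b, then y = g(a)."""
    a = sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
    return g(a)

x = [1.0, -2.0, 0.5]        # illustrative inputs
w = [0.5, 0.1, -0.2]        # illustrative weights
b = 0.1

y_sigmoid = unit(x, w, b, sigmoid)    # saturating, bounded in (0, 1)
y_relu = unit(x, w, b, relu)          # unbounded above, zero for a <= 0
y_silent = unit(x, w, -1.0, relu)     # a strongly negative bias silences the unit
```

There is no state, no time variable, and no spike: the same inputs always produce the same number.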

4.4.2 4.2 What the abstraction throws away

The artificial neuron discards the temporal dimension entirely. It emits a continuous real number, interpreted as a firing rate or simply as an abstract activation, in a single forward pass. It has no spikes, no time, no membrane dynamics, no separate excitatory and inhibitory channels obeying Dale’s principle, no dendritic nonlinearities, and no stochasticity. It is a static function evaluation. This is not a flaw to be apologized for; it is a deliberate abstraction that makes the system differentiable and therefore trainable by gradient descent. But it means the artificial neuron is a metaphor that has been optimized for mathematics, not a model that has been validated against biology.

4.5 5. Where the Analogy Breaks

4.5.1 5.1 Backpropagation versus biological learning

The deepest divergence concerns how the two systems solve the credit assignment problem: how to decide which internal parameter deserves blame or credit for an outcome. Artificial networks use backpropagation, which computes the exact gradient of a loss function with respect to every weight by applying the chain rule backward through the network [10]. For a feedforward network with layers $\ell = 1, \dots, L$, pre-activations $\mathbf{a}^{\ell} = W^{\ell}\mathbf{h}^{\ell-1} + \mathbf{b}^{\ell}$, and activations $\mathbf{h}^{\ell} = g(\mathbf{a}^{\ell})$, the error signal $\boldsymbol{\delta}^{\ell} = \partial \mathcal{L} / \partial \mathbf{a}^{\ell}$ obeys the recursion

\[ \boldsymbol{\delta}^{L} = \nabla_{\mathbf{a}^{L}}\mathcal{L}, \qquad \boldsymbol{\delta}^{\ell} = \big( (W^{\ell+1})^{\top} \boldsymbol{\delta}^{\ell+1} \big) \odot g'(\mathbf{a}^{\ell}), \qquad \frac{\partial \mathcal{L}}{\partial W^{\ell}} = \boldsymbol{\delta}^{\ell} (\mathbf{h}^{\ell-1})^{\top}, \]

where $\odot$ is elementwise multiplication. Backpropagation is extraordinarily effective and is the engine of essentially all modern deep learning. It is also widely regarded as biologically implausible for several concrete reasons that are visible directly in the recursion above.
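The recursion can be checked numerically on a network small enough to write out by hand. The sketch below assumes a two-layer network with a tanh hidden layer, a linear output, and a squared-error loss, none of which is prescribed by the text; the helper names and weight values are likewise illustrative. It computes the gradients with the recursion and compares one entry against a finite-difference estimate.

```python
import math

def matvec(W, v):
    return [sum(w_ij * v_j for w_ij, v_j in zip(row, v)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def outer(u, v):
    return [[u_i * v_j for v_j in v] for u_i in u]

def forward(W1, b1, W2, b2, x):
    a1 = [a + b for a, b in zip(matvec(W1, x), b1)]
    h1 = [math.tanh(a) for a in a1]                     # g = tanh (assumption)
    a2 = [a + b for a, b in zip(matvec(W2, h1), b2)]    # linear output layer
    return a1, h1, a2

def loss(a2, target):
    return 0.5 * sum((y - t) ** 2 for y, t in zip(a2, target))

def backprop(W1, b1, W2, b2, x, target):
    a1, h1, a2 = forward(W1, b1, W2, b2, x)
    delta2 = [y - t for y, t in zip(a2, target)]        # delta^L = dL/da^L
    back = matvec(transpose(W2), delta2)                # (W^{l+1})^T delta^{l+1}
    delta1 = [s * (1.0 - math.tanh(a) ** 2)             # elementwise times g'(a^l)
              for s, a in zip(back, a1)]
    return outer(delta1, x), outer(delta2, h1)          # dL/dW1, dL/dW2

W1 = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]    # 3 hidden units, 2 inputs
b1 = [0.0, 0.1, -0.1]
W2 = [[0.6, -0.2, 0.3]]                         # 1 output unit
b2 = [0.05]
x, target = [1.0, -1.5], [0.25]

grad_W1, grad_W2 = backprop(W1, b1, W2, b2, x, target)

# Central finite difference for the single entry dL/dW1[0][1].
eps = 1e-6
W1_plus = [row[:] for row in W1]
W1_plus[0][1] += eps
W1_minus = [row[:] for row in W1]
W1_minus[0][1] -= eps
numeric = (loss(forward(W1_plus, b1, W2, b2, x)[2], target)
           - loss(forward(W1_minus, b1, W2, b2, x)[2], target)) / (2 * eps)
```

The call to `transpose(W2)` inside `backprop` is the step at issue in what follows: the backward pass reads the forward weights directly.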

First, the weight transport problem: the backward step multiplies the error by $(W^{\ell+1})^{\top}$, the exact transpose of the forward weight matrix, so the backward pass must reuse the same weights as the forward pass. Biologically this would require each feedback synapse to match the strength of a distinct forward synapse elsewhere, for which there is no known mechanism. Second, backpropagation requires a separate, precisely orchestrated backward phase that is distinct from forward inference, with errors propagated as signed real numbers; cortical circuits show no clear correlate of such a phase. Third, the gradient must be computed globally and exactly, whereas biological plasticity is local and noisy. Researchers have proposed mechanisms by which the brain might approximate gradient-based credit assignment, including feedback alignment (which shows that random fixed backward weights $B^{\ell+1}$ in place of $(W^{\ell+1})^{\top}$ can still support learning, weakening the weight transport objection) and predictive-coding schemes that compute error locally [11, 12]. These remain hypotheses. The honest summary is that the brain clearly does something functionally analogous to credit assignment, but the evidence that it does anything like exact backpropagation is weak.

The contrast between the two regimes is structural, and the diagram below makes it explicit. On the left, a global loss computed at the output is threaded backward through transposed weights to reach every parameter. On the right, each synapse updates from quantities physically present at that synapse, with a slow scalar value signal broadcast to all of them at once.

flowchart TB
  subgraph BP["Backpropagation"]
    direction TB
    L["Global loss at output"] --> D["Signed error per layer"]
    D --> W["Update uses transposed forward weights"]
    W --> P1["Every weight gets exact gradient"]
  end
  subgraph BIO["Brain learning"]
    direction TB
    PRE["Presynaptic activity"] --> E["Local eligibility trace"]
    POST["Postsynaptic activity"] --> E
    MOD["Broadcast neuromodulator"] --> U["Local update gated by value signal"]
    E --> U
  end

4.5.2 5.2 Energy efficiency

The quantitative gap in energy is stark. The human brain runs on roughly twenty watts, about the power of a dim light bulb, while performing perception, motor control, language, and reasoning continuously [12]. Training a single large language model, by contrast, can consume hundreds of megawatt-hours or more and emit carbon on the scale of hundreds of transatlantic flights, and inference at scale draws on entire data centers.

A back-of-the-envelope calculation shows where the brain’s frugality comes from. With about $8.6 \times 10^{10}$ neurons firing on average at a few hertz, on the order of $10^{11}$ to $10^{12}$ spikes occur per second across the whole brain. Dividing twenty joules per second by that spike rate gives roughly $10^{-11}$ to $10^{-10}$ joules per spike at the system level. This figure charges the entire power budget to spiking, so it is an upper bound on the cost of a spike itself, and because each spike is delivered to thousands of synapses, the cost per synaptic event is smaller again by about three orders of magnitude. Dense digital arithmetic is far more expensive per primitive operation once the cost of moving operands to and from memory is included, and crucially the brain only pays for the small fraction of neurons that actually spike. The lesson of the estimate is not the exact figure, which is uncertain, but the mechanism: sparse, event-driven computation with colocated memory avoids paying for the silent majority of units.

Two architectural facts underlie this mechanism. Biological computation is event-driven: a neuron consumes significant energy only when it spikes, and neural activity is sparse, with most neurons silent most of the time. Memory and computation are colocated at the synapse, avoiding the constant shuttling of data between separate memory and processing units that dominates the energy budget of conventional von Neumann hardware (the so-called memory wall). A graphics processing unit, by contrast, computes densely and synchronously and spends much of its power moving data. This efficiency gap is one of the strongest arguments that the brain’s design principles still have practical lessons to teach, even if its learning algorithm does not.
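The per-spike estimate in this section is plain arithmetic and can be reproduced directly. In the sketch below the neuron count and power come from the text; bracketing "a few hertz" by 1 to 10 Hz and taking one thousand synapses per neuron are assumptions chosen only to fix orders of magnitude.

```python
NEURONS = 8.6e10              # from the text
POWER_WATTS = 20.0            # whole-brain power, from the text
SYNAPSES_PER_NEURON = 1e3     # "thousands of connections": assumed order of magnitude

def joules_per_spike(mean_rate_hz):
    """Whole-brain power divided by the total number of spikes per second."""
    spikes_per_second = NEURONS * mean_rate_hz
    return POWER_WATTS / spikes_per_second

# "A few hertz" bracketed by an assumed range of 1 to 10 Hz.
per_spike_upper = joules_per_spike(1.0)     # about 2.3e-10 J
per_spike_lower = joules_per_spike(10.0)    # about 2.3e-11 J

# Each spike fans out to many synapses, so the cost per synaptic event is lower.
per_synaptic_event_upper = per_spike_upper / SYNAPSES_PER_NEURON
```

Because the whole power budget is attributed to spiking, these numbers overstate the cost of a spike; they are useful for the order of magnitude, not the leading digit.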

4.5.3 5.3 Spiking versus rate coding

A third divergence concerns the code itself. Artificial networks use what is best described as rate coding: a unit’s output is a single number standing for an average activity level, with all temporal structure averaged away. Formally, a spike train is a sum of Dirac impulses $s(t) = \sum_k \delta(t - t_k)$ (here $\delta$ is the Dirac delta, unrelated to the error signal $\boldsymbol{\delta}^{\ell}$ of Section 5.1), and a rate code retains only the mean rate $r = \frac{1}{T}\int_0^T s(t)\,dt$, the spike count in a window of length $T$ divided by $T$, discarding the individual times $t_k$. A temporal code, by contrast, treats the times $t_k$ themselves as the message. Real neurons communicate with discrete spikes in continuous time, and there is substantial evidence that the precise timing of those spikes carries information that a rate average would discard. Temporal codes can in principle represent and transmit information faster and more efficiently than rate codes, because a single well-timed spike can be informative [13]. The artificial neuron’s commitment to rate coding is again an engineering choice: real-valued, differentiable activations are what gradient descent needs. Spikes are discrete and non-differentiable, which is exactly why they are hard to train and why mainstream deep learning has avoided them. The cost of this choice is that artificial networks forgo whatever computational advantages temporal coding confers.
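The difference between the two codes is easy to see on a toy example. The sketch below builds two made-up spike trains with the same mean rate but different spike times; first-spike latency is used as one simple example of a temporal feature, an illustrative choice rather than a claim about which temporal code the brain uses.

```python
def rate(spike_times, window):
    """Mean firing rate: spike count in [0, window) divided by the window length."""
    return sum(1 for t in spike_times if 0.0 <= t < window) / window

def first_spike_latency(spike_times):
    """Time of the earliest spike, one simple temporal feature."""
    return min(spike_times)

T = 0.1                                       # 100 ms window
train_a = [0.005, 0.030, 0.055, 0.080]        # first spike after 5 ms
train_b = [0.040, 0.050, 0.090, 0.095]        # first spike after 40 ms

rate_a, rate_b = rate(train_a, T), rate(train_b, T)
latency_a = first_spike_latency(train_a)
latency_b = first_spike_latency(train_b)
```

A rate code cannot tell the two trains apart, since both contain four spikes in the window. A downstream reader of spike times can distinguish them after the first spike of `train_a`, without waiting for the window to close.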

4.6 6. Neuromorphic Computing

Neuromorphic computing is the engineering program that takes the brain’s physical principles, rather than its learning algorithm, as the thing worth copying. The term and the original vision are due to Carver Mead, who in the late 1980s argued that analog circuits could emulate neural computation far more efficiently than digital simulation [14]. Modern neuromorphic systems are typically digital or mixed-signal and share a common philosophy: event-driven spiking communication, massive parallelism, sparse activity, and the colocation of memory and computation to defeat the memory wall.

Representative platforms include IBM’s TrueNorth, which placed one million spiking neurons on a chip drawing well under one watt; Intel’s Loihi, which added on-chip programmable learning rules so that plasticity can occur locally on the hardware; and SpiNNaker, a massively parallel architecture built from many simple cores designed to simulate spiking networks in real time [15]. These chips can be dramatically more energy-efficient than GPUs for the right workloads, particularly sparse, event-driven, always-on sensing tasks. The catch is the training problem identified above: because spikes are non-differentiable, training spiking neural networks to the accuracy of conventional deep networks remains difficult. The dominant workaround is surrogate gradient training, which replaces the non-differentiable spike with a smooth approximation during the backward pass so that backpropagation can be applied anyway [16]. It is worth noticing the irony: the most biologically inspired hardware is most easily trained by importing the least biologically plausible algorithm. Neuromorphic computing today is a promising research direction with real efficiency wins in narrow domains, not yet a general replacement for the GPU.
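The surrogate gradient trick can be stated in a few lines. The sketch below keeps the hard threshold in the forward pass and substitutes a smooth derivative in the backward pass; the fast-sigmoid shape $1/(1 + \beta\,|v - v_{\text{th}}|)^2$ is one common choice among several, and the function names, threshold, and `beta` value are assumptions made for the example.

```python
def spike(v, v_th=1.0):
    """Forward pass: Heaviside step, 1.0 when the potential reaches threshold."""
    return 1.0 if v >= v_th else 0.0

def surrogate_derivative(v, v_th=1.0, beta=10.0):
    """Backward pass: smooth stand-in for the step's derivative.

    The true derivative is zero away from threshold and undefined at it,
    so it carries no learning signal; this fast-sigmoid-shaped curve
    peaks at threshold and decays with distance from it.
    """
    return 1.0 / (1.0 + beta * abs(v - v_th)) ** 2

def grad_wrt_potential(upstream_grad, v):
    """Chain rule step: dL/dv is approximated as dL/ds times the surrogate ds/dv."""
    return upstream_grad * surrogate_derivative(v)

output_far = spike(0.3)                       # no spike
output_near = spike(1.2)                      # spike
grad_near = grad_wrt_potential(1.0, 0.9)      # potential just below threshold
grad_far = grad_wrt_potential(1.0, 0.0)       # potential far below threshold
```

Neurons close to threshold receive a large gradient and neurons far from it receive a small one, which is what lets backpropagation adjust weights even though the forward pass emits only zeros and ones.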

4.7 7. What AI Has Borrowed, and What It Has Not

4.7.1 7.1 Genuine borrowings

Three of the most important ideas in modern AI have clear neuroscientific lineage, though in each case the engineering implementation diverged sharply from the biology.

Convolutional networks descend directly from Hubel and Wiesel’s work on the cat visual cortex, which revealed simple and complex cells with local receptive fields arranged in a hierarchy of increasing abstraction [17]. Fukushima’s Neocognitron explicitly modeled this hierarchy [18], and the convolutional networks that now underpin computer vision inherit the core ideas of local receptive fields, weight sharing, and pooling. Weight sharing, however, is a pure engineering convenience with no biological counterpart: the brain does not tie the weights of neurons in different cortical locations.
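Local receptive fields and weight sharing are easiest to see in one dimension. The sketch below slides a single two-tap kernel across a signal; the kernel, the signals, and the function name `conv1d` are illustrative, and the operation is the cross-correlation that deep learning libraries conventionally call convolution.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation: the same kernel weights are reused at every position."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]

edge_detector = [-1.0, 1.0]                   # receptive field of width 2
signal = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0]
shifted = [0.0, 0.0, 0.0, 1.0, 1.0, 1.0]      # same rising edge, one step later

features = conv1d(signal, edge_detector)
features_shifted = conv1d(shifted, edge_detector)

rising_edge_at = features.index(max(features))
rising_edge_shifted_at = features_shifted.index(max(features_shifted))
```

Two parameters serve every position in the input, and the detected edge moves by exactly one step when the input does. A cortical map has to achieve the same coverage with separately grown and separately tuned synapses at each location.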

Reinforcement learning has perhaps the deepest and most bidirectional relationship with neuroscience. The temporal difference learning algorithm was developed in machine learning and then found to predict the phasic firing of dopamine neurons with striking precision, so that the reward prediction error hypothesis of dopamine is now textbook neuroscience [6]. Here theory flowed in both directions, an unusually productive case.

Attention, the mechanism at the heart of the transformer, is named after the cognitive phenomenon of selective attention, the brain’s ability to prioritize some inputs over others [19]. But the resemblance is largely at the level of slogan. The scaled dot-product attention of a transformer is a specific differentiable operation computing a weighted average over learned key, query, and value projections; it bears no demonstrated mechanistic relationship to the neural circuits of biological attention. The name is an inspiration and a useful intuition pump, not a model.
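Written out, scaled dot-product attention is a softmax-weighted average and nothing more. The sketch below handles a single query and omits the learned projection matrices that produce queries, keys, and values in a real transformer; the vectors are made-up numbers for illustration.

```python
import math

def softmax(scores):
    m = max(scores)                           # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    weights = softmax(scores)                 # non-negative, sums to one
    dim = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return output, weights

keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
output, weights = attention([2.0, 0.0], keys, values)
```

The query aligns with the first and third keys, so their values dominate the output. Every step is a differentiable arithmetic operation on real-valued vectors, with no counterpart to spikes, neuromodulators, or the circuitry of biological attention.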

4.7.2 7.2 What AI has not taken

The list of things AI has not borrowed is arguably more revealing. Mainstream deep learning has not adopted spiking communication, continuous online learning, neuromodulatory gating, local learning rules, the strict separation of excitation and inhibition, the brain’s extreme energy efficiency, or its sample efficiency. A child learns a new object category from a handful of examples; a deep network often needs thousands or millions. The brain learns continually without catastrophically forgetting what it learned before, a problem (catastrophic forgetting) that still plagues artificial networks. And the brain operates with a tiny fraction of the data and energy. These omissions are precisely the open problems of the field, which suggests that the parts of biology AI has ignored may be exactly the parts worth revisiting.

4.8 8. The Limits of Brain Inspiration

It is tempting to conclude that the path forward is simply more biological fidelity, but the history of the field counsels caution. The most successful components of modern AI, namely backpropagation, ReLU activations, weight sharing, layer normalization, and the transformer’s attention, are in large part biologically implausible or biologically silent. Performance, not fidelity, drove their adoption. Airplanes do not flap their wings, and the analogy is apt: the principles of aerodynamics that birds exploit were worth understanding, but slavish imitation of feathers and flapping would have delayed powered flight. Brain inspiration has been most useful as a source of abstract principles (hierarchy, local receptive fields, prediction errors, attention, event-driven sparsity) and least useful as a blueprint for literal copying.

There is also a deep epistemic caution. We do not actually understand how the brain computes. Our models of neural learning are incomplete and contested, and the danger of reasoning from the brain is that we may be reasoning from our current, possibly mistaken, theories of the brain rather than from the brain itself. The reverse inference, using artificial networks as models of the brain, is now a thriving subfield, with trained deep networks serving as some of the best available predictors of activity in visual cortex [20]. But this is a claim about representational similarity in trained systems, not a claim that the brain learns or computes the way the network does.

The mature position is dualistic. Biological and artificial intelligence are two largely independent solutions to overlapping problems, converging here and diverging there. Neuroscience remains a generous source of hypotheses, and the brain’s unmatched efficiency and sample efficiency mark out the frontier that artificial systems have not yet reached. But the artificial neuron should be understood for what it is: a loose, deliberately impoverished abstraction that succeeded because it could be optimized, not because it was faithful. Knowing the difference is what separates a clear understanding of these systems from the seductive and misleading picture of a digital brain.

4.8.1 8.1 When the analogy helps and when it misleads

For the practitioner, the brain analogy is a tool with a specific safe operating range.

It helps when used as a source of abstract architectural principles. Hierarchy, local receptive fields, prediction errors, attention as selective routing, and event-driven sparsity are all ideas that crossed over from neuroscience and earned their place by improving systems. If you are searching for a new inductive bias, biology is a reasonable place to look for candidates, which you then validate empirically rather than accept on authority.

It misleads in several recurring ways. The first pitfall is the equivalence fallacy: assuming that because a system is called a “neural network” it must learn, represent, or fail like a brain. It does not, and arguments of the form “the brain does X, therefore the model does X” are unsound. The second pitfall is the fidelity fallacy: assuming that making a model more biologically realistic will make it more capable. The historical record points the other way, with the most successful components being the least biological. The third pitfall is reasoning from theory as if it were fact: our models of how the brain learns are incomplete and contested, so an argument grounded in the brain may really be grounded in a current and possibly wrong theory of the brain. The fourth pitfall is the direction-of-fit error: the fact that trained networks predict neural activity well [20] is a statement about representational similarity in trained systems, not evidence that the brain runs backpropagation. Used within these limits, the analogy is an intuition pump. Pushed beyond them, it is a source of confident error.

4.9 9. Summary

The biological neuron is a dynamic electrochemical device that communicates through timed spikes and learns through local, neuromodulated synaptic plasticity, in a brain that runs on about twenty watts. The artificial neuron is a static, differentiable function trained by global, exact backpropagation on hardware that consumes orders of magnitude more energy. The analogy that gave neural networks their name is real at the level of abstract principle and false at the level of mechanism. AI has borrowed genuine ideas from neuroscience (convolution, reinforcement learning, and the inspiration for attention) while leaving behind spiking, online learning, energy efficiency, and sample efficiency, which are exactly the field’s unsolved problems. Neuromorphic computing pursues the brain’s physical principles and earns real efficiency gains, but trains most easily by importing the unbiological backpropagation it was meant to escape. The brain remains a source of hypotheses and a benchmark of efficiency, but not a blueprint, and the clearest thinkers treat the two intelligences as cousins rather than as the same thing in different substrates.

4.10 References

[1] Herculano-Houzel, S. (2009). The human brain in numbers: a linearly scaled-up primate brain. Frontiers in Human Neuroscience, 3, 31. https://doi.org/10.3389/neuro.09.031.2009

[2] Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4), 500-544. https://doi.org/10.1113/jphysiol.1952.sp004764

[3] Beniaguev, D., Segev, I., & London, M. (2021). Single cortical neurons as deep artificial neural networks. Neuron, 109(17), 2727-2739. https://doi.org/10.1016/j.neuron.2021.07.002

[4] Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. Wiley. https://psycnet.apa.org/record/1950-02200-000

[5] Bi, G. Q., & Poo, M. M. (1998). Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. The Journal of Neuroscience, 18(24), 10464-10472. https://doi.org/10.1523/JNEUROSCI.18-24-10464.1998

[6] Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593-1599. https://doi.org/10.1126/science.275.5306.1593

[7] McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5, 115-133. https://doi.org/10.1007/BF02478259

[8] Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408. https://doi.org/10.1037/h0042519

[9] Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS). https://proceedings.mlr.press/v15/glorot11a.html

[10] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0

[11] Lillicrap, T. P., Cownden, D., Tweed, D. B., & Akerman, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7, 13276. https://doi.org/10.1038/ncomms13276

[12] Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J., & Hinton, G. (2020). Backpropagation and the brain. Nature Reviews Neuroscience, 21(6), 335-346. https://doi.org/10.1038/s41583-020-0277-3

[13] Thorpe, S., Delorme, A., & Van Rullen, R. (2001). Spike-based strategies for rapid processing. Neural Networks, 14(6-7), 715-725. https://doi.org/10.1016/S0893-6080(01)00083-1

[14] Mead, C. (1990). Neuromorphic electronic systems. Proceedings of the IEEE, 78(10), 1629-1636. https://doi.org/10.1109/5.58356

[15] Davies, M., et al. (2018). Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1), 82-99. https://doi.org/10.1109/MM.2018.112130359

[16] Neftci, E. O., Mostafa, H., & Zenke, F. (2019). Surrogate gradient learning in spiking neural networks. IEEE Signal Processing Magazine, 36(6), 51-63. https://doi.org/10.1109/MSP.2019.2931595

[17] Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1), 106-154. https://doi.org/10.1113/jphysiol.1962.sp006837

[18] Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193-202. https://doi.org/10.1007/BF00344251

[19] Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762

[20] Yamins, D. L. K., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356-365. https://doi.org/10.1038/nn.4244
