4  Biological versus Artificial Intelligence

4.1 1. Introduction

The artificial neural network is the dominant computational metaphor of our era, and it carries an origin story that is biological. The phrase “neural network” invites the reader to imagine that a trained transformer is, in some meaningful sense, a brain in silicon. This chapter examines that invitation critically. The relationship between biological and artificial intelligence is neither one of identity nor one of total disconnection. It is a relationship of loose inspiration, occasional convergence, and frequent divergence. Understanding precisely where the analogy holds and where it collapses is essential for the practitioner who wants to reason clearly about what current systems can and cannot do, and for the researcher who wants to know whether neuroscience still has anything to offer machine learning.

We proceed from the biological substrate upward. We begin with the neuron and the synapse, the physical units of computation in the brain. We then describe how brains actually learn, which turns out to be quite different from how artificial networks learn. We introduce the artificial neuron as the abstraction it really is, namely a drastic simplification adopted for mathematical convenience rather than biological fidelity. We then catalog the major points of divergence: the credit assignment problem, energy efficiency, and the question of spiking versus rate coding. Neuromorphic computing is treated as the engineering response to some of these gaps. Finally, we survey what artificial intelligence has genuinely borrowed from neuroscience, what it has not, and why brain inspiration has limits as a research strategy.

4.2 2. The Biological Neuron and Synapse

4.2.1 2.1 Anatomy and electrical behavior

A typical neuron consists of a cell body (the soma), a branching set of input structures (the dendrites), and a single output fiber (the axon) that itself branches to contact many downstream cells. Signals arrive at the dendrites, are integrated in the soma, and, if the integrated signal crosses a threshold, trigger an action potential: a brief, stereotyped electrical spike that propagates down the axon. The human brain contains on the order of eighty-six billion neurons, each forming thousands of connections, yielding a connectome with something like one hundred trillion synapses [1].

The action potential is not a graded quantity. It is an all-or-nothing event produced by the rapid opening and closing of voltage-gated sodium and potassium channels in the membrane, described quantitatively by the Hodgkin and Huxley model of 1952 [2]. Information in this scheme is carried not by the amplitude of any single spike but by the timing and frequency of spikes. This is a critical point of contrast with artificial networks and we will return to it.
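The all-or-nothing character of the spike can be illustrated with a much simpler model than Hodgkin and Huxley's. The sketch below uses a leaky integrate-and-fire neuron, which is a swapped-in simplification and not the model cited above; the function name, the dimensionless units, and all parameter values are assumptions chosen for illustration. It shows that a stronger input does not produce a bigger spike, only more spikes.

```python
def simulate_lif(input_current, steps=1000, dt=0.1, tau=10.0,
                 v_rest=0.0, v_thresh=1.0, v_reset=0.0):
    """Simulate a leaky integrate-and-fire neuron; return spike times in ms.

    All quantities are in illustrative, dimensionless units (assumed values).
    """
    v = v_rest
    spike_times = []
    for step in range(steps):
        # Forward-Euler step of dv/dt = (-(v - v_rest) + input_current) / tau
        v += dt * (-(v - v_rest) + input_current) / tau
        if v >= v_thresh:
            # The spike is a stereotyped event: only its time is recorded.
            spike_times.append(step * dt)
            v = v_reset
    return spike_times


weak = simulate_lif(0.9)    # drive stays below threshold
medium = simulate_lif(1.5)
strong = simulate_lif(3.0)
print(len(weak), len(medium), len(strong))
```

Under these assumed parameters the weak input never reaches threshold, while the stronger inputs differ only in how often the neuron fires within the simulated 100 ms.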

4.2.2 2.2 The synapse as a computational element

Where two neurons meet, the connection is a synapse. At a chemical synapse, an arriving spike triggers the release of neurotransmitter molecules into a narrow cleft. These molecules bind receptors on the receiving cell and produce a small electrical change there, either excitatory (pushing the cell toward firing) or inhibitory (pushing it away). The strength of this effect, the synaptic weight in machine learning language, is set by many physical variables: the number of vesicles released, the quantity of neurotransmitter, the density and type of postsynaptic receptors, and the geometry of the dendritic spine on which the synapse sits.

This is the first place the analogy frays. An artificial weight is a single scalar. A biological synapse is a dynamic, stochastic, multi-factor chemical machine whose effective strength varies on timescales from milliseconds to a lifetime. Dendrites themselves perform nonlinear computation before any signal reaches the soma, so a single biological neuron may be closer in computational power to a small multilayer network than to a single artificial unit [3]. The mapping “one neuron, one unit” is therefore already a serious oversimplification at the level of the single cell.

4.3 3. How Brains Learn

4.3.1 3.1 Synaptic plasticity

Learning in the brain is, to a first approximation, change in synaptic strength. The foundational principle was stated by Donald Hebb in 1949 and is usually paraphrased as “cells that fire together wire together” [4]. When a presynaptic neuron repeatedly contributes to firing a postsynaptic neuron, the synapse between them strengthens. The biological mechanisms include long-term potentiation (LTP) and long-term depression (LTD), durable increases and decreases in synaptic efficacy mediated by receptor trafficking and structural change at the synapse.

A refinement of Hebb’s rule, spike-timing-dependent plasticity (STDP), captures the observation that the precise relative timing of pre- and postsynaptic spikes matters. If the presynaptic spike precedes the postsynaptic one within a window of a few tens of milliseconds, the synapse potentiates; if the order reverses, it depresses [5]. STDP is fundamentally local and causal: each synapse updates using only information physically available at that synapse, namely its own pre- and postsynaptic activity. There is no global error signal threaded backward through the network.
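A common way to write the STDP rule down is as an exponentially decaying window over the spike-time difference. The following is a minimal sketch of that pair-based form; the function name, the amplitudes `a_plus` and `a_minus`, and the time constant `tau` are illustrative assumptions, not values taken from [5].

```python
import math


def stdp_delta_w(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
    """Weight change for one pre/post spike pair (times in ms).

    Uses only the two spike times available at the synapse itself.
    Amplitudes and time constant are assumed, illustrative values.
    """
    dt = t_post - t_pre
    if dt > 0:
        # Pre fired before post: potentiate, less so for longer gaps.
        return a_plus * math.exp(-dt / tau)
    if dt < 0:
        # Post fired before pre: depress.
        return -a_minus * math.exp(dt / tau)
    return 0.0


print(stdp_delta_w(t_pre=0.0, t_post=5.0))   # positive change
print(stdp_delta_w(t_pre=5.0, t_post=0.0))   # negative change
```

Note that the function takes no argument describing the rest of the network or any task error, which is the locality property the paragraph above describes.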

4.3.2 3.2 Neuromodulation and the three-factor rule

Pure Hebbian plasticity cannot by itself explain goal-directed learning, because it has no notion of reward or relevance. The brain supplies this through neuromodulation. Diffuse systems releasing dopamine, acetylcholine, serotonin, and norepinephrine broadcast slow, global signals that gate and shape plasticity. Dopamine in particular encodes a reward prediction error, the difference between expected and received reward, a quantity that maps remarkably well onto the temporal difference error of reinforcement learning [6]. This gives rise to the “three-factor” learning rule: synaptic change depends on presynaptic activity, postsynaptic activity, and a third neuromodulatory signal indicating whether the recent behavior was good or bad. Learning in the brain is thus a hybrid of local correlation and globally broadcast, chemically delivered value signals, operating continuously and online rather than in discrete training epochs.
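The three-factor rule can be sketched in a few lines. In the example below, the third factor is a temporal difference error standing in for the dopamine signal; the function names, the learning rate, the discount `gamma`, and the numeric values are all assumptions made for illustration, and real neuromodulated plasticity is considerably richer than a single product.

```python
def td_error(reward, value_next, value_now, gamma=0.9):
    """Reward prediction error: received plus discounted future, minus expected."""
    return reward + gamma * value_next - value_now


def three_factor_update(weight, pre, post, modulator, lr=0.1):
    """Hebbian coincidence (pre * post) gated by a broadcast modulatory signal."""
    return weight + lr * modulator * pre * post


w = 0.5
# Unexpected reward: positive error strengthens the co-active synapse.
w_surprise = three_factor_update(w, pre=1.0, post=1.0,
                                 modulator=td_error(1.0, 0.0, 0.2))
# Fully predicted reward: zero error, so no change despite co-activity.
w_predicted = three_factor_update(w, pre=1.0, post=1.0,
                                  modulator=td_error(1.0, 0.0, 1.0))
# Expected reward omitted: negative error weakens the synapse.
w_omitted = three_factor_update(w, pre=1.0, post=1.0,
                                modulator=td_error(0.0, 0.0, 0.2))
print(w_surprise, w_predicted, w_omitted)
```

With the modulator fixed at a constant positive value, the update reduces to plain Hebbian learning, which is why the third factor is what makes the rule goal-directed.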

4.4 4. The Artificial Neuron as a Loose Abstraction

4.4.1 4.1 From McCulloch and Pitts to the perceptron

The artificial neuron descends from the threshold logic unit of McCulloch and Pitts (1943), who showed that idealized binary neurons could compute logical functions [7], and from Rosenblatt’s perceptron (1958), which added a learning rule for the weights [8]. The modern unit computes a weighted sum of its inputs, adds a bias, and applies a nonlinear activation function:

        x1 ----w1----\
                      \
        x2 ----w2-----> [ sum: a = w.x + b ] --> [ g(a) ] --> output
                      /
        x3 ----w3----/

Loose biological analogy:
   inputs  ~ dendritic signals
   weights ~ synaptic strengths
   sum     ~ soma integration
   g(.)    ~ thresholded firing

The visual correspondence is real but shallow. The weighted sum stands in for dendritic integration, the weights for synaptic strengths, and the activation function for the spiking threshold. The original choice of a sigmoid activation was loosely motivated by the saturating firing rate of a neuron. Modern networks largely abandoned it for the rectified linear unit (ReLU). Its proponents did note that one-sided, sparse activation resembles cortical firing more closely than sigmoid or tanh units do, but ReLU was adopted chiefly because it trains better [9]. This is a recurring pattern: where biological fidelity and engineering performance conflict, performance wins, and the field is correct to let it.
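The unit described above, a = w.x + b followed by g(a), fits in a few lines of code. This is a minimal sketch; the function names and the example numbers are assumptions chosen only to make the arithmetic easy to follow.

```python
import math


def relu(a):
    """Rectified linear unit: passes positive values, zeroes the rest."""
    return max(0.0, a)


def sigmoid(a):
    """Logistic sigmoid: saturates toward 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a))


def neuron(inputs, weights, bias, activation):
    """One artificial unit: weighted sum plus bias, then a nonlinearity."""
    a = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(a)


x = [1.0, 2.0, 3.0]
w = [0.5, -1.0, 0.25]
b = 0.1
# Pre-activation: 0.5 - 2.0 + 0.75 + 0.1 = -0.65
out_relu = neuron(x, w, b, relu)
out_sigmoid = neuron(x, w, b, sigmoid)
print(out_relu, out_sigmoid)
```

The whole unit is a single stateless function call: nothing persists between evaluations, and there is no notion of time.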

4.4.2 4.2 What the abstraction throws away

The artificial neuron discards the temporal dimension entirely. It emits a continuous real number, interpreted as a firing rate or simply as an abstract activation, in a single forward pass. It has no spikes, no time, no membrane dynamics, no separate excitatory and inhibitory channels obeying Dale’s principle, no dendritic nonlinearities, and no stochasticity. It is a static function evaluation. This is not a flaw to be apologized for; it is a deliberate abstraction that makes the system differentiable and therefore trainable by gradient descent. But it means the artificial neuron is a metaphor that has been optimized for mathematics, not a model that has been validated against biology.

4.5 5. Where the Analogy Breaks

4.5.1 5.1 Backpropagation versus biological learning

The deepest divergence concerns how the two systems solve the credit assignment problem: how to decide which internal parameter deserves blame or credit for an outcome. Artificial networks use backpropagation, which computes the exact gradient of a loss function with respect to every weight by applying the chain rule backward through the network [10]. Backpropagation is extraordinarily effective and is the engine of essentially all modern deep learning. It is also widely regarded as biologically implausible for several concrete reasons.

First, the weight transport problem: backpropagation requires the backward pass to use the same weights as the forward pass, which implies that each feedback synapse would need to know the strength of a distinct feedforward synapse elsewhere, and no biological mechanism for this is known. Second, backpropagation requires a separate, precisely orchestrated backward phase that is distinct from forward inference, with errors propagated as signed real numbers; cortical circuits show no clear correlate of such a phase. Third, the gradient must be computed globally and exactly, whereas biological plasticity is local and noisy. Researchers have proposed mechanisms by which the brain might approximate gradient-based credit assignment, including feedback alignment (which shows that random fixed backward weights can still support learning, weakening the weight transport objection) and predictive-coding schemes that compute error locally [11, 12]. These remain hypotheses. The honest summary is that the brain clearly does something functionally analogous to credit assignment, but the evidence that it does anything like exact backpropagation is weak.
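The weight transport problem is easiest to see when the backward recursion is written out. The notation below is an assumption introduced for illustration: layer l has weights W^(l), pre-activation a^(l), output h^(l) = g(a^(l)), and delta^(l) denotes the derivative of the loss with respect to a^(l). B^(l+1) is a fixed random matrix, as used in feedback alignment.

```latex
% Forward pass for layer l
a^{(l)} = W^{(l)} h^{(l-1)} + b^{(l)}, \qquad h^{(l)} = g\!\left(a^{(l)}\right)

% Backpropagation: the error is sent back through the transpose of the
% forward weights, so the backward path must "know" W^{(l+1)}
\delta^{(l)} = \left( \left(W^{(l+1)}\right)^{\top} \delta^{(l+1)} \right) \odot g'\!\left(a^{(l)}\right),
\qquad
\frac{\partial \mathcal{L}}{\partial W^{(l)}} = \delta^{(l)} \left(h^{(l-1)}\right)^{\top}

% Feedback alignment: the transpose is replaced by a fixed random matrix
\delta^{(l)}_{\mathrm{FA}} = \left( B^{(l+1)} \delta^{(l+1)}_{\mathrm{FA}} \right) \odot g'\!\left(a^{(l)}\right)
```

The only difference between the two recursions is the matrix applied to the incoming error. The weight update itself has the same form in both cases: a product of a postsynaptic error term and presynaptic activity.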

4.5.2 5.2 Energy efficiency

The quantitative gap in energy is stark. The human brain runs on roughly twenty watts, about the power of a dim light bulb, while performing perception, motor control, language, and reasoning continuously. Training a single large language model can consume megawatt-hours and emit carbon on the scale of hundreds of transatlantic flights, and inference at scale draws on entire data centers. Several architectural facts explain the brain’s frugality. Biological computation is event-driven: a neuron consumes significant energy only when it spikes, and neural activity is sparse, with most neurons silent most of the time. Memory and computation are colocated at the synapse, avoiding the constant shuttling of data between separate memory and processing units that dominates the energy budget of conventional von Neumann hardware (the so-called memory wall). A graphics processing unit, by contrast, computes densely and synchronously and spends much of its power moving data. This efficiency gap is one of the strongest arguments that the brain’s design principles still have practical lessons to teach, even if its learning algorithm does not.
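A quick unit conversion makes the scale of the gap concrete. The sketch below converts the twenty-watt figure into kilowatt-hours and compares it with a training run; the 1,000 MWh training budget is a hypothetical round number chosen for illustration, not a measurement of any particular model.

```python
BRAIN_POWER_W = 20.0          # figure quoted in the text
HOURS_PER_YEAR = 24 * 365


def brain_energy_kwh(hours, power_w=BRAIN_POWER_W):
    """Energy used by a constant-power device, in kilowatt-hours."""
    return power_w * hours / 1000.0


def brain_years_equivalent(training_mwh, power_w=BRAIN_POWER_W):
    """How many years a brain could run on a given training budget."""
    return training_mwh * 1000.0 / brain_energy_kwh(HOURS_PER_YEAR, power_w)


per_day = brain_energy_kwh(24)               # 0.48 kWh
per_year = brain_energy_kwh(HOURS_PER_YEAR)  # 175.2 kWh

HYPOTHETICAL_TRAINING_MWH = 1000.0           # assumed, illustrative only
ratio = brain_years_equivalent(HYPOTHETICAL_TRAINING_MWH)
print(per_day, per_year, round(ratio))
```

Under that assumed budget, the energy of one training run would power a brain for several thousand years. The comparison is deliberately crude: it sets training energy against operating energy and ignores everything evolution and development spent to produce the brain.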

4.5.3 5.3 Spiking versus rate coding

A third divergence concerns the code itself. Artificial networks use what is best described as rate coding: a unit’s output is a single number standing for an average activity level, with all temporal structure averaged away. Real neurons communicate with discrete spikes in continuous time, and there is substantial evidence that the precise timing of those spikes carries information that a rate average would discard. Temporal codes can in principle represent and transmit information faster and more efficiently than rate codes, because a single well-timed spike can be informative [13]. The artificial neuron’s commitment to rate coding is again an engineering choice: real-valued, differentiable activations are what gradient descent needs. Spikes are discrete and non-differentiable, which is exactly why they are hard to train and why mainstream deep learning has avoided them. The cost of this choice is that artificial networks forgo whatever computational advantages temporal coding confers.
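The information that a rate average discards can be shown with two toy spike trains. In the sketch below, the spike times, the 100 ms window, and the function names are all assumptions made for illustration; first-spike latency is used as one simple example of a temporal code.

```python
def firing_rate(spike_times, window_ms):
    """Rate code: spikes per second over the window; timing is averaged away."""
    return len(spike_times) / (window_ms / 1000.0)


def first_spike_latency(spike_times):
    """A simple temporal code: time of the first spike, or None if silent."""
    return min(spike_times) if spike_times else None


# Two hypothetical responses with the same spike count but different timing.
early = [2.0, 40.0, 75.0]
late = [30.0, 55.0, 90.0]

print(firing_rate(early, 100.0), firing_rate(late, 100.0))
print(first_spike_latency(early), first_spike_latency(late))
```

Both trains have an identical rate, so a rate-coded reader cannot tell them apart, while a latency-based reader distinguishes them after the first spike without waiting for the full window.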

4.6 6. Neuromorphic Computing

Neuromorphic computing is the engineering program that takes the brain’s physical principles, rather than its learning algorithm, as the thing worth copying. The term and the original vision are due to Carver Mead, who in the late 1980s argued that analog circuits could emulate neural computation far more efficiently than digital simulation [14]. Modern neuromorphic systems are typically digital or mixed-signal and share a common philosophy: event-driven spiking communication, massive parallelism, sparse activity, and the colocation of memory and computation to defeat the memory wall.

Representative platforms include IBM’s TrueNorth, which placed one million spiking neurons on a chip drawing well under one watt; Intel’s Loihi, which added on-chip programmable learning rules so that plasticity can occur locally on the hardware; and SpiNNaker, a massively parallel architecture built from many simple cores designed to simulate spiking networks in real time [15]. These chips can be dramatically more energy-efficient than GPUs for the right workloads, particularly sparse, event-driven, always-on sensing tasks. The catch is the training problem identified above: because spikes are non-differentiable, training spiking neural networks to the accuracy of conventional deep networks remains difficult. The dominant workaround is surrogate gradient training, which replaces the non-differentiable spike with a smooth approximation during the backward pass so that backpropagation can be applied anyway [16]. It is worth noticing the irony: the most biologically inspired hardware is most easily trained by importing the least biologically plausible algorithm. Neuromorphic computing today is a promising research direction with real efficiency wins in narrow domains, not yet a general replacement for the GPU.
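The surrogate gradient trick can be sketched without any deep learning framework. The forward function below is the hard threshold; the backward function is a smooth stand-in shaped like the derivative of a fast sigmoid, which is one common choice among several. The function names, the threshold, and the steepness `beta` are assumptions for illustration.

```python
def spike_forward(v, threshold=1.0):
    """Forward pass: hard threshold. Its true derivative is zero almost everywhere."""
    return 1.0 if v >= threshold else 0.0


def surrogate_grad(v, threshold=1.0, beta=10.0):
    """Backward pass: smooth pseudo-derivative used in place of the true one.

    Peaks at the threshold and decays with distance from it (assumed form).
    """
    return 1.0 / (1.0 + beta * abs(v - threshold)) ** 2


for v in (0.5, 0.99, 1.0, 1.5):
    print(v, spike_forward(v), surrogate_grad(v))
```

Because the pseudo-derivative is nonzero near the threshold, an upstream error signal can still adjust membrane potentials that were close to firing, which is what allows backpropagation to be applied to a spiking network.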

4.7 7. What AI Has Borrowed, and What It Has Not

4.7.1 7.1 Genuine borrowings

Three of the most important ideas in modern AI have clear neuroscientific lineage, though in each case the engineering implementation diverged sharply from the biology.

Convolutional networks descend directly from Hubel and Wiesel’s work on the cat visual cortex, which revealed simple and complex cells with local receptive fields arranged in a hierarchy of increasing abstraction [17]. Fukushima’s Neocognitron explicitly modeled this hierarchy [18], and the convolutional networks that now underpin computer vision inherit the core ideas of local receptive fields, weight sharing, and pooling. Weight sharing, however, is a pure engineering convenience with no biological counterpart: the brain does not tie the weights of neurons in different cortical locations.
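Local receptive fields, weight sharing, and pooling can all be seen in a one-dimensional toy example. The sketch below is an illustration only; the signal, the two-tap edge-detecting kernel, and the function names are assumptions, and the operation is the cross-correlation that deep learning libraries conventionally call convolution.

```python
def conv1d(signal, kernel):
    """Valid 1-D cross-correlation: one shared kernel slides over the input."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


def max_pool1d(values, size):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, size)]


signal = [0, 0, 1, 1, 1, 0, 0, 0]
edge_kernel = [-1, 1]                    # responds to a rising edge
features = conv1d(signal, edge_kernel)   # each output sees only 2 inputs
pooled = max_pool1d(features, 2)

shared_params = len(edge_kernel)
unshared_params = len(features) * len(edge_kernel)
print(features, pooled, shared_params, unshared_params)
```

Every output position reuses the same two weights. A locally connected layer without sharing would need a separate pair per position, which is closer to what cortex appears to do and is the point at which the engineering and the biology part ways.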

Reinforcement learning has perhaps the deepest and most bidirectional relationship with neuroscience. The temporal difference learning algorithm was developed in machine learning and then found to predict the phasic firing of dopamine neurons with striking precision, so that the reward prediction error hypothesis of dopamine is now textbook neuroscience [6]. Here theory flowed in both directions, an unusually productive case.

Attention, the mechanism at the heart of the transformer, is named after the cognitive phenomenon of selective attention, the brain’s ability to prioritize some inputs over others [19]. But the resemblance is largely at the level of slogan. The scaled dot-product attention of a transformer is a specific differentiable operation: it computes a weighted average of learned value projections, with the weights given by a softmax over scaled dot products of learned query and key projections. It bears no demonstrated mechanistic relationship to the neural circuits of biological attention. The name is an inspiration and a useful intuition pump, not a model.
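Written out, scaled dot-product attention is short enough to read in full. The sketch below handles a single query and uses plain lists rather than a tensor library; the function names and the toy vectors are assumptions for illustration, and the learned projections that would normally produce the query, keys, and values are omitted.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Single-query scaled dot-product attention.

    Returns the weighted average of the value vectors and the weights used.
    """
    d_k = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d_k)
              for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    output = [sum(w * v[i] for w, v in zip(weights, values))
              for i in range(dim)]
    return output, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]
output, weights = attention(query, keys, values)
print(weights, output)
```

The query matches the first key more closely, so the first value dominates the average. Nothing in the computation refers to spikes, circuits, or time, which is the sense in which the biological name is an intuition rather than a mechanism.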

4.7.2 7.2 What AI has not taken

The list of things AI has not borrowed is arguably more revealing. Mainstream deep learning has not adopted spiking communication, continuous online learning, neuromodulatory gating, local learning rules, or the strict separation of excitation and inhibition, and it has matched neither the brain’s energy efficiency nor its sample efficiency. A child learns a new object category from a handful of examples; a deep network often needs thousands or millions. The brain learns continually without overwriting what it learned before, whereas artificial networks trained on tasks in sequence still suffer from catastrophic forgetting. These omissions are precisely the open problems of the field, which suggests that the parts of biology AI has ignored may be exactly the parts worth revisiting.

4.8 8. The Limits of Brain Inspiration

It is tempting to conclude that the path forward is simply more biological fidelity, but the history of the field counsels caution. The most successful components of modern AI, namely backpropagation, ReLU activations, weight sharing, layer normalization, and the transformer’s attention, are in large part biologically implausible or biologically silent. Performance, not fidelity, drove their adoption. Airplanes do not flap their wings, and the analogy is apt: the principles of aerodynamics that birds exploit were worth understanding, but slavish imitation of feathers and flapping would have delayed powered flight. Brain inspiration has been most useful as a source of abstract principles (hierarchy, local receptive fields, prediction errors, attention, event-driven sparsity) and least useful as a blueprint for literal copying.

There is also a deep epistemic caution. We do not actually understand how the brain computes. Our models of neural learning are incomplete and contested, and the danger of reasoning from the brain is that we may be reasoning from our current, possibly mistaken, theories of the brain rather than from the brain itself. The reverse inference, using artificial networks as models of the brain, is now a thriving subfield, with trained deep networks serving as some of the best available predictors of activity in visual cortex [20]. But this is a claim about representational similarity in trained systems, not a claim that the brain learns or computes the way the network does.

The mature position holds two things at once. Biological and artificial intelligence are two largely independent solutions to overlapping problems, converging here and diverging there. Neuroscience remains a generous source of hypotheses, and the brain’s unmatched energy efficiency and sample efficiency mark out the frontier that artificial systems have not yet reached. But the artificial neuron should be understood for what it is: a loose, deliberately impoverished abstraction that succeeded because it could be optimized, not because it was faithful. Knowing the difference is what separates a clear understanding of these systems from the seductive and misleading picture of a digital brain.

4.9 9. Summary

The biological neuron is a dynamic electrochemical device that communicates through timed spikes and learns through local, neuromodulated synaptic plasticity, in a brain that runs on roughly twenty watts. The artificial neuron is a static, differentiable function trained by global, exact backpropagation on hardware that consumes orders of magnitude more energy. The analogy that gave neural networks their name is real at the level of abstract principle and false at the level of mechanism. AI has borrowed genuine ideas from neuroscience (convolution, reinforcement learning, and the inspiration for attention) while leaving behind spiking, online learning, energy efficiency, and sample efficiency, which are exactly the field’s unsolved problems. Neuromorphic computing pursues the brain’s physical principles and earns real efficiency gains, but trains most easily by importing the unbiological backpropagation it was meant to escape. The brain remains a source of hypotheses and a benchmark of efficiency, but not a blueprint, and the clearest thinkers treat the two intelligences as cousins rather than as the same thing in different substrates.

4.10 References

[1] Herculano-Houzel, S. (2009). The human brain in numbers: a linearly scaled-up primate brain. Frontiers in Human Neuroscience, 3, 31. https://doi.org/10.3389/neuro.09.031.2009

[2] Hodgkin, A. L., & Huxley, A. F. (1952). A quantitative description of membrane current and its application to conduction and excitation in nerve. The Journal of Physiology, 117(4), 500-544. https://doi.org/10.1113/jphysiol.1952.sp004764

[3] Beniaguev, D., Segev, I., & London, M. (2021). Single cortical neurons as deep artificial neural networks. Neuron, 109(17), 2727-2739. https://doi.org/10.1016/j.neuron.2021.07.002

[4] Hebb, D. O. (1949). The Organization of Behavior: A Neuropsychological Theory. Wiley. https://psycnet.apa.org/record/1950-02200-000

[5] Bi, G. Q., & Poo, M. M. (1998). Synaptic modifications in cultured hippocampal neurons: dependence on spike timing, synaptic strength, and postsynaptic cell type. The Journal of Neuroscience, 18(24), 10464-10472. https://doi.org/10.1523/JNEUROSCI.18-24-10464.1998

[6] Schultz, W., Dayan, P., & Montague, P. R. (1997). A neural substrate of prediction and reward. Science, 275(5306), 1593-1599. https://doi.org/10.1126/science.275.5306.1593

[7] McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics, 5, 115-133. https://doi.org/10.1007/BF02478259

[8] Rosenblatt, F. (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386-408. https://doi.org/10.1037/h0042519

[9] Glorot, X., Bordes, A., & Bengio, Y. (2011). Deep sparse rectifier neural networks. Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS). https://proceedings.mlr.press/v15/glorot11a.html

[10] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533-536. https://doi.org/10.1038/323533a0

[11] Lillicrap, T. P., Cownden, D., Tweed, D. B., & Akerman, C. J. (2016). Random synaptic feedback weights support error backpropagation for deep learning. Nature Communications, 7, 13276. https://doi.org/10.1038/ncomms13276

[12] Lillicrap, T. P., Santoro, A., Marris, L., Akerman, C. J., & Hinton, G. (2020). Backpropagation and the brain. Nature Reviews Neuroscience, 21(6), 335-346. https://doi.org/10.1038/s41583-020-0277-3

[13] Thorpe, S., Delorme, A., & Van Rullen, R. (2001). Spike-based strategies for rapid processing. Neural Networks, 14(6-7), 715-725. https://doi.org/10.1016/S0893-6080(01)00083-1

[14] Mead, C. (1990). Neuromorphic electronic systems. Proceedings of the IEEE, 78(10), 1629-1636. https://doi.org/10.1109/5.58356

[15] Davies, M., et al. (2018). Loihi: a neuromorphic manycore processor with on-chip learning. IEEE Micro, 38(1), 82-99. https://doi.org/10.1109/MM.2018.112130359

[16] Neftci, E. O., Mostafa, H., & Zenke, F. (2019). Surrogate gradient learning in spiking neural networks. IEEE Signal Processing Magazine, 36(6), 51-63. https://doi.org/10.1109/MSP.2019.2931595

[17] Hubel, D. H., & Wiesel, T. N. (1962). Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex. The Journal of Physiology, 160(1), 106-154. https://doi.org/10.1113/jphysiol.1962.sp006837

[18] Fukushima, K. (1980). Neocognitron: a self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position. Biological Cybernetics, 36(4), 193-202. https://doi.org/10.1007/BF00344251

[19] Vaswani, A., et al. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30. https://arxiv.org/abs/1706.03762

[20] Yamins, D. L. K., & DiCarlo, J. J. (2016). Using goal-driven deep learning models to understand sensory cortex. Nature Neuroscience, 19(3), 356-365. https://doi.org/10.1038/nn.4244