3  The Philosophy of Machine Intelligence

The question of whether machines can think is older than the digital computer itself, and it remains stubbornly unresolved even as large language models produce text that many readers cannot distinguish from human writing. This chapter examines the conceptual foundations beneath that question. It asks what we mean by intelligence, surveys the major thought experiments and arguments that have shaped the debate, and shows how these decades-old disputes structure current arguments about whether systems such as GPT, Claude, and Gemini understand anything at all. The goal is not to settle the matter but to give you the conceptual vocabulary to reason about it carefully.

3.1 What Intelligence Means and Why It Resists Definition

3.1.1 The Problem of Definition

Intelligence is one of those concepts that everyone uses confidently and no one can pin down. Psychologists have offered operational definitions tied to test performance, biologists have tied it to adaptive behavior, and computer scientists have often defaulted to task competence. Each definition captures something while excluding something else. A definition narrow enough to be measurable (for example, performance on a fixed benchmark) tends to miss the open-ended flexibility we associate with genuine intelligence, while a definition broad enough to include that flexibility tends to become untestable.

Part of the difficulty is that intelligence is a cluster concept. It bundles together perception, memory, reasoning, learning, planning, language, and the capacity to transfer skill from one domain to another. These capacities can come apart. A calculator exceeds any human at arithmetic yet plans nothing, while a crow solves novel physical puzzles without arithmetic. Because the bundle is loose, any single yardstick will look arbitrary to someone who weights the components differently.
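The point about weighting can be made concrete with a small sketch. The capability scores below are invented purely for illustration (they are not measurements of any calculator or crow), and `weighted_score` and `rank` are hypothetical helpers. The only claim is structural: the same profiles yield opposite rankings under different weightings.

```python
# Illustrative only: these capability profiles are made-up numbers, not data.
PROFILES = {
    "calculator": {"arithmetic": 1.0, "planning": 0.0, "novel_problem_solving": 0.0},
    "crow": {"arithmetic": 0.0, "planning": 0.6, "novel_problem_solving": 0.8},
}


def weighted_score(profile, weights):
    """Collapse a capability profile into one number using the given weights."""
    return sum(weights[name] * value for name, value in profile.items())


def rank(weights):
    """Return system names ordered from highest to lowest weighted score."""
    return sorted(
        PROFILES,
        key=lambda name: weighted_score(PROFILES[name], weights),
        reverse=True,
    )


# Two observers who weight the components of the bundle differently.
arithmetic_heavy = {"arithmetic": 0.9, "planning": 0.05, "novel_problem_solving": 0.05}
flexibility_heavy = {"arithmetic": 0.1, "planning": 0.45, "novel_problem_solving": 0.45}

print(rank(arithmetic_heavy))   # ['calculator', 'crow']
print(rank(flexibility_heavy))  # ['crow', 'calculator']
```

Neither weighting is privileged by the code, which is the point: a single yardstick settles the ranking only after someone has already decided which capacities matter most.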

3.1.2 Behavioral Versus Internal Criteria

A second fault line runs between behavioral and internal accounts. A behavioral account says that intelligence is as intelligence does: if a system behaves in the ways an intelligent agent would, that settles the matter. An internal account insists that behavior is merely evidence, and that what makes behavior intelligent is the kind of process that produces it. A lookup table that stored a correct response for every possible conversation would behave intelligently while (most would agree) understanding nothing. This tension between what a system does and how it does it recurs throughout the chapter, and it is the hinge on which most of the classic arguments turn.
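The lookup-table worry can be illustrated on a deliberately tiny domain. In the sketch below, single-digit addition stands in for "every possible conversation", which is an assumption made only to keep the table small; the function names are hypothetical. One function computes its answers and the other merely retrieves them, yet no test of outputs within the domain can tell them apart.

```python
from itertools import product

DOMAIN = range(10)  # assumed finite set of possible "questions"


def computed_adder(a, b):
    """Produce the answer by actually carrying out addition."""
    return a + b


# Precompute every answer once; afterwards the table performs no arithmetic.
LOOKUP = {(a, b): a + b for a, b in product(DOMAIN, repeat=2)}


def table_adder(a, b):
    """Produce the answer by retrieving a stored response."""
    return LOOKUP[(a, b)]


def behaviorally_equivalent(f, g):
    """True if f and g give the same output on every input in the domain."""
    return all(f(a, b) == g(a, b) for a, b in product(DOMAIN, repeat=2))


print(behaviorally_equivalent(computed_adder, table_adder))  # True
```

A behavioral account has no grounds for preferring either function. An internal account distinguishes them at once, and notes that `table_adder` fails with a `KeyError` on any input outside the stored domain, whereas `computed_adder` generalizes.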

3.1.3 Why the Definitional Problem Matters for AI

The definitional problem is not idle philosophizing. When researchers claim a system has reached human-level intelligence, or when critics deny it, they are often disagreeing about definitions rather than about facts. Clarifying what we are asking lets us see that some apparent disputes are verbal, while others are substantive. As we will see, the modern debate about whether language models understand is partly a rerun of the behavioral versus internal disagreement under new conditions.

3.2 The Turing Test

3.2.1 Turing’s Proposal

In his 1950 paper “Computing Machinery and Intelligence,” Alan Turing sidestepped the unanswerable question “Can machines think?” and replaced it with an operational one. He described the imitation game: a human interrogator converses by text with two hidden participants, one human and one machine, and tries to tell which is which. If the machine fools the interrogator as often as a human would, Turing proposed, we should be prepared to say it thinks. The move is deliberately behavioral. Turing was skeptical that “thinking” could be defined in a way that would command agreement, so he offered instead a test with an observable outcome, one that a machine could in principle pass or fail.
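The structure of the imitation game can be sketched as a small simulation. Everything here is a simplifying assumption rather than Turing's own specification: the participants are toy functions, the interrogator is a crude heuristic, and the pass criterion is read as "the interrogator identifies the machine no better than chance". All names are hypothetical.

```python
import random


def run_imitation_game(interrogator, human, machine, questions, rounds, rng):
    """Return the fraction of rounds in which the interrogator identifies the machine."""
    correct = 0
    for _ in range(rounds):
        # Hide the participants behind the labels A and B in random order.
        players = [("human", human), ("machine", machine)]
        rng.shuffle(players)
        labelled = dict(zip("AB", players))
        transcripts = {
            label: [(q, respond(q)) for q in questions]
            for label, (_, respond) in labelled.items()
        }
        guess = interrogator(transcripts, rng)  # label believed to be the machine
        if labelled[guess][0] == "machine":
            correct += 1
    return correct / rounds


def naive_interrogator(transcripts, rng):
    """Accuse any participant that gives a robotic answer; otherwise guess at random."""
    for label, exchanges in transcripts.items():
        if any("BEEP" in answer for _, answer in exchanges):
            return label
    return rng.choice(sorted(transcripts))


def human(question):
    return "Hard to say, really."


def fluent_machine(question):
    return "Hard to say, really."  # indistinguishable from the human by construction


def clumsy_machine(question):
    return "BEEP. QUERY NOT UNDERSTOOD."


questions = ["What is your favorite poem?", "What is 7 times 8?"]
rng = random.Random(0)
clumsy_rate = run_imitation_game(naive_interrogator, human, clumsy_machine, questions, 1000, rng)
fluent_rate = run_imitation_game(naive_interrogator, human, fluent_machine, questions, 1000, rng)
print(clumsy_rate)  # 1.0: the clumsy machine is always caught
print(fluent_rate)  # close to 0.5: the interrogator can only guess
```

The sketch makes the behavioral character of the test explicit: `run_imitation_game` sees only transcripts, so nothing about how a participant produces its answers can affect the result.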

3.2.2 What the Test Gets Right

The test has real virtues. It is medium-neutral, judging the system on conversation rather than on appearance or substrate, and so it forecloses prejudice based on a machine being made of metal rather than carbon. It also sets a demanding bar, because open-ended conversation draws on reasoning, world knowledge, humor, and the ability to handle the unexpected. Turing anticipated many objections, including the claim that machines could never be creative or original, and he answered them with arguments that still read freshly.

3.2.3 Critiques of the Test

The test has nonetheless drawn sustained criticism. One objection is that it tests deception rather than intelligence: a system might pass by exploiting human gullibility, deflecting hard questions, or imitating the typing errors and evasions of a person, none of which require genuine understanding. The 2014 episode in which a chatbot (Eugene Goostman) posing as a thirteen-year-old Ukrainian boy reportedly fooled a third of judges illustrated how a low bar and a clever persona can substitute for substance. A second objection is that the test is anthropocentric, treating human conversation as the gold standard and thereby missing forms of intelligence that are real but nonhuman. A third, which the next sections develop, is that passing the test shows competence at producing the right outputs without showing that anything inside the system understands those outputs. Modern language models sharpen all three worries, because they are explicitly trained on human text and can produce fluent conversation while their internal grasp of meaning is precisely what is in dispute.

3.3 Searle’s Chinese Room

3.3.1 The Argument

In 1980 John Searle introduced a thought experiment designed to show that passing a behavioral test, even a perfect one, does not establish understanding. Imagine Searle locked in a room. Speakers of Chinese pass written questions under the door. Searle, who knows no Chinese, consults a vast rulebook written in English that tells him, purely in terms of the shapes of the symbols, which Chinese symbols to write in response. By following the rules he produces answers indistinguishable from those of a native speaker. To those outside, the room appears to understand Chinese. Yet Searle, the only one who understands anything in the room, understands not a word of Chinese. He is merely manipulating symbols by their form.

3.3.2 The Target

Searle’s target is what he called strong AI, the thesis that a suitably programmed computer would thereby have a mind and genuinely understand. His claim is that a digital computer is exactly like the person in the room: it manipulates symbols according to formal rules (its program) without any access to what those symbols mean. Syntax, the formal manipulation of symbols, is not sufficient for semantics, the meaningful content those symbols carry. Because running a program is just syntax, and because understanding requires semantics, no program could by itself constitute understanding. The argument is meant to apply no matter how sophisticated the program, which is why it bears directly on systems far more capable than anything that existed in 1980.

3.4 Strong and Weak AI, Functionalism, and the Computational Theory of Mind

3.4.1 The Strong and Weak Distinction

Searle drew a distinction that organizes much of the field. Weak AI treats the computer as a powerful tool for studying the mind and for performing tasks that would require intelligence if a human did them, without any claim that the computer itself has a mind. Strong AI claims that the right program is a mind, that mental states just are computational states of the appropriate kind. Almost no one disputes weak AI. The philosophical action is entirely about the strong claim, and it is the strong claim that Searle attacks.

3.4.2 Functionalism

The theoretical backbone of strong AI is functionalism, the dominant position in the philosophy of mind through the late twentieth century. Functionalism holds that mental states are defined not by what they are made of but by their causal role, by how they relate to sensory inputs, to other mental states, and to behavioral outputs. Pain, on this view, is whatever state is typically caused by bodily damage and typically causes avoidance and complaint, regardless of whether it is realized in neurons or in silicon. This thesis of multiple realizability is attractive because it explains how creatures with very different brains could share mental states, and it opens the door to minds implemented in hardware utterly unlike our own.
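Multiple realizability has a loose analogue in programming: an interface specified by role, with implementations that share nothing internally. The sketch below is only an analogy under that assumption. `PainRole`, `NeuronRealizer`, and `SiliconRealizer` are hypothetical names, and the code takes no position on whether either realizer feels anything, which is exactly what the functionalist and the critic dispute.

```python
from typing import List, Protocol


class PainRole(Protocol):
    """The causal role: bodily damage in, avoidance and complaint out."""

    def register_damage(self, severity: int) -> None: ...

    def behavior(self) -> List[str]: ...


class NeuronRealizer:
    """Realizes the role with a (toy) firing rate."""

    def __init__(self) -> None:
        self.firing_rate = 0

    def register_damage(self, severity: int) -> None:
        self.firing_rate += 10 * severity

    def behavior(self) -> List[str]:
        return ["avoid", "complain"] if self.firing_rate > 0 else []


class SiliconRealizer:
    """Realizes the same role with a (toy) status flag."""

    def __init__(self) -> None:
        self.damage_flag = False

    def register_damage(self, severity: int) -> None:
        if severity > 0:
            self.damage_flag = True

    def behavior(self) -> List[str]:
        return ["avoid", "complain"] if self.damage_flag else []


def plays_pain_role(system: PainRole) -> bool:
    """Check the role only: typical cause in, typical effects out."""
    system.register_damage(3)
    return system.behavior() == ["avoid", "complain"]


print(plays_pain_role(NeuronRealizer()))   # True
print(plays_pain_role(SiliconRealizer()))  # True
```

`plays_pain_role` never inspects `firing_rate` or `damage_flag`. On the functionalist view that indifference to internal makeup is the whole point; on Searle's view it is the problem.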

3.4.3 The Computational Theory of Mind

The computational theory of mind takes functionalism a step further by specifying the relevant functional organization as computation. On this view, thinking is information processing, the rule-governed transformation of internal representations, and the mind stands to the brain roughly as software stands to hardware. If that picture is correct, then a computer running the right program would not merely simulate thought but instantiate it, because thought just is that kind of computation. The Chinese Room is precisely an attack on this inference. Searle grants that the room (or the computer) carries out the right computation, and insists that understanding still fails to appear, so computation cannot be sufficient for mind.

3.5 Symbol Grounding

3.5.1 The Problem

Closely related to Searle’s worry is the symbol grounding problem, articulated by Stevan Harnad in 1990. The symbols inside a classical AI system are meaningful only to us, the interpreters who read them. To the system they are bare tokens, defined entirely by their relations to other equally meaningless tokens. Harnad’s image is of trying to learn Chinese from a Chinese-to-Chinese dictionary alone: every definition sends you to more symbols, and you never break out of the circle into the world. How, then, could a symbol-manipulating system ever connect its symbols to the things they are supposed to be about?

3.5.2 Proposed Solutions

Harnad’s suggested remedy was to ground at least some symbols in the system’s sensory interactions with the world, so that the token for “horse” is tied to the system’s own perceptual capacity to detect horses, with other symbols built up from this grounded base. This line of thought motivates robotics and embodied approaches to AI, which hold that genuine understanding requires a body that perceives and acts, giving symbols a causal anchor in the environment. The grounding problem presses hard on systems trained only on text. A language model learns from a vast corpus of symbols and their statistical relations, which is exactly the dictionary-go-round Harnad warned about, and this is one reason critics doubt that such models understand the words they so fluently arrange.
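The dictionary circle and the proposed remedy can be sketched together. The toy dictionary, the detector functions, and `grounded_symbols` are all hypothetical, and the "detectors" are placeholders standing in for real perceptual capacities. The sketch shows only the structural idea: with no detectors nothing bottoms out, while a small grounded base lets composite symbols inherit grounding.

```python
# A toy dictionary in which every word is defined only by other words.
DICTIONARY = {
    "zebra": ["horse", "stripes"],
    "horse": ["zebra", "plain"],
    "stripes": ["zebra", "pattern"],
    "plain": ["pattern"],
    "pattern": ["stripes", "plain"],
}

# Placeholder detectors: stand-ins for a system's own capacity to pick out
# horses and stripes from sensory input (here, a dict describing a percept).
DETECTORS = {
    "horse": lambda percept: percept.get("shape") == "horse",
    "stripes": lambda percept: percept.get("texture") == "striped",
}


def grounded_symbols(dictionary, detectors):
    """Return the symbols that bottom out in detectors rather than in more symbols.

    A symbol counts as grounded if it has its own detector, or if every
    symbol in its definition is already grounded (computed as a fixed point).
    """
    grounded = set(detectors)
    changed = True
    while changed:
        changed = False
        for word, definition in dictionary.items():
            if word not in grounded and all(part in grounded for part in definition):
                grounded.add(word)
                changed = True
    return grounded


print(sorted(grounded_symbols(DICTIONARY, {})))         # []
print(sorted(grounded_symbols(DICTIONARY, DETECTORS)))  # ['horse', 'stripes', 'zebra']
```

With an empty detector set the definitions only ever lead to further definitions. Once "horse" and "stripes" are tied to detectors, "zebra" becomes grounded by composition, while "plain" and "pattern" stay inside the circle. Whether a causal link to sensors is enough for meaning is, of course, the philosophical question the code leaves open.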

3.6 Consciousness and Whether It Matters

3.6.1 Two Questions, Not One

It is essential to separate two questions that are easily conflated. The first is whether a machine can be intelligent, can solve problems, reason, and use language. The second is whether a machine can be conscious, can have subjective experience, such that there is something it is like to be that machine. These come apart in principle. A system might be highly intelligent with no inner experience whatsoever (a so-called philosophical zombie), and conversely a creature might have rich experience with modest intelligence. Much confusion in public discussion comes from sliding between the two.

3.6.2 The Hard Problem

David Chalmers famously distinguished the easy problems of consciousness, which concern explaining cognitive functions such as discrimination, integration, and reportability, from the hard problem, which is explaining why any of this functioning is accompanied by subjective experience at all. The easy problems are easy only by comparison: they are the kind of thing a complete cognitive science could in principle solve. The hard problem is hard because even a complete functional account seems to leave open why there is felt experience rather than mere processing in the dark. For AI, the hard problem implies that even a system that perfectly replicated human cognitive function might still face an open question about whether it feels anything.

3.6.3 Does It Matter for AI?

Whether consciousness matters depends on what we want from AI. For most practical and scientific purposes, intelligence is what we are after, and a system that reasons and acts effectively serves us whether or not it has experience. Consciousness becomes central, however, for moral status. If a system can suffer, then how we treat it raises ethical questions, and the difficulty of detecting consciousness from the outside (the same difficulty the Turing Test cannot resolve) means we may face hard moral uncertainty. For the narrow question of whether a machine understands, many philosophers hold that understanding is a cognitive achievement that need not require consciousness, though Searle himself ties intentionality closely to the biological character of brains.

3.7 The Systems Reply and Other Responses to the Chinese Room

3.7.1 The Systems Reply

The most influential response to Searle is the systems reply. It concedes that the person in the room does not understand Chinese but denies that this is the relevant point. Understanding, the reply says, is a property of the whole system, the person together with the rulebook, the scratch paper, the symbols, and the procedures, not of the person alone. Searle is merely the central processing unit of a larger system, and there is no reason to expect a component to possess the understanding of the whole, any more than a single neuron understands English.

3.7.2 Searle’s Rejoinder and the Counter

Searle’s reply is to internalize the system. Let the person memorize the entire rulebook and do all the computation in his head, dispensing with the room and the paper entirely. Now the person is the whole system, and he still understands no Chinese, so where is the understanding supposed to reside? Critics counter that internalization smuggles in an intuition pump: it asks us to imagine something cognitively impossible (memorizing and executing a program vast enough to sustain fluent conversation) and then trusts our untrained intuition about that impossible scenario. The defender of the systems reply argues that the person who has internalized the program would be implementing a second cognitive system, distinct from his ordinary self, and that this second system might understand Chinese even though the host person does not, just as a single brain can in unusual cases sustain two streams of awareness.

3.7.3 The Robot and Other Replies

Other replies push in different directions. The robot reply grants Searle’s point about a disembodied program but argues that a system embedded in a robot that perceives and acts would have its symbols grounded in the world, addressing the very objection Harnad later formalized. The brain simulator reply imagines a program that simulates the exact firing of a Chinese speaker’s neurons and asks how Searle could deny understanding to that without also denying it to the original brain. Searle resists each, but the proliferation of replies shows that the thought experiment, however vivid, does not command universal assent. What it does accomplish is to make undeniable the gap between behaving as if one understands and understanding, which is exactly the gap at issue in contemporary debates.

3.8 How These Debates Inform Modern LLM Discussions

3.8.1 Stochastic Parrots

The phrase “stochastic parrots,” from a 2021 paper by Emily Bender, Timnit Gebru, and colleagues, crystallized the skeptical position for the language model era. The argument is that a model trained to predict the next token learns the statistical distribution of word forms without any access to meaning or communicative intent, and so it stitches together plausible sequences without understanding them, much as a parrot repeats sounds. This is the symbol grounding problem and the Chinese Room recast for systems trained on text. The model, on this view, manipulates form (syntax) with no grip on content (semantics), and its fluency is precisely what makes the absence of understanding hard to notice.
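The training objective at issue, predicting the next token from the statistics of prior text, can be shown in its most minimal form with a bigram model. This is a deliberate simplification: modern language models are large neural networks conditioned on long contexts, not tables of adjacent-word counts, and the corpus and function names below are hypothetical. The sketch shows only what learning "the statistical distribution of word forms" means at the smallest scale.

```python
import random
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count, for each token, how often each other token follows it."""
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts


def generate(counts, start, length, rng):
    """Sample a continuation by repeatedly drawing the next token from the counts."""
    output = [start]
    for _ in range(length):
        followers = counts.get(output[-1])
        if not followers:
            break  # the last token was never followed by anything in the corpus
        tokens, weights = zip(*followers.items())
        output.append(rng.choices(tokens, weights=weights, k=1)[0])
    return output


corpus = "the cat sat on the mat and the dog sat on the rug".split()
model = train_bigram(corpus)

print(model["sat"])  # Counter({'on': 2})
print(" ".join(generate(model, "the", 6, random.Random(0))))
```

Every sequence the model emits is locally plausible, because each adjacent pair occurred in the corpus, yet nothing in `counts` refers to cats or mats. The skeptical position holds that scaling this objective up changes the fluency but not that basic situation; the opposing view, taken up next, is that scale changes what the model must represent internally in order to predict well.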

3.8.2 Understanding Versus Prediction

The opposing view holds that the dichotomy between prediction and understanding is too crude. Predicting the next token well across a sufficiently rich corpus may require the model to build internal structure that functions like understanding: representations of entities, relations, and even the state of an unfolding situation. Empirical work probing the internals of trained models has found representations that track features of the world the text describes, which suggests that next-token prediction can be a means to richer internal organization rather than its opposite. A functionalist will say that if a system reliably exhibits the right input-output relations and the right internal information processing, then withholding the word “understanding” begins to look like substrate prejudice, the very prejudice Turing warned against. A Searlean will reply that no amount of internal structure converts syntax into semantics, and that grounding through training on text alone remains a dictionary-go-round.

3.8.3 Why the Old Arguments Still Bind

It is no accident that these look like old arguments in new clothes. The language model debate inherits the exact structure of the older one. The behavioral camp points to performance and asks what more could be wanted. The internalist camp points to the manner of operation and insists that performance is not the point. The grounding camp asks how text-trained symbols could be about anything. Progress on the empirical questions, such as what representations models actually form and whether multimodal grounding changes the picture, is real and ongoing. But the conceptual questions, what understanding is, whether it requires consciousness, and whether the right functional organization suffices for mind, remain genuinely open. A responsible practitioner should therefore resist both the temptation to declare these systems minds and the temptation to dismiss them as mere parrots, and should instead hold the distinctions this chapter has drawn clearly in view.

3.9 Conclusion

The philosophy of machine intelligence supplies the questions that capability benchmarks cannot answer. Intelligence resists definition because it bundles many capacities and because we disagree about whether behavior or its underlying process is what counts. The Turing Test made the question operational and behavioral, Searle’s Chinese Room challenged the sufficiency of behavior and computation for understanding, and functionalism and the computational theory of mind supplied the framework Searle attacked. Symbol grounding asks how any of these systems could connect to the world, consciousness raises a further question that intelligence alone does not settle, and the systems reply keeps the central dispute alive. When you read a claim that a language model does or does not understand, you are reading a move in this long argument. Knowing the moves will not tell you who is right, but it will let you see clearly what is being claimed, what would count as evidence, and where reasonable people still disagree.

3.10 References

  1. Turing, A. M. (1950). Computing Machinery and Intelligence. Mind, 59(236), 433–460. https://academic.oup.com/mind/article/LIX/236/433/986238

  2. Searle, J. R. (1980). Minds, Brains, and Programs. Behavioral and Brain Sciences, 3(3), 417–457. https://www.cambridge.org/core/journals/behavioral-and-brain-sciences/article/minds-brains-and-programs/DC644B47A4299C637C89772FACC2706A

  3. Harnad, S. (1990). The Symbol Grounding Problem. Physica D, 42, 335–346. https://www.sciencedirect.com/science/article/abs/pii/0167278990900876

  4. Chalmers, D. J. (1995). Facing Up to the Problem of Consciousness. Journal of Consciousness Studies, 2(3), 200–219. https://consc.net/papers/facing.html

  5. Bender, E. M., Gebru, T., McMillan-Major, A., and Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? Proceedings of FAccT 2021. https://dl.acm.org/doi/10.1145/3442188.3445922

  6. Block, N. (1981). Psychologism and Behaviorism. The Philosophical Review, 90(1), 5–43. https://www.jstor.org/stable/2184371

  7. Cole, D. (2023). The Chinese Room Argument. Stanford Encyclopedia of Philosophy. https://plato.stanford.edu/entries/chinese-room/

  8. Levesque, H. J., Davis, E., and Morgenstern, L. (2012). The Winograd Schema Challenge. Proceedings of the Thirteenth International Conference on Principles of Knowledge Representation and Reasoning. https://cdn.aaai.org/ocs/4492/4492-21843-1-PB.pdf

  9. Mitchell, M., and Krakauer, D. C. (2023). The Debate Over Understanding in AI’s Large Language Models. PNAS, 120(13). https://www.pnas.org/doi/10.1073/pnas.2215907120

  10. Putnam, H. (1967). Psychological Predicates (later titled The Nature of Mental States). In Art, Mind, and Religion. University of Pittsburgh Press. https://philpapers.org/rec/PUTPP