224 Face Verification, Liveness, and Presentation-Attack Detection
224.1 1. Introduction
The previous chapter extracted a portrait from a document’s printed photo or, better, from its NFC chip. This chapter answers the second half of identity verification: is the person presenting the document its genuine, living owner? That decomposes into two distinct technical problems. Face verification asks whether a live selfie matches the reference portrait (a 1:1 comparison). Liveness detection, or presentation-attack detection (PAD), asks whether the selfie comes from a real, present human rather than a photo, a video replay, a mask, or an injected deepfake. A system that solves the first but not the second is trivially defeated by holding up a printout of the victim’s face, so in practice the two must always travel together.
We develop the embedding paradigm and the margin losses that define modern face recognition, the independent NIST benchmarks and their sobering demographic findings, the PAD threat taxonomy and methods, the rising deepfake and injection threats of 2024 to 2025, the certification standards, and the privacy and legal constraints that govern any deployment.
224.2 2. Recognition, Verification (1:1), and Identification (1:N)
Three operational modes must be distinguished, because their error behaviour differs sharply:
- Verification (1:1): “is this person who they claim to be?” A probe (selfie) is compared against a single claimed reference (the ID photo), yielding one score thresholded to accept or reject. This is the eKYC mode.
- Identification (1:N): “who is this?” A probe is searched against a gallery of N enrolled identities. The probability of a false positive scales with N, so an algorithm safe at 1:1 can be unsafe when searching millions of identities.
- Recognition is the umbrella term for both.
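The scaling with N can be made concrete with a small calculation. The sketch below assumes, as a simplification, that each of the N gallery comparisons is an independent trial with the same false match rate; real galleries violate this assumption (siblings, duplicates, demographic clustering), so the result is indicative only. The function name is hypothetical.

```python
import math

def false_positive_identification_rate(fmr: float, gallery_size: int) -> float:
    """P(at least one false match) across N comparisons, assumed independent."""
    # 1 - (1 - fmr)^N, computed with log1p/expm1 to stay accurate for tiny fmr.
    return -math.expm1(gallery_size * math.log1p(-fmr))

# Same per-comparison FMR, growing gallery.
for n in (1, 1_000, 1_000_000, 10_000_000):
    rate = false_positive_identification_rate(1e-6, n)
    print(f"N={n:>10,}  P(any false match) ~ {rate:.6f}")
```

Under these assumptions, a per-comparison FMR of one in a million becomes roughly a 63% chance of at least one false match when the gallery holds one million identities, which is why 1:N systems need far stricter operating points than 1:1 verification.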
224.2.1 2.1 The Embedding Paradigm
A modern system maps a face to a fixed-length vector, an embedding or template, typically 128 to 512 dimensions, via a deep encoder (historically a CNN such as ResNet or MobileFaceNet, increasingly a Vision Transformer). The network is trained so that embeddings of the same identity cluster tightly (intra-class compactness) while different identities are pushed apart (inter-class separability). At inference, matching reduces to a distance computation between two embeddings; the encoder itself is fixed. This is the same representational idea developed in the embeddings chapter, specialized to faces.
224.2.2 2.2 Margin Losses
The defining innovation of the 2015 to 2019 era was the loss function that shapes the embedding space:
- FaceNet / triplet loss (Schroff et al., 2015) optimizes triplets of (anchor, positive, negative) so that the anchor-positive distance is smaller than the anchor-negative distance by a margin. Powerful but sensitive to triplet mining and slow to converge.
- SphereFace (2017) introduced a multiplicative angular margin on the angle between a feature and its class-weight vector.
- CosFace (2018) applies an additive cosine margin, subtracting a fixed margin from the target-class cosine on a hypersphere of fixed radius.
- ArcFace (Deng et al., CVPR 2019) adds the margin directly to the angle (the geodesic distance on the hypersphere), giving an exact, constant, geometrically interpretable margin. ArcFace became the de-facto standard and remains a strong baseline.
All three angular-margin variants (SphereFace, CosFace, ArcFace) share one goal: penalize the target logit to force a gap at the decision boundary. CosFace and ArcFace operate on L2-normalized features and weights (the original SphereFace normalized only the weights), which is why cosine similarity is the natural scoring metric.
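To see how the three margins differ, it helps to compute the target-class logit each one produces for the same angle θ between a feature and its class weight. The sketch below is a simplification: it applies one shared scale s to all three (the original SphereFace used the feature norm instead), it uses the raw cos(mθ) form of SphereFace that is valid only for θ in [0, π/m], and the margin values are commonly quoted defaults treated here as assumptions.

```python
import math

SCALE = 64.0  # assumed hypersphere radius s, shared across variants for comparison

def plain_logit(theta: float) -> float:
    """Normalized softmax target logit with no margin: s * cos(theta)."""
    return SCALE * math.cos(theta)

def sphereface_logit(theta: float, m: int = 4) -> float:
    """Multiplicative angular margin: s * cos(m * theta), for theta in [0, pi/m]."""
    return SCALE * math.cos(m * theta)

def cosface_logit(theta: float, m: float = 0.35) -> float:
    """Additive cosine margin: s * (cos(theta) - m)."""
    return SCALE * (math.cos(theta) - m)

def arcface_logit(theta: float, m: float = 0.5) -> float:
    """Additive angular margin: s * cos(theta + m)."""
    return SCALE * math.cos(theta + m)

theta = math.radians(30)  # a fairly well-classified training sample
for name, fn in [("plain", plain_logit), ("SphereFace", sphereface_logit),
                 ("CosFace", cosface_logit), ("ArcFace", arcface_logit)]:
    print(f"{name:>10}: {fn(theta):7.2f}")
```

In every case the margin lowers the target logit relative to the plain version, so the network must pull the feature closer to its class center to achieve the same softmax probability. Only the non-target logits are left unchanged; that part of the loss is omitted here.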
224.2.3 2.3 Scoring and Error Trade-offs
A selfie-to-ID match runs both images through the encoder, L2-normalizes the two embeddings, computes cosine similarity, and thresholds it. The threshold sets the trade-off:
- False Match Rate (FMR): impostors wrongly accepted (a security failure).
- False Non-Match Rate (FNMR): genuine users wrongly rejected (a usability failure, and in eKYC an exclusion failure).
Lowering the similarity threshold accepts more comparisons, which raises FMR and lowers FNMR; raising it does the opposite. The full trade-off is visualized by the ROC curve (true-accept vs. false-accept) or, more diagnostically in the low-error regime, the DET curve (FNMR vs. FMR on log axes). Operators fix an operating point, say FMR = 10⁻⁶, and report the resulting FNMR.
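The scoring pipeline and the two error rates can be sketched in a few lines. The example below assumes the encoder has already produced non-zero embedding vectors, and the score lists are invented toy values, not measurements; the function names are illustrative.

```python
import math

def l2_normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity of two embeddings after L2 normalization."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

def error_rates(genuine_scores, impostor_scores, threshold):
    """Return (FMR, FNMR) when a score >= threshold means 'accept'."""
    fmr = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    fnmr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return fmr, fnmr

# Toy similarity scores (made up for illustration).
genuine = [0.82, 0.75, 0.64, 0.58, 0.41]
impostor = [0.35, 0.22, 0.10, 0.05, -0.03, 0.45]

for t in (0.3, 0.4, 0.5, 0.6):
    fmr, fnmr = error_rates(genuine, impostor, t)
    print(f"threshold={t:.1f}  FMR={fmr:.3f}  FNMR={fnmr:.3f}")
```

Sweeping the threshold and plotting the resulting (FMR, FNMR) pairs on log axes gives the DET curve. With realistic operating points such as FMR = 10⁻⁶, the impostor set must contain many millions of comparisons for the estimate to be meaningful.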
224.3 3. NIST Benchmarks and the Demographic Reality
NIST’s Face Recognition Vendor Test (FRVT), now the Face Recognition Technology Evaluation (FRTE), with morph and age strands under FATE, is the authoritative independent benchmark, evaluating hundreds of vendor algorithms on sequestered operational datasets. This independence matters: it is the credible alternative to vendor self-report.
State of the art (1:1). The best algorithms achieve roughly FNMR ≈ 0.0001 to 0.002 at FMR = 10⁻⁶ on a ~12-million-image mugshot dataset, that is, a miss rate well under 1% at a one-in-a-million false-match setting. (Specific numbers are leaderboard snapshots that shift continuously.)
Demographic differentials: NISTIR 8280 (FRVT Part 3, December 2019). This landmark study should be cited precisely, because it is frequently exaggerated and frequently dismissed:
- In 1:1 verification, false positives were higher for Asian and African American faces than for Caucasian faces, with differentials “often ranging from a factor of 10 to 100×,” depending on the algorithm.
- For US-developed algorithms, the highest false positives appeared for the American Indian group, with elevated rates for Asian and African American faces.
- A crucial nuance: algorithms developed in Asian countries did not show the Asian-versus-Caucasian disparity, evidence that training-data composition, not any immutable property, drives much of the gap.
- In 1:N identification, the highest false positives were for African American females, operationally the most serious finding, since 1:N false positives in law-enforcement search can implicate innocent people.
NIST’s follow-up work (NISTIR 8429) clarifies the mechanism: false-negative inequities are largely an image-quality problem (under-exposure of darker skin), correctable at the capture stage, whereas the larger false-positive variations persist even in high-quality images and must be addressed in algorithm design. The best modern algorithms are markedly more equitable than the 2019 cohort, but the differentials have not vanished, which is why an eKYC deployment must monitor error rates by demographic group, not just in aggregate.
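Monitoring by group rather than in aggregate amounts to computing the same error rate on disaggregated trials. The sketch below shows one minimal way to do this for FMR; the group labels "A" and "B", the scores, and the function names are hypothetical, and a real deployment would need far larger trial counts per group plus confidence intervals.

```python
from collections import defaultdict

def fmr_by_group(impostor_trials, threshold):
    """impostor_trials: iterable of (group_label, similarity_score) pairs."""
    accepted = defaultdict(int)
    total = defaultdict(int)
    for group, score in impostor_trials:
        total[group] += 1
        accepted[group] += int(score >= threshold)  # a false match
    return {group: accepted[group] / total[group] for group in total}

def worst_to_best_ratio(rates):
    """Ratio of the highest to the lowest group FMR (inf if the lowest is 0)."""
    lowest, highest = min(rates.values()), max(rates.values())
    if lowest == 0:
        return float("inf") if highest > 0 else 1.0
    return highest / lowest

# Toy impostor comparisons tagged with a placeholder group label.
trials = (
    [("A", s) for s in (0.10, 0.20, 0.55, 0.30)]
    + [("B", s) for s in (0.10, 0.20, 0.30, 0.40, 0.15, 0.25, 0.35, 0.60)]
)
rates = fmr_by_group(trials, threshold=0.5)
print(rates, worst_to_best_ratio(rates))
```

An aggregate FMR over these trials would hide the fact that one group is falsely matched twice as often as the other; the per-group view exposes it. The same pattern applies to FNMR using genuine trials.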
224.4 4. Liveness and Presentation-Attack Detection
PAD determines whether the biometric sample comes from a live, present human. Passive liveness analyzes a single image or short clip with no user action (texture, moiré, reflectance, micro-physiology). Active (challenge-response) liveness prompts the user to blink, smile, turn the head, or follow on-screen color flashes used for reflectance analysis. It is more robust but higher-friction, and it remains exposed to interactive deepfakes that can respond to prompts in real time.
224.4.1 4.1 The Attack Taxonomy (ISO/IEC 30107)
ISO/IEC 30107-1 frames the threat. Presentation-attack instruments (PAIs) include print attacks (a photo on paper), replay attacks (a video on a screen), 3D masks (paper, resin, silicone), and increasingly digital/synthetic artefacts. An important scope point: classic PAD addresses artefacts presented to the sensor; injection attacks that bypass the sensor entirely (Section 5) fall outside the original presentation model.
224.4.2 4.2 Methods
- Texture and image-quality cues detect print and screen artefacts (moiré, banding, reduced micro-texture, color distortion).
- Remote photoplethysmography (rPPG) recovers the subtle pulse-driven skin-color signal (present in live faces, absent in prints and masks), but it is computationally heavy and sensitive to motion and lighting.
- Depth exploits the planarity of prints and replays versus genuine 3D facial structure (stereo, structured light, or learned depth).
- Deep-learning PAD dominates: CNN/transformer classifiers, auxiliary-supervision models that jointly regress depth maps and rPPG signals, and generalization-focused approaches (domain adaptation, anomaly detection) to handle unseen attack types. Handling unseen attacks is the central open problem, since PAD models generalize poorly across datasets.
224.4.3 4.3 Benchmarks
Canonical datasets include CASIA-FASD (50 subjects), Replay-Attack (Idiap; 1,200 videos), OULU-NPU (4,950 clips across four protocols isolating illumination, instrument, and camera), SiW (165 subjects), the multi-modal cross-ethnicity CASIA-SURF and CeFA, and CelebA-Spoof (~625k images, 10,177 subjects). The recurring lesson across community competitions is poor cross-domain generalization: a detector tuned on one dataset’s attacks often fails on another’s.
224.5 5. Deepfakes, Morphing, and Injection Attacks
224.5.1 5.1 Face Morphing (the ID-issuance attack)
A morph blends two faces into one image that matches both contributing identities above threshold. If a passport is issued from a morphed photo, two people share one credential, defeating downstream verification. The Morphing Attack Detection (MAD) literature splits into single-image MAD (detect artefacts in one image) and differential MAD (compare the document image against a trusted live capture). NIST runs FATE MORPH for independent benchmarking and in 2025 released a lay-language guide (NISTIR 8584). The primary operational defense is live, supervised enrolment at issuance, which the EU mandated for passports under Regulation 2019/1157, removing the applicant-supplied photo that morphing exploits.
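Why a morph can match both contributors is easiest to see in embedding space. The sketch below is a geometric toy, not a morphing method: real morphs are built by warping and blending pixels, and the embedding of a morphed image is not guaranteed to sit at the midpoint of the two contributors. It only assumes two unit-length embeddings whose mutual cosine similarity is 0.2, a typical impostor-level score chosen for illustration.

```python
import math

def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine(a, b):
    """Dot product of two vectors assumed to be unit length already."""
    return sum(x * y for x, y in zip(a, b))

# Two identities whose embeddings have cosine similarity 0.2.
identity_a = normalize([1.0, 0.0, 0.0])
identity_b = normalize([0.2, math.sqrt(1 - 0.2 ** 2), 0.0])

# Idealized "morph": the normalized midpoint of the two embeddings.
morph = normalize([(x + y) / 2 for x, y in zip(identity_a, identity_b)])

print(f"A vs B     : {cosine(identity_a, identity_b):.3f}")
print(f"A vs morph : {cosine(identity_a, morph):.3f}")
print(f"B vs morph : {cosine(identity_b, morph):.3f}")
```

Halving the angle turns a similarity of cos θ into cos(θ/2) = sqrt((1 + cos θ)/2), so two faces that clearly fail to match each other can both score well above a typical acceptance threshold against the midpoint. Differential MAD exploits the same geometry in reverse by comparing the document image against a trusted live capture.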
224.5.2 5.2 Injection Attacks: the 2024 to 2025 Shift
The fastest-growing threat is not holding an artefact to a camera but injecting a synthetic video stream directly into the verification pipeline through a virtual camera, bypassing the physical sensor. Industry threat-intelligence reporting (for example iProov’s 2025 report) documents steep year-over-year rises in virtual-camera and face-swap attacks and a growing share of verification attempts involving deepfakes. These vendor figures are directional threat intelligence, not peer-reviewed measurements, and should be cited with that caveat. The structural point, however, is robust: because injection sidesteps the sensor, traditional presentation-attack detection does not cover it. Defenses require trusted capture, device attestation, server-side imagery-integrity checks, and one-time challenge schemes (such as randomized screen-illumination patterns) that are hard to pre-render, rather than analysis of the image alone.
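The one-time challenge idea can be sketched on the server side. The example below is an assumption-laden illustration, not a description of any vendor's scheme: the color palette, field names, and time-to-live are hypothetical, and the step that recovers the observed color sequence from facial reflectance in the video is a separate estimator that is not shown.

```python
import hashlib
import hmac
import secrets
import time

COLORS = ("red", "green", "blue", "white")  # hypothetical illumination palette

def _tag(server_key: bytes, sequence, expires: float) -> str:
    """HMAC binding the challenge sequence to its expiry time."""
    payload = f"{','.join(sequence)}|{expires:.3f}".encode()
    return hmac.new(server_key, payload, hashlib.sha256).hexdigest()

def issue_challenge(server_key: bytes, length: int = 6, ttl_s: float = 10.0, now=None):
    """Create an unpredictable, short-lived screen-illumination sequence."""
    now = time.time() if now is None else now
    sequence = [secrets.choice(COLORS) for _ in range(length)]
    expires = now + ttl_s
    return {"sequence": sequence, "expires": expires,
            "tag": _tag(server_key, sequence, expires)}

def verify_response(server_key: bytes, challenge, observed_sequence, now=None) -> bool:
    """Accept only an untampered, unexpired challenge whose colors were observed."""
    now = time.time() if now is None else now
    expected = _tag(server_key, challenge["sequence"], challenge["expires"])
    if not hmac.compare_digest(expected, challenge["tag"]):
        return False  # challenge was altered after issuance
    if now > challenge["expires"]:
        return False  # too slow: leaves time to render a matching fake
    return list(observed_sequence) == challenge["sequence"]

key = secrets.token_bytes(32)
challenge = issue_challenge(key)
print(verify_response(key, challenge, challenge["sequence"]))
```

The design point is that the sequence is drawn from a cryptographic random source and expires quickly, so a pre-rendered deepfake cannot contain the right reflections. A production system would also mark each challenge as used to prevent replay and would combine this check with device attestation rather than rely on it alone.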
224.6 6. Standards and Certification
- ISO/IEC 30107-3 specifies the testing and reporting methodology for PAD, defining APCER (attack presentations wrongly accepted) and BPCER (bona-fide presentations wrongly rejected) against defined attack species.
- iBeta (an NVLAP-accredited lab) is the dominant PAD test house. Level 1 covers 2D attacks (prints, cutouts, screen replays) and requires 0% successful attacks across the battery; Level 2 adds 3D attacks (silicone/resin/latex masks, wrapped 3D paper) with higher material budgets. Conformance is reported per ISO/IEC 30107-3.
- FIDO Alliance Biometric Component Certification independently certifies biometric subcomponents for both PAD and recognition performance via accredited labs, giving a vendor-neutral assurance bar.
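The two ISO/IEC 30107-3 error rates named in the first item above are simple proportions, computed separately for each attack species. The sketch below uses invented test outcomes and hypothetical function names; reporting the worst species as a summary is a common convention assumed here, not a quotation from the standard.

```python
from collections import defaultdict

def apcer_by_species(attack_results):
    """attack_results: iterable of (species, accepted_as_bona_fide) pairs."""
    accepted = defaultdict(int)
    total = defaultdict(int)
    for species, accepted_as_bona_fide in attack_results:
        total[species] += 1
        accepted[species] += int(accepted_as_bona_fide)
    return {species: accepted[species] / total[species] for species in total}

def bpcer(bona_fide_results):
    """bona_fide_results: list of bools, True when a genuine user was rejected."""
    return sum(bona_fide_results) / len(bona_fide_results)

# Toy outcomes: True means the attack got through.
attacks = (
    [("print", False)] * 4
    + [("replay", True)] + [("replay", False)] * 3
    + [("mask", True)] * 2 + [("mask", False)] * 2
)
genuine_rejections = [True] + [False] * 9

per_species = apcer_by_species(attacks)
print(per_species)
print("worst-case APCER:", max(per_species.values()))
print("BPCER:", bpcer(genuine_rejections))
```

Keeping the species separate matters because a pooled rate can look acceptable while one instrument, here the mask, succeeds half the time.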
224.7 7. Privacy, Bias, and Regulation
- Template protection. Raw embeddings are sensitive and partially invertible, so the standard mitigations are ISO/IEC 24745 (biometric information protection), cancelable biometrics (irreversible, revocable transforms of the template), and secure-element and fuzzy-vault schemes. They allow a compromised template to be revoked and re-issued without re-enrolling the person, which is impossible for a raw face image because the face itself cannot be changed.
- EU AI Act (Article 5). Prohibits certain biometric practices: real-time remote biometric identification in public spaces for law enforcement (with narrow, court-authorized exceptions); biometric categorisation inferring sensitive attributes; and emotion recognition in workplaces and education (with safety/medical carve-outs). Permitted remote-identification uses are pushed into the high-risk tier with conformity-assessment duties, whereas 1:1 verification whose sole purpose is to confirm a claimed identity is carved out of that high-risk category. This theme is developed in the eKYC and business-applications chapters.
- Biometric privacy law (US). Illinois’ Biometric Information Privacy Act (BIPA, 2008) is the template statute: it mandates informed written consent and retention/destruction policies for face-geometry data and uniquely provides a private right of action with statutory damages ($1,000 per negligent, $5,000 per reckless violation). It has driven nine-figure settlements (Facebook $650M; Clearview AI), and a 2024 amendment limiting per-scan accumulation was held retroactive in 2026. Texas and Washington have analogous but non-private-right regimes.
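The revocability described under template protection above can be illustrated with a seeded random projection followed by sign binarization, in the spirit of BioHashing-style schemes. This is a toy under stated assumptions: it is not what ISO/IEC 24745 specifies, Python's `random` module is not a cryptographic generator, and no irreversibility guarantee is claimed. The function names and the four-dimensional embedding are hypothetical.

```python
import random

def cancelable_template(embedding, user_seed: int, n_bits: int = 64):
    """Project the embedding onto seeded random directions and keep only signs."""
    rng = random.Random(user_seed)  # the seed plays the role of a revocable key
    bits = []
    for _ in range(n_bits):
        direction = [rng.gauss(0.0, 1.0) for _ in embedding]
        projection = sum(w * x for w, x in zip(direction, embedding))
        bits.append(1 if projection >= 0 else 0)
    return bits

def hamming_distance(a, b) -> int:
    """Number of positions at which two bit lists differ."""
    return sum(x != y for x, y in zip(a, b))

embedding = [0.1, -0.3, 0.5, 0.2]  # stand-in for a real face embedding

enrolled = cancelable_template(embedding, user_seed=1)
same_key = cancelable_template(embedding, user_seed=1)
reissued = cancelable_template(embedding, user_seed=2)

print("same seed distance     :", hamming_distance(enrolled, same_key))
print("after revocation (new seed):", hamming_distance(enrolled, reissued))
```

With the same seed the protected template is reproducible, so matching still works by Hamming distance. After a breach the seed is replaced, and the new template is unrelated to the leaked one even though the underlying face has not changed.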
224.8 8. Conclusion
Face verification is a mature, independently benchmarked technology: margin-loss embeddings compared by cosine similarity, with error trade-offs made explicit by the DET curve, and a sobering, well-documented demographic gap that any deployment must monitor by group rather than in aggregate. But verification without liveness is no defense at all, and liveness is now an arms race: the threat has moved from printed photos to 3D masks and, in 2024 to 2025, to deepfake injection that bypasses the camera entirely, pushing the trust anchor toward device attestation and trusted capture. With document reading (previous chapter) and biometric verification (this chapter) in hand, the next chapter assembles them into a complete eKYC system and confronts the regulation, risk scoring, and fairness obligations that govern it.
224.9 References
- Deng, J. et al. “ArcFace: Additive Angular Margin Loss for Deep Face Recognition.” CVPR 2019. https://arxiv.org/abs/1801.07698
- Schroff, F., Kalenichenko, D., Philbin, J. “FaceNet: A Unified Embedding for Face Recognition and Clustering.” CVPR 2015. https://arxiv.org/abs/1503.03832
- Wang, H. et al. “CosFace: Large Margin Cosine Loss for Deep Face Recognition.” 2018. https://arxiv.org/abs/1801.09414
- NIST. “FRVT Part 3: Demographic Effects” (NISTIR 8280). December 2019. https://nvlpubs.nist.gov/nistpubs/ir/2019/nist.ir.8280.pdf
- NIST. “FRVT Part 8 / Demographic Effects update” (NISTIR 8429). https://pages.nist.gov/frvt/reports/demographics/nistir_8429.pdf
- ISO/IEC 30107-3. Biometric presentation attack detection, Testing and reporting. https://www.iso.org/standard/79520.html
- Liu, Y., Jourabloo, A., Liu, X. “Learning Deep Models for Face Anti-Spoofing: Binary or Auxiliary Supervision.” CVPR 2018. https://arxiv.org/abs/1803.11097
- NIST FATE MORPH and NISTIR 8584 (morph-attack detection). https://pages.nist.gov/frvt/html/frvt_morph.html
- iProov. “Threat Intelligence Report 2025” (injection/deepfake trends; vendor threat intelligence). https://www.iproov.com/reports/threat-intelligence-report-2025-remote-identity-attack
- FIDO Alliance. “Biometric Component Certification.” https://fidoalliance.org/certification/biometric-component-certification/
- EU AI Act, Article 5 (prohibited practices). https://artificialintelligenceact.eu/article/5/