226 Business Applications of Visual Inference: From eKYC to Physiognomy

226.1 1. Introduction

The previous chapters built systems that verify a claimed identity. This chapter widens the lens to the broader commercial question that practitioners and executives actually ask: what business value can AI extract from a face, a voice, or a video, and where does that value turn into liability, pseudoscience, or illegality?

The organizing idea is a single distinction that cuts through the entire field:

Verification asks a checkable question (“Is this the person in the document?”; “Is this person plausibly over 18?”). Inference asks an unanswerable one (“Is this person creditworthy, employable, honest, or gay, judging from their face?”).

As applications move from the first kind to the second, the “signal” the model finds increasingly turns out to be self-presentation, a demographic proxy, or a dataset artifact rather than the claimed trait. This chapter sorts the landscape into three tiers (legitimate, contested, and discredited), gives concrete business and academic references for each, and closes with practical guidance on which use cases are safe and compelling to demonstrate in a real business setting. That guidance is exactly what a responsible practitioner needs before promising a client a “face-based” product.

226.2 2. Tier 1: Legitimate, Defensible Applications

These applications are well grounded because the task is a 1:1 match or a bounded estimation with objective ground truth, not an inference about who someone is on the inside.

Customer onboarding (eKYC) and fraud prevention. The dominant legitimate use is selfie-to-document matching with liveness detection during remote account opening, now standard at banks, neobanks, fintechs, crypto exchanges, gig-economy marketplaces, and telecoms (SIM registration). The business case is compliance (KYC/AML) plus fraud-loss reduction, and the credible academic backbone is NIST’s ongoing, public, demographically stratified biometric evaluations. Vendors report large improvements in fraud catch rates and manual-review costs; present these as industry claims, not peer-reviewed effect sizes.
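
The decision logic behind a selfie-to-document check fits in a few lines. This is a minimal sketch under stated assumptions: some face-recognition model (not specified here) has already turned the selfie and the document photo into embedding vectors, a liveness/PAD component has produced a score in [0, 1], and every threshold is a hypothetical placeholder. In a real deployment the match threshold is calibrated to a target false match rate on evaluation data, not picked by hand.

```python
import math
from dataclasses import dataclass

# Hypothetical helper names and thresholds: an illustration, not a vendor API.


@dataclass
class OnboardingResult:
    decision: str      # "approve", "reject", or "manual_review"
    similarity: float  # cosine similarity of selfie vs. document-photo embeddings
    liveness: float    # liveness / presentation-attack-detection score in [0, 1]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length face embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def onboard(selfie_emb, document_emb, liveness_score,
            match_threshold=0.6, review_band=0.1, liveness_threshold=0.9):
    """1:1 selfie-to-document decision; all thresholds are hypothetical."""
    sim = cosine_similarity(selfie_emb, document_emb)
    # Liveness gates everything: a perfect match against a replayed or
    # deepfaked selfie is still a spoof.
    if liveness_score < liveness_threshold:
        return OnboardingResult("reject", sim, liveness_score)
    if sim >= match_threshold:
        return OnboardingResult("approve", sim, liveness_score)
    # Near-misses go to a human reviewer instead of an automatic rejection.
    if sim >= match_threshold - review_band:
        return OnboardingResult("manual_review", sim, liveness_score)
    return OnboardingResult("reject", sim, liveness_score)
```

Checking liveness before similarity is the design choice that matters: a deepfake or replayed video of the genuine account holder would otherwise produce a high similarity score and pass.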

Age verification and estimation. Facial age estimation is a regression on apparent age and is distinct from identification: it outputs a number, not a name. It is now recognized by the UK regulator as a method capable of “highly effective” age checks under the Online Safety Act. One widely cited vendor, Yoti, publishes a mean absolute error of around 1.1 years for ages 13 to 17 and about 2.1 years for ages 18 to 24, and its system has been independently evaluated in NIST’s Face Analysis Technology Evaluation (FATE). The approach is defensible precisely because the output is a bounded number with a published error distribution, and because privacy-preserving deployments delete the image immediately and never link it to an identity.
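
Published error figures are what make an age check auditable. A minimal sketch, assuming only that some model returns an age estimate in years: the mean absolute error (MAE) is computed against ground truth on a labelled evaluation set, and the gate adds a buffer above the legal age so that an estimation error of a year or two does not let a minor through, similar in spirit to retail “Challenge 25” policies. The function names and the 5-year default buffer are hypothetical, not any vendor’s API or a regulatory figure.

```python
# Hypothetical helpers for illustration; buffer_years is a policy assumption.


def mean_absolute_error(true_ages, estimated_ages):
    """Mean absolute error, in years, between true and estimated ages."""
    if not true_ages or len(true_ages) != len(estimated_ages):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(t - e) for t, e in zip(true_ages, estimated_ages)) / len(true_ages)


def age_gate(estimated_age, legal_age=18, buffer_years=5):
    """Pass only if the estimate clears the legal age plus a safety buffer.

    Anyone below the buffer is routed to a fallback check (such as an ID
    document) rather than refused; buffer_years should be sized from the
    published error distribution for the relevant age band.
    """
    if estimated_age >= legal_age + buffer_years:
        return "pass"
    return "fallback_check"
```

Routing borderline estimates to a fallback check, rather than refusing them, keeps the error bars honest without excluding adults who simply look young.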

Access control, payment authentication, account recovery, returning-user re-verification. Face unlock, face-based payment confirmation, biometric re-authentication, and re-verifying a returning driver or gig worker are all 1:1 verification against an enrolled template, defensible when consented, tightly governed, and offered with a non-biometric fallback.

Why Tier 1 is safe. The ground truth is objective and checkable, error rates are independently measurable, and nothing is inferred about character. The genuine risks are operational and equity risks (the demographic accuracy gaps documented earlier, spoofing and deepfakes, and exclusion of people the system cannot read), not the epistemic risk of inferring an unmeasurable trait.
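
“Independently measurable” has a concrete meaning here: NIST’s face recognition evaluations report false match rate (FMR) and false non-match rate (FNMR), including breakdowns by demographic group. A minimal sketch of that computation, assuming comparison scores already exist for labelled genuine (mated) and impostor (non-mated) pairs; the trial format and function name are hypothetical. A gap between groups at the same threshold is exactly the equity risk described above.

```python
from collections import defaultdict

# Hypothetical helper; the (group, score, is_mated) trial format is an assumption.


def error_rates_by_group(trials, threshold):
    """FMR and FNMR per demographic group at one decision threshold.

    Each trial is (group, score, is_mated): is_mated is True for a genuine
    pair (same person) and False for an impostor pair (different people).
    """
    counts = defaultdict(lambda: {"mated": 0, "false_non_match": 0,
                                  "non_mated": 0, "false_match": 0})
    for group, score, is_mated in trials:
        c = counts[group]
        if is_mated:
            c["mated"] += 1
            if score < threshold:   # genuine user wrongly rejected
                c["false_non_match"] += 1
        else:
            c["non_mated"] += 1
            if score >= threshold:  # impostor wrongly accepted
                c["false_match"] += 1

    def rate(errors, total):
        return errors / total if total else float("nan")

    return {
        group: {"FNMR": rate(c["false_non_match"], c["mated"]),
                "FMR": rate(c["false_match"], c["non_mated"])}
        for group, c in counts.items()
    }
```

Reporting both rates per group at the same threshold matters because a single global threshold can yield very different error rates for different groups.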

226.3 3. Tier 2: Commercially Deployed but Scientifically Contested

Here a published literature claims predictive signal, products ship, and serious methodological critiques and regulation both exist. The job is to separate “a paper reports a correlation” from “this is a valid, deployable, non-discriminatory inference.” Credit scoring from face or video lands in this tier, so it deserves the most careful treatment.

226.3.1 3.1 Credit Scoring and Default Prediction from Faces

There is a genuine peer-reviewed thread in top finance and management journals:

  • Duarte, Siegel & Young (2012), Review of Financial Studies, “Trust and Credit.” Borrowers who appear more trustworthy in their peer-to-peer lending photos are more likely to be funded, get lower rates, and actually default less. Perceived trustworthiness carried some real signal, but it was scored by human raters looking at photos, and the effect is small relative to hard financial variables.
  • Chen, Liu, Meng & Wang (2023), Management Science, “What’s in a Face?” This is the most important paper for a balanced view. A machine-learning model can predict repayment from facial images to some degree, but giving human loan officers the photos does not improve their decisions: humans hold biased facial priors and overweight facial information. The lesson is double-edged: even where a weak algorithmic signal exists, injecting faces into the human decision pipeline degrades judgment and imports bias.
  • Studies of CFO facial trustworthiness find that firms whose executives have more trustworthy-looking faces obtain better loan terms. That is evidence of an appearance premium to be controlled for, not a feature to productize.

Commercially, microlending vendors and patents have promoted “read the applicant’s face to score repayment,” sometimes blended with smartphone or digital-footprint scoring. Note, however, that the digital-footprint signal (behavioral exhaust predicting default) is far better validated and is not facial inference.

Why deploying this is hazardous. Four problems compound: (1) demographic proxy: the facial “signal” for default is confounded with age, gender, race, and socioeconomic markers, so a face score can launder protected-class discrimination into a credit decision; (2) reverse causality and self-presentation: a “trustworthy” photo reflects grooming, income, and access, not bone structure; (3) leakage: the image source (professional headshot vs. webcam vs. mugshot) can carry the apparent signal; (4) legality: in the US this collides with fair-lending disparate-impact doctrine under ECOA, and in the EU with GDPR special-category rules and the AI Act’s high-risk classification of creditworthiness AI. A correlation in a research dataset may be real, but it is small and dominated by confounds, and deploying it as a credit feature is both ethically and legally dangerous.
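
The demographic-proxy problem can be checked before anything ships. A minimal sketch of one such audit, assuming you hold a face-derived score for each applicant and, for audit purposes only, a protected-attribute label: compare the score’s distribution across two groups with a standardized mean difference (Cohen’s d). A large |d| means the score partly encodes group membership and would carry it into the credit decision. A small |d| does not clear the score, since proxies can also act through combinations of attributes or conditional on other features. The function name is illustrative.

```python
import statistics

# Hypothetical audit helper; group labels are assumed to be available for auditing only.


def standardized_mean_difference(scores_a, scores_b):
    """Cohen's d: difference in group means divided by the pooled standard deviation."""
    n_a, n_b = len(scores_a), len(scores_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two scores")
    var_a = statistics.variance(scores_a)  # sample variance, n - 1 denominator
    var_b = statistics.variance(scores_b)
    pooled_sd = (((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)) ** 0.5
    return (statistics.mean(scores_a) - statistics.mean(scores_b)) / pooled_sd
```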

226.3.2 3.2 Automated Video Interviewing and Hireability Inference

The product story is cautionary. A pioneer of AI-scored asynchronous video interviews used automated facial-expression analysis to infer traits. After an FTC complaint, ACLU criticism, and scrutiny under the Illinois Artificial Intelligence Video Interview Act (AIVIA), it publicly dropped facial analysis in 2021, stating that it “no longer significantly added value” relative to language analysis. Regulation has since tightened further, notably with NYC Local Law 144’s bias-audit mandate and EEOC guidance under Title VII and the ADA.
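
The bias audits required by NYC Local Law 144 center on an impact ratio: each demographic category’s selection rate divided by the selection rate of the most-selected category. A minimal sketch of that arithmetic, with hypothetical function names and category labels. The 0.8 cut-off is the EEOC’s long-standing four-fifths rule of thumb, used here only as a screening flag; Local Law 144 itself requires publishing the ratios rather than meeting a particular threshold.

```python
# Hypothetical helpers for illustration; category labels are placeholders.


def impact_ratios(outcomes):
    """Selection rate of each category divided by the highest category's rate.

    `outcomes` maps a category label to (number_selected, number_assessed).
    """
    rates = {cat: selected / assessed for cat, (selected, assessed) in outcomes.items()}
    best = max(rates.values())
    if best == 0:
        raise ValueError("no category has any selections")
    return {cat: rate / best for cat, rate in rates.items()}


def below_four_fifths(ratios, threshold=0.8):
    """Categories under the four-fifths rule of thumb (a screen, not a verdict)."""
    return sorted(cat for cat, ratio in ratios.items() if ratio < threshold)
```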

The academic evidence is genuinely mixed and label-dependent. Hickman et al. (2022, Journal of Applied Psychology) trained models on ~1,073 video interviews to predict Big Five personality. Models trained on observer-rated personality explained on average about 16% of the variance (R² ≈ 0.16), but models trained on self-reported personality explained essentially nothing (R² ≈ 0.01). The algorithm partly learns to reproduce raters’ impressions, not the construct itself. What remains defensible is structured scoring of interview content (what the candidate says) with bias audits; facial-expression-to-hireability inference is the part the market itself retreated from.
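
Hickman et al.’s result argues for reporting accuracy per criterion rather than as a single number. A minimal sketch, assuming each candidate has both an observer rating and a self-report for the same trait: compute R² of a model’s predictions against each source separately. Hickman et al. trained separate models per label type; this sketch only illustrates the reporting discipline, and the function names and label keys are hypothetical. Note that R² can be negative on held-out data when predictions do worse than the mean.

```python
# Hypothetical helpers; label-source names such as "observer_rated" are placeholders.


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def r_squared_by_criterion(predictions, criteria):
    """Score one set of predictions against several ground-truth sources.

    `criteria` maps a label source (e.g. "observer_rated", "self_reported")
    to the true values for the same candidates, in the same order.
    """
    return {source: r_squared(y, predictions) for source, y in criteria.items()}
```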

226.3.3 3.3 Affect / Emotion Recognition

The market is large and real: ad testing, market research, call-center voice analytics, and automotive driver-state monitoring. But the foundational scientific critique is devastating. Barrett et al. (2019), “Emotional Expressions Reconsidered,” in Psychological Science in the Public Interest, is a ~60-page review concluding that the assumed one-to-one mapping from facial configurations to internal emotional states is not supported: how people move their faces for a given emotion varies widely within a person, across contexts, and across cultures. The inference “this face = this felt emotion” is therefore scientifically unreliable. A useful nuance: driver drowsiness and distraction monitoring is more defensible than “customer emotion” detection, because it targets observable physiological states (eyelid closure, gaze) with a safety rationale, and the EU AI Act carves out exactly such safety uses while banning emotion inference in the workplace.
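
The defensibility of driver monitoring shows in how it is measured: it quantifies eyelid closure rather than guessing at a feeling. A minimal sketch that swaps in two common techniques from the drowsiness literature, neither of which this chapter itself specifies: the eye aspect ratio (EAR) of Soukupová and Čech, computed from six eye landmarks supplied by some face-landmark detector (assumed, not shown), and PERCLOS, the proportion of time the eyes are closed over a window. Treating EAR below 0.2 as “closed” is an illustrative assumption, as are the function names; production systems calibrate per camera and often per driver.

```python
import math

# Hypothetical helpers; the 0.2 closure threshold is an illustrative assumption.


def eye_aspect_ratio(eye):
    """Eye aspect ratio from six (x, y) landmarks ordered p1..p6.

    p1 and p4 are the eye corners; p2, p3 lie on the upper lid and
    p6, p5 on the lower lid, so p2-p6 and p3-p5 are vertical spans.
    """
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    return vertical / (2.0 * horizontal)


def perclos(ear_window, closed_threshold=0.2):
    """Fraction of frames in a time window in which the eye counts as closed."""
    if not ear_window:
        raise ValueError("empty window")
    closed = sum(1 for ear in ear_window if ear < closed_threshold)
    return closed / len(ear_window)
```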

226.4 4. Tier 3: Scientifically Discredited and Banned

These systems claim to read inner character or protected status from facial structure. This is physiognomy, the long-discredited pseudoscience (Lavater, Lombroso) that fed scientific racism, re-skinned with deep learning.

  • Wu & Zhang (2016), “Automated Inference on Criminality Using Face Images,” claimed that classifiers can distinguish “criminal” from “non-criminal” faces. The fatal flaw: the “non-criminal” images were professional or ID-style photos (often smiling), while the “criminal” images were government mugshots. The model learned expression and photo-source artifacts, not criminality; moreover, “criminal” is a socially constructed, enforcement-biased label with no causal link to face geometry.
  • Wang & Kosinski (2018), “Deep Neural Networks Are More Accurate Than Humans at Detecting Sexual Orientation From Facial Images,” reported high AUC for distinguishing gay from straight people in dating-profile photos. Critiques (notably by Agüera y Arcas and colleagues) showed that the signal comes overwhelmingly from self-presentation and grooming (makeup, facial hair, glasses, camera angle), not innate structure, demolishing the authors’ prenatal-hormone explanation. The classifier exposes grooming norms and stereotypes, not biology.
  • Kosinski (2021), “Facial Recognition Technology Can Expose Political Orientation,” drew the same family of objections: self-presentation, demographic and regional confounds, and an unfounded leap from correlation to essence.
  • The umbrella critique, “Physiognomy’s New Clothes” (Agüera y Arcas, Mitchell & Todorov, 2017), is the definitive accessible takedown: these systems revive the exact logic historically used to justify discrimination, and their apparent accuracy reflects confounds, not any real face-to-character mapping.
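
The failure shared by these papers can be tested directly: before crediting a model’s AUC, measure how far nuisance metadata gets on its own. A minimal sketch, assuming binary labels and per-image nuisance annotations; the feature names (is_mugshot, is_smiling) are hypothetical stand-ins for the photo-source and expression artifacts described above. AUC is computed with the Mann-Whitney pairwise-ranking formulation to stay dependency-free.

```python
# Hypothetical helpers; nuisance feature names are placeholders for real annotations.


def auc(scores, labels):
    """ROC AUC via the Mann-Whitney statistic; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def confound_report(model_scores, labels, nuisance_features):
    """AUC of the model next to the AUC of each nuisance feature used alone.

    A nuisance feature (photo source, smiling, eyeglasses) that approaches
    the model's AUC on its own is evidence that the dataset, not the face,
    is doing the predicting. Values far below 0.5 are also a red flag:
    the feature then predicts the label in the opposite direction.
    """
    report = {"model": auc(model_scores, labels)}
    for name, values in nuisance_features.items():
        report[name] = auc(values, labels)
    return report
```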

In one line: there is no validated causal mechanism linking facial morphology to criminality, sexuality, or politics; the “accuracy” is real pattern-matching on confounds, and high AUC on a biased dataset is not evidence of a true relationship.

Regulatory bans. The EU AI Act (Article 5, prohibitions in effect since February 2025) bans social scoring; emotion recognition in workplaces and education (with narrow medical and safety carve-outs); biometric categorisation that infers race, political opinions, religion, or sexual orientation; and untargeted scraping of facial images. Tier 3 use cases map almost exactly onto these prohibitions, while creditworthiness and employment AI (Tier 2) are separately classified as high-risk: permitted, but heavily constrained.

226.5 5. How to Demo Responsibly in a Real Business Case

For the practical need of building compelling demos for real business cases, the answer is clear and safe: lean entirely on Tier 1.

  • Best demo: identity verification with deepfake defense. A live selfie-to-ID match with liveness detection/PAD, ideally showing a deepfake spoof being caught, framed around fraud-loss reduction and KYC/AML compliance. It has objective ground truth, independent (NIST) evaluation, and no character inference. This is the demo that wins enterprise trust.
  • Strong second: privacy-preserving age estimation. Show the published mean absolute error on screen, estimate an age, and delete the image immediately. It is a concrete, regulator-aligned use case (UK Online Safety Act) with honest error bars.
  • Other safe demos: payment authentication, account-recovery re-verification, returning-user matching, and document-fraud detection from the Document AI chapter.

Present Tier 2 only as analysis, never as a live product pitch. If credit-from-face or video-interview scoring must be shown, present it as a cautionary case and demonstrate the confound directly: for instance, show that a “risk” model’s output flips when a mugshot-style photo is swapped for a smiling headshot, or that interview scores track rater impressions rather than job performance. Pair every Tier 2 example with its critique (Chen et al. on humans overweighting faces; Hickman et al. on label dependence; Barrett et al. on emotion), and note that workplace emotion recognition is prohibited in the EU.
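
The photo-swap demonstration is easy to script. A minimal sketch, assuming you can produce matched pairs of images of the same person that differ only in presentation; flip_rate and the toy model are hypothetical illustrations, with images reduced to dicts of presentation attributes, not a real product’s API. Any nonzero flip rate means presentation alone moves the decision; a high one means presentation is the decision.

```python
# Hypothetical demo helper; score_fn stands in for whatever model is under test.


def flip_rate(score_fn, pairs, threshold=0.5):
    """Fraction of pairs whose decision changes when only presentation changes.

    Each pair holds two images of the same applicant that differ only in
    presentation (e.g. a mugshot-style photo and a smiling headshot);
    score_fn returns a risk score for one image.
    """
    if not pairs:
        raise ValueError("no pairs to test")
    flips = sum(
        1 for original, variant in pairs
        if (score_fn(original) >= threshold) != (score_fn(variant) >= threshold)
    )
    return flips / len(pairs)


# Toy stand-ins: "images" are dicts of presentation attributes, and this
# hypothetical model keys entirely on expression.
def expression_keyed_model(image):
    return 0.2 if image["smiling"] else 0.8


pairs = [({"smiling": False}, {"smiling": True})] * 3
print(flip_rate(expression_keyed_model, pairs))  # 1.0
```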

Never demo Tier 3 as if it works. Use the criminality, sexuality, and political-orientation papers only as worked examples of how confounds, leakage, biased labels, and reverse causality manufacture spurious accuracy. They make a good teaching device for why high accuracy does not imply a real relationship, with the EU AI Act bans flagged explicitly.

226.6 6. Conclusion

The commercial value of AI on faces and video is real but lives almost entirely in verification (confirming a checkable claim of identity or age), not in inference of latent traits. The verification-versus-inference axis predicts both the science and the law: Tier 1 verifies an objective fact and is defensible; Tier 3 infers an unobservable essence and is pseudoscience the EU now bans; Tier 2 is the contested middle, where weak real correlations exist but are dominated by confounds and constrained by fair-lending and high-risk-AI regulation. For a practitioner building a demo or a product, the discipline is simple to state and hard to hold: if the question has an objective, checkable answer, you can build on it; if it requires reading character from a face, the accuracy is an artifact and the deployment is a liability. That principle, more than any model, is the takeaway of this cluster.

226.7 References

  1. Duarte, J., Siegel, S., Young, L. “Trust and Credit: The Role of Appearance in Peer-to-Peer Lending.” Review of Financial Studies, 2012. https://academic.oup.com/rfs/article-abstract/25/8/2455/1570804
  2. Chen, Z., Liu, B., Meng, Y., Wang, Z. “What’s in a Face? An Experiment on Facial Information and Loan-Approval Decision.” Management Science, 2023. https://pubsonline.informs.org/doi/10.1287/mnsc.2022.4436
  3. Hickman, L. et al. “Automated Video Interview Personality Assessments: Reliability, Validity, and Generalizability Investigations.” Journal of Applied Psychology, 2022. https://pubmed.ncbi.nlm.nih.gov/34110849/
  4. Barrett, L. F., Adolphs, R., Marsella, S., Martinez, A., Pollak, S. “Emotional Expressions Reconsidered: Challenges to Inferring Emotion From Human Facial Movements.” Psychological Science in the Public Interest, 2019. https://journals.sagepub.com/doi/10.1177/1529100619832930
  5. Wu, X., Zhang, X. “Automated Inference on Criminality Using Face Images.” 2016. https://arxiv.org/abs/1611.04135
  6. Wang, Y., Kosinski, M. “Deep Neural Networks Are More Accurate Than Humans at Detecting Sexual Orientation From Facial Images.” Journal of Personality and Social Psychology, 2018. https://www.gsb.stanford.edu/faculty-research/publications/deep-neural-networks-are-more-accurate-humans-detecting-sexual
  7. Agüera y Arcas, B., Mitchell, M., Todorov, A. “Physiognomy’s New Clothes.” 2017. https://medium.com/@blaisea/physiognomys-new-clothes-f2d4b59fdd6a
  8. Yoti. “Facial Age Estimation” (accuracy / NIST FATE evaluation). https://www.yoti.com/business/age-verification/
  9. EU AI Act, Article 5 (prohibited practices). https://artificialintelligenceact.eu/article/5/
  10. NIST. Face Analysis Technology Evaluation (FATE), Age Estimation. https://pages.nist.gov/frvt/html/frvt_age_estimation.html