A deepfake in a hiring pipeline is not just a fake video you scroll past. It is a person you may hire, pay, and give access to your systems and data.
The FBI has confirmed that more than 300 U.S. companies, including Fortune 500 firms, unknowingly hired operatives using fabricated identities.
And you cannot reliably catch this by eye. Asking interviewers to watch for something “off” is not a real plan. Detection has to be built into the process.
But one detector running one check is not enough. In hiring, a single score is too easy to get wrong. Reliable deepfake detection needs multiple independent signals working together.
Here is why.
A single check is a coin flip
Most deepfake detectors run one type of check. They are usually trained on older deepfakes, often ones made for online media such as celebrity clips. Then they give you a single score.
That works until the detector encounters a new type of deepfake. Then it can fail badly.
A single-model detector is one AI model that learned what fakes look like from a large set of real and fake videos. It looks at a new clip and returns a single score: one signal, one number. The catch is that it only knows the fakes it was trained on, so a kind it has not seen before can throw it off.
Researchers tested a standard single-model detector on a fake type it was not trained on. On a familiar type of fake it scored 93.9% accuracy. On the unfamiliar one it scored 51.2%. Fifty percent is a coin flip: at 51%, the detector is barely better than guessing and cannot reliably tell a fake from a real person, so many fakes get through as real. This is not a one-off. Across the research, single-method detectors lose 20 to 60 percentage points of accuracy when they meet a fake from a tool they were not trained on.
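To see why 51% is effectively a coin flip, here is a minimal Python sketch of a “detector” that ignores the video entirely and guesses at random. It assumes a balanced test set (half real clips, half fake), which is our assumption for illustration rather than a detail from the study; on a set like that, pure guessing lands near 50%.

```python
import random

def random_guess_accuracy(n_clips: int, seed: int = 0) -> float:
    """Accuracy of a 'detector' that never looks at the video and just guesses."""
    rng = random.Random(seed)
    # Assumed balanced test set: half real clips (0), half fake clips (1).
    labels = [0] * (n_clips // 2) + [1] * (n_clips // 2)
    # Each guess is an independent coin flip between "real" and "fake".
    guesses = [rng.randint(0, 1) for _ in labels]
    correct = sum(g == y for g, y in zip(guesses, labels))
    return correct / len(labels)

acc = random_guess_accuracy(10_000)
print(f"Random guessing: {acc:.1%}")  # lands close to 50%
print(f"Drop on the unseen fake: {93.9 - 51.2:.1f} percentage points")
```

The 93.9% and 51.2% figures come from the study described above; the simulation only shows what chance-level performance looks like, and why 51.2% sits right next to it.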
New face-swap tools ship every few weeks, so a deepfake your detector has never seen is not the rare case. It is the normal case. A single check is always one step behind the newest tool.
The problem also works in the other direction. Real interview conditions are messy. Candidates join on cheap webcams, sit in dim rooms or far back from the camera, and connect over links that compress the video.
Those conditions remove the exact clues many detectors rely on. In one test, plain video compression dropped a detector from 98% accuracy to 61%. Low light and a partly blocked face made performance even worse.
So a single check creates both problems at once. It can let fake candidates through, and it can flag real candidates who just joined from a bad setup. Neither is acceptable when the result affects a hiring decision.
What multiple signals do: a test
Instead of relying on one check, a hiring detector should score every interview clip across several independent signals.
Each signal should look at a different part of the video. That way, if one signal is weak, the others can still catch what it missed.
We ran three clips through four signals: one real candidate and two real-time face swaps. Each signal returned a score from 0 to 100, where a higher score means more likely to be fake.
A note before the numbers: these are real results from our own tool, and the scores are exactly what came back. We have labeled the four signals A through D rather than disclose what each one measures.
| Signal | Real candidate | Deepfake candidate 1 | Deepfake candidate 2 |
|---|---|---|---|
| Signal A | 14 | 91 | 96 |
| Signal B | 22 | 88 | 93 |
| Signal C | 18 | 96 | 97 |
| Signal D | 16 | 67 | 76 |
| Overall | 14 / 100 | 86 / 100 | 93 / 100 |
The real candidate scores low on all four signals, and both fakes score high on all four. The signal worth a closer look is D, because it was the least sure of the four: it scored the two fakes at just 67 and 76, while the other three scored them in the high 80s and 90s. Signal D did flag the fakes, but with far less confidence than the rest.
Now imagine a tool that runs only one check, and that check is a weaker one like Signal D. On a more sophisticated deepfake, a score like 67 could fall below the decision threshold, and the tool would pass the fake as real, because that one number is all it has to go on.
That did not happen here. The other three signals scored the same fakes 88 or higher, which pulled the overall scores to a clear 86 and 93 out of 100. The weak signal did not decide the call because it was not deciding alone, so the final result was still correct.
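Here is a minimal Python sketch of that effect, using the scores from the table above. The combining rule (a plain average) and the threshold of 70 are hypothetical choices for illustration, not how our tool works. A plain average gives 17.5, 85.5 and 90.5, not the overall scores in the table, which our tool calculates differently.

```python
from statistics import mean

# Hypothetical cut-off: scores at or above this are treated as "likely fake".
THRESHOLD = 70

# Signal scores A, B, C, D for each clip, copied from the table above.
clips = {
    "Real candidate": [14, 22, 18, 16],
    "Deepfake candidate 1": [91, 88, 96, 67],
    "Deepfake candidate 2": [96, 93, 97, 76],
}

def is_flagged(score: float, threshold: float = THRESHOLD) -> bool:
    return score >= threshold

def single_signal_verdict(scores: list[int], index: int = 3) -> bool:
    """Decide from one signal only (index 3 is Signal D)."""
    return is_flagged(scores[index])

def combined_verdict(scores: list[int]) -> bool:
    """Decide from the plain average of all signals (a simple stand-in rule)."""
    return is_flagged(mean(scores))

for name, scores in clips.items():
    print(f"{name}: Signal D only -> {single_signal_verdict(scores)}, "
          f"combined ({mean(scores)}) -> {combined_verdict(scores)}")
```

With this made-up threshold, Signal D on its own lets Deepfake candidate 1 through (67 is below 70), while the combined score flags both fakes and clears the real candidate. A production system would also weight the signals and calibrate the threshold on labeled interviews.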
What interview deepfake detection should look for
Multiple signals are only half the answer. What should those signals actually look for?
The strongest signals are not just about whether a face looks real. They are about whether the candidate behaves like a live human in a real room.
A hiring detector should look at human behaviour. Real people produce tiny, involuntary movements. Their expressions shift with the conversation. Their face changes when they are thinking, reacting, stalling, or answering. A face swap often has a narrower, flatter, more repetitive range of movement. It may look convincing at first, but the behaviour stays too consistent across the call.
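One simple way to put a number on a “narrow, flat” range of movement is to measure how much a per-frame expression value changes from one frame to the next. The sketch below assumes you already have such a value for each frame (for example, mouth openness on a 0 to 1 scale from a face-landmark tracker, which is not shown). The input series and the 0.02 floor are made-up toy values, not real data or a tuned threshold, and the function names are hypothetical.

```python
from statistics import pstdev

def movement_spread(values: list[float]) -> float:
    """Standard deviation of frame-to-frame changes in one expression measure."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return pstdev(deltas)

def looks_too_flat(values: list[float], floor: float = 0.02) -> bool:
    """Hypothetical check: True if movement varies less than the made-up floor."""
    return movement_spread(values) < floor

# Toy per-frame mouth-openness values (0 = closed, 1 = fully open); not real data.
live = [0.30, 0.34, 0.29, 0.41, 0.36, 0.28, 0.45, 0.33]
swap = [0.30, 0.31, 0.30, 0.31, 0.30, 0.31, 0.30, 0.31]

print(looks_too_flat(live), looks_too_flat(swap))  # False True
```

The toy “live” series varies a lot between frames, while the toy “swap” series just alternates between two values, so its spread of changes falls below the floor. On its own, a measure like this would be a weak signal, since some real people are simply very still on camera.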
It should also look at room and body consistency. A real person moves naturally: they shift, lean, gesture, and fidget. The light on their face matches the room, and shadows move when they turn their head. A face swap does not understand the room it is in. Its lighting can drift out of sync with the room, and the face may stay too locked and centred because bigger movements can break the face tracking.
It should also look at consistency under motion. A real face holds together from frame to frame when someone moves. A generated face is rebuilt frame by frame, so small issues can appear when the head turns, the person moves quickly, or a hand passes near the face.
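A rough way to check consistency under motion is to track the face alongside a second point that a face swap may not redraw, such as the hairline or an ear (a hypothetical anchor here), and look for frames where the face slips relative to that anchor. Measuring the offset, rather than raw movement, means a fast but genuine head turn moves both points together and is not flagged. The coordinates, `k=4.0`, `min_step` and the function name `slip_frames` are illustrative assumptions, not part of any real tool.

```python
from math import dist
from statistics import median

Point = tuple[float, float]

def slip_frames(face: list[Point], anchor: list[Point],
                k: float = 4.0, min_step: float = 0.5) -> list[int]:
    """Return frame indices where the face jumps relative to the anchor point."""
    # Offset of the face from the anchor in each frame; a real head moves both together.
    offsets = [(f[0] - a[0], f[1] - a[1]) for f, a in zip(face, anchor)]
    # How far that offset changes between consecutive frames.
    steps = [dist(p, q) for p, q in zip(offsets, offsets[1:])]
    if not steps:
        return []
    # Typical change, floored so a perfectly steady offset does not make tiny changes look huge.
    typical = max(median(steps), min_step)
    # Flag the frame at the end of any unusually large jump.
    return [i + 1 for i, s in enumerate(steps) if s > k * typical]

# Toy pixel coordinates: the head turns fast (10 px per frame), the face slips in frame 3.
anchor = [(100, 50), (110, 50), (120, 50), (130, 50), (140, 50), (150, 50)]
face = [(100, 80), (110, 80), (120, 81), (136, 83), (140, 80), (150, 80)]

print(slip_frames(face, anchor))  # [3, 4]
```

The jump into frame 3 and the snap back in frame 4 are both flagged, while the fast head movement itself is not, because the face and the anchor move together in every other frame.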
These signals are harder to fake because they are not just checking how the face looks. They are checking whether the person behaves like a real person on a live call.
What this means for hiring teams
A deepfake online might cost you nothing. A deepfake in your hiring pipeline can cost you a salary, a laptop, and access to your systems. The opposite mistake is also a problem. You might reject a real candidate because their webcam, lighting, or internet connection made them look suspicious.
One check gives you one number. It does not give you enough confidence in either direction.
That is why one signal is not enough. New face-swap tools are always coming, and your detector will always meet something it has not seen before.
So you do not bet on one test. You use several signals that work in different ways. Together, they check whether the candidate looks, moves, and behaves like a real human on a real call.
That is how we built Tofu. Book a demo to see how it performs on your pipeline.
FAQs
What is a deepfake in the context of hiring?
It's when a candidate uses AI to fake their appearance or identity during a video interview, usually through a real-time face swap. Unlike a deepfake you see online, this one can end up on your payroll with access to your systems, which is what makes it a security and hiring risk rather than just a novelty.
Could a deepfake detector wrongly flag a real candidate?
Yes, and this is the other side of the problem. People join on cheap webcams, in dim rooms, or over compressed connections, and those conditions strip out the clues many detectors rely on. In one test, plain video compression dropped a detector from 98% to 61% accuracy, which means real candidates can get flagged just for having a bad setup.
Why does Tofu use multiple signals instead of one detection score?
A single check only knows the fakes it was trained on, so a new face-swap tool can slip right past it. New tools ship constantly, so one detector is always a step behind. Tofu runs several signals that work in different ways, so if one is less certain, the others can still catch what it missed. That makes the result more reliable even on deepfakes the system has not seen before.
What should a reliable deepfake detector actually look for?
The strongest signals check whether someone behaves like a live human in a real room, not just whether a face looks real. That means looking at natural human behavior and involuntary movement, consistency between the face and the room's lighting and shadows, and whether the face holds together under motion. These are harder to fake than appearance alone.
Can deepfake detection be added without slowing down hiring?
Yes. Detection that runs on the interview itself works in the background rather than adding steps for the candidate or the interviewer. The goal is to build it into the process you already have, so you get a clear signal on each interview without turning every call into a security checkpoint.
How is detecting a deepfake candidate different from verifying an identity?
Identity verification confirms that a document or face matches a record. Deepfake detection asks a different question: is the person on this live call actually a real human in a real room, or a generated face? Someone can pass a static identity check and still be running a real-time face swap during the interview, which is why the live signals matter.