We launched deepfake detection at Tofu a few weeks ago. We expected interest. The response has been overwhelming, to say the least.
Demo requests are coming in from companies we've never spoken to, and every recruiting leader on these calls wants to talk about the same thing: catching fraud inside the interview itself, whether that's a deepfake on screen or a seat-swap behind it.
Some teams are already seeing this in their funnel, others haven't yet. Lucky them. Either way, this is following the same arc resume fraud did. Easy to ignore until it's everywhere, and then suddenly everyone is asking why no one was looking sooner.
What we're seeing right now
The deepfake problem in hiring isn't a future problem. It's a current one, and the curve is steepening fast.
Three things are happening in parallel, and they're each moving in the wrong direction:
The attack side is getting trivially easy - Research found it takes as little as 70 minutes for someone with zero image manipulation experience to create a fake candidate capable of passing a video interview. Not 70 hours, but minutes! The barrier to running this kind of fraud has collapsed.
The detection side is decaying - Detection models trained on last year's deepfakes fail on this year's. One academic study found a detector dropped from nearly 40% accuracy to below 4% when tested against a new synthesis method. Whatever a detector learned last quarter is worth less next quarter, and the cycle keeps repeating.
Humans aren't picking up the slack - Human reviewers detect high-quality deepfake videos only 24.5% of the time, meaning roughly three out of four high-quality deepfakes get through. Telling recruiters to look more carefully isn't a strategy.
The gap between what attackers can do and what hiring teams can see is widening every quarter. That's the dynamic that made this urgent for us.
So this is why we built it
The interview can't be the unprotected part of the funnel anymore. Our existing fraud detection model has a margin of error of 3%, which is already notably low compared to what's out there, but we'd really like that number to be zero. That 3% is the slice where a synthetic identity is convincing enough to clear our thresholds: a fabricated resume that matches plausibly to a real background check, a LinkedIn profile that's been seeded over time, a candidate persona that's just consistent enough to not raise flags upstream.
This is why we ask recruiters to use their own judgment when something feels off and don't hand the final call to the model. Recruiters know their funnel better than any model ever will, and we'd rather they have the last word than the other way around. But we also didn't want to lean on recruiter intuition alone to close that 3%, especially when the gap is exactly where the most sophisticated attacks live, and especially when the data is clear that humans can't catch these on their own anyway.
Deepfake detection paired with seat-swap detection closes that gap by putting a check on the interview itself.
Deepfakes are the on-screen version: real-time face-swap and voice-clone tools let someone sit through a video interview as a person they aren't.
Seat-swap is the behind-the-screen version: a candidate clears the screening round, then a more qualified person takes the technical round in their place. Or two people share a loop, trading off who's on camera. It's lower-tech, harder to talk about, and in some ways harder to catch because there's no synthetic media to flag. The signal only lives in the interview itself. And unless you put a recruiter on every single round, which would double the time spent per candidate and slow hiring across the board, there hasn't really been a way to catch it.
Together they cover the same surface: the live interview. Even if a fake identity clears every earlier step, it still has to hold up live, and it still has to be the same person from the first round to the last. That matters because the interview is where most hiring teams assume the verification has already happened. Spoiler: it hasn't. It's where the fraud has the most room to operate, and historically the least friction.
Why we built our own model
The next question was whether to use an existing model. We didn't.
Most deepfake detection tools on the market are trained for content moderation, broadcast media, or account-opening identity checks. The signals that matter in those contexts aren't the signals that matter in a 45-minute structured interview with a candidate who knows they're being recorded. We needed a model that understood interview cadence, the way candidates move when they're thinking, the way audio behaves on a real laptop versus piped through a real-time face-swap tool. That's a different problem. The only way to do it right was to build it ourselves.
To see how this plays out in practice, we ran a benchmark across 5,000 video clips (4,000 real candidates and 1,000 deepfakes) against three of the most widely used general-purpose deepfake detection tools on the market. We've anonymized them as Tool A, Tool B, and Tool C.
Accuracy on real candidates (n = 4,000)
| | Tool A | Tool B | Tool C | Tofu |
|---|---|---|---|---|
| Correctly flagged as REAL | 3,000 | 3,800 | 2,500 | 3,500 |
| Inconclusive | 200 | 0 | 0 | 500 |
| Wrongly flagged as FAKE | 800 | 200 | 1,500 | 0 |
| Accuracy | 75.0% | 95.0% | 62.5% | 87.5% |
Accuracy on deepfakes (n = 1,000)
| | Tool A | Tool B | Tool C | Tofu |
|---|---|---|---|---|
| Correctly flagged as FAKE | 400 | 200 | 400 | 700 |
| Inconclusive | 0 | 0 | 0 | 200 |
| Missed (called REAL) | 600 | 800 | 600 | 100 |
| Accuracy | 40.0% | 20.0% | 40.0% | 70.0% |
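For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the accuracy rows from the raw counts in the two tables. The names (`Outcome`, `summarize`) are ours for illustration only, and the sketch assumes that an inconclusive result counts against accuracy, which is what the percentages in the tables imply. The false-alarm, miss, and combined figures are simple derivations from the same counts, not numbers we reported separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Counts for one tool on one class of clip (real or deepfake)."""
    correct: int
    inconclusive: int
    wrong: int

    @property
    def total(self) -> int:
        return self.correct + self.inconclusive + self.wrong

    @property
    def accuracy(self) -> float:
        # Assumption: inconclusive results are not counted as correct.
        return self.correct / self.total


# Counts copied from the two benchmark tables.
REAL = {
    "Tool A": Outcome(3000, 200, 800),
    "Tool B": Outcome(3800, 0, 200),
    "Tool C": Outcome(2500, 0, 1500),
    "Tofu": Outcome(3500, 500, 0),
}
FAKE = {
    "Tool A": Outcome(400, 0, 600),
    "Tool B": Outcome(200, 0, 800),
    "Tool C": Outcome(400, 0, 600),
    "Tofu": Outcome(700, 200, 100),
}


def summarize(tool: str) -> dict[str, float]:
    """Per-class accuracy plus derived error rates for one tool."""
    real, fake = REAL[tool], FAKE[tool]
    return {
        "real_accuracy": real.accuracy,
        "fake_accuracy": fake.accuracy,
        # Share of real candidates wrongly flagged as FAKE.
        "false_alarm_rate": real.wrong / real.total,
        # Share of deepfakes that were called REAL.
        "miss_rate": fake.wrong / fake.total,
        # Depends on the 4,000 / 1,000 mix used in this benchmark.
        "combined_accuracy": (real.correct + fake.correct)
        / (real.total + fake.total),
    }


if __name__ == "__main__":
    for tool in REAL:
        stats = summarize(tool)
        print(tool, {name: f"{value:.1%}" for name, value in stats.items()})
```

Because the benchmark has four real clips for every deepfake, a combined accuracy figure is dominated by performance on real candidates, which is why the tables report the two classes separately.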
The general-purpose tools mostly do okay on real candidates and badly on deepfakes. The best of them caught 400 out of 1,000 deepfakes. The worst caught 200. None got above 40%. That's the cost of training on broadcast-quality fakes and content moderation data: they weren't built for what an actual interview attack looks like.
Our model isn't perfect either. We caught 700 out of 1,000 deepfakes. But the gap between 700 caught and 400 caught at best (200 at worst) is the difference between detection that actually changes outcomes and detection that just gets logged.
A leader from one of the larger video-based platform companies that had tested the market told us our model was "the best they had seen out of anything they had tested." Building rather than borrowing is paying off where it counts: on real interviews, against real attempts. Not to mention, it puts us and our customers in a position of strength because we're in control.
Where we're headed
Some companies are responding to all this by walking back remote hiring entirely. By mid-2025, Google and McKinsey had moved to mandatory in-person interviews to deal with AI interview fraud. These are companies built around scale and remote-friendly hiring, and if they're pulling candidates back into the room, the problem is serious.
The numbers say it's only going to get more serious. Experian's 2026 Future of Fraud Forecast named deepfake job candidates one of the top five fraud threats of the year. But rolling back remote interviews isn't the answer. Flying out every candidate costs tens of thousands of dollars per stage. It's also a great way to lose the best ones to a company that didn't make them book a flight. It pushes the cost of fraud onto the candidates and teams who had nothing to do with it, slowing down hiring for everyone in order to defend against a small percentage of bad actors.
The real answer is making the hiring funnel itself more secure. That means verifying signals across the interview, resume, background data, and network activity together, not relying on one recruiter in one video call to catch something most humans will miss.
That's what we built Tofu for. Come talk to us and we'll show you what it catches in your funnel.