We get this question often, especially from people who have just learned that their voice can be cloned from a few seconds of audio. An honest answer matters more than a marketing one, so here is our position.

What we DO use to train our models

Three sources, in this order of importance.

Licensed research recordings

Audio acquired under research-use agreements from established academic and commercial providers. We pay for them. We do not scrape them.

Synthetic data generated in-house

A meaningful share of our training material is produced inside our own pipelines. It is unlimited in volume and free of any user identity, and we control its quality.

Audio that real users have explicitly contributed

Audio contributed through our opt-in Research Contribution program, which is a separate flow from the analysis tool. The opt-in checkbox is unchecked by default, consent is revocable at any time, and each contribution is documented in the user's privacy dashboard. Program details will be available soon.

What we DO NOT use

We do NOT train our commercial detection models on audio submitted for free analysis on this site, unless the user has explicitly opted in to the Research Contribution program. The opt-in is a separate, deliberate action, not a hidden default.
We do NOT acquire user data from third-party brokers.
We do NOT use leaked datasets, breached datasets, or any audio of unverified provenance.
We do NOT use audio submitted under enterprise contracts for our public detection models. Enterprise data stays inside enterprise boundaries.

What happens to your audio when you analyze a sample for free

By default, your audio is processed in memory. The result is returned. The buffer is freed. Nothing is written to long-term storage. If you choose to opt in to Research Contribution, you will see a clearly separate checkbox, and you can revoke that consent at any time.
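
For readers who think in code, the default flow described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the production implementation: `analyze_sample`, `score_audio`, and `ContributionStore` are hypothetical names, and the scoring function is a placeholder rather than a real detector.

```python
import io
from dataclasses import dataclass, field


@dataclass
class ContributionStore:
    """Hypothetical stand-in for long-term storage of opted-in audio."""
    saved: list = field(default_factory=list)

    def save(self, user_id: str, audio: bytes) -> None:
        self.saved.append((user_id, audio))


def score_audio(audio: bytes) -> float:
    """Placeholder scorer (NOT a real model): returns a dummy value in [0, 1]."""
    return (sum(audio) % 101) / 100 if audio else 0.0


def analyze_sample(audio: bytes, user_id: str, store: ContributionStore,
                   opted_in: bool = False) -> float:
    """Analyze audio in memory; persist it only on explicit opt-in."""
    buffer = io.BytesIO(audio)  # in-memory buffer, never a file on disk
    try:
        result = score_audio(buffer.getvalue())
        if opted_in:  # defaults to False, mirroring the unchecked checkbox
            store.save(user_id, buffer.getvalue())
        return result
    finally:
        buffer.close()  # release the buffer once the result is computed


store = ContributionStore()
default_score = analyze_sample(b"\x01\x02\x03", "user-1", store)
saved_after_default = len(store.saved)  # nothing stored without opt-in
analyze_sample(b"\x01\x02\x03", "user-1", store, opted_in=True)
saved_after_opt_in = len(store.saved)  # one contribution stored after opt-in
```

The point of the sketch is the shape of the control flow: storage is reachable only through the `opted_in` branch, and that flag is off unless the caller sets it deliberately.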

Why we publish this

The voice deepfake detection field is built on top of audio data. Most companies do not say where theirs comes from. We think users deserve to know the answer, in plain terms, without legalese.

We are not perfect. The field is evolving. If you spot a contradiction between this page and our actual behavior, write to [email protected]. We will fix the contradiction or fix the wording, in that order.

How ORAVYS trains its detection models
