Two Undergrads Built Dia, an Open-Source AI Speech Model to Rival Google’s NotebookLM

A pair of undergraduate students with limited prior AI experience have unveiled a new AI speech model designed to rival big-name tools like Google’s NotebookLM. Their creation, dubbed Dia, is now openly available and already showing impressive results in early tests.

The market for synthetic voice generation is booming, with major players like ElevenLabs dominating headlines. But the space is rapidly expanding. According to PitchBook, startups in the voice AI sector secured over $398 million in venture capital funding last year alone — a clear sign of growing investor confidence.

Toby Kim, one of the co-founders of Korea-based Nari Labs, says the duo began exploring speech AI just three months ago. Inspired by the flexibility of Google’s NotebookLM, their aim was to create a model that not only generated convincing dialogue but also offered deeper control over voice characteristics — including tone, disfluencies, laughter, and other nonverbal cues.

To bring Dia to life, Kim and his co-founder leveraged Google’s TPU Research Cloud program, which grants researchers free access to powerful TPU AI chips. The result: a 1.6-billion-parameter model capable of generating high-quality, podcast-style speech.

For context, “parameters” are the internal values a model learns during training and uses to make predictions — and models with more parameters generally perform better.

Dia is now available for download on both Hugging Face and GitHub. The model is designed to run on most modern PCs with a GPU that has at least 10GB of VRAM, making it unusually accessible for a tool of its caliber.
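For those who want to try it, the project’s README outlines a straightforward Python workflow along these lines. This is a minimal sketch: the `dia` package, the `Dia.from_pretrained` call, and the `[S1]`/`[S2]` speaker tags follow the README’s examples, but treat the exact API as subject to change.

```python
# Minimal sketch of running Dia locally, based on the usage pattern in
# Nari Labs' GitHub README; the exact package API may have changed.
import soundfile as sf
from dia.model import Dia  # installed from the nari-labs/dia repo

# Downloads the 1.6B-parameter weights from Hugging Face on first run.
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# [S1]/[S2] tags mark alternating speakers; parenthesized cues like
# (laughs) request nonverbal sounds.
script = "[S1] Welcome back to the show. [S2] Thanks, happy to be here. (laughs)"

audio = model.generate(script)          # returns the waveform as a NumPy array
sf.write("dialogue.wav", audio, 44100)  # Dia outputs 44.1 kHz audio
```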

By default, Dia generates a random voice, but users can prompt the model with a style description or provide audio samples for voice cloning. In TechCrunch’s hands-on testing, Dia was able to smoothly generate two-way conversations on a wide variety of topics, with voice quality that stands up to some of the best commercial tools. Voice cloning was particularly easy to use, setting it apart from many competitors.
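The cloning flow follows a similar pattern. The sketch below is modeled on the project’s example scripts; the `audio_prompt` argument and the convention of prepending the reference clip’s transcript are assumptions drawn from the README and may differ across versions.

```python
# Hedged sketch of Dia's voice-cloning flow; parameter names are taken
# from the project's README examples and should be treated as assumptions.
import soundfile as sf
from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

# The transcript of the reference clip is prepended so the model can map
# the cloned voice onto the new lines that follow it.
reference_transcript = "[S1] This is the voice I want the model to imitate."
new_lines = " [S1] And this is brand-new dialogue spoken in that same voice."

audio = model.generate(
    reference_transcript + new_lines,
    audio_prompt="reference.wav",  # short sample of the target voice
)
sf.write("cloned.wav", audio, 44100)
```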

Like many text-to-speech tools, however, Dia lacks robust safeguards. The ease with which it can produce realistic voices raises concerns about potential misuse — from misinformation campaigns to scams. While Nari Labs includes a disclaimer discouraging abuse, it also states it “isn’t responsible” for how the model is used.

Another open question is the dataset behind Dia. Nari has not disclosed the sources used to train the model, which has raised eyebrows. Observers on platforms like Hacker News have speculated that some samples sound strikingly similar to real-world copyrighted content, including voices from NPR’s Planet Money podcast.

This mirrors a larger industry debate over the legality of training AI on copyrighted materials. Some developers argue that fair use allows such practices, while rights holders strongly disagree.

Despite these concerns, Nari Labs has ambitious plans. The team is developing a broader synthetic voice platform with social features layered on top of Dia and future models. They’ve also committed to releasing a technical report for Dia and expanding language support beyond English.

As the AI voice space heats up, the success of two undergraduates in developing a model of this quality is a clear sign: innovation isn’t just coming from deep-pocketed tech giants — it’s coming from everywhere.
