This Seattle Startup Is Building The First Emotional AI
Founded by two ex-Apple PhDs, Nuance Labs has raised $10 million to train a model, then ship a next-gen AI companion app.
The Upshot
“So I was going into it, like, not knowing what the questions were going to be, like, or anything. And, yeah, it was really cool. I liked it a lot.”
The first time Upstarts hears this line, the ‘uncanny valley’ is strong. The young man speaking in the video looks fairly natural, but there’s something stiff about his tone – reminiscent of the text-to-voice generators used a decade ago – and the eyes aren’t right, his gaze too unchanging. It’s easily clocked and categorized: this is an AI-generated video.
The second version is unsettlingly natural, like our speaker has gone to college, joined a fraternity, and is describing his first-round sales interview at a startup. The robotic tone of voice has become bro-ier; as he looks off to the side with each “like,” seemingly searching for the right words, there’s little signal of someone (or something) reading from a script.
For Nuance Labs founders Fangchang Ma and Edward Zhang, this is progress. The clip – presented alongside one from a leading video AI startup – is just a technology demo, not even a product one, Ma is quick to tell Upstarts. Whereas leading video generators can take minutes and have considerable resources behind them, Nuance Labs generated this demo in fractions of a second, trained on a small dataset and an open-source, year-old large language model, Llama 3.2.
The demo is too early for public consumption, Nuance’s founders say, but sharing a one-directional video like this could be realistic in just a few months. The bigger goal is to release a public-facing, interactive demo within the next year.
“It’s super early, but we are seeing a strong signal that as we scale it up, we can get better,” says Ma. “We’re aiming to have something that will show that, hey, this actually feels like talking to a human, and a human is talking to you.”
Based in Seattle, Nuance Labs has a lot of research to do to train its models. Currently a team of four researchers, Nuance has raised $10 million in a seed round led by Accel to hire more, Upstarts can exclusively report; South Park Commons and Lightspeed Venture Partners also participated.
Ma and Zhang, PhDs who met while working together at Apple on digital personas for the Vision Pro headset, are most excited about launching what they believe could be the first emotionally intelligent, fully interactive consumer AI product – the companion version of ChatGPT.
Should it work, Nuance also plans to share its models with other companies via an API. That means the potential use cases for Nuance’s tech are vast, from more emotive virtual therapists and tutors to interactive video game characters.
It’s a risky bet that OpenAI, Anthropic and the big model shops can’t fast-follow, and one with social implications that quickly veer into the dystopian territory that Zhang says the startup doesn’t want to catalyze.
But Nuance is taking the kind of big swing that embodies an Upstarts startup. Its process, and potential impact, are worthy of the wider startup ecosystem’s attention. More on that below.
Presented by Vanta.
You need compliance to sign deals. But you also need your engineers to build the product. Enter a third option: make Vanta your first security hire.
Vanta uses AI and automation to get you compliant fast, simplify your audit process, and unblock deals — so you can signal to potential buyers that you take security seriously.
Plus, Vanta scales right along with you (backed by startup-speed support every step of the way) so you're not stuck ripping out one of those check-the-box solutions later...and rebuilding your whole program under pressure.
That's why top startups like Cursor, Linear, and Replit get secure early with Vanta. Claim a special offer of $1,000 off Vanta when you book a demo today.
Crossing the Uncanny Valley
When Ma and Zhang left Apple last year, they knew they wanted to build something in AI; they just weren’t sure what. Apple seemingly had the resources for a bold consumer AI bet, but the two were restless.
With a number of top AI researchers having departed Apple since, Ma looks like he has a lot more to say on the subject when Upstarts asks, but catches himself. “Things should be moving much faster than they do in Big Tech” is all he’ll say.
Both are highly technical: Ma has a PhD from MIT in robotics and machine learning, and Zhang a PhD in computer graphics from the University of Washington. They joined South Park Commons’ founder fellowship program to tinker with ideas that might feasibly turn into products, including using agents to automate image editing, and something similar to an AI-native Netflix.
“We want to have this consumer companion like in ‘Her’, but maybe not so dystopian.” — Nuance cofounder Edward Zhang
The insight that brought them full circle back to their work with avatars at Apple: emotion was the most glaring missing capability with otherwise cutting-edge AI tools. “When humans talk to each other, like right now, I’m trying to adjust the way I’m talking with the tone and speed, to make sure it lands with you,” Ma says. “A lot is being left on the table around emotions.”
Tools for basic emotional recognition – like the cameras watching you at a Las Vegas casino for signs of cheating – lacked the subtlety they believed a digital companion would need: not just spotting when you’re angry or nervous, but when you aren’t fully understanding something, or getting sad.
Startups like Synthesia and HeyGen could generate AI videos for use in demos and other business use cases; OpenAI and the big model labs might ship LLMs that got more emotionally responsive. To be able to generate truly responsive and interactive video, however, Ma and Zhang believed a new kind of model would be necessary.
In training its own models, Nuance has catalogued the different channels through which we project and perceive emotion – tone and rhythm of speech (known as “prosody”), eyes and general facial movement, lip motion and hand gestures – and developed a different kind of token for its models to train on and use.
These specialized tokens allow the model to train much more efficiently, and cheaply, on just those signals, eliminating the need for a more generalized LLM to approximate the same knowledge by pulling information (such as what an open or closed mouth might mean) from its much wider dataset.
The result is a tool that can already generate the first frame of a demo like the one Upstarts observed in just 0.3 to 0.4 seconds, then continue generating all subsequent frames faster than the video’s playback speed. A process that might otherwise be compute-expensive and take minutes happens near-instantly for the user.
“They’re very much aware that crossing that uncanny valley changes a lot of things.”
Such a quick feedback loop is crucial for interactive AI companions, says Ma. “One of the funny things about emotion is that if it comes with a delay, it’s really awkward. Imagine you’re telling Edward a joke, and he’s like ‘ha, ha, ha’ five or 10 seconds later. Then it’s not genuine, it’s almost sarcastic.”
Even improved LLMs would still cost any companion time, Nuance’s founders argue, because there would be more steps in the process: translate a user’s feedback or prompt into text for the LLM, generate text back, then convert that text into video output. Nuance’s model, if it works, would eventually be able to talk to LLMs concurrently to pull in memory and context on the user, but not depend on them for generating the response itself.
“We don’t have the answers yet on how to incorporate intelligence from external models and optimize it into ours,” says Zhang. “But we don’t want to compete on the text level.”
Chasing Samantha
Asked what their app could look like, Nuance’s founders invoke two possible interfaces: a companion that functions more like a virtual assistant, giving you live feedback on your meetings (such as whether you’re losing someone’s attention – stay locked in, please!), or one that serves as an individually tailored companion, akin to the digital companion of Joaquin Phoenix’s character in the 2013 film ‘Her’.
Intentionally comparing one’s roadmap to a real-world Samantha, the AI character voiced by Scarlett Johansson in that film, is a choice – especially considering that last year, not long after Nuance got started, Johansson revealed that OpenAI had asked for permission to use her voice in a voice AI launch, and accused the company of shipping a soundalike after she declined. (OpenAI said the GPT-4o voice in question, Sky, was performed by a different actress.)
But over several conversations, Nuance’s founders seem to be using the comparison for convenience while they figure things out. Nuance’s first priority is scaling up its research team, which currently includes Claudia Vanea, an Oxford PhD, and Karren Yang, an MIT PhD and another ex-Apple AI team member.
What they do know is that it will be a consumer-facing, hopefully popular product – not one focused on a vertical, or B2B sales. “We don’t want to constrain it to any particular thing, because we don’t know how people will use it,” says Ma. “We worked at Apple, right? Consumer has always been exciting to us,” adds Zhang.
Such an opportunity is what investors like Accel are most excited about, says Vas Natarajan, the partner at Accel who led the seed in Nuance Labs. Accel is an investor in Synthesia, but sees its solution as remaining more focused on enterprise use cases, he adds, while Nuance competes more with the likes of Meta’s Reality Labs.
“There’s still a lot to be desired in the realism and believability of these services,” says Natarajan. “They’re going to be competing against some very well funded folks, but I would say their thesis and combination of focus, plus architectural innovation, can get them to that point of believability faster.”
By launching a consumer app, Nuance could also create a “data flywheel” of proprietary user interactions, notes Jonathan Brebner at South Park Commons.
If Nuance realizes its objective of a real-time, emotionally intelligent AI companion, the implications are potentially vast. Would such companions accelerate trends – good and bad – in therapy, or in people forming romantic relationships with a digital persona? Could such videos replace educators and other leaders in classrooms and conferences?
Access to information might benefit; human interaction and relationships could suffer. The sense from Nuance’s founders and its investors is that such technology is coming, ready or not. Better to have these guys as the ones building it, Brebner argues. “They’re very much aware that crossing that uncanny valley changes a lot of things,” he says.
First, Nuance needs to grow its team and ship a public demo; at that point, it would likely raise a much larger funding round to build such a product. There’s still a chance its research goes nowhere, Ma notes, in which case its founders would shut the company down.
But for their part, Nuance’s founders maintain they’re focused on building something useful and helpful, not destructive. “We want to have this consumer companion like in ‘Her’, but maybe not so dystopian,” says Zhang. “People can come to it and talk to it, and enjoy talking to it.”