Vibe Check: Meet The Sequoia-Backed Startup Building Tests For AI Code
Skyramp is emerging from stealth with $10 million in funding and design partners like Box and Intel. Can its automated testing improve AI-generated code? Not everyone's convinced.

The Upshot
For the past three years, startup founder Nithin Michael has quietly tinkered with a potentially very big problem: testing that actually works for AI code.
The rise of ‘vibe coding,’ and tools like Cursor, GitHub Copilot and Claude Code, means that more and more code will be generated by large language models. And even increasingly sophisticated LLMs get things mostly – but not perfectly – right. Code might work from one line to the next. Then one little bug, or hallucination, can bring down the whole house of cards.
Unless that paradigm shifts, AI testing other AI will face the same problem. “The only way to build a counterbalance to vibe coding is if you’ve got at least one leg to stand on,” argues Michael. “If you ask an LLM for a test today, you’ll get something that may look correct, and five minutes later, you’ll get something that looks very different.”
His answer is Skyramp, the startup he founded with a group of former coworkers in July 2022. Skyramp’s agent can run in any environment, generating any of the tests a developer might need, even filling in any missing bits. Crucially, the startup’s software doesn’t depend on calling out to those LLMs, but is built on top of proprietary algorithms – lots of hard math.
“It’s a beast of a problem to solve: making sure you are able to deterministically generate any test you want, taking in any types of input you may provide,” Michael tells Upstarts. “I think we are 99% of the way there. The engineer in me won’t let me claim 100% yet.”
But that’s good enough for Skyramp to emerge from stealth today, with $10 million in previously unannounced funding from Sequoia. A group of 10 enterprise design partners including Box, Broadcom, Cisco, Intel and MongoDB have spent months working with its tools – more on them later – and now Skyramp’s opening up wider access.
Testing is not a new category; a number of startups are making their own attempts to take it on. Why stay low-profile for so long? “We’re making a very strong claim, and strong claims require strong backup,” says Michael.
Whether Skyramp can deliver on that promise will be up to the developers who try out its tools. Some investors and AI experts who spoke to Upstarts expressed skepticism about companies over-promising in this area before.
But talk to leaders at vibe coding companies like Replit and Vercel, and they don’t dispute that whatever you call it – deterministic testing or vibe testing – better testing that can keep up with the AI will be important for generated code to deliver real business outcomes. So it’s worth the attempt.
More on Skyramp – and what its investors and peers are saying – below.
Stress Tested
After selling his previous startup to VMware, Michael lasted 18 months as a director of engineering at the bigger tech company, dreading each software release. Inevitably, senior engineers at major customers would call to let him know something was wrong.
“I just found myself constantly frustrated by code. Why couldn’t it be better? Why wasn’t it better tested?” he says. “I’d get screamed at on Saturday mornings. The fact that it was a pain point was really stressed to me.”
Michael grew up attending Indian embassy schools and reached the U.S. on a scholarship to Drexel University, where he studied electrical engineering. He spent a stint at the Office of Naval Research designing secure communications algorithms, then pursued a PhD at Cornell in network algorithms. That research helped spur his first startup, Mode.net, where Michael was CEO and later CTO.
Mode’s innovation was to bring algorithm-based intelligent control to software-defined networks – we can leave it at that. When Microsoft shut down a division that was one of Mode’s biggest customers early in the Covid-19 lockdown, the startup had to seek a soft landing with a buyer.
By 2022, as the release of GPT-3 started to stir up excitement around generative AI, Michael saw the limitations of testing about to play out at a much bigger scale. He doesn’t claim to have predicted the rapid popularity of ChatGPT, or of code editor tools like Cursor and Copilot. But he did expect a deluge.
“Humans are already unwilling to test human code. How likely is it that they’re going to test any of this generated code?” he remembers now.
Having nearly worked with Sequoia and partner Shaun Maguire at his last startup, Michael teamed up with the firm to raise $10 million out of the gate, hiring back some of his old team from Mode and VMware. Then the San Francisco-based startup got to work on infrastructure that would let its tests run anywhere – locally on laptops, within developer workflows, and in containerized applications.
Next it spent long months on algorithms to figure out and generate whatever types of tests developers might need. (The most common is the unit test, which checks a block of code in isolation, but there are also integration, load, UI tests and more.)
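For readers less familiar with the jargon, the distinction in that parenthetical can be made concrete. Here is a minimal Python sketch – purely illustrative, not Skyramp’s output or code – of a unit test versus an integration-style test; the function and test names are invented for the example:

```python
# Illustrative only: the test categories described above, not Skyramp's tooling.

def add_item(cart: list, item: str) -> list:
    """Toy function under test: returns a new cart with the item appended."""
    return cart + [item]

def checkout_total(cart: list, prices: dict) -> float:
    """Second toy function: sums prices for everything in the cart."""
    return sum(prices[item] for item in cart)

# Unit test: exercises one block of code in isolation.
def test_add_item_unit():
    assert add_item([], "book") == ["book"]

# Integration-style test: exercises multiple pieces working together.
def test_cart_to_checkout_integration():
    cart = []
    for item in ("book", "pen"):
        cart = add_item(cart, item)
    assert checkout_total(cart, {"book": 10.0, "pen": 2.5}) == 12.5

test_add_item_unit()
test_cart_to_checkout_integration()
```

Load and UI tests extend the same idea outward: instead of checking one function or one flow, they hammer a system with traffic or drive its interface the way a user would.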
Those algorithms informed Skyramp’s agent, which could then automatically pull context – or know to prompt a user to provide it – in order to run tests without any hallucinations, Michael says. Last came a control loop through which the agent can interface with Cursor, Copilot or other code editors to access and modify tests on its own, giving its tests the resilience to keep working over time without human supervision.
Skyramp’s design partners were chosen to be as big and complicated as possible, Michael says. Intel tried out Skyramp for an end-to-end test that involved multiple cloud environments and API and UI sub-tests, while also generating front-end test code. Box, meanwhile, used Skyramp for a ‘load test,’ seeing if its infrastructure could handle using AI tools across billions of files and thousands of folders.
In the Intel case, Michael says his software generated a 2,000 lines-of-code test in 10 seconds that would’ve taken a human engineer a week. “And those 10 seconds were basically the engineer going around and clicking to show what they wanted, so we could collect a trace,” he claims. (Michael was unable to connect Upstarts with the user at Intel to verify that account.)
The Test Of Time
Having operated in semi-stealth until now (it at least had a functional website), Skyramp hasn’t had a chance to impress the wider community yet. But when Upstarts described it to some AI investors and operators, their reactions suggested a skepticism the startup will need to overcome.
If OpenAI hasn’t solved this problem yet, why should a small startup be able to, asks one investor. Another VC questions whether “record and replay” style tests that learn from human input can work fast enough with the pace of AI code development. They point to a whole flock of startups tackling testing in some fashion, such as QA Wolf, which raised $36 million last year, and Meticulous and Momentic, which each raised about $4 million earlier in 2025.
“Almost everyone thinks that testing is not very valuable, and we think they’re wrong. They’re basically living in the past,” says Sequoia’s Maguire. “Even taking code generation out of this, Skyramp is solving a much bigger problem” of complex testing, he adds.

At Replit, CEO Amjad Masad says that his startup is currently working to automate the coding process to work without human supervision end-to-end — meaning it will need a testing solution, either from another startup, or one built in-house.
“This is something that is crucial for autonomous agents like ours, like [OpenAI’s] Codex, or [Cognition’s] Devin. Otherwise, you’re always dependent on that human being there,” Masad tells Upstarts. “I think this would be super useful if they nailed it, but it’s going to take a lot.”
At Vercel, CTO Malte Ubl tells Upstarts by email that the startup’s vibe-coding v0 product doesn’t yet have a sophisticated testing-focused agentic loop; while it’s not a bottleneck to adoption currently, testing is “still a critical part of the process” in the long run.
Startups in the testing space broadly struggle to reach 100% success rates with their tests, Ubl adds, meaning small failure rates that compound at scale. AI-generated systems, meanwhile, won’t hold up over time, he argues.
“But it’s completely worth investing in, and I am excited to try more of these startups,” Ubl says. “Testing is hard. And it has downsides. We are in the early days here compared to the code generation side of things.”
Of course, Skyramp isn’t dependent on AI models to work – its software is built from its own algorithms that required “geniuses locked in a room for years to crack,” responds Maguire. “It’s able to keep up with the speed of development of AI, testing in real time no matter how complicated, shitty or poorly conformed the code,” he says.
Michael, for his part, is eager for developers to sign up to try out Skyramp for themselves, and share their feedback head-to-head with any other tool on the market.
“The differentiation is the energy and time advantage,” Skyramp’s CEO says. “Even if OpenAI comes up with artificial general intelligence, a year from now we’ll still be more efficient, because we’re tailor-made for this problem.”