AI’s progress has hit a critical constraint: access to real-world data. While public datasets and web scraping powered AI’s early breakthroughs, today’s models demand proprietary data from hospitals, enterprises, studios, and regulated environments – data that’s been locked away behind legal, technical, and governance boundaries. This bottleneck affects every stage of AI development, from pre-training to evaluation, forcing model builders to rely on synthetic data that can’t fully replicate the complexity of human behavior and real-world scenarios. Protege addresses this fundamental gap by creating a platform where data holders can license their proprietary datasets while maintaining privacy, IP protections, and compliance – enabling AI developers to access clinical records, media content, audio conversations, motion capture data, and other hard-to-find information at scale. Working with data partners across healthcare, media, and motion capture, the company has aggregated access to billions of data points, including over 3B clinical notes, 100M medical images, 500K+ hours of video content, and 500K+ hours of audio across 50+ languages. With its recent acquisition of Calliope Networks and partnerships spanning the majority of the “Magnificent Seven” tech companies to hundreds of data providers, Protege is becoming the central infrastructure layer connecting proprietary data with AI development needs.
AlleyWatch sat down with Protege CEO and Co-Founder Bobby Samuels to learn more about the business, its future plans, recent funding round, and much more…
Who were your investors and how much did you raise?
Protege raised $30M in a Series A1 round led by Andreessen Horowitz (a16z). The financing extends the company’s $25M Series A from August 2025 and brings total funding to approximately $65M since Protege’s founding in 2024. The round also includes follow-on participation from existing investors such as Footwork, CRV, Bloomberg Beta, Flex Capital, Shaper Capital, and more.
Tell us about the product or service that Protege offers.
Protege is an AI data platform unlocking access to trusted, real-world data at scale. We’re transforming how the world’s real data powers AI, enabling people and institutions to contribute their information safely and shape intelligence built on integrity, expertise, and human purpose. We work with private data holders across healthcare, media, and other industries to license and curate the high-quality datasets that AI developers need for training, evaluation, and benchmarking. Our role is to act as the connective tissue between these two sides, making it possible to unlock valuable data while preserving privacy, IP rights, and regulatory compliance.
At its core, Protege is about turning data that’s historically been siloed, sensitive, or underutilized into a responsibly governed asset. We focus on real-world data across industries because that’s what ultimately determines how AI systems perform once they leave the lab and operate in real environments.
What inspired the start of Protege?
While AI models and compute have advanced rapidly, access to the right data has become a bottleneck. The vast majority of the world’s most valuable data, especially in regulated industries like healthcare, is not publicly available, and synthetic or manufactured data can’t fully replicate real-world complexity. Protege was born from the belief that AI’s next leap will come from unlocking real-world data that is ethically sourced, expert-curated, and shared on human terms.
My co-founders and I had spent years working in privacy-first data ecosystems, and we saw an opportunity to apply those lessons to AI. We believed there was a better path forward than scraping data from the internet – one that compensated data holders, respected privacy, and enabled AI developers to train systems that can actually work in the real world.
How is Protege different?
We’re built around licensed, real-world data from day one. When AI developers come to Protege, they’re looking for real-world data: the most authentic signal of how people and systems actually behave. This isn’t synthetic data created by AI, nor is it manufactured data created to simulate human behavior. Across every stage of the AI development lifecycle, from pre-training to post-training to fine-tuning to evaluation, AI developers need this data. They’re looking across modalities and industries: healthcare, video, audio, motion capture, gaming, manufacturing, life sciences, real estate, finance, education, and many more. Foundational, multimodal model builders (including the majority of the Magnificent Seven) now work with us across multiple domains, along with dozens of other model builders.
We also focus on curation and fit-for-purpose datasets rather than volume alone. As AI developers’ needs have matured, they’ve shifted from “more data” to “the right data,” and our platform is designed to meet that demand, whether it’s representative clinical scenarios in healthcare, highly specific content in media, or up-to-date audio and motion capture needs. We unlock revenue for data providers as well, empowering data stewards to share their data assets safely and help AI learn responsibly, so that progress is both powerful and representative of the broader human population.
What market does Protege target and how big is it?
Protege sits at the intersection of AI development and proprietary data, serving both AI developers and data holders across multiple verticals, such as healthcare, media, motion capture, and more. Fundamentally, there are three bottlenecks to AI progress: compute, models, and data. There are already multiple companies in the first two categories worth billions, potentially trillions. There is not yet a dominant player in the data needed for AI development, and that’s the gap Protege aims to fill.
As AI becomes more multimodal and more embedded in real-world workflows, demand for licensed, domain-specific data will only grow. We believe solving AI’s data access problem is a generational opportunity, and the market spans nearly every industry touched by AI.
What’s your business model?
We currently operate as a two-sided data platform for AI development, where AI developers purchase licensed datasets and data holders are compensated through structured agreements. We earn revenue for facilitating access and for providing value-added services like curation and de-identification where appropriate. Over time, we have also expanded into benchmarks and evaluation datasets to support AI development across the full lifecycle, not just initial training.

How are you preparing for a potential economic slowdown?
In our industry, we’ve seen an acceleration in demand across the different verticals we serve. In particular, we feel well-positioned to take advantage of not only the growing need for data for AI development but also the growing trend toward ethical data licensing for AI across industries.
This has the potential to give other companies, organizations, and rights-holders in industries that are susceptible to economic slowdowns an additional revenue stream that didn’t previously exist. These are win-win situations where data rights holders can benefit from their existing assets, and we as a company are able to help package that data and connect data holders with AI developers actively seeking out these proprietary data sources. This helps insulate us from broader market conditions while also giving others opportunities beyond their existing business lines.
What was the fundraising process like?
Protege has been growing quickly, and we were seeing clear signals in the market that there was an opportunity to raise capital in a way that would meaningfully accelerate what we were already doing: expanding data partnerships, hiring thoughtfully, and staying flexible around potential strategic opportunities. a16z stood out as the right partner given their depth in data infrastructure, AI, and healthcare, as well as the long-term orientation they bring to company building.
This round gives us more opportunities to accelerate product development, significantly expand Protege’s data network into new domains and data formats, deepen partnerships with leading institutions, and scale the team and infrastructure required to deliver AI-ready, rights-protected access to real-world data. At the same time, we get to bring on a world-class partner who’s deeply connected to the ecosystem in which we operate.
Having Daisy Wolf, Partner at a16z, invest in us was an important part of that decision, given that her experience in healthcare and data is highly aligned with where we’re going. The round moved quickly and included continued participation from our existing investors, which we see as a strong vote of confidence in both the business and the direction we’re heading.
What are the biggest challenges that you faced while raising capital?
A big factor that’s often overlooked is how we communicate our vision for the world, and how we as a company fit into it, when the world is changing so quickly. This is especially true in the AI space, where new models are released what seems like every week, and innovation (and disruption) is happening left and right. So having a crisp vision that we can communicate clearly to investors is paramount to ensuring that we see eye-to-eye with them quickly. This helps investors develop conviction in our vision and mission, while also ensuring that we feel confident we’ve chosen the right partner for the long haul.
What factors about your business led your investors to write the check?
For years, the open internet powered rapid advances in AI, but that resource is now largely exhausted. Public datasets, such as Common Crawl, capture only a small slice of the web, while the vast majority of high-value data lives offline, inside hospitals, enterprises, studios, and other regulated or proprietary environments. The real bottleneck has shifted to accessing real-world data responsibly. Investors see Protege as essential infrastructure for that next phase, enabling licensed, privacy-preserving access to the data AI systems need to perform reliably in practice. In addition, they noted the strength of a team drawn from a variety of backgrounds, ranging from healthcare data to media to tech startups and more.
What are the milestones you plan to achieve in the next six months?
In the next six months, Protege aims to expand its verticals beyond healthcare, audiovisual, and motion capture, with the goal of becoming a trusted source of licensed, real-world data across domains.
Beyond training data, Protege plans to evolve its platform infrastructure to support every stage of the AI model development cycle, including pre-training, post-training, fine-tuning, evaluation and benchmarking, and inference, allowing for more advanced evaluation.
What advice can you offer companies in New York that do not have a fresh injection of capital in the bank?
As in previous eras, the one advantage that smaller companies and startups have over incumbents is speed. In the age of AI, this is especially true: the cost and time required to develop new products, test new ideas, and reach new partners at scale have never been lower. While this can cause traditional channels to become saturated, it also creates a world where it’s never been easier for great ideas to reach the audiences that care about what you are building.
As a result, leaning into the speed advantage is almost never a bad idea in the early stages. It increases the surface area of opportunities, while also creating more chances to discover new insights and pivot as necessary in an ever-changing landscape.
Where do you see the company going over the near term?
Over the near term, Protege is focused on becoming the central platform for real-world, licensed data used in AI development across industries, while also being the leading voice in AI data best practices for model builders. We believe that data reflecting human activity in the real world will play an increasingly large role in AI development. We aim to be the trusted leader for this type of data in the broader AI ecosystem.
What’s your favorite winter destination in and around the city?
I’m a big fan of a new AI-powered karaoke studio called Beatbox. It’s a ton of fun and a great space. (Though full disclosure, my wife and her cofounder opened it late last year.)


