Scaling Beyond Annotation: How Reducto Powers Scale AI’s Agentic Expansion
“Having confidence in the details of every document is essential for Scale’s public sector work. Time and again Reducto has proven to be a trusted partner that we can depend on. As Scale focuses on agentic systems, we are confident Reducto has the products we need to continue to respond to the demands of our customers quickly and reliably.” - Kyra Huneycutt, Product Manager of Scale’s Donovan.
Scale AI is known as the human data layer powering the modern AI revolution. Valued at $29 billion, the company built its reputation as the data-annotation engine behind frontier models from OpenAI, Google, and Meta. Beyond its role in training cutting-edge LLMs, Scale has also expanded its product line to serve public sector and large enterprise customers.
Building products for these verticals means handling some of the most document-intensive workflows an AI system can face. A prime example is Scale’s work for public sector customers through its Donovan product. Donovan is a platform for deploying mission-tailored AI agents, especially for government, defense, and other high-stakes workflows. The platform lets decision-makers organize and analyze large volumes of unstructured data and turn it into actionable insights.
But this data rarely arrives in a clean format. The workflow’s inputs are most commonly government documents, including complex products such as official orders, situation reports, and intelligence briefs. These documents sometimes even include elements like handwriting, charts, labeled maps, and low-quality scans that defy standard automation. Yet preserving the accuracy and confidentiality of these elements is mission-critical for Donovan’s performance. These challenges are precisely why Scale turned to Reducto as the platform to handle the document-processing layer.
The limits of a homegrown approach
Prior to adopting Reducto, the Public Sector team relied on a homegrown document-processing stack. Given the sensitive nature of government data, the team’s instinct was to build an in-house parser that they could tightly control. But that initial approach, which meant creating an ad-hoc solution for each document type, required significant manual engineering effort.
“Our previous approach involved a one-off solution for each type of document,” says Shourya Munjal, a Software Engineer with the Public Sector team. “A customer will come in with a specific use case where our pipeline doesn't work as well as we would like. This led us to have to go in and manually support the new file.”
As the customer base expanded and use cases multiplied, the variety of documents grew rapidly, which led the team to seek a more efficient and effective solution.
The team found that solution in Reducto. Reducto could handle the wide variety of document types and embeddings and, most importantly, could do so without compromising the confidentiality and security the customer demanded.
A unified foundation for complex documents
The shift to Reducto not only accelerated Scale’s ability to support new workflows, but also gave the company’s Donovan product a reliable foundation across all document types.
"A lot of the work we do is bespoke based on the customer and the contract. Instead of having each request come in and building from scratch, Reducto is very easy to use as a plug-and-play tool where we're able to tailor it very quickly for each workflow we want to run," shared Scale’s Shourya Munjal.
The team primarily leverages Reducto’s Parse endpoint to transform raw, unstructured documents into clean, high-quality text chunks that can be embedded and reused across its Retrieval-Augmented Generation (RAG) chats and other downstream mission applications.
Munjal continued, "We started by integrating Reducto into our embedding pipeline—for our RAG chats or other purposes we wanted to use embeddings for. This was a pretty significant step up from our previous approach."
Importantly, Reducto gives the team confidence that their RAG applications are grounded in complete and accurate context. Instead of losing information in messy PDFs or skipping over visuals, Reducto produces clean, well-structured chunks along with figure summaries, so the content of charts and other visuals stays available to downstream retrieval.
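As a rough illustration of the embedding step described above, the sketch below indexes already-parsed chunks, including a figure summary, and retrieves the best match for a query. It is a minimal sketch under stated assumptions, not Scale's pipeline or Reducto's API: the Chunk class, the sample chunk contents, and the toy bag-of-words embed function are all hypothetical, and a real system would take its chunks from the parsing step and call an actual embedding model.

```python
import math
import re
from dataclasses import dataclass


@dataclass
class Chunk:
    """Hypothetical stand-in for one parsed chunk; not Reducto's actual schema."""
    source: str
    kind: str  # "text" or "figure_summary"
    text: str


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_vocab(chunks):
    """Map every token seen in the chunk texts to a vector position."""
    tokens = sorted({t for c in chunks for t in tokenize(c.text)})
    return {t: i for i, t in enumerate(tokens)}


def embed(text, vocab):
    """Toy unit-length bag-of-words vector (assumption: a real pipeline
    would call an embedding model here instead)."""
    vec = [0.0] * len(vocab)
    for t in tokenize(text):
        if t in vocab:
            vec[vocab[t]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def retrieve(query, chunks, k=1):
    """Return the k chunks with the highest cosine similarity to the query."""
    vocab = build_vocab(chunks)
    q = embed(query, vocab)
    scored = [
        (sum(a * b for a, b in zip(q, embed(c.text, vocab))), c)
        for c in chunks
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:k]]


# Invented sample data standing in for the output of a parsing step.
chunks = [
    Chunk("report.pdf", "text",
          "The supply convoy departs at dawn along the northern route."),
    Chunk("report.pdf", "figure_summary",
          "Bar chart showing fuel reserves by month for the depot."),
    Chunk("roster.pdf", "text",
          "Personnel rotation schedule for the second quarter."),
]

top = retrieve("fuel reserves chart", chunks, k=1)
print(top[0].kind, "|", top[0].text)
```

Because the figure summary is indexed as an ordinary chunk, a question about the chart can be answered from retrieved context even though the original visual contained no extractable text.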
Built for air-gapped, mission-critical environments
During the Public Sector team’s search for an external solution, one requirement was non-negotiable: the product had to support on-premises deployment to safeguard highly sensitive government data. As the team put it, they needed a system capable of operating “in an air-gapped environment.”
“We work with a lot of government customers who have very strict requirements around data security and privacy. We needed to make sure that we could deploy this in a way that met those requirements,” Huneycutt explained.
Beyond security, the team also had highly specific technical requirements. The workflows had to run efficiently under tight GPU constraints, integrate with approved cloud providers, and support the small set of models authorized for use in high-side environments.
Reducto met each of these requirements. Its on-premises deployment option allowed the team to run securely inside air-gapped environments, and it integrated with the approved cloud providers and authorized models, giving the team a solution that was both operationally feasible and fully compliant.
Enabling Scale AI’s next chapter
Strategically, Donovan represents Scale’s next step beyond its roots in data annotation toward full-lifecycle deployment of model-driven agents. Customers that need these agents, like those in the Public Sector and Enterprise, require not only high-quality data, but also secure systems capable of turning unstructured information into actionable insights.
Reducto has become an important part of how Scale builds these solutions. By providing a robust yet flexible document-processing foundation for both the Public Sector and Enterprise teams, Reducto enables Scale to take on increasingly complex customer workloads without sacrificing accuracy or compliance.
As Scale continues to grow, Reducto is honored to be an important partner in delivering end-to-end AI solutions that meet the needs of the most document-heavy verticals.