So much of the world’s most important data is locked up in PDFs and spreadsheets, from financial statements to medical records and insurance claims. Humans simply can’t review and enter data from these vast files fast enough. Teams are increasingly using LLMs to make better use of their data, but these unstructured documents were made for humans to parse and are difficult for LLMs to reason over directly.
Today, we’re incredibly proud to share that Reducto is helping hundreds of companies, from startups to Fortune 10 enterprises, turn their most important files into accurate inputs for LLMs. We’ve raised $8.4 million in funding led by First Round to further our mission of making human data LLM-ready.
Reducto started when we were helping enterprises build RAG pipelines. We expected to focus on building great retrieval systems and optimizing inference, but we were constantly bottlenecked by ingestion accuracy.
Simply put, accurately processing complex PDFs and spreadsheets is really hard.
Almost everything on the market works when given simple layouts and perfect file metadata, but those same solutions fail to parse complex documents. Multi-column layouts get jumbled together, figures are ignored, and tables are a consistent nightmare. These errors in inputs lead to inaccurate and hallucinated outputs, which are difficult to detect and prevent.
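To make the multi-column failure concrete, here is a toy Python sketch. It is an illustration only, not how Reducto's models work: the `Block` type, the coordinates, and the fixed two-column split are all assumptions made up for this example. It shows how reading text blocks strictly top to bottom interleaves two columns, while grouping by column first keeps each column's text together.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """A hypothetical text block with the position of its top-left corner."""
    text: str
    x: float  # left edge of the block
    y: float  # top edge of the block (smaller means higher on the page)


# Made-up two-column page: two blocks on the left, two on the right.
blocks = [
    Block("Left 1", 50, 100),
    Block("Right 1", 320, 100),
    Block("Left 2", 50, 130),
    Block("Right 2", 320, 130),
]


def naive_order(blocks):
    """Read top to bottom, then left to right, ignoring columns."""
    return [b.text for b in sorted(blocks, key=lambda b: (b.y, b.x))]


def column_aware_order(blocks, page_width=600.0, columns=2):
    """Assign each block to an equal-width column, then read column by column."""
    col_width = page_width / columns
    return [
        b.text
        for b in sorted(blocks, key=lambda b: (int(b.x // col_width), b.y))
    ]


print(naive_order(blocks))         # columns are interleaved
print(column_aware_order(blocks))  # each column stays contiguous
```

Real documents rarely have evenly spaced columns, which is part of why fixed rules like this one break down on complex layouts.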
Our peers had to spend dozens of hours building and maintaining in-house processing pipelines because off-the-shelf solutions weren't good enough. We built Reducto to be their ingestion team.
Every visual cue in a complex document matters. Spaces between paragraphs capture semantic splits, tabs in lists show nested hierarchies, and tabular structures associate related data. We trained a series of models to capture all of that context, allowing us to turn really complex layouts into ideal inputs for language models.
This approach unlocks up to a 30% improvement in RAG accuracy and helps teams support complex files that fail in traditional pipelines. Most importantly, with Reducto we’re making PDF processing an out-of-sight, out-of-mind problem so our customers can focus on the parts of their product that matter most.
You can see Reducto in action here.
Reducto has quickly become the go-to choice for some of the world’s leading AI teams. That includes startups like Leya Law, who chose Reducto for ingesting legal documents when they needed to improve processing speed and accuracy. That also includes an AI healthcare company serving the largest hospital networks, a Series F company processing documents for US government agencies, a Fortune 10 tech company building RAG and automation tooling, and many more. Today, we’re processing millions of pages daily for our customers, and usage is growing rapidly.
We’re fortunate to work with a group of incredible investors. Our seed round was led by First Round Capital, with participation from funds like Y Combinator, BoxGroup, SV Angel, and Liquid2, alongside founders we admire including Arash Ferdowsi (Dropbox), Andrew Ofstad (Airtable), Kulveer Taggar (Zeus), JJ Fliegelman (WayUp), Richard Aberman (WePay), Ralph Gootee and Tracy Young (PlanGrid), and more.
We take our role as the ingestion team for our customers seriously, and this round will help us serve more teams with an ever-improving product. Reducto started with vision models for documents and expanded to offer state-of-the-art parsing for spreadsheets, and we have a long roadmap ahead as we set out to make unstructured human data LLM-ready.
We’re a small team with outsized impact, and we’re growing rapidly to keep up with customer demand. We’re hiring multiple full-stack and ML engineers, so if you’re passionate about your work, deeply curious, and excited about building the ingestion layer for LLMs, we’d love to meet you!
You can view our open roles here or learn more at reducto.ai.
More from us soon.