Split

Document Segmentation, Made Easy

Describe your sections in plain English. Split returns the page ranges for each one.

Try the API free Request a demo

Helping everyone from startups to Fortune 10 enterprises unlock their data.

Split

Find where each section starts and ends

Definition: Split classifies every page of a document against a list of sections you describe in natural language and returns the page numbers each section occupies. Add a partition_key and Split groups repeating sections by an identifier from the document itself.

Who it's for: Teams that need to route different parts of a document to different schemas or separate bundled sub-documents for individual processing.

The problem it solves: Long documents waste time and tokens when sent whole to an LLM. Split finds section boundaries first so downstream steps only process the pages they need.

Split in the platform

How Split connects to the rest of the platform

Endpoint: /parse (Parse)
Use when: Structured content from any document is needed for LLM or RAG use.
Output: Structured chunks with typed blocks, bounding boxes, and confidence scores.
How it works with Split: Returns the content itself, not a section map.

Endpoint: /extract (Extract)
Use when: The fields to pull are defined and typed JSON is needed.
Output: Schema-typed JSON with optional citations on every value.
How it works with Split: Pulls specific fields. Pipe Split page ranges into Extract.

Endpoint: /split (Split)
Use when: One file contains multiple logical documents or sections.
Output: Page ranges for each section, with confidence scores.
How it works with Split: Read the Split docs

Endpoint: /classify (Classify)
Use when: Files need to be routed by type before processing.
Output: Best-matching category with per-criterion confidence.
How it works with Split: Identifies file type. Split maps sections within a file.

Endpoint: /edit (Edit)
Use when: A PDF form needs filling or a DOCX needs updating.
Output: A downloadable edited file, plus a reusable form schema.
How it works with Split: Produces a filled document, not a page map.
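
The hand-off from Split to Extract could look something like the sketch below. It assumes each section comes back as a list of 1-indexed page numbers, which this page does not spell out, and the helper name to_ranges is hypothetical rather than part of the platform. It only shows how page numbers might be collapsed into contiguous ranges before a downstream step.

```python
from typing import Iterable, List, Tuple


def to_ranges(pages: Iterable[int]) -> List[Tuple[int, int]]:
    """Collapse page numbers into sorted, inclusive (start, end) ranges.

    Hypothetical helper: assumes Split reports pages as integers.
    """
    ranges: List[Tuple[int, int]] = []
    for page in sorted(set(pages)):
        if ranges and page == ranges[-1][1] + 1:
            # Page continues the current run, so extend its end.
            ranges[-1] = (ranges[-1][0], page)
        else:
            # First page, or a gap: start a new run.
            ranges.append((page, page))
    return ranges


financials = to_ranges([12, 13, 14, 20, 21, 30])
```

Here financials holds three ranges, one per unbroken run of pages, so a downstream step can be given each run separately.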

Try out Split in Studio or via the API.

Open Studio Request a demo

Deep Split

A verification loop for ambiguous documents

What it is: Deep Split runs an agentic loop that checks section assignments against the source and re-classifies pages until it reaches the quality threshold. Enable it with settings.deep_split: true.

Built for: Longer documents and workflows with many categories. Early testing of the agent harness has included documents with thousands of pages and 150+ categories.

Contextual evidence: Split returns contextual evidence for why a page was assigned to a category, making results easier to trust, debug, and iterate on.
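
As a rough illustration, a request body with Deep Split enabled might be assembled as below. Only settings.deep_split comes from the description above. The input and split_description field names, the function name, and the example URL are assumptions made for this sketch, so check them against the Split reference before relying on them.

```python
from typing import Any, Dict


def deep_split_request(input_ref: str, sections: Dict[str, str]) -> Dict[str, Any]:
    """Build a request body for POST /split with Deep Split turned on.

    Assumed field names: "input" and "split_description".
    """
    return {
        "input": input_ref,
        "split_description": [
            {"name": name, "description": description}
            for name, description in sections.items()
        ],
        # The one setting named in the text above.
        "settings": {"deep_split": True},
    }


body = deep_split_request(
    "https://example.com/annual-report.pdf",  # placeholder URL
    {
        "Financials": "Income statement, balance sheet, and cash flow pages",
        "Risk factors": "Pages that discuss risks to the business",
    },
)
```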

Read how the Deep Split agent works

Where AI teams ship Split

Get only the pages you need

When a downstream step only needs part of a long document, Split finds the right pages first.

Annual reports & 10-Ks

Separate the executive summary, financials, and risk factors so each section gets the right extraction schema.

Combined brokerage statements

One PDF, many accounts. Set partition_key: "account_number" and Split returns one partition per account, with no manual page-counting.

Patient charts & encounter histories

Group pages by patient visit using a partition key, then process each encounter with the right schema.

Mailroom & intake batches

Identify cover letters, policies, and supporting documents inside a single intake PDF so each team gets the right pages.

Long contracts & data rooms

Find the indemnity clause, fee schedule, or assignment language, then pass the page range to Extract.

Reuse one parse for many splits

Parse a 200-page packet once, then run Split with different section descriptions against the same jobid://. No re-uploading, no re-billing for the parse.
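
The "one parse, many splits" pattern can be sketched as a loop that points several section lists at the same jobid:// reference. The job ID shown is a placeholder, and the input and split_description field names are assumptions for illustration. Only the jobid:// prefix comes from this page.

```python
from typing import Any, Dict, List


def requests_for_job(
    job_ref: str, section_sets: List[Dict[str, str]]
) -> List[Dict[str, Any]]:
    """Build one Split request body per section list, all reusing one parse."""
    if not job_ref.startswith("jobid://"):
        raise ValueError("expected a jobid:// reference from a prior Parse")
    return [
        {
            "input": job_ref,  # same cached parse for every request
            "split_description": [
                {"name": name, "description": text}
                for name, text in sections.items()
            ],
        }
        for sections in section_sets
    ]


bodies = requests_for_job(
    "jobid://example-job-id",  # placeholder; a real ID would come from Parse
    [
        {"Cover letter": "Introductory letter", "Policy": "Policy terms pages"},
        {"Signatures": "Pages with signature blocks"},
    ],
)
```

Each body differs only in its section descriptions, which is what lets you iterate on the wording without re-uploading the file.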

Why Split

Why teams switch to Split

01. Describe sections in plain English
You write descriptions, not rules. Split classifies each page against them.

02. Repeating sections, grouped automatically
Set a partition_key and Split returns one partition per identifier (account, patient, claim) read straight off the page.

03. Confidence on every section
Each split returns high or low confidence. Route low-confidence segments to review and auto-process the rest.

04. One parse, many splits
Pass a jobid:// from a prior Parse and Split runs against the cached parse result, saving the Parse credits on each iteration.

05. Tunable for table-heavy documents
Set table_cutoff: "preserve" to send full tables when a partition key sits deep inside a table.

06. Composable with the rest of the platform
Output is a map of section names to page ranges. Pipe them into Parse, Classify, or Extract.
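
The confidence-based routing described in this list might look like the sketch below. It assumes each split is a dictionary with name, pages, and confidence keys, where confidence is the string "high" or "low". The exact key names and the sample data are illustrative assumptions, not output from the API.

```python
from typing import Any, Dict, List, Tuple


def route_by_confidence(
    splits: List[Dict[str, Any]],
) -> Tuple[List[str], List[str]]:
    """Return (auto_process, needs_review) lists of section names."""
    auto: List[str] = []
    review: List[str] = []
    for split in splits:
        # Anything not explicitly "high" is sent to review, to be safe.
        target = auto if split.get("confidence") == "high" else review
        target.append(split["name"])
    return auto, review


# Made-up sample shaped like the splits[] output described on this page.
sample = [
    {"name": "Executive summary", "pages": [1, 2], "confidence": "high"},
    {"name": "Risk factors", "pages": [14, 15, 16], "confidence": "low"},
]
auto, review = route_by_confidence(sample)
```

Treating a missing or unexpected confidence value as "review" is a design choice in this sketch: it errs toward a human check rather than silent auto-processing.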

How Split works

How Split works in four steps

STEP 01: Send file + section list
Upload a file or point at a URL. Describe each section in natural language.
POST /split

STEP 02: Parse runs underneath
OCR, layout detection, and table reconstruction produce structured content for the classifier.
jobid:// available

STEP 03: Classify pages by section
Every page is scored against your descriptions. Partition keys group matched pages by an identifier from the document.
descriptions → pages

STEP 04: You get splits[]
One entry per section with name, pages, and confidence. Feed page ranges into downstream steps.
splits[].pages
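
To make the last step concrete, here is a minimal sketch of turning a splits[] list into a flat map of section names to pages. The name, pages, and confidence keys follow the description above. The partitions key and its shape are assumptions about how partition_key results might be nested, and the sample data is invented for illustration.

```python
from typing import Any, Dict, List


def page_map(splits: List[Dict[str, Any]]) -> Dict[str, List[int]]:
    """Flatten splits[] into {section or section/partition: pages}."""
    result: Dict[str, List[int]] = {}
    for split in splits:
        partitions = split.get("partitions") or []
        if partitions:
            # Assumed shape: one entry per identifier found in the document.
            for part in partitions:
                result[f'{split["name"]}/{part["name"]}'] = part["pages"]
        else:
            result[split["name"]] = split["pages"]
    return result


sample = [
    {"name": "Cover letter", "pages": [1], "confidence": "high"},
    {
        "name": "Account statement",
        "pages": [2, 3, 4, 5],
        "confidence": "high",
        "partitions": [
            {"name": "1001", "pages": [2, 3]},
            {"name": "2002", "pages": [4, 5]},
        ],
    },
]
sections = page_map(sample)
```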

Read the full Split reference in the docs

Built for production

Enterprise-ready from day one

SOC 2 Type II
HIPAA
Zero Data Retention
VPC · On-prem · Air-gapped
EU · AU regional endpoints
99.9%+ uptime SLA
Enterprise support

Visit the Trust Center

Document work starts here

Try Split on your documents

No setup, no credit card.

