What is document extraction?
Extraction uses AI to read your uploaded files and output clean, structured data (JSON, CSV, XLSX, Markdown, XML, or SQL).
It supports the following file types:
PDF documents (contracts, invoices, reports, forms)
Word documents (.docx)
Excel spreadsheets (.xlsx) and CSV files (.csv)
Images (JPG, PNG), useful for scanned documents
Text files and Markdown
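If you prepare uploads with a script, a quick pre-check can set aside files whose types are not in the list above. This is a minimal local sketch, not part of the product: the helper names `is_supported` and `split_uploads` are hypothetical, and mapping "text files" to `.txt` and Markdown to `.md` is an assumption.

```python
from pathlib import Path

# Extensions taken from the supported-types list above.
# ".txt" for "text files" and ".md" for Markdown are assumptions.
SUPPORTED_EXTENSIONS = {
    ".pdf",           # contracts, invoices, reports, forms
    ".docx",          # Word documents
    ".xlsx", ".csv",  # spreadsheets
    ".jpg", ".png",   # images, e.g. scanned documents
    ".txt", ".md",    # plain text and Markdown
}


def is_supported(filename: str) -> bool:
    """Return True if the file's extension is in the list above (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def split_uploads(filenames: list[str]) -> tuple[list[str], list[str]]:
    """Separate files to upload from files whose type is not listed."""
    accepted = [name for name in filenames if is_supported(name)]
    skipped = [name for name in filenames if not is_supported(name)]
    return accepted, skipped


if __name__ == "__main__":
    ok, rejected = split_uploads(["invoice.PDF", "scan.png", "notes.pages", "data.csv"])
    print("upload:", ok)
    print("skip:", rejected)
```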
How to run an extraction:
Click + New → New Project
Choose Document Extraction
Upload your files (drag and drop, or click to browse)
Choose your output format (JSON recommended for structured data)
Optionally, define a custom schema to tell the AI exactly which fields to extract
Click Extract
The job runs in the background, and a live progress bar shows its status. When the job finishes, the extracted data appears on the project detail page.
Custom schemas:
By default, the AI decides what fields to extract.
To control this:
In the extraction setup, click Define Schema
Add field names and descriptions (e.g. "invoice_number: The invoice reference number at the top of the document")
The AI will populate exactly those fields from each document
Custom schemas are saved as templates you can reuse across projects.
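Conceptually, a custom schema is a mapping from field names to descriptions, and each document should come back with exactly those fields. The sketch below checks a downloaded result against such a mapping. It assumes results arrive as one JSON object per document; the `check_record` helper and the `total_amount` field are illustrative, not part of the product.

```python
import json

# A custom schema as field name -> description, mirroring the
# "Define Schema" example above. "total_amount" is a hypothetical extra field.
INVOICE_SCHEMA = {
    "invoice_number": "The invoice reference number at the top of the document",
    "total_amount": "The grand total due, including tax",
}


def check_record(record: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the fields match the schema."""
    problems = []
    missing = schema.keys() - record.keys()  # fields the schema asked for but are absent
    extra = record.keys() - schema.keys()    # fields present but not in the schema
    for name in sorted(missing):
        problems.append(f"missing field: {name}")
    for name in sorted(extra):
        problems.append(f"unexpected field: {name}")
    return problems


if __name__ == "__main__":
    # Hypothetical extracted output for one document.
    extracted = json.loads('{"invoice_number": "INV-1042", "total_amount": "1,250.00"}')
    print(check_record(extracted, INVOICE_SCHEMA))  # → []
```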
Processing modes:
Standard: the AI processes your documents one at a time (recommended for accuracy)
Fast / Balanced: work is distributed in parallel for faster results on large batches
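The difference between the modes is roughly the difference between a sequential loop and a worker pool. The sketch below is illustrative only: `extract_one` is a hypothetical stand-in that sleeps to simulate work, and it does not reflect how the product actually schedules jobs.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def extract_one(name: str) -> dict:
    """Hypothetical stand-in for extracting one document."""
    time.sleep(0.1)  # simulate processing time
    return {"file": name, "status": "done"}


def run_standard(files: list[str]) -> list[dict]:
    # Standard: one document at a time, in order.
    return [extract_one(f) for f in files]


def run_parallel(files: list[str], workers: int = 4) -> list[dict]:
    # Fast / Balanced: several documents in flight at once.
    # executor.map still returns results in input order.
    with ThreadPoolExecutor(max_workers=workers) as executor:
        return list(executor.map(extract_one, files))


if __name__ == "__main__":
    batch = [f"doc_{i}.pdf" for i in range(8)]
    for runner in (run_standard, run_parallel):
        start = time.perf_counter()
        results = runner(batch)
        elapsed = time.perf_counter() - start
        print(f"{runner.__name__}: {len(results)} docs in {elapsed:.2f}s")
```

Both runners produce the same results; the parallel one simply finishes a large batch sooner in wall-clock time.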
What happens after extraction?
Your extracted data file is stored securely in your project. From the project detail page you can:
Preview the data inline
Download it (JSON, CSV, XLSX, XML, SQL, Markdown)
Use it as input for a report
Run the data through an agent workflow
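If you download JSON and later want a spreadsheet-friendly copy outside the app, the conversion is short. A minimal sketch, assuming the download is a JSON array of flat objects (one per document); the real file's shape may differ, especially if fields are nested. The function name `json_records_to_csv` is hypothetical.

```python
import csv
import io
import json


def json_records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat objects into CSV text with a header row."""
    records = json.loads(json_text)
    # Collect every field name, preserving first-seen order.
    fieldnames: list[str] = []
    for record in records:
        for key in record:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)  # fields missing from a record become empty cells
    return buffer.getvalue()


if __name__ == "__main__":
    sample = '[{"invoice_number": "INV-1", "total": 10}, {"invoice_number": "INV-2"}]'
    print(json_records_to_csv(sample))
```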