
My extraction results are inaccurate. How can I improve them?

Accuracy depends on document quality and how clearly you define your schema fields.

Updated May 31, 2026

Extraction accuracy depends on two main factors: the quality of the source document and the clarity of your schema field definitions. Here are the most effective ways to improve results:

Improve document quality

  • Use text-based PDFs rather than scanned images where possible. Their text is embedded in the file instead of being read from an image, so they are extracted more accurately and processed faster

  • If you must use scanned documents, ensure the scan is at least 300 DPI and the text is not tilted or obscured

  • Remove irrelevant pages (blank pages, cover pages with no data) before uploading
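The 300 DPI guideline can be checked with simple arithmetic. Effective resolution is the image width in pixels divided by the paper width in inches, so a US Letter page (8.5 in wide) scanned at 300 DPI is 2550 pixels wide. The Python sketch below assumes you already know the scan's pixel width (most image viewers show it). The helper names and the paper-size table are illustrative only and are not part of the product.

```python
# Illustrative helpers (not a product API): estimate a scan's effective
# resolution from its pixel width and the physical paper width.

PAGE_WIDTHS_IN = {
    "letter": 8.5,        # US Letter is 8.5 x 11 in
    "a4": 210 / 25.4,     # A4 is 210 mm wide, converted to inches
}

MIN_DPI = 300  # minimum recommended resolution for scanned documents


def effective_dpi(pixel_width: int, page_width_in: float) -> int:
    """Return dots per inch along the page width, rounded to the nearest integer.

    Rounding absorbs the fact that pixel counts are whole numbers
    (an A4 page at 300 DPI is usually saved as 2480 px, not 2480.3 px).
    """
    if page_width_in <= 0:
        raise ValueError("page width must be positive")
    return round(pixel_width / page_width_in)


def meets_min_dpi(pixel_width: int, paper: str = "a4", min_dpi: int = MIN_DPI) -> bool:
    """True if a scan of the given paper size reaches the minimum resolution."""
    return effective_dpi(pixel_width, PAGE_WIDTHS_IN[paper]) >= min_dpi


if __name__ == "__main__":
    print(meets_min_dpi(2480, "a4"))      # A4 scanned at 300 DPI
    print(meets_min_dpi(1240, "a4"))      # A4 scanned at 150 DPI
    print(meets_min_dpi(2550, "letter"))  # Letter scanned at 300 DPI
```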

Write clearer schema field definitions

  • Give each field a descriptive name that matches how the data appears in the document (e.g. "invoice_total" rather than "amount")

  • Always include an example value — this is the single most effective way to improve accuracy

  • Add constraints when the format is specific (e.g. "ISO 8601 date format", "must be a positive number in EUR")

  • Choose the correct data type for each field (string, number, date, boolean)
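The field guidelines above can be turned into a small checklist script. This is a hypothetical sketch: `SchemaField` and its attributes stand in for however your schema is actually defined in the product, and the list of generic names is only an example. It flags fields with vague names, missing example values, or unsupported data types.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical representation of one schema field. The product's real
# schema format may differ; these attribute names are illustrative only.
@dataclass
class SchemaField:
    name: str
    data_type: str                    # "string", "number", "date" or "boolean"
    example: Optional[str] = None     # example value as it should be extracted
    constraint: Optional[str] = None  # e.g. "ISO 8601 date format"


ALLOWED_TYPES = {"string", "number", "date", "boolean"}
# Names too generic to tell which value to pick (illustrative, not exhaustive).
GENERIC_NAMES = {"amount", "date", "value", "number", "name", "field"}


def lint_field(field: SchemaField) -> list[str]:
    """Return warnings for a field that does not follow the guidelines."""
    warnings = []
    if field.name.lower() in GENERIC_NAMES:
        warnings.append(f"{field.name}: name is too generic")
    if not field.example:
        warnings.append(f"{field.name}: missing example value")
    if field.data_type not in ALLOWED_TYPES:
        warnings.append(f"{field.name}: unknown data type {field.data_type!r}")
    return warnings


schema = [
    SchemaField("invoice_total", "number", example="1249.50",
                constraint="must be a positive number in EUR"),
    SchemaField("invoice_date", "date", example="2026-05-31",
                constraint="ISO 8601 date format"),
    SchemaField("amount", "number"),  # generic name and no example value
]

if __name__ == "__main__":
    for field in schema:
        for warning in lint_field(field):
            print(warning)
```

On this sample schema only the `amount` field is flagged, once for its generic name and once for its missing example.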

If results are still wrong for specific fields, try rewriting the field name and example to better match the language used in your documents.
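To find out which fields need rewriting, it can help to check a few documents by hand and compare them with the extraction output field by field. The sketch below assumes both are available as Python dictionaries keyed by field name; the function name and sample values are hypothetical. The fields with the lowest scores are the ones whose name and example are worth revisiting first.

```python
from collections import defaultdict


def per_field_accuracy(extracted: list[dict], expected: list[dict]) -> dict[str, float]:
    """Share of documents where each field's extracted value matches the hand-checked value."""
    if len(extracted) != len(expected):
        raise ValueError("need one expected record per extracted record")
    matches = defaultdict(int)
    totals = defaultdict(int)
    for got, want in zip(extracted, expected):
        for field, want_value in want.items():
            totals[field] += 1
            got_value = got.get(field)
            # Compare as trimmed strings; a missing field never counts as a match.
            if got_value is not None and str(got_value).strip() == str(want_value).strip():
                matches[field] += 1
    return {field: matches[field] / totals[field] for field in totals}


# Hypothetical sample: the second document's date came back in the wrong format.
extracted = [
    {"invoice_total": "1249.50", "invoice_date": "2026-05-31"},
    {"invoice_total": "980.00", "invoice_date": "31/05/2026"},
]
expected = [
    {"invoice_total": "1249.50", "invoice_date": "2026-05-31"},
    {"invoice_total": "980.00", "invoice_date": "2026-05-31"},
]

if __name__ == "__main__":
    scores = per_field_accuracy(extracted, expected)
    # Print the weakest fields first.
    for field, score in sorted(scores.items(), key=lambda item: item[1]):
        print(f"{field}: {score:.0%}")
```

A format mismatch like the date above is usually the kind of error a clearer example value or constraint (such as "ISO 8601 date format") helps fix.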
