Extraction accuracy depends on two main factors: the quality of the source document and the clarity of your schema field definitions. Here are the most effective ways to improve results:
Improve document quality
Use text-based PDFs rather than scanned images where possible: they are extracted more accurately and processed faster
If you must use scanned documents, ensure the scan is at least 300 DPI and the text is not tilted or obscured
Remove irrelevant pages (blank pages, cover pages with no data) before uploading
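The 300 DPI guideline above can be checked with simple arithmetic when you know a scan's pixel width and the physical page size. The sketch below is illustrative only: the function names and the US Letter page width are assumptions chosen for the example, not part of the product.

```python
# Hypothetical helpers for estimating scan resolution.
LETTER_WIDTH_INCHES = 8.5  # US Letter page width; adjust for other page sizes


def effective_dpi(pixel_width, page_width_inches):
    """Approximate scan resolution: image pixels per inch of physical page."""
    return pixel_width / page_width_inches


def meets_minimum(pixel_width, page_width_inches, minimum_dpi=300):
    """Return True if the scan reaches the recommended minimum resolution."""
    return effective_dpi(pixel_width, page_width_inches) >= minimum_dpi


# A Letter page scanned at 2550 pixels wide passes; 1275 pixels wide does not.
print(meets_minimum(2550, LETTER_WIDTH_INCHES))
print(meets_minimum(1275, LETTER_WIDTH_INCHES))
```

A scan that falls short is usually better rescanned at a higher setting than upscaled, since upscaling adds pixels without adding detail.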
Write clearer schema field definitions
Give each field a descriptive name that matches how the data appears in the document (e.g. "invoice_total" rather than "amount")
Always include an example value — this is the single most effective way to improve accuracy
Add constraints when the format is specific (e.g. "ISO 8601 date format", "must be a positive number in EUR")
Choose the correct data type for each field (string, number, date, boolean)
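As a rough illustration of the guidelines above, the sketch below writes two well-defined fields and one weak one, then flags definitions that lack an example value or use an unsupported type. The key names ("name", "type", "example", "constraint"), the sample values, and the lint_field helper are assumptions made for this example; they are not the product's actual schema format.

```python
# Hypothetical field definitions; key names and values are illustrative only.
ALLOWED_TYPES = {"string", "number", "date", "boolean"}

fields = [
    {
        "name": "invoice_total",
        "type": "number",
        "example": 1249.50,
        "constraint": "must be a positive number in EUR",
    },
    {
        "name": "invoice_date",
        "type": "date",
        "example": "2024-03-15",
        "constraint": "ISO 8601 date format",
    },
    # Unsupported type and no example value: both are flagged below.
    {"name": "amount", "type": "text"},
]


def lint_field(field):
    """Return a list of problems found in one field definition."""
    problems = []
    if field.get("type") not in ALLOWED_TYPES:
        problems.append("type must be one of: " + ", ".join(sorted(ALLOWED_TYPES)))
    if field.get("example") in (None, ""):
        problems.append("missing example value")
    return problems


for field in fields:
    for problem in lint_field(field):
        print(f"{field['name']}: {problem}")
```

A check like this only catches missing pieces. Whether a name such as "invoice_total" matches the wording in your documents still needs a manual review.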
If results are still wrong for specific fields, try rewriting the field name and example to better match the language used in your documents.