Data Extraction Agents

How to configure the Data Extraction node to pull specific fields from documents.

Updated 1 day ago 3 min read 9 views

The Data Extraction node is the most commonly used node in Lymnus agents. It instructs the AI to read one or more documents and extract specific structured fields from them.

Configuring schema fields

A schema field is a data point you want to extract from each document. For example, if you are processing invoices, your schema might include:

invoice_number (string)
vendor_name (string)
invoice_date (date)
total_amount (number)
currency (string)
vat_amount (number)

For each field, you can specify:

Name — the field identifier used in the output
Type — string, number, date, boolean, or array
Example — a sample value that helps the AI understand the expected format
Constraints — any validation rules (e.g. "must be a positive number", "ISO 8601 date format")
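
As an illustration, the invoice schema above can be written down as plain data. This is a minimal sketch in Python, not the Lymnus configuration format: the `SchemaField` class, its attribute names, and the sample values are all hypothetical and simply mirror the four properties listed above.

```python
from dataclasses import dataclass
from typing import Optional

# The five field types named in this article.
ALLOWED_TYPES = {"string", "number", "date", "boolean", "array"}


@dataclass
class SchemaField:
    """Hypothetical in-memory description of one schema field."""
    name: str                          # field identifier used in the output
    type: str                          # must be one of ALLOWED_TYPES
    example: Optional[str] = None      # sample value showing the expected format
    constraints: Optional[str] = None  # validation rule in plain language

    def __post_init__(self) -> None:
        # Reject types the article does not list.
        if self.type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported field type: {self.type!r}")


# Sample values below are made up for illustration only.
invoice_schema = [
    SchemaField("invoice_number", "string", example="INV-0042"),
    SchemaField("vendor_name", "string", example="Acme Ltd"),
    SchemaField("invoice_date", "date", example="2024-03-15",
                constraints="ISO 8601 date format"),
    SchemaField("total_amount", "number", example="1250.00",
                constraints="must be a positive number"),
    SchemaField("currency", "string", example="EUR"),
    SchemaField("vat_amount", "number", example="250.00"),
]
```

Writing the schema out this way makes it easy to check that every field has a supported type and that examples match their stated constraints before you enter them in the node.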

Document types

Selecting a Document Type gives the AI additional context about what it is reading. Available types include invoices, contracts, purchase orders, receipts, financial statements, and custom. Choosing the correct type generally improves accuracy.

Handling missing data

If a field cannot be found in a document, Lymnus returns a null value for that field rather than guessing. You can configure extraction nodes to flag rows with missing required fields so you can review them manually.
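
If you post-process extraction output yourself, the same check can be sketched in a few lines of Python. The row layout, the `missing_required` helper, and the sample values are assumptions made for illustration. The only behavior taken from this article is that a field that cannot be found comes back as null.

```python
import json
from typing import Any, Dict, List

# Hypothetical extraction output: one JSON object per document.
raw_output = """[
  {"invoice_number": "INV-001", "total_amount": 120.0, "vat_amount": null},
  {"invoice_number": null, "total_amount": 75.5, "vat_amount": 15.1}
]"""


def missing_required(row: Dict[str, Any], required: List[str]) -> List[str]:
    """Return the required field names whose value is null or absent."""
    return [name for name in required if row.get(name) is None]


rows = json.loads(raw_output)  # JSON null becomes Python None
required_fields = ["invoice_number", "total_amount"]

# Keep only the rows that need manual review.
flagged = [row for row in rows if missing_required(row, required_fields)]
```

Here the first row passes because `vat_amount` is not required, while the second row is flagged because `invoice_number` is null.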

Output format

Extraction results are passed to the next node as JSON. Downstream nodes (like Export App) can convert the JSON to CSV or Excel, or send it directly to connected apps.
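
To show what a JSON-to-CSV conversion involves, here is a minimal sketch using only the Python standard library. It is not how Export App is implemented. The payload shape (a list with one object per document) and the sample values are assumptions, since this article only states that results are JSON.

```python
import csv
import io
import json

# Hypothetical extraction payload; the real shape may differ.
extraction_json = """[
  {"invoice_number": "INV-001", "vendor_name": "Acme Ltd", "total_amount": 120.0, "currency": "EUR"},
  {"invoice_number": "INV-002", "vendor_name": null, "total_amount": 75.5, "currency": "GBP"}
]"""


def json_to_csv(payload: str) -> str:
    """Convert a JSON array of flat objects into CSV text."""
    rows = json.loads(payload)
    # Use the keys of the first row as the column order.
    fieldnames = list(rows[0].keys()) if rows else []
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)  # None (JSON null) is written as an empty cell
    return buffer.getvalue()


csv_text = json_to_csv(extraction_json)
```

Null fields become empty cells in the CSV, so missing values stay visibly blank rather than being filled with a placeholder.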
