The Data Extraction node is the most commonly used node in Lymnus agents. It instructs the AI to read one or more documents and extract specific structured fields from them.
Configuring schema fields
A schema field is a data point you want to extract from each document. For example, if you are processing invoices, your schema might include:
invoice_number (string)
vendor_name (string)
invoice_date (date)
total_amount (number)
currency (string)
vat_amount (number)
For each field, you can specify:
Name — the field identifier used in the output
Type — string, number, date, boolean, or array
Example — a sample value that helps the AI understand the expected format
Constraints — any validation rules (e.g. "must be a positive number", "ISO 8601 date format")
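Schema fields are configured in the Lymnus interface, and this page does not describe any underlying file format. As a rough mental model only, the sketch below represents the invoice schema above as Python objects. Each field carries a name, type, example, and optional constraint, and the sketch checks an extracted value against them. The `SchemaField` class, the `check_value` and `is_iso_date` helpers, the example values, and the use of Python predicates for constraints are all hypothetical illustrations, not the Lymnus API.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any, Callable, Optional

# Hypothetical mapping from the documented field types to Python types.
PY_TYPES: dict[str, tuple[type, ...]] = {
    "string": (str,),
    "number": (int, float),
    "date": (str,),      # assumption: dates arrive as ISO 8601 strings
    "boolean": (bool,),
    "array": (list,),
}

@dataclass
class SchemaField:
    """Hypothetical model of one schema field (not a Lymnus class)."""
    name: str
    type: str
    example: Any = None
    # Constraint as a predicate that returns True when the value is valid
    constraint: Optional[Callable[[Any], bool]] = None

def is_iso_date(value: str) -> bool:
    """Return True if value parses as an ISO 8601 calendar date (YYYY-MM-DD)."""
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False

def check_value(field: SchemaField, value: Any) -> bool:
    """Return True if value matches the field's type and constraint."""
    if value is None:
        return True  # missing values are null and are handled separately
    # bool is a subclass of int in Python, so keep it out of "number"
    if field.type == "number" and isinstance(value, bool):
        return False
    if not isinstance(value, PY_TYPES[field.type]):
        return False
    if field.type == "date" and not is_iso_date(value):
        return False
    return field.constraint(value) if field.constraint else True

# The invoice schema from this page; example values are made up.
INVOICE_SCHEMA = [
    SchemaField("invoice_number", "string", example="INV-1042"),
    SchemaField("vendor_name", "string", example="Acme Ltd"),
    SchemaField("invoice_date", "date", example="2024-03-31"),
    SchemaField("total_amount", "number", example=1200.50,
                constraint=lambda v: v > 0),  # "must be a positive number"
    SchemaField("currency", "string", example="EUR"),
    SchemaField("vat_amount", "number", example=200.08),
]

fields = {f.name: f for f in INVOICE_SCHEMA}
print(check_value(fields["total_amount"], -5))            # False
print(check_value(fields["invoice_date"], "2024-03-31"))  # True
```

A value like "31/03/2024" fails the date check here because it is not in ISO 8601 form. This mirrors how a format constraint narrows what counts as a valid value.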
Document types
Selecting a Document Type gives the AI additional context about what it is reading. Available types include invoices, contracts, purchase orders, receipts, financial statements, and custom. Choosing the type that matches your documents generally improves extraction accuracy.
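The document types listed above form a fixed set of choices. Purely as an illustration, the sketch below models that set as a Python `Enum` and attaches it to a node configuration. The `DocumentType` enum, its string values, the `ExtractionNodeConfig` class, and the `parse_document_type` helper are assumptions made for this example, not Lymnus identifiers.

```python
from dataclasses import dataclass, field
from enum import Enum

class DocumentType(Enum):
    """The document types named on this page (hypothetical identifiers)."""
    INVOICE = "invoice"
    CONTRACT = "contract"
    PURCHASE_ORDER = "purchase_order"
    RECEIPT = "receipt"
    FINANCIAL_STATEMENT = "financial_statement"
    CUSTOM = "custom"

@dataclass
class ExtractionNodeConfig:
    """Hypothetical configuration for a Data Extraction node."""
    document_type: DocumentType
    fields: list[str] = field(default_factory=list)

def parse_document_type(raw: str) -> DocumentType:
    """Convert user input to a DocumentType, rejecting unknown values."""
    try:
        return DocumentType(raw.strip().lower())
    except ValueError:
        allowed = ", ".join(t.value for t in DocumentType)
        raise ValueError(
            f"Unknown document type {raw!r}; expected one of: {allowed}"
        ) from None

config = ExtractionNodeConfig(
    document_type=parse_document_type("Invoice"),
    fields=["invoice_number", "vendor_name", "total_amount"],
)
print(config.document_type)  # DocumentType.INVOICE
```

Using a closed set rather than free text means a misspelled type is caught when the configuration is built, not partway through a run.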
Handling missing data
If a field cannot be found in a document, Lymnus returns a null value for that field rather than guessing. You can configure extraction nodes to flag rows with missing required fields so you can review them manually.
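In Lymnus the flagging of missing required fields is configured on the node itself. To make the rule concrete, the sketch below applies the same logic to extracted records after they are loaded from JSON, where a null value becomes Python `None`. The `REQUIRED_FIELDS` set, the `flag_missing` function, the `_missing_required` and `_needs_review` keys, and the sample records are hypothetical.

```python
import json

# Hypothetical: the schema fields a user has marked as required
REQUIRED_FIELDS = {"invoice_number", "vendor_name", "total_amount"}

def flag_missing(records: list[dict], required: set[str]) -> list[dict]:
    """Annotate each record with the required fields that are null or absent."""
    flagged = []
    for record in records:
        missing = sorted(name for name in required if record.get(name) is None)
        flagged.append({**record,
                        "_missing_required": missing,
                        "_needs_review": bool(missing)})
    return flagged

# Made-up extraction output: the second invoice has no readable vendor name,
# and the first has no VAT amount (which is not a required field).
raw = """[
  {"invoice_number": "INV-1042", "vendor_name": "Acme Ltd", "total_amount": 1200.5, "vat_amount": null},
  {"invoice_number": "INV-1043", "vendor_name": null, "total_amount": 80.0, "vat_amount": 13.33}
]"""

for row in flag_missing(json.loads(raw), REQUIRED_FIELDS):
    print(row["invoice_number"], row["_needs_review"], row["_missing_required"])
# INV-1042 False []
# INV-1043 True ['vendor_name']
```

Only required fields trigger a review flag; a null in an optional field such as vat_amount is passed through unchanged.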
Output format
Extraction results are passed to the next nodes in the agent as JSON. Downstream nodes such as Export App can convert this JSON to CSV or Excel, or send it directly to connected apps.
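Export App performs this conversion for you. To show what the JSON-to-CSV step involves, the sketch below flattens a list of extracted records into CSV using Python's standard `json` and `csv` modules. It assumes the output is a JSON array of flat objects, one per document; the exact shape of the Lymnus JSON is not specified on this page, and `array` fields would need extra handling. The `records_to_csv` function and the sample data are hypothetical.

```python
import csv
import io
import json

def records_to_csv(json_text: str) -> str:
    """Convert a JSON array of flat extraction records into CSV text.

    Assumes each element maps field names to scalar values.
    JSON null (Python None) is written as an empty cell.
    """
    records = json.loads(json_text)
    # Collect columns in first-seen order across all records
    columns: list[str] = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for record in records:
        writer.writerow(record)  # keys missing from a record become empty cells
    return buffer.getvalue()

sample = (
    '[{"invoice_number": "INV-1042", "total_amount": 1200.5, "currency": "EUR"},'
    ' {"invoice_number": "INV-1043", "total_amount": null, "currency": "USD"}]'
)
print(records_to_csv(sample))
# invoice_number,total_amount,currency
# INV-1042,1200.5,EUR
# INV-1043,,USD
```

Writing nulls as empty cells keeps the "no guessing" behaviour visible in the spreadsheet: a blank cell means the field was not found.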