Structured Extraction
Extract structured data from documents
Chunkr can extract specific data fields from documents according to a JSON schema that you define, converting unstructured document content into structured data.
JSON Schema
When creating a task, you can provide a JSON schema that defines the structure of data you want to extract. Here is an example of how to set up a structured extraction task:
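As a minimal sketch, a schema object following the structure described in this section (a title, a type, and a list of properties) might look like the one below. The schema title `Invoice`, the field names `invoice_number` and `line_items`, and the top-level type value `object` are illustrative assumptions. How the schema is attached to a task depends on the Chunkr client or endpoint you use, so the sketch only builds and serializes the object.

```typescript
// Hypothetical schema for an invoice document; all field names are illustrative.
const jsonSchema = {
  title: "Invoice",
  type: "object", // assumed value for the top-level type
  properties: [
    {
      name: "invoice_number",
      title: "Invoice Number",
      type: "string",
      description: "The unique identifier printed on the invoice",
      default: "",
    },
    {
      name: "line_items",
      title: "Line Items",
      type: "list",
      description: "The individual items billed on the invoice",
    },
  ],
};

// Serialize the schema under a json_schema key, ready to be included
// in whatever task configuration your client sends.
const payload = JSON.stringify({ json_schema: jsonSchema }, null, 2);
console.log(payload);
```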
JSON Schema Structure
The json_schema defines the structure of the data to be extracted. It consists of a title, a type, and a list of properties. Each property represents a specific field to extract from the document.
TypeScript Interface Representation
Below are the TypeScript interfaces that model the JSON schema:
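The interfaces can be sketched from the fields described in this section. Which fields are optional and the type of `default` are assumptions here, not something this page confirms, so treat the sketch as an approximation of the schema model rather than the official definition.

```typescript
// One field to extract from the document.
interface Property {
  name: string;         // identifier for the field in the extracted data
  title?: string;       // human-readable title (assumed optional)
  type: string;         // data type of the field, e.g. "string" or "list"
  description?: string; // what the field represents (assumed optional)
  default?: string;     // fallback if no data is extracted (assumed optional string)
}

// The schema as a whole: a title, a type, and a list of properties.
interface JsonSchema {
  title: string;
  type: string;
  properties: Property[];
}

// Example value that satisfies the interfaces; the field names are illustrative.
const exampleSchema: JsonSchema = {
  title: "Invoice",
  type: "object", // assumed value for the top-level type
  properties: [
    { name: "invoice_number", type: "string", description: "Invoice identifier" },
    { name: "line_items", type: "list", title: "Line Items" },
  ],
};
console.log(exampleSchema.properties.map((p) => p.name));
```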
Property Fields Explanation
- name: The identifier for the field in the extracted data.
- title: A human-readable title for the field.
- type: The data type of the field (e.g., string, list).
- description: A description of what the field represents.
- default: The default value for the field if no data is extracted.
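To illustrate what the default field means, the sketch below fills in missing values on the client side. It is not how Chunkr applies defaults internally. `FieldSpec`, `withDefaults`, and the sample field names are hypothetical, and the choice of `null` for a field with no value and no default is also an assumption.

```typescript
// Only the parts of a property that matter for default handling.
interface FieldSpec {
  name: string;
  default?: string;
}

// Return a record containing every declared field. A field with no extracted
// value falls back to its declared default, or to null if it has none.
function withDefaults(
  fields: FieldSpec[],
  extracted: Record<string, string | undefined>,
): Record<string, string | null> {
  const result: Record<string, string | null> = {};
  for (const field of fields) {
    result[field.name] = extracted[field.name] ?? field.default ?? null;
  }
  return result;
}

// Illustrative usage with made-up field names and values.
const fields: FieldSpec[] = [
  { name: "invoice_number" },
  { name: "currency", default: "USD" },
  { name: "notes" },
];
const filled = withDefaults(fields, { invoice_number: "INV-001" });
console.log(filled);
```

Here `invoice_number` keeps its extracted value, `currency` falls back to its declared default, and `notes` ends up as `null` because it has neither.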
Interpreting the Response
Once the task is completed, the response will include the extracted data in a structured format. Here is an example of the output:
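The exact response format is not reproduced here, so the sketch below assumes a hypothetical shape in which each extracted field carries a name, a type, and a value. It shows one way to collapse such a list into a lookup keyed by field name. `ExtractedField`, `toRecord`, and the sample data are all made up for illustration. Check the actual task response before relying on any of these names.

```typescript
// Hypothetical shape of one extracted field; the real response may differ.
interface ExtractedField {
  name: string;
  type: string;
  value: unknown;
}

// Collapse a list of extracted fields into a name -> value lookup.
function toRecord(fields: ExtractedField[]): Record<string, unknown> {
  const record: Record<string, unknown> = {};
  for (const field of fields) {
    record[field.name] = field.value;
  }
  return record;
}

// Made-up sample data, for illustration only.
const sample: ExtractedField[] = [
  { name: "invoice_number", type: "string", value: "INV-001" },
  { name: "line_items", type: "list", value: ["Widget", "Gadget"] },
];
const data = toRecord(sample);
console.log(data["invoice_number"]);
```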