Authorizations
Body
JSON request to create a parse task
The file to be parsed. Supported inputs:
ch://files/{file_id}
: Reference to an existing file. Upload via the Files API
http(s)://...
: Remote URL to fetch
data:*;base64,...
or raw base64 string
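The input forms listed above can be told apart by their prefix. The following is a minimal sketch, not part of the API itself: the helper names (`file_reference`, `file_data_uri`, `classify_file_input`) are hypothetical, and the default MIME type is an assumption chosen for illustration.

```python
import base64


def file_reference(file_id: str) -> str:
    """Build a reference to a file already uploaded via the Files API."""
    return f"ch://files/{file_id}"


def file_data_uri(content: bytes, mime_type: str = "application/pdf") -> str:
    """Encode raw bytes as a data URI of the form data:<mime>;base64,<payload>."""
    payload = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def classify_file_input(value: str) -> str:
    """Guess which supported input form a string uses, based on its prefix."""
    if value.startswith("ch://files/"):
        return "file_reference"
    if value.startswith(("http://", "https://")):
        return "remote_url"
    if value.startswith("data:"):
        return "data_uri"
    # Anything else is treated as a raw base64 string (not validated here).
    return "raw_base64"
```

A reference or remote URL keeps the request body small, while a data URI or raw base64 string embeds the file content directly in the JSON.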
Controls the settings for chunking and for the post-processing of each chunk.
Controls how errors are handled during processing:
Fail
: Stops processing and fails the task when any error occurs
Continue
: Attempts to continue processing despite non-critical errors (e.g., LLM refusals)
Available options: Fail, Continue
Controls the LLM used for the task.
Controls the Optical Character Recognition (OCR) strategy.
All
: Processes all pages with OCR. (Latency penalty: ~0.5 seconds per page)
Auto
: Selectively applies OCR only to pages with missing or low-quality text. When a text layer is present, the bounding boxes from the text layer are used.
Available options: All, Auto
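As a rough worked example of the latency figure quoted for All, the sketch below multiplies the number of pages sent through OCR by the approximate 0.5 seconds per page. The function name is hypothetical, and applying the same per-page cost to pages that Auto selects for OCR is an assumption, not something the description above states.

```python
SECONDS_PER_OCR_PAGE = 0.5  # approximate per-page penalty quoted for the All strategy


def estimated_ocr_penalty(pages_with_ocr: int) -> float:
    """Estimate the added latency in seconds for the pages that go through OCR."""
    if pages_with_ocr < 0:
        raise ValueError("pages_with_ocr must be non-negative")
    return pages_with_ocr * SECONDS_PER_OCR_PAGE


# With All, every page is processed with OCR: a 120-page document adds about a minute.
print(estimated_ocr_penalty(120))  # 60.0
```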
Available options: Azure, Chunkr
Configuration for how each document segment is processed and formatted.
Each segment has sensible defaults, but you can override specific settings:
format
: Output as Html or Markdown
strategy
: Auto (rule-based), LLM (AI-generated), or Ignore (skip)
crop_image
: Whether to crop images to segment bounds
extended_context
: Use full page as context for LLM processing
description
: Generate descriptions for segments
Defaults per segment type: Check the documentation for more details.
Specify only the fields you want to change; everything else uses the defaults.
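A partial override might be checked on the client before sending, as in the sketch below. It assumes overrides are keyed by segment type name (such as Table or Picture) and that `description` accepts a boolean; both are assumptions for illustration, and `validate_segment_overrides` is a hypothetical helper, not part of the API.

```python
ALLOWED_FORMATS = {"Html", "Markdown"}
ALLOWED_STRATEGIES = {"Auto", "LLM", "Ignore"}
KNOWN_FIELDS = {"format", "strategy", "crop_image", "extended_context", "description"}


def validate_segment_overrides(overrides: dict) -> dict:
    """Check override field names and the two enumerated fields; return the input unchanged."""
    for segment_type, settings in overrides.items():
        unknown = set(settings) - KNOWN_FIELDS
        if unknown:
            raise ValueError(f"{segment_type}: unknown fields {sorted(unknown)}")
        if "format" in settings and settings["format"] not in ALLOWED_FORMATS:
            raise ValueError(f"{segment_type}: invalid format {settings['format']!r}")
        if "strategy" in settings and settings["strategy"] not in ALLOWED_STRATEGIES:
            raise ValueError(f"{segment_type}: invalid strategy {settings['strategy']!r}")
    return overrides


# Only the fields being changed are listed; omitted fields keep their defaults.
overrides = validate_segment_overrides({
    "Table": {"format": "Html", "strategy": "LLM"},
    "Picture": {"description": True},  # boolean type is an assumption
})
```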
Controls the segmentation strategy:
LayoutAnalysis
: Analyzes pages for layout elements (e.g., Table, Picture, Formula) using bounding boxes. Provides fine-grained segmentation and better chunking.
Page
: Treats each page as a single segment. Faster processing, but without layout element detection and with only simple chunking.
Available options: LayoutAnalysis, Page
The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.
The name of the file to be parsed. If not set, a name will be generated.
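Putting the body fields together, a request body could be assembled as in the sketch below. The JSON key names (`file`, `error_handling`, `ocr_strategy`, `segmentation_strategy`, `expires_in`, `file_name`) are assumptions, since this section describes the fields without naming them; check them against the actual schema before use. `build_parse_body` is a hypothetical helper.

```python
import json

ERROR_HANDLING = {"Fail", "Continue"}
OCR_STRATEGY = {"All", "Auto"}
SEGMENTATION_STRATEGY = {"LayoutAnalysis", "Page"}


def build_parse_body(file, *, error_handling=None, ocr_strategy=None,
                     segmentation_strategy=None, expires_in=None, file_name=None):
    """Assemble a parse-task body, leaving out anything that was not set."""
    body = {"file": file}
    enum_fields = [
        ("error_handling", error_handling, ERROR_HANDLING),          # assumed key name
        ("ocr_strategy", ocr_strategy, OCR_STRATEGY),                # assumed key name
        ("segmentation_strategy", segmentation_strategy, SEGMENTATION_STRATEGY),  # assumed
    ]
    for key, value, allowed in enum_fields:
        if value is None:
            continue
        if value not in allowed:
            raise ValueError(f"{key} must be one of {sorted(allowed)}, got {value!r}")
        body[key] = value
    if expires_in is not None:
        if expires_in <= 0:
            raise ValueError("expires_in must be a positive number of seconds")
        body["expires_in"] = expires_in  # assumed key name
    if file_name is not None:
        body["file_name"] = file_name  # assumed key name
    return body


body = build_parse_body(
    "https://example.com/report.pdf",
    error_handling="Continue",
    ocr_strategy="Auto",
    segmentation_strategy="LayoutAnalysis",
    expires_in=3600,
)
print(json.dumps(body, indent=2))
```

Omitting unset fields lets the server apply its own defaults instead of having the client guess them.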
Response
Task created successfully.
The date and time when the task was created and queued.
Information about the input file.
A message describing the task's status or any errors that occurred.
The status of the task.
Available options: Starting, Processing, Succeeded, Failed, Cancelled
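The status values suggest a simple polling loop: keep fetching the task until it leaves Starting or Processing. The sketch below assumes the task is a dictionary with a `status` key and takes the fetch step as a caller-supplied function, so no particular HTTP client or endpoint is implied; `wait_for_task` is a hypothetical helper.

```python
import time
from typing import Callable

# Statuses after which the task will not change any further.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_task(fetch_task: Callable[[], dict], interval: float = 1.0,
                  max_polls: int = 60) -> dict:
    """Call fetch_task until the task reaches a terminal status or polls run out."""
    for _ in range(max_polls):
        task = fetch_task()
        if task["status"] in TERMINAL_STATUSES:  # "status" key name is an assumption
            return task
        time.sleep(interval)
    raise TimeoutError(f"task not finished after {max_polls} polls")
```

Treating Failed and Cancelled as terminal alongside Succeeded keeps the loop from waiting on a task that will never complete.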
The unique identifier for the task.
Available options: Parse, Extract
Version information for the task.
The date and time when the task will expire.
The date and time when the task was finished.
The presigned URL of the input file.
Deprecated: use file_info.url instead.
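Client code that still reads the deprecated field can prefer `file_info.url` and fall back only when it is absent. In this sketch the response is assumed to be a dictionary, and the deprecated key name `input_file_url` is a placeholder, since the field's name is not given in this section.

```python
from typing import Optional


def resolve_input_url(task: dict, deprecated_key: str = "input_file_url") -> Optional[str]:
    """Prefer file_info.url; fall back to the deprecated top-level field if needed."""
    file_info = task.get("file_info") or {}
    # deprecated_key is a placeholder name, not confirmed by the reference above.
    return file_info.get("url") or task.get(deprecated_key)
```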
The processed results of a document parsing task.
The date and time when the task was started.
The presigned URL of the task.