Authorizations
Body
JSON request to create a task
The file to be uploaded. Supported inputs:
- ch://files/{file_id}: Reference to an existing file. Upload via the Files API.
- http(s)://...: Remote URL to fetch.
- data:*;base64,... or a raw base64 string.
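The input forms listed above can be told apart by prefix. The following is a minimal sketch of that idea, together with a helper that encodes local bytes as a data: URL. The function names and the default MIME type are illustrative assumptions, not part of the API.

```python
import base64


def to_data_url(data: bytes, mime_type: str = "application/pdf") -> str:
    """Encode raw bytes as data:<mime>;base64,<payload> (helper name is hypothetical)."""
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


def classify_file_input(value: str) -> str:
    """Guess which supported input form a string uses, based only on its prefix."""
    if value.startswith("ch://files/"):
        return "file_reference"
    if value.startswith(("http://", "https://")):
        return "remote_url"
    if value.startswith("data:") and ";base64," in value:
        return "data_url"
    # Anything else is treated as a raw base64 string.
    return "raw_base64"
```

For example, `classify_file_input(to_data_url(b"hi"))` returns `"data_url"`.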
Controls the settings for chunking and for post-processing of each chunk.
Controls how errors are handled during processing:
- Fail: Stops processing and fails the task when any error occurs.
- Continue: Attempts to continue processing despite non-critical errors (e.g., LLM refusals).
Available options: Fail, Continue
The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.
The name of the file to be uploaded. If not set, a name will be generated.
Controls the LLM used for the task.
Controls the Optical Character Recognition (OCR) strategy.
- All: Processes all pages with OCR. (Latency penalty: ~0.5 seconds per page)
- Auto: Selectively applies OCR only to pages with missing or low-quality text. When a text layer is present, the bounding boxes from the text layer are used.
Available options: All, Auto
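To make the stated latency penalty of the All strategy concrete, here is a small back-of-the-envelope sketch. The function name is hypothetical, and the 0.5 seconds per page figure is the approximate value quoted above, not a guarantee.

```python
def estimated_ocr_penalty_seconds(page_count: int, seconds_per_page: float = 0.5) -> float:
    """Rough added latency when every page is sent through OCR."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * seconds_per_page
```

For a 200-page document this estimate comes to roughly 100 seconds of added latency.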
Choose the provider whose models will be used for segmentation and OCR. The output will be unified to the Chunkr output format.
Available options: Azure, Chunkr
Defines how each segment type is handled when generating the final output.
Each segment uses one of three strategies. The chosen strategy controls:
- Whether the segment is kept (Auto, LLM) or skipped (Ignore).
- How the content is produced (rule-based vs. LLM).
- The output format (Html or Markdown).
Optional flags such as image cropping, extended context, and descriptions further refine behaviour.
Default strategy per segment
- Title, SectionHeader, Text, ListItem, Caption, Footnote → Auto (Markdown, description off)
- Table → LLM (HTML, description on)
- Picture → LLM (Markdown, description off, cropping All)
- Formula, Page → LLM (Markdown, description off)
- PageHeader, PageFooter → Ignore (removed from output)
Strategy reference
- Auto – rule-based content generation.
- LLM – generate content with an LLM.
- Ignore – exclude the segment entirely.
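The per-segment defaults listed above can be written down as a lookup table. This is only an illustrative sketch: the class, its field names, and the helper are assumptions chosen for readability and do not reflect the request schema.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class SegmentDefault:
    strategy: str                   # "Auto", "LLM", or "Ignore"
    fmt: Optional[str] = None       # "Markdown" or "Html"; None when ignored
    description: bool = False       # whether a description is generated
    cropping: Optional[str] = None  # image cropping mode, if any


def _expand(names, default):
    """Map several segment types to the same default."""
    return {name: default for name in names}


SEGMENT_DEFAULTS: Dict[str, SegmentDefault] = {
    **_expand(
        ["Title", "SectionHeader", "Text", "ListItem", "Caption", "Footnote"],
        SegmentDefault("Auto", "Markdown"),
    ),
    "Table": SegmentDefault("LLM", "Html", description=True),
    "Picture": SegmentDefault("LLM", "Markdown", cropping="All"),
    **_expand(["Formula", "Page"], SegmentDefault("LLM", "Markdown")),
    **_expand(["PageHeader", "PageFooter"], SegmentDefault("Ignore")),
}


def is_kept(segment_type: str) -> bool:
    """A segment appears in the output unless its strategy is Ignore."""
    return SEGMENT_DEFAULTS[segment_type].strategy != "Ignore"
```

With these defaults, `is_kept("Table")` is true while `is_kept("PageHeader")` is false.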
Controls the segmentation strategy:
- LayoutAnalysis: Analyzes pages for layout elements (e.g., Table, Picture, Formula, etc.) using bounding boxes. Provides fine-grained segmentation and better chunking.
- Page: Treats each page as a single segment. Faster processing, but without layout element detection and only simple chunking.
Available options: LayoutAnalysis, Page
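Putting the body options together, a request could be assembled and checked against the documented option values before it is sent. The sketch below does no network I/O. The JSON key names (`file`, `error_handling`, `ocr_strategy`, `pipeline`, `segmentation_strategy`, `expires_in`) are assumptions, since this section does not state the exact field names, and `build_task_body` is a hypothetical helper.

```python
import json
from typing import Any, Dict

# Option values as documented above; the key names are assumed.
ALLOWED_VALUES = {
    "error_handling": {"Fail", "Continue"},
    "ocr_strategy": {"All", "Auto"},
    "pipeline": {"Azure", "Chunkr"},
    "segmentation_strategy": {"LayoutAnalysis", "Page"},
}


def build_task_body(file: str, **options: Any) -> Dict[str, Any]:
    """Validate enumerated options and return a dict ready for JSON encoding."""
    for name, value in options.items():
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
    return {"file": file, **options}


body = build_task_body(
    "https://example.com/report.pdf",  # placeholder URL
    ocr_strategy="Auto",
    segmentation_strategy="LayoutAnalysis",
    error_handling="Continue",
    expires_in=3600,
)
payload = json.dumps(body)
```

Validating locally catches a misspelled option value before the request is sent.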
Response
Task created successfully.
The date and time when the task was created and queued.
A message describing the task's status or any errors that occurred.
The status of the task.
Available options: Starting, Processing, Succeeded, Failed, Cancelled
The unique identifier for the task.
The date and time when the task will expire.
The date and time when the task was finished.
The processed results of a document analysis task.
The date and time when the task was started.
The presigned URL of the task.
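Because a task moves through the statuses listed above, a client typically polls until the status stops changing. The sketch below assumes that Succeeded, Failed, and Cancelled are the terminal statuses (an inference from their names). It takes a caller-supplied `fetch_status` callable, so it makes no assumption about the endpoint used to read a task.

```python
import time
from typing import Callable

# Assumed terminal statuses; Starting and Processing are treated as in progress.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_task(
    fetch_status: Callable[[], str],
    interval: float = 1.0,
    max_polls: int = 60,
) -> str:
    """Call fetch_status until it returns a terminal status, then return it."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval)
    raise TimeoutError(f"task not finished after {max_polls} polls")
```

Keep the total polling time shorter than the task's expiry, since expired tasks can no longer be polled.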