Authorizations
Path Parameters
Body
JSON request to update a task.
Controls the settings for chunking and for the post-processing of each chunk.
Controls how errors are handled during processing:
- Fail: Stops processing and fails the task when any error occurs.
- Continue: Attempts to continue processing despite non-critical errors (e.g., LLM refusals).
Available options: Fail, Continue
The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.
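A minimal sketch of the expiry arithmetic, assuming the countdown starts when the task is created (this reference does not say whether an update resets it). The helper names `expiry_time` and `is_expired` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def expiry_time(created_at: datetime, expires_in_seconds: int) -> datetime:
    """Return the moment the task is deleted.

    Assumption: the countdown starts at task creation.
    """
    return created_at + timedelta(seconds=expires_in_seconds)


def is_expired(created_at: datetime, expires_in_seconds: int, now: datetime) -> bool:
    """An expired task can no longer be updated, polled, or viewed."""
    return now >= expiry_time(created_at, expires_in_seconds)


created = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(expiry_time(created, 3600))  # 2024-01-01 13:00:00+00:00
print(is_expired(created, 3600, created + timedelta(minutes=30)))  # False
```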
Whether to use high-resolution images for cropping and post-processing. (Latency penalty: ~7 seconds per page)
Controls the LLM used for the task.
Controls the Optical Character Recognition (OCR) strategy.
- All: Processes all pages with OCR. (Latency penalty: ~0.5 seconds per page)
- Auto: Selectively applies OCR only to pages with missing or low-quality text. When a text layer is present, its bounding boxes are used.
Available options: All, Auto
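The per-page latency figures quoted for high-resolution images and for the `All` OCR strategy can be combined into a rough estimate. This sketch assumes the penalties add up linearly per page, which the reference does not state (pages may well be processed in parallel), and counts `Auto` as adding nothing because no figure is given for it. The function name is hypothetical.

```python
def extra_latency_seconds(pages: int, high_resolution: bool, ocr_strategy: str) -> float:
    """Rough estimate of the latency added by the options above.

    Per-page figures from this reference: ~7 s for high-resolution images,
    ~0.5 s for OCR strategy "All". "Auto" has no quoted penalty, so it
    contributes 0 here (an assumption).
    """
    if ocr_strategy not in ("All", "Auto"):
        raise ValueError(f"unknown OCR strategy: {ocr_strategy!r}")
    per_page = 0.0
    if high_resolution:
        per_page += 7.0
    if ocr_strategy == "All":
        per_page += 0.5
    return pages * per_page


print(extra_latency_seconds(10, high_resolution=True, ocr_strategy="All"))  # 75.0
```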
Choose the provider whose models will be used for segmentation and OCR. The output is unified into the Chunkr output format.
Available options: Azure, Chunkr
Defines how each segment type is handled when generating the final output.
Each segment uses one of three strategies. The chosen strategy controls:
- Whether the segment is kept (Auto, LLM) or skipped (Ignore).
- How the content is produced (rule-based vs. LLM).
- The output format (Html or Markdown).
Optional flags such as image cropping, extended context, and descriptions further refine behaviour.
Default strategy per segment
- Title, SectionHeader, Text, ListItem, Caption, Footnote → Auto (Markdown, description off)
- Table → LLM (HTML, description on)
- Picture → LLM (Markdown, description off, cropping All)
- Formula, Page → LLM (Markdown, description off)
- PageHeader, PageFooter → Ignore (removed from output)
Strategy reference
- Auto – rule-based content generation.
- LLM – generate content with an LLM.
- Ignore – exclude the segment entirely.
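The default strategies listed above can be encoded as plain data, which makes it easy to check which segment types survive a given set of overrides. This is a local model of the documented defaults only; the class, field, and function names are hypothetical and are not the request's actual JSON schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SegmentDefault:
    strategy: str                  # "Auto", "LLM", or "Ignore"
    output_format: Optional[str]   # "Markdown" or "Html"; None when ignored
    description: Optional[bool]    # None when ignored (not applicable)
    cropping: Optional[str] = None  # only Picture lists a cropping default


_AUTO_MD = SegmentDefault("Auto", "Markdown", False)
_LLM_MD = SegmentDefault("LLM", "Markdown", False)
_IGNORE = SegmentDefault("Ignore", None, None)

# Defaults copied from the "Default strategy per segment" list.
DEFAULTS = {
    "Title": _AUTO_MD, "SectionHeader": _AUTO_MD, "Text": _AUTO_MD,
    "ListItem": _AUTO_MD, "Caption": _AUTO_MD, "Footnote": _AUTO_MD,
    "Table": SegmentDefault("LLM", "Html", True),
    "Picture": SegmentDefault("LLM", "Markdown", False, cropping="All"),
    "Formula": _LLM_MD, "Page": _LLM_MD,
    "PageHeader": _IGNORE, "PageFooter": _IGNORE,
}


def kept_segment_types(overrides: Optional[dict] = None) -> list:
    """Return segment types kept in the output (strategy Auto or LLM).

    `overrides` maps a segment type to a replacement strategy name.
    """
    strategies = {name: d.strategy for name, d in DEFAULTS.items()}
    strategies.update(overrides or {})
    return sorted(name for name, s in strategies.items() if s != "Ignore")


print("PageFooter" in kept_segment_types())  # False
```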
Controls the segmentation strategy:
- LayoutAnalysis: Analyzes pages for layout elements (e.g., Table, Picture, Formula) using bounding boxes. Provides fine-grained segmentation and better chunking.
- Page: Treats each page as a single segment. Processing is faster, but there is no layout element detection and only simple chunking is applied.
Available options: LayoutAnalysis, Page
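Putting the body options together, the sketch below builds, but does not send, an update request using only the Python standard library. The JSON key names (`error_handling`, `ocr_strategy`, `pipeline`, `segmentation_strategy`, `expires_in`), the URL layout, the `PATCH` method, and the plain `Authorization` header are all assumptions for illustration; check them against the actual API before use. Only the allowed values come from this reference.

```python
import json
import urllib.request

# Allowed values, copied from the option lists in this reference.
# The key names are assumptions.
ALLOWED = {
    "error_handling": {"Fail", "Continue"},
    "ocr_strategy": {"All", "Auto"},
    "pipeline": {"Azure", "Chunkr"},
    "segmentation_strategy": {"LayoutAnalysis", "Page"},
}


def build_update_request(base_url: str, task_id: str, api_key: str, **fields) -> urllib.request.Request:
    """Build (but do not send) an update request for an existing task.

    Assumed, not confirmed here: endpoint {base_url}/task/{task_id},
    method PATCH, and the API key sent as a plain Authorization header.
    """
    for key, value in fields.items():
        allowed = ALLOWED.get(key)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{key}={value!r}; expected one of {sorted(allowed)}")
    body = json.dumps(fields).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/task/{task_id}",
        data=body,
        method="PATCH",
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )


req = build_update_request(
    "https://api.example.com",  # placeholder, not the real endpoint
    "my-task-id",
    "YOUR_API_KEY",
    error_handling="Continue",
    ocr_strategy="Auto",
    segmentation_strategy="LayoutAnalysis",
    expires_in=86400,
)
print(req.get_method(), req.full_url)  # PATCH https://api.example.com/task/my-task-id
```

Validating option values on the client catches typos such as `Layoutanalysis` before a round trip; passing the request to `urllib.request.urlopen` would send it.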
Response
Task updated and re-queued for processing.
The date and time when the task was created and queued.
A message describing the task's status or any errors that occurred.
The status of the task.
Available options: Starting, Processing, Succeeded, Failed, Cancelled
The unique identifier for the task.
The date and time when the task will expire.
The date and time when the task was finished.
The processed results of a document analysis task.
The date and time when the task was started.
The presigned URL of the task.
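A sketch of reading the response, assuming the JSON keys are `task_id`, `status`, `message`, and `task_url` (the field names are not shown in this reference). Treating `Succeeded`, `Failed`, and `Cancelled` as final states, so that a client can stop polling, is also an assumption. `TaskSummary` and `parse_task` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

# Status values from this reference; the terminal split is an assumption.
TERMINAL = {"Succeeded", "Failed", "Cancelled"}
ALL_STATUSES = {"Starting", "Processing"} | TERMINAL


@dataclass
class TaskSummary:
    task_id: str
    status: str
    message: str
    task_url: Optional[str]

    @property
    def done(self) -> bool:
        """True once the task is in a state it is not expected to leave."""
        return self.status in TERMINAL


def parse_task(payload: dict) -> TaskSummary:
    """Pick the assumed fields out of a decoded JSON response."""
    status = payload["status"]
    if status not in ALL_STATUSES:
        raise ValueError(f"unexpected status {status!r}")
    return TaskSummary(
        task_id=payload["task_id"],
        status=status,
        message=payload.get("message", ""),
        task_url=payload.get("task_url"),
    )


# A made-up payload for illustration.
summary = parse_task({"task_id": "abc", "status": "Processing"})
print(summary.done)  # False
```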