PATCH /api/v1/task/{task_id}
curl --request PATCH \
  --url https://api.chunkr.ai/api/v1/task/{task_id} \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: multipart/form-data' \
  --form chunk_processing=null \
  --form expires_in=123 \
  --form high_resolution=true \
  --form ocr_strategy=null \
  --form pipeline=null \
  --form segment_processing=null \
  --form segmentation_strategy=null
{
  "configuration": {
    "chunk_processing": {
      "ignore_headers_and_footers": true,
      "target_length": 512
    },
    "expires_in": 123,
    "high_resolution": true,
    "input_file_url": "<string>",
    "json_schema": "<any>",
    "model": null,
    "ocr_strategy": "All",
    "pipeline": null,
    "segment_processing": {
      "Caption": null,
      "Footnote": null,
      "Formula": null,
      "ListItem": null,
      "Page": null,
      "PageFooter": null,
      "PageHeader": null,
      "Picture": null,
      "SectionHeader": null,
      "Table": null,
      "Text": null,
      "Title": null
    },
    "segmentation_strategy": "LayoutAnalysis",
    "target_chunk_length": 123
  },
  "created_at": "2023-11-07T05:31:56Z",
  "expires_at": "2023-11-07T05:31:56Z",
  "finished_at": "2023-11-07T05:31:56Z",
  "message": "<string>",
  "output": null,
  "started_at": "2023-11-07T05:31:56Z",
  "status": "Starting",
  "task_id": "<string>",
  "task_url": "<string>"
}

Authorizations

Authorization
string
header
required

Path Parameters

task_id
string
required

Body

multipart/form-data
Multipart form request to update a task
chunk_processing
object | null

Controls the setting for the chunking and post-processing of each chunk.

expires_in
integer | null

The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.
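Because expires_in is expressed in seconds, it can be less error-prone to derive it from a duration than to type the number by hand. The following is a minimal sketch; the helper name expires_in_seconds is hypothetical, and rejecting non-positive values is an assumption, since the reference does not state the accepted range.

```python
from datetime import timedelta


def expires_in_seconds(lifetime: timedelta) -> int:
    """Convert a duration to whole seconds for the expires_in form field.

    Hypothetical helper. Rejecting non-positive lifetimes is an assumption;
    the API reference does not document the accepted range.
    """
    seconds = int(lifetime.total_seconds())
    if seconds <= 0:
        raise ValueError("lifetime must be positive")
    return seconds


# One week: 7 days * 86400 seconds per day.
print(expires_in_seconds(timedelta(days=7)))  # 604800
```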

high_resolution
boolean | null

Whether to use high-resolution images for cropping and post-processing. (Latency penalty: ~7 seconds per page)

ocr_strategy
enum<string> | null

Controls the Optical Character Recognition (OCR) strategy.

  • All: Processes all pages with OCR. (Latency penalty: ~0.5 seconds per page)
  • Auto: Selectively applies OCR only to pages with missing or low-quality text. When a text layer is present, the bounding boxes from the text layer are used.
Available options: All, Auto
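The per-page latency penalties quoted for high_resolution (~7 seconds) and for the All OCR strategy (~0.5 seconds) can be turned into a rough estimate for a whole document. The sketch below assumes the two penalties add up independently and scale linearly with page count, which the reference does not state; the function name is hypothetical.

```python
# Approximate per-page penalties quoted in the parameter descriptions.
HIGH_RESOLUTION_PENALTY_S = 7.0  # high_resolution=true
OCR_ALL_PENALTY_S = 0.5          # ocr_strategy=All


def estimated_extra_latency(pages: int, high_resolution: bool, ocr_strategy: str) -> float:
    """Return a rough added latency in seconds (hypothetical helper).

    Assumes the penalties are additive and linear in the number of pages.
    """
    if pages < 0:
        raise ValueError("pages must be non-negative")
    per_page = 0.0
    if high_resolution:
        per_page += HIGH_RESOLUTION_PENALTY_S
    if ocr_strategy == "All":
        per_page += OCR_ALL_PENALTY_S
    return pages * per_page


# 20 pages * (7.0 + 0.5) seconds per page.
print(estimated_extra_latency(20, True, "All"))  # 150.0
```

Under these assumptions, a 20-page document processed with both options adds roughly two and a half minutes, so Auto with high_resolution disabled is the cheaper choice when the text layer is already good.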
pipeline
enum<string> | null

The pipeline to use for processing. If pipeline is set to Azure, Azure layout analysis is used for segmentation and OCR, and the output is unified to the Chunkr output format.

Available options: Azure
segment_processing
object | null

Controls the post-processing of each segment type. Allows you to generate HTML and Markdown from Chunkr models for each segment type. By default, HTML and Markdown are generated manually from the segmentation information for every segment type except Table and Formula. You can optionally configure custom LLM prompts and models to generate an additional llm field with LLM-processed content for each segment type.

segmentation_strategy
enum<string> | null

Controls the segmentation strategy:

  • LayoutAnalysis: Analyzes pages for layout elements (e.g., Table, Picture, Formula, etc.) using bounding boxes. Provides fine-grained segmentation and better chunking. (Latency penalty: ~TBD seconds per page).
  • Page: Treats each page as a single segment. Faster processing, but without layout element detection and only simple chunking.
Available options: LayoutAnalysis, Page
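The body fields above can be validated client-side before the request is built. The following is a minimal sketch using only the Python standard library; it constructs the PATCH request but does not send it. The base URL and header names come from the curl example. Everything else is an assumption: the helper names are hypothetical, object fields are serialized as JSON strings, and fields set to None are left out of the form rather than sent as null.

```python
import json
import urllib.request
import uuid

API_BASE = "https://api.chunkr.ai/api/v1"  # taken from the curl example
ENUM_FIELDS = {
    "ocr_strategy": {"All", "Auto"},
    "pipeline": {"Azure"},
    "segmentation_strategy": {"LayoutAnalysis", "Page"},
}
KNOWN_FIELDS = set(ENUM_FIELDS) | {
    "chunk_processing", "expires_in", "high_resolution", "segment_processing",
}


def build_form_fields(**options):
    """Validate options and return string form fields (hypothetical helper)."""
    fields = {}
    for name, value in options.items():
        if name not in KNOWN_FIELDS:
            raise ValueError(f"unknown field: {name}")
        if value is None:
            continue  # assumption: omit unset fields instead of sending null
        if name in ENUM_FIELDS and value not in ENUM_FIELDS[name]:
            raise ValueError(f"invalid {name}: {value!r}")
        if isinstance(value, bool):
            fields[name] = "true" if value else "false"
        elif isinstance(value, dict):
            fields[name] = json.dumps(value)  # assumption: objects sent as JSON
        else:
            fields[name] = str(value)
    return fields


def encode_multipart(fields):
    """Encode text fields as multipart/form-data; return (body, content_type)."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"', "", value]
    lines += [f"--{boundary}--", ""]
    body = "\r\n".join(lines).encode("utf-8")
    return body, f"multipart/form-data; boundary={boundary}"


def build_update_request(task_id, api_key, **options):
    """Build, but do not send, the PATCH request for a task update."""
    body, content_type = encode_multipart(build_form_fields(**options))
    return urllib.request.Request(
        f"{API_BASE}/task/{task_id}",
        data=body,
        method="PATCH",
        headers={"Authorization": api_key, "Content-Type": content_type},
    )
```

A request built this way could be passed to urllib.request.urlopen; rejecting unknown enum values locally avoids a round trip for a request that is invalid according to the option lists above.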

Response

200
application/json
Detailed information describing the task, its status and processed outputs
configuration
object
required

The configuration used for the task.

created_at
string
required

The date and time when the task was created and queued.

message
string
required

A message describing the task's status or any errors that occurred.

status
enum<string>
required

The status of the task.

Available options: Starting, Processing, Succeeded, Failed, Cancelled
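Client code that polls a task usually needs to know when to stop. The sketch below classifies the status values listed above. Treating Succeeded, Failed, and Cancelled as final and Starting and Processing as in progress is an assumption based on the names alone; the reference does not describe the state transitions. The helper name is hypothetical.

```python
ALL_STATUSES = {"Starting", "Processing", "Succeeded", "Failed", "Cancelled"}

# Assumption: these three states are final; the reference does not say so.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def is_terminal(status: str) -> bool:
    """Return True if the task is assumed to have stopped changing state."""
    if status not in ALL_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return status in TERMINAL_STATUSES


print(is_terminal("Processing"))  # False
print(is_terminal("Succeeded"))   # True
```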
task_id
string
required

The unique identifier for the task.

expires_at
string | null

The date and time when the task will expire.

finished_at
string | null

The date and time when the task was finished.

output
object | null

The processed results of a document analysis task.

started_at
string | null

The date and time when the task was started.

task_url
string | null

The presigned URL of the task.
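The timestamp fields in the response can be parsed to measure how long a task ran. The sketch below assumes the timestamps are UTC strings in the form shown in the example response (for instance 2023-11-07T05:31:56Z); the helper names are hypothetical. The nullable fields started_at and finished_at are handled by returning None.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as 2023-11-07T05:31:56Z, or pass None through."""
    if value is None:
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def processing_seconds(task: dict) -> Optional[float]:
    """Return finished_at minus started_at in seconds, or None if either is unset."""
    started = parse_timestamp(task.get("started_at"))
    finished = parse_timestamp(task.get("finished_at"))
    if started is None or finished is None:
        return None
    return (finished - started).total_seconds()


# Illustrative values only, not real API output.
task = {"started_at": "2023-11-07T05:31:56Z", "finished_at": "2023-11-07T05:32:26Z"}
print(processing_seconds(task))  # 30.0
```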