POST /tasks/parse
Create Task
curl --request POST \
  --url https://api.chunkr.ai/api/v1/tasks/parse \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "chunk_processing": null,
  "error_handling": null,
  "expires_in": 123,
  "file": "<string>",
  "file_name": "<string>",
  "llm_processing": null,
  "ocr_strategy": null,
  "pipeline": null,
  "segment_processing": null,
  "segmentation_strategy": null
}'
{
  "configuration": {
    "chunk_processing": {
      "ignore_headers_and_footers": true,
      "target_length": 4096,
      "tokenizer": {
        "Enum": "Word"
      }
    },
    "client_version": null,
    "error_handling": "Fail",
    "expires_in": 123,
    "high_resolution": true,
    "input_file_url": "<string>",
    "llm_processing": {
      "fallback_strategy": "None",
      "llm_model_id": "<string>",
      "max_completion_tokens": 1,
      "temperature": 123
    },
    "ocr_strategy": "All",
    "pipeline": null,
    "segment_processing": {
      "Caption": null,
      "Footnote": null,
      "Formula": null,
      "ListItem": null,
      "Page": null,
      "PageFooter": null,
      "PageHeader": null,
      "Picture": null,
      "SectionHeader": null,
      "Table": null,
      "Text": null,
      "Title": null
    },
    "segmentation_strategy": "LayoutAnalysis",
    "target_chunk_length": 1
  },
  "created_at": "2023-11-07T05:31:56Z",
  "expires_at": "2023-11-07T05:31:56Z",
  "finished_at": "2023-11-07T05:31:56Z",
  "message": "<string>",
  "output": null,
  "started_at": "2023-11-07T05:31:56Z",
  "status": "Starting",
  "task_id": "<string>",
  "task_url": "<string>"
}

Authorizations

Authorization
string
header
required
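
As a rough illustration of how the header and body fit together, the sketch below builds (but does not send) the same request as the curl example, using only the Python standard library. The function name `build_parse_request` is hypothetical; the URL and header names are taken from the curl example, and passing the key as the bare `Authorization` value mirrors that example rather than any confirmed scheme.

```python
import json
import urllib.request

# Endpoint copied from the curl example in this page.
API_URL = "https://api.chunkr.ai/api/v1/tasks/parse"


def build_parse_request(api_key: str, body: dict) -> urllib.request.Request:
    """Build the POST request for creating a task without sending it.

    Hypothetical helper: the API key is placed in the Authorization
    header as-is, mirroring the curl example.
    """
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To actually send it, the returned object can be passed to `urllib.request.urlopen`, which raises `urllib.error.HTTPError` on non-2xx responses.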

Body

application/json

JSON request to create a task

file
string
required

The file to be uploaded. Supported inputs:

  • ch://files/{file_id}: Reference to an existing file that was uploaded via the Files API.
  • http(s)://...: Remote URL to fetch.
  • data:*;base64,... or a raw base64 string.
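
The three input forms above can be produced from local data with a small helper. This is a minimal sketch, not part of any official client: `to_file_field` and its default MIME type are assumptions chosen for illustration. Raw bytes are wrapped in a base64 data URI, while strings (a `ch://files/...` reference, a URL, a data URI, or raw base64) are passed through unchanged.

```python
import base64
from typing import Union


def to_file_field(source: Union[str, bytes], mime_type: str = "application/pdf") -> str:
    """Return a value suitable for the `file` field.

    Hypothetical helper: bytes become a `data:<mime>;base64,<payload>` URI;
    strings are assumed to already be in one of the supported forms.
    """
    if isinstance(source, bytes):
        encoded = base64.b64encode(source).decode("ascii")
        return f"data:{mime_type};base64,{encoded}"
    return source
```

For example, `to_file_field(open("report.pdf", "rb").read())` would yield a data URI, whereas `to_file_field("https://example.com/report.pdf")` returns the URL untouched.
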
chunk_processing
object | null

Controls the settings for chunking and for the post-processing of each chunk.

error_handling
enum<string> | null
default:Fail

Controls how errors are handled during processing:

  • Fail: Stops processing and fails the task when any error occurs.
  • Continue: Attempts to continue processing despite non-critical errors (e.g., LLM refusals).
Available options: Fail, Continue
expires_in
integer | null

The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.

file_name
string | null

The name of the file to be uploaded. If not set, a name is generated.

llm_processing
object | null

Controls the LLM used for the task.

ocr_strategy
enum<string> | null
default:All

Controls the Optical Character Recognition (OCR) strategy.

  • All: Processes all pages with OCR. (Latency penalty: ~0.5 seconds per page.)
  • Auto: Applies OCR only to pages with missing or low-quality text. When a text layer is present, the bounding boxes from the text layer are used.
Available options: All, Auto
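
To get a feel for the cost of the All strategy, the quoted figure of roughly 0.5 seconds per page can be turned into a back-of-the-envelope estimate. The helper below is purely illustrative: the name `estimated_ocr_overhead_seconds` is made up, and the constant is the approximate value stated above, not a guaranteed bound.

```python
# Approximate per-page latency penalty quoted for the All strategy.
OCR_SECONDS_PER_PAGE = 0.5


def estimated_ocr_overhead_seconds(page_count: int) -> float:
    """Rough added latency when every page is processed with OCR."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * OCR_SECONDS_PER_PAGE
```

Under this estimate a 200-page document would add on the order of 100 seconds with All, which is the kind of case where Auto may be worth considering.
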
pipeline
enum<string> | null
default:Azure

Choose the provider whose models will be used for segmentation and OCR. The output will be unified to the Chunkr output format.

Available options: Azure, Chunkr
segment_processing
object | null

Defines how each segment type is handled when generating the final output.

Each segment uses one of three strategies. The chosen strategy controls:

  • Whether the segment is kept (Auto, LLM) or skipped (Ignore).
  • How the content is produced (rule-based vs. LLM).
  • The output format (Html or Markdown).

Optional flags such as image cropping, extended context, and descriptions further refine behaviour.

Default strategy per segment

  • Title, SectionHeader, Text, ListItem, Caption, Footnote – Auto (Markdown, description off)
  • Table – LLM (HTML, description on)
  • Picture – LLM (Markdown, description off, cropping All)
  • Formula, Page – LLM (Markdown, description off)
  • PageHeader, PageFooter – Ignore (removed from output)

Strategy reference

  • Auto – rule-based content generation.
  • LLM – generate content with an LLM.
  • Ignore – exclude the segment entirely.
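
The default strategy per segment type listed above can be captured as a simple lookup table, which is handy when deciding which segment types need an override. This sketch only records the defaults described in this section; the names `DEFAULT_SEGMENT_STRATEGY` and `ignored_by_default` are hypothetical, and the table deliberately does not model the request shape of `segment_processing`, which is not spelled out here.

```python
# Default strategy for each segment type, as listed in this section.
DEFAULT_SEGMENT_STRATEGY = {
    "Title": "Auto",
    "SectionHeader": "Auto",
    "Text": "Auto",
    "ListItem": "Auto",
    "Caption": "Auto",
    "Footnote": "Auto",
    "Table": "LLM",
    "Picture": "LLM",
    "Formula": "LLM",
    "Page": "LLM",
    "PageHeader": "Ignore",
    "PageFooter": "Ignore",
}


def ignored_by_default() -> list:
    """Segment types that are dropped from the output unless overridden."""
    return sorted(
        name for name, strategy in DEFAULT_SEGMENT_STRATEGY.items()
        if strategy == "Ignore"
    )
```
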
segmentation_strategy
enum<string> | null
default:LayoutAnalysis

Controls the segmentation strategy:

  • LayoutAnalysis: Analyzes pages for layout elements (e.g., Table, Picture, Formula) using bounding boxes. Provides fine-grained segmentation and better chunking.
  • Page: Treats each page as a single segment. Faster processing, but without layout element detection and with only simple chunking.
Available options: LayoutAnalysis, Page
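
Putting the body fields together, a request body could be assembled and checked against the enum values listed above before it is sent. The following is a sketch under stated assumptions: `build_task_body` and `ALLOWED_VALUES` are hypothetical names, and leaving out a field whose value is `None` is assumed to be equivalent to sending `null` (that is, the documented default applies).

```python
# Enum values copied from the field descriptions in this section.
ALLOWED_VALUES = {
    "error_handling": {"Fail", "Continue"},
    "ocr_strategy": {"All", "Auto"},
    "pipeline": {"Azure", "Chunkr"},
    "segmentation_strategy": {"LayoutAnalysis", "Page"},
}


def build_task_body(file: str, **options) -> dict:
    """Assemble a request body, rejecting unknown enum values early.

    Options set to None are omitted (assumption: the server then
    falls back to its default for that field).
    """
    body = {"file": file}
    for name, value in options.items():
        if value is None:
            continue
        allowed = ALLOWED_VALUES.get(name)
        if allowed is not None and value not in allowed:
            raise ValueError(f"{name} must be one of {sorted(allowed)}, got {value!r}")
        body[name] = value
    return body
```

A call such as `build_task_body("https://example.com/doc.pdf", ocr_strategy="Auto", segmentation_strategy="Page")` produces a dictionary that can be serialized with `json.dumps` and used as the request payload.
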

Response

Task created successfully.

configuration
object
required
created_at
string<date-time>
required

The date and time when the task was created and queued.

message
string
required

A message describing the task's status or any errors that occurred.

status
enum<string>
required

The status of the task.

Available options: Starting, Processing, Succeeded, Failed, Cancelled
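
When polling a task, a client typically needs to know whether a status is final. The sketch below assumes, from the names alone, that Succeeded, Failed, and Cancelled are terminal and that Starting and Processing mean work is still pending; this section does not state that explicitly, and `is_finished` is a hypothetical helper.

```python
# All status values listed in this section.
ALL_STATUSES = {"Starting", "Processing", "Succeeded", "Failed", "Cancelled"}

# Assumption: these three will not change again once reached.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def is_finished(status: str) -> bool:
    """Return True when the task is assumed to have reached a final state."""
    if status not in ALL_STATUSES:
        raise ValueError(f"unknown status: {status!r}")
    return status in TERMINAL_STATUSES
```
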
task_id
string
required

The unique identifier for the task.

expires_at
string<date-time> | null

The date and time when the task will expire.

finished_at
string<date-time> | null

The date and time when the task was finished.

output
object | null

The processed results of a document analysis task.

started_at
string<date-time> | null

The date and time when the task was started.

task_url
string | null

The presigned URL of the task.
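
The timestamp fields in the response are date-time strings such as `2023-11-07T05:31:56Z`, and several of them may be null. As a small illustration, the sketch below parses them and derives the processing duration from `started_at` and `finished_at`. Both helper names are hypothetical, and the trailing `Z` is rewritten to `+00:00` so the parse also works on Python versions whose `datetime.fromisoformat` does not accept `Z`.

```python
from datetime import datetime
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a response timestamp, passing null values through as None."""
    if value is None:
        return None
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def processing_seconds(task: dict) -> Optional[float]:
    """Seconds between started_at and finished_at, or None if either is missing."""
    started = parse_timestamp(task.get("started_at"))
    finished = parse_timestamp(task.get("finished_at"))
    if started is None or finished is None:
        return None
    return (finished - started).total_seconds()
```

Here `task` is the decoded JSON response; a task that has not finished yet simply yields `None`.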