Create Parse Task

POST /tasks/parse

curl --request POST \
  --url https://api.chunkr.ai/tasks/parse \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '
{
  "file": "<string>",
  "chunk_processing": {
    "ignore_headers_and_footers": null,
    "target_length": 4096,
    "tokenizer": {
      "Enum": "Word"
    }
  },
  "error_handling": "Fail",
  "ocr_strategy": "All",
  "pipeline": "Chunkr",
  "segment_processing": "<unknown>",
  "segmentation_strategy": "LayoutAnalysis",
  "expires_in": 123,
  "file_name": "<string>"
}
'

{
  "completed": true,
  "configuration": {
    "chunk_processing": {
      "ignore_headers_and_footers": null,
      "target_length": 4096,
      "tokenizer": {
        "Enum": "Word"
      }
    },
    "error_handling": "Fail",
    "ocr_strategy": "All",
    "pipeline": "Chunkr",
    "segment_processing": "<unknown>",
    "segmentation_strategy": "LayoutAnalysis"
  },
  "created_at": "2023-11-07T05:31:56Z",
  "file_info": {
    "url": "<string>",
    "mime_type": "<string>",
    "name": "<string>",
    "page_count": 1,
    "ss_cell_count": 1
  },
  "message": "<string>",
  "status": "Starting",
  "task_id": "<string>",
  "task_type": "Parse",
  "version_info": {
    "client_version": "Legacy",
    "server_version": "<string>"
  },
  "expires_at": "2023-11-07T05:31:56Z",
  "finished_at": "2023-11-07T05:31:56Z",
  "input_file_url": "<string>",
  "output": "<unknown>",
  "started_at": "2023-11-07T05:31:56Z",
  "task_url": "<string>"
}

Authorizations

Authorization

string

header

required

Body

application/json

JSON request to create a parse task

file

string

required

The file to be parsed. Supported inputs:

ch://files/{file_id}: Reference to an existing file. Upload via the Files API
http(s)://...: Remote URL to fetch
data:*;base64,... or raw base64 string
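For a local file, the `data:` form can be produced by base64-encoding the file's bytes. The following is a minimal sketch, not an official client: `to_data_uri` is a hypothetical helper name, and it assumes the MIME type guessed from the file extension is acceptable to the API (this page does not list required MIME types).

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(path: str) -> str:
    """Encode a local file as a data:<mime>;base64,<payload> string.

    Hypothetical helper for illustration; not part of any Chunkr SDK.
    """
    mime, _ = mimetypes.guess_type(path)
    mime = mime or "application/octet-stream"  # fallback when the type is unknown
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"
```

The returned string can be used directly as the value of `file`. For large documents, uploading once and passing a `ch://files/{file_id}` reference is likely to keep the request body smaller.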

chunk_processing

object

Controls the settings for chunking and for post-processing each chunk.

error_handling

enum<string>

default:Fail

Controls how errors are handled during processing:

Fail: Stops processing and fails the task when any error occurs
Continue: Attempts to continue processing despite non-critical errors (e.g., LLM refusals)

Available options: Fail, Continue

ocr_strategy

enum<string>

default:All

Controls the Optical Character Recognition (OCR) strategy.

All: Processes all pages with OCR. (Latency penalty: ~0.5 seconds per page)
Auto: Selectively applies OCR only to pages with missing or low-quality text. When a text layer is present, the bounding boxes from the text layer are used.

Available options: All, Auto
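The quoted latency penalty for `All` can be turned into a rough budget. This is back-of-the-envelope arithmetic only: it assumes the ~0.5 seconds per page figure scales linearly with page count, which the source does not guarantee, and the function name is hypothetical.

```python
OCR_SECONDS_PER_PAGE = 0.5  # approximate figure quoted for the All strategy


def estimate_ocr_overhead(page_count: int) -> float:
    """Rough extra latency, in seconds, of running OCR on every page."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    return page_count * OCR_SECONDS_PER_PAGE


print(estimate_ocr_overhead(120))  # 60.0
```

With `Auto`, the overhead depends on how many pages actually lack usable text, so it cannot be estimated from the page count alone.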

pipeline

enum<string>

default:Chunkr

deprecated

Available options: Azure, Chunkr

segment_processing

object

Configuration for how each document segment is processed and formatted.

Each segment has sensible defaults, but you can override specific settings:

format: Output as Html or Markdown
strategy: Auto (rule-based), LLM (AI-generated), or Ignore (skip)
crop_image: Whether to crop images to segment bounds
extended_context: Use full page as context for LLM processing
description: Generate descriptions for segments

Defaults per segment type: Check the documentation for more details.

Only specify the fields you want to change - everything else uses the defaults.
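As an illustration of overriding only selected fields, the sketch below builds a request body that changes two segment types and leaves the rest at their defaults. The child attributes are not shown on this page, so the shape is an assumption: it supposes `segment_processing` is keyed by segment type name (`Table` and `Picture` are taken from the segment types mentioned under `segmentation_strategy`) and that field values are spelled as in the list above. The URL is a placeholder.

```python
import json

# Assumed shape: one object per segment type, keyed by the type name.
# Only the fields being changed are listed; everything else keeps its default.
segment_processing = {
    "Table": {"format": "Markdown", "strategy": "LLM"},
    "Picture": {"strategy": "Ignore"},
}

body = {
    "file": "https://example.com/report.pdf",  # placeholder URL
    "segment_processing": segment_processing,
}

print(json.dumps(body, indent=2))
```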

segmentation_strategy

enum<string>

default:LayoutAnalysis

Controls the segmentation strategy:

LayoutAnalysis: Analyzes pages for layout elements (e.g., Table, Picture, Formula, etc.) using bounding boxes. Provides fine-grained segmentation and better chunking.
Page: Treats each page as a single segment. Faster processing, but without layout element detection and only simple chunking.

Available options: LayoutAnalysis, Page
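Because the enum values are case-sensitive strings, a client may want to check them before sending a request. The sketch below is one way to do that; the allowed values are copied from the option lists above, while `ALLOWED` and `check_enums` are hypothetical names and not part of the API.

```python
ALLOWED = {
    "error_handling": {"Fail", "Continue"},
    "ocr_strategy": {"All", "Auto"},
    "segmentation_strategy": {"LayoutAnalysis", "Page"},
}


def check_enums(body: dict) -> list[str]:
    """Return a list of problems; an empty list means every enum field present is valid."""
    problems = []
    for field, allowed in ALLOWED.items():
        if field in body and body[field] not in allowed:
            problems.append(f"{field}: {body[field]!r} not in {sorted(allowed)}")
    return problems


print(check_enums({"file": "<string>", "ocr_strategy": "auto"}))
# ["ocr_strategy: 'auto' not in ['All', 'Auto']"]
```

Fields that are absent are not reported, since the server applies the documented defaults.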

expires_in

integer<int32> | null

The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.
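Since `expires_in` is expressed in seconds, it can be convenient to derive it from a duration. The sketch below converts seven days to seconds and projects an expiry time. It assumes the countdown starts when the task is created, which this page does not state explicitly; the `created_at` value is the sample timestamp from the response example.

```python
from datetime import datetime, timedelta, timezone

expires_in = int(timedelta(days=7).total_seconds())

# Assumption: the countdown starts at task creation, so the expiry
# should land near created_at + expires_in.
created_at = datetime(2023, 11, 7, 5, 31, 56, tzinfo=timezone.utc)
expected_expiry = created_at + timedelta(seconds=expires_in)

print(expires_in)                   # 604800
print(expected_expiry.isoformat())  # 2023-11-14T05:31:56+00:00
```

The field is typed as `int32`, so values must stay at or below 2147483647.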

file_name

string | null

The name of the file to be parsed. If not set, a name will be generated.
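Putting the body fields together, the request from the curl example can be assembled with Python's standard library. This is a sketch under stated assumptions rather than an official client: `build_parse_request` is a hypothetical helper, the file URL is a placeholder, and it assumes that omitting an optional field is equivalent to accepting its default.

```python
import json
import urllib.request

API_URL = "https://api.chunkr.ai/tasks/parse"


def build_parse_request(api_key: str, file: str, **options) -> urllib.request.Request:
    """Assemble the POST request; optional fields left as None are omitted."""
    body = {"file": file}
    body.update({k: v for k, v in options.items() if v is not None})
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_parse_request(
    "<api-key>",
    "https://example.com/report.pdf",  # placeholder URL
    ocr_strategy="Auto",
    expires_in=3600,
)
# Sending is left out here; urllib.request.urlopen(req) would perform the call.
```

Building the request separately from sending it keeps the payload easy to inspect and log before any network traffic happens.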

Response

Task created successfully.

completed

boolean

required

True when the task has reached a terminal state, i.e., its status is Succeeded, Failed, or Cancelled.

configuration

Parse · object

required

created_at

string<date-time>

required

The date and time when the task was created and queued.

file_info

object

required

Information about the input file.

message

string

required

A message describing the task's status or any errors that occurred.

status

enum<string>

required

The status of the task.

Available options: Starting, Processing, Succeeded, Failed, Cancelled
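A newly created task typically starts in a non-terminal status, so clients usually poll until it finishes. The loop below is a sketch of that pattern. The task-retrieval call is not documented in this section, so it is passed in as `fetch_task`, a hypothetical callable that returns the task as a dict; the interval and timeout defaults are arbitrary.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_task(
    fetch_task: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
) -> dict:
    """Call fetch_task until the task is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task()
        if task.get("completed") or task.get("status") in TERMINAL_STATUSES:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {task.get('status')!r} after {timeout}s")
        time.sleep(interval)
```

A terminal status does not imply success: callers should still check whether `status` is `Succeeded` and read `message` otherwise.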

task_id

string

required

The unique identifier for the task.

task_type

enum<string>

required

Available options: Parse, Extract

version_info

object

required

Version information for the task.

expires_at

string<date-time> | null

The date and time when the task will expire.

finished_at

string<date-time> | null

The date and time when the task was finished.

input_file_url

string | null

deprecated

The presigned URL of the input file. Deprecated: use file_info.url instead.

output

Parse · object

The processed results of a document parsing task

started_at

string<date-time> | null

The date and time when the task was started.

task_url

string | null

The presigned URL of the task.
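The timestamp fields can be combined to measure how long a task ran. The sketch below assumes timestamps look like the sample value `2023-11-07T05:31:56Z` (whole seconds, UTC) and treats the nullable fields as possibly missing; `parse_ts` and `processing_seconds` are hypothetical helper names.

```python
from datetime import datetime
from typing import Optional


def parse_ts(value: Optional[str]) -> Optional[datetime]:
    """Parse a timestamp such as 2023-11-07T05:31:56Z; None passes through."""
    if value is None:
        return None
    # Older Python versions do not accept a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def processing_seconds(task: dict) -> Optional[float]:
    """Seconds between started_at and finished_at, or None if either is missing."""
    started = parse_ts(task.get("started_at"))
    finished = parse_ts(task.get("finished_at"))
    if started is None or finished is None:
        return None
    return (finished - started).total_seconds()


task = {"started_at": "2023-11-07T05:31:56Z", "finished_at": "2023-11-07T05:32:26Z"}
print(processing_seconds(task))  # 30.0
```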
