Create Extract Task

POST /tasks/extract

curl --request POST \
  --url https://api.chunkr.ai/tasks/extract \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "parse_configuration": null,
  "schema": {},
  "system_prompt": "You are an expert at structured data extraction. You will be given parsed text from a document and should convert it into the given structure.",
  "expires_in": 123,
  "file": "<string>",
  "file_name": "<string>"
}'

{
  "completed": false,
  "configuration": {
    "parse_configuration": null,
    "schema": {},
    "system_prompt": "You are an expert at structured data extraction. You will be given parsed text from a document and should convert it into the given structure."
  },
  "created_at": "2023-11-07T05:31:56Z",
  "expires_at": "2023-11-07T05:31:56Z",
  "file_info": {
    "mime_type": "<string>",
    "name": "<string>",
    "page_count": 1,
    "ss_cell_count": 1,
    "url": "<string>"
  },
  "finished_at": "2023-11-07T05:31:56Z",
  "input_file_url": "<string>",
  "message": "<string>",
  "output": null,
  "parse_task_id": "<string>",
  "started_at": "2023-11-07T05:31:56Z",
  "status": "Starting",
  "task_id": "<string>",
  "task_type": "Extract",
  "task_url": "<string>",
  "version_info": {
    "client_version": "Legacy",
    "server_version": "<string>"
  }
}

Authorizations

Authorization (string, header, required)

Body

application/json

JSON request to create an extract task

schema (object, required)

The schema to be used for the extraction.

file (string, required)

The file to be extracted. Supported inputs:

ch://files/{file_id}: Reference to an existing file. Upload via the Files API.
http(s)://...: Remote URL to fetch.
data:*;base64,... or a raw base64 string.
task_id: Reference to an existing parse task.
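For a local file, the data URI form can be built with the standard library. This is a minimal sketch, assuming the MIME type takes the place of the * in data:*;base64,...; the helper name to_data_uri is hypothetical and not part of the API.

```python
import base64


def to_data_uri(data: bytes, mime_type: str = "application/pdf") -> str:
    """Encode raw file bytes as a base64 data URI for the `file` field.

    Assumption: the MIME type replaces the `*` in `data:*;base64,...`.
    """
    encoded = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

In practice the bytes would come from something like pathlib.Path("invoice.pdf").read_bytes(), where the file name is only an illustration.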

parse_configuration (object | null)

Optional configuration for the parse task. Cannot be used if file is a task_id.

system_prompt (string | null)

Default: You are an expert at structured data extraction. You will be given parsed text from a document and should convert it into the given structure.

The system prompt to be used for the extraction.

expires_in (integer | null)

The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.
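Because expires_in is expressed in seconds, a retention period is easiest to derive from a duration. This is a small sketch; the helper name and the seven-day value are illustrative, and rejecting non-positive values is an assumption rather than documented behavior.

```python
from datetime import timedelta


def expires_in_seconds(duration: timedelta) -> int:
    """Convert a retention period into whole seconds for `expires_in`."""
    seconds = int(duration.total_seconds())
    # Assumption: the API expects a positive number of seconds.
    if seconds <= 0:
        raise ValueError("expires_in must be a positive number of seconds")
    return seconds


expires_in = expires_in_seconds(timedelta(days=7))  # 604800
```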

file_name (string | null)

The name of the file to be extracted. If not set, a name will be generated. Cannot be provided if the file is a task_id.
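The body fields above can be assembled into a request as in the following sketch, which uses only the Python standard library. The function name build_extract_request and the file_is_task_id flag are hypothetical. The flag exists because this page does not say how a task_id is told apart from a raw base64 string, so the caller states it explicitly. The schema dialect is also not specified here, so any schema passed in is the caller's assumption.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

API_URL = "https://api.chunkr.ai/tasks/extract"


def build_extract_request(
    api_key: str,
    file: str,
    schema: Dict[str, Any],
    *,
    file_is_task_id: bool = False,  # hypothetical flag, not an API field
    parse_configuration: Optional[Dict[str, Any]] = None,
    system_prompt: Optional[str] = None,
    expires_in: Optional[int] = None,
    file_name: Optional[str] = None,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for /tasks/extract."""
    # Documented constraint: these two fields cannot accompany a task_id.
    if file_is_task_id and (parse_configuration is not None or file_name is not None):
        raise ValueError(
            "parse_configuration and file_name cannot be set when file is a task_id"
        )

    body: Dict[str, Any] = {"file": file, "schema": schema}
    optional = {
        "parse_configuration": parse_configuration,
        "system_prompt": system_prompt,
        "expires_in": expires_in,
        "file_name": file_name,
    }
    # Omit unset optional fields so the server-side defaults apply.
    body.update({k: v for k, v in optional.items() if v is not None})

    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be passed to urllib.request.urlopen to send it; the response body is the task JSON described under Response.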

Response

Task created successfully.

completed (boolean, required)

True when the task has reached a terminal state, i.e. its status is Succeeded, Failed, or Cancelled.
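The completed flag lends itself to a simple polling loop. The sketch below is illustrative only: this page does not describe how a task is retrieved, so fetch_task is a hypothetical caller-supplied callable that returns the latest task JSON as a dict, and the interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable, Dict


def wait_until_completed(
    fetch_task: Callable[[], Dict[str, Any]],  # hypothetical: returns the task JSON
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the task reports `completed`, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task()
        if task["completed"]:
            # Terminal state: the caller should still inspect task["status"].
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach a terminal state in time")
        sleep(interval)
```

A completed task is not necessarily a successful one, so the status and message fields should be checked on the returned dict.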

configuration (object, required)

created_at (string<date-time>, required)

The date and time when the task was created and queued.

file_info (object, required)

Information about the input file.

message (string, required)

A message describing the task's status or any errors that occurred.

status (enum<string>, required)

The status of the task.

Available options: Starting, Processing, Succeeded, Failed, Cancelled
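On the client side, the status values can be modeled as an enumeration so that unexpected values fail loudly. This is a sketch; the names TaskStatus and is_terminal are hypothetical, and the terminal set follows the definition of completed given on this page.

```python
from enum import Enum


class TaskStatus(str, Enum):
    STARTING = "Starting"
    PROCESSING = "Processing"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"
    CANCELLED = "Cancelled"


# Statuses for which `completed` is documented to be true.
TERMINAL_STATUSES = frozenset(
    {TaskStatus.SUCCEEDED, TaskStatus.FAILED, TaskStatus.CANCELLED}
)


def is_terminal(status: str) -> bool:
    """Return True for terminal statuses; raise ValueError for unknown ones."""
    return TaskStatus(status) in TERMINAL_STATUSES
```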

task_id (string, required)

The unique identifier for the task.

task_type (enum<string>, required)

Available options: Parse, Extract

version_info (object, required)

Version information for the task.

expires_at (string<date-time> | null)

The date and time when the task will expire.
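The date-time fields can be parsed to work out how long a task has left. This is a minimal sketch, assuming timestamps follow the format in the example response (such as 2023-11-07T05:31:56Z); both helper names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: Optional[str]) -> Optional[datetime]:
    """Parse a date-time string from the response; pass None through."""
    if value is None:
        return None
    # Older Python versions reject a trailing "Z" in fromisoformat.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def seconds_until_expiry(
    expires_at: Optional[str], now: Optional[datetime] = None
) -> Optional[float]:
    """Seconds remaining before expiry, or None when expires_at is null."""
    expiry = parse_timestamp(expires_at)
    if expiry is None:
        return None
    current = now if now is not None else datetime.now(timezone.utc)
    return (expiry - current).total_seconds()
```

A negative result means the expiry time has already passed.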

finished_at (string<date-time> | null)

The date and time when the task was finished.

input_file_url (string | null, deprecated)

The presigned URL of the input file. Deprecated: use file_info.url instead.
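Client code can prefer file_info.url and fall back to the deprecated field only when needed. This is a small sketch; the helper name is hypothetical, and the fallback is a defensive assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional


def resolve_input_url(task: Dict[str, Any]) -> Optional[str]:
    """Return file_info.url, falling back to the deprecated input_file_url."""
    file_info = task.get("file_info") or {}
    return file_info.get("url") or task.get("input_file_url")
```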

output (object | null)

The processed results of a document extraction task.

Shapes:

results: JSON matching the user-provided schema.
citations: mirrors the structure of results; only leaf positions (a primitive or an array of primitives) contain a list of citations (Vec<Citation>) supporting that field.
metrics: mirrors the structure of results; only leaf positions contain a Metrics object for that field.
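Because citations mirrors results, the two trees can be walked together to pair each extracted value with its supporting citations. The sketch below is illustrative: the function names are hypothetical, and Citation entries are treated as opaque values because their fields are not listed on this page.

```python
from typing import Any, Iterator, Tuple

Path = Tuple[Any, ...]


def is_leaf(value: Any) -> bool:
    """A leaf is a primitive or an array containing only primitives."""
    if isinstance(value, dict):
        return False
    if isinstance(value, list):
        return all(not isinstance(item, (dict, list)) for item in value)
    return True


def iter_leaves(
    results: Any, citations: Any, path: Path = ()
) -> Iterator[Tuple[Path, Any, Any]]:
    """Yield (path, value, citations_for_value) for every leaf in results."""
    if is_leaf(results):
        yield path, results, citations
    elif isinstance(results, dict):
        for key, value in results.items():
            child = citations.get(key) if isinstance(citations, dict) else None
            yield from iter_leaves(value, child, path + (key,))
    else:  # a list holding objects or nested lists
        for index, value in enumerate(results):
            in_range = isinstance(citations, list) and index < len(citations)
            child = citations[index] if in_range else None
            yield from iter_leaves(value, child, path + (index,))
```

It would be called as iter_leaves(output["results"], output["citations"]). The same walk works for metrics, since it mirrors results in the same way.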

parse_task_id (string | null)

The ID of the source parse task that was used for extraction.

started_at (string<date-time> | null)

The date and time when the task was started.

task_url (string | null)

The presigned URL of the task.
