POST /tasks/extract
Create Extract Task
curl --request POST \
  --url https://api.chunkr.ai/api/v1/tasks/extract \
  --header 'Authorization: <api-key>' \
  --header 'Content-Type: application/json' \
  --data '{
  "parse_configuration": null,
  "schema": {},
  "system_prompt": "You are an expert at structured data extraction. You will be given parsed text from a document and should convert it into the given structure.",
  "expires_in": 123,
  "file": "<string>",
  "file_name": "<string>"
}'
{
  "configuration": {
    "parse_configuration": null,
    "schema": {},
    "system_prompt": "You are an expert at structured data extraction. You will be given parsed text from a document and should convert it into the given structure."
  },
  "created_at": "2023-11-07T05:31:56Z",
  "expires_at": "2023-11-07T05:31:56Z",
  "file_info": {
    "mime_type": "<string>",
    "name": "<string>",
    "page_count": 1,
    "url": "<string>"
  },
  "finished_at": "2023-11-07T05:31:56Z",
  "input_file_url": "<string>",
  "message": "<string>",
  "output": null,
  "source_task_id": "<string>",
  "started_at": "2023-11-07T05:31:56Z",
  "status": "Starting",
  "task_id": "<string>",
  "task_type": "Parse",
  "task_url": "<string>",
  "version_info": {
    "client_version": "Legacy",
    "server_version": "<string>"
  }
}

Authorizations

Authorization
string
header
required
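
As a rough Python counterpart to the curl example, the request could be assembled with the standard library as sketched below. The helper name `build_extract_http_request` is hypothetical; the URL and the two headers are copied from the curl example, with the API key passed as the raw Authorization value. The function only prepares the request and sends nothing.

```python
import json
import urllib.request

# Endpoint URL taken from the curl example on this page.
EXTRACT_URL = "https://api.chunkr.ai/api/v1/tasks/extract"


def build_extract_http_request(api_key: str, body: dict) -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the extract endpoint.

    Hypothetical helper: it mirrors the headers shown in the curl example.
    """
    return urllib.request.Request(
        EXTRACT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would perform the actual call.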

Body

application/json

JSON request to create an extract task

schema
object
required

The schema to be used for the extraction.

file
string
required

The file to be extracted. Supported inputs:

  • ch://files/{file_id}: Reference to an existing file. Upload via the Files API
  • http(s)://...: Remote URL to fetch
  • data:*;base64,... or raw base64 string
  • task_id: Reference to an existing parse task.
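
To make the supported input forms concrete, here is a small sketch that builds two of them: a `ch://files/{file_id}` reference and a base64 data URI. Both helper names are hypothetical, and the default MIME type is only an illustrative assumption.

```python
import base64


def file_from_file_id(file_id: str) -> str:
    """Reference a file that was already uploaded (ch://files/{file_id} form)."""
    return f"ch://files/{file_id}"


def file_from_bytes(content: bytes, mime_type: str = "application/pdf") -> str:
    """Encode raw bytes as a data:*;base64,... string.

    The default MIME type is an assumption for illustration only.
    """
    encoded = base64.b64encode(content).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"
```

Either return value would be used as the `file` field of the request body.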
parse_configuration
object | null

Optional configuration for the parse task. Cannot be used if file is a task_id.

system_prompt
string | null
default:You are an expert at structured data extraction. You will be given parsed text from a document and should convert it into the given structure.

The system prompt to be used for the extraction.

expires_in
integer | null

The number of seconds until the task is deleted. Expired tasks cannot be updated, polled, or accessed via the web interface.

file_name
string | null

The name of the file to be extracted. If not set, a name will be generated. Cannot be provided if file is a task_id.
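
The body rules above can be sketched as a small builder that enforces the two task_id restrictions on the client before anything is sent. `build_extract_body` and its `file_is_task_id` flag are hypothetical: the caller states whether `file` refers to an existing parse task, since this page does not describe how to tell a task_id apart from a raw base64 string.

```python
from typing import Optional


def build_extract_body(
    schema: dict,
    file: str,
    *,
    file_is_task_id: bool = False,
    parse_configuration: Optional[dict] = None,
    system_prompt: Optional[str] = None,
    expires_in: Optional[int] = None,
    file_name: Optional[str] = None,
) -> dict:
    """Assemble the JSON body, rejecting combinations the reference disallows."""
    if file_is_task_id and parse_configuration is not None:
        raise ValueError("parse_configuration cannot be used when file is a task_id")
    if file_is_task_id and file_name is not None:
        raise ValueError("file_name cannot be provided when file is a task_id")

    # schema and file are the only required fields.
    body: dict = {"schema": schema, "file": file}
    optional = {
        "parse_configuration": parse_configuration,
        "system_prompt": system_prompt,
        "expires_in": expires_in,
        "file_name": file_name,
    }
    # Unset optional fields are left out of the body entirely.
    body.update({key: value for key, value in optional.items() if value is not None})
    return body
```

Leaving unset optional fields out of the body is a design choice of this sketch; sending them as null, as the curl example does for parse_configuration, is equally valid per the field types above.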

Response

Task created successfully.

configuration
object
required
created_at
string<date-time>
required

The date and time when the task was created and queued.

file_info
object
required

Information about the input file.

message
string
required

A message describing the task's status or any errors that occurred.

status
enum<string>
required

The status of the task.

Available options:
Starting,
Processing,
Succeeded,
Failed,
Cancelled
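
One way a client might use these status values is a polling loop that stops once the task reaches a final state. This sketch assumes that Succeeded, Failed, and Cancelled are terminal, which the page implies but does not state. `fetch_task` is a hypothetical callable returning the task JSON as a dict; the endpoint for retrieving a task is not documented on this page.

```python
import time
from typing import Callable

# Assumption: these three statuses are final; Starting and Processing are not.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def wait_for_task(
    fetch_task: Callable[[], dict],
    interval_s: float = 1.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_task until the task reports a terminal status."""
    for attempt in range(max_polls):
        task = fetch_task()
        if task["status"] in TERMINAL_STATUSES:
            return task
        # Do not sleep after the final attempt.
        if attempt < max_polls - 1:
            sleep(interval_s)
    raise TimeoutError(f"task not finished after {max_polls} polls")
```

The `sleep` parameter exists so the loop can be exercised in tests without real delays.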
task_id
string
required

The unique identifier for the task.

task_type
enum<string>
required

Available options:
Parse,
Extract
version_info
object
required

Version information for the task.

expires_at
string<date-time> | null

The date and time when the task will expire.
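
A client could use this field to decide whether a task is still accessible. The sketch below assumes timestamps follow the format of the response example (`2023-11-07T05:31:56Z`) and that a null expires_at means the task has no expiry; the second point is an assumption, not something this page states. Both function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as 2023-11-07T05:31:56Z into an aware datetime."""
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def is_expired(expires_at: Optional[str], now: Optional[datetime] = None) -> bool:
    """Return True if expires_at lies at or before `now` (defaults to current UTC)."""
    if expires_at is None:
        # Assumption: a null expires_at means the task does not expire.
        return False
    current = now or datetime.now(timezone.utc)
    return parse_timestamp(expires_at) <= current
```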

finished_at
string<date-time> | null

The date and time when the task was finished.

input_file_url
string | null
deprecated

The presigned URL of the input file. Deprecated: use file_info.url instead.
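
Client code that still reads this field could migrate with a small accessor that prefers file_info.url and falls back to the deprecated field only when needed. `resolve_input_url` is a hypothetical helper name, and `task` stands for the parsed response JSON.

```python
from typing import Optional


def resolve_input_url(task: dict) -> Optional[str]:
    """Prefer file_info.url; fall back to the deprecated input_file_url."""
    url = (task.get("file_info") or {}).get("url")
    return url if url else task.get("input_file_url")
```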

output
object | null

The processed results of a document extraction task.

Shapes:

  • results: JSON matching the user-provided schema.
  • citations: mirror of results; only leaf positions (primitive or array-of-primitives) contain a Vec<Citation> supporting that field.
  • metrics: mirror of results; only leaf positions contain a Metrics object for that field.
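
Because citations and metrics mirror the structure of results, the two trees can be walked in parallel. The sketch below pairs each leaf value in results with whatever sits at the same position in a mirror tree. It treats citation and metrics entries as opaque values, since their internal fields are not described here, and `iter_leaves` is a hypothetical helper name.

```python
from typing import Any, Iterator, Tuple


def _is_leaf(value: Any) -> bool:
    """A leaf is a primitive or an array containing only primitives."""
    if isinstance(value, list):
        return all(not isinstance(item, (dict, list)) for item in value)
    return not isinstance(value, dict)


def iter_leaves(results: Any, mirror: Any, path: Tuple = ()) -> Iterator[Tuple[Tuple, Any, Any]]:
    """Yield (path, leaf value, mirrored entry) for every leaf in results."""
    if _is_leaf(results):
        yield path, results, mirror
    elif isinstance(results, dict):
        for key, value in results.items():
            child = mirror.get(key) if isinstance(mirror, dict) else None
            yield from iter_leaves(value, child, path + (key,))
    else:
        # A list that contains objects or nested lists.
        for index, value in enumerate(results):
            has_child = isinstance(mirror, list) and index < len(mirror)
            yield from iter_leaves(value, mirror[index] if has_child else None, path + (index,))
```

For an output object `out`, calling `iter_leaves(out["results"], out["citations"])` pairs fields with their citations, and passing `out["metrics"]` instead pairs them with their metrics.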
source_task_id
string | null

The ID of the source parse task that was used for extraction.

started_at
string<date-time> | null

The date and time when the task was started.

task_url
string | null

The presigned URL of the task.