Skip to main content
GET
/
tasks
/
{task_id}
/
extract
Get Extract Task
curl --request GET \
  --url https://api.chunkr.ai/api/v1/tasks/{task_id}/extract \
  --header 'Authorization: <api-key>'
{
  "configuration": {
    "parse_configuration": null,
    "schema": {},
    "system_prompt": "You are an expert at structured data extraction. You will be given parsed text from a document and should convert it into the given structure."
  },
  "created_at": "2023-11-07T05:31:56Z",
  "expires_at": "2023-11-07T05:31:56Z",
  "file_info": {
    "mime_type": "<string>",
    "name": "<string>",
    "page_count": 1,
    "url": "<string>"
  },
  "finished_at": "2023-11-07T05:31:56Z",
  "input_file_url": "<string>",
  "message": "<string>",
  "output": null,
  "source_task_id": "<string>",
  "started_at": "2023-11-07T05:31:56Z",
  "status": "Starting",
  "task_id": "<string>",
  "task_type": "Parse",
  "task_url": "<string>",
  "version_info": {
    "client_version": "Legacy",
    "server_version": "<string>"
  }
}

Authorizations

Authorization
string
header
required

Path Parameters

task_id
string | null
required

Id of the task to retrieve

Query Parameters

base64_urls
boolean

Whether to return base64 encoded URLs. If false, the URLs will be returned as presigned URLs.

include_chunks
boolean

Whether to include chunks in the output response

Response

Task details.

configuration
object
required
created_at
string<date-time>
required

The date and time when the task was created and queued.

file_info
object
required

Information about the input file.

message
string
required

A message describing the task's status or any errors that occurred.

status
enum<string>
required

The status of the task.

Available options:
Starting,
Processing,
Succeeded,
Failed,
Cancelled
task_id
string
required

The unique identifier for the task.

task_type
enum<string>
required
Available options:
Parse,
Extract
version_info
object
required

Version information for the task.

expires_at
string<date-time> | null

The date and time when the task will expire.

finished_at
string<date-time> | null

The date and time when the task was finished.

input_file_url
string | null
deprecated

The presigned URL of the input file. Deprecated use file_info.url instead.

output
object | null

The processed results of a document extraction task.

Shapes:

  • results: JSON matching the user-provided schema.
  • citations: mirror of results; only leaf positions (primitive or array-of-primitives) contain a Vec<Citation> supporting that field.
  • metrics: mirror of results; only leaf positions contain a Metrics object for that field.
source_task_id
string | null

The ID of the source parse task that was used for extraction

started_at
string<date-time> | null

The date and time when the task was started.

task_url
string | null

The presigned URL of the task.