Retrieves the current state of a task.
Returns task details such as processing status, configuration, output (when available), file metadata, and timestamps.
Typical uses:
curl --request GET \
  --url https://api.chunkr.ai/tasks/{task_id} \
  --header 'Authorization: <api-key>'

{
  "completed": true,
  "configuration": {
    "chunk_processing": {
      "ignore_headers_and_footers": null,
      "target_length": 4096,
      "tokenizer": {
        "Enum": "Word"
      }
    },
    "error_handling": "Fail",
    "ocr_strategy": "All",
    "pipeline": "Chunkr",
    "segment_processing": "<unknown>",
    "segmentation_strategy": "LayoutAnalysis"
  },
  "created_at": "2023-11-07T05:31:56Z",
  "file_info": {
    "url": "<string>",
    "mime_type": "<string>",
    "name": "<string>",
    "page_count": 1,
    "ss_cell_count": 1
  },
  "message": "<string>",
  "status": "Starting",
  "task_id": "<string>",
  "task_type": "Parse",
  "version_info": {
    "client_version": "Legacy",
    "server_version": "<string>"
  },
  "expires_at": "2023-11-07T05:31:56Z",
  "finished_at": "2023-11-07T05:31:56Z",
  "input_file_url": "<string>",
  "output": "<unknown>",
  "parse_task_id": "<string>",
  "started_at": "2023-11-07T05:31:56Z",
  "task_url": "<string>"
}

Id of the task to retrieve
Whether to return base64 encoded URLs. If false, the URLs will be returned as presigned URLs.
Whether to include chunks in the output response
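As a rough sketch, the request above could be assembled in Python with only the standard library. The snippet builds the GET request without sending it. The query parameter names `base64_urls` and `include_chunks` are assumptions inferred from the two descriptions above, and `build_task_request` is a hypothetical helper, not part of any official client.

```python
import urllib.parse
import urllib.request

BASE_URL = "https://api.chunkr.ai"  # host taken from the curl example


def build_task_request(task_id, api_key, base64_urls=None, include_chunks=None):
    """Build (but do not send) a GET request for a single task.

    The query parameter names are assumptions; verify them against the API reference.
    Parameters left as None are omitted so the server defaults apply.
    """
    params = {}
    if base64_urls is not None:
        params["base64_urls"] = "true" if base64_urls else "false"
    if include_chunks is not None:
        params["include_chunks"] = "true" if include_chunks else "false"

    url = f"{BASE_URL}/tasks/{urllib.parse.quote(task_id, safe='')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return urllib.request.Request(url, headers={"Authorization": api_key}, method="GET")


request = build_task_request("my-task-id", "my-api-key", include_chunks=False)
print(request.full_url)
```

Sending the request would then be a matter of passing it to `urllib.request.urlopen` and decoding the JSON body.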
Task details.
True when the task has reached a terminal state, i.e. status is Succeeded, Failed, or Cancelled.
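One possible way to use this flag is a polling loop that stops once the task is terminal. The sketch below assumes a hypothetical `fetch_task` callable that returns the decoded task JSON as a dict; the interval and timeout values are arbitrary illustrations, not documented recommendations.

```python
import time

# Terminal statuses as listed in the description above.
TERMINAL_STATUSES = {"Succeeded", "Failed", "Cancelled"}


def is_terminal(task):
    """Return True when the task dict reports a terminal state."""
    return bool(task.get("completed")) or task.get("status") in TERMINAL_STATUSES


def wait_for_task(fetch_task, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_task() repeatedly until the task is terminal or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task()
        if is_terminal(task):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach a terminal state in time")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays.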
Unified configuration type that can represent either parse or extract configurations
Controls the setting for the chunking and post-processing of each chunk.
DEPRECATED: use segment_processing.ignore instead
The target number of words in each chunk. If 0, each chunk will contain a single segment.
x >= 0
The tokenizer to use for the chunking process.
Use one of the predefined tokenizer types
Word, Cl100kBase, XlmRobertaBase, BertBaseUncased
Controls how errors are handled during processing:
Fail: Stops processing and fails the task when any error occurs
Continue: Attempts to continue processing despite non-critical errors (e.g. LLM refusals)
Fail, Continue
Controls the Optical Character Recognition (OCR) strategy.
All: Processes all pages with OCR. (Latency penalty: ~0.5 seconds per page)
Auto: Selectively applies OCR only to pages with missing or low-quality text. When a text layer is present, the bounding boxes from the text layer are used.
All, Auto
Azure, Chunkr
Configuration for how each document segment is processed and formatted.
Each segment has sensible defaults, but you can override specific settings:
format: Output as Html or Markdown
strategy: Auto (rule-based), LLM (AI-generated), or Ignore (skip)
crop_image: Whether to crop images to segment bounds
extended_context: Use full page as context for LLM processing
description: Generate descriptions for segments
Defaults per segment type: Check the documentation for more details.
Only specify the fields you want to change - everything else uses the defaults.
Controls the processing and generation for the segment.
crop_image controls whether to crop the file's images to the segment's bounding box. The cropped image will be stored in the segment's image field. Use All to always crop, or Auto to only crop when needed for post-processing.
format specifies the output format: Html or Markdown
strategy determines how the content is generated: Auto, LLM, or Ignore
Auto: Process content automatically
LLM: Use large language models for processing
Ignore: Exclude segments from final output
description enables LLM-generated descriptions for segments. Note: This uses chunkr's own VLM models and is not configurable via LLM processing configuration.
extended_context uses the full page image as context for LLM generation.
Controls the cropping strategy for an item (e.g. segment, chunk, etc.)
All crops all images in the item
Auto crops images only if required for post-processing
All, Auto
Generate LLM descriptions for this segment
Use the full page image as context for LLM generation
The format for the content field of a segment.
Html, Markdown
The strategy for generating the content field of a segment.
LLM, Auto, Ignore
The same description and child attributes are repeated for each segment type.
New segment types - must be Optional for backwards compatibility.
Controls the segmentation strategy:
LayoutAnalysis: Analyzes pages for layout elements (e.g., Table, Picture, Formula, etc.) using bounding boxes. Provides fine-grained segmentation and better chunking.
Page: Treats each page as a single segment. Faster processing, but without layout element detection and only simple chunking.
LayoutAnalysis, Page
The date and time when the task was created and queued.
Information about the input file.
The presigned URL/Base64 encoded URL of the input file.
The MIME type of the file.
The name of the file.
The number of pages in the file.
x >= 0
The number of cells in the file. Only used for spreadsheets.
x >= 0
A message describing the task's status or any errors that occurred.
The status of the task.
Starting, Processing, Succeeded, Failed, Cancelled
The unique identifier for the task.
Parse, Extract
The date and time when the task will expire.
The date and time when the task was finished.
The presigned URL of the input file.
Deprecated: use file_info.url instead.
Unified output type that can represent either parse or extract results
Collection of document chunks, where each chunk contains one or more segments
The total number of tokens in the embed field of the chunk. Calculated by the tokenizer.
x >= 0
Collection of document segments that form this chunk.
When target_chunk_length > 0, contains the maximum number of segments
that fit within that length (segments remain intact).
Otherwise, contains exactly one segment.
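The grouping rule above can be read as a greedy packing of whole segments. The sketch below is one interpretation of that rule, not Chunkr's actual implementation; `pack_segments` and its input (a list of per-segment lengths) are hypothetical.

```python
def pack_segments(segment_lengths, target_length):
    """Group consecutive segments into chunks without splitting any segment.

    Returns a list of chunks, each a list of segment indices.
    A segment longer than target_length ends up alone in its own chunk.
    """
    if target_length <= 0:
        # No target length: exactly one segment per chunk.
        return [[i] for i in range(len(segment_lengths))]

    chunks, current, current_len = [], [], 0
    for i, length in enumerate(segment_lengths):
        # Close the current chunk when adding this segment would exceed the target.
        if current and current_len + length > target_length:
            chunks.append(current)
            current, current_len = [], 0
        current.append(i)
        current_len += length
    if current:
        chunks.append(current)
    return chunks


print(pack_segments([100, 200, 300, 50], target_length=350))  # [[0, 1], [2, 3]]
```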
Bounding box for an item. It is used for segments and OCR results.
The height of the bounding box.
The left coordinate of the bounding box.
The top coordinate of the bounding box.
The width of the bounding box.
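Because a bounding box is given as left, top, width, and height, the opposite corner has to be derived before the box can be used for cropping or overlap checks. A small sketch follows; the `BoundingBox` class is hypothetical, and its field names simply mirror the descriptions above (the actual JSON keys are assumed, not shown in this section).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingBox:
    left: float
    top: float
    width: float
    height: float

    @property
    def right(self):
        return self.left + self.width

    @property
    def bottom(self):
        return self.top + self.height

    def as_corners(self):
        """Return (left, top, right, bottom), the form many image libraries expect."""
        return (self.left, self.top, self.right, self.bottom)

    def area(self):
        return self.width * self.height


box = BoundingBox(left=10, top=20, width=30, height=40)
print(box.as_corners())  # (10, 20, 40, 60)
```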
Height of the page/sheet containing the segment.
Page number/Sheet number of the segment.
x >= 0
Width of the page/sheet containing the segment.
Unique identifier for the segment.
All the possible types for a segment.
Caption, Footnote, Formula, FormRegion, GraphicalItem, Legend, LineNumber, ListItem, Page, PageFooter, PageHeader, PageNumber, Picture, Table, Text, Title, Unknown, SectionHeader
Confidence score of the layout analysis model
Content of the segment, will be either HTML or Markdown, depending on format chosen.
Description of the segment, generated by the LLM.
Embeddable content of the segment.
Presigned URL to the image of the segment.
LLM representation of the segment.
OCR results for the segment.
Bounding box for an item. It is used for segments and OCR results.
The height of the bounding box.
The left coordinate of the bounding box.
The top coordinate of the bounding box.
The width of the bounding box.
The recognized text of the OCR result.
The confidence score of the recognized text.
The unique identifier for the OCR result.
Excel-style cell reference (e.g., "A1" or "A1:B2") when OCR originates from a spreadsheet cell
Length of the segment in tokens.
x >= 0
Cells of the segment. Only used for Spreadsheets.
The cell ID.
Range of the cell.
Text content of the cell.
Formula of the cell.
Hyperlink URL if the cell contains a link (e.g., "https://www.chunkr.ai").
Styling information for the cell including colors, fonts, and formatting.
Alignment of the cell content.
Left, Center, Right, Justify
Background color of the cell (e.g., "#FFFFFF" or "#DAE3F3").
Font face/family of the cell (e.g., "Arial", "Daytona").
Whether the cell content is bold.
Text color of the cell (e.g., "#000000" or "red").
Vertical alignment of the cell content.
Top, Middle, Bottom, Baseline
The computed/evaluated value of the cell. This represents the actual result after evaluating any formulas, as opposed to the raw text content. For cells with formulas, this is the calculated result; for cells with static content, this is typically the same as the text field.
Example: text might show "3.14" (formatted to 2 decimal places) while value could be "3.141592653589793" (full precision).
Bounding box of the header of the segment, if found. Only used for Spreadsheets.
The height of the bounding box.
The left coordinate of the bounding box.
The top coordinate of the bounding box.
The width of the bounding box.
OCR results of the header of the segment, if found. Only used for Spreadsheets.
Bounding box for an item. It is used for segments and OCR results.
The height of the bounding box.
The left coordinate of the bounding box.
The top coordinate of the bounding box.
The width of the bounding box.
The recognized text of the OCR result.
The confidence score of the recognized text.
The unique identifier for the OCR result.
Excel-style cell reference (e.g., "A1" or "A1:B2") when OCR originates from a spreadsheet cell
Header range of the segment, if found.
The header range can overlap with segment.range if the table contains the header. If the header is located in a different sheet, the header range will not overlap with segment.range.
Only used for Spreadsheets.
Text content of the header of the segment, if found. Only used for Spreadsheets.
Range of the segment in Excel notation (e.g., A1:B5). Only used for Spreadsheets.
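To work with these ranges programmatically, the Excel notation has to be converted to numeric coordinates. The helper below is a minimal sketch with hypothetical function names; it handles plain references such as "A1" or "A1:B5" and assumes no sheet prefixes or `$` anchors appear in the value.

```python
import re

_CELL = re.compile(r"^([A-Za-z]+)([0-9]+)$")


def column_index(letters):
    """Convert column letters to a 1-based index (A -> 1, Z -> 26, AA -> 27)."""
    index = 0
    for ch in letters.upper():
        index = index * 26 + (ord(ch) - ord("A") + 1)
    return index


def parse_cell(ref):
    """Parse a single cell reference into (row, column), both 1-based."""
    match = _CELL.match(ref)
    if match is None:
        raise ValueError(f"not a cell reference: {ref!r}")
    return int(match.group(2)), column_index(match.group(1))


def parse_range(ref):
    """Parse "A1" or "A1:B5" into ((row, col), (row, col)), inclusive on both ends."""
    start, _, end = ref.partition(":")
    first = parse_cell(start)
    last = parse_cell(end) if end else first
    return first, last


print(parse_range("A1:B5"))  # ((1, 1), (5, 2))
```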
Name of the sheet containing the segment. Only used for Spreadsheets.
Text content of the segment. Calculated by the OCR results.
The unique identifier for the chunk.
The content of the chunk. This is the text that is generated by combining the content field from each segment.
Can be provided as context to the LLM.
Suggested text to be embedded for the chunk. This text is generated by combining the embed field from each segment.
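As an illustration of how chunk-level text relates to its segments, the sketch below joins one field across a list of segment dicts. The newline separator and the dict keys `content` and `embed` are assumptions; this section only states that the per-segment fields are combined, not how.

```python
def combine_field(segments, field, separator="\n"):
    """Join one text field across segments, skipping missing or empty values."""
    return separator.join(s[field] for s in segments if s.get(field))


segments = [
    {"content": "<h1>Report</h1>", "embed": "Report"},
    {"content": "<p>Summary text</p>", "embed": "Summary text"},
    {"content": "", "embed": None},  # empty segment, ignored
]
chunk_content = combine_field(segments, "content")
chunk_embed = combine_field(segments, "embed")
```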
The name of the file. Deprecated: use file_info.name instead.
The MIME type of the file. Deprecated: use file_info.mime_type instead.
The number of pages in the file. Deprecated: use file_info.page_count instead.
x >= 0
The pages of the file. Includes the image and metadata for each page.
The presigned URL of the page/sheet image.
The number of pages in the file.
The number of pages in the file.
x >= 0
The number of pages in the file.
DPI of the page/sheet. All cropped images are scaled to this DPI.
The name of the sheet containing the page. Only used for Spreadsheets.
The presigned URL of the PDF file.
The ID of the source parse task that was used for the task
The date and time when the task was started.
The presigned URL of the task.
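The timestamp fields described above can be combined to measure how long a task took. The sketch below assumes the timestamps follow the ISO 8601 form shown in the example response ("2023-11-07T05:31:56Z") and that the task is available as a decoded dict; `processing_seconds` is a hypothetical helper.

```python
from datetime import datetime


def parse_timestamp(value):
    """Parse an ISO 8601 timestamp such as "2023-11-07T05:31:56Z"."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so it is rewritten as an explicit UTC offset first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def processing_seconds(task):
    """Return finished_at minus started_at in seconds, or None if either is missing."""
    started, finished = task.get("started_at"), task.get("finished_at")
    if not started or not finished:
        return None
    return (parse_timestamp(finished) - parse_timestamp(started)).total_seconds()


task = {"started_at": "2023-11-07T05:31:56Z", "finished_at": "2023-11-07T05:32:26Z"}
print(processing_seconds(task))  # 30.0
```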