
Create Task (upload-first)

Most workflows start by uploading a local file, then creating a task using the uploaded file URL.
  • client.files.create(): Uploads a local file and returns a URL.
  • client.tasks.parse.create(): Submits the uploaded file URL for processing.
from chunkr_ai import Chunkr
import os

client = Chunkr(api_key=os.environ['CHUNKR_API_KEY'])

# Upload a local file first
with open('doc.pdf', 'rb') as f:
    uploaded_file = client.files.create(file=f)

# Create the task with the uploaded file URL
task = client.tasks.parse.create(file=uploaded_file.url)

print(f'Task created with ID: {task.task_id}')
print(f'Initial status: {task.status}')  # "Starting" or "Processing"

Supported Input Sources

You can provide a file via a URL, a local file (upload-first), or a base64-encoded string.
from chunkr_ai import Chunkr
import os
import base64

client = Chunkr(api_key=os.environ['CHUNKR_API_KEY'])

# From a URL (if available)
task = client.tasks.parse.create(file='https://chunkr.ai/doc.pdf')

# Or, from a local file
with open('doc.pdf', 'rb') as f:
    uploaded_file = client.files.create(file=f)
    task = client.tasks.parse.create(file=uploaded_file.url)

# Or, from a base64 string
with open('doc.pdf', 'rb') as f:
    base64_string = base64.b64encode(f.read()).decode('utf-8')
    task = client.tasks.parse.create(
        file=f'data:application/pdf;base64,{base64_string}'
    )
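If you use the base64 route often, a small helper can keep the data-URL formatting in one place. The sketch below is a hypothetical helper (`file_to_data_url` is not part of the chunkr_ai SDK). It guesses the MIME type from the file extension with Python's standard `mimetypes` module and assumes the file is a type Chunkr can process, sent in the same `data:<mime>;base64,<payload>` shape used above.

```python
import base64
import mimetypes
from pathlib import Path


def file_to_data_url(path: str) -> str:
    """Encode a local file as a data URL (hypothetical helper, not part of chunkr_ai)."""
    # Guess the MIME type from the file extension; fall back to a generic binary type.
    mime_type, _ = mimetypes.guess_type(path)
    if mime_type is None:
        mime_type = "application/octet-stream"
    # Base64 output is pure ASCII, so decoding it as ASCII is safe.
    payload = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime_type};base64,{payload}"


# Usage (assumes a local doc.pdf exists):
# task = client.tasks.parse.create(file=file_to_data_url("doc.pdf"))
```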

Configuration

Most users can start without any configuration. If needed, you can set optional parameters when creating a task, such as expires_in (a data-retention period in seconds). For advanced options, see the API Reference.

Get Task

Retrieve information for any task using its task_id. There are several ways to get task results depending on your needs.

Get Completed Task

For tasks that have already completed processing, you can retrieve the results immediately:
from chunkr_ai import Chunkr
import os

client = Chunkr(api_key=os.environ['CHUNKR_API_KEY'])

# Get the task
task = client.tasks.parse.get(task_id='task_123')

# Access task info
print(f'Status: {task.status}')
if task.status == 'Succeeded' and task.output is not None:
    print(f'Chunks: {len(task.output.chunks)}')
    for chunk in task.output.chunks[:5]:
        if chunk.content is not None:
            print(f'- {chunk.content[:100]}...')
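The loop above always appends `...`, even to chunks shorter than 100 characters. A small formatting helper can collapse whitespace and add the ellipsis only when the text is actually cut. `preview` is a hypothetical helper, not part of the SDK; it works on any string, including `chunk.content`.

```python
def preview(text: str, limit: int = 100) -> str:
    """Return a single-line preview of text, truncated to `limit` characters."""
    # Collapse newlines and repeated spaces so each chunk prints on one line.
    flat = " ".join(text.split())
    if len(flat) <= limit:
        return flat
    # Trim whitespace left at the cut point before adding the ellipsis.
    return flat[:limit].rstrip() + "..."


# Usage inside the loop above:
# print(f"- {preview(chunk.content)}")
```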

Robust Polling with Retry Logic

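For tasks that are still processing, poll until the task completes. Rather than writing the loop by hand, we recommend a dedicated retry library, such as tenacity for Python or p-retry for TypeScript, which handles stop conditions and backoff (fixed or exponential) for you. The example below checks every 3 seconds for up to 1,500 attempts (about 75 minutes); if the task still has not completed by then, tenacity raises a RetryError.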
from chunkr_ai import Chunkr
from tenacity import retry, retry_if_result, stop_after_attempt, wait_fixed
import os

client = Chunkr(api_key=os.environ['CHUNKR_API_KEY'])


@retry(
    retry=retry_if_result(lambda result: not result.completed),
    stop=stop_after_attempt(1500),
    wait=wait_fixed(3),
)
def get_task(task_id):
    task = client.tasks.parse.get(task_id=task_id)

    print(f'Task ID: {task_id}, Status: {task.status}')
    return task


# Get task with polling
task = get_task('task_123')

print(task.status)  # A terminal status, e.g. "Succeeded"
if task.status == 'Succeeded' and task.output is not None:
    print(f'Found {len(task.output.chunks)} chunks')
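The tenacity example uses a fixed 3-second wait. If you would rather not add a dependency, or want exponential backoff, a plain-Python loop also works. This is a sketch: `poll_until` is a hypothetical helper, the default delays and timeout are illustrative rather than Chunkr recommendations, and it raises `TimeoutError` (instead of tenacity's `RetryError`) when time runs out.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    initial_delay: float = 1.0,
    max_delay: float = 30.0,
    timeout: float = 3600.0,
) -> T:
    """Call fetch() until is_done(result) is true, doubling the delay between calls.

    Raises TimeoutError if the result is still not done after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        result = fetch()
        if is_done(result):
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"still not done after {timeout} seconds")
        # Sleep for the current backoff, but never past the overall deadline.
        time.sleep(min(delay, remaining))
        # Exponential backoff, capped at max_delay.
        delay = min(delay * 2, max_delay)


# Usage with the client above (relies on the same `completed` field as the tenacity example):
# task = poll_until(
#     lambda: client.tasks.parse.get(task_id="task_123"),
#     lambda t: t.completed,
# )
```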

Get Task with Base64-Encoded Assets

By default, Chunkr provides access to generated files (such as images or PDF crops) through temporary pre-signed URLs that expire after 10 minutes. For long-term access, you can instead retrieve these assets as base64-encoded strings embedded directly in the task response. To do so, set base64_urls=True when fetching a task:
from chunkr_ai import Chunkr
import os

client = Chunkr(api_key=os.environ['CHUNKR_API_KEY'])

# Set base64_urls=True
# Assets are now embedded as base64 strings and won't expire
task = client.tasks.parse.get(task_id='task_123', base64_urls=True)
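To save an embedded asset to disk, decode the string back into bytes. The exact shape of the strings returned with base64_urls=True is an assumption here, so the hypothetical `decode_asset` helper below accepts either a data URL (`data:image/png;base64,...`) or a bare base64 string.

```python
import base64


def decode_asset(value: str) -> bytes:
    """Decode a base64 asset string into raw bytes (hypothetical helper).

    Accepts either a data URL or a bare base64 string, since the exact
    format returned with base64_urls=True is assumed, not confirmed.
    """
    if value.startswith("data:"):
        # Drop the "data:<mime>;base64," header and keep only the payload.
        _, _, value = value.partition(",")
    return base64.b64decode(value)


# Usage (asset_string stands in for a base64 field from the task fetched above):
# with open("crop.png", "wb") as out:
#     out.write(decode_asset(asset_string))
```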

Asynchronous Processing (Python)

For Python applications that require non-blocking operations, you can use the AsyncChunkr client instead of Chunkr. The async client provides the exact same methods and parameters, but all operations are awaitable.
import asyncio
import os

from chunkr_ai import AsyncChunkr
from tenacity import retry, retry_if_result, stop_after_attempt, wait_fixed


@retry(
    retry=retry_if_result(lambda result: not result.completed),
    stop=stop_after_attempt(25),
    wait=wait_fixed(3),
)
async def get_task(client: AsyncChunkr, task_id: str):
    return await client.tasks.parse.get(task_id=task_id)


async def process_document():
    client = AsyncChunkr(api_key=os.environ["CHUNKR_API_KEY"])

    # Create task
    task = await client.tasks.parse.create(file="https://chunkr.ai/doc.pdf")
    print(f"Task created with ID: {task.task_id}")

    # Get results
    task = await get_task(client, task.task_id)

    print(task.status)
    if task.status == "Succeeded" and task.output is not None:
        print(f"Processed {len(task.output.chunks)} chunks")


# Run with asyncio
asyncio.run(process_document())
Key points about async processing:
  • Import AsyncChunkr instead of Chunkr
  • Use await before all client method calls
  • All method names and parameters remain exactly the same
  • Well suited to applications that already use asyncio or run many operations concurrently
You don’t need to learn a different API: switch the client class and add await to your calls.
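When running many operations concurrently, you typically want to cap how many requests are in flight at once. The sketch below uses only `asyncio.Semaphore` and `asyncio.gather` from the standard library; `gather_limited` is a hypothetical helper, and the default limit of 5 is an arbitrary placeholder, not a documented Chunkr rate limit.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def gather_limited(
    items: Iterable[T],
    worker: Callable[[T], Awaitable[R]],
    limit: int = 5,
) -> List[R]:
    """Run worker(item) for every item, with at most `limit` running at once.

    Results come back in the same order as `items`.
    """
    semaphore = asyncio.Semaphore(limit)

    async def run_one(item: T) -> R:
        # Each call waits for a free slot before starting.
        async with semaphore:
            return await worker(item)

    return await asyncio.gather(*(run_one(item) for item in items))


# Usage with the async client (inside an async function):
# tasks = await gather_limited(urls, lambda url: client.tasks.parse.create(file=url))
```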

Data Retention

By default, we store all outputs, original files, and image crops, but you can also use Chunkr purely as a processing engine. For security and privacy, set the expires_in parameter (in seconds) to automatically delete all task data from Chunkr’s servers after that period. The example below sets the data to expire after 24 hours for Zero Data Retention; use the get methods described above to retrieve your results before the data expires:
from chunkr_ai import Chunkr
import os

client = Chunkr(api_key=os.environ['CHUNKR_API_KEY'])

# Set expires_in for Zero Data Retention (ZDR)
task = client.tasks.parse.create(
    file='https://chunkr.ai/doc.pdf',
    expires_in=24 * 60 * 60,  # 24 hours, expressed in seconds (86,400)
)
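Because expires_in is given in seconds, building the value from a `timedelta` can be less error-prone than multiplying by hand. The helper name `retention_seconds` is hypothetical, and the computed deadline is only a local estimate measured from when the task is created; Chunkr's servers decide when data is actually deleted.

```python
from datetime import datetime, timedelta, timezone


def retention_seconds(period: timedelta) -> int:
    """Convert a retention period into whole seconds for expires_in (hypothetical helper)."""
    seconds = int(period.total_seconds())
    if seconds <= 0:
        raise ValueError("retention period must be positive")
    return seconds


expires_in = retention_seconds(timedelta(hours=24))  # 86400
# Rough local estimate of when the data will be deleted, measured from now.
expires_at = datetime.now(timezone.utc) + timedelta(seconds=expires_in)
print(f"expires_in={expires_in}, fetch results before ~{expires_at.isoformat()}")
```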

Advanced Features

While creating and reading tasks are the most common operations, Chunkr also provides functionality for more advanced task management:
  • List Tasks: View all your tasks with pagination, filtering, and sorting options.
  • Delete Tasks: Permanently remove completed or failed tasks to clean up your workspace.
  • Cancel Tasks: Stop a queued task before it begins processing if it’s no longer needed.
For detailed information on these operations, see the API Reference.