All processing in Chunkr is handled through a task-based system. When you submit a file, Chunkr creates a new task and returns a task_id. Use this ID to check the task's status and retrieve its results asynchronously. Because of this, you can submit long-running jobs without blocking your application. Once the task status is Succeeded, the full processing results are available.
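To make the submit-then-poll pattern concrete, here is a toy in-memory model of it. It is a minimal sketch of the general pattern, not how Chunkr is implemented: the `TaskStore` class is hypothetical, and only the `Succeeded` status comes from this page. The `Processing` and `Failed` labels are assumptions.

```python
import time
import uuid
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Dict


class TaskStore:
    """Hypothetical toy model of a submit-then-poll task API (not Chunkr's code)."""

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=4)
        self._futures: Dict[str, Future] = {}

    def submit(self, job: Callable[[], object]) -> str:
        # Return an ID immediately; the job itself runs on a background thread.
        task_id = str(uuid.uuid4())
        self._futures[task_id] = self._executor.submit(job)
        return task_id

    def get(self, task_id: str) -> dict:
        # Report the task's state; "Processing" and "Failed" are assumed labels.
        future = self._futures[task_id]
        if not future.done():
            return {"task_id": task_id, "status": "Processing", "output": None}
        if future.exception() is not None:
            return {"task_id": task_id, "status": "Failed", "output": None}
        return {"task_id": task_id, "status": "Succeeded", "output": future.result()}


def slow_parse() -> str:
    time.sleep(0.2)  # stand-in for a long-running parse job
    return "parsed"


if __name__ == "__main__":
    store = TaskStore()
    task_id = store.submit(slow_parse)  # returns without waiting for the job
    while (task := store.get(task_id))["status"] == "Processing":
        time.sleep(0.05)  # the caller stays free to do other work between checks
    print(task["status"], task["output"])  # Succeeded parsed
```

The caller only ever holds an ID and asks for status, which is what lets the client stay responsive while the work runs elsewhere.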

Key Features

  • Scalability: Handle millions of files without tying up your infrastructure.
  • Broad File Support: Process a variety of file types, such as PDF, Excel, PowerPoint, and Word files.
  • Multiple Input Sources: Provide files from a local path, from a URL, or as a base64-encoded string.
  • Data Retention: Set custom expiration times for automatic data deletion.
  • Webhook Support: Receive real-time notifications when tasks complete.
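For the base64 input option, a file first has to be encoded. The sketch below does that with the Python standard library. The helper name `file_to_base64` is hypothetical, and whether Chunkr expects a raw base64 string or a data URI is not stated on this page, so passing the result to the `file=` argument of `client.tasks.parse.create` is an assumption.

```python
import base64
from pathlib import Path


def file_to_base64(path: str) -> str:
    """Hypothetical helper: read a local file and return it as a base64 ASCII string."""
    raw = Path(path).read_bytes()
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    import os
    import tempfile

    # Write a small throwaway file so the example runs on its own.
    with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as tmp:
        tmp.write(b"hello chunkr")
    try:
        encoded = file_to_base64(tmp.name)
        print(encoded)  # aGVsbG8gY2h1bmty
        # Assumption: the SDK accepts this string via the same `file=` argument,
        # e.g. client.tasks.parse.create(file=encoded)
    finally:
        os.remove(tmp.name)
```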

Example: Upload a file, create a task, and get results

Here’s how to upload a file, create a parse task, and retrieve the results:
import os
import time

from chunkr_ai import Chunkr

# Initialize the client
client = Chunkr(api_key=os.environ["CHUNKR_API_KEY"])

# 1. Upload a local file
with open("path/to/doc.pdf", "rb") as f:
    uploaded = client.files.create(file=f)

# 2. Create a parse task using the uploaded file URL
task = client.tasks.parse.create(file=uploaded.url)
print(f"Task created with ID: {task.task_id}")

# 3. Poll until the task completes
while True:
    task = client.tasks.parse.get(task_id=task.task_id)
    if task.completed:
        break
    print(f"Task status: {task.status}")
    time.sleep(3)  # pause between status checks

# 4. Access the results
if task.output is not None:
    print("Task completed successfully!")
    print(f"Document has {len(task.output.chunks)} chunks")
else:
    print(f"Task finished without output. Status: {task.status}")
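The polling loop in step 3 has no upper bound, so it would wait forever if a task never completes. A minimal sketch of a more defensive poller adds a deadline and exponential backoff. `poll_until` and its parameters are hypothetical helpers, not part of the chunkr_ai SDK, and the default intervals and 600-second timeout are arbitrary assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    interval: float = 1.0,
    max_interval: float = 30.0,
    timeout: float = 600.0,
) -> T:
    """Hypothetical helper: call fetch() until is_done(result) is true.

    Doubles the wait after each unfinished check (capped at max_interval)
    and raises TimeoutError once the deadline has passed.
    """
    deadline = time.monotonic() + timeout
    delay = interval
    while True:
        result = fetch()
        if is_done(result):
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"task not finished after {timeout} seconds")
        # Never sleep past the deadline, then back off for the next check.
        time.sleep(min(delay, remaining))
        delay = min(delay * 2, max_interval)


if __name__ == "__main__":
    # Stand-in for a status call: reports Succeeded on the third check.
    calls = {"n": 0}

    def fake_fetch() -> dict:
        calls["n"] += 1
        return {"status": "Succeeded" if calls["n"] >= 3 else "Processing"}

    result = poll_until(fake_fetch, lambda r: r["status"] == "Succeeded", interval=0.01)
    print(result["status"], calls["n"])  # Succeeded 3
    # With the client from the example, usage might look like (assumption):
    # poll_until(lambda: client.tasks.parse.get(task_id=task_id), lambda t: t.completed)
```

Backing off keeps the number of status requests low for long jobs, while the deadline turns a stuck task into an explicit error instead of a hung process.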