All processing in Chunkr is handled through a task-based system. When you submit a file, a new task is created and you receive a task_id. Use this ID to check the task's status and retrieve its results later.
This asynchronous approach allows you to submit long-running jobs without tying up your application. Once the task status is Succeeded, the full processing results are available.
Key Features
- Scalability: Handle millions of files without tying up your infrastructure.
- Broad File Support: Process a variety of file types, such as PDFs, Excel spreadsheets, PowerPoint presentations, and Word documents.
- Multiple Input Sources: Provide files from a local path, from a URL, or as a base64-encoded string.
- Data Retention: Set custom expiration times for automatic data deletion.
- Webhook Support: Receive real-time notifications when tasks complete.
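For the base64 input source listed above, the encoding step itself needs only the Python standard library. The following is a minimal sketch: `file_to_base64` is a hypothetical helper name, not part of the Chunkr SDK, and how the resulting string is passed to Chunkr is not shown here.

```python
import base64
from pathlib import Path


def file_to_base64(path: str) -> str:
    """Read a file from disk and return its contents as a base64 string.

    Hypothetical helper for illustration; it is not part of the Chunkr SDK.
    """
    raw_bytes = Path(path).read_bytes()
    # b64encode returns bytes; decode to get a plain ASCII string.
    return base64.b64encode(raw_bytes).decode("ascii")
```

The whole file is read into memory before encoding, so this approach suits small to medium documents; for large files, a local path or URL is likely the better input source.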
Example: Upload a file, create a task, and get results
Here’s how to upload a file, create a parse task, and retrieve the results.
import os
import time

from chunkr_ai import Chunkr

# Initialize the client
client = Chunkr(api_key=os.environ["CHUNKR_API_KEY"])

# 1. Upload a local file
with open("path/to/doc.pdf", "rb") as f:
    uploaded = client.files.create(file=f)

# 2. Create a parse task using the uploaded file URL
parse_task = client.tasks.parse.create(file=uploaded.url)
print(f"Task created with ID: {parse_task.task_id}")

# 3. Wait for the task to complete
while not parse_task.completed:
    print(f"Task status: {parse_task.status}")
    time.sleep(3)
    parse_task = client.tasks.parse.get(task_id=parse_task.task_id)

# 4. Access the results
if parse_task.status == "Succeeded" and parse_task.output is not None:
    print("Task completed successfully!")
    print(f"Document has {len(parse_task.output.chunks)} chunks")
else:  # Could be "Failed" or "Cancelled"
    print(f"Task status: {parse_task.status}")
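The polling loop in step 3 runs until the task completes, with no upper bound on how long it waits. One way to bound it is a small generic helper that adds a timeout and a growing delay between checks. This is a sketch under stated assumptions: `wait_until_done` and its parameters are hypothetical names introduced for illustration, not part of the Chunkr SDK, and the default timings are arbitrary.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_until_done(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    timeout: float = 300.0,
    interval: float = 3.0,
    max_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fetch() until is_done(result) is true or the timeout elapses.

    Hypothetical helper for illustration; not part of the Chunkr SDK.
    Raises TimeoutError if the deadline passes before is_done returns True.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_done(result):
            return result
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError(f"task not finished after {timeout} seconds")
        # Never sleep past the deadline.
        sleep(min(interval, remaining))
        # Double the delay after each check, up to max_interval.
        interval = min(interval * 2, max_interval)
```

With the client from the example, step 3 could then be written as `parse_task = wait_until_done(lambda: client.tasks.parse.get(task_id=parse_task.task_id), lambda t: t.completed)`. The `sleep` parameter exists so the helper can be exercised in tests without real delays.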