Chunkr AI is an API service to convert complex documents into LLM/RAG-ready data. We support a wide range of document types, including PDFs, Office files (Word, Excel, PowerPoint), and images.

Getting Started

To get started with Chunkr AI, follow these simple steps to set up your account and integrate our API into your application.

Step 1: Sign Up and Create an API Key

  1. Visit Chunkr AI (https://chunkr.ai)
  2. Click on “Login” and create your account
  3. Once logged in, navigate to “API Keys” in the dashboard

Step 2: Install our client SDK

pip install chunkr-ai

Step 3: Upload your document

from chunkr_ai import Chunkr

# Initialize the Chunkr client with your API key - get this from https://chunkr.ai
chunkr = Chunkr(api_key="your_api_key")

# Upload a document via URL or local file path
url = "https://chunkr-web.s3.us-east-1.amazonaws.com/landing_page/input/specs.pdf"
task = chunkr.upload(url)

Step 4: Export the results

The upload call returns a TaskResponse object, which contains the results of the document conversion. You can export the results in several formats, either writing them to a file or loading them into a variable.

# Export HTML of document
html = task.html(output_file="output.html")

# Export markdown of document
markdown = task.markdown(output_file="output.md")

# Export text of document
content = task.content(output_file="output.txt")

# Export result as JSON - TaskResponse is already in memory so no need to load it into a variable
task.json(output_file="output.json")

Step 5: Explore the output

The output of the task can be used to build your RAG pipeline. Check out the API Reference for more details.

# The output of the task is a list of chunks
chunks = task.output.chunks

# Each chunk contains a list of segments
for chunk in chunks:
    for segment in chunk.segments:
        print(segment.segment_type)

# You can also access the `embed` field in the chunk 
# for content to be used in RAG pipelines
for chunk in chunks:
    print(chunk.embed)
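
As one way to hand these chunks to a RAG pipeline, the sketch below gathers each chunk's `embed` text together with the segment types it contains. The helper name `collect_embed_records` is hypothetical and not part of the SDK; it relies only on the `embed`, `segments`, and `segment_type` attributes shown above. The demo uses stand-in objects with made-up values instead of a real task, so it runs without an API key.

```python
from types import SimpleNamespace


def collect_embed_records(chunks):
    """Return one dict per chunk: its embed text and its segment types.

    Hypothetical helper, not part of chunkr_ai. Chunks with an empty or
    missing `embed` value are skipped.
    """
    records = []
    for index, chunk in enumerate(chunks):
        text = getattr(chunk, "embed", None)
        if not text:
            continue
        records.append({
            "chunk_index": index,
            "text": text,
            "segment_types": [s.segment_type for s in chunk.segments],
        })
    return records


# Stand-in data shaped like task.output.chunks (values are made up)
demo_chunks = [
    SimpleNamespace(embed="Intro text", segments=[SimpleNamespace(segment_type="Title")]),
    SimpleNamespace(embed="", segments=[]),
]
demo_records = collect_embed_records(demo_chunks)
```

With a real task, the same helper would be called as `collect_embed_records(task.output.chunks)`.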

Step 6: Clean up

You can clean up the open connections by calling the close() method on the Chunkr client.

chunkr.close()
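
If processing can raise an exception, a `try`/`finally` block is one way to make sure `close()` still runs. The sketch below wraps that pattern in a hypothetical helper, `run_and_close`, which is not part of the SDK. It works with any object that exposes a `close()` method, and the demo uses a stand-in client rather than a real one.

```python
def run_and_close(client, work):
    """Call work(client) and always close the client afterwards.

    Hypothetical helper, not part of chunkr_ai.
    """
    try:
        return work(client)
    finally:
        client.close()


class FakeClient:
    """Stand-in for a Chunkr client, used only for this demo."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


fake = FakeClient()
result = run_and_close(fake, lambda client: "done")
```

With the real client, pass the `Chunkr` instance and a function that performs the upload and any exports you need before the connection is closed.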

Authentication Options

You can authenticate with the Chunkr AI API in two ways:

  1. Direct API Key - Pass your API key directly when initializing the client
  2. Environment Variable - Set CHUNKR_API_KEY in your .env file

from chunkr_ai import Chunkr

# Option 1: Initialize with API key directly
chunkr = Chunkr(api_key="your_api_key")

# Option 2: Initialize without api_key parameter - will use CHUNKR_API_KEY from environment
chunkr = Chunkr()

Self Hosted

If you’re using a self-hosted deployment of Chunkr AI, you can configure the API URL when initializing the client:

from chunkr_ai import Chunkr

# Option 1: With direct API key
chunkr = Chunkr(
    api_key="your_api_key",
    base_url="https://your-self-hosted-chunkr.com"
)

# Option 2: Using environment variables
# Set CHUNKR_API_KEY and CHUNKR_URL in your .env file
chunkr = Chunkr()

When using environment variables for self-hosted deployments, set both CHUNKR_API_KEY and CHUNKR_URL in your .env file.
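
To catch a missing variable before the client is created, a small check like the one below can help. This is a sketch: `self_hosted_kwargs` is a hypothetical helper, not part of the SDK, and it assumes the variables are already present in the process environment (for example, after your .env file has been loaded). It uses only the `api_key` and `base_url` parameters shown above.

```python
import os


def self_hosted_kwargs(env=None):
    """Build keyword arguments for Chunkr(...) from environment variables.

    Hypothetical helper, not part of chunkr_ai. Raises ValueError naming
    every variable that is missing or empty.
    """
    env = os.environ if env is None else env
    names = {"api_key": "CHUNKR_API_KEY", "base_url": "CHUNKR_URL"}
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise ValueError("Missing environment variables: " + ", ".join(missing))
    return {param: env[var] for param, var in names.items()}


# Demo with an explicit mapping instead of the real environment
demo_kwargs = self_hosted_kwargs({
    "CHUNKR_API_KEY": "your_api_key",
    "CHUNKR_URL": "https://your-self-hosted-chunkr.com",
})
```

The result can then be passed to the client as `Chunkr(**self_hosted_kwargs())`.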
