Chunkr AI is designed to be configurable, so there is likely a configuration that fits your use case. The examples below are a starting point.

Complex Documents with Extended Context

For documents with complex layouts where elements need surrounding context (such as tables with separate legends, charts with explanatory text, or images that need page context), enable extended context processing:

from chunkr_ai import Chunkr
from chunkr_ai.models import (
    Configuration,
    GenerationConfig,
    GenerationStrategy,
    SegmentProcessing
)

chunkr = Chunkr()

config = Configuration(
    high_resolution=True,
    segment_processing=SegmentProcessing(
        Table=GenerationConfig(
            html=GenerationStrategy.LLM,
            markdown=GenerationStrategy.LLM,
            extended_context=True
        ),
        Picture=GenerationConfig(
            html=GenerationStrategy.LLM,
            markdown=GenerationStrategy.LLM,
            extended_context=True
        ),
        Formula=GenerationConfig(
            html=GenerationStrategy.LLM,
            markdown=GenerationStrategy.LLM,
            extended_context=True
        )
    )
)

# Upload the document using the extended-context configuration above
chunkr.upload("path/to/file", config)

Pre-signed URLs and Base64 Alternatives

When retrieving tasks, Chunkr generates pre-signed URLs for accessing:

  • Images
  • Input File
  • PDF File

These URLs expire after 10 minutes. If you need the content to outlive that window, or want to store it yourself, you can request base64-encoded URLs instead:

from chunkr_ai import Chunkr

chunkr = Chunkr()

# Get task with base64-encoded URLs instead of pre-signed URLs
task = chunkr.get_task("task_123", base64_urls=True)
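Once you have a base64-encoded value, you will usually want to turn it back into bytes and write it to disk. The following is a minimal sketch using only the Python standard library. It assumes the value is either a raw base64 string or a `data:<mime>;base64,<payload>` URI; the exact shape Chunkr returns, and the task fields it appears in, are not covered here, so check the task response before relying on this. The helper names `decode_base64_payload` and `save_payload` are hypothetical and not part of the Chunkr SDK.

```python
import base64
from pathlib import Path


def decode_base64_payload(value: str) -> bytes:
    """Decode a raw base64 string or a base64 data URI into bytes.

    Assumption: the input is either plain base64 text or a
    `data:<mime>;base64,<payload>` URI.
    """
    value = value.strip()
    if value.startswith("data:"):
        # Split the "data:<mime>;base64" header from the payload.
        header, sep, payload = value.partition(",")
        if not sep or not header.endswith(";base64"):
            raise ValueError("expected a base64-encoded data URI")
        value = payload
    # validate=True rejects characters outside the base64 alphabet
    # instead of silently discarding them.
    return base64.b64decode(value, validate=True)


def save_payload(value: str, destination: str) -> Path:
    """Decode `value` and write the bytes to `destination`."""
    path = Path(destination)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(decode_base64_payload(value))
    return path
```

A call such as `save_payload(encoded_value, "output/document.pdf")` would then store the decoded file locally, where `encoded_value` stands for whichever base64 field you read from the task.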