Configuration - Chunkr

Different applications have different needs. Chunkr AI API is designed to be flexible and customizable to meet your specific requirements. We support the following configuration options:

chunk_processing: Controls the setting for the chunking and post-processing of each chunk.
expires_in: The number of seconds until task is deleted.
ocr_strategy: Controls the Optical Character Recognition (OCR) strategy.
pipeline: Options for layout analysis and OCR providers.
segment_processing: Controls the post-processing of each segment type. Allows you to generate HTML, markdown and run custom VLM prompts.
segmentation_strategy: Controls the segmentation strategy

The configuration options can be combined to create a customized processing pipeline. When a Task is created, the configuration is done through the Configuration object. Here is an example of how to configure the API to run a custom VLM prompt on each picture in a document:

from chunkr_ai import Chunkr
from chunkr_ai.models import (
    Configuration, 
    GenerationConfig, 
    GenerationStrategy, 
    SegmentProcessing, 
    SegmentationStrategy,
    SegmentFormat
)

chunkr = Chunkr()

chunkr.upload("path/to/file", Configuration(
    segment_processing=SegmentProcessing(
        picture=GenerationConfig(
            format=SegmentFormat.MARKDOWN,
            strategy=GenerationStrategy.LLM,
            llm="Does this picture have a cat in it? Answer must be true or false."
        )
    ),
))