Pipeline

In addition to Chunkr’s default models, we provide a pipeline interface that lets you use Azure Document Intelligence as a provider. When you choose Azure, your files are processed by the Azure layout analysis, OCR, and table OCR models instead of the default models.

You can still leverage Chunkr’s intelligent chunking and segment processing. The output will be mapped to the Chunkr output format.
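To make the mapping step more concrete, here is a minimal sketch of how paragraph roles from an Azure layout result could be translated into Chunkr-style segment types. The role strings (`title`, `sectionHeading`, `pageHeader`, `pageFooter`, `footnote`) follow Azure Document Intelligence's layout model. The `Segment` dataclass, the mapping table, and `map_azure_paragraph` are hypothetical illustrations, not Chunkr's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical mapping from Azure Document Intelligence paragraph roles
# to Chunkr-style segment type names (an assumption for illustration).
AZURE_ROLE_TO_SEGMENT_TYPE = {
    "title": "Title",
    "sectionHeading": "SectionHeader",
    "pageHeader": "PageHeader",
    "pageFooter": "PageFooter",
    "footnote": "Footnote",
}


@dataclass
class Segment:
    """Hypothetical, simplified stand-in for a Chunkr output segment."""
    segment_type: str
    content: str
    page_number: int


def map_azure_paragraph(content: str, role: Optional[str], page_number: int) -> Segment:
    # Paragraphs with no role, or a role not in the table, become plain text.
    segment_type = AZURE_ROLE_TO_SEGMENT_TYPE.get(role or "", "Text")
    return Segment(segment_type=segment_type, content=content, page_number=page_number)


segments = [
    map_azure_paragraph("Quarterly Report", "title", 1),
    map_azure_paragraph("Revenue grew in Q3.", None, 1),
]
print([s.segment_type for s in segments])  # ['Title', 'Text']
```

Falling back to a generic text type keeps every Azure paragraph in the output even when its role has no direct counterpart.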

When to use Azure

If our queue is full, you can use Azure to process your files.
If you don’t need VLMs on your tables, you can use the Azure table OCR model instead, which returns results much faster.
If you need higher-quality OCR, you can use Azure’s OCR model (we are working on improving ours!).

We improve the outputs from Azure with a combination of last-mile engineering and LLMs. In our testing, the hybrid approach (traditional layout analysis + OCR for simple elements and LLMs for complex elements) produces the most accurate results.
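The hybrid approach described above amounts to routing each segment by its type. The sketch below assumes, purely for illustration, that tables, pictures, and formulas count as "complex" while everything else keeps its OCR text. `route_segment`, `COMPLEX_TYPES`, and the `llm` callable are hypothetical placeholders, not Chunkr's actual logic or API; a real VLM call would typically receive the segment image rather than OCR text.

```python
from typing import Callable

# Assumption: which segment types count as "complex" is illustrative only.
COMPLEX_TYPES = {"Table", "Picture", "Formula"}


def route_segment(segment_type: str, ocr_text: str, llm: Callable[[str], str]) -> str:
    """Return segment content: OCR text for simple elements, LLM output for complex ones."""
    if segment_type in COMPLEX_TYPES:
        # Complex elements go to a model call (a placeholder callable here).
        return llm(ocr_text)
    # Simple elements keep the traditional layout analysis + OCR result.
    return ocr_text


def fake_llm(text: str) -> str:
    # Stand-in for a real LLM/VLM call so the sketch runs offline.
    return f"<llm>{text}</llm>"


print(route_segment("Text", "Hello", fake_llm))  # Hello
print(route_segment("Table", "a b", fake_llm))   # <llm>a b</llm>
```

Routing by type keeps the cheap OCR path for the bulk of a document and spends model calls only where layout-heavy elements need them.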

Example

Use default segment processing and chunking with the Chunkr layout analysis model and OCR model.

from chunkr_ai import Chunkr
from chunkr_ai.models import (
    Configuration,
    Pipeline
)

chunkr = Chunkr()

# Process the file with Chunkr's own layout analysis and OCR models
chunkr.upload("path/to/file", Configuration(
    pipeline=Pipeline.CHUNKR
))

Use default chunking with the Azure layout analysis, OCR, and table OCR models. In this case, the content of the Table segment is generated by the Azure table OCR model.

from chunkr_ai import Chunkr
from chunkr_ai.models import (
    Configuration,
    GenerationConfig,
    GenerationStrategy,
    SegmentProcessing,
    Pipeline,
    SegmentFormat
)

chunkr = Chunkr()

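chunkr.upload("path/to/file", Configuration(
    segment_processing=SegmentProcessing(
        # AUTO keeps the table OCR output instead of calling an LLM;
        # the Table content is returned as Markdown
        Table=GenerationConfig(
            format=SegmentFormat.MARKDOWN,
            strategy=GenerationStrategy.AUTO
        ),
    ),
    # Route layout analysis and OCR through Azure Document Intelligence
    pipeline=Pipeline.AZURE,
))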
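In the Azure example above, Table content comes from the table OCR model and is returned as Markdown. As a rough illustration of that conversion, the sketch below renders table cells into a Markdown table. The cell dictionaries follow the shape of Azure Document Intelligence table cells (`rowIndex`, `columnIndex`, `content`), but `cells_to_markdown` is a hypothetical helper, not Chunkr's code. It ignores row and column spans and treats the first row as the header.

```python
from typing import Dict, List


def cells_to_markdown(cells: List[Dict], row_count: int, column_count: int) -> str:
    """Render table cells (rowIndex/columnIndex/content) as a Markdown table.

    Hypothetical helper: row and column spans are ignored, and row 0 is
    treated as the header row.
    """
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        # Escape pipes so cell text cannot break the Markdown table syntax.
        grid[cell["rowIndex"]][cell["columnIndex"]] = cell["content"].replace("|", "\\|")

    lines = [
        "| " + " | ".join(grid[0]) + " |",
        "| " + " | ".join("---" for _ in range(column_count)) + " |",
    ]
    for row in grid[1:]:
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)


cells = [
    {"rowIndex": 0, "columnIndex": 0, "content": "Quarter"},
    {"rowIndex": 0, "columnIndex": 1, "content": "Revenue"},
    {"rowIndex": 1, "columnIndex": 0, "content": "Q3"},
    {"rowIndex": 1, "columnIndex": 1, "content": "120"},
]
print(cells_to_markdown(cells, row_count=2, column_count=2))
```

Building a full grid first means missing cells render as empty columns rather than shifting the rest of the row.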
