The Chunkr AI API lets you specify a segmentation_strategy for each document. This setting controls how the document is split into segments.

We have two strategies:

  • LayoutAnalysis: Run our state-of-the-art layout analysis model to identify the layout elements. This is the default strategy.
  • Page: Each segment is a page.

To configure the segmentation strategy, pass it in the Configuration when you upload a file:

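from chunkr_ai import Chunkr
from chunkr_ai.models import Configuration, SegmentationStrategy

# Create the client.
chunkr = Chunkr()

# Upload a file with the segmentation strategy set explicitly.
# LAYOUT_ANALYSIS is the default, so it is spelled out here only for illustration.
chunkr.upload("path/to/file", Configuration(
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS
))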

When to use each strategy

For most documents, we recommend the LayoutAnalysis strategy. Because it identifies individual layout elements rather than treating each page as a single segment, it generally gives the best results.

Use Page for:

  • Faster processing, when you need quick results and layout isn’t critical
  • Documents with unusual layouts that confuse the layout analysis model
  • Documents whose layout is complex but not very information-dense, where Page + VLM can generate surprisingly good HTML and markdown (see Segment Processing)
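
The guidance above can be sketched as a small decision helper. This is an illustration only: DocumentTraits and choose_strategy are hypothetical names that are not part of the Chunkr SDK, and the traits are assumptions about what you know about a document ahead of time. The helper returns the strategy name as a plain string so the sketch runs without the SDK installed.

```python
from dataclasses import dataclass


@dataclass
class DocumentTraits:
    """Hypothetical description of a document; not part of the Chunkr API."""
    needs_fast_results: bool = False
    layout_is_critical: bool = True
    confuses_layout_model: bool = False
    complex_but_sparse: bool = False


def choose_strategy(traits: DocumentTraits) -> str:
    """Return "Page" or "LayoutAnalysis" following the guidance above."""
    # Speed matters and layout does not: skip layout analysis.
    if traits.needs_fast_results and not traits.layout_is_critical:
        return "Page"
    # Unusual layouts that the layout analysis model handles poorly.
    if traits.confuses_layout_model:
        return "Page"
    # Complex but not information-dense layouts: Page plus a VLM may work well.
    if traits.complex_but_sparse:
        return "Page"
    # Default recommendation for most documents.
    return "LayoutAnalysis"


print(choose_strategy(DocumentTraits()))
print(choose_strategy(DocumentTraits(confuses_layout_model=True)))
```

The returned name can then be mapped to the corresponding SegmentationStrategy member before calling upload, such as SegmentationStrategy.LAYOUT_ANALYSIS from the example earlier. The member name for the Page strategy is not shown in this section, so check the SDK before relying on it.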