Chunking is the process of splitting a document into smaller segments. These chunks can be used for semantic search and to improve LLM performance.

Using layout analysis, we create chunks that preserve document structure and context. Our algorithm:

  • Respects natural document boundaries (paragraphs, sections)
  • Maintains semantic relationships between segments
  • Optimizes chunk size for LLM processing
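
To make the idea concrete, here is a minimal sketch of boundary-aware chunking. It is not the implementation in our repository: the function name chunk_paragraphs is hypothetical, the input is assumed to be already split into paragraphs, and length is assumed to be measured in whitespace-separated words. It greedily packs whole paragraphs into a chunk until the next one would push it past the target.

```python
from typing import List


def chunk_paragraphs(paragraphs: List[str], target_length: int = 512) -> List[str]:
    """Greedily pack whole paragraphs into chunks of about target_length words.

    Illustrative only: length is counted in whitespace-separated words,
    which is an assumption made for this sketch.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_len = 0

    for para in paragraphs:
        n_words = len(para.split())
        # Close the current chunk if this paragraph would push it past the target.
        if current and current_len + n_words > target_length:
            chunks.append("\n\n".join(current))
            current, current_len = [], 0
        # Paragraphs are never split, so boundaries are always respected.
        current.append(para)
        current_len += n_words

    if current:
        chunks.append("\n\n".join(current))
    return chunks


if __name__ == "__main__":
    paragraphs = ["a b c", "d e", "f g h i"]
    for chunk in chunk_paragraphs(paragraphs, target_length=5):
        print(repr(chunk))
```

In this sketch a paragraph longer than the target is kept whole as its own chunk rather than being cut mid-paragraph, which trades a strict size limit for intact boundaries.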

You can review the implementation in our GitHub repository.

from chunkr_ai import Chunkr
from chunkr_ai.models import (
    ChunkProcessing,
    Configuration
)

chunkr = Chunkr()

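# Upload a file with custom chunking settings
chunkr.upload("path/to/file", Configuration(
    chunk_processing=ChunkProcessing(
        ignore_headers_and_footers=True,  # exclude page headers and footers from chunks (the default)
        target_length=1024,  # target chunk size; overrides the default of 512
    ),
))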

Defaults

  • ignore_headers_and_footers: True
  • target_length: 512