Intelligent chunking
Chunking
Chunking is the process of splitting a document into smaller segments. These chunks can be used for semantic search and to improve LLM performance.
By leveraging layout analysis, we create intelligent chunks that preserve document structure and context. Our algorithm:
- Respects natural document boundaries (paragraphs, sections)
- Maintains semantic relationships between segments
- Optimizes chunk size for LLM processing
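The idea behind those three properties can be illustrated with a minimal greedy sketch. It assumes layout analysis has already produced an ordered list of labeled segments, and it makes several assumptions that are not from this page: the `Segment` class and the `"SectionHeader"` label are hypothetical, and length is approximated by word count rather than model tokens. Whole segments are packed into chunks up to `target_length`, and a new chunk starts at every section header. This is not necessarily how the implementation in the GitHub repository works.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical layout segment: a type label from layout analysis plus its text.
    segment_type: str
    text: str


def word_count(text: str) -> int:
    # Assumption: length is approximated by whitespace-separated words, not model tokens.
    return len(text.split())


def chunk_segments(segments: list[Segment], target_length: int = 512) -> list[list[Segment]]:
    """Greedily pack whole segments into chunks of at most ~target_length words."""
    chunks: list[list[Segment]] = []
    current: list[Segment] = []
    current_len = 0
    for seg in segments:
        seg_len = word_count(seg.text)
        # Close the current chunk at a section boundary (hypothetical "SectionHeader"
        # label) or when adding this segment would exceed the target length.
        starts_section = seg.segment_type == "SectionHeader"
        too_long = current_len + seg_len > target_length
        if current and (starts_section or too_long):
            chunks.append(current)
            current, current_len = [], 0
        # Segments are never split, so an oversized paragraph becomes its own chunk.
        current.append(seg)
        current_len += seg_len
    if current:
        chunks.append(current)
    return chunks


doc = [
    Segment("SectionHeader", "Introduction"),
    Segment("Text", "one two three"),
    Segment("Text", "four five"),
    Segment("SectionHeader", "Methods"),
    Segment("Text", "six seven eight nine"),
]
print([[s.text for s in chunk] for chunk in chunk_segments(doc, target_length=6)])
# [['Introduction', 'one two three', 'four five'], ['Methods', 'six seven eight nine']]
```

Because segments are kept whole, paragraph boundaries are always respected, at the cost of chunks that sometimes fall short of the target length.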
You can review the implementation in our GitHub repository.
Defaults
- ignore_headers_and_footers: True
- target_length: 512
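To show what the `ignore_headers_and_footers` default implies, here is a small sketch that drops running page headers and footers before chunking. The `ChunkingConfig` class and the `"PageHeader"`/`"PageFooter"` labels are hypothetical stand-ins chosen for illustration; they mirror the documented default values but are not the product's actual configuration API.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkingConfig:
    # Hypothetical settings object mirroring the documented defaults.
    ignore_headers_and_footers: bool = True
    target_length: int = 512


# Hypothetical layout labels for running page headers and footers.
HEADER_FOOTER_TYPES = {"PageHeader", "PageFooter"}


def filter_segments(segments: list[tuple[str, str]], config: ChunkingConfig) -> list[tuple[str, str]]:
    """Drop page headers/footers before chunking when the option is enabled."""
    if not config.ignore_headers_and_footers:
        return list(segments)
    return [(kind, text) for kind, text in segments if kind not in HEADER_FOOTER_TYPES]


page = [
    ("PageHeader", "Company Report"),
    ("Text", "Revenue grew this quarter."),
    ("PageFooter", "Page 4"),
]
print(filter_segments(page, ChunkingConfig()))
# [('Text', 'Revenue grew this quarter.')]
```

Removing this repeated page furniture keeps it from being duplicated into every chunk that spans a page break.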