Segment Processing
Post-processing of segments
Chunkr provides various post-processing capabilities. Once segments have been extracted, you can use our defaults or configure how each segment type is processed.
Processing Methods
- Vision Language Models (VLM): Leverage AI models to generate HTML/Markdown content and run custom prompts
- Heuristic-based Processing: Apply rule-based algorithms for consistent HTML/Markdown generation
Additional Features
- Cropping: Return the cropped image of each segment
These processing options allow you to build highly specific pipelines. Our default processing works for most documents and RAG use cases.
Defaults
By default, Chunkr applies the following processing strategies for each segment type.
You can override these defaults by specifying custom configuration in your SegmentProcessing settings.
HTML and Markdown are always returned.
Example
Here is a quick example of how to use Chunkr to process a document with different segment processing configurations. This configuration will:
- Summarize the key trends of all Table segments
- Crop all SectionHeader segments to the bounding box
- Generate HTML using heuristics and Markdown using a VLM for all Text segments
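As a rough illustration, the three overrides listed above could be written down as a plain data structure like the one below. This is a sketch and not the Chunkr SDK: the SegmentConfig class, its field names (html, markdown, crop_image, llm), and the strategy values ("Auto", "LLM", "All") are assumptions made for illustration. Check the Chunkr API reference for the actual names before using them.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


# Hypothetical per-segment settings. Field names and values are
# illustrative assumptions, not the confirmed Chunkr schema.
@dataclass
class SegmentConfig:
    html: str = "Auto"          # assumed: "Auto" = heuristics, "LLM" = VLM
    markdown: str = "Auto"      # assumed: same choices as html
    crop_image: str = "Auto"    # assumed: "All" = always return the crop
    llm: Optional[str] = None   # assumed: custom prompt run on the segment


def build_segment_processing() -> dict:
    """Build the overrides for the three segment types in the example."""
    overrides = {
        # Run a custom prompt on every Table segment.
        "Table": SegmentConfig(llm="Summarize the key trends in this table"),
        # Crop every SectionHeader segment to its bounding box.
        "SectionHeader": SegmentConfig(crop_image="All"),
        # Heuristics for HTML, a VLM for Markdown, on every Text segment.
        "Text": SegmentConfig(html="Auto", markdown="LLM"),
    }
    # Convert the dataclasses to plain dicts so they serialize to JSON.
    return {name: asdict(cfg) for name, cfg in overrides.items()}


config = build_segment_processing()
print(json.dumps(config, indent=2))
```

Segment types that are not listed in the overrides would keep the default processing, so only the three types above need to be specified.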