
Key Features
- Perfect Markdown & HTML: LLM-ready content (Markdown, HTML, tables, etc).
- Reading order intact: Maintains the natural reading flow for complex layouts.
- Granular bounding boxes: Pinpoints element coordinates with precision for easy citations.
- Native Spreadsheet handling: 100% reconstruction with formulas, styling, and cell values preserved; precise ranges; cleans tables and converts charts to structured data.
- Post-processing: Token-aware chunking, cropped images, and more.
Example: Parse and access chunk content
Here’s how you can parse a document and access its chunks using our SDKs.
Our default configuration is optimized through extensive testing and provides excellent results for most documents. You can customize parsing if you have specific requirements.
Advanced Configuration
For a comprehensive breakdown of every available configuration, please refer to our API Reference. Here is an overview of our configuration options:
- Pipeline (`pipeline`): Choose the provider (`Azure` or `Chunkr`) for layout analysis and OCR models.
- Layout Analysis & OCR:
  - Segmentation Strategy (`segmentation_strategy`): Choose between `LayoutAnalysis` (default) or a full-page VLM approach for parsing.
  - OCR Strategy (`ocr_strategy`): Use `Auto` to selectively apply OCR or `All` to force it on every page.
- Segment-level Customization (`segment_processing`): Control processing for each document element (e.g., `Text`, `Table`, `Picture`):
  - Processing Strategy (`strategy`): For each segment, set the strategy used to generate HTML/Markdown: `Auto` (simple OCR + logic), `LLM` (VLM generation), or `Ignore` (remove from output).
  - Format Control (`format`): Control the output format (`Markdown` or `HTML`) for segment content.
  - Extended Context (`extended_context`): Provide the full page image as additional context for VLM processing of a segment. Useful for cases like distant legends for tables and pictures.
  - Cropped Images (`crop_image`): Control if a cropped image of the segment is included.
- Chunking (`chunk_processing`): Configure chunking strategy, sizes, and token-counting model.
- Error Handling (`error_handling`): Set to `Fail` (default) to stop on any error, or `Continue` to process despite non-critical errors.
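To make the option names and their allowed values easier to see together, here is a minimal sketch that gathers them into a plain Python dictionary and checks each value against the choices listed above. The dictionary layout, the `validate_config` helper, and the boolean value for `extended_context` are assumptions made for illustration; they are not the SDK's actual API, so consult the API Reference for the real request shape.

```python
# Hypothetical sketch: the dict layout and validate_config helper are
# illustrative assumptions, not part of the SDK.

# Top-level options whose allowed values are listed in the overview above.
ALLOWED = {
    "pipeline": {"Azure", "Chunkr"},
    "ocr_strategy": {"Auto", "All"},
    "error_handling": {"Fail", "Continue"},
}

# Per-segment options (applied to element types such as Text, Table, Picture).
SEGMENT_ALLOWED = {
    "strategy": {"Auto", "LLM", "Ignore"},
    "format": {"Markdown", "HTML"},
}


def validate_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means no known value is invalid."""
    problems = []
    for key, allowed in ALLOWED.items():
        if key in config and config[key] not in allowed:
            problems.append(f"{key}: {config[key]!r} is not one of {sorted(allowed)}")
    for segment_type, options in config.get("segment_processing", {}).items():
        for key, allowed in SEGMENT_ALLOWED.items():
            if key in options and options[key] not in allowed:
                problems.append(
                    f"segment_processing.{segment_type}.{key}: "
                    f"{options[key]!r} is not one of {sorted(allowed)}"
                )
    return problems


config = {
    "pipeline": "Chunkr",
    "segmentation_strategy": "LayoutAnalysis",  # the documented default
    "ocr_strategy": "Auto",
    "segment_processing": {
        # extended_context as a boolean is an assumption for this sketch
        "Table": {"strategy": "LLM", "format": "HTML", "extended_context": True},
        "Picture": {"strategy": "Ignore"},
    },
    "error_handling": "Continue",
}

for problem in validate_config(config):
    print(problem)
```

Options whose values are not enumerated in the overview (`segmentation_strategy` beyond its default, `crop_image`, and the fields of `chunk_processing`) are deliberately left unchecked rather than guessed.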