Excel files accept the same configuration options as other documents, but some options behave differently, and a few are ignored, because of Excel’s native spreadsheet structure.
Quick Summary: Most configuration options work identically to other file types. OCR and pipeline settings are ignored because Excel files are processed natively.
Configuration Options Overview
Excel configuration options fall into two categories:
| Category | Options | Behavior |
|---|---|---|
| Work Normally | Segmentation Strategy, Segment Processing, Chunk Processing, LLM Processing, Error Handling Strategy, Expiration Time | Same as other file types, with minor Excel-specific notes |
| Ignored | OCR Strategy, Pipeline Provider | No effect on Excel processing |
Options That Work Normally
These configuration options work the same as other file types, with some Excel-specific behavior noted below.
Segmentation Strategy
Controls how Excel sheets are analyzed and segmented.
from chunkr_ai import Chunkr
from chunkr_ai.models import Configuration, SegmentationStrategy
chunkr = Chunkr()  # create the client before uploading
config = Configuration(
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS
)
task = chunkr.upload("spreadsheet.xlsx", config)
Available Options:
- LayoutAnalysis (Recommended): Runs Excel layout analysis to identify tables, charts, and text regions
- Page: Outputs each full Excel sheet as a single Table segment
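To make the difference between the two strategies concrete, here is a toy sketch in plain Python. It is not Chunkr's layout analysis: the blank-row heuristic and the function names `page_segments` and `layout_segments` are assumptions made purely for illustration.

```python
def page_segments(rows):
    """Page-style: the whole sheet becomes one table segment."""
    return [rows]


def layout_segments(rows):
    """Toy layout analysis: split the sheet wherever a fully blank row appears."""
    regions, current = [], []
    for row in rows:
        if any(cell not in ("", None) for cell in row):
            current.append(row)       # row has content, extend the current region
        elif current:
            regions.append(current)   # blank row closes the current region
            current = []
    if current:
        regions.append(current)
    return regions


sheet = [
    ["Region", "Sales"],
    ["North", 120],
    ["", ""],
    ["Notes", ""],
    ["Figures are unaudited", ""],
]
print(len(page_segments(sheet)))    # 1
print(len(layout_segments(sheet)))  # 2
```

With Page, the table and the notes end up in one segment; a layout-aware strategy can return them as separate regions.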
Segment Processing
Configure how different segment types are processed and formatted.
from chunkr_ai.models import (
    Configuration,
    SegmentProcessing,
    GenerationConfig,
    SegmentFormat,
)
config = Configuration(
    segment_processing=SegmentProcessing(
        Table=GenerationConfig(
            format=SegmentFormat.MARKDOWN,  # get Tables as markdown
        )
    )
)
Excel-Specific Behavior:
- Tables: The strategy field (Auto/LLM) is ignored; tables are always extracted natively from Excel
- All Other Segments: Picture, Text, Title, etc. work exactly as with other file types
Chunk Processing
Controls how content is divided into chunks for RAG applications.
from chunkr_ai.models import Configuration, ChunkProcessing
config = Configuration(
    chunk_processing=ChunkProcessing(
        target_chunk_length=1000
    )
)
Excel-Specific Behavior:
- Works the same as other file types, with one exception
- Important: Each worksheet is a chunk boundary, so a chunk never spans two sheets (unlike PDFs, where chunks can span pages)
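The sheet-boundary rule can be sketched in plain Python. This is a minimal illustration, not Chunkr's chunker: the `Segment` class, the `chunk_segments` function, and the word-count length measure are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    sheet: str  # worksheet the segment came from
    text: str   # extracted content


def chunk_segments(segments, target_chunk_length):
    """Group segments into chunks, never mixing two worksheets in one chunk."""
    chunks, current, current_len = [], [], 0
    for seg in segments:
        seg_len = len(seg.text.split())  # crude word count stands in for tokens
        new_sheet = bool(current) and current[-1].sheet != seg.sheet
        too_long = bool(current) and current_len + seg_len > target_chunk_length
        if new_sheet or too_long:
            chunks.append(current)       # close the chunk at the boundary
            current, current_len = [], 0
        current.append(seg)
        current_len += seg_len
    if current:
        chunks.append(current)
    return chunks


segments = [
    Segment("Revenue", "Q1 totals by region"),
    Segment("Revenue", "Q2 totals by region"),
    Segment("Costs", "Payroll and rent"),
]
chunks = chunk_segments(segments, target_chunk_length=1000)
sheets_per_chunk = [sorted({s.sheet for s in c}) for c in chunks]
print(sheets_per_chunk)  # [['Revenue'], ['Costs']]
```

Even though all three segments fit well within the target length, the change of worksheet forces a second chunk.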
LLM Processing
Configure custom models and prompts for content generation.
from chunkr_ai.models import Configuration, LlmProcessing
config = Configuration(
    llm_processing=LlmProcessing(
        # Custom LLM configuration
    )
)
Excel Behavior:
- Works exactly the same as other file types
- Affects segment processing only
- Can be combined with segment-specific LLM prompts
Error Handling Strategy
Controls how processing errors are handled.
from chunkr_ai.models import Configuration, ErrorHandlingStrategy
config = Configuration(
    error_handling=ErrorHandlingStrategy.CONTINUE
)
Available Options:
- Fail: Stop processing on any error
- Continue: Continue processing despite non-critical errors
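The two policies can be pictured with a small stand-alone sketch. It does not reproduce Chunkr's internals: `process_all` and `parse_number` are hypothetical helpers, and `ValueError` merely stands in for whatever the service treats as a non-critical error.

```python
def process_all(segments, process, strategy="Fail"):
    """Apply `process` to each segment under a Fail or Continue policy."""
    results, errors = [], []
    for index, segment in enumerate(segments):
        try:
            results.append(process(segment))
        except ValueError as exc:
            if strategy == "Fail":
                raise                         # abort on the first error
            errors.append((index, str(exc)))  # record the error and keep going
            results.append(None)
    return results, errors


def parse_number(cell):
    return float(cell)  # raises ValueError for non-numeric text


cells = ["1.5", "n/a", "3"]
results, errors = process_all(cells, parse_number, strategy="Continue")
print(results)      # [1.5, None, 3.0]
print(len(errors))  # 1
```

Under Continue the bad cell yields a placeholder and the run completes; under Fail the same input raises on the second cell.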
Expiration Time
Sets how long task results are retained before deletion.
from chunkr_ai.models import Configuration
config = Configuration(
    expires_in=3600  # 1 hour, expressed in seconds
)
Excel Behavior: Works exactly the same as other file types.
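Since the examples here express expires_in in seconds (3600 for one hour), a small helper can keep durations readable. The `seconds` function below is a hypothetical convenience built on the standard library, not part of the Chunkr SDK.

```python
from datetime import timedelta


def seconds(**duration):
    """Convert a human-readable duration into whole seconds."""
    return int(timedelta(**duration).total_seconds())


expires_in = seconds(hours=2)
print(expires_in)           # 7200
print(seconds(minutes=90))  # 5400
```

The resulting integer can then be passed as the expires_in value.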
Options That Are Ignored
These configuration options have no effect when processing Excel files because Excel uses native processing methods.
OCR Strategy
Ignored for Excel files: they contain native text data, so OCR is never applied regardless of this setting.
This applies to every OCR strategy value (All, Auto).
Pipeline Provider (Azure Feature)
Ignored for Excel files: they always use Chunkr’s native processing pipeline.
Azure Document Intelligence and other pipeline providers are not used for Excel processing.
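If one configuration is shared across file types, it can help to flag which settings will have no effect for a given file. The sketch below works on a plain dictionary; the key names `ocr_strategy` and `pipeline`, the extension list, and the `ignored_options` helper are assumptions for illustration rather than SDK features.

```python
IGNORED_FOR_EXCEL = {"ocr_strategy", "pipeline"}  # assumed option names
EXCEL_EXTENSIONS = (".xlsx", ".xls")              # assumed extension list


def ignored_options(filename, options):
    """Return the option names that will have no effect for this file."""
    if not filename.lower().endswith(EXCEL_EXTENSIONS):
        return []
    return sorted(IGNORED_FOR_EXCEL.intersection(options))


options = {"ocr_strategy": "All", "pipeline": "Azure", "expires_in": 3600}
print(ignored_options("financial_report.xlsx", options))  # ['ocr_strategy', 'pipeline']
print(ignored_options("report.pdf", options))             # []
```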
Complete Configuration Example
Here’s a comprehensive configuration example optimized for Excel processing:
from chunkr_ai import Chunkr
from chunkr_ai.models import (
    Configuration,
    SegmentationStrategy,
    ErrorHandlingStrategy,
    SegmentProcessing,
    GenerationConfig,
    SegmentFormat,
)
chunkr = Chunkr()
# Optimal Excel configuration
config = Configuration(
    # Core settings
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
    error_handling=ErrorHandlingStrategy.CONTINUE,
    # Segment processing
    segment_processing=SegmentProcessing(
        Table=GenerationConfig(
            format=SegmentFormat.HTML,
            llm="Analyze this Excel table and extract key insights"
        ),
        Picture=GenerationConfig(
            llm="Describe this Excel chart with key data points"
        )
    ),
    # Task settings
    expires_in=7200  # 2 hours, expressed in seconds
)
task = chunkr.upload("financial_report.xlsx", config)