Excel files support all the same configuration options as regular documents, but some behave differently due to Excel’s native spreadsheet structure.
Quick Summary: Most configuration options work identically to other file types. OCR, high resolution, and pipeline settings are ignored since Excel files use native processing.

Configuration Options Overview

Excel configuration options fall into two categories:
Category | Options | Behavior
Work Normally | Segmentation, Segment Processing, Chunking, LLM Processing, Error Handling, Expiration | Same as other file types with minor Excel-specific notes
Ignored | OCR Strategy, High Resolution, Pipeline Provider | No effect on Excel processing

Options That Work Normally

These configuration options work the same as other file types, with some Excel-specific behavior noted below.

Segmentation Strategy

Controls how Excel sheets are analyzed and segmented.
from chunkr_ai import Chunkr
from chunkr_ai.models import Configuration, SegmentationStrategy

config = Configuration(
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS
)

chunkr = Chunkr()  # create the client before uploading
task = chunkr.upload("spreadsheet.xlsx", config)
Available Options:
  • LayoutAnalysis (Recommended): Runs Excel layout analysis to identify tables, charts, and text regions
  • Page: Outputs each full Excel sheet as a single Table segment

Segment Processing

Configure how different segment types are processed and formatted.
from chunkr_ai.models import (
    Configuration,
    SegmentProcessing,
    GenerationConfig,
    SegmentFormat
)

config = Configuration(
    segment_processing=SegmentProcessing(
        Table=GenerationConfig(
            format=SegmentFormat.HTML,
            llm="Extract key insights from this Excel table"
        )
    )
)
Excel-Specific Behavior:
  • Tables: The strategy field (Auto/LLM) is ignored - tables are always extracted natively from Excel
  • LLM Prompts: You can still use custom llm prompts for table processing
  • All Other Segments: Picture, Text, Title, etc. work exactly as with other file types

Chunk Processing

Controls how content is divided into chunks for RAG applications.
from chunkr_ai.models import Configuration, ChunkProcessing

config = Configuration(
    chunk_processing=ChunkProcessing(
        target_chunk_length=1000
    )
)
Excel-Specific Behavior:
  • Chunking options work the same as for other file types, with one exception
  • Important: Each Excel worksheet is a chunk boundary, so chunks always break on a new sheet (unlike PDFs, where chunks can span pages)
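The sheet-boundary rule can be illustrated with a small standalone sketch. This is not Chunkr's implementation: the Segment class, the chunk_by_sheet function, and the greedy packing rule are all hypothetical, and only show what "chunks break on new sheets" means when combined with a target chunk length.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a parsed segment (not an SDK type)."""
    sheet: str   # worksheet the segment came from
    length: int  # size in the same unit as target_chunk_length


def chunk_by_sheet(segments, target_chunk_length):
    """Greedily pack segments into chunks, never letting a chunk span two sheets."""
    chunks = []
    current = []
    current_length = 0
    for seg in segments:
        new_sheet = bool(current) and current[-1].sheet != seg.sheet
        too_long = bool(current) and current_length + seg.length > target_chunk_length
        if new_sheet or too_long:
            # Close the current chunk at a sheet change or when the target is exceeded.
            chunks.append(current)
            current, current_length = [], 0
        current.append(seg)
        current_length += seg.length
    if current:
        chunks.append(current)
    return chunks


segments = [
    Segment("Revenue", 400),
    Segment("Revenue", 300),
    Segment("Costs", 200),
    Segment("Costs", 900),
]
chunks = chunk_by_sheet(segments, target_chunk_length=1000)
sheets_per_chunk = [[s.sheet for s in c] for c in chunks]
print(sheets_per_chunk)
```

In this sketch the two Revenue segments share a chunk, while the first Costs segment starts a new chunk even though it would have fit within the target length, because it belongs to a different sheet.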

LLM Processing

Configure custom models and prompts for content generation.
from chunkr_ai.models import Configuration, LlmProcessing

config = Configuration(
    llm_processing=LlmProcessing(
        # Custom LLM configuration
    )
)
Excel Behavior:
  • Works exactly the same as other file types
  • Affects segment processing only
  • Can be combined with segment-specific LLM prompts

Error Handling Strategy

Controls how processing errors are handled.
from chunkr_ai.models import Configuration, ErrorHandlingStrategy

config = Configuration(
    error_handling=ErrorHandlingStrategy.CONTINUE
)
Available Options:
  • Fail: Stop processing on any error
  • Continue: Continue processing despite non-critical errors
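The difference between the two options can be sketched in plain Python. The ErrorPolicy enum, process_all, and parse_cell below are hypothetical names used for illustration only, and treating ValueError as the "non-critical" error is an assumption of this sketch, not a description of how Chunkr classifies errors.

```python
from enum import Enum


class ErrorPolicy(Enum):
    """Hypothetical stand-in for the SDK's ErrorHandlingStrategy."""
    FAIL = "Fail"
    CONTINUE = "Continue"


def process_all(items, worker, policy):
    """Run worker on each item; either stop at the first error or collect errors."""
    results, errors = [], []
    for item in items:
        try:
            results.append(worker(item))
        except ValueError as exc:  # assumed to be non-critical in this sketch
            if policy is ErrorPolicy.FAIL:
                raise  # Fail: stop processing on any error
            errors.append((item, str(exc)))  # Continue: record the error and move on
    return results, errors


def parse_cell(value):
    """Toy worker: parse a cell value as a number."""
    return float(value)


results, errors = process_all(["1.5", "n/a", "3"], parse_cell, ErrorPolicy.CONTINUE)
print(results, [item for item, _ in errors])
```

With the Continue-style policy the bad cell is recorded and the remaining cells are still processed; with the Fail-style policy the same input raises on the first bad cell.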

Expiration Time

Sets how long task results are retained before deletion.
from chunkr_ai.models import Configuration

config = Configuration(
    expires_in=3600  # 1 hour
)
Excel Behavior: Works exactly the same as other file types.
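Because expires_in is given in seconds (3600 for one hour in the example above), it can be easier to derive the value from a timedelta than to hard-code it. The to_expires_in helper below is a hypothetical convenience function, not part of the SDK, and the positive-value check is an assumption of this sketch.

```python
from datetime import timedelta


def to_expires_in(duration: timedelta) -> int:
    """Convert a retention period to whole seconds for an expires_in-style setting."""
    seconds = int(duration.total_seconds())
    if seconds <= 0:
        # Assumption: a retention period only makes sense if it is positive.
        raise ValueError("retention period must be positive")
    return seconds


one_hour = to_expires_in(timedelta(hours=1))
two_hours = to_expires_in(timedelta(hours=2))
print(one_hour, two_hours)
```

The resulting integer can then be passed as expires_in when building the configuration.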

Options That Are Ignored

These configuration options have no effect when processing Excel files because Excel uses native processing methods.

OCR Strategy

Ignored for Excel files. Excel files contain native text data, so OCR is never applied and every OCR strategy value (All, Auto) has no effect.

High Resolution Processing

Ignored for Excel files. The high_resolution setting has no effect because Excel text content is read natively and does not require image-based processing.

Pipeline Provider (Azure Feature)

Ignored for Excel files. Excel files always use Chunkr’s native processing pipeline; Azure Document Intelligence and other pipeline providers are not used.
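Since these settings are silently ignored, it can help to flag them before uploading. The following is a minimal sketch, not part of the SDK: the ignored_options helper is hypothetical, it works on a plain dict rather than a Configuration object, the key names ocr_strategy and pipeline are assumed spellings (only high_resolution is named above), and the extension list is an assumption about which files count as Excel.

```python
from pathlib import Path

# Assumed extensions for Excel input; adjust to the formats actually supported.
EXCEL_EXTENSIONS = {".xlsx", ".xls"}

# Assumed option names for OCR Strategy, High Resolution, and Pipeline Provider.
IGNORED_FOR_EXCEL = {"ocr_strategy", "high_resolution", "pipeline"}


def ignored_options(filename, options):
    """Return the names of options that are set but have no effect on an Excel file."""
    if Path(filename).suffix.lower() not in EXCEL_EXTENSIONS:
        return []
    return sorted(
        name
        for name, value in options.items()
        if name in IGNORED_FOR_EXCEL and value is not None
    )


options = {"ocr_strategy": "All", "high_resolution": True, "expires_in": 3600}
excel_ignored = ignored_options("financial_report.xlsx", options)
pdf_ignored = ignored_options("financial_report.pdf", options)
print(excel_ignored, pdf_ignored)
```

A check like this is only a convenience for catching settings that will not do anything; leaving the ignored options set does not change how the Excel file is processed.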

Complete Configuration Example

Here’s a comprehensive configuration example optimized for Excel processing:
from chunkr_ai import Chunkr
from chunkr_ai.models import (
    Configuration,
    SegmentationStrategy,
    ErrorHandlingStrategy,
    SegmentProcessing,
    GenerationConfig,
    SegmentFormat
)

chunkr = Chunkr()

# Optimal Excel configuration
config = Configuration(
    # Core settings
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
    error_handling=ErrorHandlingStrategy.CONTINUE,
    
    # Segment processing
    segment_processing=SegmentProcessing(
        Table=GenerationConfig(
            format=SegmentFormat.HTML,
            llm="Analyze this Excel table and extract key insights"
        ),
        Picture=GenerationConfig(
            llm="Describe this Excel chart with key data points"
        )
    ),
    
    # Task settings
    expires_in=7200  # 2 hours
)

task = chunkr.upload("financial_report.xlsx", config)