Excel Configuration Options

Excel files support all the same configuration options as regular documents, but some behave differently due to Excel’s native spreadsheet structure.

Quick Summary: Most configuration options work identically to other file types. OCR and pipeline settings are ignored because Excel files use native processing.

Configuration Options Overview

Excel configuration options fall into two categories:

Category	Options	Behavior
Work Normally	Segmentation, Segment Processing, Chunking, LLM Processing, Error Handling, Expiration	Same as other file types with minor Excel-specific notes
Ignored	OCR Strategy, Pipeline Provider	No effect on Excel processing

Options That Work Normally

These configuration options work the same as other file types, with some Excel-specific behavior noted below.

Segmentation Strategy

Controls how Excel sheets are analyzed and segmented.

from chunkr_ai import Chunkr
from chunkr_ai.models import Configuration, SegmentationStrategy

chunkr = Chunkr()  # create the client before uploading

config = Configuration(
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS
)

# Upload the workbook with the configuration applied
task = chunkr.upload("spreadsheet.xlsx", config)

Available Options:

LayoutAnalysis (Recommended): Runs Excel layout analysis to identify tables, charts, and text regions
Page: Outputs each full Excel sheet as a single Table segment

Segment Processing

Configure how different segment types are processed and formatted.

from chunkr_ai.models import (
    Configuration,
    SegmentProcessing,
    GenerationConfig,
    SegmentFormat
)

config = Configuration(
    segment_processing=SegmentProcessing(
        Table=GenerationConfig(
            format=SegmentFormat.MARKDOWN, # get Tables as markdown
        )
    )
)

Excel-Specific Behavior:

Tables: The strategy field (Auto/LLM) is ignored - tables are always extracted natively from Excel
All Other Segments: Picture, Text, Title, etc. work exactly as with other file types

Chunk Processing

Controls how content is divided into chunks for RAG applications.

from chunkr_ai.models import Configuration, ChunkProcessing

config = Configuration(
    chunk_processing=ChunkProcessing(
        target_chunk_length=1000
    )
)

Excel-Specific Behavior:

Chunk length targets work the same as for other file types
Important: Chunks never span worksheets. Each Excel worksheet is a chunk boundary, unlike PDFs, where chunks can cross pages
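
The sheet-boundary rule above can be pictured with a small standalone sketch. This is not Chunkr's actual chunking algorithm: the `Segment` class, the `chunk_by_sheet` function, and the use of word counts as the length measure are all hypothetical, chosen only to show how a worksheet change forces a new chunk even when the target length has not been reached.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    sheet: str  # name of the worksheet the segment came from (hypothetical field)
    text: str


def chunk_by_sheet(segments, target_length):
    """Group segments into chunks of at most roughly target_length words,
    starting a new chunk whenever the worksheet changes."""
    chunks, current, current_len = [], [], 0
    for seg in segments:
        seg_len = len(seg.text.split())
        sheet_changed = bool(current) and current[-1].sheet != seg.sheet
        too_long = bool(current) and current_len + seg_len > target_length
        if sheet_changed or too_long:
            chunks.append(current)
            current, current_len = [], 0
        current.append(seg)
        current_len += seg_len
    if current:
        chunks.append(current)
    return chunks


segments = [
    Segment("Q1", "revenue table " * 3),
    Segment("Q1", "notes " * 2),
    Segment("Q2", "revenue table " * 3),
]
chunks = chunk_by_sheet(segments, target_length=100)
# Both Q1 segments fit in one chunk, but Q2 starts a new chunk
# even though the target length is far from reached.
print([[s.sheet for s in c] for c in chunks])
```

With a PDF-style chunker the third segment would have joined the first chunk; here the sheet change alone closes it.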

LLM Processing

Configure custom models and prompts for content generation.

from chunkr_ai.models import Configuration, LlmProcessing

config = Configuration(
    llm_processing=LlmProcessing(
        # Custom LLM configuration
    )
)

Excel Behavior:

Works exactly the same as other file types
Affects segment processing only
Can be combined with segment-specific LLM prompts

Error Handling Strategy

Controls how processing errors are handled.

from chunkr_ai.models import Configuration, ErrorHandlingStrategy

config = Configuration(
    error_handling=ErrorHandlingStrategy.CONTINUE
)

Available Options:

Fail: Stop processing on any error
Continue: Continue processing despite non-critical errors
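
The difference between the two options can be illustrated with a generic processing loop. This is only a sketch of the semantics described above, not how Chunkr implements them: `process_all`, its `on_error` parameter, and the choice of `ValueError` as the "non-critical" error are hypothetical, and which errors Chunkr actually treats as non-critical is decided by the service.

```python
def process_all(items, handler, on_error="Fail"):
    """Apply handler to each item.

    on_error="Fail": re-raise the first error and stop.
    on_error="Continue": record the error and move on to the next item.
    """
    results, errors = [], []
    for item in items:
        try:
            results.append(handler(item))
        except ValueError as exc:
            if on_error == "Fail":
                raise
            errors.append((item, str(exc)))
    return results, errors


# "x" cannot be parsed as an integer, so it triggers an error.
results, errors = process_all(["1", "x", "3"], int, on_error="Continue")
print(results)                  # the items that succeeded
print([e[0] for e in errors])   # the items that failed
```

Under Continue the caller gets partial results plus a record of what failed; under Fail the same input would raise on the second item and return nothing.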

Expiration Time

Sets how long task results are retained before deletion.

from chunkr_ai.models import Configuration

config = Configuration(
    expires_in=3600  # seconds (1 hour)
)

Excel Behavior: Works exactly the same as other file types.
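
The examples on this page (3600 for one hour, 7200 for two hours) indicate that `expires_in` is expressed in seconds. To avoid hand-computed values, a small helper can derive the number from a `timedelta`. The helper name `to_expires_in` is hypothetical and not part of the Chunkr SDK, and the positive-duration check is an assumption about what a sensible value looks like.

```python
from datetime import timedelta


def to_expires_in(duration: timedelta) -> int:
    """Convert a duration to whole seconds for use as an expires_in value."""
    seconds = int(duration.total_seconds())
    if seconds <= 0:
        raise ValueError("expiration must be a positive duration")
    return seconds


one_hour = to_expires_in(timedelta(hours=1))
two_hours = to_expires_in(timedelta(hours=2))
print(one_hour, two_hours)  # 3600 7200
```

The result can then be passed as `expires_in=to_expires_in(timedelta(hours=2))` when building the configuration.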

Options That Are Ignored

These configuration options have no effect when processing Excel files because Excel uses native processing methods.

OCR Strategy

Ignored for Excel files - Excel files contain native text data, so OCR is never applied regardless of this setting.

All OCR-related configurations (All, Auto) are ignored since Excel files provide native text extraction.

Pipeline Provider (Azure Feature)

Ignored for Excel files - Excel files always use Chunkr’s native processing pipeline.

Azure Document Intelligence and other pipeline providers are not used for Excel processing.
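
Because these settings are silently ignored, it can help to flag them before uploading. The sketch below is a hypothetical pre-flight check, not part of the Chunkr SDK: the function name, the option key names `ocr_strategy` and `pipeline`, and the list of Excel extensions are all assumptions made for illustration.

```python
from pathlib import PurePath

# Assumed Excel extensions and assumed option key names; adjust to your setup.
EXCEL_EXTENSIONS = {".xlsx", ".xls"}
IGNORED_FOR_EXCEL = {"ocr_strategy", "pipeline"}


def ignored_options(filename, options):
    """Return the option names that would have no effect for this file."""
    suffix = PurePath(filename).suffix.lower()
    if suffix not in EXCEL_EXTENSIONS:
        return []
    return sorted(key for key in options if key in IGNORED_FOR_EXCEL)


options = {"ocr_strategy": "All", "expires_in": 3600}
print(ignored_options("financial_report.xlsx", options))  # ['ocr_strategy']
print(ignored_options("financial_report.pdf", options))   # []
```

A check like this is useful when one shared configuration is applied to mixed batches of PDFs and spreadsheets.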

Complete Configuration Example

Here’s a comprehensive configuration example optimized for Excel processing:

from chunkr_ai import Chunkr
from chunkr_ai.models import (
    Configuration,
    SegmentationStrategy,
    ErrorHandlingStrategy,
    SegmentProcessing,
    GenerationConfig,
    SegmentFormat
)

chunkr = Chunkr()

# Optimal Excel configuration
config = Configuration(
    # Core settings
    segmentation_strategy=SegmentationStrategy.LAYOUT_ANALYSIS,
    error_handling=ErrorHandlingStrategy.CONTINUE,
    
    # Segment processing
    segment_processing=SegmentProcessing(
        Table=GenerationConfig(
            format=SegmentFormat.HTML,
            llm="Analyze this Excel table and extract key insights"
        ),
        Picture=GenerationConfig(
            llm="Describe this Excel chart with key data points"
        )
    ),
    
    # Task settings
    expires_in=7200  # 2 hours
)

task = chunkr.upload("financial_report.xlsx", config)

Related Documentation

Understanding Output

Learn about Excel-specific response fields and data structures

Segment Processing

Detailed segment processing configuration options

Chunking

Advanced chunking configuration for RAG applications

LLM Processing

Custom LLM models and prompt configuration
