Layout analysis is a crucial step in document processing that involves analyzing and understanding the spatial arrangement of content within a document. It helps identify and classify different regions of a document, such as text, tables, headers, footers, and pictures.
In short, it tells us what is in the document and where.
Why is Layout Analysis Important?
Layout analysis serves several key purposes:
- Structure Recognition: It helps identify the logical structure and reading order of a document
- Data Extraction: By identifying specific regions (like tables, headers, or paragraphs), we can use specialized extraction methods for each type, improving accuracy
- Better Chunking: Layout elements allow us to identify the sections of a document and generate better chunks.
- Citations: It allows LLMs to cite the correct region of the document, which can then be highlighted for a better experience.
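
To make the chunking and citation points concrete, here is a minimal sketch of section-based chunking over layout segments. The `Segment` and `Chunk` classes, their field names, and the `chunk_by_section` function are hypothetical and written for illustration only; they are not Chunkr's data model or API. The sketch assumes segments arrive in reading order and carry a page number and bounding box.

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration; not Chunkr's actual data model.
BBox = tuple[float, float, float, float]  # (left, top, right, bottom)


@dataclass
class Segment:
    segment_type: str  # e.g. "Section Header", "Text", "Table"
    text: str
    page: int
    bbox: BBox


@dataclass
class Chunk:
    segments: list[Segment] = field(default_factory=list)

    @property
    def text(self) -> str:
        return "\n".join(s.text for s in self.segments)

    def citations(self) -> list[tuple[int, BBox]]:
        """Page and bounding box of every segment, usable for highlighting."""
        return [(s.page, s.bbox) for s in self.segments]


HEADING_TYPES = {"Title", "Section Header"}
SKIPPED_TYPES = {"Page Header", "Page Footer"}


def chunk_by_section(segments: list[Segment]) -> list[Chunk]:
    """Start a new chunk at every heading; drop running headers and footers."""
    chunks: list[Chunk] = []
    for seg in segments:  # assumed to be in reading order already
        if seg.segment_type in SKIPPED_TYPES:
            continue
        if seg.segment_type in HEADING_TYPES or not chunks:
            chunks.append(Chunk())
        chunks[-1].segments.append(seg)
    return chunks
```

Because each chunk keeps its source segments, an answer drawn from a chunk can point back to the exact page regions it came from. A real pipeline would likely also cap chunk length, which this sketch leaves out.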

Segment Types
Chunkr uses a two-way vision-grid transformer to identify the layout of the document. We support the following segment types:
- Caption: Text describing figures, tables, or other visual elements
- Footnote: References or additional information at the bottom of pages
- Formula: Mathematical or scientific equations
- List Item: Individual items in bulleted or numbered lists
- Page: Entire page (segmentation_strategy=Page)
- Page Footer: Content that appears at the bottom of each page
- Page Header: Content that appears at the top of each page
- Picture: Images, diagrams, or other visual elements
- Section Header: Headers that divide the document into sections
- Table: Structured data arranged in rows and columns
- Text: Regular paragraph text
- Title: Main document title
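
As an illustration of how these segment types might be used downstream, the sketch below models them as a Python enum and routes each type to a specialized extraction method. The enum's string values simply mirror the display names in the list above and may differ from the identifiers an API returns; the extractor names are placeholders, not real Chunkr components.

```python
from enum import Enum


class SegmentType(Enum):
    # Values mirror the display names listed above (an assumption,
    # not necessarily the identifiers used in API responses).
    CAPTION = "Caption"
    FOOTNOTE = "Footnote"
    FORMULA = "Formula"
    LIST_ITEM = "List Item"
    PAGE = "Page"
    PAGE_FOOTER = "Page Footer"
    PAGE_HEADER = "Page Header"
    PICTURE = "Picture"
    SECTION_HEADER = "Section Header"
    TABLE = "Table"
    TEXT = "Text"
    TITLE = "Title"


# Hypothetical routing table: the extractor names are placeholders.
EXTRACTOR_BY_TYPE = {
    SegmentType.TABLE: "table_structure_model",
    SegmentType.FORMULA: "formula_to_latex",
    SegmentType.PICTURE: "vision_language_model",
}
DEFAULT_EXTRACTOR = "ocr_text"


def parse_segment_type(label: str) -> SegmentType:
    """Look up a segment type by display name; raises ValueError if unknown."""
    return SegmentType(label)


def pick_extractor(segment_type: SegmentType) -> str:
    """Return the specialized extractor for a type, or plain OCR otherwise."""
    return EXTRACTOR_BY_TYPE.get(segment_type, DEFAULT_EXTRACTOR)
```

Keeping the routing in a single table means that adding specialized handling for another type, such as captions, is a one-line change.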