Pipeline
Choose providers to process your documents
In addition to Chunkr’s default models, we provide a pipeline interface that lets you use Azure Document Intelligence as a provider. With Azure selected, your files are processed by the Azure layout analysis model, the Azure OCR model, and the Azure table OCR model instead of the default models.
You can still leverage Chunkr’s intelligent chunking and segment processing. The output will be mapped to the Chunkr output format.
When to use Azure
- If our queue is full, you can use Azure to process your files.
- If you don’t need VLMs on your tables, you can use the Azure table OCR model, which returns results much faster.
- If you need better OCR than our default model currently provides (we are working on it!).
We improve the outputs from Azure with a combination of last-mile engineering and LLMs. In our testing, the hybrid approach (traditional layout analysis and OCR for simple elements, LLMs for complex elements) produced the most accurate results.
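The hybrid idea can be pictured as a simple routing rule. The sketch below is purely illustrative and is not Chunkr’s implementation: the function name `choose_processor` and the set of segment types treated as complex are assumptions made for this example.

```python
from typing import Literal, Set

# Assumption: which segment types count as "complex" is hypothetical
# and chosen only to illustrate the routing idea.
DEFAULT_COMPLEX_TYPES: Set[str] = {"Table", "Formula"}


def choose_processor(
    segment_type: str,
    complex_types: Set[str] = DEFAULT_COMPLEX_TYPES,
) -> Literal["ocr", "llm"]:
    """Route complex segments to an LLM and everything else to OCR."""
    return "llm" if segment_type in complex_types else "ocr"


if __name__ == "__main__":
    for kind in ("Text", "Table"):
        print(kind, "->", choose_processor(kind))
```

Passing a different `complex_types` set changes which elements are sent to the LLM path, which is the knob a hybrid pipeline of this kind would tune.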
Example
- Use default segment processing and chunking with the Chunkr layout analysis model and OCR model.
- Use default chunking with the Azure layout analysis model, OCR model and table OCR model.
In this case, the HTML and Markdown for the `Table` segment will be generated by the Azure table OCR model.
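As a rough sketch of the two configurations described in this example, the snippet below builds a plain configuration dictionary for each provider. The key name `pipeline`, the values `"Chunkr"` and `"Azure"`, and the helper `build_task_config` are assumptions for illustration; check the API reference for the actual field names before using them.

```python
import json

# Assumption: provider names and the "pipeline" key are hypothetical
# placeholders, not confirmed field names from the Chunkr API.
ALLOWED_PROVIDERS = {"Chunkr", "Azure"}


def build_task_config(provider: str = "Chunkr") -> dict:
    """Return a minimal config that selects the processing provider.

    Segment processing and chunking are left unset so that the
    defaults apply, as in both cases of the example.
    """
    if provider not in ALLOWED_PROVIDERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    return {"pipeline": provider}


if __name__ == "__main__":
    # Default models for layout analysis and OCR.
    print(json.dumps(build_task_config()))
    # Azure layout analysis, OCR, and table OCR models.
    print(json.dumps(build_task_config("Azure")))
```

Validating the provider name up front keeps a typo from silently falling back to the default models.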