Chunking
Understanding our intelligent chunking
Chunking is the process of grouping segments into logical chunks. Our segmentation models can produce a hierarchy of segments, and we can use that to create chunks.
The exact strategy varies based on end-application and use-case, but generally the goal is to put together segments in a way that maintains the context of the information.
We offer an intelligent chunking algorithm, but you can also turn it off to receive individual segments and handle chunking yourself.
Configuration
You can configure intelligent chunking by setting the target_chunk_length parameter. This is the approximate number of words a chunk can contain.
Intelligent Chunking
The chunking algorithm works as follows:
- Remove headers and footers
- Add segments to a chunk until we hit a breaking condition or the chunk length reaches target_chunk_length (chunk length >= target_chunk_length).
Breaking Conditions
We go down the segment hierarchy (Title -> Section header -> Other). When the next segment has a segment_type that is higher in the hierarchy than the current segment's type, we break the chunk and start a new one.
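The steps above can be sketched in a few lines of Python. This is a minimal illustration of the described behavior, not the actual implementation: the Segment class, the segment type names ("Title", "SectionHeader", "PageHeader", "PageFooter"), and the function names are all assumptions made for the example, and chunk length is approximated by splitting on whitespace.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    # Hypothetical stand-in for a segment; field names are assumptions.
    segment_type: str
    content: str


# Assumed type names. Lower rank means higher in the hierarchy.
HIERARCHY_RANK = {"Title": 0, "SectionHeader": 1}
OTHER_RANK = 2
DROPPED_TYPES = {"PageHeader", "PageFooter"}


def rank(segment):
    """Return the segment's position in the hierarchy (0 is highest)."""
    return HIERARCHY_RANK.get(segment.segment_type, OTHER_RANK)


def word_count(chunk):
    """Approximate chunk length as the number of whitespace-separated words."""
    return sum(len(segment.content.split()) for segment in chunk)


def chunk_segments(segments, target_chunk_length):
    chunks = []
    current = []
    for segment in segments:
        # Step 1: remove headers and footers.
        if segment.segment_type in DROPPED_TYPES:
            continue
        # Breaking condition: the next segment is higher in the
        # hierarchy than the last segment already in the chunk.
        if current and rank(segment) < rank(current[-1]):
            chunks.append(current)
            current = []
        current.append(segment)
        # Step 2: close the chunk once it reaches the target length.
        if word_count(current) >= target_chunk_length:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    document = [
        Segment("Title", "Annual Report"),
        Segment("PageHeader", "Acme Corp"),
        Segment("Text", "Revenue grew this year."),
        Segment("SectionHeader", "Outlook"),
        Segment("Text", "We expect further growth."),
        Segment("PageFooter", "Page 1"),
    ]
    for chunk in chunk_segments(document, target_chunk_length=50):
        print([segment.segment_type for segment in chunk])
```

In this sketch the title stays with the paragraph that follows it, and the section header starts a new chunk because it is higher in the hierarchy than the text before it. Because the length check happens after a segment is added, a chunk can exceed the target slightly, which matches the description of target_chunk_length as approximate.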
Turning it off
Setting target_chunk_length to 0 turns off intelligent chunking, and each chunk will contain a single segment. Click here to learn more about the chunk model.
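With intelligent chunking off, a common first step when handling chunking yourself is to flatten the single-segment chunks back into one ordered list of segments. The sketch below assumes each chunk is a dict with a "segments" list; that shape and the function name are hypothetical and used only for illustration, so check the chunk model for the real field names.

```python
def flatten_chunks(chunks):
    """Collect the segments of every chunk into one ordered list.

    Assumes each chunk is a dict with a "segments" list (hypothetical shape).
    """
    return [segment for chunk in chunks for segment in chunk["segments"]]


if __name__ == "__main__":
    # Example output with chunking turned off: one segment per chunk.
    chunks = [
        {"segments": [{"segment_type": "Title", "content": "Annual Report"}]},
        {"segments": [{"segment_type": "Text", "content": "Revenue grew."}]},
    ]
    for segment in flatten_chunks(chunks):
        print(segment["segment_type"], "->", segment["content"])
```

From the flattened list you can then apply whatever grouping rule suits your application.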