Fixed-Size and Semantic Chunking Strategies for LLMs

In modern Retrieval-Augmented Generation (RAG) systems, handling large documents efficiently is critical. Embedding models have token limits that, when exceeded, can lead to incomplete processing or model errors. This is where chunking comes in. By breaking large documents into smaller, manageable pieces, chunking keeps information accessible, relevant, and optimized for retrieval and processing.

In this blog post, we will dive deep into chunking techniques with Langformers, exploring both fixed-size and semantic chunking strategies — complete with practical examples.

What is Chunking?

Chunking is the process of splitting a large document into smaller units called chunks. Each chunk is small enough to fit within the token limits of the chosen embedding model, yet large enough to retain meaningful information.

The ultimate goal of chunking is to improve:

  • Retrieval accuracy — relevant chunks are retrieved instead of entire documents.
  • Processing efficiency — smaller inputs lead to faster model inference. LLMs also have context (token) limits, so feeding an entire document, such as a book, is not feasible.

Across all chunking strategies in Langformers, tokenization plays a central role. The chunk size is defined in terms of tokens (not words), and the number of tokens can vary based on the tokenizer used.
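
For instance, the same sentence can produce different token counts under different tokenizers. Here is a quick standalone check using the Hugging Face transformers library (independent of Langformers):

from transformers import AutoTokenizer

# Compare token counts for the same text under two different tokenizers
text = "Chunking splits large documents into smaller pieces."
for name in ["sentence-transformers/all-mpnet-base-v2", "bert-base-uncased"]:
    tokenizer = AutoTokenizer.from_pretrained(name)
    print(f"{name}: {len(tokenizer.tokenize(text))} tokens")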

Chunking Strategies in Langformers

As of writing this post, Langformers provides two main chunking strategies:

  1. Fixed-Size Chunking
  2. Semantic Chunking

Let's explore each in detail.

Fixed-Size Chunking

Fixed-size chunking is the simplest method: documents are split into chunks of a specified number of tokens. It’s fast, predictable, and works well for content where structure is less important.

Key Features

  • Divides text based purely on token count.
  • Optional overlapping between chunks for better context preservation.
  • Chunks can be saved directly to a file if needed.

First, make sure you have Langformers installed in your environment. If not, install it using pip:

pip install -U langformers

Example: How to Use the Fixed-Size Chunker

# Import Langformers
from langformers import tasks

# Create a fixed-size chunker
chunker = tasks.create_chunker(strategy="fixed_size", tokenizer="sentence-transformers/all-mpnet-base-v2")

# Chunk a document
chunks = chunker.chunk(
    document="This is a test document. It contains several sentences. We will chunk it into smaller pieces.",
    chunk_size=8
)
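
To inspect the result, you can iterate over it. A minimal sketch, assuming chunk() returns the chunks as a list of strings:

# Print each chunk (assumes `chunks` is a list of strings)
for i, chunk in enumerate(chunks, start=1):
    print(f"Chunk {i}: {chunk}")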

In this example:

  • We use the sentence-transformers/all-mpnet-base-v2 tokenizer.
  • Each chunk contains approximately 8 tokens.
  • Overlapping chunks can be created by specifying the overlap parameter, as shown below.
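
A minimal sketch of overlapping chunks; the overlap value here is assumed to be the number of tokens shared between consecutive chunks:

# Chunk with a 2-token overlap between consecutive chunks (assumed semantics)
chunks = chunker.chunk(
    document="This is a test document. It contains several sentences. We will chunk it into smaller pieces.",
    chunk_size=8,
    overlap=2
)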

Bonus Tip

You can save the resulting chunks to a file using the save_as parameter:

chunks = chunker.chunk(document="...", chunk_size=8, save_as="chunks.txt")

Semantic Chunking

Semantic chunking goes a step further by considering the meaning of the content. It first creates small initial chunks, then merges them based on semantic similarity. This leads to more contextually meaningful chunks.

Semantic chunking is ideal for:

  • Technical documents
  • Research papers
  • Legal texts
  • Any domain where preserving semantic integrity is crucial

How Semantic Chunking Works

  1. Initially, the document is split into small chunks based on a token limit.
  2. The chunks are then grouped together based on their semantic similarity, controlled by a similarity threshold, as illustrated below.
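
To make "semantic similarity" concrete, here is an illustrative computation of cosine similarity between sentence embeddings using the sentence-transformers library. This sketches the underlying idea; it is not Langformers' internal implementation:

from sentence_transformers import SentenceTransformer, util

# Embed two sentences and measure how close they are in meaning
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
embeddings = model.encode(["Cats are awesome.", "Dogs are awesome."])
print(util.cos_sim(embeddings[0], embeddings[1]).item())  # closer to 1.0 = more similar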

Example: How to Use the Semantic Chunker

# Import Langformers
from langformers import tasks

# Create a semantic chunker
chunker = tasks.create_chunker(strategy="semantic", model_name="sentence-transformers/all-mpnet-base-v2")

# Chunk a document
chunks = chunker.chunk(
    document="Cats are awesome. Dogs are awesome. Python is amazing.",
    initial_chunk_size=4,
    max_chunk_size=10,
    similarity_threshold=0.3
)

In this example:

  • The document is initially split into very small chunks (4 tokens).
  • Similar chunks are merged until a maximum size of 10 tokens is reached.
  • Only chunks with similarity greater than 0.3 are grouped together.
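
The threshold is a tuning knob: raising it makes merging stricter, producing more, smaller chunks, while lowering it merges more aggressively. The value below is only an illustrative choice:

# Stricter merging: only very similar chunks are grouped (illustrative value)
chunks = chunker.chunk(
    document="Cats are awesome. Dogs are awesome. Python is amazing.",
    initial_chunk_size=4,
    max_chunk_size=10,
    similarity_threshold=0.8
)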

That's all.

Choosing the right chunking strategy can make a significant difference in the performance of your RAG pipeline or search system. The broader LLM literature discusses further strategies, such as recursive chunking, and Langformers will continue to add more in the future. Stay posted!

View the official documentation here: https://langformers.com/chunking-for-llms.html