Data Labelling Using LLMs with Langformers
When most people think of Large Language Models (LLMs), they think of conversations, content generation, or summarization. But LLMs are also remarkably effective at data labelling, and with Langformers you can easily put that power to work on your text labelling tasks.
Whether you're preparing training data, building a classifier, or just need quick annotations (say, for weak supervision), Langformers offers a simple way to define labels and let an LLM do the heavy lifting.
How It Works
Langformers provides a high-level API to turn any supported LLM into a data labeller in just a couple of lines of code. All you need to do is:
- Load an LLM with Langformers.
- Define labels and conditions you care about.
- Label texts.
It’s that simple.
Getting Started with Langformers
First, a few quick notes:
- Hugging Face Models: Langformers supports chat-tuned models (those with a `chat_template` in their `tokenizer_config.json`) that are compatible with the Transformers library and your hardware.
  - Example: `meta-llama/Llama-3.2-1B-Instruct` (make sure you have access to it on Hugging Face). You can check whether a model is chat-tuned with the Transformers tokenizer, as sketched after this list.
- Ollama Models: Ensure you have Ollama installed and the model pulled.
  - Install Ollama: Download Ollama
  - Pull a model (example): `ollama pull llama3.1:8b`
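If you're unsure whether a Hugging Face model is chat-tuned, here is a minimal sketch of a check using the Transformers tokenizer (this is not part of Langformers, and it assumes you have `transformers` installed and access to the gated Llama model):

```python
# Optional sanity check: confirm a Hugging Face model is chat-tuned by
# looking for a chat template on its tokenizer (None if absent).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")
print(tokenizer.chat_template is not None)  # True for chat-tuned models
```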
Install Langformers
First, install Langformers using pip:
pip install -U langformers
Best practice: create a virtual environment rather than installing Python packages globally. Check out the official Langformers installation guide if you need help setting one up.
Langformers + LLMs for Data Labelling
Here's a quick example of how you can load an LLM, define labels and conditions, and label a text (for a single label task).
# Import langformers
from langformers import tasks
# Load an LLM as a data labeller
labeller = tasks.create_labeller(
    provider="huggingface",
    model_name="meta-llama/Meta-Llama-3-8B-Instruct",
    multi_label=False
)
# Provide labels and conditions
conditions = {
    "Positive": "The text expresses a positive sentiment.",
    "Negative": "The text expresses a negative sentiment.",
    "Neutral": "The text does not express any emotions."
}
# Label a text
text = "No doubt, The Shawshank Redemption is a cinematic masterpiece."
labeller.label(text, conditions)
If your use case involves labelling a complete dataset, put `labeller.label()` inside a loop.
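For instance, here is a minimal sketch that reuses the `labeller` and `conditions` defined above. The example texts and the way results are collected are illustrative only; depending on the Langformers version, `label()` may return the label or print it.

```python
# Label a small dataset one text at a time, reusing `labeller` and
# `conditions` from the example above. The texts below are illustrative.
texts = [
    "No doubt, The Shawshank Redemption is a cinematic masterpiece.",
    "The plot was predictable and the acting felt flat.",
    "The film runs for two hours and twenty-two minutes.",
]

results = []
for text in texts:
    label = labeller.label(text, conditions)  # one call per text
    results.append({"text": text, "label": label})

print(results)
```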
You could also pass multiple texts to the LLM at once, but LLMs tend to produce less reliable labels as they work their way down a long list. It is therefore best to label one text at a time, if compute is not a constraint.
If you set `multi_label=True` when creating the labeller, the LLM is allowed to select more than one label per text.
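Here is a minimal sketch of a multi-label setup using the Ollama model pulled earlier. The aspect labels and conditions are made up for illustration, and the provider string "ollama" is assumed to mirror the Hugging Face example above.

```python
from langformers import tasks

# Create a multi-label labeller; provider/model follow the Ollama setup above.
labeller = tasks.create_labeller(
    provider="ollama",
    model_name="llama3.1:8b",
    multi_label=True  # allow the LLM to pick more than one label
)

# Illustrative aspect labels for a restaurant-review style task
conditions = {
    "Food": "The text mentions food or drinks.",
    "Service": "The text comments on staff or service.",
    "Price": "The text mentions cost, value, or pricing."
}

labeller.label("The pasta was delicious, but a bit overpriced.", conditions)
```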
View official documentation here: https://langformers.com/data-labelling-llms.html