Semantic Search with Vector Databases (FAISS, ChromaDB, Pinecone)
In today's data-driven world, finding relevant information quickly is crucial. Traditional keyword-based search engines often fall short when you need context-aware retrieval. That’s where semantic search steps in — and Langformers makes it easier than ever to set up your own semantic search engine in just a few lines of code!
This post will walk you through how to quickly set up semantic search with Langformers using popular vector databases like FAISS, ChromaDB, and Pinecone.
What is Semantic Search?
Semantic search goes beyond simple keyword matching. It understands the meaning behind your query and finds the most relevant documents, even if they don’t contain the exact words you typed.
By utilizing vector embeddings — numerical representations of text — semantic search can recognize similarity in meaning, not just surface-level text.
For example, the sentences "I love tofu." and "I love soy products." share a strong semantic similarity. A well-trained sentence embedding model should capture this and produce embeddings that are close together in vector space. In contrast, a sentence like "I want to go to KFC." would have a much lower similarity to the first two.
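To make this concrete, here is a minimal sketch that embeds the three sentences with the sentence-transformers library directly (not Langformers) and compares them with cosine similarity. Exact scores will vary depending on the model.
# Quick illustration with sentence-transformers: embed the sentences and
# compare them with cosine similarity. Scores are model-dependent.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L12-v2")
embeddings = model.encode(["I love tofu.", "I love soy products.", "I want to go to KFC."])

print(util.cos_sim(embeddings[0], embeddings[1]))  # tofu vs. soy products: high similarity
print(util.cos_sim(embeddings[0], embeddings[2]))  # tofu vs. KFC: noticeably lower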
Let's get started with setting up our semantic search engine. 😄
Installing Langformers
First, make sure you have Langformers installed in your environment. If not, install it using pip:
pip install -U langformers
Semantic Search with Langformers
With Langformers, you can create a complete semantic search engine by specifying just three things:
- An embedding model (like Sentence Transformers),
- A vector database (FAISS, ChromaDB, or Pinecone),
- An index type (if needed).
Here’s a simple example:
# Import langformers
from langformers import tasks
# Initialize a searcher
searcher = tasks.create_searcher(
    embedder="sentence-transformers/all-MiniLM-L12-v2",
    database="faiss",
    index_type="HNSW"
)
That’s it! You now have a working semantic searcher!
Adding Data to the Search Engine
Let’s add some sentences and their corresponding metadata to the searcher:
# Sentences to add to the vector database
sentences = [
    "He is learning Python programming.",
    "The coffee shop opens at 8 AM.",
    "She bought a new laptop yesterday.",
    "He loves to play basketball with friends.",
    "Artificial Intelligence is evolving rapidly.",
    "He studies CS at the University of Melbourne."
]
# Metadata for the respective sentences
metadata = [
    {"action": "learning", "category": "education"},
    {"action": "opens", "category": "business"},
    {"action": "bought", "category": "shopping"},
    {"action": "loves", "category": "sports"},
    {"action": "evolving", "category": "technology"},
    {"action": "studies", "category": "education"}
]
# Add the sentences
searcher.add(texts=sentences, metadata=metadata)
Searching the Database
Now, let's search the database using a semantic query:
# Define a search query
query_sentence = "computer science"
# Query the vector database
results = searcher.query(query=query_sentence, items=2, include_metadata=True)
print(results)
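To underline the "meaning over keywords" point, you can also try a query that shares no words with the stored sentences. A rough sketch (exact scores and ordering depend on the embedding model):
# A second query with no overlapping keywords. With this model,
# "The coffee shop opens at 8 AM." should rank near the top.
results = searcher.query(query="When does the cafe open in the morning?", items=2, include_metadata=True)
print(results)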
Switching Databases: FAISS, ChromaDB, and Pinecone
Langformers supports multiple vector stores out of the box:
- FAISS (for local search)
searcher = tasks.create_searcher(
    embedder="sentence-transformers/all-MiniLM-L12-v2",
    database="faiss",
    index_type="HNSW"
)
- ChromaDB (for local search)
searcher = tasks.create_searcher(
    embedder="sentence-transformers/all-MiniLM-L12-v2",
    database="chromadb"
)
- Pinecone (for cloud-hosted scalable search)
searcher = tasks.create_searcher(
    embedder="sentence-transformers/all-MiniLM-L12-v2",
    database="pinecone",
    api_key="your-api-key-here"
)
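One practical note for Pinecone: rather than hardcoding the API key, you can read it from an environment variable. A minimal sketch, assuming the key is stored under the name PINECONE_API_KEY (that variable name is just an example, not something Langformers requires):
import os

from langformers import tasks

# Read the Pinecone API key from the environment instead of hardcoding it.
searcher = tasks.create_searcher(
    embedder="sentence-transformers/all-MiniLM-L12-v2",
    database="pinecone",
    api_key=os.environ["PINECONE_API_KEY"]
)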
Loading an Existing Database
Langformers automatically saves your index/database after you initialize and add data. To load an existing database:
- FAISS
searcher = tasks.create_searcher(database="faiss", index_path="path/to/index", db_path="path/to/db")
- ChromaDB
searcher = tasks.create_searcher(database="chromadb", db_path="path/to/db", collection_name="your_collection")
- Pinecone
searcher = tasks.create_searcher(database="pinecone", index_name="your_index_name", api_key="your-api-key-here")
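A reloaded searcher behaves just like a freshly created one. For example, using the FAISS paths above (placeholders), you can query it with the same .query() call shown earlier:
from langformers import tasks

# Reload an existing FAISS index (paths are placeholders) and query it.
searcher = tasks.create_searcher(database="faiss", index_path="path/to/index", db_path="path/to/db")
results = searcher.query(query="computer science", items=2, include_metadata=True)
print(results)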
That's all. It is that easy.
Building a semantic search engine no longer has to be complex or time-consuming. With Langformers, you can rapidly set up, add data, and start querying with minimal effort — all while having the flexibility to choose the backend that fits your needs best.
Happy searching! 🚀
View the official documentation here: https://langformers.com/semantic-search.html