Implement FAISS-Based Semantic Search with Sentence Transformers
FAISS (Facebook AI Similarity Search) is a library for efficient similarity search that enables fast nearest-neighbor retrieval over large collections of embedding vectors.
Install Required Libraries
If you haven't installed them yet:
```shell
pip install sentence-transformers faiss-cpu
```
How FAISS Works:
- Convert all documents into embeddings using Sentence Transformers.
- Store embeddings in a FAISS index.
- Convert the search query into an embedding.
- Use FAISS to quickly retrieve the most similar documents.
1. Generate Embeddings for Documents
We'll use Sentence Transformers to generate embeddings.
```python
from sentence_transformers import SentenceTransformer

# Load the embedding model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Example documents
documents = [
    "I love playing football.",
    "Artificial Intelligence is the future.",
    "Machine learning powers AI.",
    "The weather is sunny today.",
    "Deep learning improves neural networks.",
    "It's raining outside."
]

# Generate embeddings (a float32 NumPy array of shape (6, 384) for this model)
document_embeddings = model.encode(documents)
print("Document Embeddings Shape:", document_embeddings.shape)
```
2. Index Embeddings with FAISS
Now, let's create a FAISS index and store the embeddings.
```python
import faiss

# Get the embedding dimension
embedding_dimension = document_embeddings.shape[1]

# Create a flat index that searches by L2 (Euclidean) distance
index = faiss.IndexFlatL2(embedding_dimension)
index.add(document_embeddings)  # add embeddings to the index

print(f"Number of Documents in Index: {index.ntotal}")
```
3. Perform Semantic Search
Now let's search for the most similar documents.
```python
# User query
query = "Future of AI technology"

# Encode the query
query_embedding = model.encode([query])

# Search the index
k = 3  # top 3 matches
distances, indices = index.search(query_embedding, k)

print(f"Query: {query}\n")

# Show results
for i in range(k):
    print(f"{documents[indices[0][i]]} (Distance: {distances[0][i]:.4f})")
```
Output Example
```
Query: Future of AI technology

Artificial Intelligence is the future. (Distance: 0.3105)
Machine learning powers AI. (Distance: 0.3489)
Deep learning improves neural networks. (Distance: 0.3702)
```
Why FAISS?
| Feature | Traditional Search | FAISS |
|---|---|---|
| Speed | Slow | ⚡ Fast |
| Scale | Small datasets | Large datasets (Millions of documents) |
| Efficiency | Keyword-based | Vector-based |
Bonus Tip 🚀
If you want cosine similarity instead of Euclidean distance, normalize the embeddings before adding them to FAISS (and normalize each query the same way), then use an inner-product index (`faiss.IndexFlatIP`): the inner product of unit vectors equals their cosine similarity.

```python
faiss.normalize_L2(document_embeddings)
```
Conclusion
FAISS + Sentence Transformers is a powerful combination for fast, large-scale semantic search: the model turns text into meaningful vectors, and FAISS retrieves the nearest ones efficiently.