Perform Semantic Search with Sentence Transformers in Python
Semantic Search helps find the most relevant text from a database by understanding the meaning of the text rather than just matching keywords.
Install Required Libraries
If you haven't installed them yet:
pip install sentence-transformers scikit-learn numpy
How Semantic Search Works
- Convert all documents into embeddings using Sentence Transformers.
- Convert the user query into an embedding.
- Use cosine similarity to compare the query with all documents.
- Return the most similar documents.
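Step 3 relies on cosine similarity: the dot product of two vectors divided by the product of their lengths. Here is a minimal pure-Python sketch using made-up 3-dimensional vectors (real embeddings from the model used below have 384 dimensions). The function name `cosine` and the toy vectors are illustrative assumptions, not part of any library.

```python
import math

def cosine(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy "embeddings" (hypothetical values, for illustration only)
query_vec = [1.0, 0.0, 1.0]
doc_vecs = {
    "doc_a": [2.0, 0.0, 2.0],  # points in the same direction as the query
    "doc_b": [0.0, 1.0, 0.0],  # orthogonal to the query
}

# Score every document against the query and pick the highest score
scores = {name: cosine(query_vec, vec) for name, vec in doc_vecs.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Note that `doc_a` scores highest even though it is twice as long as the query: cosine similarity compares direction, not magnitude.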
1. Generate Embeddings for Documents
Let's assume you have a list of documents:
from sentence_transformers import SentenceTransformer
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity
# Load pre-trained model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# Example documents
documents = [
    "I love playing football.",
    "Artificial Intelligence is the future.",
    "Machine learning powers AI.",
    "The weather is sunny today.",
    "Deep learning improves neural networks.",
    "It's raining outside.",
]
# Generate embeddings for documents
document_embeddings = model.encode(documents)
print("Embeddings Shape:", document_embeddings.shape)  # (6, 384): 6 documents, 384 dimensions each
2. Perform Semantic Search
Let's take a query like: 👉 "Future of AI technology"
We'll find the document that matches this query best.
# User Query
query = "Future of AI technology"
# Encode query
query_embedding = model.encode(query)
# Calculate cosine similarity
similarities = cosine_similarity([query_embedding], document_embeddings)
# Get the top match
top_match_index = np.argmax(similarities)
print(f"Query: {query}")
print(f"Best Match: {documents[top_match_index]}")
print(f"Similarity Score: {similarities[0][top_match_index]:.4f}")
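Output Example
The score shown here is illustrative. Your exact value may differ, and the code above prints it to four decimal places.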
Query: Future of AI technology
Best Match: Artificial Intelligence is the future.
Similarity Score: 0.89
3. Return Top N Results
If you want the top 3 most similar documents:
# Get Top 3 Matches
top_n = 3
top_indices = np.argsort(similarities[0])[::-1][:top_n]
print("\nTop 3 Matches:")
for idx in top_indices:
print(f"{documents[idx]} (Score: {similarities[0][idx]:.4f})")
Why Use Sentence Transformers for Semantic Search?
| Method | Advantage | Use Case |
|---|---|---|
| Keyword Search | Fast, simple | Exact keyword match |
| Semantic Search | Meaning-based search | Chatbots, FAQ, Search Engines |
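To make the contrast in the table concrete, here is a small sketch of a naive keyword search over the same example documents. The helpers `tokenize` and `keyword_search` are hypothetical names written for this illustration. They only count shared words, so a query that uses a synonym finds nothing.

```python
import re

documents = [
    "I love playing football.",
    "Artificial Intelligence is the future.",
    "Machine learning powers AI.",
    "The weather is sunny today.",
    "Deep learning improves neural networks.",
    "It's raining outside.",
]

def tokenize(text):
    """Lowercase the text and return its words as a set."""
    return set(re.findall(r"[a-z']+", text.lower()))

def keyword_search(query, docs):
    """Return (document, shared_word_count) pairs, best match first."""
    query_words = tokenize(query)
    hits = []
    for doc in docs:
        overlap = len(query_words & tokenize(doc))
        if overlap > 0:
            hits.append((doc, overlap))
    return sorted(hits, key=lambda pair: pair[1], reverse=True)

exact_hits = keyword_search("football", documents)  # shares the word "football"
synonym_hits = keyword_search("soccer", documents)  # shares no words with any document
print(exact_hits)
print(synonym_hits)
```

The "soccer" query comes back empty because no document contains that exact word. An embedding-based search over the same documents would typically still rank the football sentence near the top, since it compares meaning rather than exact words.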
Bonus Tip:
For large datasets, you can speed up the search with FAISS (Facebook AI Similarity Search), a library for efficient similarity search over dense vectors.