Code Example: Generate Embeddings Using BERT & Sentence Transformers in Python
Here's how to generate sentence embeddings in Python, first with raw BERT via the Hugging Face transformers library and then with the higher-level Sentence Transformers library.
Install Required Libraries
```bash
pip install transformers sentence-transformers torch
```
1. Generate Embeddings with BERT (Hugging Face)
This example runs a sentence through BERT with the Hugging Face transformers library and mean-pools the token embeddings into a single sentence vector.
```python
from transformers import AutoTokenizer, AutoModel
import torch

# Load BERT model and tokenizer
model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name)

# Sentence to embed
sentence = "I love learning about AI."

# Tokenize sentence
inputs = tokenizer(sentence, return_tensors="pt", padding=True, truncation=True)

# Run BERT without tracking gradients (inference only)
with torch.no_grad():
    outputs = model(**inputs)

# Token-level embeddings: shape (batch, tokens, 768)
embeddings = outputs.last_hidden_state

# Average pooling over tokens to get one sentence embedding
sentence_embedding = torch.mean(embeddings, dim=1)

print("BERT Embedding Shape:", sentence_embedding.shape)  # torch.Size([1, 768])
print("Embedding Vector:", sentence_embedding)
```
2. Generate Embeddings with Sentence Transformers
The sentence-transformers library wraps this whole pipeline (encoder plus pooling) in a single call and uses models trained specifically for sentence-level embeddings.
```python
from sentence_transformers import SentenceTransformer

# Load a pre-trained Sentence Transformer model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Sentences to embed
sentences = ["I love AI.", "Artificial Intelligence is fascinating."]

# Generate embeddings (a NumPy array by default)
embeddings = model.encode(sentences)

print("Sentence Embedding Shape:", embeddings.shape)  # (2, 384)
print("Embedding Vector:", embeddings[0])
```
Which One Should You Use?
| Library | Pros | Cons |
|---|---|---|
| Hugging Face (BERT) | Fine control, raw embeddings | Slower, more code needed |
| Sentence Transformers | Fast, optimized for sentences | Less customizable |
Bonus Tip:
If you're building semantic search, clustering, or sentence similarity models, go with Sentence Transformers because it's faster and pre-trained on sentence-level tasks.
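For example, here is a minimal semantic-search sketch built on the library's util.semantic_search helper; the corpus and query strings are made up for illustration:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Illustrative corpus and query
corpus = [
    "How do I train a neural network?",
    "Best pizza recipes at home",
    "Introduction to machine learning",
]
corpus_embeddings = model.encode(corpus, convert_to_tensor=True)
query_embedding = model.encode("getting started with ML", convert_to_tensor=True)

# Retrieve the top 2 most similar corpus sentences by cosine similarity
hits = util.semantic_search(query_embedding, corpus_embeddings, top_k=2)[0]
for hit in hits:
    print(corpus[hit["corpus_id"]], "| score:", round(hit["score"], 3))
```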