# Vector Stores & Embeddings
AutoSchemaKG uses vector embeddings to represent nodes, edges, and text passages in the Knowledge Graph. This enables semantic retrieval and similarity search.
## Embedding Models
The framework provides a flexible `BaseEmbeddingModel` interface that supports various embedding backends, including HuggingFace Sentence Transformers, NVIDIA's NV-Embed, and OpenAI-compatible APIs.
### 1. Sentence Transformers (Local)
Use the `sentence-transformers` library for standard local embeddings.
```python
from sentence_transformers import SentenceTransformer
from atlas_rag.vectorstore.embedding_model import SentenceEmbedding

# Initialize the underlying model
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# Wrap it with SentenceEmbedding
encoder = SentenceEmbedding(model)

# Encode text
embeddings = encoder.encode(["Hello world", "Knowledge Graph"])
```
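With an encoder in hand, semantic similarity between texts reduces to comparing their vectors. The following self-contained sketch (plain NumPy, with toy vectors standing in for real `encoder.encode` output) shows the cosine-similarity computation that underpins semantic retrieval:

```python
import numpy as np

# Toy embeddings standing in for encoder.encode(...) output: one row per text.
# (Fixed values are used here so the example is self-contained.)
embeddings = np.array([
    [0.20, 0.80, 0.10],  # "Hello world"
    [0.70, 0.10, 0.20],  # "Knowledge Graph"
    [0.68, 0.12, 0.19],  # "graph of knowledge"
])

# L2-normalize each row so cosine similarity reduces to a dot product
unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)

# Pairwise cosine similarities
sims = unit @ unit.T
print(sims.shape)               # (3, 3)
print(sims[1, 2] > sims[0, 2])  # the two graph-related texts are closer: True
```

This is the comparison a vector index performs at scale: embed once, normalize, and rank candidates by dot product.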
### 2. NVIDIA NV-Embed (Local)
Optimized support for NVIDIA’s embedding models (e.g., NV-Embed-v2). This wrapper handles specific instruction formatting required by these models.
```python
from transformers import AutoModel
from atlas_rag.vectorstore.embedding_model import NvEmbed

# Initialize the model
model = AutoModel.from_pretrained("nvidia/NV-Embed-v2", trust_remote_code=True)

# Wrap it
encoder = NvEmbed(model)

# Encode with query types (automatically adds instructions)
# Supported types: 'passage', 'entity', 'edge', 'fill_in_edge'
embeddings = encoder.encode("What is the capital of France?", query_type="passage")
```
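To make the instruction handling concrete, here is an illustrative sketch of the dispatch such a wrapper performs. The prefix strings below are invented for illustration; they are not NV-Embed's actual prompts, which are internal to `atlas_rag`:

```python
# Hypothetical instruction prefixes keyed by query type (made up for
# illustration; the real NvEmbed wrapper defines its own)
INSTRUCTIONS = {
    "passage": "",  # passages are typically embedded without an instruction
    "entity": "Instruct: Given an entity, retrieve related passages.\nQuery: ",
    "edge": "Instruct: Given a relation triple, retrieve related passages.\nQuery: ",
    "fill_in_edge": "Instruct: Given a partial triple, predict the missing part.\nQuery: ",
}

def format_input(text: str, query_type: str = "passage") -> str:
    """Prepend the instruction associated with query_type, if any."""
    if query_type not in INSTRUCTIONS:
        raise ValueError(f"Unsupported query_type: {query_type!r}")
    return INSTRUCTIONS[query_type] + text

print(format_input("Paris", query_type="entity"))
```

The point is only that the caller passes a `query_type` and the wrapper, not the caller, is responsible for the model-specific prompt format.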
### 3. OpenAI-Compatible API
Use any OpenAI-compatible embedding API (OpenAI, DeepInfra, vLLM, etc.).
```python
from openai import OpenAI
from atlas_rag.vectorstore.embedding_model import EmbeddingAPI

# Initialize OpenAI client
client = OpenAI(api_key="...")

# Initialize wrapper
encoder = EmbeddingAPI(client, model_name="text-embedding-3-small")

# Encode
embeddings = encoder.encode(["Text to embed"])
```
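API providers typically cap how many inputs a single embedding request may carry, so large corpora are sent in chunks. A minimal, generic batching helper (not part of `atlas_rag`; the batch size of 128 is an arbitrary example) might look like:

```python
def batched(texts, batch_size=128):
    """Yield successive slices of texts, each at most batch_size long."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]

# Hypothetical usage with the encoder above (loop shown as a sketch):
# all_embeddings = []
# for chunk in batched(corpus, batch_size=128):
#     all_embeddings.extend(encoder.encode(chunk))

chunks = list(batched(list(range(10)), batch_size=4))
print([len(c) for c in chunks])  # [4, 4, 2]
```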
### 4. Qwen Embedding (API)
Specialized wrapper for Qwen embedding models served via API, handling specific instruction formats.
```python
from openai import OpenAI
from atlas_rag.vectorstore.embedding_model import Qwen3Emb

# Initialize OpenAI client
client = OpenAI(api_key="...")

# Wrap it
encoder = Qwen3Emb(client, model_name="Qwen/Qwen3-Embedding-0.6B")
```
## Creating Vector Indices
Once you have an embedding model, you can generate embeddings for your Knowledge Graph components (nodes, edges, text) and create FAISS indices for efficient retrieval.
```python
from atlas_rag.vectorstore import create_embeddings_and_index

# This function:
# 1. Reads the CSV files generated by KnowledgeGraphExtractor
# 2. Computes embeddings for nodes, edges, and source texts
# 3. Saves the embeddings back to CSV
# 4. Builds and saves FAISS indices
data = create_embeddings_and_index(
    sentence_encoder=encoder,
    model_name="all-MiniLM-L6-v2",
    working_directory="./output_dir",
    keyword="dataset_name",
    include_concept=True,  # Set to True if concepts were generated
    include_events=False,
    normalize_embeddings=True,
)
```
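With `normalize_embeddings=True`, inner-product search over the index is equivalent to cosine similarity, which is what a flat inner-product FAISS index computes. This NumPy sketch (random stand-in vectors, not real graph embeddings) shows the equivalent brute-force search:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "node" embeddings standing in for the vectors the index would hold
node_emb = rng.normal(size=(5, 8)).astype(np.float32)
# A query constructed near node 3, so the expected nearest neighbor is known
query = node_emb[3] + 0.01 * rng.normal(size=8).astype(np.float32)

# Row-normalize, as normalize_embeddings=True does before indexing
node_emb = node_emb / np.linalg.norm(node_emb, axis=1, keepdims=True)
query = query / np.linalg.norm(query)

# Inner product over unit vectors == cosine similarity; a flat FAISS
# inner-product index computes exactly this, just faster and at scale
scores = node_emb @ query
top_k = np.argsort(-scores)[:3]
print(top_k[0])  # node 3, the vector the query was built from
```

At retrieval time, the framework's retrievers perform this lookup against the saved indices rather than recomputing similarities by hand.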