NVIDIA NIM embeddings + Moorcheh
This integration uses the NVIDIA NIM OpenAI-compatible Embeddings API with nvidia/llama-nemotron-embed-vl-1b-v2 and Moorcheh vector namespaces to store and search vectors with ITS ranking.
The model outputs 2048-dimensional vectors (model reference). For this model you must set input_type: use passage when embedding content you index, and query when embedding search strings—mixing them hurts retrieval quality.
Architecture
Embedding generation
POST https://integrate.api.nvidia.com/v1/embeddings with model, input, and input_type
Vector storage
Store vectors in Moorcheh vector namespaces
Semantic retrieval
Embed the query with input_type: query and run vector search
Authentication
Authorization: Bearer your NVIDIA API key (NVIDIA API Catalog)
Prerequisites
- MOORCHEH_API_KEY from the Moorcheh Console
- NVIDIA_API_KEY from the NVIDIA API Catalog (create an API key for the integrate endpoint)
- Python 3.9+
.env file
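Put MOORCHEH_API_KEY and NVIDIA_API_KEY in a .env file, one KEY=value line each; the end-to-end example loads both keys from .env.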
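As a rough illustration of reading those two keys, here is a minimal standard-library sketch. It assumes a .env file containing plain KEY=value lines; the helper names `parse_env_text` and `load_keys` are hypothetical and not part of any SDK. A dedicated package such as python-dotenv can do the same job.

```python
import os
from pathlib import Path

REQUIRED_KEYS = ("MOORCHEH_API_KEY", "NVIDIA_API_KEY")


def parse_env_text(text):
    """Parse simple KEY=value lines; blank lines and # comments are skipped."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_keys(path=".env"):
    """Return both API keys, preferring real environment variables over the file."""
    env_file = Path(path)
    file_values = parse_env_text(env_file.read_text()) if env_file.exists() else {}
    keys = {name: os.environ.get(name) or file_values.get(name) for name in REQUIRED_KEYS}
    missing = [name for name, value in keys.items() if not value]
    if missing:
        raise RuntimeError("Missing keys: " + ", ".join(missing))
    return keys
```

Failing early on a missing key keeps a later 401 from the embeddings endpoint from being the first sign of a configuration problem.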
input_type (passage vs query)
| input_type | When to use |
|---|---|
| passage | Chunks or documents you store in Moorcheh |
| query | User or system queries at search time |
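To make the passage/query split concrete, the sketch below builds the arguments for a POST to the embeddings endpoint and reads the vectors back. The endpoint, model name, Bearer header, and the model, input, and input_type fields come from this page. The helper names are hypothetical, and the response shape (a data list whose items carry index and embedding) is assumed to follow the OpenAI embeddings format.

```python
NIM_EMBEDDINGS_URL = "https://integrate.api.nvidia.com/v1/embeddings"
MODEL = "nvidia/llama-nemotron-embed-vl-1b-v2"


def build_embeddings_request(texts, input_type, api_key):
    """Return keyword arguments for requests.post(); no network call is made here."""
    if input_type not in ("passage", "query"):
        raise ValueError("input_type must be 'passage' or 'query'")
    return {
        "url": NIM_EMBEDDINGS_URL,
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"model": MODEL, "input": list(texts), "input_type": input_type},
    }


def parse_embeddings(response_json):
    """Extract vectors from an OpenAI-style response, ordered by each item's index."""
    items = sorted(response_json["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


# Usage (needs the requests package and network access):
#   import requests
#   resp = requests.post(**build_embeddings_request(chunks, "passage", key), timeout=30)
#   resp.raise_for_status()
#   vectors = parse_embeddings(resp.json())
```

Rejecting any input_type other than passage or query catches typos locally instead of relying on the API to reject them.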
Vector dimensions
nvidia/llama-nemotron-embed-vl-1b-v2 outputs 2048 dimensions per text. Set Moorcheh vector_dimension to 2048 for the namespace.
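A small guard can catch a dimension mismatch before anything is uploaded. This is only a sketch; `check_dimensions` is a hypothetical helper, and 2048 is the dimension stated above.

```python
EXPECTED_DIM = 2048


def check_dimensions(vectors, expected=EXPECTED_DIM):
    """Raise ValueError naming the first vector whose length differs from expected."""
    for position, vector in enumerate(vectors):
        if len(vector) != expected:
            raise ValueError(
                f"vector {position} has {len(vector)} dimensions, expected {expected}"
            )
    return vectors
```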
End-to-end example
The following example loads keys from .env, embeds passages and a query through the NVIDIA embeddings endpoint, uploads vectors to Moorcheh, and runs similarity search.
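The order of operations in that flow can be sketched with the network and storage calls injected as plain callables. Here `embed`, `upload`, and `search` are hypothetical stand-ins for the NVIDIA request and the Moorcheh upload and search calls, not Moorcheh SDK functions. The id and vector field names are assumptions, while text reflects the advice to store the chunk text on each vector.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def index_and_search(
    passages: Sequence[str],
    query: str,
    embed: Callable[[Sequence[str], str], List[Vector]],
    upload: Callable[[List[dict]], None],
    search: Callable[[Vector], list],
) -> list:
    # 1. Embed the stored chunks with input_type "passage".
    passage_vectors = embed(passages, "passage")
    # 2. Build one record per chunk, keeping the original text beside its vector.
    records = [
        {"id": f"chunk-{i}", "vector": vector, "text": text}
        for i, (text, vector) in enumerate(zip(passages, passage_vectors))
    ]
    upload(records)
    # 3. Embed the search string with input_type "query", then search with it.
    query_vector = embed([query], "query")[0]
    return search(query_vector)
```

Injecting the three callables keeps the sketch independent of any particular client version and lets the flow be exercised with in-memory fakes.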
Runnable demo script
See integrations/nvidia/nvidia_moorcheh_demo.py.
Run from the repo root (or set PYTHONPATH as needed), for example with python integrations/nvidia/nvidia_moorcheh_demo.py.
Important notes
Vector dimension must match
nvidia/llama-nemotron-embed-vl-1b-v2 is 2048 dimensions. Create the Moorcheh namespace with vector_dimension=2048.
Use passage vs query
Use passage for stored chunks and query for search queries, per the NIM API schema.
Store text on each vector
Include text on each uploaded vector so search results can return the original chunk.
OpenAI-compatible clients
You can also use an OpenAI-compatible client with base_url=https://integrate.api.nvidia.com/v1 and the same model and input_type fields; the example above uses requests for clarity.
Troubleshooting
- 401 / auth errors: Verify NVIDIA_API_KEY and Authorization: Bearer format.
- Dimension mismatch: Namespace must be 2048 for this model’s default output.
- Low relevance: Check input_type (passage at index, query at search), chunking, threshold, and top_k.
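For the OpenAI-compatible client route mentioned under Important notes, input_type is not a standard argument of the OpenAI embeddings call, so one way to send it is through the client's extra_body option. The sketch below assumes the openai Python package (version 1 or later) and takes the client as a parameter; `embed_with_openai_client` is a hypothetical helper name.

```python
MODEL = "nvidia/llama-nemotron-embed-vl-1b-v2"


def embed_with_openai_client(client, texts, input_type):
    """Embed texts through an OpenAI-style client, passing input_type via extra_body."""
    response = client.embeddings.create(
        model=MODEL,
        input=list(texts),
        extra_body={"input_type": input_type},  # merged into the JSON request body
    )
    return [item.embedding for item in response.data]


# Usage (needs the openai package and network access):
#   from openai import OpenAI
#   client = OpenAI(api_key=nvidia_api_key, base_url="https://integrate.api.nvidia.com/v1")
#   vectors = embed_with_openai_client(client, ["what is a namespace?"], "query")
```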