Gemini + Moorcheh
This integration uses Google Gemini to generate embeddings and Moorcheh vector namespaces to store and search them with ITS ranking. Gemini embedding models can map text, images, video, audio, and PDFs (including interleaved combinations) into a unified vector space. This page focuses on the gemini-embedding-2-preview model with text; you can extend the same pattern to files using the Gemini Embedding API.
Use this approach when you want full control over the embedding model and upload pre-computed vectors directly to Moorcheh.
Architecture
- Embedding generation: Generate vectors with Gemini gemini-embedding-2-preview and task types such as RETRIEVAL_DOCUMENT / RETRIEVAL_QUERY
- Vector storage: Store vectors in Moorcheh vector namespaces
- Semantic retrieval: Search by vector query for high-relevance results
- Model flexibility: Tune output dimensionality to balance quality and storage
Prerequisites
- MOORCHEH_API_KEY from the Moorcheh Console
- GEMINI_API_KEY from Google AI Studio
- Python 3.9+
The PyPI package is google-genai (with a hyphen); it provides the Python module google.genai (with a dot). If you see ModuleNotFoundError: No module named 'google.genai', run pip install google-genai in the same environment you use to run the script.
.env file
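As a rough sketch, assuming a .env file with one KEY=VALUE pair per line for the two keys listed in the prerequisites, the keys could be loaded and checked with the standard library alone. The helper names load_env_file and require_keys are hypothetical and not part of either SDK; python-dotenv's load_dotenv is a common alternative for the loading step.

```python
import os

# The two keys named in the prerequisites.
REQUIRED_KEYS = ("MOORCHEH_API_KEY", "GEMINI_API_KEY")


def load_env_file(path=".env"):
    """Minimal .env reader (hypothetical helper).

    Reads KEY=VALUE lines, skips blanks and '#' comments, strips
    surrounding quotes, and never overrides variables already set.
    """
    if not os.path.exists(path):
        return
    with open(path, encoding="utf-8") as fh:
        for raw in fh:
            line = raw.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            value = value.strip().strip('"').strip("'")
            os.environ.setdefault(key.strip(), value)


def require_keys(names=REQUIRED_KEYS):
    """Return the required keys, or fail early with a clear message."""
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: os.environ[name] for name in names}
```

Calling load_env_file() followed by require_keys() at the top of a script surfaces a missing key before any API call is made.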
Task types
The Gemini Embedding API accepts a task_type that optimizes vectors for the intended use. Common choices for retrieval:
| Task type | When to use |
|---|---|
| RETRIEVAL_DOCUMENT | Chunks or documents you index for search |
| RETRIEVAL_QUERY | User queries at search time |
Other task types include SEMANTIC_SIMILARITY, CLASSIFICATION, CLUSTERING, CODE_RETRIEVAL_QUERY, QUESTION_ANSWERING, and FACT_VERIFICATION. Use the same model and dimension settings for both indexing and querying.
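To illustrate how the task type is chosen per call, here is a minimal sketch that embeds one string at a time. It assumes `client` is a google.genai.Client created elsewhere (for example with genai.Client(api_key=...)) and passes the configuration as a plain dict; the helper name embed_text and the TASK_TYPES mapping are hypothetical.

```python
MODEL = "gemini-embedding-2-preview"

# Map the role of a text to the retrieval task types in the table above.
TASK_TYPES = {"document": "RETRIEVAL_DOCUMENT", "query": "RETRIEVAL_QUERY"}


def embed_text(client, text, role, output_dimensionality=None):
    """Embed one string with the task type matching its role.

    ASSUMPTION: `client` is a google.genai.Client; only
    client.models.embed_content is used.
    """
    config = {"task_type": TASK_TYPES[role]}
    if output_dimensionality is not None:
        config["output_dimensionality"] = output_dimensionality
    result = client.models.embed_content(model=MODEL, contents=text, config=config)
    # One input string yields one embedding; return its values as a list.
    return list(result.embeddings[0].values)
```

Indexing code would call embed_text(client, chunk, "document") and search code embed_text(client, question, "query"), with the same output_dimensionality in both places.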
Vector dimensions
By default, gemini-embedding-2-preview returns 3072 dimensions. You can set output_dimensionality (for example 768 or 1536) to reduce storage. The Moorcheh namespace vector_dimension must match the size you produce at index and query time.
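The matching rule can be checked locally before anything is uploaded. The sketch below is illustrative only: check_dimensions and approx_raw_bytes are hypothetical helpers, and the size estimate assumes 4 bytes per value for raw float32 vectors, which says nothing about how Moorcheh stores vectors internally.

```python
def check_dimensions(vectors, expected_dim):
    """Raise if any vector's length differs from the namespace dimension."""
    for index, vector in enumerate(vectors):
        if len(vector) != expected_dim:
            raise ValueError(
                f"vector {index} has {len(vector)} dimensions, expected {expected_dim}"
            )
    return expected_dim


def approx_raw_bytes(num_vectors, dim, bytes_per_value=4):
    """Rough size of the raw vectors, ASSUMING float32 (4 bytes per value)."""
    return num_vectors * dim * bytes_per_value
```

Under that float32 assumption, one 3072-dimensional vector takes 12,288 bytes and one 768-dimensional vector takes 3,072 bytes, a fourfold difference in raw size.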
End-to-end example
The following example loads keys from .env, embeds document chunks with RETRIEVAL_DOCUMENT, uploads them to Moorcheh, embeds a query with RETRIEVAL_QUERY, and runs vector search.
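A condensed sketch of that flow is shown here under stated assumptions. The Gemini call uses client.models.embed_content from google-genai with a dict config. The Moorcheh method names and arguments (create_namespace, upload_vectors, search, and the id / vector / text record shape) are assumptions based on the parameters this page mentions (type="vector", vector_dimension); check them against the Moorcheh SDK version you have installed. The namespace name and DIM value are illustrative.

```python
MODEL = "gemini-embedding-2-preview"
DIM = 768  # illustrative; must equal the namespace vector_dimension


def embed(gemini, text, task_type):
    """Embed one string with the given task type at DIM dimensions."""
    result = gemini.models.embed_content(
        model=MODEL,
        contents=text,
        config={"task_type": task_type, "output_dimensionality": DIM},
    )
    return list(result.embeddings[0].values)


def index_and_search(gemini, moorcheh, namespace, chunks, query, top_k=3):
    """Create a vector namespace, upload chunk vectors, then search it."""
    # ASSUMPTION: Moorcheh method names and keyword arguments below.
    moorcheh.create_namespace(
        namespace_name=namespace, type="vector", vector_dimension=DIM
    )
    vectors = [
        {
            "id": f"chunk-{i}",
            "vector": embed(gemini, chunk, "RETRIEVAL_DOCUMENT"),
            "text": chunk,  # stored so results can return the original chunk
        }
        for i, chunk in enumerate(chunks)
    ]
    moorcheh.upload_vectors(namespace_name=namespace, vectors=vectors)
    query_vector = embed(gemini, query, "RETRIEVAL_QUERY")
    return moorcheh.search(namespaces=[namespace], query=query_vector, top_k=top_k)


def main():
    """Wire up real clients; needs google-genai, the Moorcheh SDK, and both keys."""
    import os
    from google import genai
    from moorcheh_sdk import MoorchehClient  # ASSUMPTION: import path

    gemini = genai.Client(api_key=os.environ["GEMINI_API_KEY"])
    moorcheh = MoorchehClient(api_key=os.environ["MOORCHEH_API_KEY"])
    chunks = ["Moorcheh stores vectors in namespaces.", "Gemini produces embeddings."]
    print(index_and_search(gemini, moorcheh, "gemini-demo", chunks, "Where are vectors stored?"))
```

Passing the clients in as arguments keeps index_and_search independent of how keys are loaded, so the same function works from a script, a notebook, or a test with stand-in clients.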
Embedding PDFs and other files
You can pass binary parts to embed_content (for example a PDF) using types.Part.from_bytes:
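A minimal sketch of that pattern follows, assuming `client` is a google.genai.Client and that the file's MIME type can be guessed from its extension. The helper names guess_mime_type and embed_file are hypothetical; google.genai is imported inside the function so the module can be loaded without the package installed.

```python
import mimetypes

MODEL = "gemini-embedding-2-preview"


def guess_mime_type(path, default="application/octet-stream"):
    """Guess a MIME type from the file extension, with a generic fallback."""
    mime, _encoding = mimetypes.guess_type(path)
    return mime or default


def embed_file(client, path, output_dimensionality=None):
    """Embed a single file (for example a PDF) as one binary part.

    ASSUMPTION: `client` is a google.genai.Client and the model accepts
    this file type; google-genai must be installed to call this function.
    """
    from google.genai import types  # lazy import, needs google-genai

    with open(path, "rb") as fh:
        data = fh.read()
    part = types.Part.from_bytes(data=data, mime_type=guess_mime_type(path))
    config = None
    if output_dimensionality is not None:
        config = {"output_dimensionality": output_dimensionality}
    result = client.models.embed_content(model=MODEL, contents=[part], config=config)
    return list(result.embeddings[0].values)
```

The returned vector can be uploaded like any text embedding, provided its length matches the namespace vector_dimension.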
Runnable demo script
See integrations/gemini/gemini_moorcheh_demo.py.
Important notes
- Vector dimension must match: The namespace vector_dimension must exactly match the length of vectors you upload. If you use output_dimensionality on the Gemini side, create the namespace with that same size.
- Use document vs query task types: Use RETRIEVAL_DOCUMENT (or equivalent) for indexed content and RETRIEVAL_QUERY for search queries.
- Store text on each vector: Include text in each uploaded vector object so search results can return the original chunk without an extra lookup.
- Keep model settings consistent: Use the same model and dimension settings for indexing and querying, and keep the document/query task-type pairing the same across runs.
Troubleshooting
- No vector namespace found: Create the namespace first with type="vector".
- Dimension mismatch: Recreate the namespace with the correct vector_dimension or align Gemini output_dimensionality with the namespace.
- Auth errors: Confirm GEMINI_API_KEY is set and valid for the Gemini API.
Related docs
- Create Namespace
- Upload Vector Data
- Search Query
- Gemini embeddings documentation
- Cohere integration (alternative embedding provider)