similarity_search.query
Performs a semantic search across one or more namespaces.
Parameters
namespaces
A list of one or more namespace names to search within.
query
Union[str, List[float]]
required
The search query (text or a vector).
top_k
Number of most relevant chunks to return for the query across the given namespaces. Default is 10.
threshold
Minimum relevance score (0-1); chunks scoring below it are filtered out. Required when kiosk_mode is true.
kiosk_mode
Enables kiosk mode, which filters out chunks below the relevance threshold. When kiosk mode is on, threshold is required.
Returns: Dict[str, Any] - A dictionary containing the search results under the results key.
Raises: NamespaceNotFound, InvalidInputError.
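The parameter constraints above can be checked on the client side before a request is sent. What follows is a minimal sketch and not part of the SDK: `check_query_args` is a hypothetical helper, and the rule that `top_k` must be at least 1 is an assumption that this reference does not state.

```python
from typing import Optional, Sequence


def check_query_args(
    namespaces: Sequence[str],
    top_k: int = 10,
    threshold: Optional[float] = None,
    kiosk_mode: bool = False,
) -> None:
    """Hypothetical pre-flight check; raises ValueError on a violated constraint."""
    # The reference asks for "one or more" namespace names.
    if not namespaces:
        raise ValueError("namespaces must contain at least one name")
    # Assumption (not stated in the reference): top_k must be positive.
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    # The threshold is documented as a score in the range 0-1.
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0 and 1")
    # The reference states that threshold is required when kiosk_mode is on.
    if kiosk_mode and threshold is None:
        raise ValueError("threshold is required when kiosk_mode is True")
```

Calling the helper right before `client.similarity_search.query(...)` surfaces a bad argument combination locally; the SDK may still perform its own validation and raise InvalidInputError.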
Basic Example
from moorcheh_sdk import MoorchehClient

with MoorchehClient() as client:
    results = client.similarity_search.query(
        namespaces=["my-faq-documents"],
        query="How long do I have to return an item?",
        top_k=5
    )
    for result in results.get('results', []):
        print(f"Score: {result['score']:.3f}")
        print(f"Text: {result['text'][:100]}...")
        print("---")
Advanced Examples
from moorcheh_sdk import MoorchehClient

with MoorchehClient() as client:
    # Search across multiple namespaces with a relevance threshold
    results = client.similarity_search.query(
        namespaces=["faq-documents", "policy-documents"],
        query="return policy",
        top_k=5,
        threshold=0.7
    )
    for result in results['results']:
        print(f"ID: {result['id']}")
        print(f"Score: {result['score']:.3f}")
        print(f"Text: {result['text'][:100]}...")
        print("---")
from moorcheh_sdk import MoorchehClient

with MoorchehClient() as client:
    # Search using a vector query
    query_vector = [0.1, 0.2, 0.3, 0.4, ...]  # Your query vector
    results = client.similarity_search.query(
        namespaces=["vector-embeddings"],
        query=query_vector,
        top_k=10,
        kiosk_mode=True,
        threshold=0.5
    )
    for result in results.get('results', []):
        print(f"Similarity: {result['score']:.3f}")
Complete Example
from moorcheh_sdk import MoorchehClient
import time

with MoorchehClient() as client:
    namespace = "customer-support"

    # 1. Create namespace and upload support documents
    client.namespaces.create(namespace, type="text")
    support_docs = [
        {
            "id": "policy-1",
            "text": "Our return policy allows returns within 30 days of purchase with original receipt.",
            "category": "returns"
        },
        {
            "id": "policy-2",
            "text": "We offer free shipping on orders over $50. Standard shipping takes 3-5 business days.",
            "category": "shipping"
        }
    ]
    client.documents.upload(namespace, support_docs)
    print("Documents uploaded, waiting for processing...")
    time.sleep(5)

    # 2. Perform searches
    print("\n=== SEARCH RESULTS ===")
    search_results = client.similarity_search.query(
        namespaces=[namespace],
        query="return policy",
        top_k=2
    )
    for result in search_results['results']:
        print(f"Score: {result['score']:.3f} | ID: {result['id']}")
        print(f"Text: {result['text'][:80]}...")
        print()
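The fixed `time.sleep(5)` in the example above may be longer or shorter than processing actually takes. One alternative is to poll until the search returns something. The sketch below is illustrative only: `wait_for_results` is a hypothetical helper, and it assumes that an empty `results` list means the documents are not searchable yet, which this reference does not confirm.

```python
import time
from typing import Any, Callable, Dict


def wait_for_results(
    search: Callable[[], Dict[str, Any]],
    attempts: int = 5,
    delay: float = 1.0,
) -> Dict[str, Any]:
    """Call `search` until it returns at least one result or attempts run out."""
    response: Dict[str, Any] = {}
    for attempt in range(attempts):
        response = search()
        # Stop as soon as the response carries a non-empty 'results' list.
        if response.get("results"):
            break
        # Sleep between attempts, but not after the last one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return response
```

In the example above, the search call could be wrapped as `wait_for_results(lambda: client.similarity_search.query(namespaces=[namespace], query="return policy", top_k=2))`. The last response is returned even when it is still empty, so the caller decides how to handle a timeout.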
Search Result Structure
Search results contain the following fields:
{
    'results': [
        {
            'id': 'document-id',
            'score': 0.85,                    # Similarity score (0-1)
            'label': 'Very High Relevance',   # Human-readable relevance
            'text': 'Document content...',
            'metadata': {                     # Your custom metadata
                'category': 'faq',
                'author': 'support-team'
            }
        }
    ],
    'execution_time': 0.123,
    'timings': {...},            # Detailed timing breakdown
    'optimization_info': {...}   # Search optimization details
}
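A response with this shape can be flattened into compact rows for logging or display. The following is a minimal sketch that relies only on the `results`, `id`, `score`, and `text` keys shown above; `summarize_results` is a hypothetical helper, not an SDK function, and falling back to defaults for missing keys is a design choice made here.

```python
from typing import Any, Dict, List, Tuple


def summarize_results(
    response: Dict[str, Any], width: int = 80
) -> List[Tuple[str, float, str]]:
    """Return (id, score, text preview) for each result, tolerating missing keys."""
    rows: List[Tuple[str, float, str]] = []
    for item in response.get("results", []):
        text = item.get("text", "")
        # Truncate long text and mark the cut with an ellipsis.
        preview = text if len(text) <= width else text[:width] + "..."
        rows.append((item.get("id", ""), float(item.get("score", 0.0)), preview))
    return rows
```

Using `.get` with defaults keeps the loop from raising KeyError if a result lacks a field, at the cost of hiding malformed entries; direct indexing is the stricter alternative.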
ITS Scoring System
Results are scored using Information Theoretic Similarity (ITS), providing nuanced relevance measurements:
| Label | Score Range | Description |
|---|---|---|
| Close Match | score ≥ 0.894 | Near-perfect relevance to the query |
| Very High Relevance | 0.632 ≤ score < 0.894 | Strongly related content |
| High Relevance | 0.447 ≤ score < 0.632 | Significantly related content |
| Good Relevance | 0.316 ≤ score < 0.447 | Moderately related content |
| Low Relevance | 0.224 ≤ score < 0.316 | Minimally related content |
| Very Low Relevance | 0.1 ≤ score < 0.224 | Barely related content |
| Irrelevant | score < 0.1 | No meaningful relation to the query |
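The bands in the table can be reproduced client-side, for example to label scores from a vector query or to pick a threshold that matches a band. This is a sketch built from the boundaries listed above; `ITS_BANDS` and `label_for_score` are hypothetical names, and the service computes the `label` field itself.

```python
from typing import List, Tuple

# (lower bound, label) pairs copied from the table, highest band first.
ITS_BANDS: List[Tuple[float, str]] = [
    (0.894, "Close Match"),
    (0.632, "Very High Relevance"),
    (0.447, "High Relevance"),
    (0.316, "Good Relevance"),
    (0.224, "Low Relevance"),
    (0.1, "Very Low Relevance"),
]


def label_for_score(score: float) -> str:
    """Map a score to its label using the lower bounds from the table."""
    for lower, label in ITS_BANDS:
        # Lower bounds are inclusive, matching the "≤ score" column.
        if score >= lower:
            return label
    return "Irrelevant"
```

To keep only results rated "High Relevance" or better, for instance, the corresponding lower bound (0.447) could be passed as `threshold`.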
Best Practices
- Use specific, clear queries for better results
- Set appropriate thresholds to filter low-quality results
- Use multiple namespaces for comprehensive searches
- Consider kiosk_mode for production applications
- Use appropriate top_k values: higher values provide more context but may increase response time
Error Handling
Robust Search with Error Handling
from moorcheh_sdk import MoorchehClient, NamespaceNotFound, InvalidInputError

try:
    with MoorchehClient() as client:
        results = client.similarity_search.query(
            namespaces=["my-namespace"],
            query="search query",
            top_k=5
        )
        if results['results']:
            print(f"Found {len(results['results'])} results")
        else:
            print("No results found")
except NamespaceNotFound:
    print("One or more namespaces don't exist")
except InvalidInputError as e:
    print(f"Invalid search parameters: {e}")
except Exception as e:
    print(f"Unexpected error: {e}")
Use Cases
- Document Retrieval: Find relevant documents across knowledge bases
- Content Discovery: Explore related content with semantic understanding
- Customer Support: Find relevant answers from support documentation
- Research & Analysis: Search through research papers and technical documents
- E-commerce: Product similarity and recommendation engines