Airweave provides AI-powered search capabilities that combine semantic understanding with keyword precision. Search across all your connected data sources through a unified interface with full control over the search pipeline.

Overview

When you query a collection, Airweave runs a multi-step search pipeline:
  1. Query expansion: Generate variations to capture synonyms and related terms
  2. Retrieval: Use keyword, neural, or hybrid methods to fetch candidates
  3. Filtering: Apply structured metadata filters
  4. Reranking: AI-powered reordering for higher precision
  5. Answer generation: Return raw documents or synthesize a natural language response
All search parameters have sensible defaults. You can start with a simple query and add complexity as needed.

Quick Start

The simplest search needs only a collection's readable ID and a query string:
from airweave import AirweaveSDK

client = AirweaveSDK(api_key="YOUR_API_KEY")

# Basic search
results = client.collections.search(
    readable_id="customer-support-x7k9m",
    query="How do I reset my password?"
)

# Access results
for result in results.results:
    print(f"Score: {result['score']:.3f}")
    print(f"Source: {result['source_name']}")
    print(f"Content: {result['md_content'][:200]}...")
    print(f"URL: {result.get('url', 'N/A')}")
    print("---")

# Access AI-generated answer (populated when generate_answer is True, the default)
if results.completion:
    print(f"Answer: {results.completion}")

Search Strategies

Choose how Airweave searches your data with the retrieval_strategy parameter: "hybrid" (the default), "neural", or "keyword".
Hybrid search combines semantic understanding with keyword matching:
results = client.collections.search(
    readable_id="my-collection",
    query="authentication security vulnerabilities",
    retrieval_strategy="hybrid"  # default
)
Use when: you want results that match by meaning as well as by exact keywords.
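This page does not say how hybrid retrieval merges its keyword and semantic signals. One widely used technique is reciprocal rank fusion (RRF), sketched below purely to illustrate the idea of combining two ranked lists; the function name `reciprocal_rank_fusion`, the constant `k=60`, and the sample IDs are assumptions, not Airweave internals.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in
    (rank starts at 1), so items ranked well by both keyword and
    semantic retrieval rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical ranked IDs from a keyword pass and a semantic (neural) pass.
keyword_hits = ["doc-a", "doc-b", "doc-c"]
semantic_hits = ["doc-c", "doc-a", "doc-d"]
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
print(fused)  # ['doc-a', 'doc-c', 'doc-b', 'doc-d']
```

Documents found by both passes ("doc-a", "doc-c") outrank those found by only one, which is the behavior the "hybrid" strategy aims for.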

Filtering Results

Apply structured metadata filters to narrow your search. Filters use a boolean logic structure with must (AND), should (OR), and must_not (NOT) conditions.
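To reason about how must, should, and must_not combine before sending a query, the sketch below evaluates a filter against a single metadata dictionary in plain Python. It is an offline approximation written for this page: `condition_holds`, `matches`, and the sample payload are hypothetical, only match value, match any, and numeric range conditions are covered, and should is treated as "at least one of these conditions must hold".

```python
def condition_holds(payload, cond):
    """Check one {"key": ..., "match"/"range": ...} condition against a payload dict."""
    value = payload.get(cond["key"])
    if "match" in cond:
        match = cond["match"]
        if "value" in match:
            return value == match["value"]  # exact, case-sensitive comparison
        if "any" in match:
            return value in match["any"]
    if "range" in cond:
        if value is None:
            return False
        checks = {
            "gt": lambda v, b: v > b,
            "gte": lambda v, b: v >= b,
            "lt": lambda v, b: v < b,
            "lte": lambda v, b: v <= b,
        }
        return all(checks[op](value, bound) for op, bound in cond["range"].items())
    raise ValueError(f"Unsupported condition: {cond}")


def matches(payload, flt):
    """Evaluate must (all), should (at least one, if present), and must_not (none)."""
    if not all(condition_holds(payload, c) for c in flt.get("must", [])):
        return False
    should = flt.get("should", [])
    if should and not any(condition_holds(payload, c) for c in should):
        return False
    return not any(condition_holds(payload, c) for c in flt.get("must_not", []))


# Hypothetical payload: an open GitHub issue with priority 3.
doc = {"source_name": "GitHub", "status": "open", "priority": 3}
flt = {
    "must": [{"key": "source_name", "match": {"value": "GitHub"}}],
    "should": [
        {"key": "priority", "range": {"gte": 4}},
        {"key": "status", "match": {"any": ["open", "in_progress"]}},
    ],
    "must_not": [{"key": "status", "match": {"value": "resolved"}}],
}
print(matches(doc, flt))  # True: source matches and the status satisfies one should clause
```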

Filter by Source

# Filter to specific source
results = client.collections.search(
    readable_id="my-collection",
    query="deployment issues",
    filter={
        "must": [{
            "key": "source_name",
            "match": {"value": "GitHub"}  # Case-sensitive!
        }]
    }
)

# Multiple sources (OR)
results = client.collections.search(
    readable_id="my-collection",
    query="customer feedback",
    filter={
        "must": [{
            "key": "source_name",
            "match": {"any": ["Zendesk", "Intercom", "Slack"]}
        }]
    }
)
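If you build these filters programmatically, a small helper keeps the two shapes consistent: match value for one source and match any for several. `source_filter` is a hypothetical convenience function for illustration, not part of the Airweave SDK; source names remain case-sensitive.

```python
def source_filter(*source_names):
    """Return a filter restricting results to one or more source names (case-sensitive)."""
    if not source_names:
        raise ValueError("Provide at least one source name")
    if len(source_names) == 1:
        match = {"value": source_names[0]}
    else:
        match = {"any": list(source_names)}  # any listed source qualifies
    return {"must": [{"key": "source_name", "match": match}]}


print(source_filter("GitHub"))
print(source_filter("Zendesk", "Intercom", "Slack"))
```

Pass the returned dictionary as the filter argument, for example filter=source_filter("Zendesk", "Slack").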

Date Range Filters

from datetime import datetime, timezone, timedelta

# Last 7 days
results = client.collections.search(
    readable_id="my-collection",
    query="bug reports",
    filter={
        "must": [{
            "key": "created_at",
            "range": {
                "gte": (datetime.now(timezone.utc) - timedelta(days=7)).isoformat()
            }
        }]
    }
)

# Specific date range
results = client.collections.search(
    readable_id="my-collection",
    query="Q1 analytics",
    filter={
        "must": [{
            "key": "updated_at",
            "range": {
                "gte": "2024-01-01T00:00:00Z",
                "lt": "2024-04-01T00:00:00Z"
            }
        }]
    }
)
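The rolling-window pattern above recurs often enough to factor out. The helper below is a hypothetical sketch, not an SDK function: it builds a range condition for the last `days` days and accepts an optional `now` argument so the window can be pinned, for example in tests.

```python
from datetime import datetime, timedelta, timezone


def last_n_days(key, days, now=None):
    """Build a range condition matching timestamps within the last `days` days (UTC)."""
    now = now or datetime.now(timezone.utc)
    start = now - timedelta(days=days)
    return {"key": key, "range": {"gte": start.isoformat()}}


fixed_now = datetime(2024, 4, 1, tzinfo=timezone.utc)
condition = last_n_days("created_at", 7, now=fixed_now)
print(condition)
# {'key': 'created_at', 'range': {'gte': '2024-03-25T00:00:00+00:00'}}
```

In a search call the condition slots into a clause, for example filter={"must": [last_n_days("created_at", 7)]}.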

Exclude Results

# Exclude resolved items
results = client.collections.search(
    readable_id="my-collection",
    query="open tickets",
    filter={
        "must_not": [{
            "key": "status",
            "match": {"any": ["resolved", "closed", "done"]}
        }]
    }
)

Complex Filters

Combine multiple conditions:
from datetime import datetime, timezone, timedelta

results = client.collections.search(
    readable_id="my-collection",
    query="critical bugs",
    filter={
        "must": [
            # Only from GitHub
            {
                "key": "source_name",
                "match": {"value": "GitHub"}
            },
            # From last 30 days
            {
                "key": "created_at",
                "range": {
                    "gte": (datetime.now(timezone.utc) - timedelta(days=30)).isoformat()
                }
            }
        ],
        # NOT resolved
        "must_not": [{
            "key": "status",
            "match": {"value": "resolved"}
        }]
    }
)

AI Features

Query Expansion

Generate query variations to improve recall. Enabled by default.
# With expansion (default)
results = client.collections.search(
    readable_id="my-collection",
    query="customer churn analysis",
    expand_query=True  # default
)

# Without expansion (faster, exact query only)
results = client.collections.search(
    readable_id="my-collection",
    query="customer churn analysis",
    expand_query=False
)
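Conceptually, expansion improves recall because several phrasings find more candidates than one. Airweave performs expansion server-side; the sketch below only illustrates a plausible merge step on already-retrieved result dicts, deduplicating on entity_id and keeping the best score. The function name and sample data are hypothetical and do not describe Airweave's actual implementation.

```python
def merge_variant_results(result_lists):
    """Merge result lists from several query variants.

    Duplicates (same entity_id) keep their highest score; the merged
    list is sorted by score, best first.
    """
    best = {}
    for results in result_lists:
        for result in results:
            entity_id = result["entity_id"]
            if entity_id not in best or result["score"] > best[entity_id]["score"]:
                best[entity_id] = result
    return sorted(best.values(), key=lambda r: r["score"], reverse=True)


# Hypothetical results for "customer churn analysis" and one variant phrasing.
original = [{"entity_id": "a", "score": 0.81}, {"entity_id": "b", "score": 0.64}]
variant = [{"entity_id": "b", "score": 0.77}, {"entity_id": "c", "score": 0.52}]
merged = merge_variant_results([original, variant])
print([(r["entity_id"], r["score"]) for r in merged])
# [('a', 0.81), ('b', 0.77), ('c', 0.52)]
```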

Reranking

LLM-based reordering of the retrieved candidates for improved relevance. Enabled by default.
# With reranking (default, more accurate)
results = client.collections.search(
    readable_id="my-collection",
    query="authentication methods",
    rerank=True  # default
)

# Without reranking (faster)
results = client.collections.search(
    readable_id="my-collection",
    query="authentication methods",
    rerank=False
)
Reranking adds about 10 seconds to your search. Disable it if you need fast results for interactive applications.
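One way to apply this trade-off is to keep two option presets and choose one per call site. The `search_options` helper and its presets are hypothetical; the parameters themselves (expand_query, rerank, generate_answer) are the ones documented on this page, and the interactive preset simply turns off each step described here as slower.

```python
def search_options(interactive):
    """Return keyword arguments for collections.search based on latency needs.

    Interactive callers skip query expansion, reranking (about 10 s), and
    answer generation; batch callers keep the accuracy-oriented defaults.
    """
    if interactive:
        return {"expand_query": False, "rerank": False, "generate_answer": False}
    return {"expand_query": True, "rerank": True, "generate_answer": True}


print(search_options(interactive=True))
```

With a client in scope, the preset is unpacked into the call: client.collections.search(readable_id="my-collection", query="authentication methods", **search_options(interactive=True)).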

Answer Generation

Generate AI-synthesized answers from search results. Enabled by default.
# Generate answer (default)
results = client.collections.search(
    readable_id="my-collection",
    query="What are our refund policies?",
    generate_answer=True  # default
)

print(f"Answer: {results.completion}")
# Answer: According to the customer support documentation, 
# refunds are processed within 5-7 business days...

# Raw results only (faster)
results = client.collections.search(
    readable_id="my-collection",
    query="refund policies",
    generate_answer=False
)

for result in results.results:
    print(result['md_content'])
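Because completion is only populated when an answer was generated, display code should handle both cases. The sketch below assumes the response exposes `completion` and `results` as shown in the Response Structure section, and uses `SimpleNamespace` as a stand-in for the SDK's response object; `display_text` is a hypothetical helper.

```python
from types import SimpleNamespace


def display_text(response, preview_chars=200):
    """Prefer the generated answer; otherwise fall back to the top raw result."""
    if getattr(response, "completion", None):
        return response.completion
    if response.results:
        return response.results[0]["md_content"][:preview_chars]
    return "No results found."


# Stand-in responses shaped like the documented response structure.
with_answer = SimpleNamespace(completion="Refunds take 5-7 business days.", results=[])
raw_only = SimpleNamespace(completion=None, results=[{"md_content": "# Refund Policy\n..."}])
print(display_text(with_answer))
print(display_text(raw_only))
```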

Filter Interpretation (Beta)

Beta Feature: Filter interpretation can occasionally filter too narrowly. Verify result counts.
Automatically extract structured filters from natural language queries.
# AI interprets "last week" and "Asana" as filters
results = client.collections.search(
    readable_id="my-collection",
    query="open Asana tickets from last week",
    interpret_filters=True
)
# AI understands: Asana source, open status, last 7 days

# Another example
results = client.collections.search(
    readable_id="my-collection",
    query="critical bugs from GitHub this month",
    interpret_filters=True
)
# AI extracts: GitHub source, critical priority, current month date range
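Since the beta can over-filter, one defensive pattern is to check the result count and retry without interpretation when too few results come back. `search_with_fallback`, the `min_results` threshold, and the injected `search` callable are hypothetical; in real code you would pass `client.collections.search`. The fake search function only stands in so the sketch runs offline.

```python
from types import SimpleNamespace


def search_with_fallback(search, readable_id, query, min_results=1):
    """Try interpret_filters=True first; retry without it if too few results return."""
    response = search(readable_id=readable_id, query=query, interpret_filters=True)
    if len(response.results) >= min_results:
        return response, True
    # The interpreted filters may have been too narrow; fall back to a plain search.
    response = search(readable_id=readable_id, query=query, interpret_filters=False)
    return response, False


def fake_search(readable_id, query, interpret_filters):
    """Offline stand-in: pretend interpretation filtered everything out."""
    hits = [] if interpret_filters else [{"entity_id": "x", "score": 0.7}]
    return SimpleNamespace(results=hits)


response, used_interpretation = search_with_fallback(
    fake_search, "my-collection", "open Asana tickets from last week"
)
print(len(response.results), used_interpretation)  # 1 False
```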

Pagination

Navigate through large result sets with limit and offset.
# First 50 results
results = client.collections.search(
    readable_id="my-collection",
    query="documentation",
    limit=50,
    offset=0
)

# Next 50 results
results = client.collections.search(
    readable_id="my-collection",
    query="documentation",
    limit=50,
    offset=50
)
limit (integer, default: 1000): Maximum number of results to return (1-1000).
offset (integer, default: 0): Number of results to skip for pagination.
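To walk an entire result set, you can advance offset by limit until a page comes back shorter than limit. The generator below is a sketch under that assumption (a short page means no more results); `iter_all_results` and the fake search function are hypothetical, and in practice you would pass `client.collections.search`.

```python
from types import SimpleNamespace


def iter_all_results(search, readable_id, query, page_size=50, **kwargs):
    """Yield every result by paging with limit/offset until a short page."""
    offset = 0
    while True:
        page = search(
            readable_id=readable_id, query=query,
            limit=page_size, offset=offset, **kwargs,
        )
        yield from page.results
        if len(page.results) < page_size:
            break
        offset += page_size


# Offline stand-in holding 120 fake results.
CORPUS = [{"entity_id": f"doc-{i}", "score": 1 - i / 1000} for i in range(120)]


def fake_search(readable_id, query, limit, offset, **kwargs):
    return SimpleNamespace(results=CORPUS[offset:offset + limit])


all_results = list(iter_all_results(fake_search, "my-collection", "documentation"))
print(len(all_results))  # 120
```

Since answer generation is on by default, passing generate_answer=False through kwargs avoids synthesizing an answer for every page.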
Streaming

For real-time results, use the streaming endpoint, which delivers events over Server-Sent Events (SSE):
import asyncio

async def stream_search():
    async for event in client.collections.search_stream(
        readable_id="my-collection",
        query="deployment procedures"
    ):
        if event["type"] == "result":
            print(f"Result: {event['data']}")
        elif event["type"] == "completion":
            print(f"Answer: {event['data']}")
        elif event["type"] == "done":
            print("Search complete")
            break

asyncio.run(stream_search())
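Rather than printing events, you may want to collect them. The sketch below gathers result and completion events from any async iterator that yields dicts with type and data keys, matching the event shape used above. `collect_stream` and `fake_stream` are hypothetical; the fake stands in for `client.collections.search_stream(...)` so the example runs offline.

```python
import asyncio


async def collect_stream(events):
    """Accumulate streamed results and the final answer until a 'done' event."""
    results, completion = [], None
    async for event in events:
        if event["type"] == "result":
            results.append(event["data"])
        elif event["type"] == "completion":
            completion = event["data"]
        elif event["type"] == "done":
            break
    return results, completion


async def fake_stream():
    """Offline stand-in emitting the event types shown above."""
    yield {"type": "result", "data": {"entity_id": "a", "score": 0.9}}
    yield {"type": "result", "data": {"entity_id": "b", "score": 0.8}}
    yield {"type": "completion", "data": "Deploy via the release pipeline."}
    yield {"type": "done", "data": None}


streamed, answer = asyncio.run(collect_stream(fake_stream()))
print(len(streamed), answer)  # 2 Deploy via the release pipeline.
```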

Complete Example

Here’s a comprehensive search that combines the strategy, filter, AI feature, and pagination parameters:
from airweave import AirweaveSDK
from datetime import datetime, timezone, timedelta

client = AirweaveSDK(api_key="YOUR_API_KEY")

results = client.collections.search(
    readable_id="customer-support-x7k9m",
    query="customer feedback about pricing",
    
    # Search strategy
    retrieval_strategy="hybrid",
    
    # Filters
    filter={
        "must": [
            {"key": "source_name", "match": {"any": ["Zendesk", "Slack"]}},
            {"key": "created_at", "range": {
                "gte": (datetime.now(timezone.utc) - timedelta(days=30)).isoformat()
            }}
        ],
        "must_not": [
            {"key": "status", "match": {"value": "resolved"}}
        ]
    },
    
    # AI features
    expand_query=True,
    rerank=True,
    generate_answer=True,
    
    # Pagination
    limit=50,
    offset=0
)

# Process results
print(f"Found {len(results.results)} results\n")

if results.completion:
    print(f"AI Answer: {results.completion}\n")

for i, result in enumerate(results.results[:5], 1):
    print(f"Result {i}:")
    print(f"  Score: {result['score']:.3f}")
    print(f"  Source: {result['source_name']}")
    print(f"  Content: {result['md_content'][:150]}...")
    print(f"  URL: {result.get('url', 'N/A')}")
    print()

Search Parameters Reference

| Parameter | Type | Default | Description |
|---|---|---|---|
| query | string | required | Search query text (max 2048 tokens) |
| retrieval_strategy | string | "hybrid" | "hybrid", "neural", or "keyword" |
| filter | object | null | Structured metadata filters |
| expand_query | boolean | true | Generate query variations |
| interpret_filters | boolean | false | Extract filters from natural language |
| rerank | boolean | true | LLM-based reranking |
| generate_answer | boolean | true | Generate AI answer |
| limit | integer | 1000 | Max results (1-1000) |
| offset | integer | 0 | Skip results for pagination |
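Some of the constraints in this table can be checked client-side before a request is sent. `validate_search_params` is a hypothetical helper covering only the strategy values and the 1-1000 limit range stated above, plus the assumption that a negative offset is invalid; it does not check the 2048-token query limit, since counting tokens exactly would require the service's tokenizer.

```python
VALID_STRATEGIES = {"hybrid", "neural", "keyword"}


def validate_search_params(retrieval_strategy="hybrid", limit=1000, offset=0):
    """Return a list of problems with the given parameters (empty if valid)."""
    problems = []
    if retrieval_strategy not in VALID_STRATEGIES:
        problems.append(f"retrieval_strategy must be one of {sorted(VALID_STRATEGIES)}")
    if not 1 <= limit <= 1000:
        problems.append("limit must be between 1 and 1000")
    if offset < 0:  # assumption: offsets count skipped results, so they cannot be negative
        problems.append("offset must be 0 or greater")
    return problems


print(validate_search_params())                        # []
print(validate_search_params("semantic", limit=5000))  # two problems reported
```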

Response Structure

{
  "results": [
    {
      "entity_id": "abc123-def456-789012",
      "source_name": "GitHub",
      "md_content": "# Password Reset Guide\n\nTo reset your password...",
      "metadata": {
        "file_path": "docs/auth/password-reset.md",
        "last_modified": "2024-03-15T09:30:00Z"
      },
      "score": 0.92,
      "breadcrumbs": ["docs", "auth", "password-reset.md"],
      "url": "https://github.com/company/docs/blob/main/docs/auth/password-reset.md"
    }
  ],
  "completion": "To reset your password, navigate to the login page..."
}
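If you prefer typed access over dictionary lookups, you can wrap each result in a small dataclass. `SearchHit` and `parse_hit` are hypothetical and mirror only the fields in the JSON above. They read url and breadcrumbs with .get on the assumption that some sources omit them; the quick start's use of result.get('url', 'N/A') suggests url can be missing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SearchHit:
    entity_id: str
    source_name: str
    md_content: str
    score: float
    url: str | None = None
    breadcrumbs: list[str] = field(default_factory=list)
    metadata: dict = field(default_factory=dict)


def parse_hit(raw):
    """Convert one result dict from the response into a SearchHit."""
    return SearchHit(
        entity_id=raw["entity_id"],
        source_name=raw["source_name"],
        md_content=raw["md_content"],
        score=float(raw["score"]),
        url=raw.get("url"),
        breadcrumbs=list(raw.get("breadcrumbs", [])),
        metadata=dict(raw.get("metadata", {})),
    )


sample = {
    "entity_id": "abc123-def456-789012",
    "source_name": "GitHub",
    "md_content": "# Password Reset Guide\n\nTo reset your password...",
    "score": 0.92,
    "breadcrumbs": ["docs", "auth", "password-reset.md"],
}
hit = parse_hit(sample)
print(hit.source_name, hit.score, hit.url)  # GitHub 0.92 None
```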

Next Steps

Collections: Learn about organizing data with collections.
Webhooks: Get notified when syncs complete.