Overview
Airweave is built around a few core concepts that work together to provide unified search across all your data sources. Understanding these concepts will help you make the most of Airweave.
Sources
A source is a connector type that defines how Airweave connects to and extracts data from a specific platform or database. Sources are the building blocks of your data integration.
What sources define
- Authentication methods: OAuth, API keys, database credentials
- Configuration options: Repository names, filters, sync preferences
- Entity types: What data structures the source can extract
- Sync capabilities: Full sync, incremental sync, continuous sync
Available sources
Airweave supports 50+ sources across different categories:
Productivity & Communication
- Notion: Pages, databases, and wikis
- Slack: Messages, threads, and files
- Gmail: Emails, threads, and attachments
- Google Drive: Documents, spreadsheets, and files
- Confluence: Pages and spaces
- Microsoft Teams: Messages and files
Development & Project Management
- GitHub: Repositories, issues, pull requests, and code
- GitLab: Projects, issues, and merge requests
- Jira: Issues, projects, and boards
- Linear: Issues and projects
- Asana: Tasks and projects
- ClickUp: Tasks and docs
Business & CRM
- Stripe: Customers, payments, and invoices
- HubSpot: Contacts, deals, and companies
- Salesforce: Accounts, opportunities, and cases
- Zendesk: Tickets and knowledge base
- Freshdesk: Tickets and contacts
Databases & Storage
- PostgreSQL: Tables and records
- MySQL: Tables and records
- MongoDB: Collections and documents
- Dropbox: Files and folders
- Box: Files and folders
- OneDrive: Files and folders
View all connectors
See the complete list of supported sources and their capabilities
Source properties
Each source has specific properties that determine how it behaves:
| Property | Description | Example |
|---|---|---|
| short_name | Unique identifier for the source type | github, stripe, notion |
| auth_methods | Supported authentication methods | ["oauth_browser", "direct"] |
| output_entity_definitions | Entity types this source produces | ["github_issue", "github_pr"] |
| supports_continuous | Supports real-time syncing | true or false |
| supports_temporal_relevance | Entities have timestamps for recency ranking | true or false |
| supports_access_control | Document-level permissions | true or false |
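To make the table concrete, here is a minimal sketch that models a source definition as a frozen Python dataclass. The field names mirror the property names above; the class name SourceDefinition and the produces() helper are hypothetical, not part of Airweave's API, and the boolean flags on the GitHub example are placeholders rather than GitHub's actual capabilities.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceDefinition:
    """Hypothetical model of the source properties in the table above."""

    short_name: str
    auth_methods: tuple[str, ...]
    output_entity_definitions: tuple[str, ...]
    supports_continuous: bool = False
    supports_temporal_relevance: bool = False
    supports_access_control: bool = False

    def produces(self, entity_type: str) -> bool:
        """Return True if this source can emit the given entity type."""
        return entity_type in self.output_entity_definitions


# short_name, auth_methods and entity types come from the table's examples;
# the boolean flags keep their placeholder defaults, not GitHub's real values.
github = SourceDefinition(
    short_name="github",
    auth_methods=("oauth_browser", "direct"),
    output_entity_definitions=("github_issue", "github_pr"),
)
```

Freezing the dataclass reflects that a source definition is a fixed description of a connector type; per-user state belongs to source connections instead.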
Source connections
A source connection is an authenticated instance of a source linked to a specific collection. It represents the actual connection between Airweave and your data.
Key characteristics
- One source, many connections: You can create multiple connections to the same source type (e.g., multiple GitHub repos)
- Collection-scoped: Each connection belongs to exactly one collection
- Authenticated: Contains credentials or OAuth tokens for accessing the source
- Configurable: Source-specific settings like filters, branches, or table names
Authentication methods
Source connections support four authentication methods:
- Direct (API Key): provide credentials directly (API keys, passwords, connection strings)
- OAuth Browser
- OAuth Token
- Auth Provider
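The following sketch shows one way a client could validate a requested authentication method against a source's auth_methods list. The identifiers "direct" and "oauth_browser" appear in the source properties table; "oauth_token" and "auth_provider" are assumed spellings, and AuthMethod and choose_auth_method are hypothetical names.

```python
from enum import Enum


class AuthMethod(Enum):
    """The four authentication methods listed above."""

    DIRECT = "direct"
    OAUTH_BROWSER = "oauth_browser"
    OAUTH_TOKEN = "oauth_token"      # assumed identifier
    AUTH_PROVIDER = "auth_provider"  # assumed identifier


def choose_auth_method(requested: str, supported: list[str]) -> AuthMethod:
    """Validate a requested method against a source's auth_methods list.

    Raises ValueError if the name is unknown or the source does not support it.
    """
    method = AuthMethod(requested)  # raises ValueError for unknown names
    if method.value not in supported:
        raise ValueError(f"{method.value!r} is not supported; use one of {supported}")
    return method
```

Checking the enum first separates "not a method at all" from "a real method this source does not offer", which makes error messages easier to act on.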
Connection lifecycle
Source connections go through different states:
| Status | Description |
|---|---|
| PENDING_AUTH | Created but not yet authenticated |
| ACTIVE | Authenticated and syncing successfully |
| SYNCING | Currently running a sync job |
| ERROR | Last sync failed (check error details) |
| INACTIVE | Manually disabled by user |
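The lifecycle can be read as a small state machine. The sketch below encodes the five statuses with a transition table inferred from the descriptions; the allowed moves (for example, retrying from ERROR or re-enabling an INACTIVE connection) are assumptions, not documented Airweave behavior.

```python
from enum import Enum


class ConnectionStatus(Enum):
    PENDING_AUTH = "PENDING_AUTH"
    ACTIVE = "ACTIVE"
    SYNCING = "SYNCING"
    ERROR = "ERROR"
    INACTIVE = "INACTIVE"


S = ConnectionStatus

# Assumed transitions inferred from the status descriptions; the real
# state machine may allow more or fewer moves.
ALLOWED_TRANSITIONS: dict[ConnectionStatus, set[ConnectionStatus]] = {
    S.PENDING_AUTH: {S.ACTIVE, S.INACTIVE},      # authenticate, or disable
    S.ACTIVE: {S.SYNCING, S.INACTIVE},           # start a sync job, or disable
    S.SYNCING: {S.ACTIVE, S.ERROR, S.INACTIVE},  # success, failure, or disable
    S.ERROR: {S.SYNCING, S.INACTIVE},            # retry, or disable
    S.INACTIVE: {S.ACTIVE},                      # re-enable
}


def transition(current: ConnectionStatus, target: ConnectionStatus) -> ConnectionStatus:
    """Return the target status, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"cannot move from {current.name} to {target.name}")
    return target
```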
Collections
A collection is a searchable container that groups multiple source connections together. Collections are the primary search interface in Airweave.
Why collections?
- Unified search: Query multiple sources with a single API call
- Logical grouping: Organize sources by use case, team, or project
- Search isolation: Each collection has its own search index
- Access control: Control who can search which collections
Collection structure
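As a sketch of how the pieces nest, the hypothetical dataclasses below model a collection that owns several source connections. They illustrate two rules stated in the source connections section: one source type can back many connections, and each connection belongs to exactly one collection. Credentials are omitted, and the config key repo and the repository names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SourceConnection:
    """Hypothetical model: an authenticated instance of one source type."""

    name: str
    source_short_name: str                                # e.g. "github"
    config: dict[str, str] = field(default_factory=dict)  # source-specific settings
    collection: Optional[str] = None                      # owning collection, set on attach


@dataclass
class Collection:
    """Hypothetical model: a searchable container of source connections."""

    name: str
    connections: list[SourceConnection] = field(default_factory=list)

    def attach(self, connection: SourceConnection) -> None:
        # A connection belongs to exactly one collection, so refuse one
        # that is already owned.
        if connection.collection is not None:
            raise ValueError(
                f"{connection.name!r} already belongs to {connection.collection!r}"
            )
        connection.collection = self.name
        self.connections.append(connection)

    def source_types(self) -> set[str]:
        """Distinct source types behind this collection's connections."""
        return {c.source_short_name for c in self.connections}


# One source type (github), many connections: two repositories plus a Notion wiki.
engineering = Collection("Engineering")
for conn in (
    SourceConnection("frontend-repo", "github", {"repo": "acme/web"}),
    SourceConnection("backend-repo", "github", {"repo": "acme/api"}),
    SourceConnection("team-wiki", "notion"),
):
    engineering.attach(conn)
```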
Collection status
Collections have three possible states:
NEEDS_SOURCE
The collection has no authenticated connections, or connections exist but haven’t synced yet. A collection in this state can’t be searched.
ACTIVE
At least one source connection has completed a sync or is currently syncing. The collection is searchable.
ERROR
All source connections have failed their last sync. Check individual connection errors for details.
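The three rules above can be expressed as a small function. This is a minimal sketch, assuming each connection exposes its last sync outcome and whether a sync is running; it treats "has completed a sync" as "its last sync completed" and checks ACTIVE before ERROR. The names ConnectionState and collection_status are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal, Optional

LastSync = Optional[Literal["completed", "failed"]]


@dataclass
class ConnectionState:
    last_sync: LastSync = None  # None: never synced (e.g. not yet authenticated)
    syncing: bool = False


def collection_status(connections: list[ConnectionState]) -> str:
    # ACTIVE: at least one connection has completed a sync or is syncing now.
    if any(c.syncing or c.last_sync == "completed" for c in connections):
        return "ACTIVE"
    # ERROR: every connection's last sync failed.
    if connections and all(c.last_sync == "failed" for c in connections):
        return "ERROR"
    # NEEDS_SOURCE: no connections, or none has synced yet.
    return "NEEDS_SOURCE"
```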
Example: Multi-source collection
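A multi-source collection lets one query reach every connected source. The sketch below builds a hypothetical "Customer Intelligence" collection from Stripe, Zendesk, and Slack entities and runs a single search across all of them. Plain keyword matching stands in for Airweave's semantic search, and the entity type zendesk_ticket and all entity contents are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    entity_type: str  # e.g. "stripe_customer"
    source: str       # short_name of the source that produced it
    content: str


# Hypothetical entities from three source connections in one collection.
customer_intelligence = [
    Entity("stripe_customer", "stripe", "Acme Corp, plan: enterprise"),
    Entity("zendesk_ticket", "zendesk", "Acme Corp reports failed invoice export"),
    Entity("slack_message", "slack", "Call with Acme Corp moved to Friday"),
    Entity("stripe_payment", "stripe", "Payment from Globex succeeded"),
]


def search(collection: list[Entity], query: str) -> list[Entity]:
    """Return entities from any source whose content contains every query term.

    Keyword matching is a stand-in for semantic search over embeddings.
    """
    terms = query.lower().split()
    return [e for e in collection if all(t in e.content.lower() for t in terms)]


hits = search(customer_intelligence, "acme corp")
```

One call returns matches from all three sources, which is the "unified search" benefit in miniature.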
Entities
An entity is a single piece of data extracted from a source connection. Entities are the searchable units in Airweave.
What are entities?
Entities represent different types of data depending on the source:
| Source | Entity Types | Examples |
|---|---|---|
| GitHub | Issues, Pull Requests, Files | github_issue, github_pr, github_file |
| Stripe | Customers, Payments, Invoices | stripe_customer, stripe_payment |
| Notion | Pages, Databases | notion_page, notion_database |
| Gmail | Threads, Messages | gmail_thread, gmail_message |
| Slack | Messages, Threads | slack_message, slack_thread |
| PostgreSQL | Table Rows | postgres_row |
Entity structure
Every entity contains:
- Content: The actual data (markdown, JSON, text)
- Metadata: Source name, timestamps, status, custom fields
- System fields: Entity ID, collection ID, organization ID
- Embeddings: Vector representations for semantic search
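The four parts listed above map naturally onto a record type. This sketch is a hypothetical model, not Airweave's schema: the field names are assumed, and cosine similarity is shown only as a common way to compare embeddings in semantic search, since the metric Airweave uses is not stated here.

```python
import math
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any


@dataclass
class Entity:
    """Hypothetical model of the four parts every entity carries."""

    # Content: the actual data, e.g. markdown, JSON, or plain text.
    content: str
    # Metadata: source name, timestamps, status, custom fields.
    source_name: str
    updated_at: datetime
    metadata: dict[str, Any] = field(default_factory=dict)
    # System fields.
    entity_id: str = ""
    collection_id: str = ""
    organization_id: str = ""
    # Embedding: vector representation used for semantic search.
    embedding: list[float] = field(default_factory=list)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Similarity of two equal-length vectors in [-1, 1]; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


page = Entity(
    content="# Onboarding guide",
    source_name="notion",
    updated_at=datetime(2024, 5, 1),
    metadata={"status": "published"},
    entity_id="ent-1",
    embedding=[0.6, 0.8],
)
```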
Entity lifecycle
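One plausible way to derive the inserted, updated, and deleted counts that a sync job reports is to diff the previously indexed entities against a fresh extraction. The sketch below does this with content hashes keyed by entity ID; the hashing approach and the function names are assumptions, and deletion detection only works when the extraction is complete, as in a full sync.

```python
import hashlib


def content_hash(content: str) -> str:
    """Stable fingerprint of an entity's content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def classify_changes(
    indexed: dict[str, str], extracted: dict[str, str]
) -> dict[str, list[str]]:
    """Split entity IDs into inserted, updated, and deleted.

    indexed:   entity ID -> content hash stored after the previous sync
    extracted: entity ID -> content returned by the current full extraction
    """
    inserted = sorted(eid for eid in extracted if eid not in indexed)
    updated = sorted(
        eid
        for eid, content in extracted.items()
        if eid in indexed and indexed[eid] != content_hash(content)
    )
    deleted = sorted(eid for eid in indexed if eid not in extracted)
    return {"inserted": inserted, "updated": updated, "deleted": deleted}


previous_contents = {"a": "one", "b": "two", "c": "three"}
previous = {eid: content_hash(text) for eid, text in previous_contents.items()}
changes = classify_changes(previous, {"a": "one", "b": "two (edited)", "d": "four"})
```

Unchanged entities (here "a") fall into none of the buckets, so they need no re-embedding.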
Accessing entity details
Search results include the payload of each matching entity.
Syncs
A sync orchestrates the process of extracting data from a source connection and indexing it in a collection. Syncs run automatically on a schedule or can be triggered manually.
Sync types
- Full Sync
- Incremental Sync
- Manual Sync
Full sync processes all data from the source, regardless of when it was last synced.
- When to use: Initial sync, data refresh, schema changes
- Schedule: Hourly, daily, weekly (cron expressions)
- Performance: Slower but ensures completeness
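The sketch below contrasts the two automatic sync types. A full sync returns every record; the incremental branch uses a generic last-synced timestamp cursor, which is an assumption about how change tracking could work rather than a description of Airweave's implementation. The schedule strings assume standard five-field cron syntax, and the record shape is hypothetical.

```python
from datetime import datetime, timezone
from typing import Any, Optional

# Assumed five-field cron strings: minute hour day-of-month month day-of-week.
SCHEDULES = {
    "hourly": "0 * * * *",  # minute 0 of every hour
    "daily": "0 0 * * *",   # 00:00 every day
    "weekly": "0 0 * * 0",  # 00:00 every Sunday
}


def records_to_process(
    records: list[dict[str, Any]],
    last_synced_at: Optional[datetime],
    full: bool,
) -> list[dict[str, Any]]:
    """Select the records a sync run should process.

    Full sync: everything, regardless of when it was last synced.
    Incremental (generic illustration): only records modified after the
    cursor; with no previous sync it falls back to everything.
    """
    if full or last_synced_at is None:
        return list(records)
    return [r for r in records if r["modified_at"] > last_synced_at]


records = [
    {"id": 1, "modified_at": datetime(2024, 1, 1, tzinfo=timezone.utc)},
    {"id": 2, "modified_at": datetime(2024, 1, 5, tzinfo=timezone.utc)},
]
cursor = datetime(2024, 1, 3, tzinfo=timezone.utc)
full_batch = records_to_process(records, cursor, full=True)
incremental_batch = records_to_process(records, cursor, full=False)
```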
Sync jobs
Each sync execution creates a sync job that tracks:
- Status: PENDING, RUNNING, COMPLETED, FAILED, CANCELLED
- Timing: Start time, end time, duration
- Entity counts: Inserted, updated, deleted, failed
- Errors: Error messages and details if failed
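A sync job record carrying the fields above could look like the following hypothetical dataclass. The status names come from the list; the class and property names, and computing duration from the start and end times, are illustrative assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional


class SyncJobStatus(Enum):
    PENDING = "PENDING"
    RUNNING = "RUNNING"
    COMPLETED = "COMPLETED"
    FAILED = "FAILED"
    CANCELLED = "CANCELLED"


# Statuses after which a job will not change again.
TERMINAL = frozenset(
    {SyncJobStatus.COMPLETED, SyncJobStatus.FAILED, SyncJobStatus.CANCELLED}
)


@dataclass
class SyncJob:
    status: SyncJobStatus = SyncJobStatus.PENDING
    started_at: Optional[datetime] = None
    finished_at: Optional[datetime] = None
    # Entity counts reported by the run.
    inserted: int = 0
    updated: int = 0
    deleted: int = 0
    failed: int = 0
    error: Optional[str] = None  # intended to be set only for FAILED jobs

    @property
    def duration(self) -> Optional[timedelta]:
        """Elapsed time, or None until the job has both started and finished."""
        if self.started_at is None or self.finished_at is None:
            return None
        return self.finished_at - self.started_at

    @property
    def is_finished(self) -> bool:
        return self.status in TERMINAL


job = SyncJob(
    status=SyncJobStatus.COMPLETED,
    started_at=datetime(2024, 1, 1, 12, 0, 0),
    finished_at=datetime(2024, 1, 1, 12, 5, 30),
    inserted=120,
    updated=8,
)
```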
Monitoring syncs
You can monitor sync progress through:
- Dashboard: Visual sync history and status
- API: Query sync jobs programmatically
- Webhooks: Receive notifications on sync events (coming soon)
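For programmatic monitoring, a client typically polls a job until it reaches a terminal status. This minimal sketch takes a get_status callable as a stand-in for whatever API call returns a job's status; it is a hypothetical parameter, not an Airweave SDK function. Injecting sleep keeps the loop testable without real waiting.

```python
import time
from typing import Callable

# Statuses after which a sync job will not change again.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_sync_job(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reaches a terminal status, then return that status.

    Raises TimeoutError once the accumulated poll intervals reach timeout.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"sync job still {status} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


# Example with a fake status sequence and a no-op sleep.
fake_statuses = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
final_status = wait_for_sync_job(lambda: next(fake_statuses), sleep=lambda _: None)
```

Once webhooks are available, push notifications would remove the need for this polling loop.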
Putting it all together
Here’s how all the concepts work together in a real-world example:
Register sources
Airweave comes with 50+ pre-built sources (GitHub, Stripe, Notion, etc.). Sources define how to connect and extract data.
Create a collection
You create a collection called “Customer Intelligence” to search all customer-related data.
Add source connections
You add three source connections to the collection:
- Stripe (for payment data)
- Zendesk (for support tickets)
- Slack (for customer conversations)
Syncs run automatically
Airweave automatically syncs data from all three sources on the configured schedule. Each sync creates entities:
- Stripe: Customer, Payment, Invoice entities
- Zendesk: Ticket, Comment entities
- Slack: Message, Thread entities
Entities are indexed
All entities are vectorized and indexed in the collection’s search index. You now have a unified view of all customer data.
Next steps
Add connectors
Explore the 50+ available sources and add them to your collections
Master search
Learn about filters, reranking, query expansion, and advanced search features
API reference
Explore the complete REST API documentation
Build agents
Integrate Airweave with AI agents and RAG systems