Source connections are authenticated instances of sources that sync data into collections. They represent the actual link between external platforms and your Airweave collections.
What is a Source Connection?
A source connection is a configured, authenticated instance of a source. It includes:
Credentials for accessing the external platform
Configuration specifying what data to sync
Sync schedule for automatic data updates
Sync history tracking jobs and their status
For example, a GitHub source connection might be configured to sync the airweave-ai/airweave repository on the main branch, with a daily sync schedule.
Creating Source Connections
The authentication method determines the creation flow. Airweave supports four authentication methods:
Direct Authentication
Provide credentials (such as an API key or token) directly. The connection is created immediately and is ready to sync.
```shell
curl -X POST https://api.airweave.ai/v1/source-connections \
  -H "Authorization: Bearer $AIRWEAVE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Airweave Docs Repo",
    "short_name": "github",
    "readable_collection_id": "technical-docs-a8x2k",
    "config": {
      "repo_name": "airweave-ai/airweave",
      "branch": "main"
    },
    "auth": {
      "direct": {
        "credentials": {
          "personal_access_token": "ghp_abc123..."
        }
      }
    },
    "sync_immediately": true
  }'
```
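The same request can be assembled with Python's standard library. This is a minimal sketch that mirrors the curl example field for field; `build_direct_auth_request` is a hypothetical helper, and it only constructs the request without sending it.

```python
import json
import os
import urllib.request

API_URL = "https://api.airweave.ai/v1/source-connections"


def build_direct_auth_request(api_key: str, token: str) -> urllib.request.Request:
    """Build (but do not send) the POST request from the curl example.

    Hypothetical helper: the payload copies the curl body exactly.
    """
    payload = {
        "name": "Airweave Docs Repo",
        "short_name": "github",
        "readable_collection_id": "technical-docs-a8x2k",
        "config": {"repo_name": "airweave-ai/airweave", "branch": "main"},
        "auth": {"direct": {"credentials": {"personal_access_token": token}}},
        "sync_immediately": True,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_direct_auth_request(
    os.environ.get("AIRWEAVE_API_KEY", "your-api-key"), "ghp_abc123..."
)
# Sending it would be: urllib.request.urlopen(req)
print(req.get_method(), req.full_url)
```

Keeping construction separate from sending makes the payload easy to inspect or log before any network call is made.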
OAuth Browser Flow
Authenticate users through a browser redirect. The response includes an auth_url; redirect the user there to grant access.
```python
import airweave

client = airweave.Client(api_key="your-api-key")

# Step 1: Create the connection with OAuth browser auth
connection = client.source_connections.create(
    name="My Notion Workspace",
    short_name="notion",
    readable_collection_id="team-wiki-x9k2m",
    auth={
        "oauth_browser": {
            "redirect_uri": "https://myapp.com/oauth/callback"
        }
    },
    sync_immediately=False,  # Don't sync until OAuth completes
)
print(f"Auth URL: {connection.auth.auth_url}")
# Redirect the user to connection.auth.auth_url

# Step 2: After the OAuth callback, check the connection status
connection = client.source_connections.get(connection.id)
if connection.status == "active":
    # OAuth completed successfully; trigger the initial sync
    job = client.source_connections.run(connection.id)
    print(f"Sync job started: {job.id}")
```
The OAuth callback endpoint at /v1/source-connections/callback handles the redirect from the OAuth provider and completes the authentication flow. After successful authentication, users are redirected to your redirect_uri with query parameters status=success and source_connection_id={id}.
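The handler behind your redirect_uri only needs to read those two query parameters. A minimal sketch using the standard library; `parse_oauth_callback` is a hypothetical helper, and treating any status other than success as a failure is an assumption. You can still confirm the result with client.source_connections.get as in the OAuth example.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def parse_oauth_callback(url: str) -> Tuple[Optional[str], Optional[str]]:
    """Extract status and source_connection_id from the redirect URL.

    Missing parameters come back as None; other parameters are ignored.
    """
    params = parse_qs(urlparse(url).query)
    status = params.get("status", [None])[0]
    connection_id = params.get("source_connection_id", [None])[0]
    return status, connection_id


callback = (
    "https://myapp.com/oauth/callback"
    "?status=success&source_connection_id=550e8400-e29b-41d4-a716-446655440000"
)
status, connection_id = parse_oauth_callback(callback)
if status == "success" and connection_id:
    print(f"Connection {connection_id} authenticated")
else:
    # Assumption: anything other than status=success means OAuth did not finish
    print("OAuth did not complete; inspect the query parameters")
```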
OAuth Token
Provide a pre-obtained OAuth access token. This is useful when you have already authenticated the user yourself.
```python
import airweave

client = airweave.Client(api_key="your-api-key")

connection = client.source_connections.create(
    name="My Slack Workspace",
    short_name="slack",
    readable_collection_id="team-messages-k3x9m",
    auth={
        "oauth_token": {
            "access_token": "xoxb-1234567890-...",
            "expires_at": "2026-12-31T23:59:59Z",  # Optional
        }
    },
    sync_immediately=True,
)
```
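Because you supply the token yourself, it can be worth checking expires_at before creating the connection. A minimal sketch; `token_is_expired` is a hypothetical helper, treating a missing expires_at as "no known expiry" is an assumption, and this page does not say how Airweave handles a token that is already expired.

```python
from datetime import datetime, timezone
from typing import Optional


def token_is_expired(expires_at: Optional[str], now: Optional[datetime] = None) -> bool:
    """Return True if an ISO 8601 expires_at timestamp is not in the future.

    Hypothetical policy: a missing expires_at counts as not expired.
    """
    if expires_at is None:
        return False
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on
    expiry = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return expiry <= now


auth = {
    "oauth_token": {
        "access_token": "xoxb-1234567890-...",
        "expires_at": "2026-12-31T23:59:59Z",
    }
}
if token_is_expired(auth["oauth_token"].get("expires_at")):
    print("Refresh the token before creating the connection")
```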
Auth Provider
Use a third-party auth provider (Composio, Pipedream) for authentication. The provider handles OAuth flows and credential management.
```python
import airweave

client = airweave.Client(api_key="your-api-key")

connection = client.source_connections.create(
    name="Salesforce CRM",
    short_name="salesforce",
    readable_collection_id="customer-data-m8x1k",
    auth={
        "auth_provider": {
            "provider_readable_id": "composio-prod-x7k9m",
            "provider_config": {
                "entity_id": "user-123"
            },
        }
    },
    sync_immediately=True,
)
```
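In every example, the auth object carries exactly one of four keys: direct, oauth_browser, oauth_token, or auth_provider. The sketch below checks that client-side before calling the API. `auth_method` is a hypothetical helper, and the exactly-one rule is inferred from the examples on this page rather than taken from the API specification.

```python
AUTH_METHODS = ("direct", "oauth_browser", "oauth_token", "auth_provider")


def auth_method(auth: dict) -> str:
    """Return which of the four auth methods an auth block uses.

    Raises ValueError for unknown keys or when zero or several methods are set.
    """
    unknown = set(auth) - set(AUTH_METHODS)
    if unknown:
        raise ValueError(f"Unknown auth method(s): {sorted(unknown)}")
    if len(auth) != 1:
        raise ValueError(f"Expected exactly one auth method, got {len(auth)}")
    return next(iter(auth))


print(auth_method({"auth_provider": {"provider_readable_id": "composio-prod-x7k9m"}}))
# → auth_provider
```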
Source Connection Lifecycle
1. Create: create a source connection with authentication and configuration.
2. Authenticate: complete the OAuth flow if you use browser-based authentication.
3. Sync: trigger a manual sync or configure an automatic schedule.
4. Monitor: track sync jobs and view entity statistics.
5. Update: modify configuration or credentials as needed.
6. Delete: remove the connection together with its synced data.
Listing Connections
Retrieve all source connections in your organization:
```shell
curl https://api.airweave.ai/v1/source-connections \
  -H "Authorization: Bearer $AIRWEAVE_API_KEY"
```
Getting Connection Details
Retrieve full details of a specific connection:
```python
import airweave

client = airweave.Client(api_key="your-api-key")

connection = client.source_connections.get(
    "550e8400-e29b-41d4-a716-446655440000"
)
print(f"Name: {connection.name}")
print(f"Source: {connection.short_name}")
print(f"Status: {connection.status}")
print(f"Collection: {connection.readable_collection_id}")
print(f"Entities synced: {connection.entity_count}")
print(f"Last sync: {connection.last_sync_at}")
print(f"Created: {connection.created_at}")
print("\nConfiguration:")
for key, value in connection.config.items():
    print(f"  {key}: {value}")
```
Running Syncs
Trigger a data synchronization job:
```shell
# SOURCE_CONNECTION_ID holds the ID of the connection to sync
curl -X POST "https://api.airweave.ai/v1/source-connections/$SOURCE_CONNECTION_ID/run" \
  -H "Authorization: Bearer $AIRWEAVE_API_KEY"
```
The sync runs asynchronously in the background. Monitor progress using the jobs endpoint.
Monitoring Sync Jobs
Retrieve sync job history and status:
```python
import airweave

client = airweave.Client(api_key="your-api-key")

# Get recent jobs (up to 10)
jobs = client.source_connections.get_jobs(
    source_connection_id="550e8400-e29b-41d4-a716-446655440000",
    limit=10,
)
for job in jobs:
    print(f"Job {job.id}")
    print(f"  Status: {job.status}")
    print(f"  Started: {job.created_at}")
    print(f"  Completed: {job.completed_at}")
    print(f"  Entities created: {job.entities_created}")
    if job.error_message:
        print(f"  Error: {job.error_message}")
```
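To get an overview of a connection's history, you can aggregate the same fields: count jobs per status, compute how long finished jobs took, and collect error messages. A minimal sketch on hypothetical sample records shaped like the fields printed above. Real SDK jobs expose these as attributes rather than dict keys, their timestamps may arrive as datetime objects instead of strings, and using created_at as the start time follows the "Started" label in the example.

```python
from collections import Counter
from datetime import datetime
from typing import Dict, List, Optional, Tuple


def job_duration_seconds(job: Dict) -> Optional[float]:
    """Seconds from created_at to completed_at, or None while the job is unfinished."""
    if not job.get("completed_at"):
        return None
    start = datetime.fromisoformat(job["created_at"])
    end = datetime.fromisoformat(job["completed_at"])
    return (end - start).total_seconds()


def summarize_jobs(jobs: List[Dict]) -> Tuple[Counter, List[Tuple[str, str]]]:
    """Count jobs per status and collect (job id, error message) pairs."""
    counts = Counter(job["status"] for job in jobs)
    errors = [(job["id"], job["error_message"]) for job in jobs if job.get("error_message")]
    return counts, errors


# Hypothetical sample records, for illustration only
sample_jobs = [
    {"id": "job-1", "status": "COMPLETED", "created_at": "2025-01-01T02:00:00+00:00",
     "completed_at": "2025-01-01T02:05:30+00:00", "error_message": None},
    {"id": "job-2", "status": "FAILED", "created_at": "2025-01-02T02:00:00+00:00",
     "completed_at": "2025-01-02T02:01:00+00:00", "error_message": "Token revoked"},
    {"id": "job-3", "status": "RUNNING", "created_at": "2025-01-03T02:00:00+00:00",
     "completed_at": None, "error_message": None},
]

counts, errors = summarize_jobs(sample_jobs)
for job in sample_jobs:
    print(job["id"], job["status"], job_duration_seconds(job))
print(dict(counts), errors)
```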
Job Status Values
PENDING: Job is queued, waiting for a worker
RUNNING: Sync is actively processing data
COMPLETED: Sync finished successfully
FAILED: Sync encountered an unrecoverable error
CANCELLING: Cancellation requested; the worker is stopping
CANCELLED: Sync was cancelled and cleanup is scheduled
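Since syncs run asynchronously, a script that must wait for a result can poll until the job reaches COMPLETED, FAILED, or CANCELLED. CANCELLING is deliberately not treated as final because the worker is still stopping. A minimal sketch: `wait_for_job` is hypothetical, it takes any zero-argument function that returns the current status so it can be exercised without the API, and upper-casing the status is an assumption in case values come back in lowercase.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_job(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until the job reaches a terminal status.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    waited = 0.0
    while True:
        status = get_status().upper()  # assumption: tolerate lowercase values
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"Job still {status} after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval


# Simulated status sequence standing in for real API responses
statuses = iter(["PENDING", "RUNNING", "RUNNING", "COMPLETED"])
print(wait_for_job(lambda: next(statuses), sleep=lambda s: None))
# → COMPLETED
```

Against the real API, get_status could wrap a call to client.source_connections.get_jobs; which job in the returned list is the newest is not stated on this page, so verify the ordering before relying on it.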
Updating Connections
Modify connection configuration or credentials:
```python
import airweave

client = airweave.Client(api_key="your-api-key")

# Update configuration
connection = client.source_connections.update(
    source_connection_id="550e8400-e29b-41d4-a716-446655440000",
    name="Updated Docs Repo",
    config={
        "repo_name": "airweave-ai/airweave",
        "branch": "develop",  # Changed from main to develop
    },
    schedule={
        "cron": "0 0 * * *"  # Daily at midnight
    },
)
print(f"Updated: {connection.name}")
print(f"New branch: {connection.config['branch']}")
```
Only direct authentication credentials can be updated. For OAuth connections, delete and recreate the connection to re-authenticate.
Deleting Connections
Permanently delete a source connection and all synced data:
```python
import airweave

client = airweave.Client(api_key="your-api-key")

deleted = client.source_connections.delete(
    "550e8400-e29b-41d4-a716-446655440000"
)
print(f"Deleted: {deleted.name}")
print(f"Entities removed: {deleted.entity_count}")
```
What gets deleted:
1. Any running sync is cancelled (the API waits up to 15 seconds for the worker to stop).
2. The source connection, sync config, job history, and entity metadata are cascade-deleted from the database.
3. A background cleanup workflow is scheduled to remove:
   - Vector embeddings from the vector database (Vespa)
   - Raw data from storage (ARF)

The API returns immediately after step 2. The vector database and storage cleanup happens asynchronously, but the data becomes unsearchable as soon as the database records are deleted. This action cannot be undone.
Sync Schedules
Configure automatic syncs using cron expressions:
```python
import airweave

client = airweave.Client(api_key="your-api-key")

connection = client.source_connections.create(
    name="Daily Docs Sync",
    short_name="github",
    readable_collection_id="technical-docs-a8x2k",
    config={"repo_name": "airweave-ai/airweave"},
    auth={"direct": {"credentials": {"personal_access_token": "ghp_..."}}},
    schedule={
        "cron": "0 2 * * *",  # Daily at 2 AM UTC
        "continuous": False,
        "cursor_field": "last_repository_pushed_at",  # For incremental sync
    },
    sync_immediately=False,
)
```
Common Cron Expressions
0 * * * * - Every hour
0 */6 * * * - Every 6 hours
0 0 * * * - Daily at midnight
0 9 * * 1-5 - Weekdays at 9 AM
0 0 * * 0 - Weekly on Sunday at midnight
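To sanity-check an expression before saving it, you can test it against specific times. The sketch below is a hypothetical, deliberately small matcher that covers only the syntax in the expressions listed above (`*`, `*/n`, ranges, comma lists, single values). It is not Airweave's scheduler, it treats times as UTC to match the "2 AM UTC" comment in the schedule example, and it uses 0 for Sunday (many cron implementations also accept 7).

```python
from datetime import datetime


def _field_matches(expr: str, value: int, minimum: int) -> bool:
    """Match one cron field; `minimum` is the field's lowest legal value."""
    for part in expr.split(","):
        if part == "*":
            return True
        if part.startswith("*/"):
            # */n: every n-th value counting from the field's minimum
            if (value - minimum) % int(part[2:]) == 0:
                return True
        elif "-" in part:
            low, high = (int(x) for x in part.split("-"))
            if low <= value <= high:
                return True
        elif int(part) == value:
            return True
    return False


def cron_matches(expr: str, when: datetime) -> bool:
    """Return True if a 5-field cron expression fires at `when` (read as UTC).

    Simplification: day-of-month and day-of-week are ANDed; standard cron ORs
    them when both are restricted. None of the expressions above hit that case.
    """
    minute, hour, dom, month, dow = expr.split()
    cron_dow = (when.weekday() + 1) % 7  # Python: Monday=0; cron: Sunday=0
    return (
        _field_matches(minute, when.minute, 0)
        and _field_matches(hour, when.hour, 0)
        and _field_matches(dom, when.day, 1)
        and _field_matches(month, when.month, 1)
        and _field_matches(dow, cron_dow, 0)
    )


print(cron_matches("0 2 * * *", datetime(2025, 1, 1, 2, 0)))
# → True
```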
Continuous Sync
For sources that support it (e.g., GitHub), enable continuous sync mode:
```python
schedule = {
    "continuous": True,
    "cursor_field": "last_repository_pushed_at",
}
```
Continuous sync uses cursors to track sync progress and only processes new/changed data after the initial full sync.
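The cursor idea can be sketched in a few lines: remember the largest cursor_field value seen so far and, on the next run, keep only records whose value is newer. This is a hypothetical illustration of cursor-based incremental sync in general, not Airweave's implementation. `incremental_batch` and the sample records are made up; the field name comes from the schedule example, and cursor values are assumed to be ISO 8601 UTC strings in one fixed format so that string order matches time order.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def incremental_batch(
    items: Iterable[Dict], cursor_field: str, cursor: Optional[str]
) -> Tuple[List[Dict], Optional[str]]:
    """Select records newer than `cursor` and return them with the advanced cursor.

    cursor=None means there was no previous sync, so every record is returned
    (the initial full sync). With no new records, the cursor is unchanged.
    """
    changed = [item for item in items if cursor is None or item[cursor_field] > cursor]
    new_cursor = max((item[cursor_field] for item in changed), default=cursor)
    return changed, new_cursor


# Hypothetical records standing in for data fetched from a source
repos = [
    {"name": "airweave", "last_repository_pushed_at": "2025-01-01T10:00:00Z"},
    {"name": "docs", "last_repository_pushed_at": "2025-01-03T08:30:00Z"},
]

# Initial full sync: no cursor yet, so every record is processed
batch, cursor = incremental_batch(repos, "last_repository_pushed_at", None)
print([r["name"] for r in batch], cursor)
# → ['airweave', 'docs'] 2025-01-03T08:30:00Z

# Later run: only the record pushed after the stored cursor is processed
repos[0]["last_repository_pushed_at"] = "2025-01-04T12:00:00Z"
batch, cursor = incremental_batch(repos, "last_repository_pushed_at", cursor)
print([r["name"] for r in batch], cursor)
# → ['airweave'] 2025-01-04T12:00:00Z
```

The strict `>` comparison skips a record that shares the cursor's exact timestamp but arrives later; production cursors often guard against this with a tie-breaker field or a small overlap window.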
Next Steps
Browse Connectors: explore all 50+ available connectors
Authentication: learn about auth methods and providers
Collections: organize connections into collections
API Reference: see the full API specification