What can swirl-search do?

federated multi-source query orchestration with parallel execution, connector-based data source abstraction with format translation, galaxy web ui with search interface and result visualization, asynchronous task execution with celery worker pool and result caching, source-specific authentication and credential management, admin interface for source configuration and search management, result normalization and relevance re-ranking across heterogeneous sources, retrieval-augmented generation (rag) with llm-powered answer synthesis, query transformation and source-specific syntax generation, real-time search progress tracking and websocket streaming, multi-provider llm abstraction with streaming support, microsoft 365 and graph api integration with oauth authentication, docker and kubernetes containerized deployment with configuration management, extensible processor pipeline for result transformation and filtering

swirl-search

Repository · Free

AI Search & RAG Without Moving Your Data. Get instant answers from your company's knowledge across 100+ apps while keeping data secure. Deploy in minutes, not months.

Open Source

14 capabilities

Capabilities (14 decomposed)

federated multi-source query orchestration with parallel execution

Medium confidence

Executes a single user query across 100+ heterogeneous data sources simultaneously using Celery workers and asynchronous task distribution, without copying or indexing data. The Search Orchestrator (swirl/models.py Search class) decomposes queries into source-specific formats, dispatches parallel tasks to Celery workers, and aggregates results as they complete. Uses Django ORM to manage Search objects with state tracking (RUNNING, COMPLETED, FAILED) and WebSocket communication for real-time progress updates to the Galaxy UI.
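
The fan-out and collect-as-completed pattern described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not SWIRL's implementation: a thread pool stands in for the Celery worker pool, and `federated_search`, `fast_source`, `slow_source`, and `failing_source` are hypothetical names introduced here.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fast_source(query):
    # Hypothetical source that answers immediately.
    return [f"fast hit for {query!r}"]


def slow_source(query):
    # Hypothetical source that takes longer to respond.
    time.sleep(0.2)
    return [f"slow hit for {query!r}"]


def failing_source(query):
    # Hypothetical source that is currently down.
    raise RuntimeError("source unavailable")


def federated_search(query, sources):
    """Send one query to every source in parallel and collect results as each finishes."""
    results, status = {}, {}
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        futures = {pool.submit(fn, query): name for name, fn in sources.items()}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
                status[name] = "COMPLETED"
            except Exception:
                # One failing source does not abort the others.
                results[name] = []
                status[name] = "FAILED"
    return results, status


sources = {"fast": fast_source, "slow": slow_source, "broken": failing_source}
results, status = federated_search("quarterly report", sources)
```

Total wall-clock time here is roughly that of the slowest source, which matches the latency limitation noted for this capability.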

Solves for

Query across enterprise data silos (Salesforce, Jira, Slack, BigQuery, databases) in a single search without ETL

Get results from all sources simultaneously rather than sequential API calls

Maintain data residency and security by querying in-place without data movement

Reduce latency by parallelizing queries across slow and fast sources

Best for

Enterprise teams with fragmented data across 10+ SaaS and on-prem systems

Organizations with strict data residency or compliance requirements (HIPAA, GDPR)

Teams building unified search experiences without data warehouse consolidation

Requires

Python 3.9+

Django 3.2+

Celery 5.0+ with Redis or RabbitMQ broker

Limitations

Latency is bounded by the slowest source in the parallel query set; there is no timeout-based early termination by default

Requires connector implementation for each data source type; 100+ connectors provided but custom sources need custom code

No built-in query optimization across sources; each source receives the full query and may return irrelevant results that require post-processing

What makes it unique

Uses Celery-based task distribution with per-source connector abstraction (swirl/connectors/) to parallelize queries across heterogeneous sources without data movement, combined with Django ORM state management for search lifecycle tracking. Unlike traditional metasearch engines that require data indexing, SWIRL queries live data in-place through connector adapters that translate queries to source-native formats (SQL, GraphQL, REST, Elasticsearch DSL).

vs alternatives

Faster than centralized data warehouse approaches for real-time queries because it eliminates ETL latency and data sync delays; more secure than cloud-based search services because data never leaves on-premises systems.

connector-based data source abstraction with format translation

Medium confidence

Provides extensible connector framework (swirl/connectors/connector.py base class) that abstracts 100+ data sources (HTTP APIs, databases, search engines, Microsoft Graph) into a unified interface. Each connector translates SWIRL's normalized query format into source-native syntax (SQL WHERE clauses, Elasticsearch queries, REST API parameters, GraphQL), executes the query, and normalizes results back to SWIRL's unified schema. Supports HTTP connectors for REST/GraphQL APIs, database connectors for SQL/NoSQL, and specialized connectors for Salesforce, Jira, Microsoft 365, Slack, BigQuery, and others.
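
As a rough sketch of the connector idea, the example below defines an abstract base class and one in-memory connector. The method names `execute()` and `normalize_results()` follow the names mentioned in this listing; everything else (the `search()` helper, `TicketConnector`, the field names of the shared schema, and the example URL) is assumed for illustration and is not taken from the SWIRL codebase.

```python
from abc import ABC, abstractmethod


class Connector(ABC):
    """Hypothetical base class: run the query, then normalize the raw records."""

    @abstractmethod
    def execute(self, query):
        """Run the query against the source and return raw records."""

    @abstractmethod
    def normalize_results(self, raw):
        """Map raw records to a shared schema: title, body, url, source."""

    def search(self, query):
        return self.normalize_results(self.execute(query))


class TicketConnector(Connector):
    """Illustrative connector over an in-memory list of ticket dicts."""

    def __init__(self, tickets):
        self.tickets = tickets

    def execute(self, query):
        needle = query.lower()
        return [t for t in self.tickets if needle in t["summary"].lower()]

    def normalize_results(self, raw):
        return [
            {
                "title": t["summary"],
                "body": t["description"],
                "url": f"https://tickets.example.com/{t['key']}",
                "source": "tickets",
            }
            for t in raw
        ]


tickets = [
    {"key": "OPS-1", "summary": "VPN outage", "description": "Gateway down"},
    {"key": "OPS-2", "summary": "Printer jam", "description": "Tray 2"},
]
hits = TicketConnector(tickets).search("vpn")
```

Callers only ever see the shared schema, so a new source can be added by writing another subclass without touching the code that consumes `hits`.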

Solves for

Add new data sources without modifying core search engine: implement connector subclass

Query databases, REST APIs, search engines, and SaaS platforms with identical query syntax

Handle source-specific authentication (OAuth, API keys, database credentials) transparently

Normalize heterogeneous result formats (JSON, XML, CSV, database rows) into consistent schema

Best for

Platform teams building extensible search infrastructure for enterprise customers

Developers integrating proprietary or custom data sources into unified search

Organizations needing to support 20+ data sources without maintaining separate query logic per source

Requires

Python 3.9+

Connector base class understanding (swirl/connectors/connector.py)

Source-specific API documentation and authentication credentials

Limitations

Connector implementation required for each new source type — no automatic discovery or schema inference

Query translation is connector-specific; complex queries may not translate perfectly across all sources

Authentication management is per-connector; no centralized credential vault (requires external integration)

What makes it unique

Implements connector base class (swirl/connectors/connector.py) with pluggable execute() and normalize_results() methods, allowing each source to define its own query translation and result mapping logic. Supports 100+ pre-built connectors covering HTTP APIs, SQL/NoSQL databases, Elasticsearch, Solr, Salesforce, Jira, Microsoft Graph, Slack, BigQuery, and more. Unlike generic API clients, each connector understands source-specific pagination, authentication, and result structure.

vs alternatives

More flexible than API aggregation libraries because connectors can implement source-specific optimizations (e.g., Elasticsearch filter context vs query context); more maintainable than custom query translation logic because connector interface is standardized.

galaxy web ui with search interface and result visualization

Medium confidence

Provides Galaxy web-based user interface (Django templates, static files, JavaScript) accessible at port 8000 for searching and visualizing results. Implements real-time search progress tracking via WebSocket, progressive result display as sources complete, and result filtering/sorting. Supports both simple keyword search and advanced search with filters, date ranges, and field-specific queries. Includes result preview, source attribution, and relevance scoring visualization. Built with Django templates and vanilla JavaScript for minimal dependencies.

Solves for

Provide end-user search interface for querying federated data sources

Display search results with real-time progress updates

Filter and sort results by relevance, source, date, and other criteria

Show which data sources were queried and their execution time

Best for

End-users searching across enterprise data sources

Teams building search applications with web-based UI

Organizations wanting to provide unified search experience across multiple data sources

Requires

Python 3.9+

Django 3.2+

Modern web browser (Chrome, Firefox, Safari, Edge)

Limitations

Galaxy UI is basic web interface; no advanced visualization or analytics

Customization requires Django template and JavaScript knowledge

No mobile-optimized interface; primarily desktop-focused

What makes it unique

Implements Galaxy web UI as Django-based application (Django templates, static files, JavaScript) with WebSocket integration for real-time search progress and result streaming. Supports both simple keyword search and advanced search with filters and field-specific queries. Built with minimal dependencies (vanilla JavaScript) for easy customization.

vs alternatives

More integrated than separate frontend because it's part of SWIRL Search application; more real-time than traditional search UIs because it streams results via WebSocket; more customizable than SaaS search interfaces because source code is available.

asynchronous task execution with celery worker pool and result caching

Medium confidence

Implements asynchronous search execution using Celery task queue (swirl/tasks.py) with configurable worker pool for parallel query execution across sources. Each source query is dispatched as separate Celery task, allowing independent execution and failure handling. Results are cached in Redis (configurable TTL) to avoid redundant queries for identical search parameters. Celery workers can be scaled horizontally to handle increased query load. Supports task monitoring, retry logic, and dead-letter queue for failed tasks.
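
The caching behaviour described above (identical search parameters served from cache until a TTL expires) can be illustrated without Redis. The sketch below is an assumption-laden stand-in: `TTLCache`, `key_for`, and `cached_search` are hypothetical names, and an in-memory dict replaces the Redis store.

```python
import hashlib
import json
import time


class TTLCache:
    """In-memory stand-in for a Redis cache with per-entry expiry."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store = {}

    @staticmethod
    def key_for(query, sources):
        # Identical search parameters must map to the same key,
        # regardless of the order in which sources are listed.
        payload = json.dumps({"q": query, "sources": sorted(sources)}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self.clock() >= expires_at:
            del self._store[key]  # expired: drop the entry and report a miss
            return None
        return value

    def set(self, key, value):
        self._store[key] = (self.clock() + self.ttl, value)


def cached_search(cache, query, sources, run_search):
    """Return cached results when available; otherwise run the search and cache it."""
    key = cache.key_for(query, sources)
    hit = cache.get(key)
    if hit is not None:
        return hit
    value = run_search(query, sources)
    cache.set(key, value)
    return value
```

The injectable `clock` makes the staleness trade-off easy to test: a longer TTL saves more source calls but serves older results.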

Solves for

Execute queries on slow data sources without blocking user interface

Scale query execution horizontally by adding more Celery workers

Cache search results to avoid redundant queries for identical searches

Monitor and debug query execution through Celery task tracking

Best for

Teams needing to handle high-volume search queries without blocking

Applications with slow data sources (data warehouses, external APIs) requiring asynchronous execution

Organizations wanting to scale search execution horizontally across multiple workers

Requires

Python 3.9+

Celery 5.0+

Redis 6.0+ or RabbitMQ 3.8+ (message broker)

Limitations

Celery adds operational complexity; requires message broker (Redis/RabbitMQ) and worker management

Result caching can return stale results; TTL must be configured appropriately per source

Task monitoring and debugging requires Celery knowledge and tools (Flower, etc.)

What makes it unique

Implements asynchronous search execution using Celery task queue (swirl/tasks.py) where each source query is dispatched as separate task for independent execution. Results are cached in Redis with configurable TTL to avoid redundant queries. Celery workers can be scaled horizontally to handle increased load. Supports task monitoring, retry logic, and dead-letter queue for failed tasks.

vs alternatives

More scalable than synchronous execution because it allows horizontal scaling of workers; more responsive than blocking execution because UI updates are pushed via WebSocket while tasks execute; more resilient than single-threaded execution because task failures don't block other queries.

source-specific authentication and credential management

Medium confidence

Implements per-source authentication handling (swirl/connectors/) supporting multiple authentication methods: API keys, OAuth 2.0, basic auth, database credentials, and custom authentication schemes. Each connector manages its own authentication logic, allowing sources to use different authentication methods simultaneously. Credentials are stored in Django settings or environment variables (not in code). Supports OAuth token refresh for long-lived sessions. No centralized credential vault; requires external integration for enterprise credential management.
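
A minimal sketch of keeping credentials out of code is shown below. The `SOURCE_FIELD` variable naming (for example `JIRA_API_KEY`), the function names, and the choice to send an API key as a bearer token are all assumptions made for illustration; real sources differ in how they expect keys to be sent.

```python
import base64
import os


class MissingCredential(Exception):
    """Raised when a required environment variable is absent or empty."""


def load_credentials(source_name, fields, environ=os.environ):
    """Read credentials for one source from environment variables.

    Variable names follow a hypothetical SOURCE_FIELD convention,
    e.g. JIRA_API_KEY. This is not a documented SWIRL convention.
    """
    creds = {}
    for field in fields:
        var = f"{source_name}_{field}".upper()
        value = environ.get(var)
        if not value:
            raise MissingCredential(f"environment variable {var} is not set")
        creds[field] = value
    return creds


def auth_headers(method, creds):
    """Build request headers for two common schemes."""
    if method == "api_key":
        # Assumption: the source accepts the key as a bearer token.
        return {"Authorization": f"Bearer {creds['api_key']}"}
    if method == "basic":
        raw = f"{creds['user']}:{creds['password']}".encode("utf-8")
        return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}
    raise ValueError(f"unsupported auth method: {method}")
```

Failing fast on a missing variable surfaces misconfiguration at startup rather than as an authentication error from the source at query time.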

Solves for

Authenticate to multiple data sources with different authentication methods (API keys, OAuth, basic auth)

Manage credentials securely without storing in code or version control

Support OAuth token refresh for long-lived sessions

Integrate with enterprise credential management systems

Best for

Teams integrating multiple data sources with different authentication methods

Organizations with security requirements for credential management

Applications needing to support OAuth for SaaS integrations

Requires

Python 3.9+

Source-specific authentication credentials (API keys, OAuth tokens, database credentials)

Environment variable or Django settings configuration for credentials

Limitations

No centralized credential vault; requires external integration (HashiCorp Vault, AWS Secrets Manager, etc.)

Credentials stored in environment variables or Django settings; not encrypted at rest by default

OAuth token refresh requires background job management; token expiration handling adds complexity

What makes it unique

Implements per-source authentication handling (swirl/connectors/) supporting multiple authentication methods (API keys, OAuth 2.0, basic auth, database credentials) through connector-specific implementations. Each connector manages its own authentication logic, allowing sources to use different methods simultaneously. Credentials are stored in environment variables or Django settings, not in code.

vs alternatives

More flexible than single authentication method because each source can use different auth; more secure than hardcoded credentials because credentials are stored in environment variables; supports OAuth unlike basic auth-only solutions.

admin interface for source configuration and search management

Medium confidence

Provides Django admin interface for configuring data sources, managing searches, and monitoring system health. Allows admins to add/edit/delete data sources, configure connector parameters, set authentication credentials, and manage search history. Includes admin guide (docs/Admin-Guide.md) for production deployment and troubleshooting. Supports bulk operations for managing multiple sources. Provides search analytics (query volume, source performance, result quality metrics).

Solves for

Configure data sources without code changes through admin interface

Monitor search performance and source health

Manage user searches and search history

Troubleshoot search issues and view error logs

Best for

System administrators managing SWIRL Search deployment

Teams needing to add/remove data sources without developer involvement

Organizations wanting to monitor search performance and system health

Requires

Python 3.9+

Django 3.2+

Admin user account with Django admin access

Limitations

Django admin interface is basic; limited customization without code changes

No role-based access control (RBAC) for admin functions; all admins have full access

Search analytics are basic; no advanced reporting or visualization

What makes it unique

Implements Django admin interface for source configuration and search management, allowing admins to add/edit/delete data sources without code changes. Includes admin guide (docs/Admin-Guide.md) for production deployment. Provides search analytics and system health monitoring through admin interface.

vs alternatives

More accessible than code-based configuration because it provides UI for non-developers; more integrated than separate admin tools because it's part of SWIRL Search application; more transparent than hidden configuration because all settings are visible in admin interface.

result normalization and relevance re-ranking across heterogeneous sources

Medium confidence

Implements result processing pipeline (swirl/processors/) that normalizes results from different sources into unified schema, applies relevance re-ranking algorithms, and deduplicates results. The Mixer component (swirl/mixers/mixer.py) combines results from multiple sources using configurable ranking strategies (BM25, TF-IDF, LLM-based relevance scoring). Processors transform raw connector output into normalized Result objects with standardized fields, handle PII removal (swirl/processors/remove_pii.py), and apply source-specific post-processing. Results are re-ranked based on relevance scores, source credibility, and recency.
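
To make the merge, deduplicate, redact, and re-rank flow concrete, here is a small sketch. It is not the Mixer's algorithm: the scoring function is a simple term-overlap stand-in for BM25 or TF-IDF, deduplication is by normalized URL only, the redaction pattern covers email addresses only, and the names `mix`, `score`, and `redact_pii` are hypothetical.

```python
import re

EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact_pii(text):
    # Pattern-based redaction; only email addresses are covered in this sketch.
    return EMAIL.sub("[REDACTED]", text)


def score(query, result):
    # Fraction of query terms found in title or body (stand-in for BM25/TF-IDF).
    terms = query.lower().split()
    if not terms:
        return 0.0
    text = f"{result['title']} {result['body']}".lower()
    return sum(term in text for term in terms) / len(terms)


def mix(query, per_source_results):
    """Merge results from all sources, drop URL duplicates, redact, and re-rank."""
    seen, merged = set(), []
    for results in per_source_results:
        for r in results:
            key = r["url"].rstrip("/").lower()
            if key in seen:
                continue  # same URL already returned by another source
            seen.add(key)
            merged.append({**r, "body": redact_pii(r["body"]), "score": score(query, r)})
    return sorted(merged, key=lambda r: r["score"], reverse=True)


wiki = [{"title": "VPN setup guide", "body": "Contact alice@example.com",
         "url": "https://wiki.example.com/vpn/"}]
tickets = [
    {"title": "Printer jam", "body": "VPN unrelated", "url": "https://t.example.com/2"},
    {"title": "dup", "body": "x", "url": "https://wiki.example.com/vpn"},
]
ranked = mix("vpn setup", [wiki, tickets])
```

Scoring every result with the same function is what makes results from different sources comparable; source-native relevance scores are generally not on a common scale.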

Solves for

Combine results from Salesforce, Jira, Slack, and internal databases into single ranked list

Deduplicate results when same content appears in multiple sources

Re-rank results by relevance rather than source-specific ranking

Remove sensitive data (PII) from results before returning to users

Best for

Teams building unified search UIs that need consistent result presentation across sources

Organizations with compliance requirements to filter PII from search results

Search applications where relevance ranking must account for source heterogeneity

Requires

Python 3.9+

Processor pipeline configuration (swirl/processors/)

Mixer strategy selection (BM25, TF-IDF, or custom scoring function)

Limitations

Relevance scoring assumes text-based results; non-text sources (images, structured records) require custom scoring logic

PII removal uses pattern matching and regex; may miss domain-specific sensitive data without custom rules

Deduplication uses fuzzy matching on title and URL; the same content published under a different title or URL in another source may not be detected

What makes it unique

Implements pluggable processor pipeline (swirl/processors/processor.py base class) where each processor transforms results independently, enabling composition of normalization, ranking, and filtering logic. Mixer component (swirl/mixers/mixer.py) applies configurable ranking strategies (BM25, TF-IDF, or custom) to re-rank results from heterogeneous sources. PII removal processor uses pattern matching to detect and redact sensitive data before returning results.

vs alternatives

More flexible than fixed ranking algorithms because mixer strategies are pluggable; more comprehensive than simple result concatenation because it handles deduplication and PII removal in pipeline.

retrieval-augmented generation (rag) with llm-powered answer synthesis

Medium confidence

Implements RAG pipeline (swirl/processors/rag.py) that uses LLM APIs (OpenAI, Anthropic, Ollama, Azure OpenAI) to synthesize answers from search results without moving data. The RAG processor takes normalized search results, constructs a prompt with result snippets as context, and calls the configured LLM to generate a natural language answer. Supports streaming responses via WebSocket to Galaxy UI for real-time answer generation. Integrates with search result ranking to prioritize high-relevance results in LLM context window.
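
The prompt-construction step described above can be sketched as follows. This is an illustration only: the prompt wording, the character budget used in place of a token budget, and the names `build_prompt` and `answer` are assumptions, and `llm` is any callable that maps a prompt string to text rather than a specific provider client.

```python
def build_prompt(question, results, max_chars=2000):
    """Pack the highest-scoring snippets into the prompt until the budget is used up."""
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    context, used = [], 0
    for i, r in enumerate(ranked, start=1):
        snippet = f"[{i}] {r['title']} ({r['url']}): {r['body']}"
        if used + len(snippet) > max_chars:
            break  # budget exhausted: lower-ranked results are left out
        context.append(snippet)
        used += len(snippet)
    return (
        "Answer the question using only the numbered sources below. "
        "Cite sources by number.\n\n"
        + "\n".join(context)
        + f"\n\nQuestion: {question}\nAnswer:"
    )


def answer(question, results, llm, max_chars=2000):
    # `llm` is a hypothetical callable: prompt string in, answer text out.
    return llm(build_prompt(question, results, max_chars))
```

Numbering the snippets gives the model something to cite, which supports the source attribution that the hallucination limitation calls for.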

Solves for

Generate natural language answers from search results across multiple sources

Synthesize information from 5-10 top results into a single coherent answer

Stream LLM responses in real-time to user interface

Use LLM to re-rank or filter results based on relevance to user query

Best for

Teams building AI-powered search experiences that generate answers rather than just listing results

Organizations using LLMs to synthesize information from multiple enterprise data sources

Applications needing real-time answer generation with streaming responses

Requires

Python 3.9+

LLM API key (OpenAI, Anthropic, Azure OpenAI, or Ollama endpoint)

RAG processor configuration (model selection, prompt template, context window size)

Limitations

LLM context window limits number of results that can be included in prompt (typically 4-8 results for GPT-4)

LLM hallucination risk: generated answers may contain information that is not in the source results, so answers need source attribution

LLM API latency adds 1-5 seconds to search response time depending on model and result set size

What makes it unique

Implements RAG as a processor in the result processing pipeline (swirl/processors/rag.py), allowing it to be composed with other processors (normalization, ranking, PII removal). Supports multiple LLM providers (OpenAI, Anthropic, Ollama, Azure) through pluggable LLM client abstraction. Streams responses via WebSocket to Galaxy UI for real-time answer generation without waiting for full LLM completion.

vs alternatives

More flexible than monolithic RAG systems because RAG is optional and composable with other processors; supports multiple LLM providers unlike single-model solutions; streams responses for better UX compared to batch answer generation.

query transformation and source-specific syntax generation

Medium confidence

Implements query processing layer (swirl/search.py, swirl/models.py Query class) that parses user natural language queries and transforms them into source-specific syntax. Each connector implements query_to_native_syntax() method to translate SWIRL's normalized query format (query string, filters, date ranges, field-specific queries) into source-native formats: SQL WHERE clauses for databases, Elasticsearch DSL for search engines, REST API parameters for HTTP APIs, GraphQL queries for GraphQL endpoints, Microsoft Graph query syntax for Microsoft 365. Supports query expansion, synonym replacement, and field mapping per source.
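
As an illustration of translating one normalized query into two source-native forms, the sketch below produces a parameterized SQL WHERE clause and an Elasticsearch bool query. `NormalizedQuery`, `to_sql_where`, and `to_elasticsearch` are hypothetical names; they show the kind of work a per-connector `query_to_native_syntax()` method would do, not SWIRL's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class NormalizedQuery:
    text: str
    filters: dict = field(default_factory=dict)
    field_map: dict = field(default_factory=dict)  # user-facing field -> source field


def to_sql_where(q, text_column="body"):
    """Return a parameterized WHERE clause and its parameter list.

    Values are passed as parameters; column names come from field_map,
    which is assumed to be trusted configuration, not user input.
    """
    clauses, params = [f"{text_column} LIKE ?"], [f"%{q.text}%"]
    for name, value in q.filters.items():
        column = q.field_map.get(name, name)
        clauses.append(f"{column} = ?")
        params.append(value)
    return " AND ".join(clauses), params


def to_elasticsearch(q, text_field="body"):
    """Return an Elasticsearch bool query: scored match plus unscored term filters."""
    return {
        "query": {
            "bool": {
                "must": [{"match": {text_field: q.text}}],
                "filter": [
                    {"term": {q.field_map.get(n, n): v}} for n, v in q.filters.items()
                ],
            }
        }
    }


q = NormalizedQuery("outage", {"author": "alice"}, {"author": "created_by"})
where, params = to_sql_where(q)
es_query = to_elasticsearch(q)
```

The same `author` filter becomes an equality test in SQL and a `term` filter in Elasticsearch, with the field mapping applied in both cases.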

Solves for

Write single query that works across SQL databases, Elasticsearch, REST APIs, and GraphQL endpoints

Automatically translate filters and date ranges to source-specific syntax

Map user-friendly field names to source-specific field names (e.g., 'author' → 'created_by' in Jira)

Expand queries with synonyms or related terms per source

Best for

Teams building unified search interfaces that hide source-specific query syntax from users

Developers integrating multiple data sources with different query languages

Applications needing to support complex queries (filters, date ranges, field-specific searches) across heterogeneous sources

Requires

Python 3.9+

Query configuration (field mappings, synonym lists, query expansion rules)

Source-specific query syntax knowledge (SQL, Elasticsearch DSL, GraphQL, etc.)

Limitations

Query translation is lossy: some query features may not translate to all sources (e.g., complex boolean logic may not work in all REST APIs)

Field mapping must be configured per source; no automatic schema discovery

Advanced query features (faceting, aggregations, nested queries) may not translate across all sources

What makes it unique

Implements query transformation as part of connector abstraction (swirl/connectors/connector.py) where each connector defines its own query_to_native_syntax() method, enabling source-specific optimizations and syntax variations. Supports field mapping and query expansion per source through connector configuration. Unlike generic query builders, each connector understands source-specific query semantics and optimization opportunities.

vs alternatives

More flexible than fixed query syntax because each connector can implement custom transformation logic; more maintainable than string-based query building because transformation is encapsulated in connector classes.

real-time search progress tracking and websocket streaming

Medium confidence

Implements WebSocket-based real-time communication (swirl/views.py, Galaxy UI) that streams search progress and results to clients as they complete. The Search Orchestrator updates Search object state (RUNNING, COMPLETED, FAILED) as Celery workers complete queries on individual sources. WebSocket connection pushes progress updates (source completion, result count, execution time) and result chunks to Galaxy UI in real-time, enabling users to see results from fast sources immediately without waiting for slow sources. Supports both polling (REST API) and streaming (WebSocket) interfaces.
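
The progressive delivery described above can be modelled with an async generator that emits one progress event per source in completion order. In this sketch `asyncio` tasks stand in for Celery workers, the event fields are assumed, and in a real deployment each event would be sent over a WebSocket rather than collected into a list.

```python
import asyncio


async def query_source(name, delay, hits):
    # Hypothetical source call; `delay` simulates network latency.
    await asyncio.sleep(delay)
    return name, hits


async def stream_progress(sources):
    """Yield one progress event per source, in completion order."""
    tasks = [asyncio.create_task(query_source(*s)) for s in sources]
    done = 0
    for finished in asyncio.as_completed(tasks):
        name, hits = await finished
        done += 1
        yield {
            "source": name,
            "result_count": len(hits),
            "completed": done,
            "total": len(tasks),
            "state": "COMPLETED" if done == len(tasks) else "RUNNING",
        }


async def collect(sources):
    return [event async for event in stream_progress(sources)]


events = asyncio.run(collect([("warehouse", 0.2, ["a", "b"]), ("wiki", 0.01, ["c"])]))
```

The fast `wiki` source is reported first even though it was listed second, which is the behaviour that lets a UI show early results while slower sources are still running.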

Solves for

Show users which sources have completed and which are still executing

Display results from fast sources immediately while waiting for slow sources

Stream result updates to UI without page refresh or polling

Provide real-time feedback on search progress and estimated completion time

Best for

Web applications needing real-time search feedback without polling

Teams building search UIs where user experience depends on immediate result visibility

Applications with slow data sources (data warehouses, external APIs) where progressive result display is important

Requires

Python 3.9+

Django 3.2+ with WebSocket support (Django Channels or similar)

Redis or RabbitMQ for message broker (required for multi-server deployments)

Limitations

WebSocket requires persistent connection; not suitable for stateless HTTP-only environments

Browser compatibility: older browsers may not support WebSocket; requires fallback to polling

Scaling WebSocket connections requires sticky sessions or message broker (Redis) for multi-server deployments

What makes it unique

Implements WebSocket streaming in Galaxy UI (swirl/views.py) that pushes Search state updates and result chunks to clients in real-time as Celery workers complete source queries. Supports both WebSocket (streaming) and REST API (polling) interfaces for flexibility. Results are streamed progressively, allowing users to see results from fast sources immediately without waiting for slow sources.

vs alternatives

Better UX than polling because updates are pushed immediately; more responsive than batch result delivery because results appear as sources complete; supports progressive result display unlike traditional search engines that wait for all results.

multi-provider llm abstraction with streaming support

Medium confidence

Provides LLM provider abstraction layer (swirl/processors/rag.py) that supports multiple LLM APIs (OpenAI, Anthropic, Ollama, Azure OpenAI) through unified interface. Each provider implementation handles authentication, request formatting, streaming response parsing, and error handling. Supports streaming responses where LLM output is returned token-by-token via WebSocket, enabling real-time answer generation in Galaxy UI. Allows switching between LLM providers through configuration without code changes.
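
One way to picture a provider abstraction with streaming is a small interface plus a registry keyed by configuration. The sketch below uses fake providers so it runs without network access; `LLMProvider`, `EchoProvider`, `CannedProvider`, `PROVIDERS`, `make_provider`, and `generate` are hypothetical names, and no real provider SDK is called.

```python
from typing import Iterator, Protocol


class LLMProvider(Protocol):
    def stream(self, prompt: str) -> Iterator[str]:
        """Yield the response incrementally, one chunk at a time."""
        ...


class EchoProvider:
    """Fake provider that streams the prompt back word by word."""

    def stream(self, prompt):
        for word in prompt.split():
            yield word + " "


class CannedProvider:
    """Fake provider that returns a fixed reply as a single chunk."""

    def __init__(self, reply):
        self.reply = reply

    def stream(self, prompt):
        yield self.reply


# Registry: configuration selects the provider, so switching needs no code change.
PROVIDERS = {
    "echo": lambda cfg: EchoProvider(),
    "canned": lambda cfg: CannedProvider(cfg["reply"]),
}


def make_provider(config):
    name = config.get("provider")
    if name not in PROVIDERS:
        raise ValueError(f"unknown provider: {name!r}")
    return PROVIDERS[name](config)


def generate(config, prompt, on_token):
    """Stream chunks to `on_token` as they arrive and return the full text."""
    chunks = []
    for token in make_provider(config).stream(prompt):
        on_token(token)  # e.g. forward to a WebSocket in a real deployment
        chunks.append(token)
    return "".join(chunks)
```

Because callers depend only on `stream()`, a real provider client could be wrapped behind the same interface, with provider-specific authentication and response parsing kept inside that wrapper.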

Solves for

Use different LLM providers (OpenAI, Anthropic, Ollama) interchangeably in RAG pipeline

Stream LLM responses token-by-token for real-time answer generation

Switch LLM providers based on cost, latency, or availability without code changes

Support both cloud-hosted (OpenAI, Anthropic, Azure) and self-hosted (Ollama) models

Best for

Teams wanting to avoid vendor lock-in by supporting multiple LLM providers

Organizations using self-hosted models (Ollama) for data privacy

Applications needing to optimize LLM provider selection based on cost or latency

Requires

Python 3.9+

LLM provider API key (OpenAI, Anthropic, Azure OpenAI) or Ollama endpoint

LLM provider configuration (model name, API endpoint, authentication)

Limitations

API differences between providers require provider-specific implementations; not all features available on all providers

Streaming support varies by provider; some providers have higher latency for streaming responses

Model availability and capabilities differ across providers; prompt engineering may need adjustment per provider

What makes it unique

Implements pluggable LLM provider abstraction (swirl/processors/rag.py) supporting OpenAI, Anthropic, Ollama, and Azure OpenAI through unified interface. Each provider implementation handles authentication, request formatting, and streaming response parsing. Allows switching providers through configuration without code changes. Supports streaming responses where tokens are returned progressively via WebSocket.

vs alternatives

More flexible than single-provider solutions because it supports multiple LLM APIs; enables cost optimization by allowing provider switching; supports self-hosted models (Ollama) for data privacy unlike cloud-only solutions.

microsoft 365 and graph api integration with oauth authentication

Medium confidence

Provides specialized connectors for Microsoft 365 ecosystem (swirl/connectors/) including Microsoft Graph API connector for querying Teams, SharePoint, OneDrive, Outlook, and other Microsoft 365 services. Implements OAuth 2.0 authentication flow for secure credential management without storing passwords. Supports Microsoft Graph query syntax translation, pagination, and result normalization. Includes admin guide (docs/M365-Guide.md) for configuring Microsoft 365 integration in enterprise environments.
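
Pagination is one of the source-specific details such a connector has to handle. Microsoft Graph collection responses carry their items in a `value` array and, when more pages exist, an `@odata.nextLink` URL. The sketch below follows those links using an injected `fetch_json` callable; that callable, the `fetch_all` name, the `max_pages` guard, and the example URLs are assumptions for illustration, and a real fetcher would send the OAuth access token with each request.

```python
def fetch_all(first_url, fetch_json, max_pages=10):
    """Follow @odata.nextLink until the collection is exhausted or max_pages is reached."""
    items, url, pages = [], first_url, 0
    while url and pages < max_pages:
        page = fetch_json(url)
        items.extend(page.get("value", []))
        url = page.get("@odata.nextLink")  # absent on the last page
        pages += 1
    return items


# Fake two-page response keyed by URL, standing in for authenticated HTTP calls.
PAGES = {
    "https://graph.example/page1": {
        "value": [{"subject": "Budget"}],
        "@odata.nextLink": "https://graph.example/page2",
    },
    "https://graph.example/page2": {"value": [{"subject": "Forecast"}]},
}
messages = fetch_all("https://graph.example/page1", PAGES.__getitem__)
```

The `max_pages` cap is a simple way to bound the number of requests per search, which matters given the rate limits noted under Limitations.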

Solves for

Query Teams messages, SharePoint documents, OneDrive files, and Outlook emails in unified search

Integrate Microsoft 365 data into enterprise search without copying data to external systems

Use OAuth for secure authentication without storing user passwords

Support Microsoft 365 multi-tenant environments with per-tenant configuration

Best for

Microsoft 365 customers wanting to add AI search to their existing Microsoft ecosystem

Enterprise teams needing to search across Microsoft 365 services and other data sources

Organizations with strict data residency requirements for Microsoft 365 data

Requires

Python 3.9+

Microsoft 365 tenant with admin access

OAuth app registration in Azure AD with appropriate permissions

Limitations

Requires Microsoft 365 tenant admin to configure OAuth app and grant permissions

Microsoft Graph API rate limits apply; high-volume searches may hit rate limits

Some Microsoft 365 services have limited search capabilities (e.g., Teams message search is limited)

What makes it unique

Implements specialized Microsoft Graph connector (swirl/connectors/) with OAuth 2.0 authentication flow and Microsoft Graph query syntax translation. Supports querying Teams, SharePoint, OneDrive, Outlook, and other Microsoft 365 services through unified connector interface. Includes admin guide (docs/M365-Guide.md) for enterprise deployment with multi-tenant support.

vs alternatives

More integrated than generic REST API connectors because it understands Microsoft Graph semantics and pagination; more secure than password-based authentication because it uses OAuth; supports multiple Microsoft 365 services through single connector unlike service-specific integrations.

docker and kubernetes containerized deployment with configuration management

Medium confidence

Provides Docker containerization (Dockerfile, docker-compose.yml) and Kubernetes deployment manifests for production deployment. Includes Nginx reverse proxy configuration for load balancing and SSL termination. Supports environment-based configuration management through .env files and Django settings, enabling deployment across development, staging, and production environments without code changes. Includes CI/CD pipeline (GitHub Actions) for automated testing, building, and deployment.

Solves for

Deploy SWIRL Search to production using Docker containers

Scale SWIRL Search horizontally using Kubernetes with multiple replicas

Configure SWIRL Search for different environments (dev, staging, prod) without code changes

Automate testing and deployment using CI/CD pipeline

Best for

Teams deploying SWIRL Search to production using containerized infrastructure

Organizations using Kubernetes for container orchestration

DevOps teams needing automated CI/CD pipelines for SWIRL Search deployment

Requires

Docker 20.10+ or Docker Desktop

Kubernetes 1.20+ (for Kubernetes deployment)

PostgreSQL or MySQL database (for Django ORM)

Limitations

Requires Docker and Kubernetes knowledge for production deployment

Celery workers and Redis broker must be configured separately; adds operational complexity

Persistent storage (database, Redis) must be managed externally; no built-in persistence

What makes it unique

Provides complete Docker and Kubernetes deployment setup (Dockerfile, docker-compose.yml, Kubernetes manifests) with Nginx reverse proxy configuration for production use. Includes CI/CD pipeline (GitHub Actions) for automated testing, building, and deployment. Supports environment-based configuration management through .env files and Django settings for multi-environment deployments.

vs alternatives

More production-ready than source code deployment because it includes containerization and orchestration configuration; more automated than manual deployment because it includes CI/CD pipeline; more scalable than single-server deployment because it supports Kubernetes horizontal scaling.

extensible processor pipeline for result transformation and filtering

Medium confidence

Implements pluggable processor pipeline architecture (swirl/processors/processor.py base class) where each processor transforms search results independently. Processors are composed in sequence, allowing flexible result transformation workflows: normalization, ranking, PII removal, RAG synthesis, custom filtering. Each processor implements process() method that takes results and returns transformed results. Processors can be enabled/disabled through configuration, and custom processors can be added by subclassing Processor base class.
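
A minimal sketch of such a pipeline is shown below. Only the `process()` method name comes from the description above; the `Processor` base class shape, the `enabled` flag, the three example processors, and `run_pipeline` are assumed for illustration.

```python
class Processor:
    """Hypothetical base class; subclasses override process()."""

    enabled = True

    def process(self, results):
        return results


class MinScoreFilter(Processor):
    def __init__(self, threshold):
        self.threshold = threshold

    def process(self, results):
        return [r for r in results if r["score"] >= self.threshold]


class SortByScore(Processor):
    def process(self, results):
        return sorted(results, key=lambda r: r["score"], reverse=True)


class Truncate(Processor):
    def __init__(self, limit, enabled=True):
        self.limit = limit
        self.enabled = enabled

    def process(self, results):
        return results[: self.limit]


def run_pipeline(processors, results):
    """Apply each enabled processor in order; each one receives the previous output."""
    for processor in processors:
        if processor.enabled:
            results = processor.process(results)
    return results


results = [
    {"title": "a", "score": 0.2},
    {"title": "b", "score": 0.9},
    {"title": "c", "score": 0.6},
]
pipeline = [MinScoreFilter(0.5), SortByScore(), Truncate(1, enabled=False)]
ranked = run_pipeline(pipeline, results)
```

Ordering changes the outcome: running `Truncate(1)` before `MinScoreFilter(0.5)` on the same input keeps only the first, low-scoring result and then filters it out, leaving nothing.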

Solves for

Apply multiple transformations to search results (normalization, ranking, filtering) in sequence

Add custom result processing logic without modifying core search engine

Enable/disable processors through configuration for different use cases

Compose processors to build complex result transformation workflows

Best for

Teams needing flexible result processing workflows beyond standard ranking and normalization

Developers building custom search applications with domain-specific result transformations

Organizations with complex compliance or data governance requirements (PII removal, data classification)

Requires

Python 3.9+

Processor base class understanding (swirl/processors/processor.py)

Configuration of processor pipeline (processor list and order)

Limitations

Processor composition adds latency; each processor adds ~10-100ms depending on complexity

Processor ordering matters; incorrect ordering can produce unexpected results

Debugging processor pipelines can be complex; requires understanding of each processor's input/output
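The ordering limitation is easy to reproduce with a small example. The sketch below uses two hypothetical steps, `rank_by_score` and `keep_top`, written as plain functions rather than SWIRL processors. Running the same two steps in a different order returns a different result set.

```python
def rank_by_score(results):
    """Sort results from highest to lowest score."""
    return sorted(results, key=lambda r: r["score"], reverse=True)


def keep_top(n):
    """Return a step that truncates the result list to its first n items."""
    def _keep(results):
        return results[:n]
    return _keep


def apply_steps(steps, results):
    """Run each step in order, passing its output to the next."""
    for step in steps:
        results = step(results)
    return results


hits = [
    {"id": "a", "score": 0.2},
    {"id": "b", "score": 0.9},
    {"id": "c", "score": 0.5},
]

ranked_then_cut = apply_steps([rank_by_score, keep_top(2)], hits)
cut_then_ranked = apply_steps([keep_top(2), rank_by_score], hits)

print([r["id"] for r in ranked_then_cut])  # ['b', 'c']
print([r["id"] for r in cut_then_ranked])  # ['b', 'a']
```

Truncating before ranking silently drops result "c" even though it outscores "a", which is the kind of unexpected outcome an incorrect ordering produces.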

What makes it unique

Implements the processor pipeline as a composable sequence of independent transformers (swirl/processors/processor.py base class), each implementing a process() method. Processors can be enabled or disabled through configuration and composed in any order. Supports built-in processors (normalization, ranking, PII removal, RAG) and custom processors through subclassing.

vs alternatives

More flexible than fixed result processing because processors are composable and configurable; more maintainable than monolithic result processing because each processor has single responsibility; more extensible than hard-coded transformations because custom processors can be added without modifying core code.

Capabilities are decomposed by AI analysis. Each maps to specific user intents and improves with match feedback.

Related Artifacts sharing capabilities

Artifacts that share capabilities with swirl-search, ranked by overlap. Discovered automatically through the match graph.

Product 30

Ask String

Transform data: analyze, visualize, manage—intuitively,...

multi-source data integration and unified querying

1 shared capability

Product 35

Coginiti

Instant query assistance, on-demand learning, and collaborative workspaces for efficient data and analytic product...

multi-warehouse query federation

1 shared capability

Product 22

TalktoData

Data discovery, cleaning, analysis & visualization

data source integration and unified querying

1 shared capability

Product 31

Kater

Transform data chaos into insights with intuitive AI-driven...

multi-source data integration and connection orchestration

1 shared capability

Product 34

Presto

Optimize multi-source data queries in real-time,...

federated-sql-query-execution

1 shared capability

Product 31

DataSquirrel

Democratizes data analysis with AI, ensuring accessibility, accuracy, and real-time...

multi-source data connector integration

1 shared capability

Best For

✓Enterprise teams with fragmented data across 10+ SaaS and on-prem systems
✓Organizations with strict data residency or compliance requirements (HIPAA, GDPR)
✓Teams building unified search experiences without data warehouse consolidation
✓Platform teams building extensible search infrastructure for enterprise customers
✓Developers integrating proprietary or custom data sources into unified search
✓Organizations needing to support 20+ data sources without maintaining separate query logic per source
✓End-users searching across enterprise data sources
✓Teams building search applications with web-based UI

Known Limitations

⚠Latency bounded by slowest source in parallel query set — no timeout-based early termination by default
⚠Requires connector implementation for each data source type; 100+ connectors provided but custom sources need custom code
⚠No built-in query optimization across sources — each source receives full query, may return irrelevant results requiring post-processing
⚠Celery/Redis dependency adds operational complexity; requires message broker setup and worker management
⚠Connector implementation required for each new source type — no automatic discovery or schema inference
⚠Query translation is connector-specific; complex queries may not translate perfectly across all sources
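
The first limitation (latency bounded by the slowest source) is typically addressed with a deadline after which slow sources are abandoned. The sketch below illustrates that idea with Python's asyncio as a stand-in for the Celery worker pool; it is not how SWIRL dispatches tasks. `query_source`, `federated_search`, the source names, and the delays are all hypothetical.

```python
import asyncio


async def query_source(name, delay):
    """Stand-in for a connector call; `delay` simulates source latency."""
    await asyncio.sleep(delay)
    return {"source": name, "results": [f"{name}-hit"]}


async def federated_search(sources, timeout):
    """Query all sources in parallel; keep whatever finished within `timeout`."""
    tasks = {asyncio.create_task(query_source(n, d)): n for n, d in sources.items()}
    done, pending = await asyncio.wait(list(tasks), timeout=timeout)
    for task in pending:
        task.cancel()  # give up on slow sources instead of waiting for them
    # Let the cancellations settle before returning.
    await asyncio.gather(*pending, return_exceptions=True)
    completed = sorted(t.result()["source"] for t in done)
    timed_out = sorted(tasks[t] for t in pending)
    return completed, timed_out


if __name__ == "__main__":
    sources = {"fast_api": 0.01, "slow_warehouse": 2.0}
    print(asyncio.run(federated_search(sources, timeout=0.3)))
```

With a deadline in place, overall latency is capped at the timeout rather than at the slowest source, at the cost of returning incomplete results when a source is late.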

Requirements

Python 3.9+
Django 3.2+
Celery 5.0+ with Redis or RabbitMQ broker
Network connectivity to all target data sources
Authentication credentials for each source (API keys, OAuth tokens, database credentials)
Connector base class understanding (swirl/connectors/connector.py)
Source-specific API documentation and authentication credentials
Network access to target data source

Input / Output

Accepts: natural language query string, structured search parameters (filters, date ranges, field-specific queries), normalized SWIRL query object with query string, filters, pagination parameters, source configuration (endpoint URL, API key, database connection string), search query string, filter parameters (source, date range, field-specific filters), search query and source configuration, Celery task configuration (timeout, retry logic, priority), authentication credentials (API key, OAuth token, username/password, database connection string), source configuration with authentication method specification, source configuration (name, connector type, endpoint, authentication), search parameters (query, filters, date range), raw results from connectors with source-specific fields and formats, relevance scoring configuration (algorithm, weights, thresholds), normalized search results with title, body, URL, source, relevance score, user query string, LLM configuration (model name, temperature, max_tokens), structured query parameters (filters, date ranges, field-specific queries), source configuration with field mappings, search query and configuration, client WebSocket connection, prompt string with search results as context, LLM configuration (provider, model, temperature, max_tokens), streaming flag (enable/disable streaming responses), Microsoft Graph query parameters (filter, select, search), OAuth credentials (client ID, client secret, tenant ID), Dockerfile and docker-compose.yml configuration, Environment variables (.env file), Kubernetes manifests (YAML), search results from connectors or previous processor, processor configuration (parameters, thresholds)

Produces: unified result set with normalized fields across sources, result metadata (source, relevance score, timestamp), search execution state and progress tracking, normalized result objects with standardized fields (title, body, url, date, source, relevance_score), connector metadata (source name, result count, execution time), HTML web page with search results, JSON API responses (for AJAX requests), WebSocket messages (for real-time updates), Celery task ID for tracking, search results (cached or fresh), task execution metadata (status, latency, retry count), authenticated connection to data source, authentication metadata (token expiration, refresh status), configured data sources, search history and analytics, system health metrics, normalized Result objects with standardized fields (title, body, url, date, source, relevance_score), deduplicated and re-ranked result list, PII-filtered results (if PII removal processor enabled), natural language answer string, streaming response chunks (via WebSocket), source attribution (which results were used in answer generation), LLM metadata (model used, tokens consumed, latency), source-native query syntax (SQL, Elasticsearch DSL, REST API parameters, GraphQL, etc.), query metadata (estimated result count, execution plan), real-time progress updates (source completion, result count), result chunks as they complete, search metadata (execution time, source latency), LLM response string (non-streaming) or token stream (streaming), LLM metadata (model used, tokens consumed, latency, cost), normalized results from Microsoft 365 services (Teams messages, SharePoint documents, Outlook emails, etc.), result metadata (source service, last modified date, author), Docker image (swirlai/swirl-search), Running container with SWIRL Search service, Kubernetes pods and services, transformed results (normalized, ranked, filtered, or synthesized), processor metadata (execution time, transformations applied)

UnfragileRank

Adoption: 53% (30% weight)

Quality: 53% (20% weight)

Ecosystem: 80% (15% weight)

Match Graph: 25% (30% weight)

Freshness: 75% (5% weight)

UnfragileRank is computed from adoption signals, documentation quality, ecosystem connectivity, match graph feedback, and freshness. No artifact can pay for a higher rank.
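
The page does not state how the components are combined. As a worked example only, the sketch below assumes the rank is a plain weighted average of the component percentages listed above; that formula and the `weighted_score` helper are assumptions, not a documented definition of UnfragileRank.

```python
# Component scores and weights as listed above, both in percent.
components = {
    "adoption": (53, 30),
    "quality": (53, 20),
    "ecosystem": (80, 15),
    "match_graph": (25, 30),
    "freshness": (75, 5),
}


def weighted_score(parts):
    """Weighted average of percent scores; weights need not be pre-normalized."""
    total_weight = sum(weight for _, weight in parts.values())
    weighted_sum = sum(score * weight for score, weight in parts.values())
    return weighted_sum / total_weight


if __name__ == "__main__":
    print(weighted_score(components))  # 49.75 under the weighted-average assumption
```

Under this assumption the two 30%-weight components dominate: the low Match Graph value pulls the total down more than the high Ecosystem value lifts it.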

Type: Repository

14 capabilities

Repository Details

Stars: 3,001

Forks: 284

Language: Python

License: Apache-2.0

Topics

ai-search, bigquery, django, federated-query, federated-search, gpt, large-language-models, metasearch, python, rag, relevancy, retrieval-augmented-generation, search, search-engine, unified-search

Last commit: Apr 20, 2026

About

AI Search & RAG Without Moving Your Data. Get instant answers from your company's knowledge across 100+ apps while keeping data secure. Deploy in minutes, not months.

Alternatives to swirl-search

wink-embeddings-sg-100d (24, Repository)

100-dimensional English word embeddings for wink-nlp

voyage-ai-provider (29, API)

Voyage AI Provider for running Voyage AI models with Vercel AI SDK

@vibe-agent-toolkit/rag-lancedb (27, Agent)

LanceDB implementation of RAG interfaces for vibe-agent-toolkit

vectra (38, Repository)

A lightweight, file-backed vector database for Node.js and browsers with Pinecone-compatible filtering and hybrid BM25 search.

Data Sources

github
