Connecting Data Sources

flAPI uses DuckDB's extension ecosystem to connect to 20+ data sources. You can combine several sources in a single flAPI instance, for example joining a cloud warehouse table with a local file in one endpoint.

Basic Configuration

Connections are defined in your flapi.yaml configuration file:

connections:
  connection-name:
    init: |
      -- SQL commands to initialize (load extensions)
      INSTALL extension_name;
      LOAD extension_name;
    properties:
      # Connection-specific properties
      property1: value1
      property2: value2

Supported Data Sources

Cloud Data Warehouses

  • BigQuery - Google's cloud data warehouse with millisecond caching
  • Snowflake - Cloud data platform with cost-optimized access
  • Databricks - Unified analytics platform

Databases

  • PostgreSQL - Popular open-source database
  • MySQL - Widely-used relational database
  • SQLite - Embedded database
  • ODBC - Connect to any database that ships an ODBC driver (Oracle, Teradata, DB2, and more)

File Formats

  • Parquet - Columnar storage format
  • CSV - Comma-separated values
  • JSON - JavaScript Object Notation
  • Apache Iceberg - Table format for huge datasets

Enterprise & BI Systems

  • SAP ERP - Enterprise resource planning (via ERPL extension)
  • SAP BW - Business warehouse
  • Power BI - Query Power BI data models without opening Power BI
  • MSOLAP - Microsoft SQL Server Analysis Services (SSAS/OLAP cubes)

No-Code & Collaborative

  • Google Sheets - Turn spreadsheets into APIs (no database needed!)
  • Airtable - Collaborative database platform (via API)

AI/ML & Advanced

  • Vector Search - Semantic search with Faiss/VSS for RAG applications
  • Arrow Flight - Connect to ML feature stores and model servers
  • Redis - In-memory data structures and pub/sub

Real-Time & Streaming

  • WebSocket Streams - Query event-driven data sources
  • Kafka - Distributed event streaming
  • Redis Queues - Real-time message queues

Quick Start Examples

Local Parquet File

connections:
  my-data:
    properties:
      path: './data/customers.parquet'

-- In your SQL template
SELECT * FROM '{{{conn.path}}}'

BigQuery

connections:
  bigquery-warehouse:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'my-project-id'

-- In your SQL template
SELECT * FROM bigquery_scan('project.dataset.table')

PostgreSQL

connections:
  postgres-db:
    init: |
      INSTALL postgres;
      LOAD postgres;
    properties:
      host: localhost
      port: 5432
      database: mydb
      username: ${DB_USER}
      password: ${DB_PASSWORD}

-- In your SQL template
-- The first argument is a libpq connection string, not a bare database name
SELECT * FROM postgres_scan('dbname=mydb', 'public', 'users')
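
As an alternative to postgres_scan, DuckDB's postgres extension can attach a whole database under an alias, which keeps templates shorter when an endpoint touches several tables. The following is a minimal sketch, not flAPI's documented method: it assumes the ATTACH statement runs once (for example from the connection's init block, after LOAD postgres) and that credentials reach libpq through its standard PGUSER and PGPASSWORD environment variables. The alias pg is a hypothetical name.

```sql
-- Assumption: executed once after LOAD postgres, e.g. in the connection's init block.
-- 'pg' is a hypothetical alias; host, port, and database mirror the example above.
ATTACH 'dbname=mydb host=localhost port=5432' AS pg (TYPE postgres);

-- In your SQL template: reference tables as alias.schema.table
SELECT * FROM pg.public.users;
```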

Environment Variables

Use environment variables for sensitive data:

connections:
  secure-db:
    properties:
      host: ${DB_HOST}
      username: ${DB_USER}
      password: ${DB_PASSWORD}

Environment whitelist (in main config):

template:
  environment-whitelist:
    - '^DB_.*'
    - '^GOOGLE_.*'
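
To see which variable names the two whitelist patterns would admit, you can test them in any DuckDB session. This sketch assumes flAPI treats each whitelist entry as a regular expression searched for in the variable name; the ^ anchors suggest that, but the exact matching rules are an assumption here. The variable names are illustrative and are not read from a real environment.

```sql
-- Check a few example variable names against both whitelist patterns.
-- regexp_matches returns true when the pattern is found in the string.
SELECT
    vars.name,
    bool_or(regexp_matches(vars.name, whitelist.pattern)) AS allowed
FROM (VALUES
        ('DB_HOST'),
        ('DB_PASSWORD'),
        ('GOOGLE_APPLICATION_CREDENTIALS'),
        ('AWS_SECRET_ACCESS_KEY')
     ) AS vars(name)
CROSS JOIN (VALUES ('^DB_.*'), ('^GOOGLE_.*')) AS whitelist(pattern)
GROUP BY vars.name
ORDER BY vars.name;
```

Under that assumption, the DB_ and GOOGLE_ names come back as allowed and AWS_SECRET_ACCESS_KEY does not, so referencing ${AWS_SECRET_ACCESS_KEY} would need its own whitelist entry.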

Multiple Connections

Connect to multiple sources in one config:

connections:
  # Production warehouse
  bigquery-prod:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'prod-project'

  # Reference data
  customers-parquet:
    properties:
      path: './data/customers.parquet'

  # Operational database
  postgres-ops:
    init: |
      INSTALL postgres;
      LOAD postgres;
    properties:
      host: ops.example.com
      database: operations

Using Connections in Endpoints

Specify which connection to use in your endpoint YAML:

# Single connection
url-path: /customers/
template-source: customers.sql
connection:
  - customers-parquet

# Multiple connections (join across sources!)
url-path: /enriched-orders/
template-source: enriched_orders.sql
connection:
  - bigquery-warehouse
  - customers-parquet
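
A template for the multi-connection endpoint might look like the sketch below. The dataset, table, and column names (sales.orders, order_id, customer_id, and so on) are hypothetical, and the Parquet path is written literally because how conn properties resolve when several connections are listed is not covered here.

```sql
-- enriched_orders.sql: hypothetical template joining BigQuery with a local Parquet file.
-- Dataset, table, and column names are illustrative assumptions.
SELECT
    o.order_id,
    o.order_total,
    c.customer_name,
    c.segment
FROM bigquery_scan('my-project-id.sales.orders') AS o
JOIN read_parquet('./data/customers.parquet') AS c
    ON o.customer_id = c.customer_id
```

Because both sources are scanned by the same DuckDB instance, the join is ordinary SQL and needs no separate integration layer.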

Unconventional Data Sources: Why They Matter

flAPI's strength is making any data source accessible via REST APIs. Here's why unconventional sources are powerful:

🎯 Google Sheets as Database

Perfect for non-technical teams, prototyping, or collaborative data management:

  • Marketing teams manage content without touching code
  • Forms & surveys become instant APIs
  • No database setup required
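
As a rough illustration, a sheet can be read with DuckDB's community gsheets extension. Whether flAPI uses this particular extension is an assumption, and <SPREADSHEET_ID> is a placeholder for your own sheet.

```sql
-- Assumption: the community 'gsheets' extension; flAPI may rely on a different one.
INSTALL gsheets FROM community;
LOAD gsheets;

-- Register Google credentials for the session (see the extension's docs for auth options).
CREATE SECRET (TYPE gsheet);

-- <SPREADSHEET_ID> is a placeholder, not a real sheet.
SELECT *
FROM read_gsheet('https://docs.google.com/spreadsheets/d/<SPREADSHEET_ID>/edit');
```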

🤖 Vector Search for AI

Build RAG (Retrieval Augmented Generation) applications:

  • Semantic search over documentation
  • Similar product recommendations
  • AI agents with memory
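
The semantic-search case can be sketched with DuckDB's vss extension, which adds an HNSW index over fixed-size FLOAT arrays. The table, titles, and three-dimensional vectors below are toy data made up for illustration; real embeddings would come from an embedding model and have far more dimensions.

```sql
-- Toy example: nearest-neighbour search with the vss extension.
INSTALL vss;
LOAD vss;

-- Hypothetical table of documents with 3-dimensional embeddings.
CREATE TABLE docs (id INTEGER, title VARCHAR, embedding FLOAT[3]);
INSERT INTO docs VALUES
    (1, 'Connecting data sources', [0.9, 0.1, 0.0]::FLOAT[3]),
    (2, 'Caching strategies',      [0.1, 0.8, 0.1]::FLOAT[3]),
    (3, 'Authentication',          [0.0, 0.2, 0.9]::FLOAT[3]);

-- HNSW index; the default metric is Euclidean (L2) distance.
CREATE INDEX docs_hnsw ON docs USING HNSW (embedding);

-- Return the two documents closest to a query vector.
SELECT id, title
FROM docs
ORDER BY array_distance(embedding, [0.8, 0.2, 0.0]::FLOAT[3])
LIMIT 2;
```

In an endpoint template, the literal query vector would be replaced by an embedding supplied with the request.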

📊 Power BI Integration

Reuse existing BI models without rebuilding:

  • Expose dashboard data as APIs
  • Mobile apps access BI logic
  • Automated reporting pipelines

🏢 ODBC for Legacy Systems

Connect to enterprise databases without native drivers:

  • Oracle, Teradata, DB2, Informix
  • Proprietary databases
  • Mainframe data sources

Arrow Flight for ML

Real-time ML model serving:

  • Feature store integration
  • Model prediction APIs
  • High-performance data exchange

Next Steps

Enterprise:

  • Power BI: Query BI models programmatically
  • ODBC: Universal database connector
