Connecting Data Sources

flAPI uses DuckDB's extension ecosystem to connect to more than 20 data sources. You can combine several sources in a single flAPI instance and join across them in one endpoint.

Basic Configuration

Connections are defined in your flapi.yaml configuration file:

connections:
  connection-name:
    init: |
      -- SQL commands to initialize (load extensions)
      INSTALL extension_name;
      LOAD extension_name;
    properties:
      # Connection-specific properties
      property1: value1
      property2: value2
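
When a config grows to many connections, it can help to audit which DuckDB extensions the init blocks install. The following is a minimal sketch, not part of flAPI: the helper name installed_extensions is hypothetical, and it assumes each INSTALL statement sits on its own line of the init text, as in the template above.

```python
import re

# Matches lines such as `INSTALL postgres;` or `INSTALL 'bigquery';`.
# Assumption: one INSTALL statement per line, as in the template above.
INSTALL_RE = re.compile(r"^\s*INSTALL\s+'?(\w+)'?", re.IGNORECASE | re.MULTILINE)


def installed_extensions(init_sql):
    """Return the extension names named by INSTALL statements in an init block."""
    return INSTALL_RE.findall(init_sql)


# The init text from the template, as a plain string.
example_init = """\
-- SQL commands to initialize (load extensions)
INSTALL extension_name;
LOAD extension_name;
"""

print(installed_extensions(example_init))  # ['extension_name']
```

The same helper can be run over every connection's init text to produce a list of extensions that the deployment environment must be able to download or bundle.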

Supported Data Sources

Cloud Data Warehouses

BigQuery - Google's cloud data warehouse, with cached responses served in milliseconds
Snowflake - Cloud data platform with cost-optimized access
Databricks - Unified analytics platform

Databases

PostgreSQL - Popular open-source database
MySQL - Widely-used relational database
SQLite - Embedded database
ODBC - Connect to any database that has an ODBC driver (Oracle, Teradata, DB2, and more)

File Formats

Parquet - Columnar storage format
CSV - Comma-separated values
JSON - JavaScript Object Notation
Apache Iceberg - Table format for huge datasets

Enterprise & BI Systems

SAP ERP - Enterprise resource planning (via ERPL extension)
SAP BW - Business warehouse
Power BI - Query Power BI data models without opening Power BI
MSOLAP - Microsoft SQL Server Analysis Services (SSAS/OLAP cubes)

No-Code & Collaborative

Google Sheets - Turn spreadsheets into APIs (no database needed!)
Airtable - Collaborative database platform (via API)

AI/ML & Advanced

Vector Search - Semantic search with Faiss/VSS for RAG applications
Arrow Flight - Connect to ML feature stores and model servers
Redis - In-memory data structures and pub/sub

Real-Time & Streaming

WebSocket Streams - Query evented data sources
Kafka - Distributed event streaming
Redis Queues - Real-time message queues

Quick Start Examples

Local Parquet File

connections:
  my-data:
    properties:
      path: './data/customers.parquet'

-- In your SQL template
SELECT * FROM '{{{conn.path}}}'

BigQuery

connections:
  bigquery-warehouse:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'my-project-id'

-- In your SQL template
SELECT * FROM bigquery_scan('project.dataset.table')

PostgreSQL

connections:
  postgres-db:
    init: |
      INSTALL postgres;
      LOAD postgres;
    properties:
      host: localhost
      port: 5432
      database: mydb
      username: ${DB_USER}
      password: ${DB_PASSWORD}

-- In your SQL template
SELECT * FROM postgres_scan('mydb', 'public', 'users')

Environment Variables

Keep credentials and other sensitive values out of the config file by referencing environment variables:

connections:
  secure-db:
    properties:
      host: ${DB_HOST}
      username: ${DB_USER}
      password: ${DB_PASSWORD}

Variables must also match the environment whitelist, which is defined as a list of regular expressions in the main config:

template:
  environment-whitelist:
    - '^DB_.*'
    - '^GOOGLE_.*'
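
To make the interaction between placeholders and the whitelist concrete, here is a minimal Python sketch of the idea. It is an illustration under stated assumptions, not flAPI's implementation: the functions is_whitelisted and expand are hypothetical, the patterns are assumed to be regular expressions matched against variable names, and leaving a non-whitelisted placeholder untouched is a guess (flAPI may raise an error or substitute something else).

```python
import re

# Patterns copied from the whitelist example above.
WHITELIST = [r"^DB_.*", r"^GOOGLE_.*"]

# Matches ${NAME} placeholders like those used in connection properties.
PLACEHOLDER = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def is_whitelisted(name, patterns=WHITELIST):
    """Return True if the variable name matches any whitelist regex."""
    return any(re.search(p, name) for p in patterns)


def expand(value, env, patterns=WHITELIST):
    """Replace ${NAME} with env[NAME] for whitelisted, defined names only.

    Other placeholders are left as-is (an assumption made for this sketch).
    """
    def repl(match):
        name = match.group(1)
        if is_whitelisted(name, patterns) and name in env:
            return env[name]
        return match.group(0)

    return PLACEHOLDER.sub(repl, value)


# A stand-in for os.environ so the example is reproducible.
env = {"DB_HOST": "db.internal", "DB_USER": "api", "HOME": "/root"}

print(expand("${DB_HOST}", env))  # db.internal
print(expand("${HOME}", env))     # ${HOME}
```

Because the patterns are anchored with ^, a name such as MY_DB_HOST would not match '^DB_.*'; anchoring keeps the whitelist from exposing more variables than intended.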

Multiple Connections

Connect to multiple sources in one config:

connections:
  # Production warehouse
  bigquery-prod:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'prod-project'
  
  # Reference data
  customers-parquet:
    properties:
      path: './data/customers.parquet'
  
  # Operational database
  postgres-ops:
    init: |
      INSTALL postgres;
      LOAD postgres;
    properties:
      host: ops.example.com
      database: operations

Using Connections in Endpoints

Specify which connection or connections an endpoint uses in its endpoint YAML file. The two examples below are separate endpoint files:

# Single connection
url-path: /customers/
template-source: customers.sql
connection:
  - customers-parquet

# Multiple connections (join across sources!)
url-path: /enriched-orders/
template-source: enriched_orders.sql
connection:
  - bigquery-warehouse
  - customers-parquet
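
Every name listed under connection has to match a key under connections in flapi.yaml. A small pre-deployment check along the following lines can catch typos early. This is a sketch only: the function missing_connections is hypothetical, plain Python structures stand in for the parsed YAML files, and the /orders/ endpoint is invented for illustration.

```python
# Connection names as they might appear after loading flapi.yaml.
# The set stands in for the keys of the parsed `connections` mapping.
defined = {"bigquery-warehouse", "customers-parquet", "postgres-db"}

# Each endpoint file reduced to the two fields that matter here.
endpoints = [
    {"url-path": "/customers/", "connection": ["customers-parquet"]},
    {"url-path": "/enriched-orders/",
     "connection": ["bigquery-warehouse", "customers-parquet"]},
    # Hypothetical endpoint that references a connection not defined above.
    {"url-path": "/orders/", "connection": ["bigquery-prod"]},
]


def missing_connections(endpoints, defined):
    """Map each url-path to the connection names it uses that are not defined."""
    problems = {}
    for ep in endpoints:
        missing = [name for name in ep.get("connection", []) if name not in defined]
        if missing:
            problems[ep["url-path"]] = missing
    return problems


print(missing_connections(endpoints, defined))
# {'/orders/': ['bigquery-prod']}
```

An empty result means every endpoint refers only to connections that exist in the config.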

Unconventional Data Sources: Why They Matter

flAPI's strength is making any data source accessible via REST APIs. Here's why unconventional sources are powerful:

🎯 Google Sheets as Database

Perfect for non-technical teams, prototyping, or collaborative data management:

Marketing teams manage content without touching code
Forms & surveys become instant APIs
No database setup required

🤖 Vector Search for AI

Build retrieval-augmented generation (RAG) applications:

Semantic search over documentation
Similar product recommendations
AI agents with memory

📊 Power BI Integration

Reuse existing BI models without rebuilding:

Expose dashboard data as APIs
Mobile apps access BI logic
Automated reporting pipelines

🏢 ODBC for Legacy Systems

Connect to enterprise databases through their ODBC drivers when no dedicated DuckDB extension exists:

Oracle, Teradata, DB2, Informix
Proprietary databases
Mainframe data sources

⚡ Arrow Flight for ML

Real-time ML model serving:

Feature store integration
Model prediction APIs
High-performance data exchange

Next Steps

Popular Sources:

Google Sheets: Turn spreadsheets into APIs
BigQuery: Connect to Google BigQuery with caching
PostgreSQL: Connect to PostgreSQL
Parquet Files: Work with local/cloud files

AI/ML:

Vector Search: Build RAG applications
Snowflake: Cloud data warehouse integration

Enterprise:

Power BI: Query BI models programmatically
ODBC: Universal database connector

Examples:

Google Sheets API: Collaborative data API
Parquet API: Local file APIs
BigQuery Caching: Cloud warehouse optimization
