Architecture Deep Dive
This guide explains how flAPI works under the hood, why it's designed the way it is, and how it achieves millisecond API responses while dramatically reducing costs.
- flAPI acts as a high-performance middleware, decoupling apps from slow backends.
- The core is a DuckDB-powered caching layer that serves API requests in milliseconds.
- This results in 99%+ cost savings and 10,000x faster responses compared to direct warehouse queries.
- Caching is essential for enterprise systems like SAP and BigQuery.
The Decoupling Problem
Modern applications and AI agents need fast, frequent access to data. But enterprise data systems weren't built for this:
Why Data Warehouses Make Poor API Servers
Snowflake, BigQuery, and SAP are optimized for different workloads:
| Analytical Workload | API Workload | |
|---|---|---|
| Query Pattern | Few large queries | Many small queries |
| Latency | Seconds acceptable | Milliseconds required |
| Concurrency | Low (10-100) | High (1000s) |
| Cost Model | Pay per data scanned | Pay per query |
| Frequency | Periodic (hourly/daily) | Continuous |
The Cost Problem:
Direct BigQuery Queries:
- Single query: $0.05 (scans 10GB)
- 1,000 queries/day: $50/day = $1,500/month
- 10,000 queries/day: $500/day = $15,000/month
With flAPI Cache:
- Initial query: $0.05
- Cache serves 1,000s of queries: $0
- Cost: ~$0.05/day = $1.50/month
The Performance Problem:
- Warehouse query: 2-10 seconds
- flAPI cache hit: 0.5-2 milliseconds
- 1000x to 20,000x faster
flAPI's Architecture
flAPI solves this by acting as a high-performance serving layer between your backends and consumers.
Three-Layer Architecture
Key Components
1. API Router
- Handles incoming HTTP requests
- Routes to appropriate endpoints
- Validates parameters
- Formats responses
- Serves OpenAPI documentation
2. Template Engine
- Processes Mustache templates
- Injects parameters safely
- Supports conditional logic
- Prevents SQL injection
3. DuckDB Cache Layer
- In-process analytical database
- Lightning-fast query execution
- Scheduled refresh from sources
- Supports incremental updates
4. Security Layer
- JWT authentication
- Row-level security in SQL
- Rate limiting per user/endpoint
- CORS and HTTPS enforcement
5. Source Connectors
- Connects to multiple data sources
- Uses DuckDB's extension ecosystem
- Pools connections efficiently
- Handles authentication
How Caching Works
The cache is the heart of flAPI's performance and cost optimization.
Cache Lifecycle
1. Initial Load
┌─────────────┐
│ Source DB │
└──────┬──────┘
│ Expensive query (once)
▼
┌─────────────┐
│ DuckDB │
│ Cache │
└──────┬──────┘
│
▼
Ready to serve APIs
2. Serving Phase
┌──────────┐
│ API │
│ Requests │
└────┬─────┘
│ (1000s of requests)
▼
┌─────────────┐
│ DuckDB │ ← Sub-millisecond responses
│ Cache │
└─────────────┘
3. Refresh
Every N minutes/hours:
┌─────────────┐
│ Source DB │ ← Refresh query
└──────┬──────┘
▼
┌─────────────┐
│ DuckDB │ ← Updated data
│ Cache │
└─────────────┘
Cache Configuration
# Example: Marketing campaign data, cached via DuckLake
cache:
enabled: true
table: campaign_stats
schema: analytics
schedule: 60m # Refresh every hour
template-file: campaigns_cache.sql
# Pick the refresh mode by what you provide:
# - omit both -> full refresh
# - cursor only -> incremental append
# - cursor + primary-key -> incremental merge / upsert
primary-key: [campaign_id]
cursor:
column: last_modified
type: timestamp
retention:
keep-last-snapshots: 5
max-snapshot-age: 14d
Cache Refresh Modes
flAPI's cache template is a SELECT query — flAPI writes the result set to DuckLake using one of three modes, chosen automatically from the keys above:
Full refresh (no cursor, no primary-key) — replaces all rows in the snapshot.
-- campaigns_cache.sql
SELECT *
FROM bigquery_scan('project.dataset.campaigns')
WHERE active = true
Incremental append (only cursor) — adds rows newer than the previous snapshot.
SELECT *
FROM bigquery_scan('project.dataset.campaigns')
WHERE last_modified > '{{cache.previousSnapshotTimestamp}}'
Incremental merge / upsert (cursor + primary-key) — updates changed rows and inserts new ones, keyed on primary-key.
SELECT *
FROM bigquery_scan('project.dataset.campaigns')
WHERE last_modified > '{{cache.previousSnapshotTimestamp}}'
Query Flow
Without flAPI (Direct Warehouse Access)
User Request → Application → BigQuery → Wait 3s → Response
($0.05)
Every request hits the warehouse. Slow and expensive.
With flAPI (Cached Serving)
User Request → flAPI → DuckDB Cache → Response (2ms)
(free)
Background:
Every hour → flAPI → BigQuery → Refresh Cache
($0.05)
Thousands of requests served from one warehouse query.
Performance Characteristics
Latency Comparison
| Data Source | Latency | Use Case |
|---|---|---|
| BigQuery (direct) | 2-10s | Analytical queries |
| Snowflake (direct) | 1-5s | Data warehouse queries |
| Postgres (direct) | 50-500ms | Transactional queries |
| flAPI (cached) | 1-50ms | API serving |
| Improvement | 1000-20,000x |
Cost Comparison
Scenario: Analytics API with 10,000 queries/day
| Approach | Daily Cost | Monthly Cost | Annual Cost |
|---|---|---|---|
| Direct BigQuery | $500 | $15,000 | $180,000 |
| flAPI (hourly refresh) | $1.20 | $36 | $432 |
| Savings | 99.76% | 99.76% | 99.76% |
Scaling Patterns
Vertical Scaling
flAPI is single-threaded but incredibly efficient. For most use cases, a single instance handles:
- 10,000+ requests/second (cached queries)
- Millions of rows in cache
- Dozens of endpoints
Increase instance size for:
- Larger cache datasets
- More complex SQL transforms
- Higher concurrent connections
Horizontal Scaling
For massive scale, run multiple flAPI instances:
Load Balancer
├── flAPI Instance 1 (with cache)
├── flAPI Instance 2 (with cache)
└── flAPI Instance 3 (with cache)
Each instance has its own cache (eventually consistent). Cache refresh happens independently.
Serverless Deployment
flAPI's millisecond startup makes it perfect for serverless:
# AWS Lambda Example
- Memory: 1GB
- Startup: < 100ms cold start
- Execution: 1-5ms per request
- Cost: Pay only for requests
Cache can be stored in:
- EFS (shared across invocations)
- S3 + local temp (reload on cold start)
- External database (DuckDB over HTTP)
Security Architecture
Multi-Layer Security
1. Network Layer
├── HTTPS enforcement
├── CORS policies
└── Rate limiting
2. Authentication
├── JWT tokens
├── API keys
└── Custom auth hooks
3. Authorization
├── Endpoint-level access
├── Row-level security in SQL
└── Column-level filtering
4. Data Protection
├── SQL injection prevention
├── Input validation
└── Output sanitization
Row-Level Security Example
-- Endpoint SQL template
-- auth.roles is a comma-joined string (e.g. "read,admin"), so we match with LIKE.
SELECT *
FROM sensitive_data
WHERE 1=1
AND (
-- Admins see everything
'{{auth.roles}}' LIKE '%admin%'
OR (
-- Other users see only their own region / department
region = '{{{params.region}}}'
AND created_by = '{{{auth.username}}}'
)
)
The cache contains all data, but the serving layer filters based on the authenticated user's auth.* context (auth.username, auth.roles, auth.email, auth.type, auth.authenticated).
When to Use flAPI
Ideal Use Cases ✅
- High-frequency access to stable data: Dashboard APIs, analytics endpoints
- AI agent tools: Structured data access for LLMs
- Customer-facing APIs: Fast responses required
- Cost-sensitive workloads: Reduce warehouse bills
- Multi-consumer scenarios: Multiple apps accessing same data
Not Ideal For ❌
- Real-time transactional data: Use a transactional database instead
- Data that changes every second: Cache refresh overhead too high
- Single-query workloads: No benefit from caching
- Extremely large datasets: Cache may not fit in memory
Hybrid Approach
Use flAPI for frequently-accessed data and direct queries for one-off analytics:
┌─────────────────┐
│ Application │
└────────┬────────┘
│
┌────┴────┐
│ │
▼ ▼
┌────────┐ ┌──────────┐
│ flAPI │ │ Snowflake│
│ (fast) │ │ (direct) │
└────────┘ └──────────┘
│ │
└─────┬───────┘
│
┌─────▼──────┐
│ Warehouse │
└────────────┘
Next Steps
- How It Works: Simple 3-step process overview
- Caching Strategy: Understand cost optimization through caching
- SQL Templating: Master dynamic query patterns
- Caching Setup: Configure caching for your endpoints
- Endpoints Overview: Learn about endpoint configuration
- Authentication: Secure your APIs
- Deployment Guide: Deploy to production
- Examples: See architecture in action
Need help architecting your solution? Check out our professional services.