# Architecture Deep Dive
This guide explains how flAPI works under the hood, why it's designed the way it is, and how it achieves millisecond API responses while dramatically reducing costs.
- flAPI acts as a high-performance middleware, decoupling apps from slow backends.
- The core is a DuckDB-powered caching layer that serves API requests in milliseconds.
- The result: 99%+ cost savings and responses that are thousands of times faster than direct warehouse queries.
- Caching is what makes slow, expensive enterprise systems like SAP and BigQuery viable as API backends.
## The Decoupling Problem
Modern applications and AI agents need fast, frequent access to data. But enterprise data systems weren't built for this:
### Why Data Warehouses Make Poor API Servers
Snowflake, BigQuery, and SAP are optimized for analytical workloads, which look very different from API serving:
| | Analytical Workload | API Workload |
|---|---|---|
| Query Pattern | Few large queries | Many small queries |
| Latency | Seconds acceptable | Milliseconds required |
| Concurrency | Low (10-100) | High (1000s) |
| Cost Model | Pay per data scanned | Pay per query |
| Frequency | Periodic (hourly/daily) | Continuous |
**The Cost Problem:**

Direct BigQuery queries:

- Single query: $0.05 (scans 10GB)
- 1,000 queries/day: $50/day = $1,500/month
- 10,000 queries/day: $500/day = $15,000/month

With the flAPI cache:

- Initial query: $0.05
- The cache then serves thousands of requests: $0
- Cost: ~$0.05/day = $1.50/month
**The Performance Problem:**
- Warehouse query: 2-10 seconds
- flAPI cache hit: 0.5-2 milliseconds
- 1000x to 20,000x faster
## flAPI's Architecture
flAPI solves this by acting as a high-performance serving layer between your backends and consumers.
### Three-Layer Architecture

Consumers (applications, dashboards, AI agents) call flAPI's serving layer, which answers from its local DuckDB cache and periodically refreshes that cache from the underlying data sources.
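At a high level, the three layers stack roughly like this (a simplified sketch based on the components described below):

```
        Applications · Dashboards · AI Agents
                           │
                           ▼
        ┌─────────────────────────────────────┐
        │                flAPI                │
        │   API Router   ·  Template Engine   │
        │   DuckDB Cache ·  Security Layer    │
        └──────────────────┬──────────────────┘
                           │  Scheduled cache refresh
                           ▼
          BigQuery · Snowflake · SAP · Postgres
```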
### Key Components

#### 1. API Router
- Handles incoming HTTP requests
- Routes to appropriate endpoints
- Validates parameters
- Formats responses
- Serves OpenAPI documentation
#### 2. Template Engine
- Processes Mustache templates
- Injects parameters safely (see the template sketch below)
- Supports conditional logic
- Prevents SQL injection
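As a concrete illustration, an endpoint template might look like the sketch below. The table name is reused from the caching examples in this guide, while the column names and the `params` namespace for request parameters are assumptions for illustration only; check your endpoint configuration for the exact variable names.

```sql
-- Illustrative endpoint template (Mustache + SQL)
SELECT campaign_id, campaign_name, spend
FROM analytics.campaign_stats
WHERE 1=1
{{#params.campaign_id}}
  -- Rendered only when the caller supplies a campaign_id parameter
  AND campaign_id = '{{{params.campaign_id}}}'
{{/params.campaign_id}}
ORDER BY spend DESC
```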
#### 3. DuckDB Cache Layer
- In-process analytical database
- Lightning-fast query execution
- Scheduled refresh from sources
- Supports incremental updates
#### 4. Security Layer
- JWT authentication
- Row-level security in SQL
- Rate limiting per user/endpoint
- CORS and HTTPS enforcement
#### 5. Source Connectors
- Connects to multiple data sources
- Uses DuckDB's extension ecosystem (see the connector sketch below)
- Pools connections efficiently
- Handles authentication
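For example, a cache-refresh query against a Postgres source could use DuckDB's postgres extension roughly like this (a minimal sketch; the connection string, schema, and table names are placeholders):

```sql
-- Pull a source table into the local cache via DuckDB's postgres extension.
INSTALL postgres;
LOAD postgres;

CREATE OR REPLACE TABLE analytics.orders AS
SELECT *
FROM postgres_scan('host=db.internal dbname=shop user=readonly', 'public', 'orders');
```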
## How Caching Works
The cache is the heart of flAPI's performance and cost optimization.
### Cache Lifecycle
**1. Initial Load**

```
┌─────────────┐
│  Source DB  │
└──────┬──────┘
       │  Expensive query (once)
       ▼
┌─────────────┐
│   DuckDB    │
│    Cache    │
└──────┬──────┘
       │
       ▼
Ready to serve APIs
```

**2. Serving Phase**

```
┌───────────┐
│    API    │
│  Requests │
└─────┬─────┘
      │  (1000s of requests)
      ▼
┌─────────────┐
│   DuckDB    │  ← Sub-millisecond responses
│    Cache    │
└─────────────┘
```

**3. Refresh**

Every N minutes/hours:

```
┌─────────────┐
│  Source DB  │  ← Refresh query
└──────┬──────┘
       ▼
┌─────────────┐
│   DuckDB    │  ← Updated data
│    Cache    │
└─────────────┘
```
### Cache Configuration
```yaml
# Example: Marketing campaign data
cache:
  enabled: true
  table: campaign_stats
  schema: analytics
  schedule: 60m                       # Refresh every hour
  template_file: campaigns_cache.sql

  # Optional: Incremental updates
  strategy: merge
  merge_key: campaign_id
  updated_at_column: last_modified
```

The `template_file` points at the SQL that rebuilds the cache; the strategies below show what that SQL typically looks like.
### Cache Strategies
**Full Refresh**

```sql
-- Rebuild entire cache
CREATE OR REPLACE TABLE analytics.campaign_stats AS
SELECT * FROM bigquery_scan('project.dataset.campaigns')
WHERE active = true;
```

**Incremental Append**

```sql
-- Add only new rows
INSERT INTO analytics.campaign_stats
SELECT * FROM bigquery_scan('project.dataset.campaigns')
WHERE created_at > (SELECT MAX(created_at) FROM analytics.campaign_stats);
```

**Merge/Upsert**

```sql
-- Update existing rows, insert new ones
MERGE INTO analytics.campaign_stats AS target
USING bigquery_scan('project.dataset.campaigns') AS source
ON target.campaign_id = source.campaign_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```
## Query Flow
### Without flAPI (Direct Warehouse Access)
```
User Request → Application → BigQuery → Wait 3s → Response
                              ($0.05)
```
Every request hits the warehouse. Slow and expensive.
### With flAPI (Cached Serving)
```
User Request → flAPI → DuckDB Cache → Response (2ms)
                          (free)

Background:
Every hour → flAPI → BigQuery → Refresh Cache
                                  ($0.05)
```
Thousands of requests served from one warehouse query.
## Performance Characteristics
### Latency Comparison
| Data Source | Latency | Use Case |
|---|---|---|
| BigQuery (direct) | 2-10s | Analytical queries |
| Snowflake (direct) | 1-5s | Data warehouse queries |
| Postgres (direct) | 50-500ms | Transactional queries |
| **flAPI (cached)** | **1-50ms** | API serving |
| **Improvement** | **1000-20,000x** | |
### Cost Comparison
**Scenario:** Analytics API with 10,000 queries/day

| Approach | Daily Cost | Monthly Cost | Annual Cost |
|---|---|---|---|
| Direct BigQuery | $500 | $15,000 | $180,000 |
| flAPI (hourly refresh) | $1.20 | $36 | $432 |
| **Savings** | **99.76%** | **99.76%** | **99.76%** |

(Hourly refresh means 24 warehouse queries per day at $0.05 each, or $1.20/day, no matter how many API requests the cache serves.)
## Scaling Patterns
### Vertical Scaling
flAPI is single-threaded but incredibly efficient. For most use cases, a single instance handles:
- 10,000+ requests/second (cached queries)
- Millions of rows in cache
- Dozens of endpoints
Increase instance size for:
- Larger cache datasets
- More complex SQL transforms
- Higher concurrent connections
### Horizontal Scaling
For massive scale, run multiple flAPI instances:
```
Load Balancer
 ├── flAPI Instance 1 (with cache)
 ├── flAPI Instance 2 (with cache)
 └── flAPI Instance 3 (with cache)
```
Each instance maintains its own cache and refreshes it independently, so instances are eventually consistent with one another.
### Serverless Deployment
flAPI's millisecond startup makes it perfect for serverless:
```
# AWS Lambda Example
- Memory: 1GB
- Startup: < 100ms cold start
- Execution: 1-5ms per request
- Cost: Pay only for requests
```
The cache can be stored in:
- EFS (shared across invocations)
- S3 + local temp (reload on cold start)
- An external database (DuckDB over HTTP; see the sketch below)
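For example, the external-database option can lean on DuckDB's ability to attach a database file over HTTP(S) in read-only mode. A minimal sketch, assuming the httpfs extension, a placeholder URL, and a cache built with the `analytics` schema from the configuration example above:

```sql
-- Attach a pre-built cache file served over HTTPS (read-only).
INSTALL httpfs;
LOAD httpfs;
ATTACH 'https://example.com/caches/campaign_stats.duckdb' AS cache (READ_ONLY);

-- Query the attached cache just like a local table.
SELECT * FROM cache.analytics.campaign_stats LIMIT 10;
```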
## Security Architecture
### Multi-Layer Security
```
1. Network Layer
   ├── HTTPS enforcement
   ├── CORS policies
   └── Rate limiting

2. Authentication
   ├── JWT tokens
   ├── API keys
   └── Custom auth hooks

3. Authorization
   ├── Endpoint-level access
   ├── Row-level security in SQL
   └── Column-level filtering

4. Data Protection
   ├── SQL injection prevention
   ├── Input validation
   └── Output sanitization
```
### Row-Level Security Example
```sql
-- Endpoint SQL template
SELECT *
FROM sensitive_data
WHERE 1=1
{{#context.user.roles.admin}}
  -- Admins see everything
{{/context.user.roles.admin}}
{{^context.user.roles.admin}}
  -- Users see only their region
  AND region = '{{{context.user.region}}}'
  AND department = '{{{context.user.department}}}'
{{/context.user.roles.admin}}
```
The cache contains all data, but the serving layer filters based on the authenticated user's context.
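For example, for a non-admin user whose token resolves to region `EMEA` and department `Sales` (illustrative values), the template above renders to:

```sql
SELECT *
FROM sensitive_data
WHERE 1=1
  -- Users see only their region
  AND region = 'EMEA'
  AND department = 'Sales'
```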
## When to Use flAPI
### Ideal Use Cases ✅
- High-frequency access to stable data: Dashboard APIs, analytics endpoints
- AI agent tools: Structured data access for LLMs
- Customer-facing APIs: Fast responses required
- Cost-sensitive workloads: Reduce warehouse bills
- Multi-consumer scenarios: Multiple apps accessing same data
### Not Ideal For ❌
- Real-time transactional data: Use a transactional database instead
- Data that changes every second: Cache refresh overhead too high
- Single-query workloads: No benefit from caching
- Extremely large datasets: Cache may not fit in memory
### Hybrid Approach
Use flAPI for frequently-accessed data and direct queries for one-off analytics:
```
        ┌─────────────────┐
        │   Application   │
        └────────┬────────┘
                 │
        ┌────────┴────────┐
        │                 │
        ▼                 ▼
   ┌──────────┐     ┌───────────┐
   │  flAPI   │     │ Snowflake │
   │  (fast)  │     │ (direct)  │
   └──────────┘     └───────────┘
        │                 │
        └────────┬────────┘
                 │
         ┌───────▼───────┐
         │   Warehouse   │
         └───────────────┘
```
## Next Steps
- How It Works: Simple 3-step process overview
- Caching Strategy: Understand cost optimization through caching
- SQL Templating: Master dynamic query patterns
- Caching Setup: Configure caching for your endpoints
- Endpoints Overview: Learn about endpoint configuration
- Authentication: Secure your APIs
- Deployment Guide: Deploy to production
- Examples: See architecture in action
Need help architecting your solution? Check out our professional services.