Architecture Deep Dive

This guide explains how flAPI works under the hood, why it's designed the way it is, and how it achieves millisecond API responses while dramatically reducing costs.

Key Takeaways
  • flAPI acts as a high-performance middleware, decoupling apps from slow backends.
  • The core is a DuckDB-powered caching layer that serves API requests in milliseconds.
  • This results in 99%+ cost savings and 10,000x faster responses compared to direct warehouse queries.
  • Caching is essential for enterprise systems like SAP and BigQuery.

The Decoupling Problem​

Modern applications and AI agents need fast, frequent access to data. But enterprise data systems weren't built for this:

Why Data Warehouses Make Poor API Servers​

Snowflake, BigQuery, and SAP are optimized for different workloads:

                Analytical Workload        API Workload
Query Pattern   Few large queries          Many small queries
Latency         Seconds acceptable         Milliseconds required
Concurrency     Low (10-100)               High (1000s)
Cost Model      Pay per data scanned       Pay per query
Frequency       Periodic (hourly/daily)    Continuous

The Cost Problem:

Direct BigQuery Queries:
- Single query: $0.05 (scans 10GB)
- 1,000 queries/day: $50/day = $1,500/month
- 10,000 queries/day: $500/day = $15,000/month

With flAPI Cache:
- Initial query: $0.05
- Cache serves 1,000s of queries: $0
- Cost: ~$0.05/day = $1.50/month (one refresh per day)

The Performance Problem:

  • Warehouse query: 2-10 seconds
  • flAPI cache hit: 0.5-2 milliseconds
  • 1000x to 20,000x faster

flAPI's Architecture​

flAPI solves this by acting as a high-performance serving layer between your backends and consumers.

Three-Layer Architecture​
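flAPI sits between data consumers and data sources. Summarizing the components described below, the three layers look like this:

Consumers (apps, dashboards, AI agents)
        β”‚  HTTP/REST requests
        β–Ό
flAPI serving layer (API router, templates, DuckDB cache, security)
        β”‚  Scheduled refresh queries
        β–Ό
Data sources (BigQuery, Snowflake, SAP, Postgres, ...)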

Key Components​

1. API Router​

  • Handles incoming HTTP requests
  • Routes to appropriate endpoints
  • Validates parameters
  • Formats responses
  • Serves OpenAPI documentation

2. Template Engine​

  • Processes Mustache templates
  • Injects parameters safely
  • Supports conditional logic
  • Prevents SQL injection

3. DuckDB Cache Layer​

  • In-process analytical database
  • Lightning-fast query execution
  • Scheduled refresh from sources
  • Supports incremental updates

4. Security Layer​

  • JWT authentication
  • Row-level security in SQL
  • Rate limiting per user/endpoint
  • CORS and HTTPS enforcement

5. Source Connectors​

  • Connects to multiple data sources
  • Uses DuckDB's extension ecosystem
  • Pools connections efficiently
  • Handles authentication

How Caching Works​

The cache is the heart of flAPI's performance and cost optimization.

Cache Lifecycle​

1. Initial Load
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Source DB   β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚ Expensive query (once)
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DuckDB      β”‚
β”‚ Cache       β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
Ready to serve APIs

2. Serving Phase
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ API      β”‚
β”‚ Requests β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
     β”‚ (1000s of requests)
     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DuckDB      β”‚ ← Sub-millisecond responses
β”‚ Cache       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3. Refresh
Every N minutes/hours:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Source DB   β”‚ ← Refresh query
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DuckDB      β”‚ ← Updated data
β”‚ Cache       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Cache Configuration​

# Example: Marketing campaign data
cache:
  enabled: true
  table: campaign_stats
  schema: analytics
  schedule: 60m                         # Refresh every hour
  template_file: campaigns_cache.sql

  # Optional: Incremental updates
  strategy: merge
  merge_key: campaign_id
  updated_at_column: last_modified

Cache Strategies​

Full Refresh

-- Rebuild entire cache
CREATE OR REPLACE TABLE analytics.campaign_stats AS
SELECT * FROM bigquery_scan('project.dataset.campaigns')
WHERE active = true;

Incremental Append

-- Add only new rows
INSERT INTO analytics.campaign_stats
SELECT * FROM bigquery_scan('project.dataset.campaigns')
WHERE created_at > (SELECT MAX(created_at) FROM analytics.campaign_stats);

Merge/Upsert

-- Update existing, insert new
MERGE INTO analytics.campaign_stats AS target
USING bigquery_scan('project.dataset.campaigns') AS source
ON target.campaign_id = source.campaign_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

Query Flow​

Without flAPI (Direct Warehouse Access)​

User Request β†’ Application β†’ BigQuery ($0.05) β†’ Wait 3s β†’ Response

Every request hits the warehouse. Slow and expensive.

With flAPI (Cached Serving)​

User Request β†’ flAPI β†’ DuckDB Cache (free) β†’ Response (2ms)

Background:
Every hour β†’ flAPI β†’ BigQuery ($0.05) β†’ Refresh Cache

Thousands of requests served from one warehouse query.
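For example, with an hourly refresh and 10,000 requests/day, each warehouse query serves roughly 10,000 / 24 β‰ˆ 417 API requests.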

Performance Characteristics​

Latency Comparison​

Data Source           Latency        Use Case
BigQuery (direct)     2-10s          Analytical queries
Snowflake (direct)    1-5s           Data warehouse queries
Postgres (direct)     50-500ms       Transactional queries
flAPI (cached)        1-50ms         API serving
Improvement           1,000-20,000x

Cost Comparison​

Scenario: Analytics API with 10,000 queries/day

Approach                  Daily Cost   Monthly Cost   Annual Cost
Direct BigQuery           $500         $15,000        $180,000
flAPI (hourly refresh)    $1.20        $36            $432
Savings                   99.76%       99.76%         99.76%
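The arithmetic behind the table, using the $0.05-per-scan figure from above:

Direct BigQuery:  10,000 queries/day Γ— $0.05 = $500/day  β†’ $15,000/month β†’ $180,000/year
flAPI (hourly):   24 refreshes/day   Γ— $0.05 = $1.20/day β†’ $36/month     β†’ $432/year
Savings:          1 βˆ’ (432 / 180,000) β‰ˆ 99.76%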

Scaling Patterns​

Vertical Scaling​

flAPI is single-threaded but incredibly efficient. For most use cases, a single instance handles:

  • 10,000+ requests/second (cached queries)
  • Millions of rows in cache
  • Dozens of endpoints

Increase instance size for:

  • Larger cache datasets
  • More complex SQL transforms
  • Higher concurrent connections

Horizontal Scaling​

For massive scale, run multiple flAPI instances:

Load Balancer
β”œβ”€β”€ flAPI Instance 1 (with cache)
β”œβ”€β”€ flAPI Instance 2 (with cache)
└── flAPI Instance 3 (with cache)

Each instance has its own cache (eventually consistent). Cache refresh happens independently.

Serverless Deployment​

flAPI's millisecond startup makes it perfect for serverless:

# AWS Lambda Example
- Memory: 1GB
- Startup: < 100ms cold start
- Execution: 1-5ms per request
- Cost: Pay only for requests

Cache can be stored in:

  • EFS (shared across invocations)
  • S3 + local temp (reload on cold start)
  • External database (DuckDB over HTTP)

Security Architecture​

Multi-Layer Security​

1. Network Layer
β”œβ”€β”€ HTTPS enforcement
β”œβ”€β”€ CORS policies
└── Rate limiting

2. Authentication
β”œβ”€β”€ JWT tokens
β”œβ”€β”€ API keys
└── Custom auth hooks

3. Authorization
β”œβ”€β”€ Endpoint-level access
β”œβ”€β”€ Row-level security in SQL
└── Column-level filtering

4. Data Protection
β”œβ”€β”€ SQL injection prevention
β”œβ”€β”€ Input validation
└── Output sanitization

Row-Level Security Example​

-- Endpoint SQL template
SELECT *
FROM sensitive_data
WHERE 1=1
{{#context.user.roles.admin}}
-- Admins see everything
{{/context.user.roles.admin}}
{{^context.user.roles.admin}}
-- Users see only their region
AND region = '{{{context.user.region}}}'
AND department = '{{{context.user.department}}}'
{{/context.user.roles.admin}}

The cache contains all data, but the serving layer filters based on the authenticated user's context.
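For example, for a hypothetical non-admin user whose token carries region 'EMEA' and department 'Sales', the template above renders to:

-- Rendered SQL for a non-admin user (region/department values are hypothetical)
SELECT *
FROM sensitive_data
WHERE 1=1
-- Users see only their region
AND region = 'EMEA'
AND department = 'Sales'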

When to Use flAPI​

Ideal Use Cases βœ…β€‹

  • High-frequency access to stable data: Dashboard APIs, analytics endpoints
  • AI agent tools: Structured data access for LLMs
  • Customer-facing APIs: Fast responses required
  • Cost-sensitive workloads: Reduce warehouse bills
  • Multi-consumer scenarios: Multiple apps accessing same data

Not Ideal For βŒβ€‹

  • Real-time transactional data: Use a transactional database instead
  • Data that changes every second: Cache refresh overhead too high
  • Single-query workloads: No benefit from caching
  • Extremely large datasets: Cache may not fit in memory

Hybrid Approach​

Use flAPI for frequently-accessed data and direct queries for one-off analytics:

    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚   Application   β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
      β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
      β”‚             β”‚
      β–Ό             β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ flAPI  β”‚   β”‚ Snowflakeβ”‚
  β”‚ (fast) β”‚   β”‚ (direct) β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚             β”‚
      β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
             β”‚
       β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”
       β”‚ Warehouse β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Next Steps​

Need help architecting your solution? Check out our professional services.

πŸͺ Cookie Settings