Architecture Deep Dive

This guide explains how flAPI works under the hood, why it's designed the way it is, and how it achieves millisecond API responses while dramatically reducing costs.

Key Takeaways
  • flAPI acts as a high-performance middleware, decoupling apps from slow backends.
  • The core is a DuckDB-powered caching layer that serves API requests in milliseconds.
  • This results in 99%+ cost savings and 10,000x faster responses compared to direct warehouse queries.
  • Caching is essential for enterprise systems like SAP and BigQuery.

The Decoupling Problem​

Modern applications and AI agents need fast, frequent access to data. But enterprise data systems weren't built for this:

Why Data Warehouses Make Poor API Servers​

Snowflake, BigQuery, and SAP are optimized for different workloads:

                Analytical Workload        API Workload
Query Pattern   Few large queries          Many small queries
Latency         Seconds acceptable         Milliseconds required
Concurrency     Low (10-100)               High (1000s)
Cost Model      Pay per data scanned       Pay per query
Frequency       Periodic (hourly/daily)    Continuous

The Cost Problem:

Direct BigQuery Queries:
- Single query: $0.05 (scans 10GB)
- 1,000 queries/day: $50/day = $1,500/month
- 10,000 queries/day: $500/day = $15,000/month

With flAPI Cache:
- Initial query: $0.05
- Cache serves 1,000s of queries: $0
- Cost: ~$0.05/day = $1.50/month (one refresh per day)

The Performance Problem:

  • Warehouse query: 2-10 seconds
  • flAPI cache hit: 0.5-2 milliseconds
  • 1000x to 20,000x faster

flAPI's Architecture​

flAPI solves this by acting as a high-performance serving layer between your backends and consumers.

Three-Layer Architecture​
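flAPI sits between data consumers and data sources. Summarizing the components described below, the three layers look like this:

Consumers (apps, dashboards, AI agents)
        β”‚  HTTP/REST requests
        β–Ό
flAPI serving layer (API router, templates, DuckDB cache, security)
        β”‚  Scheduled refresh queries
        β–Ό
Data sources (BigQuery, Snowflake, SAP, Postgres, ...)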

Key Components​

1. API Router​

  • Handles incoming HTTP requests
  • Routes to appropriate endpoints
  • Validates parameters
  • Formats responses
  • Serves OpenAPI documentation

2. Template Engine​

  • Processes Mustache templates
  • Injects parameters safely
  • Supports conditional logic
  • Prevents SQL injection

3. DuckDB Cache Layer​

  • In-process analytical database
  • Lightning-fast query execution
  • Scheduled refresh from sources
  • Supports incremental updates

4. Security Layer​

  • JWT authentication
  • Row-level security in SQL
  • Rate limiting per user/endpoint
  • CORS and HTTPS enforcement

5. Source Connectors​

  • Connects to multiple data sources
  • Uses DuckDB's extension ecosystem
  • Pools connections efficiently
  • Handles authentication

How Caching Works​

The cache is the heart of flAPI's performance and cost optimization.

Cache Lifecycle​

1. Initial Load
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Source DB   β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚ Expensive query (once)
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DuckDB      β”‚
β”‚ Cache       β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
Ready to serve APIs

2. Serving Phase
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ API      β”‚
β”‚ Requests β”‚
β””β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”˜
     β”‚ (1000s of requests)
     β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DuckDB      β”‚ ← Sub-millisecond responses
β”‚ Cache       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

3. Refresh
Every N minutes/hours:
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Source DB   β”‚ ← Refresh query
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       β”‚
       β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ DuckDB      β”‚ ← Updated data
β”‚ Cache       β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Cache Configuration​

# Example: Marketing campaign data
cache:
  enabled: true
  table: campaign_stats
  schema: analytics
  schedule: 60m                         # Refresh every hour
  template_file: campaigns_cache.sql

  # Optional: Incremental updates
  strategy: merge
  merge_key: campaign_id
  updated_at_column: last_modified

Cache Strategies​

Full Refresh

-- Rebuild entire cache
CREATE OR REPLACE TABLE analytics.campaign_stats AS
SELECT * FROM bigquery_scan('project.dataset.campaigns')
WHERE active = true;

Incremental Append

-- Add only new rows
INSERT INTO analytics.campaign_stats
SELECT * FROM bigquery_scan('project.dataset.campaigns')
WHERE created_at > (SELECT MAX(created_at) FROM analytics.campaign_stats);

Merge/Upsert

-- Update existing, insert new
MERGE INTO analytics.campaign_stats AS target
USING bigquery_scan('project.dataset.campaigns') AS source
ON target.campaign_id = source.campaign_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

Query Flow​

Without flAPI (Direct Warehouse Access)​

User Request β†’ Application β†’ BigQuery ($0.05) β†’ Wait 3s β†’ Response

Every request hits the warehouse. Slow and expensive.

With flAPI (Cached Serving)​

User Request β†’ flAPI β†’ DuckDB Cache (free) β†’ Response (2ms)

Background:
Every hour β†’ flAPI β†’ BigQuery ($0.05) β†’ Refresh Cache

Thousands of requests served from one warehouse query.
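For example, with an hourly refresh and 10,000 requests/day, each warehouse query serves roughly 10,000 / 24 β‰ˆ 417 API requests.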

Performance Characteristics​

Latency Comparison​

Data Source           Latency        Use Case
BigQuery (direct)     2-10s          Analytical queries
Snowflake (direct)    1-5s           Data warehouse queries
Postgres (direct)     50-500ms       Transactional queries
flAPI (cached)        1-50ms         API serving
Improvement           1,000-20,000x

Cost Comparison​

Scenario: Analytics API with 10,000 queries/day

Approach                  Daily Cost   Monthly Cost   Annual Cost
Direct BigQuery           $500         $15,000        $180,000
flAPI (hourly refresh)    $1.20        $36            $432
Savings                   99.76%       99.76%         99.76%
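The arithmetic behind the table, using the $0.05-per-scan figure from above:

Direct BigQuery:  10,000 queries/day Γ— $0.05 = $500/day  β†’ $15,000/month β†’ $180,000/year
flAPI (hourly):   24 refreshes/day   Γ— $0.05 = $1.20/day β†’ $36/month     β†’ $432/year
Savings:          1 βˆ’ (432 / 180,000) β‰ˆ 99.76%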

Scaling Patterns​

Vertical Scaling​

flAPI is single-threaded but incredibly efficient. For most use cases, a single instance handles:

  • 10,000+ requests/second (cached queries)
  • Millions of rows in cache
  • Dozens of endpoints

Increase instance size for:

  • Larger cache datasets
  • More complex SQL transforms
  • Higher concurrent connections

Horizontal Scaling​

For massive scale, run multiple flAPI instances:

Load Balancer
β”œβ”€β”€ flAPI Instance 1 (with cache)
β”œβ”€β”€ flAPI Instance 2 (with cache)
└── flAPI Instance 3 (with cache)

Each instance has its own cache (eventually consistent). Cache refresh happens independently.

Serverless Deployment​

flAPI's millisecond startup makes it perfect for serverless:

# AWS Lambda Example
- Memory: 1GB
- Startup: < 100ms cold start
- Execution: 1-5ms per request
- Cost: Pay only for requests

Cache can be stored in:

  • EFS (shared across invocations)
  • S3 + local temp (reload on cold start)
  • External database (DuckDB over HTTP)

Security Architecture​

Multi-Layer Security​

1. Network Layer
β”œβ”€β”€ HTTPS enforcement
β”œβ”€β”€ CORS policies
└── Rate limiting

2. Authentication
β”œβ”€β”€ JWT tokens
β”œβ”€β”€ API keys
└── Custom auth hooks

3. Authorization
β”œβ”€β”€ Endpoint-level access
β”œβ”€β”€ Row-level security in SQL
└── Column-level filtering

4. Data Protection
β”œβ”€β”€ SQL injection prevention
β”œβ”€β”€ Input validation
└── Output sanitization

Row-Level Security Example​

-- Endpoint SQL template
SELECT *
FROM sensitive_data
WHERE 1=1
{{#context.user.roles.admin}}
-- Admins see everything
{{/context.user.roles.admin}}
{{^context.user.roles.admin}}
-- Users see only their region
AND region = '{{{context.user.region}}}'
AND department = '{{{context.user.department}}}'
{{/context.user.roles.admin}}

The cache contains all data, but the serving layer filters based on the authenticated user's context.
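For example, for a hypothetical non-admin user whose token carries region 'EMEA' and department 'Sales', the template above renders to:

-- Rendered SQL for a non-admin user (region/department values are hypothetical)
SELECT *
FROM sensitive_data
WHERE 1=1
-- Users see only their region
AND region = 'EMEA'
AND department = 'Sales'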

When to Use flAPI​

Ideal Use Cases βœ…β€‹

  • High-frequency access to stable data: Dashboard APIs, analytics endpoints
  • AI agent tools: Structured data access for LLMs
  • Customer-facing APIs: Fast responses required
  • Cost-sensitive workloads: Reduce warehouse bills
  • Multi-consumer scenarios: Multiple apps accessing same data

Not Ideal For βŒβ€‹

  • Real-time transactional data: Use a transactional database instead
  • Data that changes every second: Cache refresh overhead too high
  • Single-query workloads: No benefit from caching
  • Extremely large datasets: Cache may not fit in memory

Hybrid Approach​

Use flAPI for frequently-accessed data and direct queries for one-off analytics:

    β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
    β”‚   Application   β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
      β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
      β”‚             β”‚
      β–Ό             β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ flAPI  β”‚   β”‚ Snowflakeβ”‚
  β”‚ (fast) β”‚   β”‚ (direct) β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
      β”‚             β”‚
      β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
             β”‚
       β”Œβ”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”
       β”‚ Warehouse β”‚
       β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Next Steps​

Need help architecting your solution? Check out our professional services.

πŸͺ Cookie Settings