BigQuery + Caching Example
This example shows how to expose BigQuery data as a fast, cost-effective API using flAPI's caching layer.
Scenario
Problem:
- Marketing team needs a dashboard API
- Data in BigQuery (100GB table)
- 10,000 API calls per day
- Direct queries cost $500/day = $15,000/month
Solution:
- Cache aggregated data in DuckDB
- Refresh hourly
- Cost: $1.20/day = $36/month
- Savings: 99.76%
Architecture Flow
Complete Setup
1. Configuration
flapi.yaml:
project-name: marketing-api
project-description: Cached BigQuery marketing data
template:
path: './sqls'
connections:
bigquery-marketing:
init: |
INSTALL 'bigquery' FROM community;
LOAD 'bigquery';
properties:
project_id: 'my-project-id'
duckdb:
access_mode: READ_WRITE
# DuckLake-backed cache for incremental refresh and snapshots
ducklake:
enabled: true
alias: cache
metadata-path: ./data/cache.ducklake
data-path: ./data/cache
2. Endpoint Configuration
sqls/campaigns.yaml:
url-path: /campaigns/
# Enable caching
cache:
enabled: true
table: campaigns_cache
schema: analytics
schedule: 60m # Refresh every hour
template-file: campaigns_cache.sql
# API parameters
request:
- field-name: country
field-in: query
description: Filter by country code (e.g., US, DE)
required: false
validators:
- type: string
regex: '^[A-Z]{2}$'
- field-name: campaign_type
field-in: query
description: Campaign type (email, social, search)
required: false
validators:
- type: enum
allowedValues: [email, social, search, display]
template-source: campaigns.sql
connection:
- bigquery-marketing
3. Cache Template
sqls/campaigns_cache.sql:
-- This runs every hour to refresh the cache.
-- flAPI executes the SELECT and writes the result into the configured
-- cache table ({{cache.schema}}.{{cache.table}}) inside DuckLake.
SELECT
campaign_id,
campaign_type,
country,
campaign_name,
SUM(clicks) as total_clicks,
SUM(conversions) as total_conversions,
SUM(revenue) as total_revenue,
ROUND(SUM(revenue) / NULLIF(SUM(clicks), 0), 2) as revenue_per_click,
MAX(last_updated) as last_updated
FROM bigquery_scan('{{conn.project_id}}.marketing.campaigns')
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
AND active = true
GROUP BY 1, 2, 3, 4
4. API Template
sqls/campaigns.sql:
-- This serves API requests from the cache (fast!)
-- No BigQuery cost per request. The cache table is fully qualified through
-- the DuckLake catalog/schema/table that flAPI injects into the template.
SELECT
campaign_id,
campaign_type,
country,
campaign_name,
total_clicks,
total_conversions,
total_revenue,
revenue_per_click,
last_updated
FROM {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
WHERE 1=1
{{#params.country}}
AND country = '{{{params.country}}}'
{{/params.country}}
{{#params.campaign_type}}
AND campaign_type = '{{{params.campaign_type}}}'
{{/params.campaign_type}}
ORDER BY total_revenue DESC
LIMIT 100
Running the Example
Start flAPI
# Set credentials
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Start server
$ ./flapi -c flapi.yaml
✓ Loaded 1 endpoints
✓ Cache scheduled: campaigns (60m)
✓ Server listening on :8080
Test the API
# All campaigns
$ curl http://localhost:8080/campaigns/
# US campaigns only
$ curl "http://localhost:8080/campaigns/?country=US"
# Social campaigns in Germany
$ curl "http://localhost:8080/campaigns/?country=DE&campaign_type=social"
Response:
{
"data": [
{
"campaign_id": "c_12345",
"campaign_type": "social",
"country": "US",
"campaign_name": "Summer Sale 2024",
"total_clicks": 150000,
"total_conversions": 1200,
"total_revenue": 45000.00,
"revenue_per_click": 0.30,
"last_updated": "2024-01-15 10:30:00"
}
]
}
Cost Analysis
Without flAPI
10,000 API calls/day × $0.05 = $500/day
Monthly cost: $15,000
Response time: 2-5 seconds
With flAPI
Cache refresh: 24 times/day × $0.05 = $1.20/day
API serving: FREE (from cache)
Monthly cost: $36
Response time: 1-50ms
Savings: $14,964/month (99.76%)
Speed: 1000-10,000x faster
Monitoring Cache
Check Cache Status
$ flapii cache status
campaigns_cache:
Rows: 12,450
Size: 2.3 MB
Last refresh: 5 minutes ago
API calls since refresh: 1,234
Manual Refresh
$ flapii cache refresh campaigns_cache
Refreshing cache: campaigns_cache
Source: BigQuery
✓ Query executed (2.3s)
✓ 12,450 rows loaded
✓ Cache updated
Three cache refresh modes
The example above uses a full refresh of an aggregated dataset every
hour. flAPI actually supports three refresh patterns, and the right choice
depends on table size, how the data mutates, and how fresh the API needs to
feel. All three reuse the same bigquery_scan() source.
Mode 1: Full refresh
The simplest mode: every scheduled tick rebuilds the cache table from scratch. Use for small tables, heavy aggregations, or data without a reliable updated-at column.
url-path: /campaigns/full/
template-source: campaigns.sql
connection: [bigquery-marketing]
cache:
enabled: true
table: campaigns_full_cache
schema: analytics
schedule: 60m
template-file: campaigns_full_cache.sql
-- No cursor, no primary key — flAPI rewrites the table on every run.
SELECT
campaign_id,
campaign_type,
country,
campaign_name,
clicks,
conversions,
revenue,
last_updated
FROM bigquery_scan('{{conn.project_id}}.marketing.campaigns')
WHERE active = true
Mode 2: Incremental append
Add a cursor: block (but no primary-key) and flAPI exposes
{{cache.previousSnapshotTimestamp}} to the template. Only rows newer than
the last snapshot are scanned, and they're appended to the existing table.
Perfect for immutable event streams.
url-path: /campaigns/events/
template-source: campaigns.sql
connection: [bigquery-marketing]
cache:
enabled: true
table: campaigns_events_cache
schema: analytics
schedule: 5m
template-file: campaigns_events_cache.sql
cursor:
column: last_updated
type: timestamp
-- Only fetch rows newer than the previous snapshot.
SELECT
campaign_id,
campaign_type,
country,
campaign_name,
clicks,
conversions,
revenue,
last_updated
FROM bigquery_scan('{{conn.project_id}}.marketing.campaigns')
WHERE last_updated > COALESCE(
TRY_CAST('{{{cache.previousSnapshotTimestamp}}}' AS TIMESTAMP),
TIMESTAMP '1970-01-01 00:00:00'
)
Mode 3: Incremental merge
Add both cursor: and primary-key: and flAPI upserts changed rows
into the cache. This is what you want for mutable fact tables — campaigns
whose budgets get adjusted, products whose stock changes, customers whose
emails update. The example below also opts into the full retention
controls: keep the 10 most recent snapshots for up to 30 days and allow
rollbacks within a 2-day window.
url-path: /campaigns/mutable/
template-source: campaigns.sql
connection: [bigquery-marketing]
cache:
enabled: true
table: campaigns_merge_cache
schema: analytics
schedule: 5m
template-file: campaigns_merge_cache.sql
primary-key: [campaign_id]
cursor:
column: last_updated
type: timestamp
retention:
keep-last-snapshots: 10
max-snapshot-age: 30d
rollback-window: 2d
-- Pull updated rows; flAPI merges them onto campaign_id.
SELECT
campaign_id,
campaign_type,
country,
campaign_name,
clicks,
conversions,
revenue,
last_updated
FROM bigquery_scan('{{conn.project_id}}.marketing.campaigns')
WHERE last_updated > COALESCE(
TRY_CAST('{{{cache.previousSnapshotTimestamp}}}' AS TIMESTAMP),
TIMESTAMP '1970-01-01 00:00:00'
)
Picking a mode
| Mode | Set | Rebuilds | Good for | Avoid when |
|---|---|---|---|---|
| Full refresh | nothing | entire table each tick | aggregations, small tables, no updated-at column | tables larger than your refresh budget |
| Incremental append | cursor | only new rows | event streams, immutable log tables | rows can change after insert |
| Incremental merge | cursor + primary-key | changed rows, upserted | mutable fact tables, slowly-changing dimensions | no reliable row identity |
Next Steps
- Caching Strategy: Understand cost optimization
- Architecture: Learn how flAPI achieves performance
- BigQuery Connection Guide: Detailed BigQuery setup
- Caching Setup Guide: Configure caching strategies
- SQL Templating: Master template syntax
- Parquet Example: Start with local files
- SAP ERP Example: Enterprise caching patterns
- Deployment: Deploy to production