Skip to main content

BigQuery + Caching Example

This example shows how to expose BigQuery data as a fast, cost-effective API using flAPI's caching layer.

Scenario

Problem:

  • Marketing team needs a dashboard API
  • Data in BigQuery (100GB table)
  • 10,000 API calls per day
  • Direct queries cost $500/day = $15,000/month

Solution:

  • Cache aggregated data in DuckDB
  • Refresh hourly
  • Cost: $1.20/day = $36/month
  • Savings: 99.76%

Architecture Flow

Complete Setup

1. Configuration

flapi.yaml:

project-name: marketing-api
project-description: Cached BigQuery marketing data

template:
path: './sqls'

connections:
bigquery-marketing:
init: |
INSTALL 'bigquery' FROM community;
LOAD 'bigquery';
properties:
project_id: 'my-project-id'

duckdb:
access_mode: READ_WRITE

# DuckLake-backed cache for incremental refresh and snapshots
ducklake:
enabled: true
alias: cache
metadata-path: ./data/cache.ducklake
data-path: ./data/cache

2. Endpoint Configuration

sqls/campaigns.yaml:

url-path: /campaigns/

# Enable caching
cache:
enabled: true
table: campaigns_cache
schema: analytics
schedule: 60m # Refresh every hour
template-file: campaigns_cache.sql

# API parameters
request:
- field-name: country
field-in: query
description: Filter by country code (e.g., US, DE)
required: false
validators:
- type: string
regex: '^[A-Z]{2}$'

- field-name: campaign_type
field-in: query
description: Campaign type (email, social, search)
required: false
validators:
- type: enum
allowedValues: [email, social, search, display]

template-source: campaigns.sql
connection:
- bigquery-marketing

3. Cache Template

sqls/campaigns_cache.sql:

-- This runs every hour to refresh the cache.
-- flAPI executes the SELECT and writes the result into the configured
-- cache table ({{cache.schema}}.{{cache.table}}) inside DuckLake.

SELECT
campaign_id,
campaign_type,
country,
campaign_name,
SUM(clicks) as total_clicks,
SUM(conversions) as total_conversions,
SUM(revenue) as total_revenue,
ROUND(SUM(revenue) / NULLIF(SUM(clicks), 0), 2) as revenue_per_click,
MAX(last_updated) as last_updated
FROM bigquery_scan('{{conn.project_id}}.marketing.campaigns')
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
AND active = true
GROUP BY 1, 2, 3, 4

4. API Template

sqls/campaigns.sql:

-- This serves API requests from the cache (fast!)
-- No BigQuery cost per request. The cache table is fully qualified through
-- the DuckLake catalog/schema/table that flAPI injects into the template.

SELECT
campaign_id,
campaign_type,
country,
campaign_name,
total_clicks,
total_conversions,
total_revenue,
revenue_per_click,
last_updated
FROM {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
WHERE 1=1
{{#params.country}}
AND country = '{{{params.country}}}'
{{/params.country}}
{{#params.campaign_type}}
AND campaign_type = '{{{params.campaign_type}}}'
{{/params.campaign_type}}
ORDER BY total_revenue DESC
LIMIT 100

Running the Example

Start flAPI

# Set credentials
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json

# Start server
$ ./flapi -c flapi.yaml

✓ Loaded 1 endpoints
✓ Cache scheduled: campaigns (60m)
✓ Server listening on :8080

Test the API

# All campaigns
$ curl http://localhost:8080/campaigns/

# US campaigns only
$ curl "http://localhost:8080/campaigns/?country=US"

# Social campaigns in Germany
$ curl "http://localhost:8080/campaigns/?country=DE&campaign_type=social"

Response:

{
"data": [
{
"campaign_id": "c_12345",
"campaign_type": "social",
"country": "US",
"campaign_name": "Summer Sale 2024",
"total_clicks": 150000,
"total_conversions": 1200,
"total_revenue": 45000.00,
"revenue_per_click": 0.30,
"last_updated": "2024-01-15 10:30:00"
}
]
}

Cost Analysis

Without flAPI

10,000 API calls/day × $0.05 = $500/day

Monthly cost: $15,000
Response time: 2-5 seconds

With flAPI

Cache refresh: 24 times/day × $0.05 = $1.20/day
API serving: FREE (from cache)

Monthly cost: $36
Response time: 1-50ms

Savings: $14,964/month (99.76%)
Speed: 1000-10,000x faster

Monitoring Cache

Check Cache Status

$ flapii cache status

campaigns_cache:
Rows: 12,450
Size: 2.3 MB
Last refresh: 5 minutes ago
API calls since refresh: 1,234

Manual Refresh

$ flapii cache refresh campaigns_cache

Refreshing cache: campaigns_cache
Source: BigQuery
✓ Query executed (2.3s)
✓ 12,450 rows loaded
✓ Cache updated

Three cache refresh modes

The example above uses a full refresh of an aggregated dataset every hour. flAPI actually supports three refresh patterns, and the right choice depends on table size, how the data mutates, and how fresh the API needs to feel. All three reuse the same bigquery_scan() source.

Mode 1: Full refresh

The simplest mode: every scheduled tick rebuilds the cache table from scratch. Use for small tables, heavy aggregations, or data without a reliable updated-at column.

sqls/campaigns-full.yaml
url-path: /campaigns/full/
template-source: campaigns.sql
connection: [bigquery-marketing]

cache:
enabled: true
table: campaigns_full_cache
schema: analytics
schedule: 60m
template-file: campaigns_full_cache.sql
sqls/campaigns_full_cache.sql
-- No cursor, no primary key — flAPI rewrites the table on every run.
SELECT
campaign_id,
campaign_type,
country,
campaign_name,
clicks,
conversions,
revenue,
last_updated
FROM bigquery_scan('{{conn.project_id}}.marketing.campaigns')
WHERE active = true

Mode 2: Incremental append

Add a cursor: block (but no primary-key) and flAPI exposes {{cache.previousSnapshotTimestamp}} to the template. Only rows newer than the last snapshot are scanned, and they're appended to the existing table. Perfect for immutable event streams.

sqls/campaigns-append.yaml
url-path: /campaigns/events/
template-source: campaigns.sql
connection: [bigquery-marketing]

cache:
enabled: true
table: campaigns_events_cache
schema: analytics
schedule: 5m
template-file: campaigns_events_cache.sql

cursor:
column: last_updated
type: timestamp
sqls/campaigns_events_cache.sql
-- Only fetch rows newer than the previous snapshot.
SELECT
campaign_id,
campaign_type,
country,
campaign_name,
clicks,
conversions,
revenue,
last_updated
FROM bigquery_scan('{{conn.project_id}}.marketing.campaigns')
WHERE last_updated > COALESCE(
TRY_CAST('{{{cache.previousSnapshotTimestamp}}}' AS TIMESTAMP),
TIMESTAMP '1970-01-01 00:00:00'
)

Mode 3: Incremental merge

Add both cursor: and primary-key: and flAPI upserts changed rows into the cache. This is what you want for mutable fact tables — campaigns whose budgets get adjusted, products whose stock changes, customers whose emails update. The example below also opts into the full retention controls: keep the 10 most recent snapshots for up to 30 days and allow rollbacks within a 2-day window.

sqls/campaigns-merge.yaml
url-path: /campaigns/mutable/
template-source: campaigns.sql
connection: [bigquery-marketing]

cache:
enabled: true
table: campaigns_merge_cache
schema: analytics
schedule: 5m
template-file: campaigns_merge_cache.sql

primary-key: [campaign_id]
cursor:
column: last_updated
type: timestamp

retention:
keep-last-snapshots: 10
max-snapshot-age: 30d
rollback-window: 2d
sqls/campaigns_merge_cache.sql
-- Pull updated rows; flAPI merges them onto campaign_id.
SELECT
campaign_id,
campaign_type,
country,
campaign_name,
clicks,
conversions,
revenue,
last_updated
FROM bigquery_scan('{{conn.project_id}}.marketing.campaigns')
WHERE last_updated > COALESCE(
TRY_CAST('{{{cache.previousSnapshotTimestamp}}}' AS TIMESTAMP),
TIMESTAMP '1970-01-01 00:00:00'
)

Picking a mode

ModeSetRebuildsGood forAvoid when
Full refreshnothingentire table each tickaggregations, small tables, no updated-at columntables larger than your refresh budget
Incremental appendcursoronly new rowsevent streams, immutable log tablesrows can change after insert
Incremental mergecursor + primary-keychanged rows, upsertedmutable fact tables, slowly-changing dimensionsno reliable row identity

Next Steps

🍪 Cookie Settings