
Caching Setup

Enable caching to dramatically reduce warehouse costs and improve API response times from seconds to milliseconds.

Quick Benefits

Without Cache              | With Cache
2-10s response             | 1-50ms response
$500/day (10k requests)    | $1.20/day (24 refreshes)
Direct warehouse load      | Scheduled refresh only
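
These figures assume roughly $0.05 per warehouse query, as in the real-world example below: 10,000 direct requests × $0.05 ≈ $500/day, versus 24 hourly cache refreshes × $0.05 ≈ $1.20/day.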

Basic Configuration

1. Enable Cache

# sqls/customers.yaml
url-path: /customers/

cache:
  enabled: true
  table: customers_cache
  schedule: 60m                        # Refresh every hour
  template_file: customers_cache.sql

request:
  - field-name: segment
    field-in: query

template-source: customers.sql
connection:
  - bigquery-warehouse

2. Cache Template

Create SQL to populate cache:

-- sqls/customers_cache.sql
SELECT
  c_custkey as id,
  c_name as name,
  c_mktsegment as segment,
  c_acctbal as balance,
  CURRENT_TIMESTAMP as cached_at
FROM bigquery_scan('project.dataset.customers')
WHERE c_acctbal > 0

3. API Template

Query the cache:

-- sqls/customers.sql
SELECT * FROM customers_cache
WHERE 1=1
{{#params.segment}}
AND segment = '{{{params.segment}}}'
{{/params.segment}}
ORDER BY balance DESC
LIMIT 100
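
Once flAPI is running, a request like the one below is served entirely from customers_cache. The host and port are placeholders for your deployment, and AUTOMOBILE is just an example segment value:

# Example request (host/port depend on your deployment)
curl "http://localhost:8080/customers/?segment=AUTOMOBILE"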

Cache Strategies

Full Refresh (Default)

Rebuild entire cache on schedule:

cache:
  enabled: true
  schedule: 60m
  strategy: full   # Default

Use for: Most use cases, small-medium datasets (< 10M rows)

Incremental Append

Add only new data:

cache:
  enabled: true
  schedule: 15m
  strategy: incremental
  updated_at_column: created_at

-- Cache template: append rows added since the last refresh
SELECT * FROM source_table
WHERE created_at > (
  SELECT MAX(created_at) FROM {{cache.table}}
)

Use for: Append-only data (logs, events, orders)
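
One caveat: on the very first refresh the cache table may be empty, so MAX(created_at) returns NULL and the comparison matches nothing. Depending on how flAPI initializes the cache, a COALESCE guard in the template sidesteps this (the fallback timestamp below is arbitrary):

-- Sketch: fall back to a very old timestamp while the cache is still empty
SELECT * FROM source_table
WHERE created_at > COALESCE(
  (SELECT MAX(created_at) FROM {{cache.table}}),
  TIMESTAMP '1970-01-01 00:00:00'
)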

Merge/Upsert

Update existing rows and insert new ones:

cache:
  enabled: true
  schedule: 30m
  strategy: merge
  merge_key: order_id
  updated_at_column: last_modified

-- Cache template: select rows changed since the last refresh
SELECT * FROM source_table
WHERE last_modified > (
  SELECT MAX(last_modified) FROM {{cache.table}}
)

Use for: Data with updates (customer records, inventory)
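
Conceptually, the merge strategy applies the selected rows to the cache as an upsert keyed on merge_key. A rough DuckDB-style sketch of the idea (not flAPI's actual implementation; order_cache and changed_rows are hypothetical names, and INSERT OR REPLACE requires a primary or unique key on order_id):

-- Conceptual upsert: replace rows whose order_id already exists, insert the rest
-- (changed_rows stands for the result of the cache template above)
INSERT OR REPLACE INTO order_cache
SELECT * FROM changed_rows;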

Schedule Options

Schedule | Use Case
5m       | Near real-time data
15m      | Frequently changing data
1h       | Standard refresh
6h       | Slow-changing data
24h      | Daily reports

Configuration Reference

cache:
  # ─────────────────────────────────────────────────────────────
  # Required
  # ─────────────────────────────────────────────────────────────
  enabled: true
  table: cache_table_name
  schedule: 60m
  template_file: cache.sql

  # ─────────────────────────────────────────────────────────────
  # Optional
  # ─────────────────────────────────────────────────────────────
  schema: analytics                  # Database schema
  strategy: full                     # full | incremental | merge
  merge_key: id                      # Primary key (for merge)
  updated_at_column: updated_at      # Timestamp column (for incremental/merge)
  on_startup: true                   # Load cache on flAPI start
  persist: true                      # Save cache to disk
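
Putting the optional settings together, a merge-based cache in a dedicated schema that loads at startup and persists to disk could be declared like this (table and column names are illustrative):

cache:
  enabled: true
  table: orders_cache
  schema: analytics
  schedule: 30m
  template_file: orders_cache.sql
  strategy: merge
  merge_key: order_id
  updated_at_column: last_modified
  on_startup: true
  persist: true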

Cache Management

Manual Refresh

# Trigger immediate refresh
flapii cache refresh /customers/

# Clear cache
flapii cache clear /customers/

# View cache status
flapii cache status

Monitor Cache

# Cache hit rate
flapii cache stats /customers/

# Output:
# Cache: customers_cache
# Size: 12,450 rows
# Last refresh: 2024-01-15 10:00:00
# Next refresh: 2024-01-15 11:00:00
# Hit rate: 99.8%

Best Practices

✅ DO

  • Start with 1-6 hour refresh intervals
  • Use incremental strategies for large datasets
  • Pre-aggregate data in cache template
  • Filter unnecessary data in cache template
  • Monitor cache hit rates

❌ DON'T

  • Refresh more than needed
  • Cache entire large tables without filtering
  • Use caching for real-time requirements (< 5 min)
  • Forget to add indexes on filter columns (see the index sketch after this list)
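
On that last point, an index on a cached table's filter column looks like this in DuckDB SQL. Where and when you create it depends on how your cache is materialized, so treat this as a sketch; customers_cache and segment come from the earlier example:

-- Sketch: index the cache column the API filters on most often
CREATE INDEX idx_customers_cache_segment ON customers_cache (segment);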

Real-World Example

Marketing dashboard API with BigQuery backend:

# sqls/campaign-metrics.yaml
url-path: /marketing/campaigns/

cache:
  enabled: true
  table: campaign_metrics_cache
  schedule: 12h                         # Twice daily
  template_file: campaigns_cache.sql

request:
  - field-name: campaign_id
    field-in: query

template-source: campaigns.sql
connection:
  - bigquery-warehouse

-- sqls/campaigns_cache.sql
-- Pre-aggregate expensive metrics
SELECT
  campaign_id,
  campaign_name,
  DATE(timestamp) as date,
  SUM(impressions) as total_impressions,
  SUM(clicks) as total_clicks,
  SUM(conversions) as total_conversions,
  SUM(spend) as total_spend,
  ROUND(SUM(clicks) * 100.0 / NULLIF(SUM(impressions), 0), 2) as ctr,
  ROUND(SUM(spend) / NULLIF(SUM(conversions), 0), 2) as cost_per_conversion
FROM bigquery_scan('project.analytics.campaign_events')
WHERE DATE(timestamp) >= CURRENT_DATE - INTERVAL 90 DAY   -- Last 90 days only
GROUP BY campaign_id, campaign_name, DATE(timestamp)

-- sqls/campaigns.sql
-- Fast API queries
SELECT * FROM campaign_metrics_cache
WHERE 1=1
{{#params.campaign_id}}
AND campaign_id = '{{{params.campaign_id}}}'
{{/params.campaign_id}}
ORDER BY date DESC, total_spend DESC

Results:

  • Direct BigQuery: ~$0.05/query × 10,000 = $500/day
  • With cache: $0.05 × 2 refreshes = $0.10/day
  • Savings: 99.98% ($499.90/day)

Troubleshooting

Cache Not Refreshing

# Check schedule
flapii cache status /endpoint/

# Manual refresh
flapii cache refresh /endpoint/

# Check logs
tail -f /var/log/flapi/cache.log

High Memory Usage

-- Limit cache size in the template
SELECT * FROM large_table
WHERE date >= CURRENT_DATE - INTERVAL 30 DAY   -- Last 30 days only
LIMIT 1000000                                  -- Cap at 1M rows

Slow Cache Refresh

  • Add indexes on source tables
  • Reduce data volume (filter in cache template)
  • Use incremental strategy
  • Increase DuckDB memory: duckdb.max_memory: 16GB (see the sketch below)
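
The memory limit is a server-level setting rather than part of the endpoint YAML. Assuming your flapi.yaml exposes DuckDB settings under a duckdb key, raising it would look roughly like this; check the configuration reference for your flAPI version:

# flapi.yaml (sketch; exact key layout may differ by version)
duckdb:
  max_memory: 16GB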

