# Caching Setup
Enable caching to dramatically reduce warehouse costs and improve API response times from seconds to milliseconds.
## Quick Benefits

| Without Cache | With Cache |
|---|---|
| 2-10s response | 1-50ms response |
| $500/day (10k requests) | $1.20/day (24 refreshes) |
| Direct warehouse load | Scheduled refresh only |
## Basic Configuration

### 1. Enable Cache

```yaml
# sqls/customers.yaml
url-path: /customers/

cache:
  enabled: true
  table: customers_cache
  schedule: 60m                  # Refresh every hour
  template_file: customers_cache.sql

request:
  - field-name: segment
    field-in: query

template-source: customers.sql

connection:
  - bigquery-warehouse
```
### 2. Cache Template

Create the SQL that populates the cache:

```sql
-- sqls/customers_cache.sql
SELECT
    c_custkey as id,
    c_name as name,
    c_mktsegment as segment,
    c_acctbal as balance,
    CURRENT_TIMESTAMP as cached_at
FROM bigquery_scan('project.dataset.customers')
WHERE c_acctbal > 0
```
### 3. API Template

Query the cache:

```sql
-- sqls/customers.sql
SELECT * FROM customers_cache
WHERE 1=1
{{#params.segment}}
  AND segment = '{{{params.segment}}}'
{{/params.segment}}
ORDER BY balance DESC
LIMIT 100
```
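With these three files in place, requests to the endpoint are served from the cache rather than the warehouse. A quick way to check, assuming a local flAPI instance (host, port, and the `AUTOMOBILE` segment value are placeholders; adjust to your deployment and data):

```bash
# Hypothetical local call to the cached endpoint defined above
curl "http://localhost:8080/customers/?segment=AUTOMOBILE"
```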
## Cache Strategies

### Full Refresh (Default)

Rebuild the entire cache on schedule:

```yaml
cache:
  enabled: true
  schedule: 60m
  strategy: full   # Default
```

Use for: most use cases, small-to-medium datasets (< 10M rows)
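Unlike the incremental and merge strategies below, a full-refresh cache template needs no watermark filter: it is simply the complete query, re-run on every refresh. A minimal sketch using the same illustrative `source_table` as the later examples:

```sql
-- Cache template: rebuilt in full on every scheduled refresh
SELECT *
FROM source_table
-- Optionally filter here so the cache only holds rows the API actually serves
```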
### Incremental Append

Add only new data:

```yaml
cache:
  enabled: true
  schedule: 15m
  strategy: incremental
  updated_at_column: created_at
```

```sql
-- Cache template: append only rows newer than the current cache
SELECT * FROM source_table
WHERE created_at > (
    SELECT MAX(created_at) FROM {{cache.table}}
)
```

Use for: append-only data (logs, events, orders)
### Merge/Upsert

Update existing rows and insert new ones:

```yaml
cache:
  enabled: true
  schedule: 30m
  strategy: merge
  merge_key: order_id
  updated_at_column: last_modified
```

```sql
-- Cache template: merge changed records
SELECT * FROM source_table
WHERE last_modified > (
    SELECT MAX(last_modified) FROM {{cache.table}}
)
```

Use for: data with updates (customer records, inventory)
## Schedule Options

| Schedule | Use Case |
|---|---|
| 5m | Near real-time data |
| 15m | Frequently changing data |
| 1h | Standard refresh |
| 6h | Slow-changing data |
| 24h | Daily reports |
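Schedule values use the same duration shorthand as the examples on this page (`15m`, `60m`, `12h`): a number followed by a minute or hour unit. For example:

```yaml
cache:
  enabled: true
  schedule: 6h   # Slow-changing data: refresh four times a day
```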
## Configuration Reference

```yaml
cache:
  # ─────────────────────────────────────────────────────────────
  # Required
  # ─────────────────────────────────────────────────────────────
  enabled: true
  table: cache_table_name
  schedule: 60m
  template_file: cache.sql

  # ─────────────────────────────────────────────────────────────
  # Optional
  # ─────────────────────────────────────────────────────────────
  schema: analytics              # Database schema
  strategy: full                 # full | incremental | merge
  merge_key: id                  # Primary key (for merge)
  updated_at_column: updated_at  # Timestamp column (for incremental/merge)
  on_startup: true               # Load cache on flAPI start
  persist: true                  # Save cache to disk
```
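As an illustration, the optional keys combine with the required ones like this for a merge-strategy cache kept in a dedicated schema (the table, key, and file names are made up for the example):

```yaml
cache:
  enabled: true
  table: orders_cache              # illustrative name
  schedule: 30m
  template_file: orders_cache.sql  # illustrative name
  schema: analytics
  strategy: merge
  merge_key: order_id
  updated_at_column: last_modified
  on_startup: true
  persist: true
```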
## Cache Management

### Manual Refresh

```bash
# Trigger immediate refresh
flapii cache refresh /customers/

# Clear cache
flapii cache clear /customers/

# View cache status
flapii cache status
```
### Monitor Cache

```bash
# Cache hit rate
flapii cache stats /customers/

# Output:
# Cache: customers_cache
# Size: 12,450 rows
# Last refresh: 2024-01-15 10:00:00
# Next refresh: 2024-01-15 11:00:00
# Hit rate: 99.8%
```
## Best Practices

### ✅ DO

- Start with 1-6 hour refresh intervals
- Use incremental strategies for large datasets
- Pre-aggregate data in the cache template
- Filter out unnecessary data in the cache template
- Monitor cache hit rates

### ❌ DON'T

- Refresh more often than needed
- Cache entire large tables without filtering
- Use caching for real-time requirements (< 5 min)
- Forget to add indexes on filter columns (see the sketch below)
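The cache is held in DuckDB, so the usual `CREATE INDEX` syntax applies. Whether you add the statement to the cache template or run it against the persisted cache database depends on your flAPI setup, so treat the placement as an assumption to verify. A sketch for the `/customers/` endpoint above:

```sql
-- Hypothetical: index the column the /customers/ API template filters on
CREATE INDEX IF NOT EXISTS idx_customers_cache_segment
    ON customers_cache (segment);
```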
## Real-World Example

Marketing dashboard API with a BigQuery backend:

```yaml
# sqls/campaign-metrics.yaml
url-path: /marketing/campaigns/

cache:
  enabled: true
  table: campaign_metrics_cache
  schedule: 12h                  # Twice daily
  template_file: campaigns_cache.sql

request:
  - field-name: campaign_id
    field-in: query

template-source: campaigns.sql

connection:
  - bigquery-warehouse
```
```sql
-- sqls/campaigns_cache.sql
-- Pre-aggregate expensive metrics
SELECT
    campaign_id,
    campaign_name,
    DATE(timestamp) as date,
    SUM(impressions) as total_impressions,
    SUM(clicks) as total_clicks,
    SUM(conversions) as total_conversions,
    SUM(spend) as total_spend,
    ROUND(SUM(clicks) * 100.0 / NULLIF(SUM(impressions), 0), 2) as ctr,
    ROUND(SUM(spend) / NULLIF(SUM(conversions), 0), 2) as cost_per_conversion
FROM bigquery_scan('project.analytics.campaign_events')
WHERE DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY)
GROUP BY campaign_id, campaign_name, DATE(timestamp)
```
```sql
-- sqls/campaigns.sql
-- Fast API queries
SELECT * FROM campaign_metrics_cache
WHERE 1=1
{{#params.campaign_id}}
  AND campaign_id = '{{{params.campaign_id}}}'
{{/params.campaign_id}}
ORDER BY date DESC, total_spend DESC
```

Results:

- Direct BigQuery: ~$0.05/query × 10,000 requests = $500/day
- With cache: $0.05 × 2 refreshes = $0.10/day
- Savings: 99.98% ($499.90/day)
## Troubleshooting

### Cache Not Refreshing

```bash
# Check schedule
flapii cache status /endpoint/

# Manual refresh
flapii cache refresh /endpoint/

# Check logs
tail -f /var/log/flapi/cache.log
```
### High Memory Usage

Limit the cache size in the cache template:

```sql
SELECT * FROM large_table
WHERE date >= CURRENT_DATE - INTERVAL 30 DAY  -- Last 30 days only
LIMIT 1000000                                 -- Cap at 1M rows
```
### Slow Cache Refresh

- Add indexes on source tables
- Reduce data volume (filter in the cache template)
- Use an incremental strategy
- Increase DuckDB memory: `duckdb.max_memory: 16GB`
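Where the `duckdb.max_memory` setting lives depends on how your flAPI project is configured; the sketch below assumes a `duckdb:` settings section in the project configuration file, which you should verify against your flAPI version's configuration reference:

```yaml
# flapi.yaml — placement is an assumption; check your flAPI configuration reference
duckdb:
  max_memory: 16GB
```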
## Next Steps
- Caching Strategy: Understand cost optimization
- BigQuery Example: Complete implementation
- SAP ERP Example: Enterprise caching