DuckLake Caching Setup
flAPI's caching layer is built on DuckLake, a snapshot-based table format over DuckDB. Each refresh produces a new snapshot, your endpoints serve from the latest snapshot in milliseconds, and a retention policy expires old snapshots automatically.
This guide walks through enabling DuckLake at the project level and configuring per-endpoint caches.
Step 1: Enable DuckLake Globally
Add a ducklake: block to flapi.yaml. Without it, endpoint caches do nothing.
# flapi.yaml
ducklake:
enabled: true
alias: cache # Catalog alias used in templates
metadata-path: ./data/cache.ducklake # DuckLake metadata directory
data-path: ./data/cache # DuckLake data files directory
retention:
keep-last-snapshots: 10
max-snapshot-age: 30d
compaction:
enabled: true
schedule: '@daily'
scheduler:
enabled: true
scan-interval: 5m # Background scheduler scan interval
The alias value (cache) becomes {{cache.catalog}} in every cache template.
Step 2: Enable Caching on an Endpoint
# sqls/customers/customers-rest.yaml
url-path: /customers/
cache:
enabled: true
table: customers_cache
schema: analytics # optional, default "main"
schedule: 5m # 30s | 5m | 1h | 2d
primary-key: [id] # required for merge mode
cursor:
column: registration_date # required for append/merge
type: date # int | date | timestamp
rollback-window: 2d
retention:
keep-last-snapshots: 5
max-snapshot-age: 14d
delete-handling: soft # soft | hard
template-file: customers_cache.sql # optional custom refresh SQL
request:
- field-name: segment
field-in: query
template-source: customers.sql
connection:
- bigquery-warehouse
All keys above are verified against cache_manager.cpp, CONFIG_REFERENCE.md §6 and the customer-common.yaml example.
Step 3: Refresh Modes
flAPI does not have a strategy: key. The refresh mode is derived from the combination of cursor and primary-key (CacheManager::determineCacheMode):
cursor | primary-key | Mode | Behaviour |
|---|---|---|---|
| absent | any | full | Rebuild the cache table on every refresh |
| present | absent | append | Append rows past the cursor watermark |
| present | present | merge | Reconcile inserts, updates and deletes |
Full Refresh
cache:
enabled: true
table: countries_cache
schedule: 1h
Incremental Append
cache:
enabled: true
table: events_cache
schedule: 15m
cursor:
column: created_at
type: timestamp
Incremental Merge
cache:
enabled: true
table: customers_cache
schedule: 1m
primary-key: [id]
cursor:
column: updated_at
type: timestamp
delete-handling: soft
Step 4: Write the Cache Template (optional)
If you provide template-file, flAPI runs that SQL on each refresh. Otherwise it derives a default. Cache templates have access to a cache.* context (verified against sql_template_processor.cpp):
| Variable | Description |
|---|---|
{{cache.catalog}} | DuckLake catalog alias |
{{cache.schema}} | Cache schema |
{{cache.table}} | Cache table name |
{{cache.schedule}} | Configured refresh schedule |
{{cache.mode}} | full, append, or merge |
{{cache.snapshotId}} | Current snapshot ID |
{{cache.snapshotTimestamp}} | Current snapshot timestamp |
{{cache.previousSnapshotId}} | Previous snapshot ID |
{{cache.previousSnapshotTimestamp}} | Previous snapshot timestamp |
{{cache.cursorColumn}} | cursor.column value |
{{cache.cursorType}} | cursor.type value |
{{cache.primaryKeys}} | Comma-separated primary key columns |
Real example from examples/sqls/customers/customers_cache.sql:
-- DuckLake cache template
CREATE OR REPLACE TABLE {{cache.catalog}}.{{cache.schema}}.{{cache.table}} AS
SELECT
id,
name,
email,
segment,
registration_date,
CURRENT_TIMESTAMP AS cache_updated_at,
'{{cache.snapshotId}}' AS cache_snapshot_id
FROM read_parquet('{{conn.path}}')
ORDER BY registration_date DESC;
Incremental templates can branch on the previous snapshot:
INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT * FROM source_table
{{#cache.previousSnapshotTimestamp}}
WHERE {{cache.cursorColumn}} > TIMESTAMP '{{cache.previousSnapshotTimestamp}}'
{{/cache.previousSnapshotTimestamp}}
Step 5: Write the API Template
The serving endpoint reads from the cache table just like any DuckDB table:
-- sqls/customers/customers.sql
SELECT *
FROM cache.analytics.customers_cache
WHERE 1=1
{{#params.segment}}
AND segment = '{{{params.segment}}}'
{{/params.segment}}
ORDER BY registration_date DESC
LIMIT 100;
Schedule Format
cache.schedule accepts a value-unit string parsed by TimeInterval::parseInterval:
| Schedule | Use case |
|---|---|
30s | Aggressive freshness |
5m | Near real-time dashboards |
1h | Standard BI refresh |
6h | Slow-changing data |
24h | Daily reports |
Retention and Snapshot Expiry
Two knobs control retention:
cache:
retention:
keep-last-snapshots: 10 # keep N newest snapshots
max-snapshot-age: 30d # also delete anything older
When either is set, flAPI calls ducklake_expire_snapshots against the catalog after a successful refresh. rollback-window reserves snapshots for time-travel queries during the window.
Audit Trail
CacheManager::initializeAuditTables creates <catalog>.audit.sync_events at startup. Every refresh records:
endpoint_path,cache_table,cache_schemasync_type(full,append,merge,garbage_collection)status(success,error,warning) andmessagesnapshot_id,rows_affected,sync_started_at,sync_completed_at,duration_ms
Query the audit table directly with DuckDB to debug refresh issues.
Real-World Example
Customer endpoint with DuckLake merge caching (from examples/sqls/customers/customers-rest.yaml):
url-path: /customers/
cache:
enabled: true
table: customers_rest_cache
schema: analytics
schedule: 5m
primary-key: [id]
cursor:
column: registration_date
type: date
retention:
keep-last-snapshots: 3
max-snapshot-age: 7d
with-pagination: true
Best Practices
- Start with a
fullrefresh, switch toappendormergeonce you have a stable cursor column. - Always set
retention— without it, DuckLake snapshot data grows unbounded. - Use
delete-handling: softif downstream consumers need a history of deletions. - Keep
scheduler.scan-intervalsmall enough to honour the tightestcache.schedulevalue across endpoints. - Pre-aggregate in the cache template; don't mirror raw source tables.
Troubleshooting
Cache table doesn't exist
- Confirm
ducklake.enabled: trueand a validmetadata-path/data-path. - Check the startup log for
Initialized DuckLake audit tables— if absent, DuckLake never came up. - Inspect
<catalog>.audit.sync_eventsforstatus='error'rows.
Refresh never fires
- Confirm
ducklake.scheduler.enabled: true. scheduler.scan-intervalmust be less than the smallestcache.scheduleyou want to honour.
Snapshots growing without bound
- Add
retention.keep-last-snapshotsand/orretention.max-snapshot-age. - Make sure
compaction.enabled: trueand acompaction.scheduleis set.
Next Steps
- Caching Strategy: DuckLake concepts and cost model
- BigQuery Example: Complete implementation
- SAP ERP Example: Enterprise caching