Skip to main content

DuckLake Caching Setup

flAPI's caching layer is built on DuckLake, a snapshot-based table format over DuckDB. Each refresh produces a new snapshot, your endpoints serve from the latest snapshot in milliseconds, and a retention policy expires old snapshots automatically.

This guide walks through enabling DuckLake at the project level and configuring per-endpoint caches.

Step 1: Enable DuckLake Globally

Add a ducklake: block to flapi.yaml. Without it, endpoint caches do nothing.

# flapi.yaml
ducklake:
enabled: true
alias: cache # Catalog alias used in templates
metadata-path: ./data/cache.ducklake # DuckLake metadata directory
data-path: ./data/cache # DuckLake data files directory
retention:
keep-last-snapshots: 10
max-snapshot-age: 30d
compaction:
enabled: true
schedule: '@daily'
scheduler:
enabled: true
scan-interval: 5m # Background scheduler scan interval

The alias value (cache) becomes {{cache.catalog}} in every cache template.

Step 2: Enable Caching on an Endpoint

# sqls/customers/customers-rest.yaml
url-path: /customers/

cache:
enabled: true
table: customers_cache
schema: analytics # optional, default "main"
schedule: 5m # 30s | 5m | 1h | 2d
primary-key: [id] # required for merge mode
cursor:
column: registration_date # required for append/merge
type: date # int | date | timestamp
rollback-window: 2d
retention:
keep-last-snapshots: 5
max-snapshot-age: 14d
delete-handling: soft # soft | hard
template-file: customers_cache.sql # optional custom refresh SQL

request:
- field-name: segment
field-in: query

template-source: customers.sql
connection:
- bigquery-warehouse

All keys above are verified against cache_manager.cpp, CONFIG_REFERENCE.md §6 and the customer-common.yaml example.

Step 3: Refresh Modes

flAPI does not have a strategy: key. The refresh mode is derived from the combination of cursor and primary-key (CacheManager::determineCacheMode):

cursorprimary-keyModeBehaviour
absentanyfullRebuild the cache table on every refresh
presentabsentappendAppend rows past the cursor watermark
presentpresentmergeReconcile inserts, updates and deletes

Full Refresh

cache:
enabled: true
table: countries_cache
schedule: 1h

Incremental Append

cache:
enabled: true
table: events_cache
schedule: 15m
cursor:
column: created_at
type: timestamp

Incremental Merge

cache:
enabled: true
table: customers_cache
schedule: 1m
primary-key: [id]
cursor:
column: updated_at
type: timestamp
delete-handling: soft

Step 4: Write the Cache Template (optional)

If you provide template-file, flAPI runs that SQL on each refresh. Otherwise it derives a default. Cache templates have access to a cache.* context (verified against sql_template_processor.cpp):

VariableDescription
{{cache.catalog}}DuckLake catalog alias
{{cache.schema}}Cache schema
{{cache.table}}Cache table name
{{cache.schedule}}Configured refresh schedule
{{cache.mode}}full, append, or merge
{{cache.snapshotId}}Current snapshot ID
{{cache.snapshotTimestamp}}Current snapshot timestamp
{{cache.previousSnapshotId}}Previous snapshot ID
{{cache.previousSnapshotTimestamp}}Previous snapshot timestamp
{{cache.cursorColumn}}cursor.column value
{{cache.cursorType}}cursor.type value
{{cache.primaryKeys}}Comma-separated primary key columns

Real example from examples/sqls/customers/customers_cache.sql:

-- DuckLake cache template
CREATE OR REPLACE TABLE {{cache.catalog}}.{{cache.schema}}.{{cache.table}} AS
SELECT
id,
name,
email,
segment,
registration_date,
CURRENT_TIMESTAMP AS cache_updated_at,
'{{cache.snapshotId}}' AS cache_snapshot_id
FROM read_parquet('{{conn.path}}')
ORDER BY registration_date DESC;

Incremental templates can branch on the previous snapshot:

INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT * FROM source_table
{{#cache.previousSnapshotTimestamp}}
WHERE {{cache.cursorColumn}} > TIMESTAMP '{{cache.previousSnapshotTimestamp}}'
{{/cache.previousSnapshotTimestamp}}

Step 5: Write the API Template

The serving endpoint reads from the cache table just like any DuckDB table:

-- sqls/customers/customers.sql
SELECT *
FROM cache.analytics.customers_cache
WHERE 1=1
{{#params.segment}}
AND segment = '{{{params.segment}}}'
{{/params.segment}}
ORDER BY registration_date DESC
LIMIT 100;

Schedule Format

cache.schedule accepts a value-unit string parsed by TimeInterval::parseInterval:

ScheduleUse case
30sAggressive freshness
5mNear real-time dashboards
1hStandard BI refresh
6hSlow-changing data
24hDaily reports

Retention and Snapshot Expiry

Two knobs control retention:

cache:
retention:
keep-last-snapshots: 10 # keep N newest snapshots
max-snapshot-age: 30d # also delete anything older

When either is set, flAPI calls ducklake_expire_snapshots against the catalog after a successful refresh. rollback-window reserves snapshots for time-travel queries during the window.

Audit Trail

CacheManager::initializeAuditTables creates <catalog>.audit.sync_events at startup. Every refresh records:

  • endpoint_path, cache_table, cache_schema
  • sync_type (full, append, merge, garbage_collection)
  • status (success, error, warning) and message
  • snapshot_id, rows_affected, sync_started_at, sync_completed_at, duration_ms

Query the audit table directly with DuckDB to debug refresh issues.

Real-World Example

Customer endpoint with DuckLake merge caching (from examples/sqls/customers/customers-rest.yaml):

url-path: /customers/

cache:
enabled: true
table: customers_rest_cache
schema: analytics
schedule: 5m
primary-key: [id]
cursor:
column: registration_date
type: date
retention:
keep-last-snapshots: 3
max-snapshot-age: 7d

with-pagination: true

Best Practices

  • Start with a full refresh, switch to append or merge once you have a stable cursor column.
  • Always set retention — without it, DuckLake snapshot data grows unbounded.
  • Use delete-handling: soft if downstream consumers need a history of deletions.
  • Keep scheduler.scan-interval small enough to honour the tightest cache.schedule value across endpoints.
  • Pre-aggregate in the cache template; don't mirror raw source tables.

Troubleshooting

Cache table doesn't exist

  • Confirm ducklake.enabled: true and a valid metadata-path / data-path.
  • Check the startup log for Initialized DuckLake audit tables — if absent, DuckLake never came up.
  • Inspect <catalog>.audit.sync_events for status='error' rows.

Refresh never fires

  • Confirm ducklake.scheduler.enabled: true.
  • scheduler.scan-interval must be less than the smallest cache.schedule you want to honour.

Snapshots growing without bound

  • Add retention.keep-last-snapshots and/or retention.max-snapshot-age.
  • Make sure compaction.enabled: true and a compaction.schedule is set.

Next Steps

🍪 Cookie Settings