Bidirectional Data Sync Patterns
Traditional data serving layers are read-only—they pull data from slow backends and serve it from a fast cache, but changes flow only one direction. flAPI's write support enables bidirectional sync patterns where data flows both ways: reads from the cache, writes that update both the cache and the backend.
This guide covers patterns for keeping your flAPI cache and backend data warehouse in sync.
The Problem with One-Way Sync
Traditional data architectures face challenges:
Warehouse → Cache → Consumers (one-way flow)
↓
stale data
lost updates
dual-source-of-truth
When consumers can only read from the cache:
- Updates to the backend don't immediately reflect in the cache
- Updates to the cache have no way back to the warehouse
- You need separate systems to sync data back
- Risk of conflicting data between cache and warehouse
The Solution: Bidirectional Sync
With flAPI's write support, data can flow both ways:
Warehouse ↔ Cache ↔ Consumers
(read & write in both directions)
Benefits:
- Single source of truth - All reads and writes go through flAPI
- Immediate consistency - Updates reflected instantly
- Simplified architecture - No separate sync pipelines
- Transaction safety - ACID guarantees on all operations
- Audit trail - All changes logged in warehouse
Pattern 1: Write-Through
When: Need immediate consistency between cache and warehouse
How: Every write goes directly to the warehouse, then updates the cache
Consumer request to write
↓
Validate in flAPI
↓
Write to Warehouse
↓
Update Cache (if separate)
↓
Return result to consumer
Implementation
Endpoint configuration:
url-path: /api/customers/:customer_id
method: PUT
operation:
type: Write
validate-before-write: true
returns-data: true
transaction: true
request:
- field-name: customer_id
field-in: path
required: true
validators:
- type: int
min: 1
- field-name: email
field-in: body
required: false
validators:
- type: string
regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
template-source: customer-update.sql
connection: [snowflake] # Write directly to warehouse
SQL:
-- Direct write to Snowflake
UPDATE customers
SET
email = COALESCE(:email, email),
updated_at = CURRENT_TIMESTAMP()
WHERE customer_id = :customer_id
RETURNING *
Advantages:
- ✅ Immediate consistency
- ✅ No sync delay
- ✅ Simple to understand
Disadvantages:
- ❌ Slower than writing to local cache (warehouse latency)
- ❌ Network failures block the write
- ❌ Can't write if warehouse is unavailable
Pattern 2: Write-Back (Eventual Consistency)
When: Need fast writes with eventual consistency
How: Write to local cache immediately, sync to warehouse asynchronously
Consumer writes to flAPI
↓
Write to DuckDB Cache (fast)
↓
Return to consumer immediately
↓
[Async] Sync to Warehouse
↓
[Async] Handle conflicts
Implementation
Endpoint writes to DuckDB:
url-path: /api/orders/
method: POST
operation:
type: Write
validate-before-write: true
returns-data: true
transaction: true
request:
- field-name: customer_id
required: true
validators:
- type: int
min: 1
- field-name: total_amount
required: true
validators:
- type: string
regex: '^\d+(\.\d{2})?$'
template-source: order-create.sql
connection: [local-duckdb] # Write to fast local cache
SQL:
INSERT INTO orders (
customer_id,
total_amount,
status,
created_at,
synced_to_warehouse
)
VALUES (
:customer_id,
:total_amount,
'pending',
CURRENT_TIMESTAMP(),
false
)
RETURNING *
Separate sync process:
Create a scheduled job to sync pending writes to the warehouse:
-- Sync pending orders to Snowflake
INSERT INTO snowflake.orders
SELECT * FROM orders WHERE synced_to_warehouse = false
-- Mark as synced
UPDATE orders SET synced_to_warehouse = true
WHERE synced_to_warehouse = false
-- Handle conflicts with MAX(updated_at)
Advantages:
- ✅ Ultra-fast writes (local cache only)
- ✅ Resilient to warehouse outages
- ✅ Great user experience
Disadvantages:
- ❌ Eventual consistency (temporary divergence)
- ❌ Requires handling sync failures
- ❌ Complex conflict resolution
Pattern 3: Read-Through Write-Back Hybrid
When: Need balance between speed and consistency
How: Reads from cache, writes batch to warehouse on schedule
Consumer reads
↓
Read from Cache (fast)
↓
Consumers write
↓
Write to Cache (fast)
↓
[Scheduled] Batch sync to Warehouse
↓
Update cache on refresh
Implementation
Read endpoints (from cache):
url-path: /api/products/
method: GET
request: []
template-source: products-read.sql
connection: [local-duckdb] # Read from fast cache
Write endpoints (to cache with sync flag):
url-path: /api/products/:product_id
method: PUT
operation:
type: Write
validate-before-write: true
returns-data: true
transaction: true
request:
- field-name: product_id
field-in: path
required: true
- field-name: unit_price
field-in: body
required: false
template-source: product-update.sql
connection: [local-duckdb]
SQL with sync tracking:
UPDATE products
SET
unit_price = COALESCE(:unit_price, unit_price),
needs_sync = true, -- Flag for batch sync
updated_at = NOW()
WHERE product_id = :product_id
RETURNING *
Scheduled batch sync job (every 5 minutes):
BEGIN TRANSACTION;
-- Copy changes to warehouse
INSERT INTO warehouse.products
SELECT * FROM products WHERE needs_sync = true
ON CONFLICT (product_id) DO UPDATE SET
unit_price = EXCLUDED.unit_price,
updated_at = EXCLUDED.updated_at;
-- Mark as synced
UPDATE products SET needs_sync = false
WHERE needs_sync = true;
COMMIT;
Advantages:
- ✅ Fast reads from cache
- ✅ Fast writes to cache
- ✅ Batch efficiency
- ✅ Reasonable consistency window
Disadvantages:
- ❌ Sync delay (usually 5-15 minutes)
- ❌ Requires batch reconciliation logic
- ❌ Storage overhead for sync tracking
Pattern 4: Smart Cache Invalidation
When: Need to keep cache fresh after writes
How: Invalidate relevant cache entries, lazy-reload on next read
Write to Cache
↓
Invalidate Cache Entry
↓
Next Read
↓
Cache Miss
↓
Reload from Warehouse
↓
Serve to Consumer
Implementation
Write endpoint that invalidates and optionally refreshes the cache:
url-path: /api/customers/:customer_id
method: PUT
operation:
type: Write
validate-before-write: true
returns-data: true
transaction: true
# After successful write, invalidate / refresh the endpoint's DuckLake cache
cache:
invalidate-on-write: true # mark cache as stale
refresh-on-write: false # set true to force a refresh now
template-source: customer-update.sql
connection: [snowflake]
Cache-aware read endpoint (DuckLake):
url-path: /api/customers/:customer_id
method: GET
cache:
enabled: true
table: customers_cache
schedule: 1h
primary-key: [customer_id]
cursor:
column: updated_at
type: timestamp
template-source: customer-read.sql
connection: [snowflake]
Advantages:
- ✅ Automatic cache freshness
- ✅ Lazy reloading (only on demand)
- ✅ Reduced memory footprint
Disadvantages:
- ❌ First read after write is slow
- ❌ Requires cache layer (Redis, Memcached)
- ❌ More complex setup
Conflict Resolution Strategies
When syncing writes from cache back to warehouse, conflicts can occur.
Strategy 1: Last-Write-Wins
Simplest approach: later timestamp wins.
INSERT INTO warehouse.data
SELECT * FROM cache.data WHERE needs_sync = true
ON CONFLICT (id) DO UPDATE SET
-- Take value from cache if warehouse update is older
value = CASE
WHEN warehouse.updated_at < EXCLUDED.updated_at
THEN EXCLUDED.value
ELSE warehouse.value
END,
updated_at = GREATEST(
warehouse.updated_at,
EXCLUDED.updated_at
);
Strategy 2: Custom Merge Logic
Different rules for different columns.
-- For price: take max (don't lower prices accidentally)
-- For availability: take min (conservative estimate)
-- For notes: concatenate with timestamp
INSERT INTO warehouse.products
SELECT * FROM cache.products WHERE needs_sync = true
ON CONFLICT (product_id) DO UPDATE SET
unit_price = GREATEST(warehouse.unit_price, EXCLUDED.unit_price),
units_in_stock = LEAST(warehouse.units_in_stock, EXCLUDED.units_in_stock),
notes = warehouse.notes || ' | ' || EXCLUDED.notes,
updated_at = CURRENT_TIMESTAMP();
Strategy 3: Manual Conflict Resolution
Flag conflicts for human review.
-- Detect conflicts
WITH conflicting_updates AS (
SELECT
cache.id,
cache.value as cache_value,
warehouse.value as warehouse_value
FROM cache.data cache
INNER JOIN warehouse.data warehouse
ON cache.id = warehouse.id
AND cache.value != warehouse.value
AND cache.updated_at > warehouse.updated_at
)
INSERT INTO conflict_log (record_id, cache_value, warehouse_value, detected_at)
SELECT id, cache_value, warehouse_value, NOW()
FROM conflicting_updates;
Best Practices
1. Choose Your Consistency Model
Decide: Immediate (write-through) vs Eventual (write-back)?
- High-stakes data (payments, inventory) → write-through
- User-generated content (profiles, preferences) → write-back
- Reporting data → write-back with batch sync
2. Always Validate Before Writing
Prevent invalid data entering either cache or warehouse:
operation:
validate-before-write: true
3. Use Transactions
Ensure partial operations don't corrupt data:
operation:
transaction: true
4. Track Sync Status
Make it easy to identify what's synced:
CREATE TABLE products (
id INTEGER,
name STRING,
updated_at TIMESTAMP,
synced_at TIMESTAMP, -- NULL until synced
needs_sync BOOLEAN
);
5. Monitor Divergence
Alert when cache and warehouse drift too far:
SELECT COUNT(*) as diverged_records
FROM cache.products c
FULL OUTER JOIN warehouse.products w
ON c.id = w.id
WHERE c.updated_at != w.updated_at
OR c.needs_sync = true;
6. Set Reasonable Refresh Schedules and Retention
DuckLake doesn't use a TTL — it uses snapshot-based refresh plus retention:
cache:
enabled: true
schedule: 1h # Refresh hourly
retention:
keep-last-snapshots: 24
max-snapshot-age: 24h
Examples
Example 1: User Profile Sync
Fast writes to cache, sync to warehouse every 10 minutes:
# Endpoint: PUT /api/users/:user_id
operation:
type: Write
returns-data: true
transaction: true
# Writes to local DuckDB for speed
connection: [local-duckdb]
Sync to Snowflake is performed by an external scheduled job (cron, Airflow, etc.) that reads the local table and merges rows where needs_sync = true into the warehouse — flAPI itself does not provide a built-in cross-connection sync scheduler.
Example 2: Order Processing
Immediate consistency: write directly to warehouse:
# Endpoint: POST /api/orders/
operation:
type: Write
returns-data: true
transaction: true
# Write directly to Snowflake
connection: [snowflake]
# No sync delay needed
Example 3: Analytics Data
Batch updates with smart cache invalidation:
# Endpoint: PUT /api/metrics/:metric_id
operation:
type: Write
returns-data: true
transaction: true
# Write to cache
connection: [local-duckdb]
# Invalidate this endpoint's DuckLake cache after the write
cache:
invalidate-on-write: true
refresh-on-write: false
A separate scheduled job batches the warehouse sync; flAPI exposes the audit table <catalog>.audit.sync_events to monitor refreshes.
When to Use Each Pattern
| Pattern | Speed | Consistency | Complexity | Use Case |
|---|---|---|---|---|
| Write-Through | Slow | Immediate | Low | Critical data (payments) |
| Write-Back | Very Fast | Eventual | High | User data, preferences |
| Hybrid | Fast | Delayed | Medium | Analytics, reporting |
| Cache Invalidation | Fast | Hours | Medium | Public data, products |