Skip to main content

Bidirectional Data Sync Patterns

Traditional data serving layers are read-only—they pull data from slow backends and serve it from a fast cache, but changes flow only one direction. flAPI's write support enables bidirectional sync patterns where data flows both ways: reads from the cache, writes that update both the cache and the backend.

This guide covers patterns for keeping your flAPI cache and backend data warehouse in sync.

The Problem with One-Way Sync

Traditional data architectures face challenges:

Warehouse → Cache → Consumers (one-way flow)

stale data
lost updates
dual-source-of-truth

When consumers can only read from the cache:

  • Updates to the backend don't immediately reflect in the cache
  • Updates to the cache have no way back to the warehouse
  • You need separate systems to sync data back
  • Risk of conflicting data between cache and warehouse

The Solution: Bidirectional Sync

With flAPI's write support, data can flow both ways:

Warehouse ↔ Cache ↔ Consumers
(read & write in both directions)

Benefits:

  • Single source of truth - All reads and writes go through flAPI
  • Immediate consistency - Updates reflected instantly
  • Simplified architecture - No separate sync pipelines
  • Transaction safety - ACID guarantees on all operations
  • Audit trail - All changes logged in warehouse

Pattern 1: Write-Through

When: Need immediate consistency between cache and warehouse

How: Every write goes directly to the warehouse, then updates the cache

Consumer request to write

Validate in flAPI

Write to Warehouse

Update Cache (if separate)

Return result to consumer

Implementation

Endpoint configuration:

url-path: /api/customers/:customer_id
method: PUT

operation:
type: Write
validate-before-write: true
returns-data: true
transaction: true

request:
- field-name: customer_id
field-in: path
required: true
validators:
- type: int
min: 1

- field-name: email
field-in: body
required: false
validators:
- type: string
regex: '^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'

template-source: customer-update.sql
connection: [snowflake] # Write directly to warehouse

SQL:

-- Direct write to Snowflake
UPDATE customers
SET
email = COALESCE(:email, email),
updated_at = CURRENT_TIMESTAMP()
WHERE customer_id = :customer_id
RETURNING *

Advantages:

  • ✅ Immediate consistency
  • ✅ No sync delay
  • ✅ Simple to understand

Disadvantages:

  • ❌ Slower than writing to local cache (warehouse latency)
  • ❌ Network failures block the write
  • ❌ Can't write if warehouse is unavailable

Pattern 2: Write-Back (Eventual Consistency)

When: Need fast writes with eventual consistency

How: Write to local cache immediately, sync to warehouse asynchronously

Consumer writes to flAPI

Write to DuckDB Cache (fast)

Return to consumer immediately

[Async] Sync to Warehouse

[Async] Handle conflicts

Implementation

Endpoint writes to DuckDB:

url-path: /api/orders/
method: POST

operation:
type: Write
validate-before-write: true
returns-data: true
transaction: true

request:
- field-name: customer_id
required: true
validators:
- type: int
min: 1

- field-name: total_amount
required: true
validators:
- type: string
regex: '^\d+(\.\d{2})?$'

template-source: order-create.sql
connection: [local-duckdb] # Write to fast local cache

SQL:

INSERT INTO orders (
customer_id,
total_amount,
status,
created_at,
synced_to_warehouse
)
VALUES (
:customer_id,
:total_amount,
'pending',
CURRENT_TIMESTAMP(),
false
)
RETURNING *

Separate sync process:

Create a scheduled job to sync pending writes to the warehouse:

-- Sync pending orders to Snowflake
INSERT INTO snowflake.orders
SELECT * FROM orders WHERE synced_to_warehouse = false

-- Mark as synced
UPDATE orders SET synced_to_warehouse = true
WHERE synced_to_warehouse = false

-- Handle conflicts with MAX(updated_at)

Advantages:

  • ✅ Ultra-fast writes (local cache only)
  • ✅ Resilient to warehouse outages
  • ✅ Great user experience

Disadvantages:

  • ❌ Eventual consistency (temporary divergence)
  • ❌ Requires handling sync failures
  • ❌ Complex conflict resolution

Pattern 3: Read-Through Write-Back Hybrid

When: Need balance between speed and consistency

How: Reads from cache, writes batch to warehouse on schedule

Consumer reads

Read from Cache (fast)

Consumers write

Write to Cache (fast)

[Scheduled] Batch sync to Warehouse

Update cache on refresh

Implementation

Read endpoints (from cache):

url-path: /api/products/
method: GET

request: []
template-source: products-read.sql
connection: [local-duckdb] # Read from fast cache

Write endpoints (to cache with sync flag):

url-path: /api/products/:product_id
method: PUT

operation:
type: Write
validate-before-write: true
returns-data: true
transaction: true

request:
- field-name: product_id
field-in: path
required: true

- field-name: unit_price
field-in: body
required: false

template-source: product-update.sql
connection: [local-duckdb]

SQL with sync tracking:

UPDATE products
SET
unit_price = COALESCE(:unit_price, unit_price),
needs_sync = true, -- Flag for batch sync
updated_at = NOW()
WHERE product_id = :product_id
RETURNING *

Scheduled batch sync job (every 5 minutes):

BEGIN TRANSACTION;

-- Copy changes to warehouse
INSERT INTO warehouse.products
SELECT * FROM products WHERE needs_sync = true
ON CONFLICT (product_id) DO UPDATE SET
unit_price = EXCLUDED.unit_price,
updated_at = EXCLUDED.updated_at;

-- Mark as synced
UPDATE products SET needs_sync = false
WHERE needs_sync = true;

COMMIT;

Advantages:

  • ✅ Fast reads from cache
  • ✅ Fast writes to cache
  • ✅ Batch efficiency
  • ✅ Reasonable consistency window

Disadvantages:

  • ❌ Sync delay (usually 5-15 minutes)
  • ❌ Requires batch reconciliation logic
  • ❌ Storage overhead for sync tracking

Pattern 4: Smart Cache Invalidation

When: Need to keep cache fresh after writes

How: Invalidate relevant cache entries, lazy-reload on next read

Write to Cache

Invalidate Cache Entry

Next Read

Cache Miss

Reload from Warehouse

Serve to Consumer

Implementation

Write endpoint that invalidates and optionally refreshes the cache:

url-path: /api/customers/:customer_id
method: PUT

operation:
type: Write
validate-before-write: true
returns-data: true
transaction: true

# After successful write, invalidate / refresh the endpoint's DuckLake cache
cache:
invalidate-on-write: true # mark cache as stale
refresh-on-write: false # set true to force a refresh now

template-source: customer-update.sql
connection: [snowflake]

Cache-aware read endpoint (DuckLake):

url-path: /api/customers/:customer_id
method: GET

cache:
enabled: true
table: customers_cache
schedule: 1h
primary-key: [customer_id]
cursor:
column: updated_at
type: timestamp

template-source: customer-read.sql
connection: [snowflake]

Advantages:

  • ✅ Automatic cache freshness
  • ✅ Lazy reloading (only on demand)
  • ✅ Reduced memory footprint

Disadvantages:

  • ❌ First read after write is slow
  • ❌ Requires cache layer (Redis, Memcached)
  • ❌ More complex setup

Conflict Resolution Strategies

When syncing writes from cache back to warehouse, conflicts can occur.

Strategy 1: Last-Write-Wins

Simplest approach: later timestamp wins.

INSERT INTO warehouse.data
SELECT * FROM cache.data WHERE needs_sync = true
ON CONFLICT (id) DO UPDATE SET
-- Take value from cache if warehouse update is older
value = CASE
WHEN warehouse.updated_at < EXCLUDED.updated_at
THEN EXCLUDED.value
ELSE warehouse.value
END,
updated_at = GREATEST(
warehouse.updated_at,
EXCLUDED.updated_at
);

Strategy 2: Custom Merge Logic

Different rules for different columns.

-- For price: take max (don't lower prices accidentally)
-- For availability: take min (conservative estimate)
-- For notes: concatenate with timestamp

INSERT INTO warehouse.products
SELECT * FROM cache.products WHERE needs_sync = true
ON CONFLICT (product_id) DO UPDATE SET
unit_price = GREATEST(warehouse.unit_price, EXCLUDED.unit_price),
units_in_stock = LEAST(warehouse.units_in_stock, EXCLUDED.units_in_stock),
notes = warehouse.notes || ' | ' || EXCLUDED.notes,
updated_at = CURRENT_TIMESTAMP();

Strategy 3: Manual Conflict Resolution

Flag conflicts for human review.

-- Detect conflicts
WITH conflicting_updates AS (
SELECT
cache.id,
cache.value as cache_value,
warehouse.value as warehouse_value
FROM cache.data cache
INNER JOIN warehouse.data warehouse
ON cache.id = warehouse.id
AND cache.value != warehouse.value
AND cache.updated_at > warehouse.updated_at
)
INSERT INTO conflict_log (record_id, cache_value, warehouse_value, detected_at)
SELECT id, cache_value, warehouse_value, NOW()
FROM conflicting_updates;

Best Practices

1. Choose Your Consistency Model

Decide: Immediate (write-through) vs Eventual (write-back)?

  • High-stakes data (payments, inventory) → write-through
  • User-generated content (profiles, preferences) → write-back
  • Reporting data → write-back with batch sync

2. Always Validate Before Writing

Prevent invalid data entering either cache or warehouse:

operation:
validate-before-write: true

3. Use Transactions

Ensure partial operations don't corrupt data:

operation:
transaction: true

4. Track Sync Status

Make it easy to identify what's synced:

CREATE TABLE products (
id INTEGER,
name STRING,
updated_at TIMESTAMP,
synced_at TIMESTAMP, -- NULL until synced
needs_sync BOOLEAN
);

5. Monitor Divergence

Alert when cache and warehouse drift too far:

SELECT COUNT(*) as diverged_records
FROM cache.products c
FULL OUTER JOIN warehouse.products w
ON c.id = w.id
WHERE c.updated_at != w.updated_at
OR c.needs_sync = true;

6. Set Reasonable Refresh Schedules and Retention

DuckLake doesn't use a TTL — it uses snapshot-based refresh plus retention:

cache:
enabled: true
schedule: 1h # Refresh hourly
retention:
keep-last-snapshots: 24
max-snapshot-age: 24h

Examples

Example 1: User Profile Sync

Fast writes to cache, sync to warehouse every 10 minutes:

# Endpoint: PUT /api/users/:user_id
operation:
type: Write
returns-data: true
transaction: true

# Writes to local DuckDB for speed
connection: [local-duckdb]

Sync to Snowflake is performed by an external scheduled job (cron, Airflow, etc.) that reads the local table and merges rows where needs_sync = true into the warehouse — flAPI itself does not provide a built-in cross-connection sync scheduler.

Example 2: Order Processing

Immediate consistency: write directly to warehouse:

# Endpoint: POST /api/orders/
operation:
type: Write
returns-data: true
transaction: true

# Write directly to Snowflake
connection: [snowflake]

# No sync delay needed

Example 3: Analytics Data

Batch updates with smart cache invalidation:

# Endpoint: PUT /api/metrics/:metric_id
operation:
type: Write
returns-data: true
transaction: true

# Write to cache
connection: [local-duckdb]

# Invalidate this endpoint's DuckLake cache after the write
cache:
invalidate-on-write: true
refresh-on-write: false

A separate scheduled job batches the warehouse sync; flAPI exposes the audit table <catalog>.audit.sync_events to monitor refreshes.

When to Use Each Pattern

PatternSpeedConsistencyComplexityUse Case
Write-ThroughSlowImmediateLowCritical data (payments)
Write-BackVery FastEventualHighUser data, preferences
HybridFastDelayedMediumAnalytics, reporting
Cache InvalidationFastHoursMediumPublic data, products
🍪 Cookie Settings