Skip to main content

Snowflake Integration

Extension Credit

This guide uses the snowflake extension by the DuckDB community. Thanks to the contributors who made Snowflake integration possible!

Turn your Snowflake data warehouse into a high-performance REST API with 99%+ cost savings through intelligent caching. Perfect for analytics APIs, dashboards, and customer-facing data products.

Why Use Snowflake with flAPI?

"The Snowflake caching layer that pays for itself in week 1."

Snowflake is powerful but expensive for high-frequency access:

  • $2/hour minimum for smallest warehouse
  • Cold start delays (5-15 seconds)
  • Compute costs scale with query complexity
  • Per-query billing adds up fast

flAPI solves this:

  • 99%+ cost reduction through caching
  • 1-10ms response time (vs 500ms-5s)
  • No warehouse always-on costs
  • Scheduled refreshes during off-peak hours
  • DuckDB-powered queries (faster for small-medium datasets)

Real-World Use Cases

Customer Analytics Dashboard

Problem: Dashboard queries hit Snowflake 10,000x/day = $1,500/month Solution: Cache refreshes 4x/day = $15/month (100x savings)

Internal Reporting API

Problem: Sales team's BI tool hammers Snowflake constantly Solution: flAPI cache serves instant results, refresh hourly

Mobile App Analytics

Problem: Every app open = expensive Snowflake query Solution: Pre-computed aggregates cached, sub-10ms to users

AI Agent Data Access

Problem: LLMs need fast access to company data Solution: flAPI + MCP provides instant Snowflake access for AI

Architecture

Installation

Prerequisites

  1. Snowflake Account with credentials
  2. DuckDB Snowflake Extension

Extension Setup

The DuckDB snowflake extension authenticates via a DuckDB SNOWFLAKE secret created during init:. flAPI connection properties are exposed as {{ conn.* }} so they can be referenced in templates and the init block.

flapi.yaml
connections:
snowflake-prod:
init: |
INSTALL snowflake FROM community;
LOAD snowflake;

CREATE OR REPLACE PERSISTENT SECRET snowflake_prod (
TYPE SNOWFLAKE,
ACCOUNT '${SNOWFLAKE_ACCOUNT}',
USER '${SNOWFLAKE_USER}',
PASSWORD '${SNOWFLAKE_PASSWORD}',
WAREHOUSE '${SNOWFLAKE_WAREHOUSE}',
DATABASE '${SNOWFLAKE_DATABASE}',
SCHEMA '${SNOWFLAKE_SCHEMA}'
);
properties:
database: 'ANALYTICS'
schema: 'PUBLIC'

template:
environment-whitelist:
- '^SNOWFLAKE_.*'

Environment Variables

.env
SNOWFLAKE_ACCOUNT=xy12345.us-east-1
SNOWFLAKE_USER=flapi_reader
SNOWFLAKE_PASSWORD=your-password-here
SNOWFLAKE_WAREHOUSE=COMPUTE_WH
SNOWFLAKE_DATABASE=ANALYTICS
SNOWFLAKE_SCHEMA=PUBLIC

Basic Configuration

Simple Query

sqls/daily_sales.sql
SELECT
sale_date,
product_category,
SUM(revenue) AS total_revenue,
COUNT(DISTINCT customer_id) AS unique_customers
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 30 DAY
{{#params.category}}
AND product_category = '{{{ params.category }}}'
{{/params.category}}
GROUP BY sale_date, product_category
ORDER BY sale_date DESC
sqls/daily_sales.yaml
url-path: /sales/daily/
method: GET
template-source: daily_sales.sql
connection:
- snowflake-prod

cache:
enabled: true
table: daily_sales_cache
schedule: 6h
template-file: daily_sales_cache.sql

request:
- field-name: category
field-in: query
description: Filter by product category
required: false
validators:
- type: string
regex: '^[A-Za-z _-]{1,50}$'

Cache Template

sqls/daily_sales_cache.sql
-- Full refresh: pull data from Snowflake into the DuckLake cache
INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT
sale_date,
product_category,
revenue,
customer_id
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 30 DAY

Cost Optimization Strategies

Strategy 1: Scheduled Refreshes (best for most use cases)

sqls/analytics.yaml
url-path: /analytics/revenue/
method: GET
template-source: revenue.sql
connection:
- snowflake-prod

cache:
enabled: true
table: revenue_cache
# Refresh on a fixed cadence
schedule: 6h
template-file: revenue_cache.sql

Cost comparison (10K API calls/day):

  • Direct Snowflake: ~$1,500/month
  • Cached (4x refresh): ~$15/month
  • Savings: 99%

Strategy 2: Incremental Refresh

sqls/orders_cache.sql
-- Append only new/updated records
INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT *
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.ORDERS')
{{#cache.previousSnapshotTimestamp}}
WHERE updated_at > TIMESTAMP '{{ cache.previousSnapshotTimestamp }}'
{{/cache.previousSnapshotTimestamp}}
cache:
enabled: true
table: orders_cache
schedule: 15m
primary-key: [id]
cursor:
column: updated_at
type: timestamp

Strategy 3: Tiered Caching

# Hot data: refresh frequently
# sqls/sales_today.yaml
cache:
enabled: true
table: sales_today
schedule: 5m

# Warm data: less frequent
# sqls/sales_monthly.yaml
cache:
enabled: true
table: sales_monthly
schedule: 6h

# Cold data: rarely updated
# sqls/sales_yearly.yaml
cache:
enabled: true
table: sales_yearly
schedule: 7d

Example: Real-Time Sales Dashboard

Aggregated Metrics

sqls/sales_metrics.sql
WITH daily_stats AS (
SELECT
sale_date,
SUM(revenue) AS daily_revenue,
COUNT(DISTINCT order_id) AS order_count,
COUNT(DISTINCT customer_id) AS customer_count,
AVG(revenue) AS avg_order_value
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 90 DAY
GROUP BY sale_date
)
SELECT
sale_date,
daily_revenue,
order_count,
customer_count,
ROUND(avg_order_value, 2) AS avg_order_value,
ROUND(AVG(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS revenue_7day_avg
FROM daily_stats
ORDER BY sale_date DESC
LIMIT {{#params.limit}}{{ params.limit }}{{/params.limit}}{{^params.limit}}90{{/params.limit}}
sqls/sales_metrics.yaml
url-path: /sales/metrics/
method: GET
template-source: sales_metrics.sql
connection:
- snowflake-prod

cache:
enabled: true
table: sales_metrics_cache
schedule: 10m
template-file: sales_metrics_cache.sql

request:
- field-name: limit
field-in: query
description: Number of days to return
required: false
validators:
- type: int
min: 1
max: 365

Customer Segmentation

sqls/customer_segments.sql
WITH customer_lifetime AS (
SELECT
customer_id,
COUNT(DISTINCT order_id) AS total_orders,
SUM(revenue) AS lifetime_value,
MIN(sale_date) AS first_purchase,
MAX(sale_date) AS last_purchase,
date_diff('day', MAX(sale_date), CURRENT_DATE) AS days_since_last_purchase
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.SALES')
GROUP BY customer_id
)
SELECT
customer_id,
total_orders,
ROUND(lifetime_value, 2) AS lifetime_value,
first_purchase,
last_purchase,
days_since_last_purchase,
CASE
WHEN days_since_last_purchase <= 30 AND total_orders >= 5 THEN 'VIP Active'
WHEN days_since_last_purchase <= 30 THEN 'Active'
WHEN days_since_last_purchase <= 90 THEN 'At Risk'
WHEN days_since_last_purchase <= 180 THEN 'Dormant'
ELSE 'Lost'
END AS segment
FROM customer_lifetime
{{#params.segment}}
WHERE segment = '{{{ params.segment }}}'
{{/params.segment}}
ORDER BY lifetime_value DESC

Performance Comparison

Direct Snowflake Query

Query time: 500ms - 5 seconds
Cost per query: ~$0.01
10K queries/day: $100/day = $3,000/month
Cold start penalty: 5-15 seconds

flAPI Cached Query

Query time: 1-10ms (50-500x faster)
Cost per query: $0 (cached)
10K queries/day: $0
Cache refresh (4x/day): $0.40/day = $12/month
Savings: 99.6%

Best Practices

1. Use Smallest Warehouse for Refreshes

export SNOWFLAKE_WAREHOUSE='XSMALL_WH'  # Not 'LARGE_WH'

Cost difference:

  • X-Small: $2/hour
  • Large: $32/hour
  • Savings: 16x

2. Schedule Refreshes During Off-Peak

cache:
# Off-peak refresh once every 6 hours
schedule: 6h

3. Materialize Complex Joins in Snowflake

-- Create materialized view in Snowflake
CREATE MATERIALIZED VIEW ANALYTICS.PUBLIC.SALES_ENRICHED AS
SELECT
s.*,
c.customer_name,
c.customer_segment,
p.product_name,
p.product_category
FROM SALES s
JOIN CUSTOMERS c ON s.customer_id = c.id
JOIN PRODUCTS p ON s.product_id = p.id;

-- Then query the view (faster, cheaper)
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES_ENRICHED')

4. Partition by Time

WHERE sale_date >= CURRENT_DATE - INTERVAL 90 DAY

5. Use Column Pruning

-- Bad: pulls all columns
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')

-- Good: only needed columns (faster, cheaper)
SELECT
sale_date,
customer_id,
revenue
FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')

Security

Read-Only User

Create a dedicated read-only user in Snowflake:

-- In Snowflake
CREATE ROLE FLAPI_READER;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE FLAPI_READER;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE FLAPI_READER;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;
GRANT SELECT ON FUTURE TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;

CREATE USER flapi_reader
PASSWORD='...'
DEFAULT_ROLE=FLAPI_READER;

GRANT ROLE FLAPI_READER TO USER flapi_reader;

Credential Management

connections:
snowflake-prod:
init: |
INSTALL snowflake FROM community;
LOAD snowflake;

CREATE OR REPLACE PERSISTENT SECRET snowflake_prod (
TYPE SNOWFLAKE,
ACCOUNT '${SNOWFLAKE_ACCOUNT}',
USER '${SNOWFLAKE_USER}',
PASSWORD '${SNOWFLAKE_PASSWORD}',
ROLE 'FLAPI_READER'
);

Network Security

Use Snowflake network policies:

-- In Snowflake: restrict IP access
CREATE NETWORK POLICY flapi_access
ALLOWED_IP_LIST = ('203.0.113.10/32', '198.51.100.20/32');

ALTER USER flapi_reader SET NETWORK_POLICY = flapi_access;

Troubleshooting

Issue: "Authentication failed"

# Test connection with the DuckDB CLI
export SNOWFLAKE_ACCOUNT="xy12345.us-east-1"
export SNOWFLAKE_USER="flapi_reader"
export SNOWFLAKE_PASSWORD="your-password"

duckdb -c "
INSTALL snowflake FROM community;
LOAD snowflake;
CREATE OR REPLACE PERSISTENT SECRET sf (
TYPE SNOWFLAKE,
ACCOUNT '${SNOWFLAKE_ACCOUNT}',
USER '${SNOWFLAKE_USER}',
PASSWORD '${SNOWFLAKE_PASSWORD}'
);
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES') LIMIT 10;
"

Issue: Slow cache refreshes

Problem: Cache refresh takes 10+ minutes

Solutions:

  1. Use materialized views in Snowflake
  2. Partition by date (only recent data)
  3. Use a larger warehouse for the refresh (balance cost vs speed)
  4. Consider an incremental cursor-based refresh

Issue: High Snowflake costs despite caching

Tune your refresh cadence:

# Too frequent (expensive)
cache:
schedule: 1m # every minute = $$$

# Optimal (cheap)
cache:
schedule: 6h

Cost Calculator

Your scenario:

  • API calls: 10,000/day
  • Cache refreshes: 4/day
  • Refresh duration: 5 minutes each
  • Warehouse: X-Small ($2/hour)

Direct Snowflake:

10,000 calls × $0.01 per query = $100/day
$100 × 30 days = $3,000/month

flAPI Cached:

4 refreshes/day × 5 min × $2/hour = $0.67/day
$0.67 × 30 days = $20/month
Savings: $2,980/month (99.3%)

Migration from Direct Access

Phase 1: Add flAPI Layer

connections:
snowflake-prod:
init: |
INSTALL snowflake FROM community;
LOAD snowflake;
CREATE OR REPLACE PERSISTENT SECRET snowflake_prod (
TYPE SNOWFLAKE,
ACCOUNT '${SNOWFLAKE_ACCOUNT}',
USER '${SNOWFLAKE_USER}',
PASSWORD '${SNOWFLAKE_PASSWORD}'
);

Phase 2: Introduce Caching

cache:
enabled: true
table: hot_data_cache
schedule: 10m
template-file: hot_data_cache.sql

Phase 3: Monitor Costs

-- In Snowflake: check query costs
SELECT
query_text,
COUNT(*) AS execution_count,
AVG(total_elapsed_time) / 1000 AS avg_seconds,
AVG(credits_used_cloud_services) AS avg_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_text LIKE '%flapi%'
AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP)
GROUP BY query_text
ORDER BY avg_credits DESC;

Next Steps

Additional Resources


Cost Savings

One flAPI user saved $47,000/year by caching their customer dashboard queries. The cache refreshes 6x/day, users get instant responses, and Snowflake costs dropped by 98%. ROI payback: 3 days.

🍪 Cookie Settings