Snowflake Integration
This guide uses the snowflake extension by the DuckDB community. Thanks to the contributors who made Snowflake integration possible!
Turn your Snowflake data warehouse into a high-performance REST API with 99%+ cost savings through intelligent caching. Perfect for analytics APIs, dashboards, and customer-facing data products.
Why Use Snowflake with flAPI?
"The Snowflake caching layer that pays for itself in week 1."
Snowflake is powerful but expensive for high-frequency access:
- $2/hour minimum for smallest warehouse
- Cold start delays (5-15 seconds)
- Compute costs scale with query complexity
- Per-query billing adds up fast
flAPI solves this:
- 99%+ cost reduction through caching
- 1-10ms response time (vs 500ms-5s)
- No warehouse always-on costs
- Scheduled refreshes during off-peak hours
- DuckDB-powered queries (faster for small-medium datasets)
Real-World Use Cases
Customer Analytics Dashboard
Problem: Dashboard queries hit Snowflake 10,000x/day = $1,500/month Solution: Cache refreshes 4x/day = $15/month (100x savings)
Internal Reporting API
Problem: Sales team's BI tool hammers Snowflake constantly Solution: flAPI cache serves instant results, refresh hourly
Mobile App Analytics
Problem: Every app open = expensive Snowflake query Solution: Pre-computed aggregates cached, sub-10ms to users
AI Agent Data Access
Problem: LLMs need fast access to company data Solution: flAPI + MCP provides instant Snowflake access for AI
Architecture
Installation
Prerequisites
- Snowflake Account with credentials
- DuckDB Snowflake Extension
Extension Setup
The DuckDB snowflake extension authenticates via a DuckDB SNOWFLAKE secret created during init:. flAPI connection properties are exposed as {{ conn.* }} so they can be referenced in templates and the init block.
connections:
snowflake-prod:
init: |
INSTALL snowflake FROM community;
LOAD snowflake;
CREATE OR REPLACE PERSISTENT SECRET snowflake_prod (
TYPE SNOWFLAKE,
ACCOUNT '${SNOWFLAKE_ACCOUNT}',
USER '${SNOWFLAKE_USER}',
PASSWORD '${SNOWFLAKE_PASSWORD}',
WAREHOUSE '${SNOWFLAKE_WAREHOUSE}',
DATABASE '${SNOWFLAKE_DATABASE}',
SCHEMA '${SNOWFLAKE_SCHEMA}'
);
properties:
database: 'ANALYTICS'
schema: 'PUBLIC'
template:
environment-whitelist:
- '^SNOWFLAKE_.*'
Environment Variables
SNOWFLAKE_ACCOUNT=xy12345.us-east-1
SNOWFLAKE_USER=flapi_reader
SNOWFLAKE_PASSWORD=your-password-here
SNOWFLAKE_WAREHOUSE=COMPUTE_WH
SNOWFLAKE_DATABASE=ANALYTICS
SNOWFLAKE_SCHEMA=PUBLIC
Basic Configuration
Simple Query
SELECT
sale_date,
product_category,
SUM(revenue) AS total_revenue,
COUNT(DISTINCT customer_id) AS unique_customers
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 30 DAY
{{#params.category}}
AND product_category = '{{{ params.category }}}'
{{/params.category}}
GROUP BY sale_date, product_category
ORDER BY sale_date DESC
url-path: /sales/daily/
method: GET
template-source: daily_sales.sql
connection:
- snowflake-prod
cache:
enabled: true
table: daily_sales_cache
schedule: 6h
template-file: daily_sales_cache.sql
request:
- field-name: category
field-in: query
description: Filter by product category
required: false
validators:
- type: string
regex: '^[A-Za-z _-]{1,50}$'
Cache Template
-- Full refresh: pull data from Snowflake into the DuckLake cache
INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT
sale_date,
product_category,
revenue,
customer_id
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 30 DAY
Cost Optimization Strategies
Strategy 1: Scheduled Refreshes (best for most use cases)
url-path: /analytics/revenue/
method: GET
template-source: revenue.sql
connection:
- snowflake-prod
cache:
enabled: true
table: revenue_cache
# Refresh on a fixed cadence
schedule: 6h
template-file: revenue_cache.sql
Cost comparison (10K API calls/day):
- Direct Snowflake: ~$1,500/month
- Cached (4x refresh): ~$15/month
- Savings: 99%
Strategy 2: Incremental Refresh
-- Append only new/updated records
INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT *
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.ORDERS')
{{#cache.previousSnapshotTimestamp}}
WHERE updated_at > TIMESTAMP '{{ cache.previousSnapshotTimestamp }}'
{{/cache.previousSnapshotTimestamp}}
cache:
enabled: true
table: orders_cache
schedule: 15m
primary-key: [id]
cursor:
column: updated_at
type: timestamp
Strategy 3: Tiered Caching
# Hot data: refresh frequently
# sqls/sales_today.yaml
cache:
enabled: true
table: sales_today
schedule: 5m
# Warm data: less frequent
# sqls/sales_monthly.yaml
cache:
enabled: true
table: sales_monthly
schedule: 6h
# Cold data: rarely updated
# sqls/sales_yearly.yaml
cache:
enabled: true
table: sales_yearly
schedule: 7d
Example: Real-Time Sales Dashboard
Aggregated Metrics
WITH daily_stats AS (
SELECT
sale_date,
SUM(revenue) AS daily_revenue,
COUNT(DISTINCT order_id) AS order_count,
COUNT(DISTINCT customer_id) AS customer_count,
AVG(revenue) AS avg_order_value
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 90 DAY
GROUP BY sale_date
)
SELECT
sale_date,
daily_revenue,
order_count,
customer_count,
ROUND(avg_order_value, 2) AS avg_order_value,
ROUND(AVG(daily_revenue) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
), 2) AS revenue_7day_avg
FROM daily_stats
ORDER BY sale_date DESC
LIMIT {{#params.limit}}{{ params.limit }}{{/params.limit}}{{^params.limit}}90{{/params.limit}}
url-path: /sales/metrics/
method: GET
template-source: sales_metrics.sql
connection:
- snowflake-prod
cache:
enabled: true
table: sales_metrics_cache
schedule: 10m
template-file: sales_metrics_cache.sql
request:
- field-name: limit
field-in: query
description: Number of days to return
required: false
validators:
- type: int
min: 1
max: 365
Customer Segmentation
WITH customer_lifetime AS (
SELECT
customer_id,
COUNT(DISTINCT order_id) AS total_orders,
SUM(revenue) AS lifetime_value,
MIN(sale_date) AS first_purchase,
MAX(sale_date) AS last_purchase,
date_diff('day', MAX(sale_date), CURRENT_DATE) AS days_since_last_purchase
FROM snowflake_scan('{{ conn.database }}.{{ conn.schema }}.SALES')
GROUP BY customer_id
)
SELECT
customer_id,
total_orders,
ROUND(lifetime_value, 2) AS lifetime_value,
first_purchase,
last_purchase,
days_since_last_purchase,
CASE
WHEN days_since_last_purchase <= 30 AND total_orders >= 5 THEN 'VIP Active'
WHEN days_since_last_purchase <= 30 THEN 'Active'
WHEN days_since_last_purchase <= 90 THEN 'At Risk'
WHEN days_since_last_purchase <= 180 THEN 'Dormant'
ELSE 'Lost'
END AS segment
FROM customer_lifetime
{{#params.segment}}
WHERE segment = '{{{ params.segment }}}'
{{/params.segment}}
ORDER BY lifetime_value DESC
Performance Comparison
Direct Snowflake Query
Query time: 500ms - 5 seconds
Cost per query: ~$0.01
10K queries/day: $100/day = $3,000/month
Cold start penalty: 5-15 seconds
flAPI Cached Query
Query time: 1-10ms (50-500x faster)
Cost per query: $0 (cached)
10K queries/day: $0
Cache refresh (4x/day): $0.40/day = $12/month
Savings: 99.6%
Best Practices
1. Use Smallest Warehouse for Refreshes
export SNOWFLAKE_WAREHOUSE='XSMALL_WH' # Not 'LARGE_WH'
Cost difference:
- X-Small: $2/hour
- Large: $32/hour
- Savings: 16x
2. Schedule Refreshes During Off-Peak
cache:
# Off-peak refresh once every 6 hours
schedule: 6h
3. Materialize Complex Joins in Snowflake
-- Create materialized view in Snowflake
CREATE MATERIALIZED VIEW ANALYTICS.PUBLIC.SALES_ENRICHED AS
SELECT
s.*,
c.customer_name,
c.customer_segment,
p.product_name,
p.product_category
FROM SALES s
JOIN CUSTOMERS c ON s.customer_id = c.id
JOIN PRODUCTS p ON s.product_id = p.id;
-- Then query the view (faster, cheaper)
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES_ENRICHED')
4. Partition by Time
WHERE sale_date >= CURRENT_DATE - INTERVAL 90 DAY
5. Use Column Pruning
-- Bad: pulls all columns
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
-- Good: only needed columns (faster, cheaper)
SELECT
sale_date,
customer_id,
revenue
FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
Security
Read-Only User
Create a dedicated read-only user in Snowflake:
-- In Snowflake
CREATE ROLE FLAPI_READER;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE FLAPI_READER;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE FLAPI_READER;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;
GRANT SELECT ON FUTURE TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;
CREATE USER flapi_reader
PASSWORD='...'
DEFAULT_ROLE=FLAPI_READER;
GRANT ROLE FLAPI_READER TO USER flapi_reader;
Credential Management
connections:
snowflake-prod:
init: |
INSTALL snowflake FROM community;
LOAD snowflake;
CREATE OR REPLACE PERSISTENT SECRET snowflake_prod (
TYPE SNOWFLAKE,
ACCOUNT '${SNOWFLAKE_ACCOUNT}',
USER '${SNOWFLAKE_USER}',
PASSWORD '${SNOWFLAKE_PASSWORD}',
ROLE 'FLAPI_READER'
);
Network Security
Use Snowflake network policies:
-- In Snowflake: restrict IP access
CREATE NETWORK POLICY flapi_access
ALLOWED_IP_LIST = ('203.0.113.10/32', '198.51.100.20/32');
ALTER USER flapi_reader SET NETWORK_POLICY = flapi_access;
Troubleshooting
Issue: "Authentication failed"
# Test connection with the DuckDB CLI
export SNOWFLAKE_ACCOUNT="xy12345.us-east-1"
export SNOWFLAKE_USER="flapi_reader"
export SNOWFLAKE_PASSWORD="your-password"
duckdb -c "
INSTALL snowflake FROM community;
LOAD snowflake;
CREATE OR REPLACE PERSISTENT SECRET sf (
TYPE SNOWFLAKE,
ACCOUNT '${SNOWFLAKE_ACCOUNT}',
USER '${SNOWFLAKE_USER}',
PASSWORD '${SNOWFLAKE_PASSWORD}'
);
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES') LIMIT 10;
"
Issue: Slow cache refreshes
Problem: Cache refresh takes 10+ minutes
Solutions:
- Use materialized views in Snowflake
- Partition by date (only recent data)
- Use a larger warehouse for the refresh (balance cost vs speed)
- Consider an incremental cursor-based refresh
Issue: High Snowflake costs despite caching
Tune your refresh cadence:
# Too frequent (expensive)
cache:
schedule: 1m # every minute = $$$
# Optimal (cheap)
cache:
schedule: 6h
Cost Calculator
Your scenario:
- API calls: 10,000/day
- Cache refreshes: 4/day
- Refresh duration: 5 minutes each
- Warehouse: X-Small ($2/hour)
Direct Snowflake:
10,000 calls × $0.01 per query = $100/day
$100 × 30 days = $3,000/month
flAPI Cached:
4 refreshes/day × 5 min × $2/hour = $0.67/day
$0.67 × 30 days = $20/month
Savings: $2,980/month (99.3%)
Migration from Direct Access
Phase 1: Add flAPI Layer
connections:
snowflake-prod:
init: |
INSTALL snowflake FROM community;
LOAD snowflake;
CREATE OR REPLACE PERSISTENT SECRET snowflake_prod (
TYPE SNOWFLAKE,
ACCOUNT '${SNOWFLAKE_ACCOUNT}',
USER '${SNOWFLAKE_USER}',
PASSWORD '${SNOWFLAKE_PASSWORD}'
);
Phase 2: Introduce Caching
cache:
enabled: true
table: hot_data_cache
schedule: 10m
template-file: hot_data_cache.sql
Phase 3: Monitor Costs
-- In Snowflake: check query costs
SELECT
query_text,
COUNT(*) AS execution_count,
AVG(total_elapsed_time) / 1000 AS avg_seconds,
AVG(credits_used_cloud_services) AS avg_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_text LIKE '%flapi%'
AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP)
GROUP BY query_text
ORDER BY avg_credits DESC;
Next Steps
- Caching Strategy: Understand optimization techniques
- BigQuery: Alternative cloud warehouse
- SQL Templating: Master dynamic queries
- Examples: See caching in action
- Deployment: Deploy to production
Additional Resources
- DuckDB Snowflake Extension: Official extension docs
- Snowflake Documentation: Complete Snowflake reference
- Snowflake Cost Optimization: Official cost guide
One flAPI user saved $47,000/year by caching their customer dashboard queries. The cache refreshes 6x/day, users get instant responses, and Snowflake costs dropped by 98%. ROI payback: 3 days.