# Snowflake Integration

This guide uses the community-maintained `snowflake` extension for DuckDB. Thanks to the contributors who made Snowflake integration possible!

Turn your Snowflake data warehouse into a high-performance REST API with 99%+ cost savings through intelligent caching. Perfect for analytics APIs, dashboards, and customer-facing data products.
## Why Use Snowflake with flAPI?

> "The Snowflake caching layer that pays for itself in week 1."

Snowflake is powerful but expensive for high-frequency access:
- 💰 $2/hour minimum for smallest warehouse
- 🐌 Cold start delays (5-15 seconds)
- 💸 Compute costs scale with query complexity
- 📊 Per-query billing adds up fast
flAPI solves this:
- ✅ 99%+ cost reduction through caching
- ✅ 1-10ms response time (vs 500ms-5s)
- ✅ No warehouse always-on costs
- ✅ Scheduled refreshes during off-peak hours
- ✅ DuckDB-powered queries (faster for small-medium datasets)
## Real-World Use Cases

### 📊 Customer Analytics Dashboard

**Problem**: Dashboard queries hit Snowflake 10,000×/day ≈ $1,500/month.
**Solution**: Cache refreshed 4×/day ≈ $15/month (100× savings).

### 🏢 Internal Reporting API

**Problem**: The sales team's BI tool hammers Snowflake constantly.
**Solution**: The flAPI cache serves instant results and refreshes hourly.

### 📱 Mobile App Analytics

**Problem**: Every app open triggers an expensive Snowflake query.
**Solution**: Pre-computed aggregates are cached and served in under 10 ms.

### 🤖 AI Agent Data Access

**Problem**: LLMs need fast access to company data.
**Solution**: flAPI + MCP gives AI agents instant access to Snowflake data.
## Architecture
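flAPI sits between your API consumers and Snowflake: scheduled refreshes pull data from Snowflake into flAPI's embedded DuckDB cache, and API requests are served from that local cache. A simplified view:

```
API clients ---> flAPI (REST + MCP)
                   |-- reads: served from the embedded DuckDB cache (1-10 ms)
                   '-- scheduled cache refresh (cron) ---> Snowflake warehouse
```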
## Installation

### Prerequisites

- A Snowflake account and credentials
- The DuckDB Snowflake community extension (installed by the `init` block below)

### Extension Setup
```yaml
connections:
  snowflake-prod:
    init: |
      -- Install the Snowflake extension
      INSTALL snowflake FROM community;
      LOAD snowflake;

      -- Authenticate with Snowflake
      CREATE SECRET (
          TYPE SNOWFLAKE,
          ACCOUNT '{{{conn.account}}}',
          USER '{{{conn.user}}}',
          PASSWORD '${SNOWFLAKE_PASSWORD}',
          WAREHOUSE '{{{conn.warehouse}}}',
          DATABASE '{{{conn.database}}}',
          SCHEMA '{{{conn.schema}}}'
      );
    properties:
      account: 'xy12345.us-east-1'
      user: 'flapi_reader'
      warehouse: 'COMPUTE_WH'
      database: 'ANALYTICS'
      schema: 'PUBLIC'

template:
  environment-whitelist:
    - '^SNOWFLAKE_.*'
```
### Environment Variables

```bash
export SNOWFLAKE_PASSWORD=your-password-here
```
## Basic Configuration

### Simple Query

Create the SQL template, `daily_sales.sql`:
```sql
SELECT
    sale_date,
    product_category,
    SUM(revenue) AS total_revenue,
    COUNT(DISTINCT customer_id) AS unique_customers
FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 30 DAY
{{#params.category}}
  AND product_category = '{{{params.category}}}'
{{/params.category}}
GROUP BY sale_date, product_category
ORDER BY sale_date DESC
```
Then define the endpoint:

```yaml
url-path: /sales/daily/
template-source: daily_sales.sql
connection:
  - snowflake-prod

# CRITICAL: cache to avoid hitting Snowflake on every request
cache-template: daily_sales_cache.sql
cache-schedule: "0 */6 * * *"  # refresh every 6 hours

parameters:
  category:
    type: string
    description: Filter by product category
```
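Once flAPI is running, the endpoint behaves like any REST API. A quick smoke test with `curl` (host, port, and the exact response shape depend on your deployment and are illustrative here):

```bash
# Pass the optional "category" parameter defined above
curl "http://localhost:8080/sales/daily/?category=Electronics"

# Illustrative response, served from the DuckDB cache rather than Snowflake:
# [{"sale_date": "2024-06-01", "product_category": "Electronics",
#   "total_revenue": 12345.67, "unique_customers": 89}]
```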
### Cache Template

`daily_sales_cache.sql`:
```sql
-- Full refresh: pull data from Snowflake into the flAPI cache
SELECT
    sale_date,
    product_category,
    revenue,
    customer_id
FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 30 DAY
```
## Cost Optimization Strategies

### Strategy 1: Scheduled Refreshes (best for most use cases)

```yaml
url-path: /analytics/revenue/
cache-template: revenue_cache.sql

# Refresh during off-peak hours
cache-schedule: "0 1,7,13,19 * * *"  # 1am, 7am, 1pm, 7pm
```
**Cost comparison (10K API calls/day):**

- ❌ Direct Snowflake: ~$1,500/month
- ✅ Cached (4× refresh/day): ~$15/month
- **Savings: 99%**
### Strategy 2: Incremental Refresh

```sql
-- Only fetch new/updated records
SELECT *
FROM snowflake_scan('ANALYTICS.PUBLIC.ORDERS')
WHERE updated_at > (
    SELECT COALESCE(MAX(updated_at), TIMESTAMP '1970-01-01')
    FROM cached_orders
)
```

```yaml
cache-strategy: incremental-append
cache-schedule: "*/15 * * * *"  # every 15 minutes
```
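Putting the pieces together, an endpoint using this strategy could look like the following sketch (paths and file names are illustrative):

```yaml
url-path: /orders/
template-source: orders.sql
connection:
  - snowflake-prod

cache-template: orders_incremental_cache.sql  # the incremental query above
cache-strategy: incremental-append
cache-schedule: "*/15 * * * *"                # every 15 minutes
```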
### Strategy 3: Tiered Caching

```yaml
# Hot data: refresh frequently
url-path: /sales/today/
cache-schedule: "*/5 * * * *"   # every 5 minutes

# Warm data: less frequent
url-path: /sales/monthly/
cache-schedule: "0 */6 * * *"   # every 6 hours

# Cold data: rarely updated
url-path: /sales/yearly/
cache-schedule: "0 2 * * 0"     # weekly, Sundays at 2am
```
## Example: Real-Time Sales Dashboard

### Aggregated Metrics

`sales_metrics.sql`:
```sql
WITH daily_stats AS (
    SELECT
        sale_date,
        SUM(revenue) AS daily_revenue,
        COUNT(DISTINCT order_id) AS order_count,
        COUNT(DISTINCT customer_id) AS customer_count,
        AVG(revenue) AS avg_order_value
    FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
    WHERE sale_date >= CURRENT_DATE - INTERVAL 90 DAY
    GROUP BY sale_date
)
SELECT
    sale_date,
    daily_revenue,
    order_count,
    customer_count,
    ROUND(avg_order_value, 2) AS avg_order_value,
    -- Rolling 7-day average
    ROUND(AVG(daily_revenue) OVER (
        ORDER BY sale_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
    ), 2) AS revenue_7day_avg
FROM daily_stats
ORDER BY sale_date DESC
{{#params.limit}}
LIMIT {{{params.limit}}}
{{/params.limit}}
{{^params.limit}}
LIMIT 90
{{/params.limit}}
```
```yaml
url-path: /sales/metrics/
template-source: sales_metrics.sql
connection:
  - snowflake-prod

cache-template: sales_metrics_cache.sql
cache-schedule: "*/10 * * * *"  # every 10 minutes

parameters:
  limit:
    type: integer
    description: Number of days to return
    minimum: 1
    maximum: 365
```
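With the defaults above, callers get up to 90 days of metrics; the hypothetical request below asks for one week (host and port depend on your deployment):

```bash
# Last 7 days of daily metrics, including the rolling 7-day average
curl "http://localhost:8080/sales/metrics/?limit=7"
```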
### Customer Segmentation
```sql
WITH customer_lifetime AS (
    SELECT
        customer_id,
        COUNT(DISTINCT order_id) AS total_orders,
        SUM(revenue) AS lifetime_value,
        MIN(sale_date) AS first_purchase,
        MAX(sale_date) AS last_purchase,
        DATEDIFF('day', MAX(sale_date), CURRENT_DATE) AS days_since_last_purchase
    FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
    GROUP BY customer_id
)
SELECT
    customer_id,
    total_orders,
    ROUND(lifetime_value, 2) AS lifetime_value,
    first_purchase,
    last_purchase,
    days_since_last_purchase,
    CASE
        WHEN days_since_last_purchase <= 30 AND total_orders >= 5 THEN 'VIP Active'
        WHEN days_since_last_purchase <= 30 THEN 'Active'
        WHEN days_since_last_purchase <= 90 THEN 'At Risk'
        WHEN days_since_last_purchase <= 180 THEN 'Dormant'
        ELSE 'Lost'
    END AS segment
FROM customer_lifetime
{{#params.segment}}
-- DuckDB allows referencing the "segment" alias in WHERE
WHERE segment = '{{{params.segment}}}'
{{/params.segment}}
ORDER BY lifetime_value DESC
```
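To expose this query as an API, an endpoint definition following the same pattern as the earlier examples might look like this (path, file names, and schedule are illustrative):

```yaml
url-path: /customers/segments/
template-source: customer_segments.sql
connection:
  - snowflake-prod

cache-template: customer_segments_cache.sql
cache-schedule: "0 3 * * *"  # segments change slowly; refresh daily, off-peak

parameters:
  segment:
    type: string
    description: "Filter by segment, e.g. 'VIP Active' or 'At Risk'"
```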
## Performance Comparison

|                        | Direct Snowflake query  | flAPI cached query       |
|------------------------|-------------------------|--------------------------|
| Query time             | 500 ms to 5 s           | 1-10 ms (50-500× faster) |
| Cold start penalty     | 5-15 seconds            | none                     |
| Cost per query         | ~$0.01                  | $0 (served from cache)   |
| 10K queries/day        | $100/day = $3,000/month | $0                       |
| Cache refresh (4×/day) | n/a                     | $0.40/day = $12/month    |

**Savings: 99.6%**
## Best Practices

### 1. Use the Smallest Warehouse for Refreshes

```yaml
properties:
  warehouse: 'XSMALL_WH'  # not 'LARGE_WH'
```

Cost difference (at ~$2 per credit):

- X-Small: 1 credit/hour ≈ $2/hour
- Large: 8 credits/hour ≈ $16/hour
- **Savings: 8×**
### 2. Schedule Refreshes During Off-Peak Hours

Snowflake charges the same per credit around the clock, so the savings come from refreshing less often; off-peak runs also avoid contending with business-hours workloads:

```yaml
# BAD: hourly during business hours = 9 refreshes/day, competing with users
cache-schedule: "0 9-17 * * 1-5"

# GOOD: 4 refreshes/day, off-peak
cache-schedule: "0 1,7,13,19 * * *"
```
### 3. Materialize Complex Joins in Snowflake

Snowflake materialized views cannot contain joins, so pre-join into a table instead (refreshed by a task, or use a dynamic table):

```sql
-- In Snowflake: pre-join once, on a schedule
CREATE OR REPLACE TABLE ANALYTICS.PUBLIC.SALES_ENRICHED AS
SELECT
    s.*,
    c.customer_name,
    c.customer_segment,
    p.product_name,
    p.product_category
FROM SALES s
JOIN CUSTOMERS c ON s.customer_id = c.id
JOIN PRODUCTS p ON s.product_id = p.id;
```

```sql
-- Then query the pre-joined table (faster, cheaper)
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES_ENRICHED')
```
### 4. Partition by Time

```sql
-- Only fetch recent data in your cache template
WHERE sale_date >= CURRENT_DATE - INTERVAL 90 DAY
```
### 5. Use Column Pruning

```sql
-- BAD: pulls all columns
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')

-- GOOD: only needed columns (faster, cheaper)
SELECT
    sale_date,
    customer_id,
    revenue
FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
```
## Security

### Read-Only User

Create a dedicated read-only user in Snowflake:
```sql
-- In Snowflake
CREATE ROLE FLAPI_READER;

GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE FLAPI_READER;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE FLAPI_READER;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;
GRANT SELECT ON FUTURE TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;

CREATE USER flapi_reader
    PASSWORD = '...'
    DEFAULT_ROLE = FLAPI_READER;

GRANT ROLE FLAPI_READER TO USER flapi_reader;
```
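To confirm the role and user ended up with exactly the privileges you expect:

```sql
-- In Snowflake: verify grants
SHOW GRANTS TO ROLE FLAPI_READER;
SHOW GRANTS TO USER flapi_reader;
```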
### Credential Management

```yaml
connections:
  snowflake-prod:
    init: |
      CREATE SECRET (
          TYPE SNOWFLAKE,
          ACCOUNT '{{{conn.account}}}',
          USER '{{{conn.user}}}',
          PASSWORD '${SNOWFLAKE_PASSWORD}', -- from the environment, never hardcoded
          ROLE 'FLAPI_READER'               -- restricted role
      );
```
### Network Security

Use Snowflake network policies to restrict which IPs can authenticate:

```sql
-- In Snowflake: restrict IP access
CREATE NETWORK POLICY flapi_access
    ALLOWED_IP_LIST = ('203.0.113.10/32', '198.51.100.20/32');

ALTER USER flapi_reader SET NETWORK_POLICY = flapi_access;
```
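You can verify the policy is attached to the user:

```sql
-- In Snowflake: confirm the user's network policy
SHOW PARAMETERS LIKE 'NETWORK_POLICY' IN USER flapi_reader;
```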
## Troubleshooting

### Issue: "Authentication failed"

Test the connection outside flAPI. Note that `snowflake_scan` needs a secret in the same session, so create one just as the connection's `init` block does:

```bash
export SNOWFLAKE_PASSWORD="your-password"
duckdb -c "
INSTALL snowflake FROM community;
LOAD snowflake;
CREATE SECRET (TYPE SNOWFLAKE, ACCOUNT 'xy12345.us-east-1',
    USER 'flapi_reader', PASSWORD '$SNOWFLAKE_PASSWORD', WAREHOUSE 'COMPUTE_WH');
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES') LIMIT 10;
"
```
Issue: "Warehouse timeout"
Increase warehouse timeout or use auto-resume:
properties:
warehouse: 'COMPUTE_WH'
warehouse_timeout: 300 # 5 minutes
### Issue: Slow cache refreshes

Problem: a cache refresh takes 10+ minutes.

Solutions:

- Pre-join the data in Snowflake (see best practice 3)
- Partition by date so only recent data is fetched
- Use a larger warehouse just for refreshes, balancing cost against speed (see the sketch below)
- Switch to an incremental refresh strategy
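For the warehouse-sizing option, one approach is a second connection used only by cache templates, so regular endpoints keep the cheap warehouse. A sketch, with illustrative names:

```yaml
connections:
  snowflake-refresh:
    init: |
      INSTALL snowflake FROM community;
      LOAD snowflake;
      CREATE SECRET (
          TYPE SNOWFLAKE,
          ACCOUNT '{{{conn.account}}}',
          USER '{{{conn.user}}}',
          PASSWORD '${SNOWFLAKE_PASSWORD}',
          WAREHOUSE '{{{conn.warehouse}}}'
      );
    properties:
      account: 'xy12345.us-east-1'
      user: 'flapi_reader'
      warehouse: 'MEDIUM_WH'  # larger warehouse so refreshes finish faster
```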
### Issue: High Snowflake costs despite caching

Check your cache schedule:

```yaml
# Too frequent (expensive)
cache-schedule: "* * * * *"   # every minute = $$$

# Optimal (cheap)
cache-schedule: "0 */6 * * *" # every 6 hours
```
## Cost Calculator

Your scenario:

- API calls: 10,000/day
- Cache refreshes: 4/day
- Refresh duration: 5 minutes each
- Warehouse: X-Small ($2/hour)

Direct Snowflake:

```
10,000 calls × $0.01 per query = $100/day
$100/day × 30 days = $3,000/month
```

flAPI cached:

```
4 refreshes/day × 5 min × $2/hour ≈ $0.67/day
$0.67/day × 30 days ≈ $20/month
```

**Savings: $2,980/month (99.3%)**
## Migration from Direct Access

### Phase 1: Add the flAPI Layer

```yaml
# Keep existing Snowflake access; add flAPI alongside it
connections:
  snowflake-prod:
    init: |
      INSTALL snowflake FROM community;
      LOAD snowflake;
```
### Phase 2: Introduce Caching

```yaml
# Add a cache for high-frequency queries
cache-template: hot_data_cache.sql
cache-schedule: "*/10 * * * *"
```
### Phase 3: Monitor Costs

```sql
-- In Snowflake: check query costs.
-- Note: credits_used_cloud_services covers only the cloud-services layer;
-- warehouse compute is tracked in ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY.
SELECT
    query_text,
    COUNT(*) AS execution_count,
    AVG(total_elapsed_time) / 1000 AS avg_seconds,
    AVG(credits_used_cloud_services) AS avg_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_text LIKE '%flapi%'
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP)
GROUP BY query_text
ORDER BY avg_credits DESC;
```
## Next Steps

- **Caching Strategy**: Understand optimization techniques
- **BigQuery**: An alternative cloud warehouse integration
- **SQL Templating**: Master dynamic queries
- **Examples**: See caching in action
- **Deployment**: Deploy to production
## Additional Resources

- **DuckDB Snowflake Extension**: Official extension docs
- **Snowflake Documentation**: Complete Snowflake reference
- **Snowflake Cost Optimization**: Official cost guide

> One flAPI user saved $47,000/year by caching their customer dashboard queries. The cache refreshes 6×/day, users get instant responses, and Snowflake costs dropped by 98%. ROI payback: 3 days.