
Snowflake Integration

Extension Credit

This guide uses the snowflake extension by the DuckDB community. Thanks to the contributors who made Snowflake integration possible!

Turn your Snowflake data warehouse into a high-performance REST API with 99%+ cost savings through intelligent caching. Perfect for analytics APIs, dashboards, and customer-facing data products.

Why Use Snowflake with flAPI?

"The Snowflake caching layer that pays for itself in week 1."

Snowflake is powerful but expensive for high-frequency access:

  • 💰 $2/hour minimum for smallest warehouse
  • 🐌 Cold start delays (5-15 seconds)
  • 💸 Compute costs scale with query complexity
  • 📊 Per-query billing adds up fast

flAPI solves this:

  • 99%+ cost reduction through caching
  • 1-10ms response time (vs 500ms-5s)
  • No warehouse always-on costs
  • Scheduled refreshes during off-peak hours
  • DuckDB-powered queries (faster for small-medium datasets)

Real-World Use Cases

📊 Customer Analytics Dashboard

Problem: Dashboard queries hit Snowflake 10,000×/day ≈ $3,000/month
Solution: Cache refreshed 4×/day ≈ $20/month (99%+ savings)

🏢 Internal Reporting API

Problem: Sales team's BI tool hammers Snowflake constantly
Solution: flAPI cache serves instant results, refresh hourly

📱 Mobile App Analytics

Problem: Every app open = expensive Snowflake query
Solution: Pre-computed aggregates cached, sub-10ms to users

🤖 AI Agent Data Access

Problem: LLMs need fast access to company data
Solution: flAPI + MCP provides instant Snowflake access for AI

Architecture

Clients call flAPI, which serves responses from its local DuckDB cache; Snowflake is only queried when the cache refreshes on its schedule.

Installation

Prerequisites

  1. Snowflake Account with credentials
  2. DuckDB Snowflake Extension

Extension Setup

flapi.yaml
connections:
  snowflake-prod:
    init: |
      -- Install the Snowflake extension
      INSTALL snowflake FROM community;
      LOAD snowflake;

      -- Authenticate with Snowflake
      CREATE SECRET (
        TYPE SNOWFLAKE,
        ACCOUNT '{{{conn.account}}}',
        USER '{{{conn.user}}}',
        PASSWORD '${SNOWFLAKE_PASSWORD}',
        WAREHOUSE '{{{conn.warehouse}}}',
        DATABASE '{{{conn.database}}}',
        SCHEMA '{{{conn.schema}}}'
      );
    properties:
      account: 'xy12345.us-east-1'
      user: 'flapi_reader'
      warehouse: 'COMPUTE_WH'
      database: 'ANALYTICS'
      schema: 'PUBLIC'

template:
  environment-whitelist:
    - '^SNOWFLAKE_.*'

Environment Variables

.env
SNOWFLAKE_PASSWORD=your-password-here

Basic Configuration

Simple Query

sqls/daily_sales.sql
SELECT
  sale_date,
  product_category,
  SUM(revenue) as total_revenue,
  COUNT(DISTINCT customer_id) as unique_customers
FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 30 DAY
{{#params.category}}
  AND product_category = '{{{params.category}}}'
{{/params.category}}
GROUP BY sale_date, product_category
ORDER BY sale_date DESC
sqls/daily_sales.yaml
url-path: /sales/daily/
template-source: daily_sales.sql
connection:
  - snowflake-prod
# CRITICAL: cache to avoid hitting Snowflake on every request
cache-template: daily_sales_cache.sql
cache-schedule: "0 */6 * * *" # Refresh every 6 hours

parameters:
  category:
    type: string
    description: Filter by product category
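
Once the endpoint is defined, it can be called like any REST API. A quick smoke test with curl (the host and port here are assumptions; use whatever address your flAPI instance listens on):

curl "http://localhost:8080/sales/daily/?category=Electronics"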

Cache Template

sqls/daily_sales_cache.sql
-- Full refresh: pull data from Snowflake into the flAPI cache
SELECT
  sale_date,
  product_category,
  revenue,
  customer_id
FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
WHERE sale_date >= CURRENT_DATE - INTERVAL 30 DAY

Cost Optimization Strategies

Strategy 1: Scheduled Refreshes (Best for most use cases)

sqls/analytics.yaml
url-path: /analytics/revenue/
cache-template: revenue_cache.sql
# Refresh during off-peak hours
cache-schedule: "0 1,7,13,19 * * *" # 1am, 7am, 1pm, 7pm

Cost comparison (10K API calls/day):

  • ❌ Direct Snowflake: ~$1,500/month
  • ✅ Cached (4x refresh): ~$15/month
  • Savings: 99%

Strategy 2: Incremental Refresh

sqls/orders_cache.sql
-- Only fetch new/updated records
SELECT *
FROM snowflake_scan('ANALYTICS.PUBLIC.ORDERS')
WHERE updated_at > (
  SELECT COALESCE(MAX(updated_at), '1970-01-01')
  FROM cached_orders
)

And in the endpoint's YAML configuration:

cache-strategy: incremental-append
cache-schedule: "*/15 * * * *" # Every 15 minutes

Strategy 3: Tiered Caching

# Hot data: Refresh frequently
url-path: /sales/today/
cache-schedule: "*/5 * * * *" # Every 5 minutes

# Warm data: Less frequent
url-path: /sales/monthly/
cache-schedule: "0 */6 * * *" # Every 6 hours

# Cold data: Rarely updated
url-path: /sales/yearly/
cache-schedule: "0 2 * * 0" # Weekly, Sundays at 2am

Example: Real-Time Sales Dashboard

Aggregated Metrics

sqls/sales_metrics.sql
WITH daily_stats AS (
  SELECT
    sale_date,
    SUM(revenue) as daily_revenue,
    COUNT(DISTINCT order_id) as order_count,
    COUNT(DISTINCT customer_id) as customer_count,
    AVG(revenue) as avg_order_value
  FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
  WHERE sale_date >= CURRENT_DATE - INTERVAL 90 DAY
  GROUP BY sale_date
)
SELECT
  sale_date,
  daily_revenue,
  order_count,
  customer_count,
  ROUND(avg_order_value, 2) as avg_order_value,
  -- Rolling 7-day average
  ROUND(AVG(daily_revenue) OVER (
    ORDER BY sale_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ), 2) as revenue_7day_avg
FROM daily_stats
ORDER BY sale_date DESC
{{#params.limit}}
LIMIT {{{params.limit}}}
{{/params.limit}}
{{^params.limit}}
LIMIT 90
{{/params.limit}}
sqls/sales_metrics.yaml
url-path: /sales/metrics/
template-source: sales_metrics.sql
connection:
  - snowflake-prod
cache-template: sales_metrics_cache.sql
cache-schedule: "*/10 * * * *" # Every 10 minutes

parameters:
  limit:
    type: integer
    description: Number of days to return
    minimum: 1
    maximum: 365
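
Because limit is validated against the declared bounds, clients can request exactly the window they need (again assuming a local flAPI instance):

curl "http://localhost:8080/sales/metrics/?limit=30"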

Customer Segmentation

sqls/customer_segments.sql
WITH customer_lifetime AS (
  SELECT
    customer_id,
    COUNT(DISTINCT order_id) as total_orders,
    SUM(revenue) as lifetime_value,
    MIN(sale_date) as first_purchase,
    MAX(sale_date) as last_purchase,
    DATEDIFF('day', MAX(sale_date), CURRENT_DATE) as days_since_last_purchase
  FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')
  GROUP BY customer_id
)
SELECT
  customer_id,
  total_orders,
  ROUND(lifetime_value, 2) as lifetime_value,
  first_purchase,
  last_purchase,
  days_since_last_purchase,
  CASE
    WHEN days_since_last_purchase <= 30 AND total_orders >= 5 THEN 'VIP Active'
    WHEN days_since_last_purchase <= 30 THEN 'Active'
    WHEN days_since_last_purchase <= 90 THEN 'At Risk'
    WHEN days_since_last_purchase <= 180 THEN 'Dormant'
    ELSE 'Lost'
  END as segment
FROM customer_lifetime
{{#params.segment}}
WHERE segment = '{{{params.segment}}}'
{{/params.segment}}
ORDER BY lifetime_value DESC
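
This template still needs a matching endpoint config. A sketch following the same pattern as daily_sales.yaml (the file name, cache template, and schedule are illustrative):

sqls/customer_segments.yaml
url-path: /customers/segments/
template-source: customer_segments.sql
connection:
  - snowflake-prod
cache-template: customer_segments_cache.sql
cache-schedule: "0 3 * * *" # Nightly; segments change slowly

parameters:
  segment:
    type: string
    description: Filter by segment (e.g. 'VIP Active', 'At Risk')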

Performance Comparison

Direct Snowflake Query

Query time: 500ms - 5 seconds
Cost per query: ~$0.01
10K queries/day: $100/day = $3,000/month
Cold start penalty: 5-15 seconds

flAPI Cached Query

Query time: 1-10ms (50-500x faster)
Cost per query: $0 (cached)
10K queries/day: $0
Cache refresh (4x/day, ~5 min each): $0.67/day = $20/month
Savings: 99.3%

Best Practices

1. Use Smallest Warehouse for Refreshes

properties:
  warehouse: 'XSMALL_WH' # Not 'LARGE_WH'

Cost difference:

  • X-Small: $2/hour
  • Large: $32/hour
  • Savings: 16x

2. Schedule Refreshes During Off-Peak

Snowflake credits cost the same at any hour; the savings come from refreshing less often and avoiding contention with business-hours workloads:

# BAD: Hourly refreshes during business hours (9 per weekday)
cache-schedule: "0 9-17 * * 1-5"

# GOOD: Four refreshes spread across quiet hours
cache-schedule: "0 1,7,13,19 * * *" # 1am, 7am, 1pm, 7pm

3. Materialize Complex Joins in Snowflake

-- Snowflake materialized views cannot contain joins, so pre-join
-- with a dynamic table (or a scheduled CTAS) instead
CREATE DYNAMIC TABLE ANALYTICS.PUBLIC.SALES_ENRICHED
  TARGET_LAG = '1 hour'
  WAREHOUSE = COMPUTE_WH
AS
SELECT
  s.*,
  c.customer_name,
  c.customer_segment,
  p.product_name,
  p.product_category
FROM SALES s
JOIN CUSTOMERS c ON s.customer_id = c.id
JOIN PRODUCTS p ON s.product_id = p.id;

-- Then query the pre-joined table (faster, cheaper)
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES_ENRICHED')

4. Partition by Time

-- Only fetch recent data
WHERE sale_date >= CURRENT_DATE - INTERVAL 90 DAY

5. Use Column Pruning

-- BAD: Pulls all columns
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')

-- GOOD: Only the needed columns (faster, cheaper)
SELECT
  sale_date,
  customer_id,
  revenue
FROM snowflake_scan('ANALYTICS.PUBLIC.SALES')

Security

Read-Only User

Create a dedicated read-only user in Snowflake:

-- In Snowflake
CREATE ROLE FLAPI_READER;
GRANT USAGE ON WAREHOUSE COMPUTE_WH TO ROLE FLAPI_READER;
GRANT USAGE ON DATABASE ANALYTICS TO ROLE FLAPI_READER;
GRANT USAGE ON SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;
GRANT SELECT ON ALL TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;
GRANT SELECT ON FUTURE TABLES IN SCHEMA ANALYTICS.PUBLIC TO ROLE FLAPI_READER;

CREATE USER flapi_reader
  PASSWORD = '...'
  DEFAULT_ROLE = FLAPI_READER;

GRANT ROLE FLAPI_READER TO USER flapi_reader;

Credential Management

connections:
  snowflake-prod:
    init: |
      CREATE SECRET (
        TYPE SNOWFLAKE,
        ACCOUNT '{{{conn.account}}}',
        USER '{{{conn.user}}}',
        PASSWORD '${SNOWFLAKE_PASSWORD}', -- from the environment, never hardcoded
        ROLE 'FLAPI_READER' -- restricted role
      );

Network Security

Use Snowflake network policies:

-- In Snowflake: Restrict IP access
CREATE NETWORK POLICY flapi_access
ALLOWED_IP_LIST = ('203.0.113.10/32', '198.51.100.20/32');

ALTER USER flapi_reader SET NETWORK_POLICY = flapi_access;

Troubleshooting

Issue: "Authentication failed"

# Test the connection (create the same secret as in flapi.yaml first)
export SNOWFLAKE_PASSWORD="your-password"
duckdb -c "
INSTALL snowflake FROM community;
LOAD snowflake;
SELECT * FROM snowflake_scan('ANALYTICS.PUBLIC.SALES') LIMIT 10;
"

Issue: "Warehouse timeout"

Increase warehouse timeout or use auto-resume:

properties:
  warehouse: 'COMPUTE_WH'
  warehouse_timeout: 300 # 5 minutes
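
On the Snowflake side, auto-suspend and auto-resume keep the warehouse from idling (and billing) between refreshes; this is standard Snowflake DDL:

-- In Snowflake: suspend after 60s idle, wake automatically on the next query
ALTER WAREHOUSE COMPUTE_WH SET
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE;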

Issue: Slow cache refreshes

Problem: Cache refresh takes 10+ minutes

Solutions:

  1. Use materialized views in Snowflake
  2. Partition by date (only recent data)
  3. Use a larger warehouse for refreshes, balancing cost vs. speed (see the sketch after this list)
  4. Consider incremental refresh strategy
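
For option 3, one approach is a second connection reserved for heavy refreshes, which slow endpoints then list under connection:. A minimal sketch (the connection name snowflake-refresh and MEDIUM_WH are hypothetical):

connections:
  snowflake-refresh:
    init: |
      INSTALL snowflake FROM community;
      LOAD snowflake;
    properties:
      warehouse: 'MEDIUM_WH' # larger warehouse, used only for cache refreshes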

Issue: High Snowflake costs despite caching

Check your cache schedule:

# Too frequent (expensive)
cache-schedule: "* * * * *" # Every minute = $$$

# Optimal (cheap)
cache-schedule: "0 */6 * * *" # Every 6 hours

Cost Calculator

Your scenario:

  • API calls: 10,000/day
  • Cache refreshes: 4/day
  • Refresh duration: 5 minutes each
  • Warehouse: X-Small ($2/hour)

Direct Snowflake:

10,000 calls × $0.01 per query = $100/day
$100 × 30 days = $3,000/month

flAPI Cached:

4 refreshes/day × 5 min × $2/hour = $0.67/day
$0.67 × 30 days = $20/month
Savings: $2,980/month (99.3%)

Migration from Direct Access

Phase 1: Add flAPI Layer

# Keep existing Snowflake access; add flAPI alongside
connections:
  snowflake-prod:
    init: |
      INSTALL snowflake FROM community;
      LOAD snowflake;

Phase 2: Introduce Caching

# Add cache for high-frequency queries
cache-template: hot_data_cache.sql
cache-schedule: "*/10 * * * *"

Phase 3: Monitor Costs

-- In Snowflake: check query costs
SELECT
  query_text,
  COUNT(*) as execution_count,
  AVG(total_elapsed_time)/1000 as avg_seconds,
  AVG(credits_used_cloud_services) as avg_credits
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE query_text LIKE '%flapi%'
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP)
GROUP BY query_text
ORDER BY avg_credits DESC;

Cost Savings

One flAPI user saved $47,000/year by caching their customer dashboard queries. The cache refreshes 6x/day, users get instant responses, and Snowflake costs dropped by 98%. ROI payback: 3 days.
