
Connecting to BigQuery

Extension Credit

This guide uses the bigquery extension by the DuckDB community. Thanks to the contributors who made BigQuery integration seamless!

Google BigQuery is one of the most popular cloud data warehouses. flAPI makes it easy to expose BigQuery data as fast, cost-effective APIs through its caching layer.

Why Use flAPI with BigQuery?

Direct BigQuery access for APIs is problematic:

  • Slow: Queries take 2-10 seconds
  • Expensive: $0.05+ per query (based on data scanned)
  • Not designed for high-frequency access: Poor for serving APIs

With flAPI caching:

  • ⚡ Millisecond responses (1-10ms vs 2-10s)
  • 💰 90%+ cost reduction through caching
  • 🎯 Purpose-built for API serving
  • 📊 Query massive datasets efficiently

DuckDB BigQuery Extension

flAPI uses the DuckDB BigQuery extension to connect to Google BigQuery. This community extension provides:

  • Direct BigQuery access via bigquery_scan() function
  • Google Cloud authentication (service accounts, ADC)
  • Push-down optimization - Filters sent to BigQuery
  • Schema inference - Automatic type mapping
  • Partition pruning - Efficient data access

Key Features:

  • Read tables and query results from BigQuery
  • Support for complex types (ARRAY, STRUCT)
  • Automatic credential discovery
  • Compatible with all BigQuery regions
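To see what the extension gives you before wiring it into flAPI, you can try it in a plain DuckDB session. A minimal sketch, assuming a hypothetical table `my-project.my_dataset.events` and already-configured Google Cloud credentials:

```sql
-- Sketch: exploring a BigQuery table from an interactive DuckDB session.
-- 'my-project.my_dataset.events' is a placeholder table name.
INSTALL 'bigquery';
LOAD 'bigquery';

SELECT
    event_type,
    COUNT(*) AS event_count
FROM bigquery_scan('my-project.my_dataset.events')
WHERE event_date = CURRENT_DATE  -- this filter is pushed down to BigQuery
GROUP BY event_type;
```

The same `bigquery_scan()` call is what you embed in flAPI endpoint and cache templates below.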

Setup

1. Install BigQuery Extension

# flapi.yaml
connections:
  bigquery-warehouse:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'my-project-id'
      dataset: 'my_dataset'  # Optional default dataset

2. Authentication

BigQuery authentication uses Google Cloud credentials.

Option A: Service Account (Recommended for Production)

connections:
  bigquery-warehouse:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'my-project-id'
      credentials_path: '${GOOGLE_APPLICATION_CREDENTIALS}'

# Set the environment variable before starting flAPI
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
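If you still need to create the service account, the steps can be scripted with standard gcloud commands. A sketch with placeholder names (`flapi-reader`, `my-project-id`), granting the two read-oriented roles BigQuery queries typically need:

```shell
# Sketch: provision a read-only service account for flAPI (names are placeholders)
gcloud iam service-accounts create flapi-reader \
  --project=my-project-id

# Read table data
gcloud projects add-iam-policy-binding my-project-id \
  --member="serviceAccount:flapi-reader@my-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.dataViewer"

# Run query jobs
gcloud projects add-iam-policy-binding my-project-id \
  --member="serviceAccount:flapi-reader@my-project-id.iam.gserviceaccount.com" \
  --role="roles/bigquery.jobUser"

# Download a key file and point flAPI at it
gcloud iam service-accounts keys create service-account.json \
  --iam-account=flapi-reader@my-project-id.iam.gserviceaccount.com
export GOOGLE_APPLICATION_CREDENTIALS=$PWD/service-account.json
```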

Option B: Default Credentials (Development)

connections:
  bigquery-warehouse:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'my-project-id'

# Authenticate with gcloud
gcloud auth application-default login

Querying BigQuery

Query Strategies Comparison

Direct Query (Without Cache)

-- Endpoint template
SELECT
    campaign_id,
    campaign_name,
    SUM(clicks) AS total_clicks,
    SUM(revenue) AS total_revenue
FROM bigquery_scan('my-project.analytics.campaigns')
WHERE date >= CURRENT_DATE - INTERVAL '7 days'
{{#params.country}}
  AND country = '{{{params.country}}}'
{{/params.country}}
GROUP BY 1, 2
ORDER BY total_revenue DESC

Cost: Every API call = 1 BigQuery query ($0.05+)

Cached Query (With flAPI)

Cache Template (sqls/campaigns_cache.sql):

-- Runs on schedule (e.g., hourly)
CREATE OR REPLACE TABLE {{cache.catalog}}.{{cache.schema}}.{{cache.table}} AS
SELECT
campaign_id,
campaign_name,
country,
campaign_type,
SUM(clicks) as total_clicks,
SUM(revenue) as total_revenue,
MAX(last_updated) as last_updated
FROM bigquery_scan('my-project.analytics.campaigns')
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1, 2, 3, 4

API Template (sqls/campaigns.sql):

-- Serves API requests from the cache
SELECT *
FROM {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
WHERE 1=1
{{#params.country}}
  AND country = '{{{params.country}}}'
{{/params.country}}
{{#params.campaign_type}}
  AND campaign_type = '{{{params.campaign_type}}}'
{{/params.campaign_type}}
ORDER BY total_revenue DESC
LIMIT 100

Endpoint Configuration (sqls/campaigns.yaml):

url-path: /campaigns/

cache:
  enabled: true
  table: campaigns_cache
  schema: analytics
  schedule: 60m  # Refresh hourly
  template_file: campaigns_cache.sql

request:
  - field-name: country
    field-in: query
    required: false
  - field-name: campaign_type
    field-in: query
    required: false

template-source: campaigns.sql
connection:
  - bigquery-warehouse

Cost Comparison

Scenario: Marketing Dashboard API

Requirements:

  • 10,000 API calls per day
  • BigQuery table: 100GB, costs $0.05 per full scan

Without flAPI (Direct BigQuery):

10,000 queries × $0.05 = $500/day = $15,000/month
Response time: 2-5 seconds per query

With flAPI (Hourly Cache):

24 cache refreshes × $0.05 = $1.20/day = $36/month
API serving: Free (from cache)
Response time: 1-50ms

Savings: $14,964/month (99.76%)
Speed: 1000-10,000x faster

Advanced Patterns

Incremental Loading

Load only new data to reduce BigQuery costs:

-- Cache refresh template
INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT *
FROM bigquery_scan('my-project.dataset.table')
WHERE event_timestamp > (
SELECT MAX(event_timestamp)
FROM {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
)
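Note that `MAX(event_timestamp)` returns NULL when the cache table is empty, so the filter above matches nothing on the very first refresh. A first-run-safe variant of the same sketch (same placeholder names), falling back to the epoch:

```sql
-- Sketch: incremental refresh that also handles an empty cache table
INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT *
FROM bigquery_scan('my-project.dataset.table')
WHERE event_timestamp > COALESCE(
    (SELECT MAX(event_timestamp)
     FROM {{cache.catalog}}.{{cache.schema}}.{{cache.table}}),
    TIMESTAMP '1970-01-01 00:00:00'  -- epoch fallback: load everything on first run
)
```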

Partitioned Tables

Query specific partitions to reduce costs:

SELECT *
FROM bigquery_scan('my-project.dataset.events')
WHERE _PARTITIONDATE = CURRENT_DATE

Multiple BigQuery Projects

Connect to multiple projects:

connections:
  bigquery-prod:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'prod-project'

  bigquery-analytics:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'analytics-project'

Best Practices

1. Always Use Caching for High-Frequency Access

Direct BigQuery queries are expensive. Use caching unless you need real-time data.

2. Pre-Aggregate in Cache

-- ✅ Good: pre-aggregated
CREATE OR REPLACE TABLE {{cache.table}} AS
SELECT
    date,
    country,
    SUM(revenue) AS revenue
FROM source
GROUP BY 1, 2

-- ❌ Bad: raw data (expensive to cache)
CREATE OR REPLACE TABLE {{cache.table}} AS
SELECT * FROM source

3. Optimize Cache Refresh Schedule

Balance freshness against cost:

  • Real-time dashboards: Every 15 minutes
  • Business reporting: Every hour
  • Analytics: Daily

4. Use Incremental Updates

For large tables, only refresh changed data:

-- Only last 24 hours
WHERE updated_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours'
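A plain INSERT over a rolling window would duplicate rows that were already cached and then updated in the source. One way to keep the window consistent is to delete it from the cache before re-inserting; a sketch under the same placeholder names:

```sql
-- Sketch: refresh only the trailing 24-hour window (placeholder table names)
DELETE FROM {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
WHERE updated_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours';

INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT *
FROM bigquery_scan('my-project.dataset.table')
WHERE updated_at >= CURRENT_TIMESTAMP - INTERVAL '24 hours';
```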

5. Monitor Cache Hit Rates

$ flapi cache status

campaigns_cache: 10,234 API calls / 24 refreshes = 426 calls per refresh
✅ Excellent ROI

Troubleshooting

Authentication Errors

Error: Failed to authenticate with BigQuery

Solution:

  1. Check that the credentials file exists
  2. Verify the service account has BigQuery permissions
  3. Ensure GOOGLE_APPLICATION_CREDENTIALS is set

# Test authentication
gcloud auth application-default login

Query Timeout

Error: BigQuery query timeout

Solution:

  1. Optimize your SQL query
  2. Add filters to reduce data scanned
  3. Use partitioned tables
  4. Increase timeout in configuration

Cost Alerts

Set up BigQuery cost monitoring in Google Cloud Console:

  1. Budget Alerts: Get notified when spending exceeds threshold
  2. Query Quotas: Limit per-user query costs
  3. flAPI Monitoring: Track cache refresh costs vs API usage
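Budget alerts can also be created from the command line. A sketch using the gcloud billing commands, where the billing account ID and amount are placeholders you would replace with your own:

```shell
# Sketch: monthly budget alert via the gcloud CLI
# (billing account ID and budget amount are placeholders)
gcloud billing budgets create \
  --billing-account=XXXXXX-XXXXXX-XXXXXX \
  --display-name="BigQuery API budget" \
  --budget-amount=100USD \
  --threshold-rule=percent=0.5 \
  --threshold-rule=percent=0.9
```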

Example: Complete Setup

Here's a complete working example:

flapi.yaml:

project_name: marketing-api
connections:
  bigquery-marketing:
    init: |
      INSTALL 'bigquery';
      LOAD 'bigquery';
    properties:
      project_id: 'my-project-id'
      credentials_path: '${GOOGLE_APPLICATION_CREDENTIALS}'
template:
  path: './sqls'

sqls/campaigns.yaml:

url-path: /campaigns/
cache:
  enabled: true
  table: campaigns_cache
  schema: analytics
  schedule: 60m
  template_file: campaigns_cache.sql
request:
  - field-name: country
    field-in: query
    required: false
    validators:
      - type: string
        pattern: '^[A-Z]{2}$'
template-source: campaigns.sql
connection:
  - bigquery-marketing

sqls/campaigns_cache.sql:

CREATE OR REPLACE TABLE analytics.campaigns_cache AS
SELECT
    campaign_id,
    campaign_type,
    country,
    SUM(clicks) AS clicks,
    SUM(revenue) AS revenue
FROM bigquery_scan('my-project.marketing.campaigns')
WHERE date >= CURRENT_DATE - INTERVAL '30 days'
GROUP BY 1, 2, 3

sqls/campaigns.sql:

SELECT *
FROM analytics.campaigns_cache
WHERE 1=1
{{#params.country}}
  AND country = '{{{params.country}}}'
{{/params.country}}
ORDER BY revenue DESC

Test it:

# Start flAPI
$ ./flapi -c flapi.yaml

# Call the API
$ curl "http://localhost:8080/campaigns/?country=US"

Next Steps
