Connecting to BigQuery
This guide uses the bigquery extension by the DuckDB community. Thanks to the contributors who made BigQuery integration seamless!
Google BigQuery is one of the most popular cloud data warehouses. flAPI makes it easy to expose BigQuery data as fast, cost-effective APIs through its caching layer.
Why Use flAPI with BigQuery?
Direct BigQuery access for APIs is problematic:
- Slow: Queries take 2-10 seconds
- Expensive: $0.05+ per query (based on data scanned)
- Not designed for high-frequency access: Poor for serving APIs
With flAPI caching:
- Millisecond responses (1-10ms vs 2-10s)
- 90%+ cost reduction through caching
- Purpose-built for API serving
- Query massive datasets efficiently
DuckDB BigQuery Extension
flAPI uses the DuckDB BigQuery extension to connect to Google BigQuery. This community extension provides:
- Direct BigQuery access via
bigquery_scan()function - Google Cloud authentication (service accounts, ADC)
- Push-down optimization — filters sent to BigQuery
- Schema inference — automatic type mapping
- Partition pruning — efficient data access
Setup
1. Install BigQuery Extension
# flapi.yaml
connections:
bigquery-warehouse:
init: |
INSTALL 'bigquery' FROM community;
LOAD 'bigquery';
properties:
project_id: 'my-project-id'
The connection property project_id is accessible inside SQL templates as {{ conn.project_id }}.
2. Authentication
The DuckDB BigQuery extension uses Google Application Default Credentials (ADC). Set the standard environment variable so the extension picks them up:
# Service account (recommended for production)
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account.json
# Or interactive (development)
gcloud auth application-default login
flAPI does not need to know about the credentials — the extension reads them directly. No credentials_path: property is required (or supported) by flAPI's connection block.
Querying BigQuery
Query Strategies Comparison
Direct Query (Without Cache)
-- Endpoint template
SELECT
campaign_id,
campaign_name,
SUM(clicks) AS total_clicks,
SUM(revenue) AS total_revenue
FROM bigquery_scan('{{ conn.project_id }}.analytics.campaigns')
WHERE date >= CURRENT_DATE - INTERVAL 7 DAY
{{#params.country}}
AND country = '{{{ params.country }}}'
{{/params.country}}
GROUP BY 1, 2
ORDER BY total_revenue DESC
Cost: Every API call = 1 BigQuery query ($0.05+)
With Caching (Recommended)
Cache Template (sqls/campaigns_cache.sql):
-- Runs on schedule (e.g., hourly)
CREATE OR REPLACE TABLE {{cache.catalog}}.{{cache.schema}}.{{cache.table}} AS
SELECT
campaign_id,
campaign_name,
country,
campaign_type,
SUM(clicks) AS total_clicks,
SUM(revenue) AS total_revenue,
MAX(last_updated) AS last_updated
FROM bigquery_scan('{{ conn.project_id }}.analytics.campaigns')
WHERE date >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY 1, 2, 3, 4
API Template (sqls/campaigns.sql):
SELECT *
FROM {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
WHERE 1=1
{{#params.country}}
AND country = '{{{ params.country }}}'
{{/params.country}}
{{#params.campaign_type}}
AND campaign_type = '{{{ params.campaign_type }}}'
{{/params.campaign_type}}
ORDER BY total_revenue DESC
LIMIT 100
Endpoint Configuration (sqls/campaigns.yaml):
url-path: /campaigns/
method: GET
cache:
enabled: true
table: campaigns_cache
schema: analytics
schedule: 60m
template-file: campaigns_cache.sql
request:
- field-name: country
field-in: query
required: false
validators:
- type: string
regex: '^[A-Z]{2}$'
- field-name: campaign_type
field-in: query
required: false
template-source: campaigns.sql
connection:
- bigquery-warehouse
Cost Comparison
Scenario: Marketing Dashboard API
Requirements:
- 10,000 API calls per day
- BigQuery table: 100GB, costs $0.05 per full scan
Without flAPI (Direct BigQuery):
10,000 queries × $0.05 = $500/day = $15,000/month
Response time: 2-5 seconds per query
With flAPI (Hourly Cache):
24 cache refreshes × $0.05 = $1.20/day = $36/month
API serving: Free (from cache)
Response time: 1-50ms
Savings: $14,964/month (99.76%)
Speed: 1000-10,000x faster
Advanced Patterns
Incremental Loading
Only load new data to reduce BigQuery costs:
-- Cache refresh template (incremental)
INSERT INTO {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
SELECT *
FROM bigquery_scan('{{ conn.project_id }}.dataset.table')
WHERE event_timestamp > (
SELECT MAX(event_timestamp)
FROM {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
)
Partitioned Tables
Query specific partitions to reduce costs:
SELECT *
FROM bigquery_scan('{{ conn.project_id }}.dataset.events')
WHERE _PARTITIONDATE = CURRENT_DATE
Multiple BigQuery Projects
connections:
bigquery-prod:
init: |
INSTALL 'bigquery' FROM community;
LOAD 'bigquery';
properties:
project_id: 'prod-project'
bigquery-analytics:
init: |
INSTALL 'bigquery' FROM community;
LOAD 'bigquery';
properties:
project_id: 'analytics-project'
Best Practices
1. Always Use Caching for High-Frequency Access
Direct BigQuery queries are expensive. Use caching unless you need real-time data.
2. Pre-Aggregate in Cache
-- Good: Pre-aggregated
CREATE OR REPLACE TABLE {{cache.table}} AS
SELECT
date,
country,
SUM(revenue) AS revenue
FROM source
GROUP BY 1, 2
-- Bad: Raw data (expensive to cache)
CREATE OR REPLACE TABLE {{cache.table}} AS
SELECT * FROM source
3. Optimize Cache Refresh Schedule
Balance freshness vs cost:
- Real-time dashboards: every 15 minutes
- Business reporting: every hour
- Analytics: daily
4. Use Incremental Updates
For large tables, only refresh changed data:
WHERE updated_at >= CURRENT_TIMESTAMP - INTERVAL 24 HOUR
Troubleshooting
Authentication Errors
Error: Failed to authenticate with BigQuery
Solution:
- Check credentials file exists
- Verify service account has BigQuery permissions
- Ensure
GOOGLE_APPLICATION_CREDENTIALSis set
gcloud auth application-default login
Query Timeout
Error: BigQuery query timeout
Solution:
- Optimize your SQL query
- Add filters to reduce data scanned
- Use partitioned tables
Cost Alerts
Set up BigQuery cost monitoring in Google Cloud Console:
- Budget Alerts: get notified when spending exceeds threshold
- Query Quotas: limit per-user query costs
- flAPI Monitoring: track cache refresh costs vs API usage
Example: Complete Setup
flapi.yaml:
project-name: marketing-api
template:
path: './sqls'
connections:
bigquery-marketing:
init: |
INSTALL 'bigquery' FROM community;
LOAD 'bigquery';
properties:
project_id: 'my-project-id'
sqls/campaigns.yaml:
url-path: /campaigns/
method: GET
cache:
enabled: true
table: campaigns_cache
schema: analytics
schedule: 60m
template-file: campaigns_cache.sql
request:
- field-name: country
field-in: query
required: false
validators:
- type: string
regex: '^[A-Z]{2}$'
template-source: campaigns.sql
connection:
- bigquery-marketing
sqls/campaigns_cache.sql:
CREATE OR REPLACE TABLE {{cache.catalog}}.{{cache.schema}}.{{cache.table}} AS
SELECT
campaign_id,
campaign_type,
country,
SUM(clicks) AS clicks,
SUM(revenue) AS revenue
FROM bigquery_scan('{{ conn.project_id }}.marketing.campaigns')
WHERE date >= CURRENT_DATE - INTERVAL 30 DAY
GROUP BY 1, 2, 3
sqls/campaigns.sql:
SELECT *
FROM {{cache.catalog}}.{{cache.schema}}.{{cache.table}}
WHERE 1=1
{{#params.country}}
AND country = '{{{ params.country }}}'
{{/params.country}}
ORDER BY revenue DESC
Test it:
# Start flAPI
$ ./flapi -c flapi.yaml
# Call API
$ curl 'http://localhost:8080/campaigns?country=US'
Next Steps
- Caching Strategy: Learn about cost optimization
- Examples: See complete BigQuery + caching example
- PostgreSQL: Connect to other databases