Connecting Hub to Airbyte
Use Airbyte to sync Formbricks Hub's feedback data into Snowflake, Databricks, BigQuery, Redshift, or any other supported warehouse - one Postgres source, many destinations.
Hub stores all data in a single PostgreSQL database. Airbyte is an open-source, self-hostable data integration platform with a ready-made PostgreSQL source connector and dozens of warehouse destinations. Point Airbyte at Hub’s feedback_records table and it will replicate the data on a schedule into Snowflake, Databricks, BigQuery, Redshift, or any other supported destination.
Summary
Section titled “Summary”| Step | Action |
|---|---|
| 1 | Run Hub, worker, and Postgres (Quick Start) |
| 2 | Install Airbyte OSS with abctl and open the UI on http://localhost:8000 |
| 3 | Add a Postgres source pointing at Hub’s database; replicate public.feedback_records with Standard incremental sync using updated_at as cursor |
| 4 | Add a destination (Snowflake, Databricks, BigQuery, Redshift, etc.) with its account/warehouse/credentials |
| 5 | Create a connection from source -> destination, pick Incremental | Append + Deduped, choose a schedule (cron or hourly+), and trigger the first sync |
Why Hub + Airbyte?
Section titled “Why Hub + Airbyte?”- One source, many destinations - Hub’s data is a standard PostgreSQL table. Airbyte’s Postgres source connector reads it, and any of Airbyte’s warehouse destinations (Snowflake, Databricks, BigQuery, Redshift, ClickHouse, …) can receive it.
- Open source and self-hostable - Airbyte OSS runs on your own infrastructure; no third-party SaaS is involved.
- Incremental sync out of the box - Hub’s
updated_atcolumn is a perfect cursor for incremental replication; you only move new and changed rows after the initial backfill. - Schema-aware - Airbyte maps Hub’s columns to native warehouse types (including
metadataJSONB to SnowflakeVARIANT, BigQueryJSON, RedshiftSUPER, etc.).
Why not webhooks?
Section titled “Why not webhooks?”Hub’s Webhooks are designed for event-driven workflows - alerting, routing a low NPS to a Slack channel, triggering automations. For analytics warehouse use cases, a scheduled Postgres pull via Airbyte is simpler and more reliable:
- Historical backfill - Airbyte replays the full table on first run; webhooks only fire for new events.
- At-least-once with deduplication - Incremental + Deduped lands one row per
idregardless of retries. - Schema evolution - Airbyte adds new columns to the destination automatically.
- One pipeline, many warehouses - Switch or add destinations without changing Hub.
Step 1: Get Hub running
Section titled “Step 1: Get Hub running”If Formbricks Hub is not running yet, follow the Quick Start Guide to start Postgres, apply migrations automatically, and start the Hub API and worker.
Note the database connection details you use for Hub. You will need them when configuring Airbyte’s Postgres source.
| Setting | Example value |
|---|---|
| Host | host.docker.internal (Airbyte runs in containers and reaches Hub’s Postgres on the host), or Hub’s Postgres host on your network |
| Port | 5432 |
| Database name | hub |
| Username | formbricks |
| Password | Your Postgres password (in the Quick Start compose setup this is POSTGRES_PASSWORD in .env; default is formbricks_dev) |
Step 2: Install Airbyte
Section titled “Step 2: Install Airbyte”Airbyte OSS ships with abctl, a small CLI that provisions a local Kubernetes-in-Docker cluster and installs Airbyte via Helm. The legacy docker-compose.yml install method is deprecated.
Prerequisites
Section titled “Prerequisites”- Docker running locally
- At least 4 CPUs and 8 GB RAM available (add
--low-resource-modefor smaller machines)
Install
Section titled “Install”# Install abctlcurl -LsfS https://get.airbyte.com | bash -
# Provision the cluster and install Airbyte (can take up to 30 minutes the first time)abctl local installOpen the UI
Section titled “Open the UI”# Print the generated admin credentialsabctl local credentialsBrowse to http://localhost:8000 and log in. To set a custom password:
abctl local credentials --password <your-password>Step 3: Configure Hub’s Postgres as a source
Section titled “Step 3: Configure Hub’s Postgres as a source”- In the Airbyte UI, go to Sources -> + New source and select Postgres.
- Fill in the connection fields from the Step 1 table:
- Host -
host.docker.internal(or Hub’s Postgres host) - Port -
5432 - Database Name -
hub - Schemas -
public - Username -
formbricks - Password - Your Postgres password
- SSL Mode -
disablefor local,requirefor production
- Host -
- Under Update Method, choose Standard (Airbyte’s Postgres source offers Standard, Xmin, and CDC; Standard means cursor-based incremental with a user-defined cursor).
- Click Set up source. Airbyte will run a connection check and discover the schema. You will pick
updated_atas the cursor field per stream in Step 5.
Optional: a read-only user for Airbyte
Section titled “Optional: a read-only user for Airbyte”For production, create a dedicated read-only user instead of reusing the formbricks superuser.
-- All-tenant replication (default)CREATE USER airbyte WITH PASSWORD '<strong-password>';GRANT CONNECT ON DATABASE hub TO airbyte;GRANT USAGE ON SCHEMA public TO airbyte;GRANT SELECT ON public.feedback_records TO airbyte;Tenant-scoped replication
Section titled “Tenant-scoped replication”To restrict Airbyte to a single tenant, create a view that filters by tenant_id and grant on the view only:
CREATE USER airbyte_acme WITH PASSWORD '<strong-password>';GRANT CONNECT ON DATABASE hub TO airbyte_acme;GRANT USAGE ON SCHEMA public TO airbyte_acme;
CREATE VIEW public.feedback_records_acme AS SELECT * FROM public.feedback_records WHERE tenant_id = 'acme';
GRANT SELECT ON public.feedback_records_acme TO airbyte_acme;Then in Step 3, set Schemas to public, and in Step 5 enable only the feedback_records_acme stream. Use Standard cursor-based replication exactly as before - Postgres views work transparently with Airbyte’s Postgres source.
Step 4: Configure a destination
Section titled “Step 4: Configure a destination”Go to Destinations -> + New destination and pick the warehouse you want to load Hub data into. The fields below are the minimum you need; consult the linked Airbyte docs for the full reference.
Snowflake
Section titled “Snowflake”- Account name - For example,
xy12345.us-east-1 - Warehouse, Database, Default Schema, Role
- Auth - Username + password or key-pair authentication
Airbyte ships a SQL script in the destination docs that creates the warehouse, database, role, and user with the right grants.
Databricks
Section titled “Databricks”- Server Hostname and HTTP Path for a Databricks SQL warehouse
- Personal Access Token
- Catalog and Schema (Unity Catalog)
- A staging location (Unity Catalog volume or cloud bucket)
BigQuery
Section titled “BigQuery”- Project ID, Dataset Location, Default Dataset ID
- Service Account Key (JSON)
- Loading Method - GCS Staging recommended for larger loads
Redshift
Section titled “Redshift”- Host, Port, Database, Username, Password, Schema
- S3 staging bucket and IAM credentials
Other supported destinations include Postgres, ClickHouse, S3 (raw files), DuckDB, and many more - see the full list.
Step 5: Create the connection
Section titled “Step 5: Create the connection”- Go to Connections -> + New connection.
- Select the Hub Postgres source and your destination.
- In the Streams list, enable only
public.feedback_records(leave any internal tables disabled). For that stream:- Sync mode - Incremental | Append + Deduped
- Cursor field -
updated_at - Primary key -
id
- Configure the destination namespace and stream name to taste (for example, schema
hub_raw, streamfeedback_records). - Pick a schedule:
- Manual - Trigger from the UI or an orchestrator (Airflow, Dagster, Kestra).
- Scheduled - Every 1, 2, 3, 6, 8, 12, or 24 hours (hourly is the maximum frequency).
- Cron - Quartz cron expression with timezone, for fine-grained scheduling.
- Click Set up connection. Airbyte will run an initial full snapshot, then only pick up rows where
updated_athas advanced.
Query in the destination
Section titled “Query in the destination”After the first sync completes, you can query feedback_records in the destination warehouse using the same dimensions and metrics as everywhere else in Hub. The metadata JSONB column is preserved as a native semi-structured type in the destination:
- Snowflake -
VARIANT(usemetadata:campaign_id::string) - BigQuery -
JSON(useJSON_VALUE(metadata, '$.campaign_id')) - Redshift -
SUPER(usemetadata.campaign_id) - Databricks -
STRING(parse withfrom_json(metadata, 'map<string,string>')as needed)
NPS trend over time (any warehouse, SQL is portable):
SELECT date_trunc('week', collected_at) AS week, AVG(value_number) AS avg_nps, COUNT(*) AS responsesFROM hub_raw.feedback_recordsWHERE field_type = 'nps' AND value_number IS NOT NULLGROUP BY 1ORDER BY 1;Feedback count by source:
SELECT source_type, COUNT(*) AS countFROM hub_raw.feedback_recordsGROUP BY source_typeORDER BY count DESC;Quick reference: dimensions and metrics
Section titled “Quick reference: dimensions and metrics”For the complete list of columns and field types, see the Data Model.
| Use case | Dimension / filter | Metric |
|---|---|---|
| NPS/CSAT over time | collected_at; filter field_type = nps | AVG(value_number) or custom NPS % |
| Count by source | source_type | COUNT(*) |
| Score by question | field_id or field_label | AVG(value_number) |
| Categorical breakdown | value_text (for example, device, segment) | COUNT(*) |
| Multi-tenant | Filter tenant_id | any |
Once Airbyte has a working source + destination + connection, Hub feedback data flows into your warehouse on the schedule you choose - no custom ETL to maintain.
References
Section titled “References”Hub
- Quick Start - Run Hub, worker, and Postgres
- Data Model - Table schema, columns, and field types for
feedback_records - Webhooks - Event-driven push (use this for alerts and automation; use Airbyte for analytics replication)
- Connecting Hub to Databricks - Direct JDBC or Lakehouse Federation when you do not need ETL
- Connecting Hub to Superset - Direct dashboards on Hub’s Postgres
- Connecting Hub to Power BI - Direct reports on Hub’s Postgres
Airbyte
- OSS Quickstart - Install with
abctl - Postgres source connector - All connection options, CDC vs cursor-based
- Destinations catalog - Snowflake, Databricks, BigQuery, Redshift, and more
- Sync schedules - Manual, scheduled, and cron
- Supported data types - How JSONB and other types map to destinations