Skip to content
Support

Connecting Hub to Airbyte

Use Airbyte to sync Formbricks Hub's feedback data into Snowflake, Databricks, BigQuery, Redshift, or any other supported warehouse - one Postgres source, many destinations.

Hub stores all data in a single PostgreSQL database. Airbyte is an open-source, self-hostable data integration platform with a ready-made PostgreSQL source connector and dozens of warehouse destinations. Point Airbyte at Hub’s feedback_records table and it will replicate the data on a schedule into Snowflake, Databricks, BigQuery, Redshift, or any other supported destination.

StepAction
1Run Hub, worker, and Postgres (Quick Start)
2Install Airbyte OSS with abctl and open the UI on http://localhost:8000
3Add a Postgres source pointing at Hub’s database; replicate public.feedback_records with Standard incremental sync using updated_at as cursor
4Add a destination (Snowflake, Databricks, BigQuery, Redshift, etc.) with its account/warehouse/credentials
5Create a connection from source -> destination, pick Incremental | Append + Deduped, choose a schedule (cron or hourly+), and trigger the first sync
  • One source, many destinations - Hub’s data is a standard PostgreSQL table. Airbyte’s Postgres source connector reads it, and any of Airbyte’s warehouse destinations (Snowflake, Databricks, BigQuery, Redshift, ClickHouse, …) can receive it.
  • Open source and self-hostable - Airbyte OSS runs on your own infrastructure; no third-party SaaS is involved.
  • Incremental sync out of the box - Hub’s updated_at column is a perfect cursor for incremental replication; you only move new and changed rows after the initial backfill.
  • Schema-aware - Airbyte maps Hub’s columns to native warehouse types (including metadata JSONB to Snowflake VARIANT, BigQuery JSON, Redshift SUPER, etc.).

Hub’s Webhooks are designed for event-driven workflows - alerting, routing a low NPS to a Slack channel, triggering automations. For analytics warehouse use cases, a scheduled Postgres pull via Airbyte is simpler and more reliable:

  • Historical backfill - Airbyte replays the full table on first run; webhooks only fire for new events.
  • At-least-once with deduplication - Incremental + Deduped lands one row per id regardless of retries.
  • Schema evolution - Airbyte adds new columns to the destination automatically.
  • One pipeline, many warehouses - Switch or add destinations without changing Hub.

If Formbricks Hub is not running yet, follow the Quick Start Guide to start Postgres, apply migrations automatically, and start the Hub API and worker.

Note the database connection details you use for Hub. You will need them when configuring Airbyte’s Postgres source.

SettingExample value
Hosthost.docker.internal (Airbyte runs in containers and reaches Hub’s Postgres on the host), or Hub’s Postgres host on your network
Port5432
Database namehub
Usernameformbricks
PasswordYour Postgres password (in the Quick Start compose setup this is POSTGRES_PASSWORD in .env; default is formbricks_dev)

Airbyte OSS ships with abctl, a small CLI that provisions a local Kubernetes-in-Docker cluster and installs Airbyte via Helm. The legacy docker-compose.yml install method is deprecated.

  • Docker running locally
  • At least 4 CPUs and 8 GB RAM available (add --low-resource-mode for smaller machines)
Terminal window
# Install abctl
curl -LsfS https://get.airbyte.com | bash -
# Provision the cluster and install Airbyte (can take up to 30 minutes the first time)
abctl local install
Terminal window
# Print the generated admin credentials
abctl local credentials

Browse to http://localhost:8000 and log in. To set a custom password:

Terminal window
abctl local credentials --password <your-password>

Step 3: Configure Hub’s Postgres as a source

Section titled “Step 3: Configure Hub’s Postgres as a source”
  1. In the Airbyte UI, go to Sources -> + New source and select Postgres.
  2. Fill in the connection fields from the Step 1 table:
    • Host - host.docker.internal (or Hub’s Postgres host)
    • Port - 5432
    • Database Name - hub
    • Schemas - public
    • Username - formbricks
    • Password - Your Postgres password
    • SSL Mode - disable for local, require for production
  3. Under Update Method, choose Standard (Airbyte’s Postgres source offers Standard, Xmin, and CDC; Standard means cursor-based incremental with a user-defined cursor).
  4. Click Set up source. Airbyte will run a connection check and discover the schema. You will pick updated_at as the cursor field per stream in Step 5.

For production, create a dedicated read-only user instead of reusing the formbricks superuser.

-- All-tenant replication (default)
CREATE USER airbyte WITH PASSWORD '<strong-password>';
GRANT CONNECT ON DATABASE hub TO airbyte;
GRANT USAGE ON SCHEMA public TO airbyte;
GRANT SELECT ON public.feedback_records TO airbyte;

To restrict Airbyte to a single tenant, create a view that filters by tenant_id and grant on the view only:

CREATE USER airbyte_acme WITH PASSWORD '<strong-password>';
GRANT CONNECT ON DATABASE hub TO airbyte_acme;
GRANT USAGE ON SCHEMA public TO airbyte_acme;
CREATE VIEW public.feedback_records_acme AS
SELECT * FROM public.feedback_records WHERE tenant_id = 'acme';
GRANT SELECT ON public.feedback_records_acme TO airbyte_acme;

Then in Step 3, set Schemas to public, and in Step 5 enable only the feedback_records_acme stream. Use Standard cursor-based replication exactly as before - Postgres views work transparently with Airbyte’s Postgres source.

Go to Destinations -> + New destination and pick the warehouse you want to load Hub data into. The fields below are the minimum you need; consult the linked Airbyte docs for the full reference.

Snowflake destination docs

  • Account name - For example, xy12345.us-east-1
  • Warehouse, Database, Default Schema, Role
  • Auth - Username + password or key-pair authentication

Airbyte ships a SQL script in the destination docs that creates the warehouse, database, role, and user with the right grants.

Databricks destination docs

  • Server Hostname and HTTP Path for a Databricks SQL warehouse
  • Personal Access Token
  • Catalog and Schema (Unity Catalog)
  • A staging location (Unity Catalog volume or cloud bucket)

BigQuery destination docs

  • Project ID, Dataset Location, Default Dataset ID
  • Service Account Key (JSON)
  • Loading Method - GCS Staging recommended for larger loads

Redshift destination docs

  • Host, Port, Database, Username, Password, Schema
  • S3 staging bucket and IAM credentials

Other supported destinations include Postgres, ClickHouse, S3 (raw files), DuckDB, and many more - see the full list.

  1. Go to Connections -> + New connection.
  2. Select the Hub Postgres source and your destination.
  3. In the Streams list, enable only public.feedback_records (leave any internal tables disabled). For that stream:
    • Sync mode - Incremental | Append + Deduped
    • Cursor field - updated_at
    • Primary key - id
  4. Configure the destination namespace and stream name to taste (for example, schema hub_raw, stream feedback_records).
  5. Pick a schedule:
    • Manual - Trigger from the UI or an orchestrator (Airflow, Dagster, Kestra).
    • Scheduled - Every 1, 2, 3, 6, 8, 12, or 24 hours (hourly is the maximum frequency).
    • Cron - Quartz cron expression with timezone, for fine-grained scheduling.
  6. Click Set up connection. Airbyte will run an initial full snapshot, then only pick up rows where updated_at has advanced.

After the first sync completes, you can query feedback_records in the destination warehouse using the same dimensions and metrics as everywhere else in Hub. The metadata JSONB column is preserved as a native semi-structured type in the destination:

  • Snowflake - VARIANT (use metadata:campaign_id::string)
  • BigQuery - JSON (use JSON_VALUE(metadata, '$.campaign_id'))
  • Redshift - SUPER (use metadata.campaign_id)
  • Databricks - STRING (parse with from_json(metadata, 'map<string,string>') as needed)

NPS trend over time (any warehouse, SQL is portable):

SELECT
date_trunc('week', collected_at) AS week,
AVG(value_number) AS avg_nps,
COUNT(*) AS responses
FROM hub_raw.feedback_records
WHERE field_type = 'nps' AND value_number IS NOT NULL
GROUP BY 1
ORDER BY 1;

Feedback count by source:

SELECT source_type, COUNT(*) AS count
FROM hub_raw.feedback_records
GROUP BY source_type
ORDER BY count DESC;

For the complete list of columns and field types, see the Data Model.

Use caseDimension / filterMetric
NPS/CSAT over timecollected_at; filter field_type = npsAVG(value_number) or custom NPS %
Count by sourcesource_typeCOUNT(*)
Score by questionfield_id or field_labelAVG(value_number)
Categorical breakdownvalue_text (for example, device, segment)COUNT(*)
Multi-tenantFilter tenant_idany

Once Airbyte has a working source + destination + connection, Hub feedback data flows into your warehouse on the schedule you choose - no custom ETL to maintain.

Hub

Airbyte