Connecting Hub to Databricks
Connect Formbricks Hub to Databricks via JDBC or Lakehouse Federation and analyze feedback data alongside the rest of your lakehouse - no ETL layer required.
Hub stores all data in a single PostgreSQL database. You can analyze it in Databricks by connecting Databricks directly to Hub’s Postgres - either through Lakehouse Federation (zero-ETL, queried live) or with a JDBC read from a notebook (load once into a Delta table).
Summary
Section titled “Summary”| Step | Action |
|---|---|
| 1 | Run Hub, worker, and Postgres (Quick Start); make sure Databricks can reach the Postgres host (Private Link / VPC peering / public endpoint with sslmode=require) |
| 2 | Choose a path: Lakehouse Federation (UI, zero-ETL, queries Hub live through Unity Catalog) or JDBC + Delta (notebook code, materialized copy) |
| 3 | Federation: create a CONNECTION and foreign CATALOG. JDBC: read with spark.read.format("jdbc") and df.write.format("delta").saveAsTable(...) |
| 4 | Run analytics queries using the same dimensions and metrics as in the Quick reference below |
Why Hub + Databricks?
Section titled “Why Hub + Databricks?”- Direct access - Databricks connects to Hub’s Postgres database; no separate ETL pipeline required.
- Two integration paths - Use Lakehouse Federation for live, governed queries through Unity Catalog, or use JDBC + Delta for a materialized lakehouse copy.
- Same data model - Hub’s Data Model uses one row per response in a single table; ready for Spark SQL aggregations.
Step 1: Get Hub running
Section titled “Step 1: Get Hub running”If Formbricks Hub is not running yet, follow the Quick Start Guide to start Postgres, apply migrations automatically, and start the Hub API and worker.
Note the database connection details you use for Hub (host, port, database name, user, password). You will need them for Databricks. If you use the compose file from the docs, that is typically:
| Setting | Example value |
|---|---|
| Host | A hostname or IP that Databricks can reach (for example, your Postgres public endpoint or a Private Link target) |
| Port | 5432 |
| Database name | hub |
| Username | formbricks |
| Password | Your Postgres password (in the Quick Start compose setup this is POSTGRES_PASSWORD in .env; default is formbricks_dev) |
Step 2: Choose a connection path
Section titled “Step 2: Choose a connection path”Databricks ships the PostgreSQL JDBC driver (org.postgresql.Driver) on every current Databricks Runtime, so no extra library install is needed. You have two practical ways to query feedback_records:
- Lakehouse Federation (recommended when you have Unity Catalog) - Create a
CONNECTIONand a foreignCATALOGonce, then query Hub live from any notebook, SQL editor, or dashboard. Filters, projections, limits, and aggregates are pushed down to Postgres on DBR 13.3 LTS+ (join pushdown is a separate Public Preview feature that requires DBR 17.2+ or SQL warehouses 2025.30+). - JDBC + Delta - From a notebook, read
feedback_recordsvia JDBC into a Spark DataFrame and optionally write it to a Delta table for repeat queries.
Pick Federation for governed, low-latency live access; pick JDBC + Delta when you want a local materialized copy or are on a workspace without Unity Catalog. Step 3 covers both paths - follow only the one you chose.
Step 3a: Lakehouse Federation (Unity Catalog)
Section titled “Step 3a: Lakehouse Federation (Unity Catalog)”Requires Databricks Runtime 13.3 LTS or later in Standard/Dedicated mode, or a Pro/Serverless SQL warehouse, plus CREATE CONNECTION and CREATE CATALOG privileges on the metastore.
Create the connection
Section titled “Create the connection”- In your Databricks workspace, open Catalog Explorer, then click + Add -> Add a connection.
- Choose PostgreSQL as the connection type.
- Enter the values from the Step 1 table:
- Host - Hub’s Postgres host
- Port -
5432 - User -
formbricks - Password - Hub’s Postgres password (Databricks will offer to store it as a Secret)
- Click Test connection, then Create.
Or use SQL:
CREATE CONNECTION hub_pg TYPE postgresql OPTIONS ( host '<hub-postgres-host>', port '5432', user '<user>', password secret('<scope>', '<key>') );Create the foreign catalog
Section titled “Create the foreign catalog”- Still in Catalog Explorer, click + Add -> Add a catalog.
- Choose Foreign catalog, name it (for example,
hub), select the connectionhub_pg, and set the Database tohub. - Click Create.
Or use SQL:
CREATE FOREIGN CATALOG hub USING CONNECTION hub_pg OPTIONS (database 'hub');You can now query feedback_records live from anywhere in Databricks:
SELECT * FROM hub.public.feedback_records LIMIT 100;For the full table schema - column names, types, and field semantics (for example, field_type, value_number, value_text) - see the Data Model doc.
Step 3b: JDBC + Delta (notebook)
Section titled “Step 3b: JDBC + Delta (notebook)”If you do not have Unity Catalog or you want a materialized copy of feedback_records in Delta, read it from a notebook.
Store Postgres credentials as a secret
Section titled “Store Postgres credentials as a secret”databricks secrets create-scope hubdatabricks secrets put-secret hub pg_user --string-value formbricksdatabricks secrets put-secret hub pg_password --string-value <your-password>Read feedback_records via JDBC
Section titled “Read feedback_records via JDBC”jdbc_url = "jdbc:postgresql://<hub-postgres-host>:5432/hub?sslmode=require"
reader = ( spark.read.format("jdbc") .option("url", jdbc_url) .option("driver", "org.postgresql.Driver") .option("user", dbutils.secrets.get("hub", "pg_user")) .option("password", dbutils.secrets.get("hub", "pg_password")))
# Whole tabledf = reader.option("dbtable", "public.feedback_records").load()
# Or push a query down (faster for incremental loads)df_recent = ( reader .option( "query", """ SELECT id, tenant_id, submission_id, source_type, source_id, field_id, field_type, value_text, value_number, value_boolean, value_date, metadata, collected_at, created_at, updated_at FROM public.feedback_records WHERE updated_at > now() - interval '1 day' """, ) .load())Write to a Delta table
Section titled “Write to a Delta table”( df.write .mode("overwrite") # or .mode("append") for incremental loads .format("delta") .saveAsTable("main.hub.feedback_records"))You can now query the Delta table from Databricks SQL or any notebook:
SELECT source_type, count(*) FROM main.hub.feedback_records GROUP BY source_type;Step 4: Example queries
Section titled “Step 4: Example queries”Whether you query the foreign catalog (hub.public.feedback_records) or a Delta table (main.hub.feedback_records), the analytics are identical.
NPS trend over time
Section titled “NPS trend over time”Average NPS per week, filtered to NPS questions:
SELECT date_trunc('week', collected_at) AS week, ROUND(AVG(value_number), 2) AS avg_nps, COUNT(*) AS responsesFROM hub.public.feedback_recordsWHERE field_type = 'nps' AND value_number IS NOT NULLGROUP BY 1ORDER BY 1;Feedback count by source
Section titled “Feedback count by source”SELECT source_type, COUNT(*) AS countFROM hub.public.feedback_recordsGROUP BY source_typeORDER BY count DESC;NPS promoters / passives / detractors
Section titled “NPS promoters / passives / detractors”SELECT CASE WHEN value_number <= 6 THEN 'Detractor (0-6)' WHEN value_number <= 8 THEN 'Passive (7-8)' ELSE 'Promoter (9-10)' END AS nps_bucket, COUNT(*) AS countFROM hub.public.feedback_recordsWHERE field_type = 'nps' AND value_number IS NOT NULLGROUP BY 1ORDER BY MIN(value_number);Quick reference: dimensions and metrics
Section titled “Quick reference: dimensions and metrics”For the complete list of columns and field types, see the Data Model.
| Use case | Dimension / filter | Metric |
|---|---|---|
| NPS/CSAT over time | collected_at; filter field_type = nps | AVG(value_number) or custom NPS % |
| Count by source | source_type | COUNT(*) |
| Score by question | field_id or field_label | AVG(value_number) |
| Categorical breakdown | value_text (for example, device, segment) | COUNT(*) |
| Multi-tenant | Filter tenant_id | any |
Once Databricks can reach Hub’s Postgres, your feedback data is ready for SQL, notebooks, and dashboards alongside everything else in your lakehouse.
References
Section titled “References”Hub
- Quick Start - Run Hub, worker, and Postgres
- Data Model - Table schema, columns, and field types for
feedback_records - Connecting Hub to Power BI - Same data, with Microsoft Power BI
- Connecting Hub to Superset - Same data, with Apache Superset
- Connecting Hub to Airbyte - Schedule periodic syncs into Snowflake, Databricks, BigQuery, or Redshift
Databricks
- Query PostgreSQL with Databricks - JDBC reads and writes
- Run federated queries on PostgreSQL - Lakehouse Federation setup
- What is Lakehouse Federation? - Overview
- Secrets management - Store database credentials safely