Apache Superset vs Metabase

An engineering-first comparison of the two open-source BI platforms engineers actually ship

Stack and runtime model, SQL-first vs visual-first, where the governance paywall lands, and what embedded licensing really costs — a side-by-side for engineers picking an open-source BI stack, not a marketing funnel.

By drafted.work · Operational data team

Most "Superset vs Metabase" comparisons collapse into one of two framings: "they're both open source, pick whichever looks nicer" or "Metabase is for small teams, Superset is for serious ones." Both framings are wrong in interesting ways. This piece is for the engineer who has to pick an open-source BI stack and live with it — knowing the licence bill, the on-call pager, the query patterns of the warehouse, and how much SQL the audience is actually willing to write.

Superset and Metabase come from opposite philosophies. Superset is an Apache project governed by a foundation, distributed under Apache 2.0, and architected as an unbundled platform — many components, many knobs, a strong SQL surface. Metabase is a corporate-led product under AGPL v3 with a commercial "pay to govern" tier; the open-source edition runs as a single JVM process and leans hard on a visual query builder. Everything that follows — pricing, governance, failure modes, hiring profile — flows from that gap.

1. Product snapshot

Apache Superset is a Top-Level Project at the Apache Software Foundation, distributed under Apache License 2.0. Code at github.com/apache/superset — a high-traffic repo (70k+ stars, broad contributor base with strong representation from Preset, Airbnb, Lyft, Dropbox). The 4.x line is the stable baseline most production teams run. The 4.x cycle is a "stabilisation era" — the legacy filter box and other deprecated components were finally cut, and the ECharts-based visualisation engine is the default. Releases ship several times a year with an UPDATING.md for every breaking change.

Metabase is a corporate-backed BI product from Metabase, Inc. Code at github.com/metabase/metabase. The project ships under two licences:

  • Metabase Community Edition under AGPL v3 — the "open source" version, which is fully featured for single-tenant analytics but excludes most governance features (SSO, row-level security, column-level security, audit logs, white-label embedding).
  • A commercial licence covers the Starter / Pro / Enterprise tiers; the governance features listed above start at Pro, not Starter.

Metabase has been aggressive in shipping AI features — an assistant called Metabot with MCP-style integration for external agents, and a semantic-layer-oriented product surface ("Data Studio") that lets analysts define metrics and segments centrally and reuse them across the UI. The release cadence is roughly monthly minor versions.

| Attribute | Apache Superset 4.x | Metabase (current) |
|---|---|---|
| Governance | Apache Software Foundation (community-led, TLP) | Metabase, Inc. (corporate) |
| Licence | Apache 2.0 (permissive) | AGPL v3 (copyleft, for OSS) / commercial for paid tiers |
| Stack | Python / Flask, React, Celery, Redis, Postgres | Clojure on the JVM, React, single-process |
| Query paradigm | SQL-first (SQL Lab + explorer) | Visual-first (query builder), SQL native-query editor |
| Pricing model (paid) | Managed vendor (Preset), per-user tiers | Vendor-run tiers: base fee + per-user, gating governance |

2. Pricing — the honest numbers

Both tools are "free" in the licence-fee sense, but they converge at non-trivial real costs once you take governance and operations into account. Numbers below come straight from metabase.com/pricing and preset.io/pricing at retrieval time (USD).

2.1 Apache Superset self-hosted

  • Licence: $0 under Apache 2.0.
  • Real cost: infrastructure (Kubernetes or equivalent, Postgres, Redis, optional headless Chrome for alerts), plus the engineering hours to operate them. Mid-sized production deployments typically need a platform engineer touching the stack ~15–20 hours / month — that's the number to budget, not "zero".
  • Managed alternative: Preset (the vendor behind most Superset upstream development) sells a managed tier:
    • Starter: free, up to 5 users, 1 workspace.
    • Professional: $20/user/month billed annually (or $25 billed monthly); RBAC, scheduled reports / alerts, multi-region.
    • Enterprise: custom; SSO, SCIM, audit logs, managed private cloud, dbt integration, dedicated Slack support, MCP, chatbot.
    • Embedded dashboards: add-on starting at $500/month per 50 viewer licences on Professional.

2.2 Metabase

  • Open Source (AGPL v3): $0. Self-hosted, unlimited queries / dashboards / charts. No SSO (SAML/OIDC), no row- or column-level security, no audit logs, no white-label embedding.
  • Starter (Cloud only): $100/month base + $6/user/month (first 5 users included). Annual: $1,080/year + $65/user/year. Effectively the OSS feature set on managed hosting, with automatic upgrades and patches. Still no RLS, still no SSO.
  • Pro (Cloud or self-hosted): $575/month base + $12/user/month (first 10 users included). Annual: $6,210/year + $130/user/year. This is the minimum tier for SAML/SCIM SSO, row- and column-level security, advanced caching, usage analytics, white-label, multi-tenant embedded analytics.
  • Enterprise: Custom, starts at $20k/year; air-gapping options, 1-day SLA, dedicated success engineer.

2.3 Where the lines actually cross

For a team of 25 users, the napkin math at list prices looks like:

  • Metabase Pro: ≈ $6,210 + 15 × $130 = $8,160/year (the first 10 users are covered by the base fee, so 15 are billed per seat), hosted for you, security gates included.
  • Superset self-hosted: $0 licence, plus real infrastructure (~$200–500/month for Kubernetes + managed Postgres + Redis), plus a platform-engineering cost on the order of 150–250 hours/year. Third-party TCO analyses routinely put a fully-loaded self-hosted Superset around the $30–40k/year range for a 25-user org, dominated by labour rather than hosting.
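
The napkin math above is easy to re-run for other team sizes. A minimal sketch, assuming the annual Pro list prices quoted in section 2.2 ($6,210 base with 10 users included, $130 per additional user); the function names are illustrative, and the self-hosted TCO figure is whatever estimate you trust, not a number this code derives.

```python
# Annual Metabase Pro list prices as quoted in section 2.2 (USD).
PRO_BASE_ANNUAL = 6_210       # includes the first 10 users
PRO_PER_USER_ANNUAL = 130     # each user beyond the included 10
PRO_INCLUDED_USERS = 10


def metabase_pro_annual(users: int) -> int:
    """List-price annual cost of Metabase Pro for a given user count."""
    extra_users = max(0, users - PRO_INCLUDED_USERS)
    return PRO_BASE_ANNUAL + extra_users * PRO_PER_USER_ANNUAL


def break_even_users(self_hosted_tco: int) -> int:
    """Smallest user count at which Pro list price reaches a given annual TCO."""
    users = PRO_INCLUDED_USERS
    while metabase_pro_annual(users) < self_hosted_tco:
        users += 1
    return users


print(metabase_pro_annual(25))     # 8160, matching the 25-user figure above
print(break_even_users(30_000))    # 193
print(break_even_users(40_000))    # 270
```

Against the $30–40k/year self-hosted range, Pro list price only catches up somewhere around 190 to 270 users.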

That conclusion is uncomfortable but honest: for teams under roughly a hundred users who need RLS / SSO, Metabase Pro is frequently the cheaper choice, because the Superset "operational tax" dominates the TCO. The calculus inverts sharply once you either (a) already run a Kubernetes platform and the marginal cost of Superset is low, or (b) scale past a few hundred users, where per-user Metabase fees or the Enterprise floor start dominating.

3. Architecture where it actually matters

Skipping "both have dashboards." Here's the stuff that changes engineering decisions.

3.1 Stack and runtime model

  • Superset is an unbundled Python / Flask web app with a React frontend, a Celery worker pool, Celery Beat for scheduling, optional headless Chrome (via Playwright) for alert rendering, and stateful deps on Redis + Postgres/MySQL. Every component is independently scalable, and independently breakable. The architecture is explicitly cloud-native: the official Helm chart is the reference production deployment.
  • Metabase is a single Clojure application running on the JVM. You can start it with docker run -d -p 3000:3000 metabase/metabase and have a working BI tool in minutes. State lives in an application database (H2 by default, Postgres / MySQL for production); there is no external queue, no worker pool, no extra services. Upgrades are mostly "swap the JAR, restart."

Consequence: Metabase gets you to "working dashboard" in an afternoon. Superset gets you to a production-grade platform in a week of work by someone who knows Kubernetes. That delta, more than any feature-by-feature comparison, is why the two tools end up in different slots inside the same organisation.

3.2 Query paradigm and modelling

  • Superset is SQL-first. SQL Lab is a proper workbench — multi-tab editor, query history, metadata browser, CTAS, direct path from query result to saved dataset to chart. Datasets are either physical tables or virtual SQL objects; metrics and calculated columns live on the dataset; Jinja templating lets you inject {{ current_user_id() }}, URL params, or role filters into the SQL at runtime. Non-SQL analysts can use the chart explorer, but the platform's centre of gravity is still SQL.
  • Metabase is visual-first. The Query Builder covers joins, filters, aggregations, and custom columns without writing SQL — genuinely useful for non-technical users. A native-query editor exists for SQL, but its ergonomics (no multi-tab workspace, weaker history, no CTAS surface) are noticeably behind SQL Lab. A semantic-layer-style surface ("Data Studio") lets admins define metrics and segments centrally and reuse them across the UI.

Rough heuristic: if "our analysts write SQL every day" is true, Superset pays off. If "most of our users will open a dashboard and build one ad-hoc chart" is true, Metabase's visual-first flow is a meaningful productivity win.

3.3 Visualizations

  • Superset renders via Apache ECharts plus a handful of legacy D3-based charts, and exposes a plugin system for custom React charts. The catalogue is broad — bars, lines, time series with forecasting, pivot tables, heatmaps, geo, Sankey, chord diagrams. Pixel-perfect formatting is not the strong suit, but the ceiling for custom visuals is very high.
  • Metabase ships a cleaner but narrower catalogue — roughly the chart types a product-analytics team actually uses daily, polished. User reports routinely describe it as "broad enough, shallow on exotic charts." There is no first-class custom-visual plugin system comparable to Superset's.

3.4 Alerts and scheduled reports

  • Superset: Celery Beat schedules jobs; headless Chrome (Playwright) renders charts/dashboards; delivery via SMTP or Slack. Setting this up cleanly in production takes real work — the headless-browser orchestration is a known rough patch, and the built-in alerting is honestly "basic": threshold monitoring is minimal, and teams end up wiring custom Slack flows against the Superset API for anything sophisticated.
  • Metabase has first-class scheduled subscriptions and threshold alerts to email / Slack, configured from the UI, no worker pool to tune. This is one of the places the polish gap shows most clearly.
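
For a sense of what "wiring custom Slack flows" tends to mean on the Superset side, here is a minimal sketch using only the standard library. It assumes you already have the metric value in hand (how you fetch it is deliberately left out) and that you post to a Slack incoming webhook, which accepts a JSON body with a text field. The function names, metric name, and URL are hypothetical.

```python
import json
import urllib.request


def breached(value: float, threshold: float, direction: str = "above") -> bool:
    """Return True when the value crosses the threshold in the given direction."""
    if direction == "above":
        return value > threshold
    if direction == "below":
        return value < threshold
    raise ValueError("direction must be 'above' or 'below'")


def slack_payload(metric: str, value: float, threshold: float, dashboard_url: str) -> dict:
    """Build the JSON body for a Slack incoming webhook."""
    return {
        "text": f"{metric} is {value:g} (threshold {threshold:g}). Dashboard: {dashboard_url}"
    }


def post_to_slack(webhook_url: str, payload: dict) -> int:
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Hypothetical values; no network call is made here.
if breached(0.07, 0.05):
    print(slack_payload("error_rate", 0.07, 0.05, "https://superset.example.com/superset/dashboard/1/"))
```

Scheduling, deduplication, and retries are the parts that take the real work, which is the polish gap described above.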

3.5 Embedded analytics

  • Superset: @superset-ui/embedded-sdk with guest tokens (JWT), iframe + postMessage, CSS-level theming, full control over host-app ↔ viz interactions. Apache 2.0 means you can embed in a commercial product without a per-end-user licence bill — but only if you self-host. Preset's managed embedded offering starts at $500/month for 50 viewer licences, which changes the economics.
  • Metabase has Static embedding in OSS (signed URLs, no per-user interactivity) and Interactive embedding behind the Pro tier, plus white-labelling. Metabase's "data sandboxing" (their term for multi-tenant row-level security) is designed specifically for B2B SaaS scenarios where each customer sees their own slice of the same dashboard. This is one of Metabase's strongest fits.
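
Metabase's static embedding boils down to a signed JWT placed in an iframe URL. The sketch below signs HS256 with the standard library so it stays self-contained; in a real service you would more likely use a maintained JWT library. The payload shape (resource, params, exp) and the /embed/dashboard/ path follow Metabase's static-embedding snippet as I understand it, so check them against the version you run. The site URL, secret, and tenant_id parameter are placeholders.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(raw: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def sign_hs256(payload: dict, secret: str) -> str:
    """Produce a compact HS256 JWT for the payload."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def metabase_embed_url(site_url, secret, dashboard_id, params, ttl_seconds=600, now=None):
    """Build a signed iframe URL for a dashboard with locked parameters."""
    issued_at = int(time.time()) if now is None else now
    payload = {
        "resource": {"dashboard": dashboard_id},
        "params": params,                  # locked parameters, e.g. a tenant filter
        "exp": issued_at + ttl_seconds,    # short expiry limits link sharing
    }
    token = sign_hs256(payload, secret)
    return f"{site_url.rstrip('/')}/embed/dashboard/{token}#bordered=true&titled=true"


# Placeholder values for illustration only.
print(metabase_embed_url("https://metabase.example.com", "embedding-secret", 7, {"tenant_id": 42}))
```

Because the parameters are baked into the signature, the viewer cannot change the tenant filter. That covers simple per-customer views, while per-user interactivity still needs the Pro embedding flow.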

For embedding at scale, licence model and RLS shape decide it. If you can self-host Superset and you have tenant-scoped data contracts in your warehouse, Superset is the cheaper path per viewer. If you want a turnkey multi-tenant SaaS embed without operating the thing, Metabase Pro is the more honest answer.
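
On the Superset side, the tenant scoping travels in the guest-token request your backend sends to Superset (a POST to /api/v1/security/guest_token/, as I understand the REST API). A minimal sketch of the request body follows; the tenant_id column, the user fields, and the dashboard UUID are hypothetical, and the actual HTTP call and authentication are left out.

```python
import json


def guest_token_request(dashboard_uuid: str, username: str, tenant_id: int) -> dict:
    """Build the JSON body for a Superset guest-token request scoped to one tenant."""
    # bool is a subclass of int, so reject it explicitly; only a real integer
    # is ever interpolated into the RLS clause.
    if isinstance(tenant_id, bool) or not isinstance(tenant_id, int):
        raise TypeError("tenant_id must be an int")
    return {
        "user": {"username": username, "first_name": "Embedded", "last_name": "Viewer"},
        "resources": [{"type": "dashboard", "id": dashboard_uuid}],
        # Assumes every embedded dataset exposes a tenant_id column.
        "rls": [{"clause": f"tenant_id = {tenant_id}"}],
    }


body = guest_token_request("hypothetical-dashboard-uuid", "viewer-42", 42)
print(json.dumps(body, indent=2))
```

The clause is raw SQL, so the type check matters: the "tenant-scoped data contract" is only as strong as the code that builds this string.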

3.6 Data connectivity

  • Superset connects via SQLAlchemy / DB-API dialects: BigQuery, Snowflake, ClickHouse, Redshift, Databricks, Postgres, MySQL, Trino, Athena, and dozens of others — driven by PyPI packages.
  • Metabase ships a curated set of "official" drivers (Snowflake, BigQuery, Redshift, Postgres, MySQL, MongoDB, SQL Server, Databricks, etc.) and a community driver ecosystem for everything else (DuckDB, ClickHouse, CSV, and more). Community drivers are install-at-your-own-risk: Metabase doesn't vet them for security or performance, and breakage across upgrades is common.

If your warehouse is on a mainstream cloud dialect, either tool works. If you're running something unusual (ClickHouse, Trino, Presto, Druid, DuckDB, SingleStore), Superset's SQLAlchemy model is usually the less fragile path.
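
In practice, "driven by PyPI packages" means each Superset connection is a SQLAlchemy URI whose scheme picks the dialect. A minimal sketch of building such URIs with credentials escaped; the helper name and hosts are hypothetical, and the schemes shown (postgresql, trino, clickhousedb) are the ones I believe the respective drivers register, so confirm them against the driver you install.

```python
from typing import Optional
from urllib.parse import quote_plus


def sqlalchemy_uri(
    scheme: str,
    host: str,
    database: str,
    user: Optional[str] = None,
    password: Optional[str] = None,
    port: Optional[int] = None,
) -> str:
    """Assemble a SQLAlchemy-style URI, percent-encoding the credentials."""
    auth = ""
    if user:
        auth = quote_plus(user)
        if password:
            # Characters such as '@' or '/' in a password would break URI parsing.
            auth += ":" + quote_plus(password)
        auth += "@"
    netloc = host if port is None else f"{host}:{port}"
    return f"{scheme}://{auth}{netloc}/{database}"


print(sqlalchemy_uri("postgresql", "db.internal", "analytics", "bi", "p@ss/word", 5432))
print(sqlalchemy_uri("trino", "trino.internal", "hive", "bi", port=8080))
print(sqlalchemy_uri("clickhousedb", "ch.internal", "default", "bi", "secret", 8123))
```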

4. Governance, security, auth

This is where the licence model shows through most clearly. Superset treats governance as part of the core platform; Metabase treats it as an upsell.

4.1 Authentication

  • Superset delegates auth to Flask-AppBuilder (FAB). Out of the box: DB-backed users, OAuth2 / OIDC (with PKCE), LDAP, SAML (via add-ons), REMOTE_USER for header-based integrations. SSO is free.
  • Metabase OSS supports only Google auth and LDAP. SAML, OIDC, and SCIM require the Pro tier ($575/month base). For mid-sized orgs using Okta, Azure AD/Entra, or any enterprise IdP, this effectively means "Metabase starts at $575/month."

4.2 Authorization, row-level security, governance

  • Superset has role-based access control through FAB (roles granted permissions on views / menus / datasets) and Row-Level Security expressed as SQL filter clauses tied to roles or user attributes. There's also a Dashboard RBAC feature flag that lets roles be attached directly to dashboards, overriding dataset-level permissions — useful for executive dashboards sitting on sensitive underlying tables. All of this is in the OSS edition.
  • Metabase OSS provides basic group-based view / create-query permissions. Row- and column-level security ("data sandboxing") is Pro-only. So is audit logging. For any org with a compliance requirement — SOC 2, HIPAA, anything that asks "who accessed what" — Metabase OSS is effectively not an option.

Rough heuristic: if governance is a hard requirement and the budget is $0, Superset is the only real answer. If the budget exists and governance simplicity matters, Metabase Pro's group-based RLS is ergonomically much easier to set up than Superset's SQL-filter RLS — at the cost of less expressive power.
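
To make "RLS expressed as SQL filter clauses" concrete, the sketch below wraps a dataset query and ANDs the role's clauses onto it, then runs the result against an in-memory SQLite table. It illustrates the idea only; it is not how Superset itself injects the filters, and the table, columns, and clause are invented for the example.

```python
import sqlite3
from typing import Sequence


def apply_rls(dataset_sql: str, clauses: Sequence[str]) -> str:
    """Wrap a dataset query so every RLS clause is ANDed onto its rows."""
    if not clauses:
        return dataset_sql
    predicate = " AND ".join(f"({clause})" for clause in clauses)
    return f"SELECT * FROM ({dataset_sql}) AS rls_scoped WHERE {predicate}"


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE orders (id INTEGER, region TEXT, amount REAL);
    INSERT INTO orders VALUES (1, 'EMEA', 120.0), (2, 'APAC', 75.5), (3, 'EMEA', 40.0);
    """
)

# Clause a hypothetical "EMEA analysts" role would carry.
scoped_sql = apply_rls("SELECT id, region, amount FROM orders", ["region = 'EMEA'"])
rows = conn.execute(scoped_sql + " ORDER BY id").fetchall()
print(rows)  # [(1, 'EMEA', 120.0), (3, 'EMEA', 40.0)]
```

The expressive power comes from the clause being arbitrary SQL. The ergonomic cost is the same thing: someone has to write and review those clauses per role.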

4.3 Audit & lineage

  • Superset logs actions and queries into its metadata DB; export to ELK / Loki / Splunk as needed. Lineage and catalog are external concerns (DataHub, OpenMetadata, dbt docs).
  • Metabase Pro ships usage analytics and audit logs as a first-class feature. Lineage across external assets is not a native concept.

5. Deployment and day-2 operations

Superset (self-hosted)

  • Reference deployment: the official Helm chart on Kubernetes, or Docker Compose for smaller setups.
  • Runtime components: web, worker (Celery), beat (Celery Beat), optional headless Chrome container, Redis, metadata Postgres.
  • Hard-coding the SECRET_KEY in superset_config.py is an explicit production footgun — generate with openssl rand -base64 42, store in a secret manager, rotate on compromise.
  • Upgrade posture: every minor release has an UPDATING.md documenting breaking changes, config renames, and manual migrations. Reading it is non-optional.
  • Honest truth: self-hosting Superset is a real infra project, not a weekend docker-compose up.
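
The SECRET_KEY advice can be enforced in superset_config.py itself rather than left to convention. A minimal sketch: the generator mirrors openssl rand -base64 42, and the loader fails fast when the key is missing. The SUPERSET_SECRET_KEY variable name and the 32-character minimum are assumptions made for this example, so adapt both to how your secret manager injects values.

```python
import base64
import os
import secrets
from typing import Mapping, Optional


def generate_secret_key(num_bytes: int = 42) -> str:
    """Python equivalent of `openssl rand -base64 42`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def load_secret_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Read the key from the environment and refuse to start without a real one."""
    source = os.environ if env is None else env
    key = source.get("SUPERSET_SECRET_KEY", "")
    if len(key) < 32:  # assumed minimum length, chosen for this sketch
        raise RuntimeError("SUPERSET_SECRET_KEY is missing or too short")
    return key


# In superset_config.py this would read: SECRET_KEY = load_secret_key()
print(len(generate_secret_key()))  # 56 characters for 42 random bytes
```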

Metabase (self-hosted)

  • Reference deployment: a single container, docker run -d -p 3000:3000 metabase/metabase. Back the application DB with Postgres/MySQL rather than the default H2 for production.
  • Real operational concerns:
    • JVM heap tuning. At scale, large result sets and complex queries can push memory hard; JAVA_OPTS (-Xms / -Xmx) and MB_JETTY_MAXTHREADS are the primary knobs. Misconfigured heap is the single most common source of slowness / crashes.
    • Upgrades are usually "swap JAR / new image, restart" — the application DB migrates itself.
  • Serialisation (export/import of dashboards + settings via YAML for Git-based environment promotion) is a Pro-only feature, which matters if you want an SDLC-style staging→prod workflow for analytics.
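
Putting the two operational points together (Postgres as the application DB, explicit heap settings), a production-leaning docker run might be assembled like this. A minimal sketch: the MB_DB_* variables and JAVA_OPTS are Metabase's documented environment variables as far as I know, while the 2 GB heap, host names, and helper name are placeholders rather than sizing advice.

```python
import shlex
from typing import List


def metabase_docker_args(
    db_host: str,
    db_name: str,
    db_user: str,
    heap_gb: int = 2,
    image: str = "metabase/metabase",
) -> List[str]:
    """Build a `docker run` argument list for Metabase backed by Postgres."""
    env = {
        "MB_DB_TYPE": "postgres",
        "MB_DB_HOST": db_host,
        "MB_DB_PORT": "5432",
        "MB_DB_DBNAME": db_name,
        "MB_DB_USER": db_user,
        # Pin min and max heap to the same value to avoid resize pauses.
        "JAVA_OPTS": f"-Xms{heap_gb}g -Xmx{heap_gb}g",
    }
    args = ["docker", "run", "-d", "-p", "3000:3000"]
    for name, value in env.items():
        args += ["-e", f"{name}={value}"]
    # `-e NAME` with no value passes the variable through from the caller's
    # environment, keeping the password out of the command line.
    args += ["-e", "MB_DB_PASS"]
    args.append(image)
    return args


print(shlex.join(metabase_docker_args("pg.internal", "metabase", "metabase")))
```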

| Dimension | Apache Superset 4.x | Metabase |
|---|---|---|
| Time to first dashboard | Hours (production: days) | Minutes (production: hours) |
| Runtime component count | 5–6 (web, worker, beat, Redis, DB, optional browser) | 1–2 (app, DB) |
| Upgrade profile | Manual, migration-aware (UPDATING.md) | JAR swap / Cloud auto |
| Primary failure mode | Celery / headless Chrome; upgrade drift | JVM heap pressure at scale |
| Git-based promotion | Natively via metadata + YAML export tooling | Only in Pro tier (Serialisation) |

6. Real-world scale — traceable signals

A sanity check on where each tool actually lives today, from engineering blogs rather than vendor decks.

Apache Superset — the Airbnb case study

Airbnb is the flagship public reference and has documented concrete numbers for a Superset deployment at real scale:

  • Thousands of weekly users; tens of thousands of SQL Lab queries per week; over a hundred thousand chart views per week.
  • A cache-warmup job driven by Apache Airflow that programmatically loads popular dashboards during off-peak hours, reaching an 86% cache hit rate for Presto-backed charts.
  • Domain sharding to bypass browser-level per-origin connection limits, routing chart requests across four subdomains to allow more concurrent queries per dashboard.

Dropbox, Lyft, and others publicly document similar patterns: Dropbox moved more than ten legacy visualisation tools onto Superset; Lyft uses it against Presto / Hive with scheduled node cycling for stable query performance; the ASF lists American Express, Nielsen, and X/Twitter among prominent users.

Metabase — the embedded / SMB case

Metabase dominates two segments in practice:

  • SMBs and product teams adopting BI for the first time — the "fastest time from docker run to shared dashboard" story is genuine and explains a lot of the adoption curve.
  • B2B SaaS embedded analytics. Multi-tenant data-sandboxing, the Pro embedding flow, and white-labelling combine into a well-trodden "give each customer their own dashboards" path. Published customer references skew toward this use case — product analytics inside SaaS products rather than centralised internal BI for thousands of employees.

Neither tool is "too small to be serious" anymore. The scale reference points are just different shapes: Superset scale is proven by engineering-led internal deployments of thousands of users; Metabase scale is proven by thousands of SaaS products embedding it for their own customers.

7. Honest weaknesses

Apache Superset

  • Operational overhead is real. Teams regularly describe running Superset as "a part-time job" — Celery tuning, headless Chrome quirks, metadata migrations on minor upgrades, secret rotation.
  • SQL barrier for non-technical users. The explorer helps, but the tool's centre of gravity is SQL; non-SQL analysts struggle more than they do in Metabase.
  • Built-in alerting is basic. Threshold monitoring and notification flexibility lag behind both Metabase's built-in alerts and dedicated observability tools. Teams often end up writing custom Slack flows on top of the API.
  • UI polish gaps. Filter UX, dashboard interactivity, and exec-ready styling are visibly behind commercial BI, and behind Metabase for a non-technical audience.
  • Bugs that linger. Community threads consistently note that long-standing bugs take time to land fixes, especially in less-used chart types or niche integrations.

Metabase

  • The governance paywall. SSO (SAML/OIDC), SCIM, row- and column-level security, audit logs, white-label embedding — all of these live behind Pro ($575/month base). Any team that "just needs SSO" ends up paying for everything else.
  • JVM memory management at scale. The number-one scaling complaint: large queries or big result sets blow up the heap, and diagnosing it means learning JVM tooling (GC logs, heap dumps) and Jetty thread settings such as MB_JETTY_MAXTHREADS.
  • Visualisation depth. Broad-but-shallow catalogue; complex multi-join reports and unusual chart types fall off faster than they do in Superset.
  • Community drivers are "install at your own risk." If your warehouse isn't in the official driver list, you inherit the upgrade risk.
  • AGPL v3. Fine for self-hosted internal use; potentially problematic if you want to embed Metabase into a product you distribute — the commercial licence exists precisely to handle that case, and the answer is effectively "go to Pro or negotiate a commercial agreement."

8. When to pick which

| Pick Apache Superset when… | Pick Metabase when… |
|---|---|
| Your audience is SQL-fluent: data engineers, analytics engineers, product teams. | Your audience is non-technical business users who need to build their own charts. |
| You want thousands of viewers without per-seat licences and you can self-host. | You want a BI tool running in an afternoon, not a Kubernetes project next week. |
| Governance is a hard requirement and the budget is $0: you need RLS, SSO, audit. | You need multi-tenant embedded analytics for a B2B SaaS product. |
| You want deep custom visualisations (plugin system, ECharts, custom React). | You want polished default UX with minimal configuration friction. |
| Your data platform is already SQL/warehouse-first and you need first-class SQL Lab. | You have budget for Pro and want turnkey SSO, RLS, audit, white-label. |
| You can absorb upgrade / ops work in a platform team. | You want a managed Cloud tier with automatic upgrades and backups. |
| You're embedding into a product you sell and want Apache 2.0 licensing. | You accept AGPL / buy a commercial licence and want Metabase's RLS model. |

9. TL;DR for the impatient

Both tools are legitimately good. They optimise for different things.

  • Superset is the better answer for SQL-first, engineering-led organisations that need real governance on a zero-licence budget, run deep custom dashboards at scale, or embed OSS analytics into products they sell. The price is operational complexity and a less polished UX for non-technical users.
  • Metabase is the better answer for teams that want a BI tool working in an afternoon, serve a non-technical audience, or build multi-tenant embedded analytics into a SaaS product. The price is the governance paywall — practically everything an enterprise needs (SSO, RLS, audit) sits behind the $575/month Pro tier — and a shallower ceiling on visualisation depth.

The honest deciding questions are three. Who writes most of the analysis: engineers or business users? Is your licence budget zero or not? And how much operations capacity does the team have? The answers point to one tool or the other with very little genuine overlap in the middle.
