Case study
Self-service data pipeline platform: 2 500 jobs without data engineers
How a 4-engineer team served data to 3 000+ users — from analysts to business stakeholders — by automating Spark job creation with template generation.
Context
A large foodtech company with 1 700+ locations across 30 countries had accumulated a huge volume of operational data. Analytics was embedded in the business from day one: every order and every click was captured and was expected to turn into management insight.
But demand for new jobs and dashboards kept growing, and the data engineering team could not scale at the same pace.
Problem
Every new data source required a hand-written Spark job: schema, configs, deployment to Databricks. Analysts depended on data engineers even for trivial tasks, and time-to-data stretched into weeks.
- 6-month backlog in the data engineering team
- Analysts blocked on engineers even for repetitive pipelines
- Data engineers turned into a bottleneck between the business and its own data
- Every new source meant hand-rolled schemas, configs and deployments
Solution
We designed and built a code-generation platform on top of the existing stack — tooling that turns simple analyst configs into deployed, production-ready pipelines.
The user fills in a handful of parameters and triggers a GitHub Action. The platform generates Spark code, deploys the job to Databricks and returns the result. The data team stopped writing jobs by hand and started owning templates and quality — no longer the bottleneck.
- Analyst config: YAML/JSON
- GitHub Action: pipeline trigger
- Jinja template: canonical patterns
- ~600 lines of PySpark: auto-generated
- Deploy to Databricks: production-ready
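To make the flow concrete, here is a minimal sketch of the generation step. All names (config keys, paths, table names) are hypothetical, and Python's stdlib `string.Template` stands in for the platform's actual Jinja templates to keep the sketch dependency-free:

```python
from string import Template

# Hypothetical analyst config -- in the real platform this arrives as YAML/JSON.
config = {
    "source_path": "s3://raw/orders/",
    "target_table": "analytics.orders_daily",
    "partition_col": "order_date",
}

# Stand-in for a Jinja template encoding one canonical ingestion pattern.
JOB_TEMPLATE = Template("""\
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.parquet("$source_path")
(df.write
   .mode("overwrite")
   .partitionBy("$partition_col")
   .saveAsTable("$target_table"))
""")

def generate_job(cfg: dict) -> str:
    """Render PySpark job source from an analyst config."""
    return JOB_TEMPLATE.substitute(cfg)

print(generate_job(config))
```

In the real platform the rendered source is committed to the shared repo and deployed to Databricks by the GitHub Action, so the analyst never touches Spark code directly.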
"Imagine how long it would take to hand-write hundreds and hundreds of jobs. Now the user creates their own job — without involving the data engineering team."
Results
2 500+
jobs shipped via the template generator
76 of 85
contributors are not data engineers
4
engineers covering 3 000+ data users
Faster time-to-data
Analysts get a working pipeline in minutes instead of waiting days in the engineering backlog.
Data team back to leverage work
Data engineers moved to architecture and modelling instead of hand-writing yet another job.
Scales with the business
New data domains are onboarded self-serve — without tickets and without the bottleneck.
Operable by design
All generated code lives in one repo — infra optimisations and compute migrations take hours, not weeks.
Want the same leverage on your stack?
Let's map where code generation can take load off your data engineers.
On the call we review your sources, recurring job patterns and current backlog. You leave with a concrete scope: which patterns to templatise, which pipelines to push to self-service, and how to wire it into your existing infrastructure.
- Review of data sources and typical pipelines
- Map of patterns that can be templatised
- Self-service model: who runs what
- Integration with your CI/CD and data platform
- Delivery timeline and engagement model