Lead Data Engineer/Associate Architect

Description

About Gapstars

At Gapstars, we partner with some of Europe’s most ambitious tech companies, from disruptive startups to fast-scaling scaleups, helping them build high-performing remote engineering teams. Headquartered in the Netherlands, with talent hubs in Sri Lanka and Portugal, we are home to 275+ engineers who thrive on solving real-world challenges with modern technologies. Our teams work across domains, from networking and marketplaces to SaaS and AI, delivering scalable solutions that drive meaningful outcomes. If you’re looking for a company that combines technical excellence, a strong culture, and room to grow, welcome to Gapstars.

The Role

As the Data and Analytics function completes its transition from on-premises infrastructure to running solely in the cloud (Azure Databricks), we are looking for a Lead Data Engineer/Architect to spearhead how Brompton approaches data engineering and architecture, bringing stakeholders across the business along on the journey. This individual will also work very closely with the Head of Data and Analytics, looking strategically across their team of master data specialists, analytics/BI engineers, reporting developers, and data scientists, to ensure business requirements are efficiently turned into end-to-end solutions and to determine where the team can add the most value in a highly engaged organisation.

You will work directly with diverse data sets from multiple systems, orchestrating their seamless integration and optimisation so the business can derive valuable insights. This spans everything from building data pipelines from scratch to managing and optimising them using the tools available in the Azure Cloud.

Your work will directly impact the creation and delivery of data-driven strategies that will yield pivotal insights, bolstering decision making and strategic planning. You will have the chance to contribute directly to our mission of revolutionising urban living by ensuring that our data management and analysis processes are as efficient, reliable, and insightful as possible.

The platform you'll inherit:

  • Azure Databricks on Private PaaS - VNet injection, no public IPs, 57 private endpoints, NAT gateways with stable egress

  • 5-subscription CAF Landing Zone architecture with full network isolation

  • Metadata-driven ingestion via ADF orchestrating Databricks notebooks, with Autoloader and DLT pipelines downstream (a sketch of this pattern follows this list)

  • 40+ data sources including Infor (global ERP), Xero (7 regions), Cin7 (4 regions), NetSuite, Salesforce, BigCommerce, Lightspeed and others

  • Unity Catalog, Key Vault integration, CI/CD from Azure DevOps

  • Annual Databricks commitment on a 3-year contract
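
To make the metadata-driven ingestion bullet above concrete, here is a minimal, hedged sketch of how a single config row might parameterise an Autoloader stream into a bronze Delta table. The config columns, paths, and source name (source_name, landing_path, xero_uk, and so on) are illustrative assumptions rather than the framework's actual schema; in practice ADF would pass these values to the notebook as parameters.

    # Hypothetical, minimal sketch of config-driven Autoloader ingestion to bronze.
    # Column names, paths, and the source filter are assumptions for illustration only.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()  # provided automatically in a Databricks notebook

    # One config row; in the real framework ADF passes these values as notebook parameters.
    config = (
        spark.read.option("header", "true")
        .csv("/mnt/config/sources.csv")            # hypothetical config location
        .filter("source_name = 'xero_uk'")         # hypothetical source
        .first()
    )

    # Autoloader picks up new files incrementally; with rescue mode, unexpected
    # columns land in _rescued_data instead of silently breaking the stream.
    stream = (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", config["file_format"])
        .option("cloudFiles.schemaLocation", f"/mnt/schemas/{config['source_name']}")
        .option("cloudFiles.schemaEvolutionMode", "rescue")
        .load(config["landing_path"])
    )

    (
        stream.writeStream
        .option("checkpointLocation", f"/mnt/checkpoints/{config['source_name']}")
        .trigger(availableNow=True)                # batch-style run, suited to a job cluster
        .toTable(config["target_table"])           # bronze Delta table
    )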

The reality you'll walk into:

  • Bronze and silver layers exist for most sources, but the gold layer is underdeveloped.

  • Cross-source business models that should serve BI directly are incomplete, leaving Power BI to compensate with complex transformations it shouldn't own.

  • An Azure SQL Server still runs (with SSIS packages), carrying complex financial and operational logic that the business depends on daily.

  • 41% of production compute runs on all-purpose clusters instead of job clusters.

  • Legacy architecture remains ADF-heavy, with minimal DLT adoption (a minimal DLT sketch follows this list).

  • The senior consultant who built the ingestion layer is third-party. Their knowledge needs to be internalised.
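
For context on the DLT point above, here is a small, hedged sketch of what a Delta Live Tables pipeline step can look like; the source path, table names, and expectation are illustrative assumptions, not part of the existing codebase.

    # Illustrative DLT pipeline notebook; `spark` is provided by the DLT runtime.
    import dlt
    from pyspark.sql import functions as F

    @dlt.table(comment="Raw invoices landed by Autoloader (hypothetical source)")
    def bronze_invoices():
        return (
            spark.readStream.format("cloudFiles")
            .option("cloudFiles.format", "json")
            .load("/mnt/landing/example_source/invoices")   # hypothetical landing path
        )

    @dlt.table(comment="Cleaned invoices with a basic data quality expectation")
    @dlt.expect_or_drop("valid_invoice_id", "invoice_id IS NOT NULL")
    def silver_invoices():
        return (
            dlt.read_stream("bronze_invoices")
            .withColumn("ingested_at", F.current_timestamp())
        )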

Responsibilities:

  • Develop, construct, test, and maintain data architectures within large-scale data processing systems.

  • Develop and manage data pipelines using Azure Data Factory, Delta Lake, and Spark, ensuring all data sets are secure, reliable, and accessible.

  • Implement proper Delta Lake housekeeping - OPTIMIZE, VACUUM, and liquid clustering as a design choice versus legacy partitioning (see the sketch after this list).

  • Operate and extend the existing metadata-driven ingestion framework, in which CSV configs define source handling and parameterise ADF pipelines and Databricks notebooks.

  • Utilise Azure Cloud architecture knowledge to design and implement scalable data solutions.

  • Utilise Spark, SQL, Python, R, and other data frameworks to manipulate data and gain a thorough understanding of the dataset's characteristics. This role requires the ability to comprehend the business logic behind the data's creation, with the aim of enhancing the data modelling process.

  • Interact with API systems to query and retrieve data for analysis, onboarding remaining data sources at pace

  • Work closely with Business Analysts, IT Ops, and other stakeholders to understand data needs and deliver on those needs.

  • Ensure understanding and compliance with data governance and data quality principles.

  • Implement and manage Unity Catalog for centralised data governance and unified access controls across Databricks

  • Maintain technical documentation for the entirety of the code base.

  • End-to-end ownership of the Data Engineering Lifecycle with accountability for Databricks architecture, security, and governance standards, and authority to define, enforce, and evolve those standards across the business

  • Design data platforms for production reliability, embedding testing, monitoring, and data observability into all pipelines and models to proactively detect issues, minimise business disruption, and ensure data consistency.

  • Own cost management and optimisation across Databricks and Azure, balancing performance, reliability, and spend, and ensuring the platform delivers clear and measurable business value.
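
On the housekeeping point above, a minimal sketch of the routine commands involved, run against a hypothetical gold table (the table name, clustering columns, and retention window are assumptions, and liquid clustering assumes a recent Databricks Runtime):

    # Hypothetical gold table; run in a Databricks notebook where `spark` is provided.
    # Enable liquid clustering rather than static partitions (the design choice noted above).
    spark.sql("ALTER TABLE gold.sales_orders CLUSTER BY (order_date, region)")

    # OPTIMIZE compacts small files (and reclusters a liquid-clustered table);
    # it never deletes data.
    spark.sql("OPTIMIZE gold.sales_orders")

    # VACUUM removes files no longer referenced by the transaction log once they
    # exceed the retention window; this reclaims storage but bounds time travel.
    spark.sql("VACUUM gold.sales_orders RETAIN 168 HOURS")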


Requirements

  • Bachelor's degree in computer science or engineering, or equivalent experience.

  • Multiple years as a senior data engineer across more than one organisation and tech stack

  • Experience operating Databricks in a network-isolated environment (Private PaaS, VNet injection, private endpoints)

  • Deep hands-on Databricks - Delta Lake, DLT, Autoloader, Unity Catalog, Spark UI, job clusters vs all-purpose, cluster sizing - you should be able to explain the difference between OPTIMIZE and VACUUM

  • Experience with complex ERP and financial data - Infor, SAP, Oracle Financials, NetSuite, or similar. The data from these systems is unforgiving, and we need someone who's worked with it before

  • Experience migrating on-premises SQL Server / SSIS workloads to the cloud - not theoretically, but done it

  • Proficiency with Spark, SQL, Python, R, and other data engineering development tools.

  • Experience with metadata-driven pipelines and SQL serverless data warehouses.

  • Extensive knowledge of querying API systems.

  • Excellent problem-solving skills and attention to detail.

  • Extensive experience building and optimising ETL pipelines using Databricks.

  • Azure fundamentals - storage accounts, managed identities, RBAC, Key Vault, diagnostic settings.

  • The instinct to troubleshoot before you're told. Token expiry, key rotation, composite key collisions, silent schema drift - you've seen these before, and you recognise the patterns.

  • Understanding of data governance and data quality principles.

  • Recent experience demonstrating how AI has reduced time to value in key projects



Here to help

Reach out to us, and let’s explore how we can build your dreams with the right people, expertise, and solutions.