About Gapstars
At Gapstars, we partner with some of Europe’s most ambitious tech companies, from disruptive startups to fast-scaling scaleups, helping them build high-performing remote engineering teams. Headquartered in the Netherlands, with talent hubs in Sri Lanka and Portugal, we are home to 275+ engineers who thrive on solving real-world challenges with modern technologies. Our teams work across domains, from networking and marketplaces to SaaS and AI, delivering scalable solutions that drive meaningful outcomes. If you’re looking for a company that combines technical excellence, a strong culture, and room to grow, welcome to Gapstars.
The Role
As a Data Engineer, you will help design, build, and maintain scalable and reliable data solutions on Google Cloud Platform (GCP). You’ll develop robust pipelines into BigQuery, implement transformations (ideally using dbt), and ensure data quality and operational stability. You’ll also support monitoring and observability, including working with BI tooling (e.g., Power BI) for validation and operational visibility rather than as a dedicated BI developer.
Key responsibilities
1) Data Engineering & Architecture (GCP-first)
Design and evolve data solutions on GCP, choosing services and patterns with scalability, reliability, performance, and cost in mind.
Build and maintain end-to-end data pipelines (primarily batch, with streaming/event-driven where needed).
Develop and maintain the BigQuery warehouse structure: data modeling conventions, partitioning/clustering strategies, performance optimization, and cost control.
Define and implement integration patterns to ingest data from multiple sources (e.g., enterprise systems, SharePoint, external APIs, cloud storage).
Ensure datasets are reusable, well-modeled, and analytics-ready to support reporting, monitoring, and future modeling needs.
2) Transformations (dbt is a priority)
Implement transformation logic in SQL and Python, preferably with dbt for standardized modeling and testing.
Help set standards for modular, maintainable transformations and documentation.
Contribute to best practices around versioning, code review, and deployment for transformation workflows.
3) Data quality, reliability & governance
Build and maintain data quality controls (validation, reconciliation with source systems, consistency checks).
Monitor pipeline health, data freshness, and reliability; proactively troubleshoot incidents and prevent recurrence.
Support governance practices such as naming conventions, access principles, and lifecycle management.
4) Platform ownership, security & operations (collaborative ownership)
Contribute to stable GCP project setup and environment separation (e.g., dev/staging/prod where applicable).
Support IAM and dataset access controls, aligned with least privilege principles.
Contribute to infrastructure-as-code and standardized deployment approaches (where used by the team).
Document architecture, data flows, and operational runbooks so the team can move fast and safely.
5) Collaboration & communication (core expectation)
Work closely with analysts, engineers, and stakeholders to translate business needs into scalable technical solutions.
Communicate clearly about designs, trade-offs, risks, and progress to both technical and non-technical audiences.
Bring a proactive, positive attitude—raise issues early, propose solutions, and take ownership through delivery.
Must-have
Strong SQL and Python skills for building pipelines and transformations.
Hands-on experience building and operating data pipelines and a data warehouse (or lakehouse) environment.
Experience with cloud data platforms:
GCP preferred (especially BigQuery and serverless components like Cloud Functions)
AWS or Azure experience is also welcome if you’re motivated to learn GCP quickly
Strong engineering habits: readable code, testing mindset, version control, and operational awareness.
A proactive, ownership-driven mindset aligned with our Open, Active, Positive values.
Nice-to-have (strong plus)
dbt experience (a priority), including models, tests, documentation, and deployment workflows.
GCP services experience, such as:
BigQuery, Cloud Functions, Cloud Run, Cloud Storage
Workflows / Composer (Airflow) / Scheduled Queries
Logging & Monitoring, Secret Manager, IAM
Experience with Dataflow (or equivalent), event-driven patterns, or streaming pipelines.
Terraform and/or Docker exposure.
Familiarity with Power BI (or similar) for monitoring, validation, and operational visibility.
Experience integrating with systems like SharePoint, external APIs, SAP, Databricks, or cross-cloud environments.
Tools & technologies (current landscape)
Primary platform:
Google Cloud Platform (GCP): BigQuery, Cloud Functions (and potentially Cloud Run, Workflows, Composer/Airflow, Cloud Storage, Logging/Monitoring, IAM, Secret Manager)
Data modeling & transformation:
dbt (highly valued)
Programming & querying:
Python, SQL
DevOps / engineering tooling:
Terraform, Docker, Bitbucket (as applicable within the team)
Monitoring/reporting:
Power BI (monitoring/visibility; not a dedicated Power BI developer role)
Data sources/integrations (examples):
SharePoint, external APIs, Databricks, SAP CAR, AWS