A next-generation distributed SQL service is being developed on Azure Kubernetes Service (AKS), designed to provide engineering teams with secure and scalable access to data lake tables through a high-performance SQL endpoint. The platform emphasizes high availability, modern authentication, multi-tenancy, and strict workload isolation. The cross-functional engineering team focuses on automation, observability, and ensuring production reliability.
The initial project duration is six months, with the possibility of extension.
Key Responsibilities
- Design, deploy, and maintain Trino clusters on AKS with high availability and strong workload isolation.
- Implement multi-tenancy, authentication, and authorization for cross-team usage.
- Integrate with Azure Data Lake while meeting internal security standards.
- Build and optimize observability: metrics, logging, alerting, and distributed tracing.
- Partner with SRE teams to ensure SLA compliance and troubleshoot production issues.
- Automate infrastructure and deployments using Terraform and Azure DevOps pipelines.
- Support governance processes including security, networking, and cost reviews.
Skills & Experience
- 8+ years in Cloud Infrastructure / DevOps Engineering.
- 5+ years hands-on with Kubernetes (AKS preferred): scaling, RBAC, HA workloads, and security.
- Solid expertise with Azure services: AKS, AD, Networking (VNet, NSG, Subnets), Storage, Key Vault, Monitoring, and Cost Management.
- Strong background in Infrastructure as Code (Terraform, Azure DevOps).
- Proven experience troubleshooting distributed production systems (networking, auth, workloads).
- Exposure to SQL engines (Trino, Presto, or equivalent).
- Experience with observability tooling (Prometheus, Grafana, Azure Monitor, or similar).
Must-Haves
- Kubernetes (AKS) HA deployment & operations
- Deep Azure Cloud knowledge (networking, auth, storage, monitoring)
- IaC with Terraform and Azure DevOps
- Troubleshooting at scale (distributed systems, networking, authentication)
Nice-to-Haves
- Trino / Presto or distributed SQL engine experience
- Multi-tenant service design
- Telemetry & observability expertise
- Familiarity with data engineering / lakehouse patterns
What We Value
- Strong problem-solving mindset and ownership of complex issues
- Clear communication and documentation skills across teams
- Ability to work both independently and in cross-functional groups
- Proactive drive for automation, cost efficiency, and security best practices
- Adaptability in a dynamic environment with evolving requirements