Capitole
Cloud Platform Engineer – AWS / Terraform / Linux
About the role
We are looking for a Cloud Platform Engineer to join a technology team within a large international environment, working on the platform foundation that supports modern AI, data, and cloud-based systems.
This role is highly focused on cloud infrastructure, platform engineering, AWS, Terraform, Linux troubleshooting, networking, automation, and production operations.
The position is not primarily focused on building AI models or developing GenAI applications. Instead, you will work on the infrastructure and platform layer that enables technical teams to deploy, operate, monitor, and scale reliable systems in production.
You will be expected to bring a strong Platform / DevOps / Cloud Engineering mindset, with the ability to analyse real production issues, debug infrastructure problems, design AWS-based solutions, and make sound technical decisions in complex environments.
Experience with AI/ML platforms, MLOps, SageMaker, MLflow, or LLM tooling is valuable, but the core of the role is AWS Cloud Platform Engineering.
If you enjoy solving infrastructure challenges, designing scalable AWS solutions, automating environments with Terraform, and troubleshooting real production issues, this could be a great fit.
What you’ll do
Design, build, and maintain cloud infrastructure solutions on AWS
Work with Terraform / Infrastructure as Code to provision, manage, and standardise infrastructure
Analyse and troubleshoot production issues across cloud infrastructure, Linux systems, networking, storage, permissions, deployments, and platform services
Investigate infrastructure drift, unexpected production changes, Terraform state inconsistencies, and configuration mismatches
Design AWS-based solutions, selecting the right services and explaining technical trade-offs around scalability, reliability, security, cost, and maintainability
Support and improve CI/CD pipelines for infrastructure, platform services, and cloud workloads
Work with Linux environments, including debugging issues related to disk usage, permissions, processes, logs, networking, and system performance
Contribute to monitoring, observability, alerting, logging, and operational readiness of production systems
Collaborate with engineering, data, AI, and platform teams to ensure systems are reliable, automated, secure, and scalable
Apply DevOps, SRE, and platform engineering best practices to improve reliability, automation, and operational excellence
Support cloud environments that may include AI/ML workloads, MLOps tooling, training/inference environments, or AI platform components
Must Have
Solid experience in Platform Engineering, Cloud Engineering, DevOps, Infrastructure Engineering, or SRE
Strong hands-on experience with AWS in production environments
Strong experience with Terraform and Infrastructure as Code
Good understanding of cloud infrastructure design, including networking, compute, storage, IAM/security, monitoring, and scalability
Strong troubleshooting skills in Linux environments
Ability to debug real infrastructure issues using command-line tools, logs, metrics, system resources, and cloud-native services
Experience with CI/CD pipelines and automation
Understanding of networking fundamentals, including VPCs, subnets, routing, DNS, load balancers, security groups, firewalls, and connectivity troubleshooting
Experience with production operations, incident analysis, root cause investigation, and reliability improvement
Ability to design technical solutions in AWS and explain the reasoning behind the selected services and architecture
Strong ownership mindset and ability to work independently in complex technical environments
Good communication skills and ability to explain technical decisions clearly
Fluent English
Nice to Have
Experience with MLOps / AI Platform environments
Experience with SageMaker, MLflow, feature stores, model deployment, model serving, or training/inference platforms
Experience with Docker and Kubernetes
Familiarity with LLM tooling such as LangChain, Langfuse, LangSmith, or similar
Experience with observability tools, monitoring platforms, logging, tracing, and alerting systems
Experience with cost optimisation in AWS environments
Experience with data pipelines or workflow orchestration tools such as Airflow or Prefect
Knowledge of security, governance, compliance, and best practices for cloud platforms
Experience working in Agile / Scrum environments
Fluent Spanish
Hybrid model: 2 days onsite per week
Why join this project?
To apply for this job, please visit es.whatjobs.com.

