Cloud Platform Engineer – AWS / Terraform / Linux

  • Full-time
  • Catalonia

Capitole

About the role

We are looking for a Cloud Platform Engineer to join a technology team within a large international environment, working on the platform foundation that supports modern AI, data, and cloud-based systems.

This role is highly focused on cloud infrastructure, platform engineering, AWS, Terraform, Linux troubleshooting, networking, automation, and production operations.

The position is not primarily focused on building AI models or developing GenAI applications. Instead, you will work on the infrastructure and platform layer that enables technical teams to deploy, operate, monitor, and scale reliable systems in production.

You will be expected to bring a strong Platform / DevOps / Cloud Engineering mindset, with the ability to analyse real production issues, debug infrastructure problems, design AWS-based solutions, and make sound technical decisions in complex environments.

Experience with AI/ML platforms, MLOps, SageMaker, MLflow, or LLM tooling is valuable, but the core of the role is AWS Cloud Platform Engineering.

If you enjoy solving infrastructure challenges, designing scalable AWS solutions, automating environments with Terraform, and troubleshooting real production issues, this could be a great fit.

What you’ll do

Design, build, and maintain cloud infrastructure solutions on AWS

Work with Terraform / Infrastructure as Code to provision, manage, and standardise infrastructure

Analyse and troubleshoot production issues across cloud infrastructure, Linux systems, networking, storage, permissions, deployments, and platform services

Investigate infrastructure drift, unexpected production changes, Terraform state inconsistencies, and configuration mismatches

Design AWS-based solutions, selecting the right services and explaining technical trade-offs around scalability, reliability, security, cost, and maintainability

Support and improve CI/CD pipelines for infrastructure, platform services, and cloud workloads

Work with Linux environments, including debugging issues related to disk usage, permissions, processes, logs, networking, and system performance

Contribute to monitoring, observability, alerting, logging, and operational readiness of production systems

Collaborate with engineering, data, AI, and platform teams to ensure systems are reliable, automated, secure, and scalable

Apply DevOps, SRE, and platform engineering best practices to improve reliability, automation, and operational excellence

Support cloud environments that may include AI/ML workloads, MLOps tooling, training/inference environments, or AI platform components

Must Have

Solid experience in Platform Engineering, Cloud Engineering, DevOps, Infrastructure Engineering, or SRE

Strong hands-on experience with AWS in production environments

Strong experience with Terraform and Infrastructure as Code

Good understanding of cloud infrastructure design, including networking, compute, storage, IAM/security, monitoring, and scalability

Strong troubleshooting skills in Linux environments

Ability to debug real infrastructure issues using command-line tools, logs, metrics, system resources, and cloud-native services

Experience with CI/CD pipelines and automation

Understanding of networking fundamentals, including VPCs, subnets, routing, DNS, load balancers, security groups, firewalls, and connectivity troubleshooting

Experience with production operations, incident analysis, root cause investigation, and reliability improvement

Ability to design technical solutions in AWS and explain the reasoning behind the selected services and architecture

Strong ownership mindset and ability to work independently in complex technical environments

Good communication skills and ability to explain technical decisions clearly

Fluent English

Nice to Have

Experience with MLOps / AI Platform environments

Experience with SageMaker, MLflow, feature stores, model deployment, model serving, or training/inference platforms

Experience with Docker and Kubernetes

Familiarity with LLM tooling such as LangChain, Langfuse, LangSmith, or similar

Experience with observability tools, monitoring platforms, logging, tracing, and alerting systems

Experience with cost optimisation in AWS environments

Experience with data pipelines or workflow orchestration tools such as Airflow or Prefect

Knowledge of security, governance, compliance, and best practices for cloud platforms

Experience working in Agile / Scrum environments

Fluent Spanish

Hybrid model: 2 days onsite per week

Why join this project?

To apply for this job, please visit es.whatjobs.com.