Site Reliability Engineer

Valid

If you’re passionate about technology, innovative projects, and making a real impact, your place is here.

We are a global technology provider with 65+ years of experience, delivering a comprehensive portfolio of solutions across ID & Digital Government, Banking & Payments, and Trusted Connectivity. With more than 4,000 employees in 16 countries, we are committed to building a more secure and trustworthy world.

Within our Trusted Connectivity business unit, we develop cutting-edge solutions for the telecommunications industry, ranging from SIM cards and eSIMs to Subscription Management and secure connectivity services, that connect people, businesses, and devices worldwide.

We are looking for a highly analytical and business-oriented Site Reliability Engineer to design and enhance the reliability engineering architecture of our platforms, ensuring high availability, scalability, reliability, and observability through close collaboration with R&D, DevOps, and Operations teams.

As a Site Reliability Engineer, you will work mainly with the Site Reliability Architect (SRA) and the R&D team to design resilient systems and operational processes that ensure the high availability, scalability, reliability, and observability of our platforms.

What will you do?

Site Reliability Engineering

  • Work together with R&D to develop and maintain reliable, scalable, and efficient systems.
  • Work closely with R&D while new features are being developed and ensure that each feature is ready for release.
  • Ensure new features have been validated in terms of performance, reliability, and scalability.
  • Prepare and deliver knowledge transfer, documentation, and information sharing for other team members.

Cross-functional Collaboration

  • Work together with the DevOps team to improve existing CI/CD processes and implement new, effective ones.
  • Work together with the Enablement Engineer to produce the automation tools needed for performance and reliability monitoring.
  • Work together with the Operations team to support the operational aspects of the platforms.

System Architecture & Design

  • Continuously evaluate and optimize system performance and capacity in order to maintain stable production platforms.
  • Identify, assess, and implement measures to eliminate potential risks that could impact the performance of systems and services.
  • Research, evaluate, and test new technologies or tools, and advise on selecting the appropriate ones to improve site reliability.

Observability & Monitoring

  • Monitor system performance, identify bottlenecks, and carry out pipeline optimization.
  • Implement comprehensive service metrics to track and report on system reliability, performance, and efficiency.

Disaster Recovery & Backups

  • Implement disaster recovery plans and ensure robust backup systems are in place.

Capacity Planning & Performance Engineering

  • Support forecasting, scaling, and performance tuning.
  • Create KPIs to monitor growth and optimize resource utilization.

Migrations

  • Analyze and plan for complex migrations.

What are we looking for?

  • Bachelor’s degree in Computer Engineering, Electronics Engineering, Telecommunications Engineering, or a related field.
  • 3+ years of experience in Site Reliability Engineering, Infrastructure Operations, DevOps, or a similar role.
  • 3+ years of experience within the telecommunications industry or related technology sectors.
  • Strong Linux administration skills (Red Hat, Ubuntu, or similar distributions).
  • Hands-on experience with cloud platforms, preferably AWS.
  • Experience designing and maintaining CI/CD pipelines (Jenkins, GitLab CI, GitHub Actions, or similar).
  • Experience with monitoring and observability tools (Prometheus, Grafana, ELK Stack, Datadog, or equivalent).
  • Proficiency in scripting and automation using Python, Bash, or similar languages.
  • Experience with Infrastructure as Code (Terraform, Ansible, or equivalent).
  • Strong knowledge of containerization and orchestration technologies (Docker, Kubernetes).
  • Experience in performance monitoring, troubleshooting, and system optimization.
  • Knowledge of disaster recovery, backup strategies, and business continuity practices.
  • Experience working with SQL databases.
  • Advanced English communication skills (B2+/C1).
  • Candidates must be based in Spain or nearby European countries and be available to travel when required.

If you want this position to be yours, we would like you to have the following:

  • AWS, Linux, or Kubernetes certifications.
  • Experience in highly available and mission-critical environments.
  • Knowledge of capacity planning and performance engineering.
  • Experience in telecom platforms, mobile services, or cloud-native architectures.

What we offer

  • Join Valid and work on innovative, global technology projects within multicultural and multidisciplinary teams.
  • Flexibility: flexible working hours and remote work options to support work-life balance.
  • Well-being first: private medical insurance and life insurance.
  • Be part of a company that values continuous learning, collaboration, and growth.
  • Meal allowance.

Our Culture

At Valid, we foster an inclusive, diverse, and innovative environment where everyone thrives. We are committed to equal opportunities, free from discrimination concerning sex, age, race, sexual orientation, religion, education, social status, culture, or special needs such as illness or disability. We value people as the heart of our culture. Trust, transparency, and teamwork are the foundations of our success, driving growth and empowering talent.

Join this great team and be part of our story!

To apply for this job, please visit es.whatjobs.com.