Site Reliability Engineer (Linux Platform Operations) (m/f/d)

As a Site Reliability Engineer (m/f/d), you operate and evolve Linux-based production platforms that power critical business services at scale. You focus on automation, reliability, and reducing operational overhead while enabling teams to work more independently.

Working at the intersection of infrastructure, automation, and reliability, you contribute to building resilient systems and support the evolution toward a “you build it, you run it” culture.

What to expect

Take full ownership of Linux-based production systems and keep them reliable, secure, and high-performing
Automate operational tasks (e.g., patching, provisioning, deployments) to eliminate manual effort and improve efficiency
Standardize and optimize deployment and configuration processes for scalability and consistency
Lead incident response and drive root cause analysis and long-term fixes
Manage and automate access and identity processes with a strong focus on security and auditability
Maintain and improve core Linux infrastructure services essential for platform operations
Collaborate with engineering teams to enhance observability and shared operational practices
Analyze complex systems end-to-end and simplify them to improve reliability and performance
Drive the modernization of operations towards automation, scalability, and self-service models
Adapt quickly to changing environments and deliver pragmatic, effective solutions

What you bring

5+ years of experience in Linux-based production environments
Strong expertise in Linux systems engineering, performance tuning, and lifecycle management
Strong understanding of reliability concepts (SLOs, SLAs, performance, capacity)
Solid scripting and automation skills (e.g., Bash, Python) with a continuous improvement mindset
Hands-on experience with configuration management (e.g., Salt, Ansible) and Infrastructure as Code (e.g., Terraform)
Experience with CI/CD tools (e.g., GitLab, Jenkins) and automated deployments
Good knowledge of monitoring and observability tools (e.g., Zabbix, Grafana, ELK)
Proven experience in incident management, root cause analysis, and postmortems
Experience with security practices, including patching and access control
Knowledge of core traffic services (DNS, load balancing, CDN)
Basic experience with container and cloud technologies (Docker, Kubernetes, AWS)

We value diversity and treat all applications equally – regardless of gender, background, age, religion, disability, or sexual orientation. Different perspectives enrich our team and make EVENTIM stronger.

Benefits

Sofa concerts & employee events

Ticket purchase discounts & clearing assignments

25 days of workation from other EU countries

30 days of vacation & the option of 15 days of unpaid leave

Mental health program & company pension plan

Corporate benefits & discounts at Kess

Flexible working hours

Central location & public transport subsidy

Bike leasing

Language learning platform and Lunch & Learn sessions