Key Responsibilities:
- Manage on-premises containerized web services, along with the bridge services, integrations, and batch processes that connect our productivity ecosystem.
- Proactively reduce operational workload through engineering solutions rather than reactive firefighting.
- Automate and troubleshoot a wide range of technical infrastructure, both on-premises and in the cloud.
- Develop and implement monitoring solutions to maintain high system uptime and reliability.
- Promote transparency and high development velocity within the organization while upholding stringent security standards. Strive to minimize user friction and ensure team members have timely access to necessary tools and data.
- Simplify complex issues, iterate on solutions, and communicate progress to a diverse group of leads and stakeholders.
Qualifications:
- More than 5 years of experience in site reliability engineering or a related field.
- Proficiency in Python.
- Experience in managing and monitoring containerized infrastructure.
- Familiarity with CI/CD tools such as Jenkins, GitHub Actions, or ArgoCD.
- Expertise in Infrastructure as Code (IaC) and configuration management tools such as Terraform, SaltStack, Chef, Puppet, or Ansible.
Job ID: 473334704
Originally Posted on: 4/14/2025