SSUSA Job #999: SENIOR SITE RELIABILITY ENGINEER (SRE)
Our client is seeking an experienced Senior SRE who is passionate about the security, performance, and reliability of applications hosted in our global multi-cloud and on-prem data centers. You will be responsible for all aspects of our build and production deployment environments, including scaling, provisioning, monitoring, and automation. You should have significant CI/CD experience and strong AWS cloud system administration skills. Java and Spring Boot development experience is a plus. The successful candidate will join a collaborative “DevOps & SRE” business unit that provides continual, proactive technical infrastructure support and improvement across all environments of our services, products, and platforms, ensuring optimal system availability and reliability. You will also contribute to improving our DevOps implementation while applying IT security best practices.
· Participate in all stages of infrastructure provisioning, primarily providing staging and production support.
· Assist in the implementation of security best practices and initiatives at all levels of the systems infrastructure.
· Adhere to SRE (Site Reliability Engineering) principles/pillars on incident management and service level objectives.
· Work closely with DevOps engineers to apply/improve the automation scripts and system designs shared by DevOps to improve systems efficiency in a production environment.
· Ensure maximum uptime and stability of cloud and on-premises environments, especially staging and production.
· Apply the latest OS and security patches while ensuring compatibility with the applications running on them.
· Lead routine disaster recovery/business continuity (DRBC) exercises.
· Handle help desk & JIRA tickets and mitigate any production issues.
· Ensure knowledge base documentation is accurate and updated in a timely manner.
· Strong knowledge of secure web app deployments in AWS (4+ years).
· Advanced experience as a Linux or Windows server administrator.
· The ability to work with little supervision; must be self-driven and motivated.
· Experience with continuous integration/continuous delivery (CI/CD) — Jenkins and Git.
· Experience with containerized microservices delivered with Docker, Kubernetes (Kops, AWS EKS), or OpenShift 4.x.
· Manage and optimize the unified logging system and APM (Application Performance Management) monitoring tools, continually reducing MTTR (Mean Time to Recovery).
· Strong experience with hybrid infrastructure systems monitoring and proactive incident management.
· Strong scripting skills using Shell and Python or Go (a plus).
· Ability to triage and troubleshoot urgent production issues with precision under high time pressure.
· Experience in working collaboratively with various applications development teams throughout the organization to resolve mission-critical problems.
· Excellent written and oral communication skills are necessary to produce and process technical documents.
· Excellent problem-solving and analytical skills and the ability to translate business requirements into information systems solutions.
· Experience with IT security.
· A team player.
· Familiarity/experience with the DevOps process.
· Professional IT certifications, such as Red Hat Certified Engineer/Windows Server, and AWS certifications (a huge plus).
· Relevant work experience (8+ years), either in software development or IT infrastructure.
· Master’s degree in a technology-related field, engineering, or computer science (a plus).
· Participate in a week-long on-call rotation (approximately every 3-4 weeks) as needed.
· Provide mission-critical production support during outages outside business hours when necessary.
THIS IS A REMOTE POSITION AND YOU MUST BE BASED IN THE UNITED STATES
SEND YOUR RESUME TO JOBS@SSUSA.COM
MENTION JOB 999 IN THE SUBJECT BOX
Remote from Home