SSUSA Job #984: Director of DevOps & Site Reliability Engineering

Job Description

Director of DevOps & Site Reliability Engineering


One of our clients, a social media company with offices in NYC, is seeking a Director of DevOps & Site Reliability Engineering to lead teams across North America and Europe.

The Role...

Our client is looking for a seasoned and impactful Director of DevOps to lead a team of DevOps engineers across North America and Europe. As the Director of DevOps & Site Reliability Engineering, you will lead the DevOps, Production Engineering, and Service Operations teams that design, implement, and operate the company's hybrid-cloud infrastructure. You will guide these experienced engineers as they modernize, optimize, and support the infrastructure and tools that give developers the building blocks to improve their productivity.

The ideal candidate will have hands-on experience developing operational tools and managing hybrid/multi-cloud environments in a highly available, large-scale, global organization. We are looking for someone who is results-driven and customer-centric, and who takes ownership and accountability with a mindset that anything is possible.

Key Points...

       Lead all aspects of technology infrastructure operations and management of on-prem and cloud-based production systems to ensure availability, performance, and scalability

       Define strategy, processes, and procedures for 24x7 site reliability, runbooks, escalation workflows, production incident resolution, and disaster recovery plans

       Manage, innovate, and create processes, automation, and tooling that continuously improve the availability, scalability, latency, and efficiency of services

       Recruit, hire, and retain a collaborative and high-performing team

       Partner with finance teams to ensure infrastructure and software licensing budgets are on target and accurate


       10+ years of proven development and DevOps experience deploying and maintaining multi-tiered infrastructure and cloud applications

       BA/BS or equivalent combination of education and experience

       Experience in developing and executing DevOps principles and best practices

       Experience with infrastructure and application security requirements, practices, and threat detection

       Strong verbal and written communication skills

       Subject matter expert (SME) in operating both on-prem and cloud environments, with a track record of successful migrations from on-prem to cloud

       Knowledgeable in data security frameworks, regulatory compliance, and privacy standards such as GDPR, CCPA, and SOC 2

       Superb interpersonal skills, capable of working with cross-functional technical teams and varying levels of management

       Proven project management skills, including excellent presentation skills and experience with agile methodologies

       Experience running and supporting infrastructure and services in public cloud environments (AWS, Azure, GCP, etc.)

       Experience building and supporting containerized applications and related technologies, including IaC (Terraform), Docker, Kubernetes/EKS, serverless, AWS/GCP, CI/CD, and OpenStack

       Experience with DevOps/SRE tools for configuration management, alerting, and observability

       Experience in both RDBMS and NoSQL databases

       Experience managing offshore and geo-distributed teams

       Deep understanding of and demonstrable experience with AWS products, Kubernetes, Docker, MongoDB, PostgreSQL, OpenStack, and CI/CD pipelines, both on cloud platforms (AWS, GCP, Azure) and on-prem private cloud (OpenStack)

       Proven experience delivering and maintaining highly available, disaster-tolerant global applications and infrastructure

       Collaborate effectively with peer Engineering Directors and external partners to solve customer-facing production incidents and problems. Work closely with product engineering teams to collect and present appropriate metrics (e.g., SLAs, SLIs, and SLOs) and to implement the observability platform

       Analyze infrastructure Service Desk tickets and seek continuous improvement opportunities. Take point on urgent or complex issues, ensuring that appropriate actions are taken and the root cause is determined

       Create the DevOps roadmap and drive its implementation with the team, managing dependencies across the company

       Developer success is a must: work with the Engineering teams to ensure customer SLAs are being met or exceeded


Job Location
Remote from Home

Position Type

Salary Range