Job Description
About the Organization
We are a global technology enterprise delivering the backbone of modern digital ecosystems through AI infrastructure, hyperscale cloud platforms, and mission-critical IT operations. Our environment supports high-availability systems responsible for powering real-time data processing, large-scale AI workloads, and enterprise-grade applications across multiple regions.
Operating at a global scale, we manage a complex IT landscape that includes distributed systems, multi-cloud environments, high-performance compute clusters, and advanced cybersecurity frameworks. Our clients rely on us to deliver 99.999% uptime, resilient infrastructure, and secure digital operations, making operational excellence a strategic priority.
As our infrastructure footprint continues to expand, we need a highly capable leader to oversee global IT operations, service delivery, and operational resilience. The Director of IT Operations will lead the design, execution, and continuous optimization of enterprise IT operations across a dynamic, high-growth environment.
This role requires a leader who can operate at scale, balancing operational rigor, innovation, automation, and performance management while ensuring seamless delivery of services across geographically distributed systems.
Essential Duties and Responsibilities
- Lead global IT operations, ensuring high availability, reliability, and performance across all systems and platforms.
- Oversee infrastructure operations, including cloud environments, data centers, networks, and enterprise systems.
- Establish and manage IT service management (ITSM) frameworks, including incident, problem, and change management processes.
- Drive operational excellence through automation, monitoring, and continuous improvement initiatives.
- Develop and enforce service level agreements (SLAs) and key performance indicators (KPIs) to ensure consistent service delivery.
- Lead incident response and crisis management, ensuring rapid resolution and minimal business disruption.
- Collaborate with cybersecurity teams to ensure a robust security posture and compliance with enterprise standards.
- Manage vendor relationships, including cloud providers, infrastructure partners, and managed service providers.
- Implement advanced monitoring and observability tools to provide real-time visibility into system performance.
- Partner with engineering and product teams to support infrastructure scalability and platform reliability.
- Drive cost optimization initiatives, balancing performance, scalability, and financial efficiency.
- Build and lead a high-performing global IT operations team, fostering a culture of accountability and continuous improvement.
Job Qualifications and Requirements
- Bachelor’s degree in Information Technology, Computer Science, or a related field; advanced degree preferred.
- 12+ years of experience in IT operations, infrastructure management, or related roles within enterprise-scale environments.
- Proven experience managing global IT operations in cloud, data center, or high-availability environments.
- Strong expertise in multi-cloud platforms (AWS, Azure, GCP), networking, and distributed systems.
- Deep knowledge of ITSM frameworks (ITIL or equivalent) and operational best practices.
- Experience with monitoring, observability, and automation tools (e.g., ServiceNow, Splunk, Datadog).
- Strong understanding of cybersecurity principles and compliance frameworks.
- Experience managing vendor relationships and large-scale service contracts.
- Demonstrated ability to lead teams in complex, matrixed, global organizations.
- Exceptional problem-solving and decision-making capabilities.
Personal Capabilities and Qualifications
- Operationally focused leader with a commitment to reliability, performance, and continuous improvement.
- Strong analytical mindset with the ability to leverage data for operational optimization.
- Executive presence with the ability to communicate effectively with senior leadership and stakeholders.
- Calm and decisive under pressure, particularly in high-severity incident scenarios.
- Collaborative and cross-functional, with the ability to align IT operations with broader business objectives.
- Adaptable and resilient in a fast-paced, high-growth environment.
- Strong leadership skills with a focus on team development and performance excellence.
Strategic Support
- Serve as a key partner to the CIO and executive leadership team on IT strategy, operational scalability, and infrastructure investments.
- Provide insights into emerging technologies, operational trends, and infrastructure innovation.
- Support enterprise initiatives, including digital transformation, cloud migration, and platform modernization.
- Align IT operations with business growth objectives, ensuring scalable and resilient infrastructure.
- Contribute to long-term planning for global expansion and capacity management.
Working Conditions
- Fully remote within the United States, with occasional travel (10–20%) for leadership meetings and operational reviews.
- Operates within a 24/7 global operations environment, requiring availability for critical incidents when necessary.
- High-performance, fast-paced environment with global collaboration across multiple time zones.
- High-visibility role with direct impact on system reliability and business continuity.
Job Function
- IT Operations Leadership
- Infrastructure & Cloud Operations
- IT Service Management (ITSM)
- Site Reliability & Performance Engineering
- Incident & Crisis Management
- Vendor & Service Delivery Management
Compensation & Benefits
- Compensation Package: $350,000 – $457,000 (base salary + performance bonus + equity potential)
- Comprehensive health, dental, and vision coverage
- Equity participation and long-term incentive programs
- Retirement plans with a company match
- Flexible PTO and wellness programs
- Leadership development and technical certification support
- Access to global technology conferences and innovation forums
Why Join Us
- Lead IT operations at the forefront of AI infrastructure and global digital transformation.
- Operate in a high-impact leadership role with direct influence on enterprise reliability and scalability.
- Work alongside top-tier engineering and technology leaders in a globally distributed organization.
- Contribute to building and operating systems that power next-generation enterprise platforms.
- Be part of a high-growth organization shaping the future of cloud and AI-driven operations.