
Engineer, NOC
Dublin
Description
Job Overview
The NOC Lead, reporting to the NOC Manager, will provide strategic and technical leadership to a team of Principal Engineers and Engineers. This role is accountable for ensuring the stability and functionality of applications, batch processes, network, and infrastructure components. The Lead will drive operational excellence by maintaining maximum availability (99.9%–99.99%), overseeing incident management, and ensuring timely resolution of escalations to meet or exceed established SLAs. Additionally, this position will guide the team in implementing best practices, fostering collaboration, and delivering continuous improvements across the NOC environment.
The Network Operations Center (NOC), a key part of iCIMS Technical Operations, is dedicated to monitoring applications and infrastructure to deliver an exceptional customer experience. The team ensures optimal performance by validating availability, coordinating cross-functional event responses, and communicating any customer-impacting incidents. Additionally, the NOC analyzes key performance indicators (KPIs) to forecast future trends and provide initial recommendations to the engineering team.
The Lead, NOC, who reports to the NOC Manager, is responsible for maintaining the functionality of applications, batch processes, network, and infrastructure components. This role ensures maximum availability (99.9%–99.99%) and drives timely resolution of incidents or technical escalations to meet established SLAs.
Responsibilities
Success Metrics
Ensure Production Stability: Monitor availability and performance across the entire production environment to maintain optimal operations. Provide off-hours support as needed.
Leverage Monitoring Tools: Track cloud resource utilization and performance metrics to identify trends and potential issues proactively.
Data-Driven Insights: Generate regular performance reports and recommend enhancements based on detailed analysis.
Incident Management Excellence: Lead the swift restoration of normal service operations, including assessment, research, escalation, communication, and resolution management.
Execute Production Changes: Implement necessary changes to support both internal and external customer needs.
Operational Support: Provide effective triage and resolution for operational support requests.
Documentation & Standards: Review and refine SOPs, policies, procedures, and system requirements to ensure accuracy and relevance.
Automation Development: Create and maintain automation scripts using Python and Java to streamline processes and reduce manual effort.
Infrastructure as Code (IaC): Apply IaC practices to improve deployment efficiency, consistency, and scalability.
Comprehensive Documentation: Prepare detailed electronic documentation, including SLAs, performance metrics, installation guides, and implementation guides.
Reduce Manual Work: Identify repetitive tasks and implement automation solutions to eliminate inefficiencies.
Performance Reviews: Participate in monthly metric reviews to support uptime goals of 99.9%–99.99%.
Drive Innovation: Demonstrate passion, initiative, and urgency in seeking innovative solutions and resolving issues effectively.
Qualifications
Technical Expertise
8+ years of administration and production support experience with on-call responsibilities
10+ years of strong cloud provider experience and demonstrated knowledge
6 years of leadership experience
1 certification in any technical area
Observability tooling experience preferred
Preferred Qualifications
Experience with AWS / AWS certifications
Exposure to other cloud technologies such as Azure and GCP
About iCIMS