Trident Consulting
Operations Command Center Engineer (OCCE2)
Trident Consulting, San Ramon, California, United States, 94583
Sacramento, California
Information Technology - IT Operations / Exempt / Hybrid
Berkshire Hathaway Homestate Companies, Workers Compensation Division, has an immediate opening for an Operations Command Center Engineer 2 (OCCE2). The OCCE2 will be responsible for handling escalated incidents referred by OCCE1, performing deeper troubleshooting, incident management, and root cause analysis. This individual will provide technical expertise to ensure uptime and efficiency in the operations of IT systems, applications, and infrastructure, and will be involved in maintaining and updating monitoring tools, processes, and cloud-based solutions to enhance operational efficiency. The role also contributes to key areas such as network management, system administration, and automation.
KEY RESPONSIBILITIES
Manages escalated tickets from OCCE1 for advanced troubleshooting and problem resolution across network, system, and cloud platforms.
Proactively monitors system health, performance, and uptime, ensuring continuous service availability using advanced monitoring and observability tools.
Identifies recurring incidents and initiates root cause analysis for long-term resolution.
Collaborates with cross-functional teams, including Applications, Infrastructure, Security, and Cloud teams, to resolve incidents.
Configures, troubleshoots, and maintains network devices (e.g., routers, switches, firewalls) and ensures secure remote access (VPN, remote desktop solutions).
Manages and maintains cloud infrastructure (AWS, Azure, GCP), including virtualization (VMware, Hyper-V) and automation (Terraform, Ansible).
Develops and refines operational runbooks, playbooks, and response procedures, focusing on improving cloud governance and security.
Participates in on-call rotations to support incident handling outside of normal business hours.
Contributes to the continuous improvement of monitoring tools, cloud services, and incident management processes.
Prepares and delivers post-incident reports, root cause analysis, and lessons learned to Senior Management.
Ensures that SLAs related to response times, escalation, and ticket handling are met consistently.
Coordinates shift handovers with detailed incident reporting and supporting documentation.
Leads efforts on system administration (Windows, Linux, Mac OS), backup and disaster recovery procedures, and server management.
Participates in project management efforts, capacity planning, and risk management for ongoing operations.
EDUCATION/EXPERIENCE
Education: Minimum of a high school diploma or equivalent required. Bachelor's degree in Computer Science, Software Engineering, or a related discipline preferred.
Experience: Minimum of 3 years of experience in IT operations, help desk, or similar roles, including at least 1 year of experience with VPN, remote access technologies, and network monitoring, required. Experience with Windows, Linux, and/or Mac OS administration.
PREFERRED CERTIFICATIONS
CompTIA Network+
Cisco Certified Network Associate (CCNA)
Microsoft Certified: Azure Administrator Associate
AWS Certified Solutions Architect - Associate
Google Professional Cloud Architect
Red Hat Certified System Administrator (RHCSA)
Certified Ethical Hacker (CEH)
CompTIA CySA+ (Cybersecurity Analyst)
Certified Information Systems Auditor (CISA) - ISACA
GIAC Security Essentials (GSEC)
PRINCE2 Practitioner
Agile Certified Practitioner (PMI-ACP) - PMI
Certified ScrumMaster (CSM)
ITIL Practice Manager (PM)
SKILLS NEEDED: Network & Infrastructure Management
Network configuration and troubleshooting
VPN and remote access technologies
Cloud networking (AWS, Azure, Google Cloud)
Virtualization technologies (VMware, Hyper-V, KVM)
Storage solutions (SAN, NAS, DAS)
Server management and configuration (Windows, Linux)
DNS, DHCP, TCP/IP protocols
SKILLS NEEDED: System Administration
Windows, Linux, and Mac OS administration
User and group management (AD, LDAP)
Patch management and system updates
Backup and disaster recovery procedures
Server health monitoring and performance tuning
SKILLS NEEDED: Cloud & Virtualization
Cloud platform management (AWS, Azure, GCP)
Cloud services (IaaS, PaaS, SaaS)
Cloud security and governance
Cloud automation tools (Terraform, CloudFormation)
SKILLS NEEDED: DevOps & CI/CD
Monitoring and observability (Prometheus, Grafana)
Incident and change management
Scripting and automation (Python, Bash, PowerShell)
SKILLS NEEDED: Security & Compliance
Firewalls, IDS/IPS, and VPNs
Endpoint security and antivirus solutions
SIEM platforms (Chronicle, Splunk, QRadar)
Vulnerability management (Nessus, Qualys)
Identity and Access Management (IAM)
Compliance standards (ISO, NIST, GDPR, HIPAA)
Data loss prevention (DLP)
Network security protocols (SSL/TLS, IPsec)
SKILLS NEEDED: Database Management
Database administration (SQL Server, MySQL, Oracle)
Data backup and recovery
NoSQL databases (MongoDB, Cassandra, Redis)
SKILLS NEEDED: Monitoring & Observability
System and application monitoring (Nagios, Zabbix, SolarWinds)
Log management and analysis (ELK Stack, Splunk)
Cloud monitoring (CloudWatch, Azure Monitor)
Event correlation and root cause analysis
SKILLS NEEDED: End-User Support & Troubleshooting
Desktop support (Windows, Mac OS)
Ticketing and help desk systems (JIRA, ServiceNow)
Hardware and software troubleshooting
SLA management and tracking
SKILLS NEEDED: Automation & Scripting
Scripting languages (Bash, PowerShell, Python)
Automation tools (Ansible, Puppet, Chef)
Workflow automation
SKILLS NEEDED: Project Management & Documentation
Project management frameworks (Agile, Scrum, ITIL)
Change management processes
Documentation standards (SOPs, runbooks)
Risk management and mitigation
SKILLS NEEDED: Data Analytics & Reporting
Data visualization tools (Power BI, Tableau)
Basic data analytics and query skills
KPI monitoring and reporting