Procyon TS
Senior Network Reliability Engineer
Procyon TS, Plano, Texas, US, 75086
In this role, you will:
- Drive solid system architecture, and guide and mentor well-disciplined code development practices (i.e., repository procedures for proper code check-out and check-in).
- Manage safe feature-branching strategies and version control.
- Develop a proper workflow for team code review and deliver well-vetted and tested products.
- Oversee and author application testing procedures, software deployment packaging, and release coordination with customers.
- Monitor infrastructure, inbound and outbound processes, web services, and application health.
- Implement feature tracking and bug fixes.
- Define standards that produce enterprise-quality software that is robust, scalable, and maintainable for the entire lifecycle of the project and business.
- Develop and maintain a catalog of reliability scripts, tools, and libraries that can be leveraged for common instrumentation, automation, and operational needs.
- Monitor and analyze network performance, providing automation and orchestration insight for identifying or mitigating network and service-related events.
- Analyze data to diagnose and identify root causes for network-specific events within our Domain-of-Responsibility.
- Act as a Tier 3 escalation for issues from Tier 1 or Tier 2 related to our observability platform.
- Collaborate with vendors and internal technical teams to understand and incorporate technical solutions.
- Define and implement strategies for network automation to improve operational efficiencies.
- Manage a CI/CD pipeline for network development and testing.
- Participate in the documentation of application and network flows for various support needs.
- Provide technical guidance, training, and mentorship to members of the NOC and engineering teams we support with our platforms.
- Develop and improve instrumentation for monitoring and logging the health and availability of services.
- Participate in Major Incident bridges that involve multiple teams and participants, and in the resulting formal RCA reports.

The Skills and Experience You Bring to DISH Wireless:
A DISH Wireless Sr Site Reliability Engineer leads the solution to any problem or issue with an automation-first mindset, utilizing a crawl/walk/run approach towards implementation.

Requirements for the position (Must Haves)
- Bachelor's degree in Computer Science or an IT-related field, or equivalent experience
- 3+ years of scripting experience in Python, JavaScript
- 3+ years of event-driven engineering, with a strong preference for candidates with experience in AIOps using AI/ML platforms and tools
- 3+ years of experience with source code management, CI/CD, and automation tools such as Git/GitLab, Terraform, Ansible, Chef, Puppet, and Jenkins
- 3+ years of experience building CI/CD pipelines, version control, and system testing with GitLab and Jenkins
- 3+ years of experience with OS-level containerization and virtualization techniques using Docker, Wind River, VMware, Kubernetes, and Rancher for microservices deployment
- 3+ years of experience with cloud platforms such as AWS, Azure, and GCP
- 5+ years of technical, hands-on experience in one or more of the following areas: AWS Cloud Engineering, 5G ORAN, 5G Core, and/or Data and Transport Engineering
- A passion for taking ownership of your work and delivering results
- Habitual code branching, versioning, feature lifecycle management, testing, packaging, and deployments
- Voracious need to document code and catalog data transformations
- Willingness to learn and teach complex technologies
- Excellent communication skills and a team-player attitude

Preferred complementary skills for the Job
- 5+ years of experience using one or more platforms such as Datadog, Grafana, ServiceNow, SolarWinds, Cisco Vitria/Matrix, Innoeye, and the Atlassian stack (Crucible, Bitbucket, Jira, Confluence)
- Experience gaining insight from log files with Loki, Elasticsearch, Prometheus, and Grafana
- Experience implementing systems tracing with services such as Tempo, Jaeger, and OpenTracing
- Intermediate understanding of REST APIs, Apache Spark, and Kafka