Sibitalent Corp
System Engineer
6 months C2H
10001 Richmond Ave, Houston, TX 77042 – 5 days onsite
Specific Work Requirements:
- A minimum of 5 years’ experience working in a large HPC enterprise environment comprising thousands of servers, large storage solutions, tape and tape automation.
- Proficient in the installation, configuration and management of Linux-based operating systems, preferably RHEL, CentOS, or Rocky Linux.
- Experience with IBM’s xCAT distributed computing management software.
- Experience with the installation and maintenance of computer hardware, including servers, tape drives, robotic tape libraries, GPGPUs, SSDs, and disk arrays.
- Experience with containerization.
- Knowledge of networking and data center technologies, including switching, routing, high availability, LAN / WAN / WLAN topologies, and system configuration for Ethernet, InfiniBand, and Fibre Channel SAN.
- Experience with HPC storage solutions, for example the configuration and operation of HPE ClusterStor, NetApp, Dell Isilon, and Pure Storage systems.
- Ability to write and troubleshoot scripts in Bourne shell, Bash, C shell, Perl, Python, and Ruby, as well as scripts for MRTG.
- Experience with PostgreSQL, including database installation and support.
- Experience with the Google Cloud Platform and Microsoft Azure public clouds, including provisioning and managing instances, building images, and writing installation scripts.
- Experience with configuration and provisioning tools such as Ansible and Terraform.
- Experience with backup and recovery tools such as IBM Spectrum and Dell NetWorker.
- Good knowledge of Linux security, including configuration of endpoint security tools.
- Ability to evaluate HPC system environments and make recommendations for improvement in performance and manageability.
- Ability to investigate, debug, and diagnose system-level issues.
General Work Requirements:
- Conform to local change management practices, including full testing on non-production systems prior to production deployment.
- Effectively communicate all change activities to all affected parties, including a clear description of the change, any related service outages, and possible effects on the different environments we support.
- Ensure SLB IT deployment standards are maintained, with verification through reporting systems.
- Meet KPO requirements for InTouch support processing, including full documentation of problem resolution and the creation of knowledge content and best-practice items.
- Demonstrate a good understanding of computer equipment and its care and maintenance.
- Work with other internal support groups (systems, networking, programming, desktop support, computer operations, and facilities) as required to complete administration functions.
- Work with a variety of vendors in technical environments, including reporting and investigating system problems.
- Provide a written weekly status report to the team manager and be prepared to present and discuss this with the team at a weekly status meeting.
- Be prepared to work outside normal hours, as system maintenance often must be performed outside prime time; provide 24/7 support to computer operations; and work with other remote support locations, for example Kuala Lumpur, in support of a follow-the-sun model.
- Participate in the support on-call schedule, in weekend power outages (normally two per year), and in emergency data center activities.
- Peer-review all major projects as part of the normal deployment philosophy.
- Ensure compliance with all quality assurance and best-practice procedures and QHSE requirements, as defined by the job position.