Samsung SDS America

Senior Storage Engineer, HPC - GPU

Samsung SDS America, Ridgefield Park, New Jersey, US, 07660


We are seeking a highly skilled and experienced Data Center Storage Engineer with exposure to High Performance Computing (HPC) and GPU infrastructure. The ideal candidate will design, implement, and manage cutting-edge storage and backup solutions, HPC infrastructure, and GPU clusters. This role requires expertise in a range of storage technologies and data protection strategies to ensure high availability, scalability, and security of our data infrastructure. The ideal candidate will possess a strong understanding of storage technologies and HPC and GPU infrastructure, along with excellent communication and leadership skills.

Samsung SDS is the digital arm of the Samsung group and a global provider of cloud and digital transformation innovations. Samsung SDS delivers enterprise-grade solutions and services in cloud, secure mobility, analytics/AI, digital marketing, and digital workspace.

We enable our customers in government, financial services, healthcare, and other industries to drive business in a hyper-connected economy, helping them to increase productivity, safeguard assets, and make smarter decisions.

Responsibilities:
- Design scalable storage solutions optimized for HPC and GPU-intensive workloads.
- Evaluate and implement high-performance storage technologies, including NVMe, SSD, and parallel file systems (e.g., Lustre, GPFS).
- Develop high-availability storage architectures, including replication, clustering, and disaster recovery solutions tailored to HPC environments.
- Collaborate with HPC engineers, application developers, and systems teams to align storage solutions with computational and data-intensive requirements.
- Design and maintain GPU clusters for AI/ML, deep learning, and scientific computing applications.
- Manage HPC environments, including job scheduling systems (e.g., SLURM, PBS) and workload orchestration.
- Optimize storage and network architectures for the low latency and high throughput required in HPC and GPU computing.
- Install, configure, and maintain enterprise storage infrastructure, including SAN, NAS, and distributed file systems.
- Monitor storage performance, capacity, and availability, ensuring optimal utilization for HPC and GPU workloads.
- Manage data provisioning, storage pools, LUNs, and volume configurations.
- Automate storage administration tasks using scripting languages (e.g., PowerShell, Python, Bash).
- Design and implement robust backup and disaster recovery strategies ensuring data integrity and compliance with RPO/RTO objectives.
- Administer enterprise backup solutions such as Veeam, Commvault, Veritas NetBackup, or similar tools.
- Perform regular DR testing and audits to validate backup integrity and failover procedures.
- Provide Tier 3 support for storage, HPC, and GPU infrastructure-related incidents and performance issues.
- Perform root cause analysis for critical incidents and implement corrective actions.

Requirements:
- Bachelor's degree in Computer Science, Technology, Computer Information Systems, Engineering, or a related field.
- 8+ years in IT consulting with 4+ years of hands-on experience in the following:
- Leading architectural design, development, and deployment of service delivery systems and tooling solutions.
- Experience in a Storage Administrator role in an enterprise environment.
- Solid knowledge of storage technologies such as SAN, NAS, and DAS.
- Experience with storage management software (e.g., NetApp, EMC/Dell).
- Knowledge of storage networking protocols like iSCSI, Fibre Channel, NFS, and SMB.
- Strong understanding of security practices in storage systems (encryption, access control).
- Experience with performance monitoring and troubleshooting tools.
- Administering and maintaining SAN switch infrastructure (e.g., Brocade or other SAN switches).
- Configuring, monitoring, and troubleshooting SAN switch zoning.
- Deep understanding of storage management protocols (block storage, capacity management, DR replication, etc.).
- Proficient in documentation using MS Office products.
- Strong experience in validating and evaluating system architecture.
- Ability to gather business requirements effectively.
- Willing and able to commute to the office 3 days per week.
- Must be authorized to work for any employer in the U.S.

Preferred:
- Experience with HPC (High-Performance Computing) infrastructure.
- Familiarity with GPU infrastructure.
- Experience in implementing and managing information security practices and procedures.
- Familiarity with ITIL, DevOps, and Agile frameworks.
- Experience with NetBackup and Data Domain appliances.
- Backup software experience (e.g., additional backup solutions beyond NetBackup and Data Domain).
- Willingness and ability to travel domestically (approx. 5%).

Benefits:
Samsung SDSA offers a comprehensive suite of programs to support our employees:
- Top-notch medical, dental, vision, and prescription coverage
- Wellness program
- Parental leave
- 401(k) match and savings plan
- Flexible spending accounts
- Life insurance
- Paid holidays
- Paid time off
- Additional benefits

Samsung SDS America, Inc. is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, race, color, religion, sex, sexual orientation, gender identity or expression, national origin, disability, status as a protected veteran, marital status, genetic information, medical condition, or any other characteristic protected by law. We are committed to providing reasonable accommodations to participate in the job application or interview process for candidates with disabilities. Please let your recruiter know if you need an accommodation at any point during the interview process.