NVIDIA
Senior Software Development Engineer in Test
NVIDIA, Santa Clara, California, US, 95053
We are seeking a highly skilled and hard-working Senior Test Developer/Test Engineer to join our multifaceted Enterprise Software QA team. This role offers an outstanding opportunity to leave your mark on the design, construction, optimization, and testing of large-scale infrastructure for various foundational NVIDIA unified cloud services and data center offerings. If you are a dedicated engineer with a deep understanding of cloud infrastructure and distributed systems, and you thrive in an exciting, innovative environment, this could be the ideal role for you.
What you'll be doing:
Work with development teams on test plans for all layers of the SW stack for cloud infrastructure, including execution, reviews, failure analysis, and assessment of overall quality and risk. Work with customer PMs on software issues, including technical feedback from OEMs and CSPs. Develop KPIs to track execution and deploy process improvements to increase efficiency.
Lead NVIDIA Cloud and Data Center bring-up activities, which involve validation, reporting, working with engineering to debug issues, providing design input at times, and adding coverage in different areas.
Design, develop and maintain CI/CD pipelines for continuous testing in cloud environments when needed.
Perform performance, scalability, and reliability testing of cloud services.
Implement and maintain test environments on cloud platforms such as AWS, Azure, or Google Cloud.
Monitor the infrastructure and alert on significant events, ensuring the highest level of system performance and reliability.
Work with partner teams to ensure the availability of clusters to test on, and take the lead in resolving all issues.
Work with teams to ensure the quality of the cloud products being delivered, focusing on critical areas like security, storage, workloads, and performance on the latest SW and FW components.
What we need to see:
A Master's or Ph.D. in Computer Science or a related field, or equivalent experience.
5+ years of hands-on experience in cluster management and related tools, including Docker containers, Slurm, Kubernetes, and Ansible.
2+ years of strong experience with cloud infrastructure platforms like AWS, Azure, Google Cloud, and OCI.
Hands-on experience with network, storage, security, cluster configuration and debugging, and cloud infrastructure management tools like Terraform and Ansible.
Expertise in administering, operating, and configuring Kubernetes.
Experience with CI/CD tools such as GitLab and Jenkins, and with the GitOps model.
Proficiency in various monitoring tools: Prometheus, Grafana, CloudWatch, and Thanos.
Proficiency in debugging issues involving networks, DHCP, DNS, HTTP, Linux, and containers.
Ways to Stand Out from the Crowd:
Familiarity with Bright Cluster Manager or Run:ai for managing and monitoring high-performance computing.
Experience in writing automation for web applications using tools like Selenium and Playwright.
By joining our team, you will be part of a forward-thinking company that values innovation and creativity. We offer a competitive salary and benefits package, a flexible work environment, and the opportunity to work with some of the industry's leading experts. If you're ready to take your career to the next level, we'd love to hear from you.
NVIDIA is a well-known and esteemed company in the technology industry. We are recognized for our innovative solutions and are home to individuals who are both forward-thinking and dedicated. We strive to create a work environment that encourages collaboration and inclusivity, where all ideas are valued and respected. If you are a self-motivated and imaginative individual seeking to make a significant difference, we look forward to hearing from you.
The base salary range is 132,000 USD - 258,750 USD. Your base salary will be determined based on your location, experience, and the pay of employees in similar positions.
You will also be eligible for equity and benefits (https://www.nvidia.com/en-us/benefits/). NVIDIA accepts applications on an ongoing basis.
NVIDIA is committed to fostering a diverse work environment and proud to be an equal opportunity employer. As we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) on the basis of race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status or any other characteristic protected by law.