Palantir Technologies

Site Reliability Engineer - TITAN Program

Palantir Technologies, Washington, District of Columbia, US, 20022

A World-Changing Company

Palantir builds the world’s leading software for data-driven decisions and operations. By bringing the right data to the people who need it, our platforms empower our partners to develop lifesaving drugs, forecast supply chain disruptions, locate missing children, and more.

The Role

Palantir has been selected as the prime contractor for the development and delivery of the Tactical Intelligence Targeting Access Node (TITAN) ground station system, the Army’s next-generation deep-sensing capability enabled by artificial intelligence and machine learning (AI/ML). The TITAN team will focus on developing 10 TITAN prototypes, five Advanced and five Basic variants, and on integrating new critical technologies to modernize the sensor-to-shooter workflow in support of Army long-range precision fires.

TITAN is a ground station that has access to Space, High Altitude, Aerial, and Terrestrial sensors to provide actionable targeting information for enhanced mission command and long-range precision fires. Palantir’s TITAN solution is designed to maximize usability for Soldiers, incorporating tangible feedback and insights from Soldier touch points at every step of the development and configuration process. Building off Palantir’s prior work delivering AI capabilities for the warfighter, Palantir is deploying the Army’s first AI-defined vehicle and is looking for teammates to help deliver next-generation capability to meet the needs of future conflict.

Our U.S. Government team is looking for a skilled Site Reliability Engineer with an unbending commitment to security. The ideal candidate will combine engineering experience and a drive to improve existing systems and processes with the creativity to develop novel solutions to evolving challenges.

Core Responsibilities

- Collaborate with cross-functional teams to ensure the reliability, scalability, and performance of our systems and applications.
- Design, implement, and maintain containerized environments.
- Contribute to automation infrastructure and tools to streamline operations, deployment, and monitoring processes.
- Monitor and troubleshoot system and application performance, identifying and resolving issues to minimize downtime.
- Implement and maintain robust security measures to protect systems and data, while prioritizing security.
- Collaborate with development teams to optimize application performance and ensure seamless integration of new features.
- Participate in on-call rotations and respond to incidents to ensure timely resolution.
- Stay up-to-date with the latest industry trends and technologies, proactively finding opportunities for improvement.

What We Value

- Strong proficiency in Linux or Windows system administration and troubleshooting.
- In-depth knowledge and practical experience with containerization solutions such as OpenShift, Kubernetes, Rancher, or MicroK8s.
- Proficiency with programming and scripting languages such as Go, Bash, Java, and Python.
- Familiarity with automation and configuration management tools (e.g., Ansible, Chef, Puppet).
- Understanding of systems and network security principles and best practices.
- Excellent problem-solving skills and the ability to analyze complex issues.
- Strong communication and collaboration skills to work effectively in cross-functional teams.
- Ability to operate autonomously and without day-to-day direction.

What We Require

- Willingness to travel up to 25% of the time.
- Experience with cloud platforms (e.g., AWS, Azure, Google Cloud Platform) or on-premise hardware.
- Knowledge of DevOps and DevSecOps principles and practices.
- Experience with CI/CD pipelines and related tools (e.g., Jenkins, GitLab CI).
- Familiarity with monitoring and logging tools (e.g., Prometheus, Grafana, ELK stack).
- Understanding of database systems and SQL.
- An active U.S. security clearance.
