Normal Computing

Staff AI Infrastructure Engineer

Normal Computing, New York, New York, US, 10261


Your Role in Our Mission

As a Staff AI Infrastructure Engineer, you will play a pivotal role in leading and managing infrastructure projects from training to production, mentoring team members, and influencing the strategic direction of our AI and hardware engineering initiatives.

Depending on interest and skills, responsibilities could include:

- Collaborating closely with software and research engineers to optimize and productionize training, experimentation, and enterprise product deployment infrastructure
- Designing, implementing, and maintaining individual microservices that collectively form complex, scalable backend systems
- Implementing tools, libraries, and frameworks to speed up and enable new research and productization
- Planning and performing rapid prototyping of machine learning techniques applied to real-world scientific and enterprise semiconductor design and engineering problems
- Improving model architectures, training, simulation, and compilation procedures
- Practicing sustainable incident response and blameless postmortems
- Staying up to date with the latest advancements in AI, ML, and infrastructure technologies
- Mentoring and guiding junior colleagues, nurturing a collaborative, growth-oriented environment that promotes knowledge sharing and professional development

Qualifications:

- Bachelor's degree or higher in Computer Science, Engineering, or a related field
- 6+ years of experience in infrastructure engineering, with a focus on machine learning, distributed systems, and cloud computing
- Experience writing highly performant services and strong knowledge of Golang and/or Python
- Specific expertise with managing and evolving Kubernetes in production and any cloud platform like GCP, AWS, Azure
- Expertise in monitoring & alerting, scalable testing, automation, CI/CD frameworks and best practices, infrastructure-as-code (Terraform, CloudFormation), and configuration management tools (Ansible, Puppet, Chef)
- Leadership and collaboration qualities, enthusiasm for real-world, responsible impact
- Excellent problem-solving and "getting things done" skills, and a proven ability to troubleshoot and optimize complex systems
- Strong written and verbal communication skills, with the ability to explain complex concepts to both technical and non-technical stakeholders across research and product

In addition, the following would be a significant advantage:

- Extensive experience with various database technologies, both relational (like PostgreSQL) and NoSQL (such as MongoDB, Cassandra, or DynamoDB). Proficiency in database design, sharding, replication, and tuning for high-robustness environments is highly desirable
- Applied experience with machine learning, preferably modern deep learning architectures (e.g. Transformers, CNNs, vision-language models, deep reinforcement learning)
- Experience using TensorFlow, PyTorch, JAX, NumPy, Pandas, or similar ML/scientific libraries
- Comfort with probabilistic programming languages (e.g. TensorFlow Probability)

Equal Employment Opportunity Statement

Normal Computing is an Equal Opportunity Employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, veteran status, or any other legally protected status.

Accessibility Accommodations

Normal Computing is committed to providing reasonable accommodations to individuals with disabilities. If you need assistance or an accommodation due to a disability, please let us know at accomodations@normalcomputing.ai.

Privacy Notice

By submitting your application, you agree that Normal Computing may collect, use, and store your personal information for employment-related purposes in accordance with our Privacy Policy.