GenBio AI

Data Engineer

GenBio AI, Palo Alto, California, United States, 94306


Headquartered in Silicon Valley, we are a newly established start-up where a collective of visionary scientists, engineers, and entrepreneurs is dedicated to transforming the landscape of biology and medicine through the power of Generative AI. Our team comprises leading minds and innovators in AI and Biological Science, pushing the boundaries of what is possible. We are dreamers imagining a new paradigm for biology and medicine.

We are committed to decoding biology holistically and enabling the next generation of life-transforming solutions. As the first mover in pan-modal Large Biological Models (LBM), we are pioneering a new era of biomedicine, with our LBM training leading to ground-breaking advancements and a transformative approach to healthcare. Our exceptionally strong R&D team and leadership in LLM and generative AI position us at the forefront of this revolutionary field. With headquarters in Silicon Valley, California, and a branch office in Paris, we are poised to make a global impact. Join us as we embark on this journey to redefine the future of biology and medicine through the transformative power of Generative AI.

Key Responsibilities:

Design, develop, optimize, and maintain software systems for the entire foundation model development and deployment lifecycle (i.e., data pipeline, pre-training, fine-tuning, serving).
Build and maintain scalable, efficient, and reusable codebases for large-scale foundation model training, adaptation, evaluation, and inference.
Collaborate closely with data engineers and research scientists to integrate models into production environments.
Implement and ensure best practices in software engineering, including code quality, testing, and documentation.
Build and optimize robust back-end systems, APIs, and databases to support complex workflows.
Ensure code quality, scalability, and performance through rigorous testing and code reviews.

Qualifications:

Bachelor's or Master's degree in Computer Science, Engineering, or related field.
Experience in life sciences or healthcare is a plus.
Strong programming skills in JavaScript, Python, and modern web development frameworks, and familiarity with GPU-accelerated tools (e.g., CUDA, cuDNN, Triton).
Proficiency with major deep learning frameworks such as PyTorch, HuggingFace Transformers & Accelerate, or Megatron-LM/DeepSpeed.
Familiarity with resource management and scheduling systems (e.g., SLURM, Kubernetes).
Proficiency in back-end frameworks like Django, Flask, or Node.js, and database technologies (e.g., PostgreSQL, MongoDB).
Expertise in distributed systems, cloud computing (AWS, GCP), and containerization tools (Docker, Kubernetes).

Preferred Qualifications:

Ph.D. degree in Computer Science, Engineering, or related field.
Experience in life sciences or healthcare is a plus.
Prior experience pre-training or serving large language models or large-scale foundation models.
Experience with deep learning workflows.
Knowledge of biological data types and challenges, and experience with bioinformatics tools.
Familiarity with version control systems like Git and CI/CD pipelines.
Strong understanding of RESTful APIs, authentication, and deployment pipelines.
Familiarity with machine learning workflows and biological datasets.

Join us as we embark on this journey to redefine the future of biology and medicine.

We are an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.