TensorLake Inc.
Founding AI Frameworks Engineer
TensorLake Inc., San Francisco, California, United States, 94199
Tensorlake is building a distributed data processing platform for developers building Generative AI applications. Our product, Indexify (https://getindexify.ai), enables continuously evolving knowledge bases and indexes for Large Language Model applications by running structured-data or embedding extraction algorithms on any unstructured data. We are building a serverless product on top of Indexify that allows users to build real-time extraction pipelines for unstructured data. The extracted data and indexes are consumed directly by AI applications and LLMs to power business and consumer applications.

As an AI Frameworks Engineer, you will be responsible for optimizing our AI infrastructure, developing high-performance inference engines, and maximizing GPU utilization. You will work on the critical backend architecture that powers our platform's scalability and performance, collaborating with both researchers and product engineers to ensure Tensorlake's models run efficiently on a variety of hardware configurations.

Responsibilities
As an AI Frameworks Engineer, your focus will be on optimizing and building high-performance AI systems. You will:

- Design and build custom inference engines optimized for high throughput and low latency.
- Optimize GPU usage across our platform, ensuring that deep learning models run efficiently at scale.
- Write and optimize custom CUDA kernels and other low-level operations to accelerate deep learning workloads.
- Develop and implement techniques for model compression, including quantization and pruning, to make models more efficient for real-world deployment.
- Collaborate with research scientists and engineers to integrate new models into Tensorlake's platform while ensuring peak performance.
- Utilize cuDNN, cuBLAS, and other GPU-accelerated libraries to optimize computational workloads.
- Troubleshoot and debug performance bottlenecks using tools like nvprof and Nsight, and implement fixes to improve throughput and memory usage.
- Work on scaling AI models to multiple GPUs and nodes using NCCL and other parallel computing techniques.

Basic Qualifications
- 5+ years of experience in building and optimizing AI models for performance at scale.
- Strong knowledge of deep learning frameworks such as TensorFlow, PyTorch, or JAX, with experience optimizing them for hardware.
- Proficiency in GPU programming with CUDA, OpenCL, or similar parallel computing frameworks.
- Expertise in writing custom CUDA kernels to optimize deep learning operations.
- Experience with inference engines such as TensorRT, and understanding of model deployment optimization.
- Software engineering proficiency in C/C++, Python, and low-level system components like memory management and concurrency.
- Experience using profiling tools like nvprof, Nsight, and other debugging tools for performance tuning.

Benefits
- Ability to save in a 401(k) plan
- Comprehensive healthcare and dental benefits
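The model-compression responsibilities above name quantization and pruning. As a generic illustration of what those techniques mean in practice (a minimal NumPy sketch, not Tensorlake's actual implementation), here is symmetric int8 quantization and magnitude pruning applied to a small weight tensor:

```python
import numpy as np

# Illustrative sketch only: generic int8 quantization and magnitude pruning,
# two of the compression techniques named in the posting. Function names and
# the example tensor are hypothetical, not part of any Tensorlake API.

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor quantization: float32 -> int8 plus one scale."""
    scale = float(np.max(np.abs(weights))) / 127.0  # largest magnitude maps to +/-127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover an approximation of the original float weights."""
    return q.astype(np.float32) * scale

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights."""
    k = int(sparsity * weights.size)
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    return np.where(np.abs(weights) <= threshold, 0.0, weights)

w = np.array([0.05, -1.27, 0.635, 0.01], dtype=np.float32)

# Quantization: the round-trip error is bounded by half a quantization step.
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
assert np.max(np.abs(w - w_hat)) <= scale / 2 + 1e-6

# Pruning at 50% sparsity zeros the two smallest-magnitude weights.
w_sparse = prune_by_magnitude(w, 0.5)
assert np.count_nonzero(w_sparse) == 2
```

Production inference engines typically go further (per-channel scales, calibration, structured sparsity), but the sketch captures the core trade-off: smaller, faster weights in exchange for a bounded approximation error.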