EVONA
Site Reliability Engineer
EVONA, Fremont, California, United States
Site Reliability Engineer (SRE)
Location: San Francisco Bay Area

Role Overview:
We are seeking a highly skilled Site Reliability Engineer (SRE) to join a dynamic team at a rapidly growing technology company. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of mission-critical systems, while implementing automation and optimizing cloud infrastructure. This role offers the opportunity to work with cutting-edge AI/ML technologies, leveraging them to solve complex challenges in cloud infrastructure management and performance optimization.

Key Responsibilities:
System Reliability & Performance: Design, implement, and maintain scalable systems, ensuring high availability, performance, and disaster recovery across production environments.
Automation & Tool Development: Develop automation tools to streamline operations, improve system reliability, and reduce manual interventions.
Cloud Infrastructure Management: Create and manage cloud instances (e.g., dev, staging, production) using AWS, GCP, or Azure, optimizing infrastructure performance and cost.
Integration of AI/ML Models: Collaborate with engineering teams to integrate machine learning models into production environments, ensuring that these models scale efficiently and perform optimally.
Incident Management: Respond to and resolve incidents, minimizing downtime and ensuring quick recovery. Lead post-incident reviews and implement preventive measures.
Continuous Improvement: Identify areas of improvement and drive initiatives to enhance system reliability, performance, and security.
Security & Compliance: Ensure that infrastructure and applications adhere to security best practices and compliance standards.

Qualifications:
Educational Background: Bachelor’s degree in Computer Science, Engineering, or a related field (or equivalent experience).
Experience: Proven experience as a Site Reliability Engineer or in a similar role within a SaaS environment, managing and optimizing cloud infrastructure (preferably AWS, GCP, or Azure), and familiarity with integrating AI and machine learning technologies.
Technical Skills: Proficiency in programming and scripting languages such as Python, Go, or Bash. Experience with containerization and orchestration tools like Docker and Kubernetes. Solid understanding of networking, security, and performance optimization practices. Knowledge of CI/CD pipelines and DevOps practices to ensure smooth development and deployment cycles.
Problem-Solving: Strong analytical and problem-solving skills with attention to detail.
Collaboration & Communication: Excellent interpersonal skills, with the ability to work collaboratively in cross-functional teams and communicate technical concepts clearly.

Benefits:
Competitive Salary: Attractive compensation package, including equity options.
Health & Wellness: Comprehensive health, dental, and vision insurance, along with other benefits.
Work Environment: A collaborative and innovative work environment within a growing company.
Growth Opportunities: Opportunities for career growth, professional development, and a chance to shape the future of the company’s technology and infrastructure.