Scaleway

Site Reliability Engineer - SRE

Scaleway, Newport Beach, California, us, 92659

About the job

Scaleway is looking for a Site Reliability Engineer to join our teams. Reporting to a Lead SRE, you will be responsible for ensuring that we can reliably serve our products to users around the world. We expect you to have a strong background in development and system administration. Our systems evolve constantly, and the tools used to observe them and keep them resilient need to evolve accordingly.

Minimum qualifications

- Previous experience as a developer in Go, Python or Rust
- Experience in system programming with the usual scripting languages (bash, Python)
- Demonstrated ability to troubleshoot production system failures
- A great attitude and a desire to work with a team
- Passion for incremental improvements to tooling and a love of all things automation
- Experience with Linux systems (Ubuntu/Debian)
- Experience with cloud environment architecture (bare metal, virtual machines, containers, orchestrators)
- Good understanding of computer networks: TCP/IP, DNS, load balancing, IPv6, BGP and network virtualisation
- Understanding of written and spoken English, with the ability to write technical documentation in English and to speak English when needed

Preferred qualifications

- Experience with infrastructure as code and continuous deployment
- Experience with physical hardware automation
- Experience with monitoring and logging systems
- Experience administering relational databases
- Knowledge of one cloud platform and its related use cases
- Willingness to take the initiative in proposing new solutions and defending them
- Team player, willing to share knowledge and opinions and to participate in regular team rituals
- Good communication and coaching skills

Responsibilities

- Create or optimize tools and documentation that help identify, diagnose and remediate production incidents, automating as much as possible
- Troubleshoot high-impact issues, working with multiple engineering teams
- Take on-call responsibilities, mitigate issues encountered in production and provide the best real-time response to our customers
- Ensure a high quality of service for our customers by leveraging observability and monitoring technologies
- Manage the lifecycle of products in production
- Help implement best practices in stability, resiliency, scalability, security and performance across our systems

Technical Stack

- Python, Go, Rust
- RabbitMQ
- PostgreSQL
- HAProxy, Nginx, REST APIs / Flask
- S3 API
- Sentry, Prometheus, Grafana, Elasticsearch, Fluentd, Kibana
- Ansible, AWX, Foreman, Salt
- GitLab, Nexus
- Ubuntu, Debian, CentOS
- Jira, Confluence, Slack, GSuite

Location

This position is based in our offices in Paris or Lille (France).
