A Place for Mom

Staff DevOps Engineer

A Place for Mom, New York, New York, US, 10261


Job Description

We are seeking a highly skilled and experienced Staff DevOps Engineer to join our team. This role will focus on Site Reliability Engineering (SRE), enhancing our developer platform, and ensuring robust security practices. The ideal candidate will have a strong background in SRE principles, platform engineering, and security, with a proven ability to drive improvements in system reliability, performance, and security.

Key Responsibilities:

Site Reliability Engineering (SRE):

Implement and manage SRE practices to ensure high availability, reliability, and performance of our systems and services.

Develop and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and Service Level Agreements (SLAs).

Monitor and analyze system performance, identifying and addressing reliability issues proactively.

Automate operational tasks to reduce manual intervention and improve system efficiency.

Developer Platform:

Enhance and maintain the developer platform to support efficient and scalable software development.

Collaborate with engineering teams to improve CI/CD pipelines, streamline development workflows, and optimize deployment processes.

Ensure that development tools and environments are up-to-date, reliable, and scalable.

Security:

Implement and manage security practices to protect our systems and data from threats.

Conduct regular security assessments and vulnerability scans to identify and mitigate risks.

Collaborate with security teams to enforce security policies and ensure compliance by default.

Collaboration and Leadership:

Lead with a product mindset, building tools that developers find intuitive and easy to use.

Work closely with cross-functional teams to support and improve system reliability, performance, and security.

Mentor and provide technical guidance to junior team members.

Stay updated with industry trends and best practices, applying them to improve our systems and processes.

Qualifications:

Technical Skills:

Proven experience with SRE principles and practices.

Strong proficiency in DevOps tools and technologies, including CI/CD pipelines (GitHub Actions), containerization (Docker, Kubernetes), and infrastructure as code (Terraform).

Experience with monitoring and logging tools across application (Datadog, New Relic), infrastructure (Grafana, Prometheus), log management (Splunk, ELK), and cloud (AWS CloudWatch).

Strong understanding of security practices and tools, including vulnerability scanning and threat detection.

Experience:

5+ years of experience in DevOps, SRE, or a related field.

Demonstrated experience in managing and optimizing large-scale systems and platforms.

Experience with cloud platforms (e.g., AWS, Azure, GCP) and their security features.

Soft Skills:

Excellent problem-solving and analytical skills.

Strong communication and collaboration abilities.

Ability to work independently and as part of a team.

Education:

Bachelor’s degree in Computer Science, Engineering, or a related field, or equivalent experience.

Additional Information

Compensation

Base Salary: $140,000 to $150,000 + 20% Bonus

Benefits:

401(k) plus match

Dental insurance

Health insurance

Vision insurance

Paid time off

All your information will be kept confidential according to EEO guidelines.

#LI-KT1

#LI-REMOTE