TikTok

Site Reliability Engineer, Recommendation Infrastructure - USDS

TikTok, New York, NY


Responsibilities

About TikTok U.S. Data Security
TikTok is the leading destination for short-form mobile video. Our mission is to inspire creativity and bring joy. U.S. Data Security ("USDS") is a subsidiary of TikTok in the U.S. This new, security-first division was created to bring heightened focus and governance to our data protection policies and content assurance protocols to keep U.S. users safe. Our focus is on providing oversight and protection of the TikTok platform and U.S. user data, so millions of Americans can continue turning to TikTok to learn something new, earn a living, express themselves creatively, or be entertained. The teams within USDS that deliver on this commitment daily span Trust & Safety, Security & Privacy, Engineering, User & Product Ops, Corporate Functions, and more.

Why Join Us
Creation is the core of TikTok's purpose. Our platform is built to help imaginations thrive. This is doubly true of the teams that make TikTok possible.
Together, we inspire creativity and bring joy - a mission we all believe in and aim towards achieving every day.
To us, every challenge, no matter how difficult, is an opportunity: to learn, to innovate, and to grow as one team. Status quo? Never. Courage? Always.
At TikTok, we create together and grow together. That's how we drive impact - for ourselves, our company, and the communities we serve.
Join us.

To enhance collaboration and cross-functional partnerships, among other things, our organization currently follows a hybrid work schedule that requires employees to work in the office 3 days a week, or as directed by their manager/department. We regularly review our hybrid work model, and the specific requirements may change at any time.

Responsibilities:
• Engage in and improve the whole lifecycle of Recommendation systems - from system design consulting through to launch reviews, deployment, operation and refinement
• Deliver tools/software to improve the reliability and scalability of services, automate operations and improve R&D efficiency
• Build high availability into large-scale services deployed across global data centers
• Plan, manage and optimize cloud resource utilization, ensuring that the SLAs of large-scale clusters are met
• Measure and monitor availability, latency and overall service health
• Practice sustainable incident response and postmortems

Qualifications
• Bachelor's degree or above in Computer Science or a related field, with at least 5 years of related work experience
• SRE experience with large-scale system deployments requiring high reliability and scalability
• Familiarity with Linux system operations and networking
• Experience programming in at least one of the following languages: Python, Perl, Go, or C/C++
• Experience in designing, analyzing and troubleshooting large-scale distributed systems
• Familiarity with common CI/CD practices and environments
• Effective communication skills and a sense of ownership and drive

TikTok is committed to creating an inclusive space where employees are valued for their skills, experiences, and unique perspectives. Our platform connects people from across the globe and so does our workplace. At TikTok, our mission is to inspire creativity and bring joy. To achieve that goal, we are committed to celebrating our diverse voices and to creating an environment that reflects the many communities we reach. We are passionate about this and hope you are too.

TikTok is committed to providing reasonable accommodations in our recruitment processes for candidates with disabilities, pregnancy, sincerely held religious beliefs or other reasons protected by applicable laws. If you need assistance or a reasonable accommodation, please reach out to us at https://shorturl.at/ktJP6

This role requires the ability to work with and support systems designed to protect sensitive data and information. As such, this role will be subject to strict national security-related screening.