Microsoft

Principal Data Engineer

Microsoft, Houston, Texas, United States


Security represents the most critical priority for our customers in a world awash in digital threats, regulatory scrutiny, and estate complexity. Microsoft Security aspires to make the world a safer place for all. We want to reshape security and empower every user, customer, and developer with a security cloud that protects them with end-to-end, simplified solutions. The Microsoft Security organization accelerates Microsoft’s mission and bold ambitions to ensure that our company and industry are securing digital technology platforms, devices, and clouds in our customers’ heterogeneous environments, as well as ensuring the security of our own internal estate.

The Microsoft Security AI (Artificial Intelligence) Research team is responsible for defending Microsoft and our customers through applied AI innovation. Our culture is centered on embracing a growth mindset, a theme of inspiring excellence, and encouraging teams and leaders to bring their best each day. In doing so, we create life-changing innovations that impact billions of lives around the world. Defending Microsoft’s complex environment provides a unique opportunity to build and evaluate autonomous defense through emerging generative AI capabilities. Microsoft understands and learns from its own defensive expertise, including via teams like Microsoft Threat Intelligence Center (MSTIC), and has the opportunity to build a unique knowledge graph describing the relationship between risk, investigation, and response. This data, built over Microsoft’s complex digital estate, along with Microsoft AI forms the foundation for innovative solutions to defend Microsoft.

We are looking for a Principal Data Engineer to join a newly created team to reshape how we defend Microsoft. You will establish and build processes and structures to empower research scenarios: from data engineering, construction of training data sets, creation of new telemetry sources, building of knowledge graphs, and creation of agentic AI systems or new security-focused language models. You would be responsible for working together with applied scientists to develop research prototypes to solve real-world scenarios and challenges. For those interested in making a meaningful impact in the security industry, don’t miss this opportunity to join the team working on the latest AI technology and applicability.

Responsibilities
Act as an enabler for research and applied scientists by creating data pipelines and data quality/validation checks that leverage large, complex data across the Microsoft estate.
Collaborate with researchers on the design, development, execution, and implementation of technology research projects that serve as a catalyst for technology transfer into the hands of Microsoft Defenders.
Facilitate the quality of prototypes through data visualization, monitoring, and alerting pipelines.
Research and develop an understanding of tools, technologies, and methods being used in the community that can be utilized to improve product quality, performance, or efficiency.
Incorporate state-of-the-art research or previously tested solutions from Microsoft and academia, and tune them to solve complex security challenges.

Qualifications

Required Qualifications
Bachelor's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 6+ years of experience in business analytics, data science, software development, data modeling, or data engineering work
OR Master's Degree in Computer Science, Math, Software Engineering, Computer Engineering, or related field AND 4+ years of experience in business analytics, data science, software development, or data engineering work
OR equivalent experience.
4+ years of experience building, scaling, and operating cloud services in Azure (including Azure Functions, Azure Containers, Azure DevOps pipelines, GitHub Actions, GitHub Codespaces, and Jupyter Notebooks), AWS, or GCP.
4+ years of relevant industry experience driving cutting-edge research into real-world impact.
4+ years of experience working with large-scale data processing tools.

Other Requirements
Ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:
Microsoft Cloud Background Check: This position will be required to pass the Microsoft background and Microsoft Cloud background check upon hire/transfer and every two years thereafter.

Preferred Qualifications
Track record of experience building customer solutions or open source projects using generative AI and/or multi-agent frameworks such as AutoGen, LangGraph, crewAI, or equivalent contributions to products and/or open-source projects.
Experience formulating research problems, evaluating related work, innovating differentiators, defining success metrics, developing plans and schedules for achieving metrics, and learning and iterating on further improvements.
Demonstrated technical competencies with large-scale data processing and distributed compute tools such as Cosmos, Azure Data Explorer, Kusto, Azure Data Factory, Azure AML pipelines, Spark, Synapse, or Dask.
Experience in cyber security and safety domains, such as malware detection, fraud prevention, cyber-physical systems, adversary tradecraft, emerging threats or SOC operations.
Written and verbal communication skills, ability to simplify and explain complex ideas.

Data Engineering IC5 - The typical base pay range for this role across the U.S. is USD $137,600 - $267,000 per year. There is a different range applicable to specific work locations, within the San Francisco Bay area and New York City metropolitan area, and the base pay range for this role in those locations is USD $180,400 - $294,000 per year.

Microsoft is an equal opportunity employer. Consistent with applicable law, all qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances.
