CrowdStrike
Software Engineer III, Data Platform - Query Platforms (Remote)
CrowdStrike, Austin, Texas, US, 78716
About The Role
Our Data Platform group at CrowdStrike stands out for being uncommonly customer-focused. Like classic data platform groups, we build and operate the systems that centralize data from Falcon sensors and third-party sources, derived from trillions of events per day. We also drive industry-leading innovation on a hyperscale security data lake that helps find bad actors and stop breaches.
As an E6 Engineer on the team - what most companies would call a "Senior Engineer," and what CrowdStrike calls SDE3 - you will contribute across the full spectrum of our systems, from foundational processing and data storage to the scalable pipelines, frameworks, tools, and applications that make that data available to other teams and systems.
One of your core responsibilities will be building and maintaining self-service data platforms with Spark as the runtime, making it easy for customers to transform and access our data for analytics, machine learning, and threat hunting. Ultimately, you will be empowering them to build (among other things) automations that predictively defend against threats before those threats appear in their own environments.
You will also have a significant hand in building out a new graph database.
Your primary toolset will be Java and Apache Spark, Kubernetes, and AWS-native tooling.
What You'll Do
- Write highly fault-tolerant Java code within Apache Spark to produce platform products our customers use to query our data for insight into active threat trends and related analytics
- Design, develop, and maintain ultra-high-scale data platforms that process petabytes of data
- Participate in technical reviews of our products and help us develop new features and enhance stability
- Continually help us improve the efficiency and reduce the latency of our high-performance services to delight our customers
- Research and implement new ways for both internal stakeholders and customers to query their data efficiently and extract results in the format they desire
What You'll Need
- 6+ years of combined experience in backend and data platform engineering roles
- 4+ years of experience building data platform products or features with Apache Spark, Flink, or Iceberg, or with comparable tools in GCP
- 3+ years of experience programming with Java, Scala, or Kotlin
- Proven experience owning feature/product design end to end, especially when starting from vaguely defined problem statements or loose specs
- Proven expertise with algorithms, distributed systems design, and the software development lifecycle
- Experience building large-scale data/event pipelines
- Expertise designing solutions with relational SQL and NoSQL databases, including Postgres/MySQL, Cassandra, and DynamoDB
- Good test-driven development discipline
- Reasonable proficiency with Linux administration tools
- Proven ability to work effectively with remote teams
Experience With The Following Is Desirable
- Go
- Pinot or another time-series/OLAP-style database
- Iceberg
- Kubernetes
- Jenkins
- Parquet
- Protocol Buffers/gRPC
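Purely as an illustration of the kind of Java/Spark work described above - not CrowdStrike code - here is a minimal sketch of an hourly threat-trend aggregation on a shared events table. The catalog, table, and column names (security_lake.sensor_events, event_type, event_time) are hypothetical.

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.*;

// Illustrative sketch only: aggregates hypothetical sensor events by event type.
// Schema and table names are assumptions, not CrowdStrike's actual data model.
public class ThreatTrendJob {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("threat-trend-sketch")
                .getOrCreate();

        // Read a catalog-backed events table (name assumed for the example).
        Dataset<Row> events = spark.read().table("security_lake.sensor_events");

        // Hourly counts per event type over the last day -- the kind of
        // aggregation a self-service query platform might expose to customers.
        Dataset<Row> trends = events
                .where(col("event_time").geq(expr("current_timestamp() - INTERVAL 1 DAY")))
                .groupBy(window(col("event_time"), "1 hour"), col("event_type"))
                .agg(count(lit(1)).alias("event_count"))
                .orderBy(col("window"));

        // Persist the rollup so downstream analytics and dashboards can query it.
        trends.write().mode("overwrite").saveAsTable("security_lake.threat_trends_hourly");

        spark.stop();
    }
}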