Smart IT Frame
Sr. Data Engineer
Smart IT Frame, San Francisco, California, United States, 94199
Job Title: Sr. Data Engineer
Location: San Francisco, CA (Hybrid)
Work Type: Full Time

Job Description:
- Create and maintain optimal data pipeline architecture
- Build data pipelines that transform raw, unstructured data into formats that data analysts can use for analysis
- Assemble large, complex data sets that meet functional and non-functional business requirements
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and delivery of data from a wide variety of data sources using SQL and AWS Big Data technologies
- Work with stakeholders including the Executive, Product, Engineering, and Program teams to assist with data-related technical issues and support their data infrastructure needs
- Develop and maintain scalable data pipelines and build out new integrations and processes required for optimal extraction, transformation, and loading of data from a wide variety of data sources using scalable distributed data technologies
- Implement processes and systems to validate data and monitor data quality, ensuring production data is always accurate and available for the key stakeholders and business processes that depend on it
- Write unit/integration tests, adopt test-driven development, contribute to the engineering wiki, and document work
- Perform root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement

About you:
- 6+ years of experience and a bachelor's degree in Computer Science, Informatics, Information Systems, or a related field; or equivalent work experience
- In-depth working experience with distributed systems: Hadoop/MapReduce, Spark, Hive, Kafka, and Oozie/Airflow
- At least 5 years of solid production-quality coding experience implementing data pipelines in Java, Scala, and Python
- Experience with AWS cloud services: EC2, EMR, RDS
- Experience with Git, JIRA, Jenkins, and shell scripting
- Familiar with Agile methodology, test-driven development, source control management, and test automation
- Experience supporting and working with cross-functional teams in a dynamic environment
- You're passionate about data and building efficient data pipelines
- You have excellent listening skills and are empathetic to others
- You believe in simple and elegant solutions and give paramount importance to quality
- You have a track record of building fast, reliable, high-quality data pipelines

Nice-to-have skills:
- Experience building marketing data pipelines, including Direct Mail, is a big plus
- Experience with Snowflake and Salesforce Marketing Cloud
- Working knowledge of open-source ML frameworks and the end-to-end model development life cycle
- Previous experience running containers (Docker/LXC) in a production environment using a container orchestration service