BayOne Solutions
Hi,
Job Title: Senior Data Engineer
Location: Sunnyvale, CA (Hybrid role)
Duration: Contract-to-Hire
Interview Process: Onsite
Pay Rate: $75 to $80/hr on C2C
Key Responsibilities:
Design, develop, and maintain data pipelines using Spark, PySpark, and other big data technologies to process large datasets.
Build and optimize complex SQL queries for data extraction, transformation, and loading (ETL) in both batch and real-time workflows.
Work with cloud platforms such as AWS, Azure, or GCP to deploy and manage data infrastructure.
Collaborate with cross-functional teams to gather data requirements and deliver data solutions that meet business needs.
Ensure data quality, integrity, and consistency across different stages of data processing.
Optimize and troubleshoot performance of Spark and SQL-based applications.
Develop and implement data models and data architectures to support analytics and business intelligence initiatives.
Create and maintain automated data workflows and ensure they are scalable and maintainable.
Document data processes, architectures, and pipelines to ensure knowledge sharing across teams.
Required Skills & Qualifications:
Proven experience as a Data Engineer with expertise in Spark and PySpark for distributed data processing.
Strong proficiency in SQL, including experience with writing complex queries, performance tuning, and database management.
Hands-on experience with cloud platforms (AWS, Azure, or GCP) and their data processing services such as AWS EMR, Azure Databricks, Google BigQuery, etc.
In-depth understanding of ETL processes, data modeling, and data warehousing concepts.
Experience working with large datasets and optimizing data pipelines for efficiency and scalability.
Familiarity with data storage technologies like Hadoop, HDFS, and cloud-based data lakes.
Knowledge of version control systems like Git, and experience working in Agile environments.
Excellent problem-solving skills and the ability to work in a fast-paced, collaborative team environment.
Strong communication skills and the ability to explain technical concepts to non-technical stakeholders.
Preferred Qualifications:
Experience with containerization and orchestration tools like Docker, Kubernetes, or Apache Airflow.
Familiarity with machine learning concepts and integration of data pipelines with ML workflows.
Experience with real-time data streaming technologies (Kafka, Apache Flink, etc.).
Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field.