BayOne Solutions
Hi,
Job Title: Senior Data Engineer
Location: Sunnyvale, CA (Hybrid role)
Duration: Contract-to-Hire
Interview Process: Onsite
Pay Rate: $75 to $80/hr on C2C
Key Responsibilities:
- Design, develop, and maintain data pipelines using Spark, PySpark, and other big data technologies to process large datasets.
- Build and optimize complex SQL queries for data extraction, transformation, and loading (ETL) in both batch and real-time workflows.
- Work with cloud platforms such as AWS, Azure, or GCP to deploy and manage data infrastructure.
- Collaborate with cross-functional teams to gather data requirements and deliver data solutions that meet business needs.
- Ensure data quality, integrity, and consistency across different stages of data processing.
- Optimize and troubleshoot performance of Spark and SQL-based applications.
- Develop and implement data models and data architectures to support analytics and business intelligence initiatives.
- Create and maintain automated data workflows and ensure they are scalable and maintainable.
- Document data processes, architectures, and pipelines to ensure knowledge sharing across teams.
Required Skills & Qualifications:
- Proven experience as a Data Engineer with expertise in Spark and PySpark for distributed data processing.
- Strong proficiency in SQL, including experience with writing complex queries, performance tuning, and database management.
- Hands-on experience with cloud platforms (AWS, Azure, or GCP) and their data processing services such as AWS EMR, Azure Databricks, Google BigQuery, etc.
- In-depth understanding of ETL processes, data modeling, and data warehousing concepts.
- Experience working with large datasets and optimizing data pipelines for efficiency and scalability.
- Familiarity with data storage technologies like Hadoop, HDFS, and cloud-based data lakes.
- Knowledge of version control systems like Git, and experience working in Agile environments.
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative team environment.
- Strong communication skills and the ability to explain technical concepts to non-technical stakeholders.
Preferred Qualifications:
- Experience with containerization tools (Docker, Kubernetes) and workflow orchestration tools (Apache Airflow).
- Familiarity with machine learning concepts and integration of data pipelines with ML workflows.
- Experience with real-time data streaming technologies (Kafka, Apache Flink, etc.).
- Bachelor’s degree in Computer Science, Engineering, Mathematics, or a related field.