Big Data Engineer (The Data Pipeline Innovator)
Unreal Gigs, San Francisco, CA, United States
Are you passionate about handling massive datasets and building the infrastructure that enables complex data analysis and machine learning at scale? Do you excel in creating robust, scalable data pipelines that fuel data-driven decision-making? If you’re ready to tackle the challenges of big data, our client has the perfect role for you. We’re seeking a Big Data Engineer (aka The Data Pipeline Innovator) to architect and maintain high-performance data systems that empower analytics and support advanced data processing needs.
As a Big Data Engineer with our client, you’ll collaborate with data scientists, analysts, and software engineers to design, implement, and optimize big data platforms. Your expertise in data engineering, distributed systems, and cloud infrastructure will be critical to ensuring that the client’s data ecosystem is efficient, reliable, and scalable.
Key Responsibilities:
- Design and Build Scalable Data Pipelines: Architect and implement data pipelines for ETL processes using tools like Apache Spark, Kafka, and Hadoop. You’ll create data workflows that handle high-volume, high-velocity data and ensure seamless integration across systems.
- Develop and manage data storage solutions (e.g., HDFS, S3, Cassandra) that are optimized for performance and cost-efficiency. You’ll configure distributed processing systems to support efficient data retrieval and transformation.
- Work closely with data scientists, analysts, and other engineers to align big data architecture with analytics goals. You’ll ensure data availability and integrity across systems to support business objectives.
- Develop processes and tools to monitor data quality and enforce data governance policies. You’ll ensure data is accurate, reliable, and secure through regular checks and validation processes.
- Use tools like Apache Airflow or AWS Glue to automate data workflows and reduce manual processing. You’ll implement scripts and automation that streamline data handling and improve efficiency.
- Use monitoring tools to track system performance and address issues proactively. You’ll troubleshoot and resolve any bottlenecks or failures to maintain optimal data processing capabilities.
- Keep up with advancements in big data technologies and tools. You’ll integrate new techniques and platforms that align with business needs and promote innovation.
Required Skills:
- Big Data Platform Proficiency: Extensive experience with big data technologies such as Apache Spark, Hadoop, Kafka, and Hive. You’re skilled at handling high-volume data and distributed processing.
- Data Pipeline and ETL Knowledge: Proven ability to design, build, and maintain ETL processes for massive datasets. You can handle both real-time and batch data processing requirements.
- Programming and Scripting: Proficiency in programming languages like Python, Java, or Scala for data processing and automation. Experience with SQL for data querying and manipulation is essential.
- Cloud Data Services Expertise: Familiarity with cloud platforms such as AWS, GCP, or Azure, including their big data and storage services (e.g., S3, BigQuery, Azure Data Lake).
- Data Quality and Governance: Strong understanding of data quality standards and governance practices, with experience in implementing data validation and monitoring frameworks.
Educational Requirements:
- Bachelor’s or Master’s degree in Computer Science, Data Engineering, Information Technology, or a related field. Equivalent experience in data engineering or big data management may be considered.
- Certifications in big data or cloud technologies (e.g., Cloudera Certified Data Engineer, AWS Certified Big Data – Specialty, Google Professional Data Engineer) are a plus.
Experience Requirements:
- 5+ years of experience in data engineering, including at least 3 years focused on big data technologies and high-scale data environments.
- Experience in distributed systems and large-scale data storage management.
- Familiarity with containerization (Docker, Kubernetes) for deploying data processing environments is advantageous.
Benefits:
- Health and Wellness: Comprehensive medical, dental, and vision insurance plans with low co-pays and premiums.
- Paid Time Off: Competitive vacation, sick leave, and 20 paid holidays per year.
- Work-Life Balance: Flexible work schedules and telecommuting options.
- Professional Development: Opportunities for training, certification reimbursement, and career advancement programs.
- Wellness Programs: Access to wellness programs, including gym memberships, health screenings, and mental health resources.
- Life and Disability Insurance: Life insurance and short-term/long-term disability coverage.
- Employee Assistance Program (EAP): Confidential counseling and support services for personal and professional challenges.
- Tuition Reimbursement: Financial assistance for continuing education and professional development.
- Community Engagement: Opportunities to participate in community service and volunteer activities.
- Recognition Programs: Employee recognition programs to celebrate achievements and milestones.