Tech One IT
Big Data Developer
Tech One IT, Phoenix, Arizona, United States
Job Summary:
We are looking for an experienced Senior Big Data Engineer with at least 6 years of hands-on experience in Big Data technologies, specifically Apache Spark, CouchDB, and Google Cloud Platform (GCP). This role is ideal for a highly skilled individual who thrives in a dynamic environment and is passionate about developing scalable, high-performance data solutions. The candidate should have extensive expertise in designing, implementing, and managing Big Data systems on the cloud, especially in Google Cloud Platform.

Key Responsibilities:

Data Pipeline Development: Design and implement robust and scalable data pipelines using Apache Spark to process large volumes of structured and unstructured data efficiently in both batch and real-time processing modes.

Spark Optimization: Optimize Spark jobs for performance, including tuning configurations, partitioning strategies, and memory management to handle complex computations over large datasets.

CouchDB Integration: Architect, manage, and optimize CouchDB instances for efficient storage and retrieval of NoSQL data, ensuring high availability, fault tolerance, and scalability in a distributed environment.

Cloud Infrastructure: Leverage Google Cloud Platform (GCP) services such as BigQuery, Cloud Dataproc, Cloud Functions, Dataflow, and Google Cloud Storage to build cloud-native solutions for data storage, processing, and analysis.

Data Modeling and Management: Design data models that efficiently integrate with CouchDB and other databases. Ensure optimal schema design, indexing, and query performance to support high-volume, low-latency operations.

Real-Time Data Processing: Utilize Apache Kafka, Spark Streaming, and GCP Pub/Sub to build systems that support real-time data ingestion and processing, enabling near-instantaneous insights and analytics.
Cloud-Based Big Data Solutions: Implement and maintain fully automated, scalable data pipelines in the cloud using Google Kubernetes Engine (GKE), Docker, and Terraform.

Collaboration with Teams: Work closely with cross-functional teams, including data scientists, analysts, and business stakeholders, to understand data requirements and deliver solutions that meet business needs.

Monitoring and Maintenance: Continuously monitor the performance and health of data pipelines and cloud services using tools like Google Cloud Monitoring (formerly Stackdriver) to ensure uptime, reliability, and prompt resolution of issues.

Documentation & Best Practices: Maintain detailed documentation of data pipelines, infrastructure, and processes. Promote best practices for Big Data engineering, data security, and cloud cost optimization.