Python AWS Data Engineer
InfoVision Inc., Dallas, TX, United States
Senior AWS Data Engineer
Dallas, TX (Onsite)
Long-term
Job Description
We are looking for a highly technical, hands-on Data Engineer for our Data Lake Team who can independently lead data engineering projects and proactively improve process efficiency, recommending process and system improvements where applicable. The Data Engineer will be responsible not only for understanding data pipelines but also for event-streaming applications and for building systems that handle massive amounts of data while making it consumable by other application teams, users, and data scientists.
You will also assist in the design and architecture of highly scalable, fault-tolerant infrastructure capable of processing millions of operations per minute coming from millions of TVs, efficiently storing petabytes of data, and providing fast insights from that data. You will work with teams across the client enterprise to bring their data into our Big Data ecosystem, monitor data quality, and fix discrepancies. You will ensure data accuracy through validation tasks, perform root cause analysis and implement solutions for data preparation and cleanliness, review data at all granular and aggregate levels, and manage versioning.
About You
- You have a BS or MS in Computer Science or similar relevant field
- You work well in a collaborative, team-based environment
- You are an experienced engineer with 5+ years of experience
- You have a passion for big data structures
- You possess strong organizational and analytical skills related to working with structured and unstructured data operations
- You have experience implementing and maintaining high-performance, high-availability data structures
- You are most comfortable operating within cloud-based ecosystems
- You enjoy leading projects and mentoring other team members
Specific Skills
- Experience or knowledge of relational SQL and NoSQL databases, including Postgres and Cassandra.
- Strong understanding of in-memory processing and data formats (Avro, Parquet, JSON, etc.)
- Experience or knowledge of AWS cloud services: EC2, MSK, S3, RDS, SNS, SQS
- Experience or knowledge of stream-processing systems, e.g., Storm, Spark Structured Streaming, Kafka consumers.
- Experience or knowledge of object-oriented/functional scripting languages, e.g., Python, Java, Scala, R, SQL.
- Experience or knowledge of data pipeline and workflow management tools, e.g., AWS Data Pipeline, Apache Airflow, Argo.
- Experience or knowledge of big data tools, e.g., Hadoop, Spark, Kafka.
- Experience or knowledge of software engineering tools/practices, e.g., GitHub, VS Code, CI/CD
- Hands-on experience in designing and maintaining data schema lifecycles.