Python AWS Data Engineer
InfoVision Inc., Dallas, TX, United States
Senior AWS Data Engineer
Dallas, TX (Onsite)
Long-term
Job Description
We are looking for a highly technical, hands-on Data Engineer for our Data Lake Team who can independently lead data engineering projects and proactively improve process efficiency, recommending process and system improvements where applicable. The Data Engineer will be responsible not only for understanding data pipelines but also for event-streaming applications and for building systems that handle massive amounts of data while making it consumable by other application teams, users, and data scientists.
You will also assist in the design and architecture of highly scalable, fault-tolerant infrastructure capable of processing millions of operations per minute coming from millions of TVs, efficiently storing petabytes of data, and providing fast insights from that data. You will work with teams across the client enterprise to bring their data into our Big Data ecosystem, monitor data quality, and fix discrepancies. You will ensure data accuracy through validation tasks, perform root cause analysis and implement solutions for data preparation and cleanliness, review data at all granular and aggregate levels, and manage versioning.
About You
- You have a BS or MS in Computer Science or similar relevant field
- You work well in a collaborative, team-based environment
- You are an experienced engineer with 5+ years of experience
- You have a passion for big data structures
- You possess strong organizational and analytical skills related to working with structured and unstructured data operations
- You have experience implementing and maintaining high-performance, high-availability data structures
- You are most comfortable operating within cloud-based ecosystems
- You enjoy leading projects and mentoring other team members
Specific Skills
- Experience or knowledge of relational SQL and NoSQL databases, including Postgres and Cassandra.
- Strong understanding of in-memory processing and data formats (Avro, Parquet, JSON, etc.)
- Experience or knowledge of AWS cloud services: EC2, MSK, S3, RDS, SNS, SQS
- Experience or knowledge of stream-processing systems, e.g., Storm, Spark Structured Streaming, Kafka consumers.
- Experience or knowledge of object-oriented/functional scripting languages, e.g., Python, Java, Scala, R, SQL.
- Experience or knowledge of data pipeline and workflow management tools, e.g., AWS Data Pipeline, Apache Airflow, Argo.
- Experience or knowledge of big data tools, e.g., Hadoop, Spark, Kafka.
- Experience or knowledge of software engineering tools/practices, e.g., GitHub, VS Code, CI/CD
- Hands-on experience in designing and maintaining data schema lifecycles.