Indotronix International Corporation

AWS Data Architect

Indotronix International Corporation, San Diego, CA, United States

Job Title: AWS Data Engineer

Location: Remote (San Diego, CA 92130)

Duration: 6 months (possibility of extension or conversion)

Job Description:

As a Senior Software Engineer, you will lead a team of data engineers in designing, building, and maintaining high-performance software systems that manage the analytical data pipelines fueling the organization’s data strategy, using software engineering best practices. Beyond technical expertise, you will also serve as a change leader, guiding teams through the adoption of new tools, technologies, and workflows to improve data management and processing.
This position requires extensive hands-on experience in data system design and coding, as well as in developing modern data pipelines (AWS Step Functions, Prefect, Airflow, Luigi, Python, Spark, SQL) and associated code in AWS.
You will work closely with stakeholders across the business to understand their data needs, ensure scalability, and foster a culture of innovation and learning within the data engineering team and beyond.

Key Responsibilities:

Own the overall architecture of a specific module within a product (e.g., data ingestion or a near-real-time data processor); design it and assist with its implementation, taking system characteristics into account to achieve optimal performance, reliability, and maintainability.
Provide technical guidance to team members, ensuring they are working towards the product's architectural goals.
Create and manage RFCs (Requests for Comments), ADRs (Architecture Decision Records), design notes, and technical documentation for your module, following the architecture governance processes.
Lead a team of data engineers, providing mentorship, setting priorities, and ensuring alignment with business goals.
Architect, design, and build scalable data pipelines for processing large volumes of structured and unstructured data from various sources.
Collaborate with software engineers, architects, and product teams to design and implement systems that enable real-time and batch data processing at scale.
Be the go-to person for PySpark-based solutions, ensuring optimal performance and reliability for distributed data processing.
Ensure that data engineering systems adhere to the best data security, privacy, and governance practices in line with industry standards.
Perform code reviews for the product, ensuring adherence to company coding standards and best practices.
Develop and implement monitoring and alerting systems to ensure timely detection and resolution of data pipeline failures and performance bottlenecks.
Act as a champion for new technologies, helping ease transitions and addressing concerns or resistance from team members.

Ideal Candidate:

Experience leading a data engineering team with a strong focus on software engineering principles such as KISS, DRY, and YAGNI.
Must have experience in owning large, complex system architecture and hands-on experience designing and implementing data pipelines across large-scale systems.
Experience implementing and optimizing data pipelines with AWS is a must.
Production delivery experience with cloud-based PaaS big data technologies (EMR, Snowflake, Databricks, etc.).
Experience with multiple cloud PaaS persistence technologies, and in-depth knowledge of cloud-based ETL offerings and orchestration technologies (AWS Step Functions, Airflow, etc.).
Experience with stream-based and batch processing using modern technologies.
Working experience with distributed file systems (S3, HDFS, ADLS), table formats (Hudi, Iceberg), and various open file formats (JSON, Parquet, CSV, etc.).
Strong programming experience in PySpark, SQL, Python, etc.
Database design skills, including normalization/denormalization and data warehouse design.
Knowledge and understanding of relevant legal and regulatory requirements, such as SOX, PCI, HIPAA, and data protection.
Experience in the healthcare industry is a plus.
A collaborative and informative mentality is a must.

Toolset:

AWS, preferably with AWS Certified Data Engineer and AWS Certified Solutions Architect certifications.
Proficiency in at least one of the following languages or frameworks: C#, Go, JavaScript, or React.
Spark / Python / SQL.
Snowflake / Databricks / Synapse / MS SQL Server.
ETL / Orchestration Tools (Step Functions, dbt, etc.).
ML / Notebooks.

Education and experience required:

Bachelor’s or master’s degree in Computer Science, Information Systems, or an engineering field, or relevant experience.
10+ years of related experience in developing data solutions and data movement.