Quick Med Claims

Data Engineer - Business Intelligence and Data Warehousing

Quick Med Claims, Pittsburgh, Pennsylvania, US 15289


Quick Med Claims, LLC (QMC) is a leading Revenue Cycle Management (RCM) organization specializing in Emergency Medical Services (EMS). We leverage data to drive business intelligence, optimize decision-making, and enhance performance, offering innovative ways for healthcare organizations to improve their financial and operational processes. Our growing team is seeking an experienced and driven Data Engineer. This lead role is critical in ensuring that our data infrastructure and architecture are robust, scalable, and optimized for near real-time analytics, reporting, and machine learning.

This Position is 100% Remote

Job Purpose/Summary

The Data Engineer will be responsible for architecting and optimizing data systems, ensuring seamless integration between various data sources, and enabling near real-time or batch processing capabilities. You will guide the design and implementation of robust data pipelines that support advanced analytics, reporting, and machine learning initiatives. This position will shape the future of the QMC data infrastructure. The Data Engineer will work closely with cross-functional teams and lead the design and development of data pipelines in a fast-paced, collaborative environment. This role will be pivotal in supporting and optimizing our AWS Redshift data warehouse while contributing to the migration to a more modern Databricks Lakehouse architecture. In addition, the role will involve applying machine learning in Databricks to deliver actionable insights for specific business problems. You will also work closely with our Sisense BI platform and Jaspersoft Reporting platform to support business intelligence and reporting needs.

Essential Duties & Responsibilities

AWS Redshift & AWS Ecosystem Support: Support and optimize the existing AWS Redshift data warehouse and Hanger ETL pipeline. Leverage the AWS ecosystem, including S3, Spectrum, Redshift, Glue, SQL, and Hanger ETL, to integrate, transform, and load data.
Performance Tuning: Implement data optimizations to improve the performance of large datasets, including data partitioning, indexing, and query performance tuning.
Databricks Lakehouse Architecture: Lead the migration from AWS Redshift to the Databricks Lakehouse, implementing Delta Lake for data storage and processing. Optimize large-scale data processing pipelines and workflows.
Data Pipeline Development: Design, develop, and maintain scalable ETL pipelines using Python, Spark, SQL, and Databricks, ensuring data quality, consistency, and timeliness.
Data Integration: Integrate structured, semi-structured, and unstructured data from internal and external sources, both on premises and on cloud platforms such as AWS.
ETL Frameworks & Automation: Use ETL frameworks and scheduling tools (e.g., Airflow, Databricks Jobs) to automate monitoring, testing, and validation of data quality and pipeline health.
Data Analysis & Mapping: Perform data analysis and data mapping from SQL Server-based RCM transactional systems and other source systems to transform data into the business intelligence and reporting formats residing in the data warehouse.
Data Modeling: Apply dimensional modeling techniques (e.g., star schemas) to ensure effective data organization for BI, reporting, and machine learning.
Slowly Changing Dimensions (SCD): Implement SCD techniques (Types 1, 2, and 3) to ensure accurate tracking and storage of historical data changes, particularly in operational and transactional data.
Business Intelligence & Reporting: Work with Sisense to develop interactive dashboards and with Jaspersoft Reporting to develop and enhance reports that support operational and strategic decision-making.
Machine Learning in Databricks: Implement and integrate large language models (LLMs) in Databricks to solve specific business problems, such as improving billing processes, predicting trends, and enhancing operational efficiency.
DevOps & Infrastructure: Work with DevOps tools such as Kubernetes, Jenkins, GitHub, Slack, and Terraform to automate deployments and infrastructure management. Support cloud infrastructure monitoring tools such as CloudWatch and Databricks Monitoring for performance tracking.
Data Governance & Security: Ensure data security and compliance with industry regulations, including HIPAA, by adhering to best practices in data governance and privacy standards and by managing access control and encryption for sensitive data.
Documentation & Best Practices: Maintain documentation for data models, data workflows, ETL pipelines, machine learning models, system architectures, and design and coding standards. Promote best practices in data engineering, DevOps, and cloud infrastructure management.
Problem-Solving & Communication: Collaborate with data engineers, data analysts, business analysts, and other stakeholders to ensure data availability for reporting, modeling, and decision-making. Communicate complex technical concepts clearly to non-technical stakeholders, applying strong problem-solving and analytical skills.
Leadership: Lead projects to successful completion. Lead, mentor, and provide guidance to junior team members, promoting best practices and code quality.
Continuous Improvement: Stay current with emerging technologies, methodologies, and industry trends. Implement new tools and technologies as necessary to improve the data engineering workflow.

Other Responsibilities

Adhere to all QMC HIPAA privacy policies and procedures, including always maintaining the confidentiality and security of sensitive patient information.
Ensure consistent adherence to company attendance policies.

Requirements

Education

Bachelor's degree in Computer Science, Engineering, Mathematics, or a related field.

Experience, Skills and Abilities

5+ years of experience in data engineering, with a focus on ETL pipeline design and development, data warehouse design and management, structured and unstructured database management systems, and cloud technologies.
Experience with AWS Redshift, including integration with S3, Spectrum, Lambda, and Glue for data processing and transformation.
1+ years of hands-on experience with Databricks Lakehouse, Delta Lake, and Unity Catalog, including data lake management and optimization of storage and processing.
Solid proficiency in Python and SQL for developing ETL pipelines, querying relational databases, and transforming data.
Experience with ETL tools, ETL frameworks, and scheduling tools such as Apache Airflow, Databricks Jobs, AWS Glue, Talend, and Informatica.
Strong background in data modeling, including dimensional modeling (star and snowflake schemas) to support business intelligence and reporting tools.
Experience implementing Slowly Changing Dimensions (SCD) techniques to manage and track historical data changes.
Expertise in machine learning integration within Databricks to solve business problems and optimize business processes.
Familiarity with DevOps practices and tools such as Jenkins, GitHub, Slack, and Terraform.
Experience with containerization tools like Docker and Kubernetes for packaging and deploying applications.
Basic understanding of cloud infrastructure management and monitoring using tools like CloudWatch and Databricks Monitoring.
Experience working in an Agile development environment, using Jira and Confluence to manage tasks and collaboration according to the Software Development Life Cycle (SDLC).

Preferred Qualifications

Experience with Delta Lake in Databricks and data lake best practices for large-scale data storage and management.
Familiarity with data privacy regulations, especially in healthcare (HIPAA).
Experience with administration and management of the Sisense BI platform.
Experience with JavaScript and CSS.
Experience leading teams and projects.
Experience in healthcare or RCM.