Chemical Abstracts Service
BN30P4 Data Engineer
Chemical Abstracts Service, Columbus, Ohio, United States, 43224
Description
CAS uses intuitive technology, unparalleled scientific content and unmatched human expertise to help companies create groundbreaking innovations that benefit the world. As the scientific information solutions division of the American Chemical Society, CAS manages the largest curated reservoir of scientific knowledge, and for 115 years, has helped innovators mine, assess and apply that information to keep businesses thriving. The CAS team is global, diverse, endlessly curious and strives to make scientific insights accessible to innovators worldwide.
CAS is currently seeking a Data Engineer. This position will be located in our headquarters in Columbus, Ohio.
Job Accountabilities:
Designs and develops complex and large-scale data structures and pipelines for ingestion and transformation of data for analytical or operational uses across a wide variety of business needs and enterprise data domains.
Ensures production stability for data processing workflows used by analytics groups and data scientists who are interrogating information for predictive analytics, machine learning and data mining purposes.
Writes complex ETL (Extract / Transform / Load) processes, designs database systems and develops tools for real-time and offline analytic processing.
Develops frameworks, standards and reference material for architecture and associated products.
Collaborates with the data science team to transform data and integrate algorithms and models into production systems.
Uses in-depth knowledge of Hadoop architecture and HDFS commands, along with experience designing and optimizing queries, to build scalable, modular and efficient data pipelines.
Uses advanced programming skills in Python, Java or any of the major languages to build robust data pipelines and dynamic systems.
Integrates data from a variety of sources, ensuring that the data adheres to data quality and accessibility standards.
Experiments with available tools and advises on new tools in order to determine optimal solutions given the requirements dictated by the model/use case.
Acts as a mentor to less senior team members, providing technical advice.
Applies knowledge of company systems and products to consult and advise on additional efforts across multiple domains spanning the broader enterprise.
Qualifications:
Master's degree (preferred) or Bachelor's degree in Computer Science or similar discipline with 8+ years of software engineering experience
5+ years' experience in data integration, ETL, data warehouses and data profiling, with an understanding of data governance
3+ years' hands-on experience in big data environments such as Cloudera or Hortonworks
Experience with DevOps, Continuous Integration and Continuous Delivery (Maven, Jenkins, Stash, Ansible, Docker)
3+ years' experience with programming in Scala, Spark, Python, JavaScript and Java, as well as strong Unix shell skills
Minimum 2 years' experience with cloud providers (AWS preferred)
Experience building data ingestion on the cloud (using tools like Glue)
Understanding of principles, best practices and trade-offs of schema design for both relational and NoSQL database systems
Solid understanding of Big Data NoSQL databases/technologies (MarkLogic, HBase, Hive, Spark, MongoDB)
Strong written and verbal communication skills
Desired, but not required: Knowledge and experience in chemistry, drug discovery/development, or medical-related industry
CAS offers a competitive salary and comprehensive benefits package, including a generous vacation plan, medical, dental and vision insurance plans, and employee savings and retirement plans.
Candidates for this position must be authorized to work in the United States and not require work authorization sponsorship by our company for this position now or in the future.
EEO/Minority/Female/Disabled/Veteran
Qualifications
Education
Bachelor of Science in Computer Science (required)
Equal Opportunity Employer/Protected Veterans/Individuals with Disabilities
The contractor will not discharge or in any other manner discriminate against employees or applicants because they have inquired about, discussed, or disclosed their own pay or the pay of another employee or applicant. However, employees who have access to the compensation information of other employees or applicants as a part of their essential job functions cannot disclose the pay of other employees or applicants to individuals who do not otherwise have access to compensation information, unless the disclosure is (a) in response to a formal complaint or charge, (b) in furtherance of an investigation, proceeding, hearing, or action, including an investigation conducted by the employer, or (c) consistent with the contractor's legal duty to furnish information. 41 CFR 60-1.35(c)