Sanmina-SCI Systems de México

Data Scientist – Remote

Sanmina-SCI Systems de México, San Jose, California, United States, 95199

Viking Enterprise Solutions is a supplier of storage array systems, providing solutions to customers who are seeking a flexible, scalable and resilient platform on which to build a storage, public cloud or private cloud service. We are looking for an experienced software developer to work on robust and scalable storage and AI/ML management solutions.

Job Purpose:

This individual contributor is primarily responsible for designing and developing data pipelines and automation to acquire and ingest raw data from multiple data sources and formats, and to transform, cleanse, and store that data for consumption. The role is also responsible for developing detailed problem statements that outline hypotheses and their effect on target clients and customers; analyzing and investigating complex data sets and summarizing their key characteristics; selecting, manipulating, and transforming data into features for machine learning algorithms; training statistical models; deploying and maintaining reliable, efficient models in production; verifying model performance; and collaborating with internal and external stakeholders across domains to develop and deliver statistically driven outcomes.

The successful candidate will work in a team responsible for architecting, building and maintaining management applications for our storage and AI/ML systems utilizing open source and third-party software.

Nature of Duties/Responsibilities:

Familiarity with data platforms and applications

Solid foundation in machine learning, applied stats, and/or experimentation

Experience with statistical and data programming languages and frameworks, including PySpark, Spark, and Python

Experience with medium-to-large data sets (more than 1M rows)

Machine learning (ML) experience

Applied experience with frameworks such as PyTorch, Keras, and TensorFlow

Comfortable working within a software development life cycle (SDLC) and deploying models and code to production

Education and Experience:

Skills and knowledge: Essential

Completes work assignments autonomously and supports business-specific projects by applying subject-matter expertise and business knowledge to generate creative solutions; encourages team members to adapt to and follow all procedures and policies. Collaborates cross-functionally and/or externally to reach effective business decisions; provides recommendations and solves complex problems; escalates high-priority issues or risks as appropriate; and monitors progress and results.

Designs and develops data pipelines and automation to acquire and ingest raw data from multiple data sources and formats, transforming, cleansing, and storing that data for consumption by downstream processes; writes and optimizes Python code and SQL queries; and demonstrates knowledge of database fundamentals.

Analyzes and investigates complex data sets and summarizes their key characteristics using data visualization methods, and determines how best to manipulate data sources to discover patterns, spot anomalies, test hypotheses, and/or check assumptions.

Selects, manipulates, and transforms data into features for machine learning algorithms by applying techniques such as dimensionality reduction, feature importance analysis, and feature selection.

Trains statistical models using algorithms and data mining techniques, tests models with various algorithms to evaluate the input dataset and its features, and applies techniques such as cross-validation to guard against overfitting.

Deploys and maintains reliable, efficient models in production.

Verifies model performance by applying a variety of model validation techniques to assess goodness of fit, and uses feedback and output to manage and strengthen model performance.

Collaborates with internal and external stakeholders across domains to develop and deliver statistically driven outcomes: delivering insights and value from heterogeneous data to investigate complex problems across multiple use cases, driving informed decision-making, and presenting findings to both technical and non-technical audiences.

Additional Requirements:

Experience working with data visualization methods.

Machine learning and/or algorithmic experience.

Statistical analysis and modeling experience.

Minimum of one (1) year of Python coding experience.

Bachelor's degree in Mathematics, Statistics, Computer Science, Engineering, Economics, Public Health, or a related field.

3-5 years of experience in data science or a directly related field.

Additional equivalent work experience in a directly related field may be substituted for the degree requirement. Advanced degrees may be substituted for the work experience requirements.

Sanmina is an Equal Opportunity Employer – M/F/Veteran/Disability/Sexual Orientation/Gender Identity

Salary range (annual): $100,000 – $150,000

In addition, Sanmina provides a variety of benefits, including health insurance coverage, life and disability insurance, a savings plan, company-paid holidays, and paid time off (PTO) for vacation and/or personal business.
