ZipRecruiter
Data Scientist (PhD)
ZipRecruiter, Washington, District of Columbia, US, 20022
Job Description
Location: 100% Remote within the United States
Job Overview: As a Data Scientist, you will be responsible for managing the complete Model Development Life Cycle (MDLC), from problem definition to model deployment and monitoring. You will work closely with cross-functional teams to deliver machine learning models that support business objectives and drive innovation. The ideal candidate should have a strong background in data analysis, feature engineering, and model selection, along with a deep understanding of model deployment and ongoing model maintenance.

Key Responsibilities:
- Problem Definition: Collaborate with business stakeholders to define and structure data-driven problems. Translate business objectives into machine learning tasks (e.g., classification, regression, clustering).
- Data Collection & Preprocessing: Gather, clean, and preprocess data from multiple sources (e.g., databases, APIs, publicly available datasets). Handle missing data and outliers, and apply normalization techniques.
- Exploratory Data Analysis (EDA): Use statistical analysis and data visualization techniques to identify key patterns, trends, and correlations in the data.
- Feature Engineering: Create, extract, and transform features to improve model performance. Apply techniques such as feature extraction, selection, and transformation.
- Model Selection & Training: Select appropriate machine learning models based on the problem at hand (e.g., supervised learning, unsupervised learning, deep learning). Train models using tools like Scikit-learn, TensorFlow, or PyTorch. Evaluate model performance using relevant metrics (e.g., RMSE, accuracy, F1-score, ROC-AUC) and optimize hyperparameters to ensure robustness.
- Model Deployment: Deploy models in a production environment using tools like Flask, FastAPI, Docker, and Kubernetes. Ensure scalability and integration with existing systems.
- Model Monitoring & Maintenance: Monitor model performance post-deployment, address model drift, and retrain models as needed. Ensure continuous accuracy and relevance of models in real-world scenarios.
- Model Interpretation & Communication: Provide clear and actionable insights through model interpretation techniques such as feature importance and SHAP values. Present results to both technical and non-technical stakeholders.

Qualifications:
- PhD in Computer Science, Data Science, Statistics, Engineering, or a related field.
- 3+ years of experience in machine learning, statistical modeling, and data science.
- Proficiency in Python and SQL, and experience with libraries such as Pandas, NumPy, Scikit-learn, TensorFlow, and Keras.
- Hands-on experience with model deployment tools such as Flask, Docker, and Kubernetes, and with cloud platforms like AWS, Azure, or Google Cloud.
- Strong knowledge of data preprocessing techniques, feature engineering, and exploratory data analysis.
- Experience with hyperparameter tuning techniques (e.g., Grid Search, Bayesian Optimization).
- Familiarity with model monitoring tools such as MLflow, Prometheus, or Grafana.
- Excellent communication skills, with the ability to translate technical results into actionable insights for stakeholders.
- Strong problem-solving skills and the ability to work on complex, data-driven projects.

Preferred Qualifications:
- Experience with deep learning models (e.g., CNNs, RNNs, LSTMs).
- Familiarity with NLP and time-series analysis.
- Knowledge of big data tools like Spark or Hadoop.
- Experience in sectors such as healthcare, finance, or e-commerce.