CloudBC Labs

Data Scientist

CloudBC Labs, Reston, Virginia, United States, 22090

Position : Data Scientist

Location : Reston, VA (Fully Onsite)

Duration : Long term

Term : W2 & Full Time only

Job Description :

The client is expanding its data science talent to further push the frontiers of modeling and advanced analytics. Are you passionate about advanced analytics algorithms and creating new data science tools and technologies? Do you have creative and innovative approaches to developing new analytics techniques? We're seeking data scientists who have domain knowledge of, or an interest in, Big Data, machine learning, natural language processing, and image processing, and who want to apply them to economic and financial applications.

Are you looking to build the next generation of data analytics solutions with diverse data sets and leading-edge analytics use cases? If you are ready for an exciting opportunity to work hands-on with the world's most advanced data science technologies, and you thrive in a highly dynamic environment where you are counted on to develop advanced analytics products, this role is for you.

Minimum Qualifications:

Work or educational background in one or more of the following areas: operations research, computer science, mathematics, data science, business analytics, or knowledge management.
Demonstrated experience programming with R/Python, Linux, and Spark in an AWS cloud environment, or knowledge and algorithmic design experience in Python/C#/C++ (3+ years).
Proficiency with AWS SageMaker, Jupyter Notebook, Python scikit-learn, and deep learning and machine learning tools such as TensorFlow.
Experience building vector DB, NLP, LLM, and GenAI tools. Experience with LoRA, LangChain, RAG, LLM fine-tuning, and PEFT is preferred.
Demonstrated experience with SQL and relational database technologies, such as Oracle, PostgreSQL, MySQL, RDS, Redshift, Hadoop EMR, Hive, etc.
Demonstrated experience processing structured and unstructured data sources, including data cleansing, data normalization, and preparation for analysis.
Demonstrated experience with machine learning techniques, including natural language processing, BERT, RoBERTa, GPT, and large language models.
Demonstrated experience with code repositories and build/deployment pipelines, specifically Jenkins and/or Git.
Demonstrated experience using the Apache Hadoop and/or Apache Spark stack for big data processing, or comparable distributed computing platforms.
Demonstrated experience using data streaming technologies such as Kafka, RabbitMQ, NiFi, Kinesis, or comparable tools.
Demonstrated experience using Tableau, Kibana, QuickSight, or similar data visualization tools.
Ability to handle terabytes of time-series and cross-sectional data and extract well-defined alpha from the underlying relationships.
Very comfortable working with ambiguity (e.g., imperfect data, loosely defined concepts, ideas, or goals).

Qualifications & Requirements:

Education: MS in Computer Science, Statistics, Math, Engineering, or a related field; PhD preferred.
3+ years of relevant experience building large-scale machine learning or deep learning models and/or systems.
1+ year of experience specifically with deep learning (e.g., CNN, RNN, LSTM).
1+ year of experience building NLP, LLM, and GenAI tools. Experience with LoRA, LangChain, RAG, LLM fine-tuning, and PEFT is preferred.
Demonstrated skills with Jupyter Notebook, AWS SageMaker, Domino Data Lab, or comparable environments.
Passion for solving complex data problems and generating cross-functional solutions in a fast-paced environment.
Knowledge of Python or C++/C#, SQL, object-oriented programming, and service-oriented architectures.
Strong scripting skills with shell script and SQL.
Strong coding skills and experience with Python (including SciPy, NumPy, and/or PySpark) and/or Scala.
Knowledge of and implementation experience with statistical and machine learning models (regression, classification, clustering, graph models, etc.).

Preferred Qualifications:

Hands-on experience building models with deep learning frameworks such as MXNet, TensorFlow, Keras, Caffe, PyTorch, Theano, or similar.
Experience with search architecture (e.g., Solr, Elasticsearch).
Experience building and querying ontologies with technologies such as Zeno, OWL, RDF, SPARQL, or comparable.
Knowledge of and implementation experience with NLP techniques (LDA, TF-IDF, sentiment analysis) and NLP technologies such as Python NLTK, spaCy, or comparable.
Knowledge of and experience with microservices, service mesh, API development, and test automation.
Demonstrated experience using Docker, Kubernetes, and/or similar container frameworks.