Disys - Oak Brook
Big Data Engineer
Disys - Oak Brook, Tampa, Florida, US, 33646
· Design interfaces to data warehouses/data stores and machine learning/Big Data applications using open-source tools and languages such as Scala, Java, Python, Perl, and shell scripting.
· Design and build data pipelines that maintain a stable flow of data to the machine learning models, in both batch and near-real-time modes.
· Interface with Engineering, Operations, System Administration, and Data Science teams to ensure data pipelines and processes fit within the production framework.
· Ensure that tools and environments adhere to strict security protocols.
· Deploy machine learning models and serve their outputs as RESTful API calls.
· Understand business needs in close collaboration with subject matter experts (SMEs) and Data Scientists to perform efficient feature engineering for machine learning models.
· Maintain code and libraries in the code repository.
· Work with the system administration team to proactively resolve issues and install tools and libraries on the AWS platform.
· Research and propose the architecture and solutions most appropriate for the problem at hand.
· Maintain and improve tools that assist Analytics with ETL, retrospective testing, efficiency, repeatability, and R&D.
· Lead by example on software best practices, including code style and architecture, documentation, source control, and testing.
· Support the Chief Data Scientist, Data Scientists, and Big Data Engineers in creating novel approaches to challenging problems using machine learning, Big Data, and the cloud.
· Handle ad-hoc requests to create reports for end users.

Required Skills
· Strong skills in Apache Spark (Spark SQL) and Scala, with at least 2 years of experience.
· Understanding of AWS Big Data components and tools.
· Strong Java skills and experience in web services and web development.
· Hands-on experience with model deployment.
· Hands-on experience deploying applications on Docker, Kubernetes, or similar technology.
· Linux scripting is a plus.
· Fundamental understanding of AWS cloud components.
· 2+ years of experience ingesting, cleansing/processing, storing, and querying large datasets.
· 2+ years of experience engineering large-scale data solutions with Java/Tomcat/SQL/Linux.
· Experience in a data-intensive role, including extraction of data (database, web, API, etc.), transformation, and loading (ETL).
· Exposure to structured and/or unstructured data.
· Experience with data cleansing/preparation in the Hadoop/Apache Spark ecosystem: MapReduce, Hive, HBase, Spark SQL.
· Experience with distributed streaming tools such as Apache Kafka.
· Experience with multiple file formats (Parquet, Avro, ORC).
· Knowledge of the Agile development cycle.
· Efficient coding skills to improve the performance and cost of jobs running on the AWS platform.
· Experience building stable, scalable, high-speed live data streams and serving web platforms.
· Enthusiastic self-starter with the ability to work in a team environment.
· Graduate (MS) or undergraduate degree in Computer Science, Engineering, or a relevant field.

Nice to have:
· Strong software development experience.
· Ability to write custom MapReduce programs to clean and prepare complex data.
· Familiarity with streaming data processing; experience with distributed real-time computation systems such as Apache Storm or Apache Spark Streaming.

Additional Information
All your information will be kept confidential according to EEO guidelines.