BlueSkyClarity
Industrial Manufacturing SRE/DevOps, Hybrid, must live near Boston, MA.
BlueSkyClarity, Boston, MA
Compensation: 155k to 195k base, bonus, options. Compensation commensurate with experience; bonus, benefits, EOE, diverse, highly educated culture.
Note: I also represent similar roles that are completely remote, but candidates must live in the continental US.
Candidates must be U.S. citizens or nationals, refugees, asylees, or lawful permanent residents.

Our client has a team of leading technology and operational experts with decades of experience in advanced manufacturing, materials, automation, and robotics. They are leaders in scalable additive manufacturing (AM) solutions and pioneered integrated digital production systems. They continuously seek contributors who demonstrate outstanding integrity, intelligence, accountability, and a passion for learning. They celebrate diversity and are committed to creating an inclusive environment for all employees.

Currently our client is seeking a Site Reliability Engineering (SRE)/DevOps resource who combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems.

Responsibilities

SRE/DevOps ensures that our cloud services, both our internally critical and our externally visible systems, are optimally deployed and have reliability and uptime appropriate to users' needs and a fast rate of improvement. Additionally, SRE/DevOps will keep an ever-watchful eye on our systems' capacity and performance.
Much of our software development focuses on optimizing existing systems, building infrastructure, and eliminating work through automation, including full CI/CD. On the SRE/DevOps team, you’ll have the opportunity to manage the complex challenges of scale that are unique to our high-volume manufacturing, which generates terabytes of sensor/IoT data, while using your expertise in coding, algorithms, complexity analysis, and large-scale system design.

Lead designs of major software components, systems, and features to improve the availability, scalability, latency, and efficiency of our clients’ services.
Serve as a primary point person for the overall health, performance, and capacity of our Kubernetes-based cloud platform.
Evangelize, design, implement, and automate security controls, governance processes, and compliance validation.
Work closely with our DevOps team and other consumers of the platform to develop SLAs and KPI targets.
Build an understanding of all aspects of the cloud platform, including network ingress, monitoring, alerting, RBAC, and multi-region deployment.
Participate in solution design for new cloud platform features, open-source technologies, and tool evaluation and selection.
Create and maintain runbooks and operational procedures to ensure that service availability requirements are achieved.
Continually improve processes, automation, documentation, monitoring, and security.
Keep up to date on Kubernetes updates and tools, and plan and execute cloud platform updates.
Directly impact our web-to-consumer customer experience by creating clean, maintainable, intuitive build and installation solutions using Kubernetes, Docker, and Terraform.
Automate as much work as makes sense so that we deliver products and releases on time and within scope.
Work directly with the engineering teams to assist with technical support within our environment and to develop, write, and ensure that all aspects of the code are tested in an efficient manner.
Lead sustainable incident response, blameless postmortems, and production improvements that result in direct business opportunities for our client.
Provide guidance to other team members on managing end-to-end availability and performance of mission-critical services, on building automation to prevent problem recurrence, and on building automated responses for non-exceptional service conditions.
Mentor and train other team members on design techniques and coding standards, and cultivate innovation and collaboration across multiple teams.
Manage individual project priorities, deadlines, and deliverables.
Foster SRE's culture of diversity, intellectual curiosity, problem solving, and openness, which is key to its success. Our organization brings together people with a wide variety of backgrounds, experiences, and perspectives. We encourage them to collaborate, think big, and take risks in a blame-free environment.
Promote self-direction to work on meaningful projects, while we also strive to create an environment that provides the support and mentorship needed to learn and grow.

Qualifications

Significant experience in computing, distributed systems, storage, networking, and cloud (AWS, Azure, GCP)
Experience programming in at least one of the following languages: C, C++, Java, Python, Go, Perl, or Ruby; preference for Python and Go together
Experience with Linux: RedHat, CentOS, and Ubuntu
Experience with hyperconverged infrastructures (Nutanix), private cloud, and public clouds (AWS, Azure, and GCP)
Experience with NAS technologies: configuration, permissions, protocols, sync, and backup
Experience with configuration and implementation of VoIP technologies
Experience with configuring layer 2/3 network switches, DNS, VPN, and Palo Alto firewalls
Experience with Wi-Fi technologies, security, and configuration
Experience working in the manufacturing, aerospace, defense, and medical device industries
Experience architecting, developing, and troubleshooting large-scale systems
Experience with algorithms and data structures and/or Unix/Linux systems internals (e.g., filesystems, system calls) and administration
Experience designing, analyzing, and troubleshooting large-scale distributed systems
Team player with excellent communication skills
Excited about taking on new challenges in the metal additive manufacturing industry and working in a fast-paced startup environment
Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
Familiarity with the NIST Risk Management Framework, other computer security publications, and implementing system security controls following DFARS 252.204-7012 and/or NIST SP 800-171 cyber security compliance standards
Knowledge of client/server environments, Group Policies, Domain Services, Networking, and NIS
Proven track record of ground-up IT infrastructure specification, implementation, administration, and maintenance
Self-motivated, hands-on, practical mindset, and capable of setting and reaching ambitious goals
Previous focus on managing costs and budget preparation, including software and hardware maintenance, supplies inventory, vendor relationships, and licensing
Ability to successfully manage many issues/projects at multiple sites simultaneously
Strong sense of ownership
Bachelor's degree in Computer Science, a similar technical field of study, or equivalent practical experience

Diversity and EOE Statements

Our clients are committed to an inclusive workplace where they do not discriminate based on race, sex, gender, national origin, religion, sexual orientation, gender identity, marital or familial status, age, ancestry, disability, genetic information, or any other characteristic protected by applicable laws. They believe in diversity and encourage any qualified individual to apply.
They are an equal employment opportunity employer.

BlueSkyClarity proudly believes that your gender, race, nationality, religion, sexual orientation, status as a protected veteran, or status as an individual with a disability should have nothing to do with hiring practices. We are an EOE agency that seeks to increase our client’s diversity recruitment and hiring.

Posted by
Dom Costagliola, Principal, m 1-617-899-5094, domenic@blueskyclarity.com
http://www.blueskyclarity.com/
http://www.linkedin.com/in/domcosta1

Type: direct hire