SRE - Remote OK, Tampa, Florida

Full Time
Job title: Site Reliability Engineer
Location: Remote
Duration: Full-time
Responsibilities:
• Partner with product owners and business SMEs to analyse business needs and improve the supportability, scalability, and recovery of the engineered solution. Ensure that the overall technical solution is aligned with the business needs and the operational teams' methodologies
• Drive improvements in service availability and use automation to reduce mean time to recovery. Develop methods for autonomous recovery and self-repairing systems. Ensure the solution is consistent with our company's architecture, design, and development standards
• Coordinate and plan system releases and hotfixes. Develop methods that simplify triage through checklists, runbooks, and standard operating procedures. Adopt new methodologies that give the business greater flexibility and agility
• Support software development by providing operational improvements to non-functional requirements. Develop enhancements that improve service levels, guided by key performance indicators drawn from monitoring, non-functional testing, and availability reports. Provide a service-focused approach built on continuous process improvement. Participate in chaos testing to improve system resiliency. Mentor other engineers. Provide overall technical leadership to smaller working teams as needed
• Stay current with the latest development tools, technology ideas, patterns, and methodologies; share knowledge by clearly articulating results and ideas to key stakeholders
Qualifications:
• Bachelor's Degree in Computer Science or related engineering field; or equivalent experience
• 2+ years of experience in Site Reliability Engineering, DevOps, or SysOps
• 2+ years of software development experience
• 2+ years of experience with cloud computing platforms (AWS, Google Cloud Platform, and Azure)
• Hands-on coding experience with one or more high-level languages, such as Python, JavaScript (Node.js), and Go
• Hands-on experience with infrastructure as code tools and concepts: Terraform, Ansible, etc.
• Understanding of common system architectures, such as web applications, microservices, serverless, and distributed applications
• Experience engaging with engineering teams to improve services from design to deployment to operation by building in Disaster Recovery, Security, Performance, and Reliability
• Ability to understand large, complex systems and a passion for continuously improving environments
• Experience running production environments, including monitoring availability and taking a holistic view of system health
• Ability to develop, manage, and follow operational policies and procedures including documentation and training
• Demonstrated ability to deliver results on time with high quality and attention to detail
• Strong interpersonal skills
Digital Intelligence Systems, LLC (DISYS) is an Equal Opportunity Employer that recruits and hires qualified candidates without regard to race, religion, sex, sexual orientation, gender identity, age, national origin, ancestry, citizenship, disability, or veteran status.