Site Reliability Engineer
Role & Responsibilities
· Identify risks, respond quickly, and work with a sense of urgency, both within a team and independently
· Maintain services once they are live by measuring and monitoring availability, latency, and overall system health, with a focus on supporting business activities, and continuously evaluate cost and waste
· Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation, capacity planning, and launch reviews
· Troubleshoot problems that span systems, databases, storage, networks (TCP/IP), and code
· Scale systems sustainably through mechanisms such as automation, and evolve systems by pushing for changes that improve reliability and velocity, including automation for other operational needs
· Develop, implement, and monitor security measures for the protection of systems, networks, and information
· Minimize and mitigate the risk of reliability-related failures affecting durability, availability, performance, and correctness
· Meet project deadlines
Qualifications
· Bachelor’s degree in Computer Science or a related technical field
· Knowledge of or experience with algorithms, data structures, complexity analysis, and software design
· Knowledge of or experience with operationalizing and scaling software systems
· Knowledge of or experience with scripting, Chef/Ansible, Docker/Kubernetes, and cloud platforms
· Ability to diagnose technical problems, debug, optimize code, and automate routine tasks
· Interest in designing, analyzing, and troubleshooting large-scale distributed systems
· Systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive
· Experience configuring and operating traffic management components in production, such as load balancers and web, DNS, and proxy servers
This role is based in Fort Collins, CO and will not offer relocation.