Director of Site Reliability Engineers/SRE (REMOTE)
Cyware is a venture-backed organization, headquartered in New York City. The firm was founded by innovative practitioners to solve the massive-scale cybersecurity challenges they saw daily while working for leading global banks and technology organizations.
Cyware is disrupting the cybersecurity operations market with innovation that gives the firm a claim to being the only company capable of delivering technology to build cyber fusion centers for customers in large enterprises and the mid-market.
Cyware is in hyper-growth mode. Your next opportunity starts here!
More on Cyware: (www.cyware.com)
Built on innovation designed by SecOps practitioners and cybersecurity leaders, Cyware offers multiple technologies within its next-generation platform, including advanced threat intelligence solutions (TIP) for large and small security teams, vendor-agnostic security automation (SOAR), and security case management. As a result, organizations are able to increase speed and accuracy while reducing costs and analyst burnout. Cyware's Virtual Cyber Fusion solutions make secure collaboration, information sharing, and enhanced threat visibility a reality for enterprises, sharing communities (ISAC/ISAO), MSSPs, and government agencies of all sizes and needs.
- You can lead on strategic and tactical initiatives
- You are hungry, inquisitive, proactive, energetic, and driven
- You have a growth mindset and are committed to delivering results
- You thrive in a fast-paced, collaborative environment
Why We Are Hiring:
The Director of Site Reliability Engineers (SREs) is responsible for managing the entire SRE function at Cyware. The SREs are responsible for keeping all user-facing services and other Cyware production systems running smoothly. The Director and the SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our production operating environments.
The SRE team as a whole specializes in cloud systems (application runtimes, operating systems, storage subsystems, networking, etc.) while implementing best practices for availability, reliability, and scalability, with an interest in eliminating toil through automation.
As the lead of the SRE function at Cyware, your mission is to enable the team to protect, provide for, and progress the software and systems of our Production environments with an ever-watchful eye on their availability, latency, performance, and capacity. Continuous improvement, automation, and observability will be the principles you employ to accomplish your mission.
Come join an exciting cybersecurity product startup that just closed its Series C funding round!
What You Will Do:
- Manage and mentor a team of site reliability engineers, fostering a collaborative and high-performing environment.
- Lead efforts to automate manual processes, deployments, and infrastructure provisioning, promoting efficiency and reducing human error.
- Set clear goals and performance expectations for the team members, providing regular feedback and conducting performance evaluations.
- Develop SRE team members into senior levels and leaders within the team.
- Lead the Production organization in identifying trends, drawing conclusions from problems we face, and establishing the actions needed to resolve these issues within collaborative forums.
- Support a team on an on-call rotation that responds to incidents impacting availability, and drive efforts to restore service within SLAs.
- Participate in daily technical war room calls and provide expertise to all teams.
- Provide governance over HA and DR capability management and reporting.
- Conduct neutral postmortems of issues and events to identify the root cause.
- Use your on-call shift schedule to prevent incidents from occurring. Be proactive in reviewing daily health checks, monitoring, reporting and taking timely actions.
- Lead data-driven Production operations by defining and measuring reliability targets (i.e., Error Budgets, SLIs, SLOs, and SLAs).
- Publish daily, weekly and monthly incident reports with corrective actions. Ensure there is a path of continuous improvement identified and reported on.
- Run our infrastructure with tools such as Ansible, Terraform, Jenkins CI/CD, Docker, Kubernetes, and Lambdas.
- Build monitoring capabilities that alert on symptoms and outages, ensuring Cyware’s proactive posture.
- Review infrastructure costs monthly and implement cost-saving measures.
- Document every action so your findings turn into repeatable actions and then into automation.
- Co-lead the Release Management function and establish strong operational readiness across teams.
- Improve operational processes (such as deployments and upgrades) to make them as seamless as possible.
- Lead the Maintenance Window and CCB activities.
- Design, build, and maintain the core infrastructure that enables Cyware’s scaling to support thousands of customer-hosted environments.
- Debug production issues across services and levels of the stack. Review logs, application and network firewalls, and security domains daily. Identify outliers and take steps to mitigate by following the incident management process.
- Plan the growth of Cyware infrastructure.
- Work across time zones to collaborate and develop solutions; this is a requirement of the role.
- Maintain the security, integrity, and stability of the Production environment by performing vulnerability management tasks, compliance monitoring, configuration drift management, and conducting quarterly DR exercises.
Who You Are
- US citizenship is a requirement of this position in accordance with 8 U.S.C. 1324b(a)(2)(C)
- Bachelor's degree or higher, in Computer Science, Engineering, IT or related discipline
- 7 to 10 years of total experience as an SRE
- 4 to 6 years of experience managing a team of SREs
- Experience in knowledge sharing and mentoring of team members
- Self-awareness, with skill in handling conflict within the team and in giving and receiving feedback
- Accountability: willing to proactively step in and do the right thing while providing candid and constructive feedback
- Cloud: AWS/Azure/GCP
- Linux: Solid understanding of Linux systems, sed/awk/grep/egrep, vi/Vim/Emacs, netstat, lsof, strace, ps/top/atop/dstat, GRUB boot config & system rescue, fstab/disk labels, ext3/ext4, iptables, sysstat (sar/vmstat/iostat, etc.), runlevels & startup scripts, sudo/chroot
- Scripting: Bash/Python
- Development Languages and Frameworks: Python/Django, Vue, React, Go
- Fundamentals: Basic DNS & Networking, TCP/UDP, IP Routing, HA & Load Balancing Concepts.
- Application Protocols: SMTP, HTTP, HTTPS, FTP, IMAP, POP.
- Good-to-have Applications: Database Systems Fundamentals (MySQL/Postgres), Redis, Nginx/Apache, supervisorctl
- Tools/Utilities: Nagios, Yum, RPM, Git, Grafana, Prometheus, New Relic, ELK, Docker, Jenkins
- Certifications: RHCSA/RHCE/AWS (SysOps).
We're a lean team, so your impact will be felt immediately. If this all sounds like a good fit for you, why not join us?
You’ll love working at Cyware because:
- We value balance. We are committed to providing an environment in which you can balance great work with a great life. You’ll have a competitive PTO structure and holidays covered.
- We’re not just employees. We’re people. We offer 401(k) match, insurance coverage (health, vision, and dental), and reimbursements for your home office.
- We’ll invest in your career. Our company’s growing quickly, and we’ll give you the opportunity to do the same. You’ll have access to a number of professional development opportunities so that you can keep up with the company’s evolving needs.
- We offer competitive compensation packages. We deeply value the talent our team brings to the table and believe that fair and equitable total compensation packages are part of our commitment to everyone who works here.
- And so much more…
Cyware is dedicated to building a diverse workplace that celebrates an inclusive culture and a sense of belonging. As an equal opportunity employer, we do not discriminate based on race, color, religion, sex (including pregnancy, gender identity, gender expression, and sexual orientation), national origin, age, veteran status, genetic information, or disability.
How to Apply
Apply right here. You've found the application!