Recently, Boston’s Department of Innovation and Technology (DoIT) hired an incident manager as part of an effort to reevaluate our approach to incident management, research best practices and improve communication and coordination across teams. This past week, I sat down with Carissa Sacchetti, our new Incident Manager, to discuss her role and the department’s vision for incident management.
What do you and the leadership of the department see as the goal of an incident management program?
The goals of DoIT’s Incident Management program are to:
- Deliver high-quality service to both internal and external users by minimizing the downtime of applications
- Provide root cause analysis in the interest of preventing future outages
- Cultivate a culture of accountability
Yours is a new position. Why is this a departmental priority and how has leadership described your mandate as Incident Manager?
At a high level, a few recent headlines across the country point to the urgent need for all levels of government to focus on modernizing IT infrastructure and data security. Sophisticated incident management practices can mitigate the risk of anything from data breaches to server overloads. In that sense, hiring a dedicated incident manager is part of a multi-pronged effort to improve the security and robustness of the city’s IT infrastructure. On a more granular level, I think that departmental leadership has recognized that there is room for improvement in coordination across applications and functional areas of the organization. I am interested in improving the consistency and standardization of incident reporting across teams; gaining insight into integrations between applications, such as how an outage in one may affect another; and improving communication across teams in general.
Can you elaborate on what you mean by a culture of accountability? Who is accountable to whom in your vision of effective incident management?
Beyond compliance with security and privacy laws and standards, which is our obligation, my vision of accountability involves creating an environment in which relationships are built on trust and support in all directions along the chain of command. A functional, effective incident management program requires that front-line employees can report honestly in post-crisis debriefs. By working with departmental leadership, I hope to cultivate a proactive and collaborative approach to identifying the core issue(s) behind an outage in an effort to prevent future incidents. Incidents should be viewed as a learning opportunity, and root cause analysis should be employed to foster a positive culture around remediation. Ultimately, by holding ourselves accountable to each other internally, we strengthen our ability to go beyond bare-minimum compliance in service to our end users.
What does incident management look like in practice? Can you describe some of the strategies you have implemented or plan on implementing?
I’m still working on how to apply best practices in a way that will suit this department and context. Right now, I’m focused on gathering data on the nature, severity and frequency of incidents so that I can aggregate this information and use it to inform a strategic approach. To that end, I have created an incident reporting form that I hope will standardize incident reporting, thereby making reports more actionable. I’m also exploring communication channels to serve as an alternative to email in an emergency and I’ve developed a first draft of an incident classification system that I plan to refine with stakeholders in the coming weeks. Incident classification is inherently a reflection of the values and environmental context of an organization, so I’m interested to see how people react and I expect it to be a collaborative exercise. With a hierarchy in place, I will be able to develop response procedures, but I expect implementation to be iterative and require adjustments as we experience what does and doesn’t work.
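For readers who want something concrete, here is a minimal sketch of what a standardized incident report and a simple severity hierarchy could look like in code. It is not DoIT's actual form or classification system, which the interview does not describe: the field names, the four severity tiers and the `summarize` helper are all hypothetical, chosen only to illustrate how consistent reports make it possible to aggregate data on the nature, severity and frequency of incidents.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from datetime import datetime
from enum import IntEnum


class Severity(IntEnum):
    """Hypothetical severity tiers; a real hierarchy would be agreed with stakeholders."""
    LOW = 1
    MODERATE = 2
    HIGH = 3
    CRITICAL = 4


@dataclass(frozen=True)
class IncidentReport:
    """One standardized report. Every field name here is an illustrative assumption."""
    application: str
    reporting_team: str
    started_at: datetime
    resolved_at: datetime
    severity: Severity
    summary: str

    @property
    def downtime_minutes(self) -> float:
        # Downtime is measured from the start of the incident to its resolution.
        return (self.resolved_at - self.started_at).total_seconds() / 60


def summarize(reports):
    """Aggregate reports by severity (frequency) and by application (total downtime)."""
    count_by_severity = Counter(r.severity.name for r in reports)
    downtime_by_app = defaultdict(float)
    for r in reports:
        downtime_by_app[r.application] += r.downtime_minutes
    return {
        "count_by_severity": dict(count_by_severity),
        "downtime_minutes_by_app": dict(downtime_by_app),
    }
```

Because every report carries the same fields, a call such as `summarize(reports)` can show at a glance which applications accumulate the most downtime and how often each severity tier occurs. The tiers themselves are the part most likely to change as the classification is refined.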
What is one of the biggest challenges you anticipate in this role and with this program?
Besides building trust among employees in post-crisis debrief scenarios, gaining insight into the points of integration between our applications and services has proven to be a big challenge. It’s incredible how interwoven our tech stacks are. My background is in education and federal funding compliance. In that space, it can be challenging to make change because people are so passionate in their politics. I thought tech would be so much more straightforward: you diagnose a problem, you develop a solution and then you move on to the next incident. But that’s absolutely not the case. There are all these connected applications, and different teams and vendors to manage. We purchase a lot of different technology to serve a lot of different purposes. Part of my approach to incident management is to play a role in breaking down silos and forming an integrated, holistic approach to the services we provide.
Big thanks to Carissa for sharing her thoughts and experiences with us, and more importantly, for pushing Boston to do more to improve the quality, availability and security of our services at DoIT. What do you think of our approach? What are the challenges your organization faces when it comes to incident management and how do you handle them?
Susanna Ronalds-Hannon is part of the GovLoop Featured Contributor program, where we feature articles by government voices from all across the country (and world!).
Great interview, thank you so much for sharing! It's interesting how common it is for folks in tech, as in this example, to realize that the key to success is often the people and teams involved and their trust in the process.