How do you conduct an incident post-mortem?
Best practices for an incident postmortem
- Establish a blameless culture.
- Avoid pointing fingers and keep critiques constructive.
- Review every single postmortem, and ingrain this into your process.
What is a post-mortem process?
A post-mortem is a process that helps improve projects by identifying what did and didn’t work, and changing organizational processes to incorporate lessons learned. Post-mortem meetings typically take place at the end of a project.
What is a postmortem SRE?
Postmortems are an essential tool for SRE. The postmortem concept is well known in the technology industry [All12]. A postmortem is a written record of an incident, its impact, the actions taken to mitigate or resolve it, the root cause(s), and the follow-up actions to prevent the incident from recurring.
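The parts of that written record can be sketched as a small data structure. This is only an illustration: the class name `Postmortem`, its field names, and the `missing_sections` helper are hypothetical and do not come from any particular SRE tool or template.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Postmortem:
    """Written record of one incident (hypothetical field names)."""
    incident: str                    # what happened
    impact: str                      # who or what was affected
    mitigation_actions: List[str] = field(default_factory=list)
    root_causes: List[str] = field(default_factory=list)
    follow_up_actions: List[str] = field(default_factory=list)

    def missing_sections(self) -> List[str]:
        """Return the names of sections that are still empty."""
        return [name for name, value in vars(self).items() if not value]
```

A draft that only has the incident and impact filled in would report the three list sections as missing, which could serve as a simple completeness check before review.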
How do you write a post-mortem?
Write the Postmortem Report Using Meeting Notes
- Dates the project was live.
- Why the project was launched.
- What was launched (including screenshots and data on what was changed)
- The results of the project (including more metrics that were tracked)
- Feedback from all team members.
- Why you ended up with the results you did.
What are the four steps of the incident response process?
The NIST incident response lifecycle breaks incident response down into four main phases: Preparation; Detection and Analysis; Containment, Eradication, and Recovery; and Post-Incident Activity.
What is the value of doing a postmortem?
The postmortem process drives focus, instills a culture of learning, and identifies opportunities for improvement that otherwise would be lost. Without a postmortem you fail to recognize what you’re doing right, where you could improve, and most importantly, how to avoid making the same mistakes in the future.
What does a blameless postmortem not help with?
Blameless post-mortems don’t cover remediation of mistakes or errors in execution. Rather, they look backward to look forward and try to prevent issues from recurring. Remediation should be handled by other means (e.g., process improvements or personnel improvement plans).
Who writes a postmortem?
Our status updates are published by whoever is leading the incident response or happens to be on call. It’s usually either the ops or the support team. Once the issue is resolved, the same people will be expected to draft a postmortem on Jira for everyone to comment and discuss.
What is included in a post mortem report?
The pathologist who undertook the post mortem will write a post-mortem report. In most cases, a pathology report will start with general information, such as the person’s medical history and the circumstances of their death. This is usually followed by a description of the outside of the body and the internal organs.
What happens during a post mortem of an incident?
During a post-mortem, an incident response team determines what happened during an incident, identifies what was done right and what can be corrected, learns from its mistakes and proceeds accordingly.
What should be included in a post mortem meeting?
A post-mortem generally involves the following steps: Set up a meeting to discuss an incident. That way, an incident response team can determine exactly what happened during an incident and brainstorm solutions to prevent recurring problems. Encourage incident response team members to bring their incident notes to a post-mortem meeting, too.
When to publish a critical incident postmortem?
When a critical incident occurs, convene within 24-48 hours, and certainly do not delay more than a week. The responsibility to research, write, and publish a postmortem report lies with the project manager or the person most responsible for a particular outage or data loss.
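The timing guidance above can be expressed as a small check. This is a minimal sketch under stated assumptions: the function name `meeting_status` and its labels are hypothetical, and it assumes the clock starts at the time the incident occurred.

```python
from datetime import datetime, timedelta


def meeting_status(incident_time: datetime, meeting_time: datetime) -> str:
    """Classify a postmortem meeting against the 24-48 hour and one-week limits."""
    elapsed = meeting_time - incident_time
    if elapsed <= timedelta(hours=48):
        return "on time"   # within the recommended 24-48 hour window
    if elapsed <= timedelta(days=7):
        return "late"      # past 48 hours but still within a week
    return "overdue"       # more than a week after the incident
```

A team could run a check like this against its incident tracker to flag postmortems that have not yet been scheduled inside the recommended window.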
Why is it important to have a postmortem process?
An incident postmortem is a framework for learning from incidents and turning problems into progress. It also builds trust with customers, colleagues, and end users (basically the folks affected by the incident) and lets them know your team is working to minimize future incidents and impact.