I was recently asked for my advice on how to solve recurring IT issues in an organization. Some organizations experience seemingly “random” outages on a regular basis, affecting network infrastructure, key platforms, applications, and so on. These outages tend to occur at the worst possible times (e.g., during busy season), and they not only hurt the business and its bottom line but also reflect badly on IT.
My answer was straightforward: you have to put systems in place to prevent issues from occurring in the first place. Just to be clear, you cannot prevent every outage. But you can plan ahead to reduce the likelihood of an outage and improve your ability to recover quickly when one does happen.
What are some of these systems?
- Implementing redundancy at critical points of failure
- Establishing monitoring for key hardware and software platforms
- Conducting regular patching and updates of hardware and software
- Performing periodic reviews of hardware and software platforms (including vendors and 3rd parties)
- Conducting a root cause analysis (RCA) following an incident and updating standard operating procedures
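
To make the monitoring item above a little more concrete, here is a minimal Python sketch of one way a basic health monitor could work. It is only an illustration under stated assumptions: the `Monitor` class, the service names, and the failure threshold are all hypothetical, and a real environment would more likely rely on an established monitoring product than on a hand-rolled script.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Monitor:
    """Tracks consecutive probe failures per service (hypothetical helper)."""
    # Service name -> probe that returns True when the service is healthy.
    checks: Dict[str, Callable[[], bool]]
    # Assumed policy: alert only after this many failed probes in a row.
    failure_threshold: int = 3
    _failures: Dict[str, int] = field(default_factory=dict)

    def run_once(self) -> List[str]:
        """Run every probe once and return the services that need an alert."""
        alerts = []
        for name, probe in self.checks.items():
            try:
                healthy = probe()
            except Exception:
                healthy = False  # a probe that raises counts as a failure
            # Reset the counter on success, otherwise count one more failure.
            self._failures[name] = 0 if healthy else self._failures.get(name, 0) + 1
            if self._failures[name] >= self.failure_threshold:
                alerts.append(name)
        return alerts


# Example with made-up services: one healthy, one permanently failing.
monitor = Monitor(
    checks={"intranet": lambda: True, "billing-db": lambda: False},
    failure_threshold=2,
)
print(monitor.run_once())  # first failure is below the threshold: []
print(monitor.run_once())  # second consecutive failure: ['billing-db']
```

Requiring several consecutive failures before alerting is a design choice intended to avoid paging someone over a single transient blip; the right threshold depends on how critical the service is.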
In short, having the right systems in place will save you stress, time, energy, and money.