An IT outage is inevitable*, but what does your IT team do to get you back up and running again?
*not because you don’t know what you’re doing, but because IT systems, particularly the back end, are increasingly complex.
As an SAP Business One Cloud Hosting Partner, our job is to deploy, manage and monitor the software and storage required to run SAP Business One. This software comes from a variety of vendors, including SAP, Microsoft and SUSE.
Like all software, these applications and platforms need patching to address security issues and regular updates to add new features and functionality. Unfortunately, from time to time, these applications crash, particularly after a patch update. These application crashes cause outages, and the impact of an outage depends on the severity of the crash or fault.
From time to time, the hardware also crashes – have you ever gotten into your car, gone to start it and it wouldn’t start, and yet 5 minutes later it starts perfectly? Unfortunately, computer hardware is not immune to these challenges and we still encounter scenarios where all that can be done is to restart the offending equipment to fix the issue.
And of course, there is our friend the internet: no internet connection is immune to dropouts and outages. Our data centres have multiple connections to and from the internet, but unless you are lucky, chances are your business or home has only one internet connection. In more than 70% of all reported outages, the cause turns out to be an internet or local area network connectivity problem outside our data centre, which is beyond our ability to control (unless you are an SMB Solutions MSP Client, in which case we can manage that aspect for you as well).
What Do We Do To Try To Prevent This?
Prior to rolling out patches and upgrades, we research known issues that could affect the functionality and operation of the customer cloud workspace. However, this information is often difficult to find, as the software vendors don’t always share it in an easily accessible manner.
At SMB Solutions Cloud Services, we always allow a wait period of at least 1 week after Microsoft patches are released before we apply them. People might ask why we don’t wait longer. The challenge with waiting is that most patches include important security updates that must be installed as soon as possible to mitigate potential security risks and breaches.
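As a rough illustration of the wait period described above, the sketch below works out the earliest date a patch would be applied given its release date. The function names and the example date are hypothetical and chosen purely for illustration; this is not the tooling we actually use.

```python
from datetime import date, timedelta

# Assumption: the minimum wait is exactly 7 days ("at least a 1 week wait period").
MIN_WAIT = timedelta(weeks=1)

def earliest_apply_date(release_date: date) -> date:
    """Return the first date on which a patch may be applied."""
    return release_date + MIN_WAIT

def can_apply(release_date: date, today: date) -> bool:
    """Return True once the minimum wait period has passed."""
    return today >= earliest_apply_date(release_date)

# Hypothetical example: a patch released on 9 January 2024.
release = date(2024, 1, 9)
print(earliest_apply_date(release))            # 2024-01-16
print(can_apply(release, date(2024, 1, 12)))   # False
print(can_apply(release, date(2024, 1, 16)))   # True
```

The wait is a minimum rather than a fixed schedule: it leaves time for known issues to surface without delaying security fixes for long.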
So How Do We Deal With a System Outage?
Depending on the severity and the number of users impacted, we will take a number of different steps.
- The first step, of course, is diagnosis to determine the root cause of the issue. Cloud hosting is an intricate web of systems and software working together simultaneously, so finding the root cause can often take some time, as well as assistance from customers to understand exactly what is happening.
- The second step is rectification, where we determine what needs to be done, and in what sequence, to correct the problem.
- The third step is recovery, where we determine whether any loss of data has occurred. If it has, we revert to our comprehensive backup system to recover and restore any lost or missing data.
Fortunately, we have never had a scenario where user data has been lost due to a failure of the SMB Solutions Cloud infrastructure or platform.
What Challenges Do We Face During System Outages?
The biggest challenge we face in an outage is the time factor. Progressing through the steps of diagnosis, rectification and recovery can take a significant amount of time, particularly if resolving the issue requires engaging with the developers of the software, such as SAP, Microsoft and SUSE. Providing exceptional support and a swift resolution, particularly in the face of an outage, is immensely important to us.
But unfortunately, nothing in the IT world comes for free, and even open-source platforms such as Linux require us to pay for access to patches, support engineers and technicians to keep our systems running smoothly and effectively. Investing a significant amount in premium business support services with these software providers puts us in a prime position to receive assistance as quickly as possible, and therefore to get our customers back online sooner. It is important to note that on top of paying for these additional services and support, we have built and maintained good relationships with all of our vendors to ensure we have the best support team on our side if things aren’t performing as they should be.
Our ultimate goal is that in the best-case outage scenario we have everything operational again within 5 minutes, and in the worst-case scenario an outage never spans more than 4 hours. Of course, this is not always possible or predictable, but over the last 6 years the SMB Solutions team have only seen 3 scenarios where an outage has lasted 24 hours, primarily because we had to spend time liaising with external vendors to resolve the issues.
How Will I Know If There Is An Outage?
We have created a Server Status Page that shows the overall health and uptime of the SMBSCS servers, as well as any incidents that have previously occurred. If you select the Subscribe to Updates option, you will receive email notifications of any incidents, ongoing updates, scheduled maintenance and incident post-mortems as soon as they become available. Transparency is one of our core business values, and we want to ensure our customers and partners can see our uptime numbers, both the good and the not-so-good.
Large companies that spend millions on IT and have teams of hundreds of people still have outages, including Microsoft and the Commonwealth Bank. With the world running and relying on technology, we all do our best to avoid outages. However, over a 1-year period, some downtime is not unusual. For this reason, the SMBSCS SLA does not guarantee 100% uptime. As you will probably notice, no IT provider SLA will ever promise 100% uptime, because it is simply an unrealistic and unattainable goal.
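To put uptime percentages in perspective, here is a small sketch that converts an uptime figure into the downtime it allows over a year. The percentages used are generic examples only, not figures from the SMBSCS SLA, and the function name is made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years

def allowed_downtime_hours(uptime_percent: float) -> float:
    """Return the hours of downtime per year that an uptime percentage permits."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100

# Illustrative percentages only (not taken from any actual SLA).
for pct in (99.0, 99.9, 99.99, 100.0):
    hours = allowed_downtime_hours(pct)
    print(f"{pct}% uptime allows {hours:.2f} hours of downtime per year")
```

Even 99.9% uptime still leaves room for several hours of downtime a year, while 100% leaves room for none at all, which is why no provider can realistically promise it.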
Get in touch with the SMB Solutions team today if you want to learn more about our SAP Business One Cloud Hosting Services and Managed Services.