Every now and then, something critical breaks. This is our process for dealing with big problems, such as a site being offline...
If one of the sites goes down completely, everyone at Semantic gets an alert on their phone. The alert will be picked up during UK waking hours.
Here's the process we follow for Semantic sites. For legacy sites, we would alert the SJ team through Slack/email and escalate as needed.
We receive the push notification from StatusCake (the tool we use to monitor uptime).
We try visiting the site to determine whether it was just a blip.
If we see a blank white screen, the cause is likely to be a site restart or a hosting issue. If the site comes back on its own, the restart was successful and no further action is needed. If it is not back within 10 minutes, we check the logs to confirm the cause and continue with the steps below.
Attempt to force a manual restart through the Azure admin area (this is often sufficient to bring it back).
By this point, we should have an idea of the cause of the error from the logs. We then continue with bug fixing in the CMS: correcting content, republishing, or tweaking settings as needed. For larger issues, we carry out more detailed code analysis and deploy hotfixes as soon as possible.
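As a rough illustration, the "blip or real outage" decision in the steps above can be sketched in Python. This is only a sketch under assumptions, not the tooling we actually run: the function names `site_is_up` and `triage`, the 30-second polling interval, and the result labels are all hypothetical. Only the 10-minute window comes from the process itself. It does not call StatusCake or Azure; it simply re-visits the site the way a person would.

```python
import http.client
import time
import urllib.request
from typing import Callable


def site_is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx or 3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, timeouts, refused connections and
        # HTTP error responses (urllib raises those as HTTPError).
        return False


def triage(
    check: Callable[[], bool],
    wait_seconds: float = 600,   # the 10-minute window from the process
    interval: float = 30,        # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the site is back or the waiting window has elapsed.

    Returns "recovered" if a check succeeds within the window,
    otherwise "needs_manual_restart" (check the logs, then try a restart).
    """
    deadline = clock() + wait_seconds
    while True:
        if check():
            return "recovered"
        if clock() >= deadline:
            return "needs_manual_restart"
        sleep(interval)
```

A call might look like `triage(lambda: site_is_up("https://example.com"))`, where the URL is a placeholder. The `sleep` and `clock` parameters exist only so the waiting logic can be exercised without a real 10-minute delay.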
Our procedures and safeguards on the new sites aim to mitigate threats and issues as much as possible.