In this KB:
- Introduction to the Cloudways server monitoring alerts
- Details of server monitoring alert generation
- How you can suggest improvements to the server monitoring alert system
This article explains how server monitoring alerts work and when they are triggered. You do not need to do or configure anything for this to work: monitoring is enabled by default on all accounts and all servers.
The first thing to note is that the monitoring system monitors the server itself and the core components of our web stack (Nginx, Varnish, Apache, Memcached, and MySQL). It DOES NOT monitor individual sites on your server. This is important to understand: a website on your server can be down for a number of reasons (for example, a code exception or a wrong configuration), and the monitoring system will not detect it, because the underlying web stack and server are working fine. In short, our monitoring system cares about the health of the server and its web stack. For specific applications, you can use a service such as Pingdom (www.pingdom.com); we are also considering adding an add-on for this to our service.
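To illustrate what a site-level check looks at (as opposed to the server-level monitoring described above), here is a minimal sketch in Python. It is not part of the Cloudways platform and not how Pingdom works internally; the function name `site_is_up`, the timeout, and the example URL are assumptions chosen for illustration.

```python
import urllib.error
import urllib.request


def site_is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx/3xx status, else False.

    Hypothetical helper for illustration only. A site can fail this check
    (for example, because of a code exception returning HTTP 500) while
    the server and its web stack are perfectly healthy.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError subclass) for 4xx/5xx
        # responses, and URLError/OSError for connection failures or timeouts.
        return False


if __name__ == "__main__":
    # Replace with the address of your own application.
    print(site_is_up("https://www.example.com/"))
```

You could run a script like this on a schedule from a machine outside your server, so that the check also fails when the server itself is unreachable.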
How it Works
We have our own monitoring system that provides constant updates from our servers. Additionally, we have configured all core services of our stack (Nginx, Varnish, Apache, Memcached, and MySQL) to attempt to auto-heal if they run into an issue. As a result, most issues resolve on their own.
When a problem does occur, this is how our monitoring system reacts:
- If there is no contact with the server for 10 minutes, an alert email will be sent to the registered email address of the account the server is in.
- When contact with the server is resumed, another email will be sent to the registered email address notifying about the status change.
- If ANY of the core services of the stack is down for more than 10 minutes (meaning that auto-healing has failed for whatever reason), an alert email will be sent to the registered email address of the account the server is in.
- Similarly, when the service comes back live, another email will be sent to the registered email address notifying about the status change.
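The rules above can be sketched as a small state tracker. This is only an illustration of the described behavior, not the actual Cloudways implementation; the class name `AlertTracker`, the email subject strings, and the treatment of exactly 10 minutes as "alert now" are assumptions.

```python
from dataclasses import dataclass
from typing import Optional

ALERT_THRESHOLD_SECONDS = 10 * 60  # the 10-minute window described above


@dataclass
class AlertTracker:
    """Tracks one monitored target: the server or a single core service."""

    name: str
    down_since: Optional[float] = None  # time of the first failed check
    alerted: bool = False               # True once a "down" email was sent

    def observe(self, is_up: bool, now: float) -> Optional[str]:
        """Record one check result; return an email subject if one is due."""
        if is_up:
            was_alerted = self.alerted
            self.down_since = None
            self.alerted = False
            # A recovery email is only sent if an alert email went out before.
            return f"{self.name} is back up" if was_alerted else None
        if self.down_since is None:
            self.down_since = now
        # Assumption: reaching the threshold exactly already triggers the alert.
        if not self.alerted and now - self.down_since >= ALERT_THRESHOLD_SECONDS:
            self.alerted = True
            return f"{self.name} is down"
        return None
```

With this logic, an outage that auto-healing fixes within the 10-minute window produces no email at all, and a longer outage produces exactly one alert email followed by one recovery email.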
Our aim here, while keeping you updated about what’s going on with your servers, is to minimize the number of emails and false positives that you receive. Before alerting you, we make sure that we have tried to recover the server/services automatically and that we have enough evidence that things are not going to sort themselves out.
Further adjustments to the system may be necessary to achieve this and, as usual, we would greatly appreciate your feedback on our UserVoice page: http://feedback.cloudways.com/forums/203824-service-improvement