Dear Salesforce, fix your status page

I will keep this short. Salesforce was not loading across APAC (at least) earlier today for around 30 minutes. The Salesforce Status page stayed all green throughout this time.

There was no update by Salesforce on Twitter either, leaving Salesforce users to ping each other or check with acquaintances to see if the service was working for them. This is not the first time this has happened either.

I love Salesforce as a product and make a living helping companies adopt this incredible tool. Salesforce Trust and Salesforce Status are amazing sites that offer so much transparency to customers and are an important part of the Salesforce ecosystem. However, issues like today's make people question whether these cornerstone elements are anything more than a gimmick.

So how can this be fixed? I have a suggestion. This is what the status page looks like now. You can make a small change and improve the quality of the Status page.

Add a ‘Report a problem’ button and let the person select the instance. When they do, a backend process can automatically try to access the instance from a non-Salesforce domain and check whether it is unreachable or degraded.

If these tests do not return good figures, at least mark it as a ‘Probable issue: Under investigation’. This would be a great start as opposed to depending on support tickets or a maintenance team to raise the issue manually.
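To make the idea concrete, here is a minimal Python sketch of what such a backend check might look like. It is only an illustration under my own assumptions: the names `probe` and `classify`, the thresholds, and the ‘Available’ and ‘Unknown’ labels are hypothetical and are not part of any Salesforce API. Only the ‘Probable issue: Under investigation’ label comes from the suggestion above.

```python
import time
import urllib.request
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProbeResult:
    ok: bool                     # True if the instance answered the request
    latency_s: Optional[float]   # seconds until the response, None on failure


def probe(url: str, timeout_s: float = 5.0) -> ProbeResult:
    """Make one HTTP GET against the instance URL and time it (hypothetical helper)."""
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as resp:
            ok = resp.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return ProbeResult(ok=False, latency_s=None)
    return ProbeResult(ok=ok, latency_s=time.monotonic() - start)


def classify(results: List[ProbeResult],
             max_failure_ratio: float = 0.5,   # assumed threshold
             slow_s: float = 3.0) -> str:      # assumed threshold
    """Turn a batch of probe results into a status label."""
    if not results:
        return "Unknown"
    failures = sum(1 for r in results if not r.ok)
    if failures / len(results) > max_failure_ratio:
        return "Probable issue: Under investigation"
    latencies = [r.latency_s for r in results if r.ok and r.latency_s is not None]
    if latencies and sum(latencies) / len(latencies) > slow_s:
        # Reachable but slow: treated as degradation, flagged with the same label.
        return "Probable issue: Under investigation"
    return "Available"
```

A real version would run `probe` from several locations outside Salesforce's own network and pass all the results to `classify`, so that one bad connection does not flip the status on its own.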

What do you think? How else can this be improved?

PS: Servicedown does this already

/u/antiproton presented an opposing point of view on Reddit. I don’t fully agree with him, but there is a lot of context in his comment that helps explain why things are the way they are.

They will NEVER do this. They wouldn’t even need to have users reporting a problem. Like any other cloud provider, they have monitoring tools in their infrastructure that could update the site automatically if an outage is detected.

Outages are not updated automatically on purpose.

For one, transient outages happen much more frequently than people realize. Most of them are not top severity, and last only a minute or two, but they happen. A status page history mottled with relatively frequent but incredibly brief outages would undermine confidence in the stability of the platform, which would be a disaster.

For another, messaging is very important. They will not acknowledge an outage unless the severity and impact is high. They do this to try and keep it under wraps. If it’s not widely known, if there aren’t screenshots of DownDetector all over Twitter, Salesforce is not going to call attention to it.

Finally, they will be very reluctant to change their status page if the problem is external to their infra (like the DNS issue earlier). Pinging the service and getting no response indicates the service is down, but it might just be the edge servers are down or DNS is failing.

Salesforce is not the only cloud provider that does this, mind you. They ALL do it, to a lesser or greater extent.

I’m not saying this is right, of course. It simply is. You can make as many blog posts as you like with “helpful suggestions” – but you’re just wasting your time.

Link to the original comment
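Two of the technical points in that comment, brief transient outages and failures outside the provider's own infrastructure, can be handled in an automated check rather than used as reasons to avoid one. The Python sketch below is my own illustration, not anything Salesforce is known to run: `sustained_outage` and `failure_layer` are hypothetical names, and the three-failure threshold is an assumption.

```python
import socket
from typing import Iterable


def sustained_outage(check_ok: Iterable[bool],
                     min_consecutive_failures: int = 3) -> bool:
    """Return True only if enough checks fail in a row.

    A one- or two-check blip resets as soon as a check succeeds,
    so brief transient outages never reach the status page.
    """
    streak = 0
    for ok in check_ok:
        streak = 0 if ok else streak + 1
        if streak >= min_consecutive_failures:
            return True
    return False


def failure_layer(hostname: str, port: int = 443, timeout_s: float = 5.0) -> str:
    """Roughly tell a DNS failure apart from a connection failure."""
    try:
        socket.getaddrinfo(hostname, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return "dns"        # the name did not resolve
    try:
        with socket.create_connection((hostname, port), timeout=timeout_s):
            return "reachable"
    except OSError:
        return "connect"    # the name resolved but the TCP connection failed
```

For example, `sustained_outage([True, False, False, True, False])` is `False` because no three checks fail in a row, while three straight failures return `True`. A `"dns"` result from `failure_layer` suggests the problem may sit outside the service itself, which is the distinction the comment draws.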

What do you think?