Why you should not use Google Cloud. – Punch a Server – Medium

Authored by medium.com and submitted by speckz

Looks like this article hit the front page of HN. Please follow discussion here. https://news.ycombinator.com/item?id=17431609

Note: This post is not about the quality of Google Cloud products. They are excellent, on par with AWS. This is about the abrupt, no-warning way they pull the plug on your entire system if they (or the machines) believe something is wrong. This is the second time this has happened to us.

We have a project running in production on Google Cloud (GCP) that is used to monitor hundreds of wind turbines and scores of solar plants scattered across 8 countries. We have control centers with wall-to-wall screens showing dashboards full of metrics that are monitored 24/7. Asset Managers use this system to monitor the health of individual wind turbines and solar strings in real time and to carry out corrective maintenance immediately. Development and Forecasting teams use the system to run algorithms on data in BigQuery. All these actions translate directly to revenue. We deal in wind and solar energy, a perishable commodity. If we overproduce, we cannot store the surplus and sell it later. If we underproduce, there are penalties to be paid. For this reason assets need to be monitored 24/7 so that output can be ramped up or down to match the needs of the power grid and the power purchase agreements in place.

Early this morning (28 June 2018) I receive an alert from Uptime Robot telling me my entire site is down. I receive a barrage of emails from Google saying there is some ‘potential suspicious activity’ and all my systems have been turned off. EVERYTHING IS OFF. THE MACHINE HAS PULLED THE PLUG WITH NO WARNING. The site is down, App Engine and the databases are unreachable, and multiple Firebase projects say I’ve been downgraded and have therefore exceeded limits.

Customer service chat is off. There’s no phone number to call. I have an email asking me to fill in a form and upload a picture of the credit card and a government-issued photo ID of the card holder. Great, let’s wake up the CFO, who happens to be the card holder.

We will delete project within 3 business days.

“We will delete your project unless the billing owner corrects the violation by filling out the Account Verification Form within three business days. This form verifies your identity and ownership of the payment instrument. Failure to provide the requested documents may result in permanent account closure.”

What if the card holder is on leave and is unreachable for three days? We would have lost everything — years of work — millions of dollars in lost revenue.

I fill in the form with the details and, thankfully, within 20 minutes all the services start coming back to life. The first time this happened, we were down for a few hours. This time, in all, we lost everything for about an hour. An automated email arrives apologizing for the ‘inconvenience’ caused. Unfortunately, The Machine has no understanding of the ‘quantum of inconvenience’ caused.

I understand Google’s need to monitor and prevent suspicious activity. But how you handle things after some suspicious activity is detected matters a lot. You need a human element here — one that cannot be replaced by any amount of code/AI. You just can’t turn things off and then ask for an explanation. Do it the other way round.

This is the first project we built entirely on Google Cloud. All our previous projects were built on AWS. In our experience AWS handles billing issues in a much more humane way. They warn you about suspicious activity and give you time to explain and sort things out. They don’t kick you down the stairs.

I hope the GCP team is listening and changes things for the better. Until then, I’m not building another project on GCP.

Brownishrecluse12 on June 30th, 2018 at 21:59 UTC »

This is obviously minor in comparison, but something similar happened to our robotics team with Google in general. We changed our passwords, and Google found that suspicious so it locked us out of all of our accounts essentially, leaving us unable to access any of our work from the past 2 years. We were somehow able to open the drive on one computer in particular and salvage the data, but it was really unfortunate that due to us trying to be more secure, we almost lost 2 years of work and documentation of our robot...

billy_tables on June 30th, 2018 at 21:41 UTC »

Suspending you immediately is a pain in the ass. But deleting projects after 3 days really takes the piss.

There are so many graceful ways to handle this - keeping existing resources up but blocking creating new ones, requesting extra verification incrementally as your billing costs go up, taking into account your payment track record when evaluating risk - shutting everything down is a really unprofessional choice.
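The graduated options listed above could be sketched roughly as follows. This is only an illustration of the idea, not how Google or any other provider actually works: the `Account` fields, the `Action` names, and every threshold (12 months of history, $100 of spend, a 14-day grace period) are hypothetical values chosen for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    REQUEST_VERIFICATION = "request_verification"
    BLOCK_NEW_RESOURCES = "block_new_resources"
    SUSPEND = "suspend"


@dataclass
class Account:
    # All fields are hypothetical, invented for this sketch.
    months_paid_on_time: int   # payment track record
    monthly_spend_usd: float   # current billing level
    verified: bool             # billing owner has completed verification
    days_since_warning: int    # 0 if the first warning goes out today


def respond_to_flag(account: Account, grace_days: int = 14) -> Action:
    """Pick the mildest action that fits an account flagged as suspicious."""
    if account.verified:
        return Action.NONE
    # Suspend only after a warning has gone unanswered for the grace period.
    if account.days_since_warning >= grace_days:
        return Action.SUSPEND
    # A long clean payment history or a small bill means low risk:
    # ask for verification and leave everything running.
    low_risk = account.months_paid_on_time >= 12 or account.monthly_spend_usd < 100
    if low_risk:
        return Action.REQUEST_VERIFICATION
    # Otherwise keep existing resources up but stop new spend.
    return Action.BLOCK_NEW_RESOURCES
```

For example, a flagged account with two years of on-time payments would only get a verification request under this sketch, while a month-old account with a large bill would keep its running workloads but be unable to create new resources until it verifies.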

CJKay93 on June 30th, 2018 at 21:32 UTC »

Similar thing happened to me too, and they were completely unhelpful in recovering it. In the end, I had to just let the whole thing die and make do with the offline backups I had.