| | Disclosure: I work on Google Cloud (but disclaimer, I'm on vacation and so not much use to you!). We're having what appears to be a serious networking outage. It's disrupting everything, including unfortunately the tooling we usually use to communicate across the company about outages. There are backup plans, of course, but I wanted to at least come here to say: you're not crazy, nothing is lost (to those concerns downthread), but there is serious packet loss at the least. You'll have to wait for someone actually involved in the incident to say more. |
|
| | > including unfortunately the tooling we usually use to communicate across the company about outages. There's some irony in that. |
|
| | Edit: and I agree! I’m not in SRE so I don’t bother with all the backup modes (direct IRC channel, phone lines, “pagers” with backup numbers). I don’t think the networking SRE folks are as impacted in their direct communication, but they are (obviously) not able to get the word out as easily. Still, it seems reasonable to me to use tooling for most outages that relies on “the network is fine overall”, to optimize for the common case. Note: the status dashboard now correctly highlights (Edit: with a banner at the top) that multiple things are impacted because Networking. The Networking outage is the root cause. |
|
| | > the status dashboard now correctly highlights that multiple things are impacted because Networking. this column of green checkmarks begs to differ: https://i.imgur.com/2TPD9e9.png |
|
| | This is a person who's trying to help out while on vacation…can we try being more thankful, and not nitpick everything they say? |
|
| | The banner at the top. Sorry if that wasn’t clear. |
|
| | > nothing is lost. Except time |
|
| | and a nice Sunday afternoon |
|
| | And lots of sales in my case |
|
| | And the illusion of superiority over non cloud offerings. |
|
|
| | There are some who argue that the resiliency of cloud providers beats on prem or self hosted, and yet they’re down just as much or more (GCP, Azure, and AWS all the same). You want velocity for your dev team? You get that. You want better uptime? Your expectations are gonna have a bad time. |
|
| | Funny how as soon as I realized that Gmail and Google Sheets aren’t working properly I rushed to HN to figure out what’s going on. I love this community! |
|
| | GCP status page is worthless as it's always happy and green when production systems are down and then they might acknowledge something an hour later |
|
| | Just like AWS, then. "Some users are experiencing increased error rates" = "Everything has been down for hours" |
|
| | I think this might be a static page they are hosting on Akamai? |
|
|
| | One click on this link and it instantly starts streaming your webcam footage to everyone in the chat room. |
|
| | Use 'Turn off my video when joining meeting' in Zoom. |
|
| | This is a local zoom setting, you can change it. |
|
| | Why? Do you expect to be able to do something about it or did you just want somewhere classier than Twitter to complain? |
|
| | Google Cloud is the number 4 most monitored status page on StatusGator and Google Apps is number 12. In addition, at least 20 other services we monitor seemingly depend on Google Cloud because they all posted issues as soon as Google went down. It's always interesting to see these outages at large cloud providers spider out across the rest of the internet; a lot of the world depends on Google to stay up. |
|
| | I guess we know what Steam uses (the store at least). |
|
| | No issues for me. Maybe they have a failover mechanism? |
|
| | "a lot of the world depends on Google to stay up." Yup, I'm trying to check the Associated Press News right now and it's having trouble connecting to "storage.googleapis.com". |
|
| | …and only the paranoid survive? |
|
|
| | They both seem to say the same thing…. |
|
| | That feeling when you open https://console.cloud.google.com and see none of your Kubernetes clusters or CloudSQL databases, just a CTA to create your first. |
|
| | Mumbai region here, and GKE seems to be fine. |
|
| | Gosh, this was so scary… I thought someone had hacked in and deleted everything… I hope they come back. This is still pretty scary |
|
| | Same. I was thinking, oh, my db cluster must be having trouble recovering. Couldn't get any response through kubectl. Logged in to the cloud console and it looks all brand new, like I have no clusters set up at all. Of course, this is 2 weeks after switching everything over from AWS. |
|
| | Same, my manager called me and said "everything is down". So I wander over to my Firebase console, and there's no database loading. Thank god for Twitter, and people also saying that they have the same issue, or I would have for sure thought we'd been hacked. I hope this is a good wake-up call for everyone. I know that I'm going to think more about how we do backups and fail-safes |
|
| | And here I thought I was having a bad day with Google Play not loading |
|
| | my vm instances are all still there, can even log in via SSH in the compute engine tab. looks like they got a reboot 15 min ago. just restarted some processes but lost my progress on about 12hrs of computing time, i'm guessing it's going to be hard to get a refund.. |
|
| | With Google Cloud incidents, most of the time multiple regions fail at once, whereas with AWS generally only a single region fails. Of course there are exceptions, but Google Cloud does not make me feel safe as an outsider (and a user of multi-region AWS) |
|
| | And thus were ruined hundreds or thousands of pleasant Sunday afternoons. I don’t miss being on pager duty one bit. I see it looming in my headlights, sadly. |
|
| | Spare a thought for the pleasant Australian early Monday mornings too! Always a rude awakening… |
|
| | It's the Queen's birthday, a Monday off here in New Zealand… but not for everybody now. |
|
| | Multi-cloud for those times when you really need that level of availability and can afford it. |
|
| | And Gmail doesn't feel very well today either.
[21:55:19] POP< +OK send PASS
[21:55:19] POP> PASS ********
[21:55:21] POP< +OK Welcome.
[21:55:21] POP> STAT
[21:55:21] POP< -ERR [SYS/TEMP] Temporary system problem. Please try again later.
|
|
| | IMAP as well – for some considerable time now. |
|
|
| | Anyone using both AWS and GCP that can form an opinion on availability of both? As a GCP customer I am not very happy with theirs. |
|
| | GCP is incredibly bad at communicating when there are problems with their systems. Just terrible. It's only when our apps start to break that we notice something is down, then look at the green dashboard, which is even more infuriating. |
|
|
| | AWS is often the same way. No one seems to be good at communicating outage details. |
|
| | I really don’t get this. There’s a huge number of complaints about poor communication from companies like Google and AWS during every outage. Yet they remain seemingly indifferent to how much customer trust they are losing, and the competitive edge the first one to get this right could gain. I speak from experience: at Monzo we have had some pretty horrific and very public outages, yet because we communicate openly and proactively with our customers, people are very understanding and trust that we’re doing our best to fix things. The amount of love we’ve had from our customers during times when we’ve put them in a really bad spot – sometimes without access to their money – has been astonishing. |
|
| | I suspect there's a correlation between outages that are easy to detect and communicate and outages that automation can recover from so easily that you hardly notice. |
|
| | AWS has what feel like monthly AZ brownouts (typically degraded performance or other control plane issues) with the yearly-ish regional brown/blackout. GCP has quarterly-ish global blackouts, and generally on the data plane at that, which makes them significantly more severe. |
|
| | Are there any services that track uptime for various regions and zones from various providers? It's rare that everything goes down and thus the cloud providers pretend they have almost no downtime. |
|
| | Obviously we don't know what the extent of the issue is yet, but afaik there has never been an AWS incident that has affected multiple regions where an application had been designed to use them (like using region specific S3 endpoints). GCP and Azure have had issues in multiple regions that would have affected applications designed for multi-region. |
|
| | > like using region specific S3 endpoints. AWS had the S3 incident affecting all of us-east-1: “Other AWS services in the US-EAST-1 Region that rely on S3 for storage, including the S3 console, Amazon Elastic Compute Cloud (EC2) new instance launches, Amazon Elastic Block Store (EBS) volumes (when data was needed from a S3 snapshot), and AWS Lambda were also impacted while the S3 APIs were unavailable.” https://aws.amazon.com/message/41926/ |
|
| | There was a massive push after that to have everything regionalized. It's not 100% but it's super close at this point. |
|
| | That's one region, not the multiple regions that OP mentioned |
|
| | I find GCP quicker to post status updates about issues than AWS, but GCP also seems to run into more problems that span across multiple regions. I'm overall happy with it, but if I needed to run a service with a 99.95% uptime SLA or higher, I wouldn't rely solely on GCP. |
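|
| | For a sense of what a 99.95% target allows in practice, here is a rough back-of-the-envelope sketch. The function name and the 30-day period are assumptions for illustration; they are not taken from any provider's SLA text.

```python
def allowed_downtime_minutes(sla_percent: float, period_days: float = 30.0) -> float:
    """Minutes of downtime permitted in a period at a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    # Compare a few common availability targets over an assumed 30-day period.
    for target in (99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.1f} min per 30 days")
```

At 99.95% the budget works out to roughly 21.6 minutes per 30 days, so a single outage lasting a few hours uses up many months of it. |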
|
| | You know, this reminds me of the bad taste the Google sales team left when I asked for a refund of charges for resources I was unaware were still running after following a quickstart guide. AWS refunded me in the first reply on the same day! The GCP sales rep just copy-pasted a link to a self-support survey that essentially told me, after a series of yes-or-no questions, that they can't refund me. So why not just tell your customers like it is? Google Cloud is super strict when it comes to billing. I have called my bank to do a chargeback and put a hold on all future billing with GCP. I'm now back to AWS and still on the Free Tier. Apparently the $300 trial with Google Cloud did not include some critical products. The AWS Free Tier makes it super clear, and even so I sometimes leave something running and discover it in my invoice… I've yet to receive a reply from Google and it's been a week now. I do appreciate other products such as Firebase, but honestly for infrastructure and for future integration with enterprise customers I feel AWS is more appropriate and mature. |
|
|
| | Good that Google+ is up again |
|
| | I was playing around this afternoon with appengine, and thought I broke one of my projects when I started getting 502s back. There appear to be some irregularities on consumer services as well that are almost certainly related; youtube was behaving a bit oddly for me. The impact seems to be cascading down from just GCE to other services as well – that status page certainly does not reflect the reality of the situation. You can't even sign into GCP right now, and things that run on GCE, like appengine, seem impacted. |
|
| | Nest is down for me right now. It's amazing how far-reaching outages can be these days. |
|
|
| | Code reuse is a wonderful thing, until it's not. |
|
| | When talking about GCE being down please also mention what regions you are talking about |
|
| | In this case you're lucky if any are working correctly; the problem is global, with some exceptions. |
|
| | seems to be some comments here of some regions functioning ok, although perhaps it’s not 100% in all regions |
|
| | us-central1, us-west1, and us-west2 are what I’ve heard so far. East seems to be OK, and Europe too |
|
| | Yep, I can no longer see my Cloud SQL database – it's as if I've never created one at all. Really hoping this is just an issue displaying it and that Google hasn't punted my infrastructure and backups. |
|
| | Praying isn't working. Now, I'll try sobbing 🙁 |
|
| | Systematic problem solving. I like it |
|
|
| | Just 2 weeks after I migrated a DB cluster from Azure to Google Cloud thinking things would get better. |
|
| | FWIW they might still be. |
|
| | 0 issues at compute, reporting for europe-west3-b. |
|
| | Not sure if related, but I was going to a BBQ yesterday and myself and 3 other people got lost because Google Maps app glitched out, directing us to the wrong places. If you search twitter for #googlemaps tons of people have the same issue. Surprised no one has posted about it. |
|
| | So that's why YouTube was being weird. I thought it was an extension problem or something. |
|
| | GitHub contribution graphs are also gone |
|
| | Wondered why Snapchat was being weird today. Thought it was my pi-hole setup blocking something from working, but nope, it's Google! |
|
| | Could this be the result of another BGP hack? Cyberwarfare? I am just speculating here big time. |
|
| | So far the KO list: GCE, GKE, BQ, Pub/Sub, GAE. Regions: asia-south1, us-west1, us-central1, us-west2 |
|
| | I've noticed problems on GDrive (GSuite) and YouTube as well. Connected? |
|
| | Weird for Twitter to still be up and fully functioning. I thought they migrated everything to GCP this/last year? |
|
| | Not the main functionality of the service, just lots of data analysis tooling. nothing that end users would notice |
|
| | Interesting. Thought I had read some posts of them migrating their data, but you could definitely be right. |
|
| | Is Shopify on Google Cloud? I noticed they are having issues too |
|
|
| | Confirming issues on our end. I'm able to load up my console but when I go to Kubernetes Engine, I don't see my clusters. I'm monitoring closely on twitter |
|
|
| | > We will provide more information by Sunday, 2019-06-02 12:45 US/Pacific. I'm not seeing anything at 12:47. |
|
|
|
| | Cloud status dashboards seem to be hosted on the same cloud, which doesn't say much about redundancy. |
|
|
| | Everything looking normal on our GKE / CloudSQL stuff (eu-west1) |
|
| | gcloud tells me: WARNING: The following zones did not respond: us-west2, us-west2-a, southamerica-east1-c, us-west2-b, southamerica-east1, us-east4-b, us-east4, us-east4-a, northamerica-northeast1-c, northamerica-northeast1-b, us-west2-c, southamerica-east1-b, northamerica-northeast1, southamerica-east1-a, northamerica-northeast1-a, us-east4-c. List results may be incomplete. Luckily for us eu-west1 seems to be working normally. |
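|
| | That warning mixes region names and zone names in one list, which makes it hard to read at a glance. A minimal sketch of grouping it by region follows. It assumes the warning text has exactly the shape quoted above and that zones are named `<region>-<letter>`; the function name `unresponsive_by_region` is hypothetical and this is plain string parsing, not a gcloud API.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Warning text as quoted in the comment above (assumed to be the exact format).
WARNING = (
    "WARNING: The following zones did not respond: us-west2, us-west2-a, "
    "southamerica-east1-c, us-west2-b, southamerica-east1, us-east4-b, us-east4, "
    "us-east4-a, northamerica-northeast1-c, northamerica-northeast1-b, us-west2-c, "
    "southamerica-east1-b, northamerica-northeast1, southamerica-east1-a, "
    "northamerica-northeast1-a, us-east4-c. List results may be incomplete."
)


def unresponsive_by_region(warning: str) -> Dict[str, List[str]]:
    """Group the names listed in the warning into {region: [zones]}."""
    # Keep only the comma-separated names between the colon and the first period.
    _, _, rest = warning.partition("did not respond:")
    names, _, _ = rest.partition(".")
    grouped: Dict[str, List[str]] = defaultdict(list)
    for name in (n.strip() for n in names.split(",")):
        if not name:
            continue
        # A zone ends in "-<single letter>"; anything else is treated as a bare region.
        zone = re.fullmatch(r"(.+)-([a-z])", name)
        if zone:
            grouped[zone.group(1)].append(name)
        else:
            grouped[name]  # touch the key so the region appears even without zones
    return {region: sorted(zones) for region, zones in sorted(grouped.items())}


if __name__ == "__main__":
    for region, zones in unresponsive_by_region(WARNING).items():
        print(region, "->", ", ".join(zones) or "(region only)")
```

Grouped this way, the quoted warning covers four regions (us-west2, us-east4, southamerica-east1, northamerica-northeast1), each with zones a through c. |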
|
| | The status page took a while to show issues. My app was down, and Twitter knew google cloud was down before the official status page. |
|
| | Can't wait for the postmortem! |
|
| | My money is on config push. |
|
| | Took me a while to track latency issues to GCP. Wasn't expecting it. This also seems to affect some GAE instances and some of their products like google photos. At least according to my observations |
|
|
| | Ironically, I moved all of my objects off GCS today. |
|
| | u.s. west: all our cloud compute is inaccessible rn…. our API is down, can't ssh into the servers, and also can't see them on the dashboard. |
|
| | We are on region us-east1 and our systems are still up. Specifically, we are on us-east1-b. |
|
| | Yeah I was having trouble accessing my Gsuite apps, had a couple of 502s, which led me to check HN. While it doesn't give me 502 now, it's abnormally slow. |
|
| | Looks like only GCE is down according to the status page now. I'm able to access my console for instances and GKE clusters. |
|
|
| | I happened to be initializing a GKE pool upgrade just as this occurred. The upgrade is now stuck according to the console. The interesting thing is that a couple of minutes before everything went wrong, kubectl returned an "error: You must be logged in to the server (Unauthorized)" error |
|
| | My site runs on Google App Engine and it's down as well. |
|
| | GCP has been down since 11:50am and they acked it 35 mins later. They're great at leaving their customers in the dark. |
|
| | Not much different from AWS, from what I've heard. |
|
| | Yeah, Amazon is the master of having their status page read all Green while half of US-East is in the toilet |
|
| | Definitely the case. Neither is super great at this. One problem is that an issue that takes an individual client 100% down may affect only a vanishingly small fraction of the provider's overall service load. That mismatch between customer and provider experience is one of the ugly aspects of public cloud providers. |
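|
| | To put rough numbers on that mismatch, here is a small sketch. Both traffic figures are hypothetical, picked only for illustration, and `failure_share` is a made-up helper name.

```python
def failure_share(customer_failed_rps: float, provider_total_rps: float) -> float:
    """Fraction of the provider's total traffic represented by one customer's failures."""
    return customer_failed_rps / provider_total_rps


# Hypothetical numbers, chosen only for illustration.
customer_rps = 500.0            # all of this customer's requests are failing
provider_rps = 10_000_000.0     # assumed total traffic across the whole provider

customer_view = 1.0  # from the customer's side, the error rate is 100%
provider_view = failure_share(customer_rps, provider_rps)

if __name__ == "__main__":
    print(f"customer sees {customer_view:.0%} errors")
    print(f"provider sees {provider_view:.4%} errors")
```

With these assumed figures the customer is fully down while the provider-wide error rate moves by 0.005%, which is easy to lose in the noise of an aggregate dashboard. |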
|
| | That's why AWS is all about their Personal Health Dashboard (PHD). They can post specific issues for your account in there. Also, they get to keep the public page looking nice and green to show to executives of prospective customers. |
|
| | Also, it's one which gets hugely understated when people "move to the cloud", especially if your business provides B2B services. Stuff like this could make you lose your business, especially if some entity like Google doesn't communicate and, as a result, you do not have an answer for your own customers. Medium-sized private cloud providers are a lot better at this, considering the communication lines are a lot shorter. |
|
|
| | It took them 45 minutes to open a status page on this massive outage. I love GCP but that's not great. |
|
| | The GCE console is also affected; I couldn't send a support ticket, just getting errors. |
|
| | Couldn't load the support console to "me too" this one either! |
|
| | Let's see if perfect leetcode skills will save the day. /s |
|
|
| | Slow is an understatement… some pages on gitlab.com take minutes to load, and jobs take tens of minutes to start. EDIT: It's been like that since at least 12h ago though. Not sure if it's connected to Google Cloud? |
|
| | btw, Google Analytics realtime is down as well. |
|
| | I wasn't aware of the outage and had a small heart attack when I saw a huge drop in visitors. I think other metrics are also affected. |
|
| | Google Play is also experiencing massive issues. |
|
| | I can't see any GKE cluster in Brazil, or any VM. |
|
| | I'm seeing that with northamerica-northeast1. I can't access anything over the network in that region and most of the GKE clusters and VMs in that region aren't listed in the console |
|
|
| | So that's why I can't login to YouTube this morning… |
|
| | gmail also down/super slow atm for me (East Coast, USA) |
|
| | youtube streaming is also down |
|
| | Just had Roll20 in the USA blow out a game. I wonder if it is affected |
|
| | Looks to be working in the UK |
|