Cloudflare CEO Details What Led to the Massive global Outage | Find a Way

byMatthews Martins -November 19, 2025

101

Cloudflare CEO details what led to the massive global outage

The company had detailed what happened in a new blog post.

Matthews Martins

on November 19, 2025

Credit: Algi Febri Sugita / SOPA Images / LightRocket via Getty Images

A Cloudflare outage took out a large swathe of the internet on Tuesday, with users unable to access numerous sites and services such as X, ChatGPT, Spotify, YouTube, and Uber. The cybersecurity company has now published a blog post detailing exactly what happened.

Cloudflare co-founder and CEO Matthew Prince apologised in the post late Tuesday, stating that this outage was the worst the company has experienced since 2019.

"[I]n the last 6+ years we've not had another outage that has caused the majority of core traffic to stop flowing through our network," said Prince. "On behalf of the entire team at Cloudflare, I would like to apologize for the pain we caused the Internet today."

Recommended Deals For You

Apple AirPods Pro 3 Noise Cancelling Heart Rate Wireless Earbuds — $219.99 (List Price $249.00)

Apple iPad 11" 128GB Wi-Fi Retina Tablet (Blue, 2025 Release) — $279.00 (List Price $349.00)

Amazon Fire HD 10 32GB Tablet (2023 Release, Black) — $69.99 (List Price $139.99)

Sony WH-1000XM5 Wireless Noise Canceling Headphones — $248.00 (List Price $399.99)

Blink Outdoor 4 1080p Security Camera (5-Pack) — $159.99 (List Price $399.99)

Fire TV Stick 4K Streaming Device With Remote (2023 Model) — $24.99 (List Price $49.99)

Bose Quiet Comfort Ultra Wireless Noise Cancelling Headphones — $298.00 (List Price $429.00)

Shark AV2511AE AI Robot Vacuum With XL Self-Empty Base — $249.99 (List Price $599.00)

Apple Watch Series 11 (GPS, 42mm, S/M Black Sport Band) — $349.99 (List Price $399.00)

WD Elements 14TB Desktop External USB 3.0 Hard Drive — $169.99 (List Price $279.99)

Products available for purchase through affiliate links. If you buy something through links on our site, Mashable may earn an affiliate commission.

Prince explained that the Cloudflare outage had been caused by an issue with the system it uses to protect websites from DDoS attacks.

Cloudflare's outage, explained

Cloudflare's Bot Management system is a service which protects websites against malicious bot attacks. These include DDoS attacks that flood websites with excessive traffic, content scraping attacks which gather data from websites without authorisation, and autonomous credential stuffing attacks which try to gain access to websites by using leaked login details from other sites.

Mashable Light Speed

Want more out-of-this world tech, space and science stories?

By clicking Sign Me Up, you confirm you are 16+ and agree to our Terms of Use and Privacy Policy.

This Bot Management system includes an AI model which scores traffic requests. Whenever there's an attempt to access a website protected by Cloudflare's Bot Management, the AI generates a score to determine if it's likely to have been from a bot. In order to do so, the AI considers various features of the request, which are held in a "feature file."

The feature file is where the issue occurred. This file refreshes every five minutes to keep up to date with evolving bot behaviours, and is used across Cloudflare's entire cybersecurity network. However, the company implemented a change to the underlying query that generated the file, which caused it to duplicate information a large number of times. This made the feature file larger than typical, triggering an error in the Bot Management system.

As a result, attempting to access websites which use Cloudflare's Bot Management system resulted in an error code. Cloudflare states that its network began experiencing significant failures about 15 minutes after the feature file generation update was implemented.

Cloudflare initially suspected the outage was a malicious attack, particularly as its status page went down despite being independent from the company's infrastructure. However, Prince stated that this turned out to be a coincidence.

"The issue was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind," Prince stressed. "After we initially wrongly suspected the symptoms we were seeing were caused by a hyper-scale DDoS attack, we correctly identified the core issue and were able to stop the propagation of the larger-than-expected feature file and replace it with an earlier version of the file."

When previously reached by Mashable prior to the blog post, a Cloudflare spokesperson also emphasised that "there [was] no evidence that [the outage] was the result of an attack or caused by malicious activity."

Cloudflare's services were largely restored within three hours, and fully restored after approximately five hours. Prince stated that the company is already planning measures to prevent similar outages in the future, including stopping error reports from being able to overwhelm its systems.

Topics Apps & Software

Tags: Apps & Software Tech

101 Comments

Stay informed!

AnonymousNovember 21, 2025 at 7:47 PM
Nice and great
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:48 PM
Quite a crazy turn of events, however, glad to see things are back to normal, the very thing that was designed to protect ended up costing Cloudflare the damage
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:49 PM
So the file that managed threat traffic was larger than expected but it wasn't caused by any threats. 🤔
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:49 PM
Reading between the lines..cyber attack. To be admitted in 3 years time
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:49 PM
Toi aussi un tour
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:49 PM
World is moving
New Era
By wise
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:50 PM
I almost forgot Mashable existed. 😂. Mashable really fell that hard.
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:50 PM
At "This is what happened" article is missing what is actually happened.
Please fix.
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:50 PM
Someone pushed an update in prod.
Other explanations are excuses and futile...
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:50 PM
Good
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:51 PM
https://media1.tenor.co/m/ikEP2PJ787AAAAAC/trump-everythings-fine.gif?
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:51 PM
Paling rekomended selalu pengen balik 𝐃𝐄𝐋𝐈𝐂𝐔𝐀𝐍 lagi dan lagi 🤩
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:51 PM
Super duper platform
https://media1.tenor.co/m/giEWYOz5WgQAAAAC/best-sis-ever-sister-love.gif?
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:51 PM
AI Servers, it's happening .. 😈
Go check the comments : 👇
https://youtu.be/hoxBaTSPwLM?si=lYg4XCJffH-8iTpd
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:52 PM
Bullshit
This is due to bugs in Rust language
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:52 PM
It was a damn DDoS. Deny it all you want but your ass wad wide open. Secure better your servers you morons
ReplyDelete
Replies
AnonymousNovember 21, 2025 at 7:52 PM
Good Morning 😊
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:12 PM
Who, me?
Whups

-EFBIG
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:12 PM
Something missed in testing....
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:13 PM
I guess they test in production then..
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:16 PM
The picture in my mind …

a teenage intern with copies of SQL for Dummiesand The Complete Idiot's to SQL open at the GRANT pages. ;)
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:17 PM
Data model assumption gone bad

Sounds like the classic referential integrity mishap leading to more query results than expected
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:17 PM
The irony.

mfw DDoS prevention vendor DoS-es itself.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:17 PM
Who watches the global kill switches?

As a consummate pessimist, eventually one might be concerned that all these additional "global kill switches" which future "important" code updates might require, alone become the next source of outages if discovered they are be easy to trigger, just like an actual fire alarm?

Complacency breeds.... something?
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:22 PM
Some people are already excusing them based on their size/cost/uptime records, but none of those excuse a system that could allow this to happen, however much an employee screws up.

After all, next time it could be a hacker or malicious state actor. There's a reason to make redundancy inherent in any design .

Another example is that software for Windows a year ago that managed to nuke windows installs worldwide. You can bet they've tightened procedures, and are more careful with testing etc. but I bet, architecturally, their system hasn't changed.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:24 PM
Untested change

It sounds like a very simple untested change was made.

Which is incredible. No basic change control?
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:25 PM
stabilized in the failing state

If I was still managing IT systems, I would remember this phrase. Stable systems is the gold standard.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:27 PM
Was visiting about 10 different websites sites at the time this all started and saw varied CloudFlare errors indicating that my browser was ok, the CloudFlare Server had failed and the destination server was ok...

The message suggested that the problem was NOT with CloudFlare's systems but most probably at my end...

For the first ten minutes or so the https://www.cloudflarestatus.com/ site indicated no problems...

After https://www.cloudflarestatus.com/ indicated a problem I noted that the messages when accessing affected sites started saying that there was actually a problem with CloudFlare's systems...

Most hilariously was when I did a search for "is cloudflare down" and found that 19 out of 20 sites I visited failed due to the CloudFlare outage...
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:28 PM
I remember an SAP gig
I worked on that was led by a highly paranoid guy that worried about stuff like this. I was dealing with storage and wasn't involved on the deployment end, but I'd shoot the shit with the guys who were. They had a number of SAP application servers but when they made a change in production they applied it to ONLY the odd numbered ones first, then they'd later apply it to the even numbered ones.

The logic was that if they encountered some type of intermittent error related to the change they could track which application server the user's client was connected to and figure out if it was one of the ones with the change, and see if forcing their client to reconnect to an even numbered application server resolved their issues.

Because they'd massively overprovisioned the number of application servers to deal with huge end of quarter / end of year loads if they ran into something like this they could force everyone to an application server without the change and reverse it on the odd ones. If there were no issues they could push it to the even servers. Obviously they were in a change freeze during EOQ/EOY so they always had that slack capacity to work with, so the project leader used it in a smart way. I assume he'd learned the hard way at some point that no matter how much testing you do, sometimes when you push a change into production it doesn't work the way you expect.

They had similar strategies like this for the database too apparently, though I have no idea how that worked.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:29 PM
What is missing from the list:

- Remove SELECT * queries....
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:32 PM
Something goes wrong and the reaction is "We're under attack" and not "Oops, was that me?" from someone who just changed something.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:33 PM
Bot attack

So, the AI has control of the internet, and can take it away at will.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:34 PM
Centralization is the problem

What most people aren't getting is that the existence of systems like Cloudflare IS the problem.

It should NOT be possible for an error on one system to take down a huge chunk of the internet. But that's EXACTLY what happened. Cloudflare, and the entire concept of ANY single company that has that much power to cause disruption, are a menace to the internet.

Organizations need to seriously rethink their use of such systems at all.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:35 PM
Bad Rust code was the problem...

For those who sanctify Rust every day, a plain .unwrap() in production code :-O

https://blog.cloudflare.com/18-november-2025-outage/

Remember, Rust is neither the holy grail nor the silver bullet, it is just another tool that can be used by incompetent programmers. Having said that, stop this nowadays nonsense of trying to rewrite everything in Rust.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:37 PM
Respect

Lots of blame going around and legitimate frustration at the impact but, personally, I actually appreciate the statement that was put out. This was very detailed and clearly taking accountability, not just for the outage, but for the remediation. Shit happens and it's about how you deal with it and improve resilience. They've identified the root causes in impressive time and committed to actions to resolve those issues. In my book this is a perfect response.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:37 PM
Man, Little Bobby Tables all grown up and working for a big company like Cloudflare now
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:37 PM
Hardened Data

"Hardening ingestion of Cloudflare-generated configuration files in the same way we would for user-generated input"

Should have been done already. IMO, you should sanitise/check input data no matter what the source...maybe even more so for auto-generated stuff. Just because you got it from another computer in the same company doesn't make the data quality any better.

Auto-generated data is just data that has been written by a 'Human Nth removed' programmer. If they are bad or don't understand the task, then of course things like that will happen.

We hear of 'Move fast and break things'....looks like someone did!

Icon is for their change control and risk management.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:38 PM
Microsoft:

Guys we've took 10s of millions of users of far to many times this year, we must be the most unreliable out there!

Cloudflare:

Hold my beer!
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:38 PM
Login Hell

So I'm getting bad login, I decide to change my password, 2-factor code not coming through,

I JUST WANT TO ADJUST MY DAMN THERMOSTAT.

The joke about your toilet needing your email address is brilliant and not funny.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:40 PM
They never learn

It was in 1978 that I learned that you *always* validate your input data, particularly size. Almost 50 years later and the same mistakes still happen.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:40 PM
The elephant in the room

There's been a great deal of discussion here, a few jokes, and a fair amount of war stories. Strange, though, that everyone seems to ignore the bigger issue, which is that for a very large chunk of the Internet, Cloudflare has become a single point of failure.

And if that's not bad enough, said SPOF doesn't seem to have adequate change management procedures in place, to the point where they didn't even realize it was their own change that broke mission-critical services that large numbers of the world's businesses and private citizens rely on.

Ehm... That's bad, right?
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:40 PM
I’ve seen the future brother, it is murder

I find it darkly feckin’ hilarious that a globally distributed network designed to be fault tolerant has gotten to a situation 40yrs later where large amounts of it become unusable when just one of a half-dozen large companies has a bad day.

How long’s it been since the AWS us-east-1 debacle? Like a month or something?
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:40 PM
What we need is

What we need is an Internet designed to not rely on a single point of failure!

Oh, hang on…
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:41 PM
Eggs

Eggs in one basket again? Or should that be one bad apple ...
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:41 PM
Reminds when I keep confusing F2 and F4 when coding in BASIC, one key loads the other saves, guess who?
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:41 PM
So we're positive it wasn't DNS this time?

Because historians need to know.
ReplyDelete
Replies
AnonymousNovember 23, 2025 at 7:41 PM
To big to fail is not good

Having everyone depend on the same infrastructure providers is a problem. When they go bad they take down too much of the internet. Countries should pass laws that encourage lots of these providers instead of supporting the very big ones.
ReplyDelete
Replies

Add comment

Cloudflare CEO Details What Led to the Massive global Outage | Find a Way

Cloudflare CEO details what led to the massive global outage

Cloudflare's outage, explained

101 Comments

Contact Form