The situation is certainly a blow for Cloudflare. The content delivery network offers enhanced cloud security and performance, so such a flaw is problematic given the nature of the product, and the roughly 5 million websites the company supports were all at risk. A serious software bug caused private data such as cookies, authentication tokens, and passwords to leak, spilling into public view in plaintext. It is worth noting that this was not a security breach, but an error in Cloudflare’s own software. The result was that anyone who noticed the problem could collect customer information that is usually hidden.

One person who did notice was Google engineer Tavis Ormandy, who first spotted the bug and named it Cloudbleed. In a blog post he said he discovered the flaw after initially believing the problem was in his own code.

“We fetched a few live samples, and we observed encryption keys, cookies, passwords, chunks of POST data and even HTTPS requests for other major Cloudflare-hosted sites from other users,” Ormandy wrote. “This situation was unusual, [personally-identifiable information] was actively being downloaded by crawlers and users during normal usage, they just didn’t understand what they were seeing.”

Ormandy points out that all the private data samples he collected were properly destroyed. However, he posted some of the samples in redacted form to show the flaw was real.

“We keep finding more sensitive data that we need to cleanup. I didn’t realize how much of the internet was sitting behind a Cloudflare CDN until this incident,” Ormandy wrote. “I’m finding private messages from major dating sites, full messages from a well-known chat service, online password manager data, frames from adult video sites, hotel bookings. We’re talking full HTTPS requests, client IP addresses, full responses, cookies, passwords, keys, data, everything.”

Since the blog post, Ormandy and Cloudflare have disagreed over how the company handled the fix. The engineer believes the company should have acted more quickly. Cloudflare chief technology officer John Graham-Cumming says the company acted well within the standard 90-day disclosure window.

“This is subject to a 90 day disclosure. We were disclosing after six days,” Graham-Cumming told TechCrunch. “He’s saying he’s frustrated but I’m a little bemused at why he’s frustrated with six days rather than 90. We would have disclosed even earlier, but because some of this info had been cached, we thought we had a duty to clean that up before it became public. There was a danger that info would persist in search engines like Google.”
Cloudflare Remediation
Solving the software bug was difficult. For one thing, it is hard for the company to know exactly which data leaked and who saw it; Cloudflare says it has not contacted major customers like Uber individually for that reason. Containing the problem also meant approaching major web search providers such as Google, Bing, and Yahoo, because the leaked data had been automatically cached by their engines. Those providers and others were asked to work with the company to manually scrub the data.

While the leak may have been happening since Sept. 2016, Cloudflare says the most severe period of leakage ran from Feb. 13 to Feb. 18 this year. Still, the leak was more of a drip than a torrent: Cloudflare says that during this peak period, 1 in every 3,300,000 HTTP requests to customer websites would have resulted in a leak. The problem is that once data leaked, it was cached by search engines, meaning exposed information could be seen in real time or later via a browser search. Only about 0.00003 percent of requests (that 1 in 3,300,000) leaked data, emphasizing that this was indeed a small leak. However, Cloudflare has 5 million websites on its network, so even that small ratio meant lots of exposed data.

“At the peak, we were doing 120,000 leakages of a piece of information, for one request, per day,” Graham-Cumming told TechCrunch.

“It was a bug in the thing that understands HTML,” Graham-Cumming explained. “We understand the modifications to web pages on the fly and they pass through us. In order to do that, we have the web pages in memory on the computer. It was possible to keep going past the end of the web page into memory you shouldn’t be looking at.”
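What Graham-Cumming is describing is a classic out-of-bounds read: a parser walks a buffer with a pointer and, on certain inputs, steps past the end of that buffer without its end-of-input check firing. The C sketch below illustrates that class of bug under stated assumptions; the parser, its two-byte escape step, and the neighbouring “password” data are hypothetical stand-ins for demonstration, not Cloudflare’s actual code.

```c
#include <stdio.h>

/* Hypothetical sketch of the bug class Graham-Cumming describes, not
 * Cloudflare's actual parser: the code walks a buffer with pointer p,
 * bounded by pe, but tests for the end with `==` instead of `>=`.
 * A state that consumes two bytes at once (here, a backslash escape)
 * can jump from pe - 1 straight past pe, so the check never fires and
 * the loop reads memory beyond the web page it was given. */
static void parse(const char *p, const char *pe)
{
    while (1) {
        if (*p == '\\')
            p += 2;        /* two-byte step: can overshoot pe */
        else
            p += 1;

        if (p == pe)       /* BUG: `p >= pe` would catch the overshoot */
            return;
        if (*p == '\0')
            return;        /* demo-only backstop so the loop halts */

        putchar(*p);       /* emits whatever memory p now points at */
    }
}

int main(void)
{
    /* Two adjacent "requests" in one allocation stand in for
     * neighbouring heap memory. The parser is only given the first
     * 12 bytes, but the trailing backslash makes it overshoot and
     * print the second request's secret. */
    char memory[] = "<div class=\\" "password=hunter2";

    parse(memory, memory + 12);   /* intended input: <div class=\ */
    putchar('\n');
    return 0;
}
```

Compiled and run, the sketch prints the tail of the neighbouring string, which is the plaintext-leak behavior described above; testing `p >= pe` instead of `p == pe` stops the overrun at the buffer boundary.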