On April 29, 2026, a Linux kernel local privilege escalation vulnerability was publicly disclosed under the name “Copy Fail” (CVE-2026-31431). As soon as it was disclosed, Cloudflare’s Security and Engineering teams began assessing the vulnerability. They reviewed the exploit technique, evaluated its impact on our infrastructure, and confirmed that our existing behavioral detections could spot the exploit within minutes.
There was no impact to the Cloudflare environment, no customer data was exposed, and all services continued to function without interruption. This post explains how our preparedness made that possible.
Our approach to Linux kernel releases
Cloudflare manages a massive global Linux server infrastructure spread across more than 330 cities. To handle updates efficiently at this scale, we use a customized Linux kernel build based on the community’s Long-Term Support (LTS) versions. We currently run more than one LTS series, including 6.12 and 6.18.
The community continually merges and publishes security and stability fixes, which automatically trigger a new internal kernel build every week. These builds undergo thorough testing in our staging data centers to verify reliability before being deployed globally. Once a release is greenlit, our Edge Reboot Release (ERR) pipeline handles the rollout, rebooting edge infrastructure systematically over a four-week cycle. Control plane infrastructure is kept on the latest kernel, with reboots planned to suit workload demands.
By the time a CVE is made public, the fix has usually been available in stable Linux LTS releases for several weeks. Our processes are designed so that, in the usual case, we have already applied those patches by then.
When “Copy Fail” was disclosed, most of our systems were already on LTS version 6.12, while others had started migrating to the newer 6.18 LTS release.
Understanding the Copy Fail vulnerability
Before diving into our response, it helps to first grasp the vulnerability. You can find a detailed write-up in the original Xint Code disclosure post.
AF_ALG and the kernel crypto API
The Linux kernel’s built-in crypto API backs in-kernel features like kTLS and IPsec. Userspace applications can reach it through the AF_ALG socket family, which lets unprivileged processes request encryption or decryption operations. The algif_aead module provides this interface for Authenticated Encryption with Associated Data (AEAD) ciphers.
An unprivileged program typically does the following:
Opens an AF_ALG socket and binds it to an AEAD template.
Sets an encryption key and accepts a request socket.
Sends input data through sendmsg() or splice().
Performs the operation by calling recvmsg().
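The four steps above can be sketched with Python's standard `socket` module, which exposes AF_ALG on Linux. This is a minimal illustration of ordinary, legitimate use, not code from the original write-up: the `gcm(aes)` template, the function name, and the 16-byte tag length are assumptions chosen for the example, and the sketch uses `sendmsg()` rather than `splice()`.

```python
import socket


def aead_encrypt_via_af_alg(key: bytes, iv: bytes, assoc: bytes,
                            plaintext: bytes, taglen: int = 16) -> bytes:
    """Encrypt with AES-GCM through the kernel crypto API (Linux only).

    Hypothetical helper for illustration; raises OSError if AF_ALG or the
    requested template is unavailable or blocked on this host.
    """
    # Step 1: open an AF_ALG socket and bind it to an AEAD template.
    with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as alg:
        alg.bind(("aead", "gcm(aes)"))

        # Step 2: set the key (and tag size), then accept a request socket.
        alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_KEY, key)
        alg.setsockopt(socket.SOL_ALG, socket.ALG_SET_AEAD_AUTHSIZE, None, taglen)
        op, _ = alg.accept()

        with op:
            # Step 3: send associated data followed by plaintext via sendmsg().
            op.sendmsg_afalg([assoc + plaintext], op=socket.ALG_OP_ENCRYPT,
                             iv=iv, assoclen=len(assoc))
            # Step 4: reading from the request socket performs the operation.
            # The result buffer is laid out as assoc-sized prefix, ciphertext, tag.
            return op.recv(len(assoc) + len(plaintext) + taglen)
```

Calling `aead_encrypt_via_af_alg(key, iv, b"hdr", b"hello world")` with a 16-byte key and a 12-byte IV returns a buffer of `len(assoc) + len(plaintext) + taglen` bytes on a host where the bind is permitted.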
The splice() system call is a key ingredient here, since it transfers data by sharing page cache references rather than copying data.
Memory mechanics: page cache and in-place crypto
The page cache is a shared system cache that stores file contents. Changing a page linked to a setuid binary effectively alters that program for everyone until the page is evicted from the cache.
The crypto API relies on scatterlists, which are structures that tie together different memory pages. In 2017, algif_aead was tuned for in-place operations by linking destination and source pages together. This approach lacked guardrails to stop algorithms from writing beyond the correct boundaries.
The vulnerability: out-of-bounds write
When the user calls recvmsg(), the authencesn wrapper in the kernel carries out a 4-byte write that extends beyond the designated output region:
scatterwalk_map_and_copy(tmp + 1, dst, assoclen + cryptlen, 4, 1);
By leveraging splice(), an attacker can connect a target file’s page cache pages to the scatterlist. The out-of-bounds write then corrupts the cached file, and the attacker controls which file gets modified, at what offset, and which 4 bytes are written.
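To make the boundary problem concrete, here is a toy model in Python. It is only a conceptual sketch: real scatterlists are kernel structures of page, offset, and length entries, whereas this model treats the chain as a list of fixed-size `bytearray` pages. The names `scatter_copy`, `out_page`, and `file_page` are hypothetical, and the sizes are picked so that the intended output region fills the first page exactly.

```python
PAGE_SIZE = 4096


def scatter_copy(pages, offset, data):
    """Write `data` at byte `offset` into a chain of pages.

    Like a scatterlist walk, the write simply continues into the next
    page in the chain when it crosses a page boundary.
    """
    for i, value in enumerate(data):
        page_index, page_offset = divmod(offset + i, PAGE_SIZE)
        pages[page_index][page_offset] = value


# Page 0 is the intended output buffer. Page 1 stands in for a page that
# was chained in behind it (in the real bug, a spliced page cache page).
out_page = bytearray(PAGE_SIZE)
file_page = bytearray(b"A" * PAGE_SIZE)
chain = [out_page, file_page]

# The designated output region is assoclen + cryptlen bytes long.
assoclen, cryptlen = 8, PAGE_SIZE - 8

# A 4-byte write starting at assoclen + cryptlen begins exactly where the
# designated region ends, so all 4 bytes land in the chained page.
scatter_copy(chain, assoclen + cryptlen, b"\xde\xad\xbe\xef")

modified = bytes(file_page[:4])
```

In this model the output page is untouched, while the first four bytes of the chained page are overwritten, which mirrors how a write past the output region can reach a page the caller should only have been able to read.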
The exploit, step by step
Copy Fail vulnerability was publicly disclosed.
This graph illustrates the progress of our mitigation program across our infrastructure.
Given the extended timeline required to deploy a patched Linux kernel, we also explored ways to mitigate this exploit without requiring a reboot.
The vulnerability was located in the algif_aead kernel module. The straightforward solution was to simply remove this module and prevent it from being reloaded.
This approach aligns exactly with the recommendation in the Copy Fail report from the security researchers who discovered the vulnerability.
echo "install algif_aead /bin/false" > /etc/modprobe.d/disable-algif.conf
rmmod algif_aead 2>/dev/null || true
However, removing the module would have disrupted software that depends on the kernel crypto API. This required us to develop a more targeted mitigation approach.
We had already created and deployed a tool specifically for this type of scenario: bpf-lsm. Rather than removing the module, this solution keeps it loaded for legitimate users while a BPF Linux Security Module program attached to the socket_bind LSM hook denies AF_ALG binds from any binary that is not on an allow-list.
For everyone else, this completely blocks the front door for any exploits.
A draft of the eBPF program was put together overnight. Team members picked it up the following morning, ran validations, and made it production-ready. The program is fairly straightforward. On every socket_bind call:
If the socket family is not AF_ALG, allow the call through unchanged.
If the family is AF_ALG, check the calling binary’s path against an allow-list of the binaries we know to be legitimate users.
If the binary is on the allow-list, allow the bind. Otherwise, deny it.
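The decision logic above can be modeled in a few lines of Python. This is a sketch of the logic only, not the eBPF program itself: the function name and the allow-list path are hypothetical, and the return convention (0 to allow, a negative errno to deny) follows the usual LSM hook convention, with EPERM matching the "Operation not permitted" error a blocked bind reports.

```python
import errno
import socket

# AF_ALG is 38 on Linux; fall back to that value where Python lacks the constant.
AF_ALG = getattr(socket, "AF_ALG", 38)

# Hypothetical allow-list entry, for illustration only.
ALLOW_LIST = frozenset({"/usr/local/bin/example-crypto-service"})


def socket_bind_decision(family: int, exe_path: str, allow_list=ALLOW_LIST) -> int:
    """Return 0 to allow the bind, or a negative errno to deny it."""
    # Any socket family other than AF_ALG passes through unchanged.
    if family != AF_ALG:
        return 0
    # AF_ALG binds are allowed only for binaries on the allow-list.
    if exe_path in allow_list:
        return 0
    return -errno.EPERM
```

For example, `socket_bind_decision(socket.AF_INET, "/usr/bin/curl")` returns 0, while an AF_ALG bind from a path outside the allow-list returns `-errno.EPERM`.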
To verify the mitigation on a given machine without exploiting it, the Copy Fail write-up gives a one-liner:
python3 -c 'import socket; s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0); s.bind(("aead","authencesn(hmac(sha256),cbc(aes))"));'
On a mitigated machine you get PermissionError: [Errno 1] Operation not permitted (or FileNotFoundError, depending on which mitigation is active) instead of a successful bind.
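For scripted checks, the same probe can be wrapped in a small function that maps each outcome to a label. This is a sketch built on the one-liner above. The function name and the status strings are assumptions, and a successful bind only shows that the bind path is reachable by the calling binary. It does not by itself show that the kernel is unpatched.

```python
import socket


def af_alg_bind_status(template: str = "authencesn(hmac(sha256),cbc(aes))") -> str:
    """Attempt the AF_ALG bind from the one-liner and classify the result."""
    if not hasattr(socket, "AF_ALG"):
        return "unsupported"  # this Python build or platform has no AF_ALG
    try:
        with socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0) as s:
            s.bind(("aead", template))
    except PermissionError:
        return "blocked"      # e.g. an LSM program denied the bind
    except FileNotFoundError:
        return "unavailable"  # e.g. the module cannot be loaded
    except OSError:
        return "error"        # any other failure, such as an unsupported family
    return "reachable"        # bind succeeded for this binary


if __name__ == "__main__":
    print(af_alg_bind_status())
```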
Before enabling enforcement, we verified that our known internal service was the sole legitimate AF_ALG user to avoid accidental outages. We used prometheus-ebpf-exporter to hook the socket() syscall and track AF_ALG usage per binary across the fleet. This required no kernel changes and provided aggregate data from hundreds of thousands of servers within hours. Results confirmed the identified service was indeed the only legitimate user.
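The fleet-wide check amounts to summing per-binary counters and looking for anything outside the expected set. The sketch below shows that aggregation in Python under stated assumptions: the `(host, binary, count)` sample shape, the function name, and the example paths are all hypothetical and do not reflect the actual exporter's metric format.

```python
from collections import Counter


def unexpected_af_alg_users(samples, expected):
    """Sum AF_ALG socket creations per binary across hosts.

    `samples` is an iterable of (host, binary_path, count) tuples.
    Returns a dict of binaries outside `expected` that created at least
    one AF_ALG socket, with their fleet-wide totals.
    """
    totals = Counter()
    for _host, binary, count in samples:
        totals[binary] += count
    return {b: n for b, n in totals.items() if b not in expected and n > 0}


# Hypothetical sample data for illustration.
samples = [
    ("host-a", "/usr/local/bin/example-crypto-service", 120),
    ("host-b", "/usr/local/bin/example-crypto-service", 95),
    ("host-b", "/usr/bin/some-other-tool", 0),
]
surprises = unexpected_af_alg_users(samples, {"/usr/local/bin/example-crypto-service"})
```

An empty result, as in this sample data, corresponds to the outcome described above: the known service is the only binary creating AF_ALG sockets.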
So the bpf-lsm rollout was deliberately staged in two steps:
Get visibility first. Push the ebpf-exporter config gated by salt. Confirm at the metric layer that the known service is effectively the only thing creating AF_ALG sockets.
Then enforce. Push the bpf-lsm program behind a separate enforcement gate.
In parallel, the upstream backport for our majority LTS line finally became available, and our internal automation built a patched kernel against it.
We began testing the patched kernel in our staging data centers as soon as it was built, then resumed the longer reboot process to fully patch our fleet.
While we were prepared for this scenario, at Cloudflare we’re always learning and improving. Key areas we identified for improvement:
Better visibility into kernel-API dependencies. We will review kernel-subsystem usage across production services, so we can continue to quickly mitigate exploits without service disruption.
Better runtime mitigation. bpf-lsm is a valuable tool for mitigations, but we want to make this tool even better. This will include looking into faster deployments, better playbooks, and better logging and visibility of the tool.
Reduce the attack surface of the Linux kernel. We will review and audit our kernel configuration and proactively identify unused modules or features so that we can remove them from our build entirely.
The “Copy Fail” vulnerability presented a unique challenge for us. Despite our practice of deploying Linux patch updates every two weeks, we remained vulnerable because a month-old mainline fix had yet to be backported to our primary kernel line. Even so, we were able to roll out patched kernels within hours of the backport’s release. In the interim, bpf-lsm provided a surgical, no-reboot mitigation that secured our fleet. Our initial attempt to disable the problematic module failed, but it failed safely in our internal staging environment rather than in production, which allowed us to identify this dependency.
By the end of the rollout, every machine in our fleet was protected by either a patched kernel or a bpf-lsm program denying the vulnerable code path to non-allow-listed binaries. There was no customer impact at any point during this incident, and we have committed to the follow-up work above to make our response faster and our visibility better the next time something like this lands. Responsible disclosure works, in-kernel visibility tooling pays off in moments exactly like this one, and bpf-lsm continues to be one of the most useful primitives we have for runtime kernel mitigation.
At Cloudflare, critical vulnerability response is a coordinated effort across Security, Engineering, Product, and many other teams. Special thanks to Ali Adnan, Ivan Babrou, Frederik Baetens, Curtis Bray, Piers Cornwell, Everton Didone Foscarini, Rob Dinh, Elle Dougherty, Kevin Flansburg, Matt Fleming, Kimberley Hall, Brandon Harris, Jerry Ho, Oxana Kharitonova, Marek Kroemeke, Fred Lawler, James Munson, Nafeez Nazer, Walead Parviz, Miguel Pato, Evan Pratten, Josh Seba, June Slater, Ryan Timken, Michael Wolf, Jianxin Zeng and everyone else who contributed to the investigation, mitigation, and remediation of Copy Fail. We’d also like to thank the Linux upstream maintainers and Copy Fail researchers whose work helped make a rapid response possible.



