18 October 2024 |

~ 12 minutes

by Davide Ornaghi

Integrating Nftables rules into Syzkaller

In recent years, more and more Linux-based operating systems have migrated their firewalls from the iptables codebase to nftables, a programmable and more performant alternative that, being an actual virtual machine, is also considerably more complex. Today most systems, Debian included, ship user-space tools that convert iptables rules into the corresponding nftables instructions and dynamically manage tables, chains and rules.

Because of the customizable and relatively new nature of the system, nftables is frequently targeted by attackers looking for new 0-days to gain root privileges on the machine (LPE). For a better understanding of nftables internals, I recommend reading the first part of my previous blog post where I go through how to talk to and exploit nftables.

One of the techniques employed by vulnerability researchers is coverage-driven fuzzing based on syscalls, via the open-source platform named syzkaller. In order to also cover networking subsystems that cannot be reached through classical syscalls, syzkaller was updated to inject frames through a virtual network device. However, looking at the syzkaller code coverage reports, it is apparent that some components of nftables are hardly ever reached and are therefore effectively excluded from the tests.

This research thus highlights the limitations of syzkaller in analyzing certain functions, and demonstrates a new technique that optimizes the search for vulnerabilities in nftables, with the goal of better managing the attack surface within the firewall. I will often reference this awesome article, Looking for Remote Code Execution bugs in the Linux kernel, to describe the interactions between syzkaller and the Linux network stack. Syzkaller employs grammars, written in its own description language (syzlang), to describe a subsystem and generate well-formed input messages as well as edge cases. Since syzkaller already ships a grammar for nftables, this research takes it as a starting point and works on extending it so that the full spectrum of firewall rules is included in the coverage.

Extending the nftables grammar

Configuring Syzkaller for development

While I won’t be going through the Syzlang language since it’s already been covered several times (e.g. Syscall description syntax), I will show how to take an already existing grammar and update it to support the most recent features, which should also help in understanding syzlang on a practical level.

We can go ahead and clone the syzkaller repo and follow the instructions in the docs to compile it. I suggest installing the latest Go version to prevent some annoying issues with environment variables later on. Remember to add the GOROOT, GOPATH and PATH variables to your favorite .rc file and source it for future use; in this case I chose the /usr/local/go installation path for simplicity.

echo 'export GOROOT=/usr/local/go
export GOPATH=$HOME/go
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH' >> ~/.bashrc

source ~/.bashrc

While we’re at it, we can install some development prerequisites by running the make install_prerequisites build command, which takes care of all the cross-platform building tools.

Once syzkaller is ready, we need a pair of mainline Linux kernel source trees. The first one, which isn’t meant to be compiled, is used to generate the *.const files containing the definitions of constant data. The second one provides the kernel image for our fuzzing instances and must be compiled with at least the following kernel configs:

CONFIG_KCOV=y
CONFIG_KCOV_INSTRUMENT_ALL=y
CONFIG_KCOV_ENABLE_COMPARISONS=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_INFO=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_USER_NS=y
CONFIG_CONFIGFS_FS=y
CONFIG_SECURITYFS=y
CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="net.ifnames=0"
# CONFIG_RANDOMIZE_BASE is not set
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y

While it may look trivial, we should also make sure to include the subsystems we are going to fuzz, in this case nftables:

CONFIG_NF_TABLES=y
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NF_TABLES_IPV4=y
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_TABLES_IPV6=y
CONFIG_NF_TABLES_BRIDGE=y
CONFIG_NFT_NUMGEN=y
CONFIG_NFT_CT=y
CONFIG_NFT_FLOW_OFFLOAD=y
CONFIG_NFT_CONNLIMIT=y
CONFIG_NFT_LOG=y
CONFIG_NFT_LIMIT=y

… and all the other expressions.
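Before building, it can help to double-check that a .config actually contains these options. The following is a minimal Python sketch under stated assumptions: the helper names and the REQUIRED subset are hypothetical, and the check only looks at the text of the file rather than at Kconfig dependencies.

```python
import re

# A subset of the options listed above; extend it with the expressions you enable.
REQUIRED = {
    "CONFIG_KCOV": "y",
    "CONFIG_KCOV_INSTRUMENT_ALL": "y",
    "CONFIG_KASAN": "y",
    "CONFIG_DEBUG_INFO": "y",
    "CONFIG_NF_TABLES": "y",
    "CONFIG_NF_TABLES_INET": "y",
}

_SET = re.compile(r"^(CONFIG_\w+)=(.*)$")


def parse_kconfig(text):
    """Return {option: value} for each 'CONFIG_X=value' line.

    Lines such as '# CONFIG_X is not set' are comments and are skipped.
    """
    options = {}
    for line in text.splitlines():
        m = _SET.match(line.strip())
        if m:
            options[m.group(1)] = m.group(2).strip('"')
    return options


def missing_options(text, required=REQUIRED):
    """List required options that are absent or set to a different value."""
    found = parse_kconfig(text)
    return sorted(k for k, v in required.items() if found.get(k) != v)
```

Calling missing_options(open(".config").read()) returns the options that still need attention. Note that an option built as a module (=m) is reported too, since the configs above are all built in.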

Now let’s make sure that the build process is working before making any changes. We will first generate the const files for each subsystem, then update the generated code, and finally build the syzkaller binaries like so:

make extract TARGETOS=linux SOURCEDIR=$HOME/linuxk/linux-6.9.7-src
make generate
make

Notice how we target the source-only path in the extraction step. Since we’re trying to extract constant data from all subsystems and archs, if the kernel source isn’t aligned with the description files (some consts have been removed), we might have to do some manual work and delete the offending consts from the descriptions.

Fixing the flatc error during make generate

I found myself dealing with this error on Ubuntu where the installed flatbuffers version seemed to be incompatible with syzkaller.

flatc -o pkg/flatrpc --warnings-as-errors --gen-object-api --filename-suffix "" --go --gen-onefile --go-namespace flatrpc pkg/flatrpc/flatrpc.fbs
flatc: error: unknown commandline argument: --warnings-as-errors

By trial and error, I determined that the most recent working version is v2.0.8. You can grab it from the official flatbuffers project.

sudo apt remove flatbuffers-compiler-dev flatbuffers-compiler libflatbuffers-dev  libflatbuffers1
wget https://github.com/google/flatbuffers/archive/refs/tags/v2.0.8.tar.gz
tar -xvzf v2.0.8.tar.gz 
cd flatbuffers-2.0.8/
cmake -G "Unix Makefiles"
make
sudo make install

If no further errors appear, we should be able to find our binaries in the syzkaller/bin/ and syzkaller/bin/linux_amd64/ directories.

Practically understanding syzlang

With a working development environment, I started looking at the current grammar for the nftables subsystem, which is located in /sys/linux/socket_netlink_netfilter_nftables.txt. Although it’s a txt file, it actually contains syzlang code, which can be pleasantly formatted thanks to the syz-lang extension for VSCode. After the includes, we can immediately spot two type definitions:

File: /sys/linux/socket_netlink_netfilter_nftables.txt

type msghdr_nf_tables[CMD, POLICY] msghdr_netlink[netlink_msg_netfilter_t[NFNL_SUBSYS_NFTABLES, CMD, POLICY]]
# TODO: we should obtain them from somewhere, probably from other netlink messages,
# but we can't extract output netlink attributes.
type nft_chain_id int32be

The first one is used in simple netlink messages and depends on two parameters: the message type and the corresponding object policy. msghdr_nf_tables is defined as a msghdr_netlink whose payload is a netlink_msg_netfilter_t. The latter is defined in sys/linux/socket_netlink_netfilter.txt and contains the actual netlink message structure, with message length, type, subsystem, flags and attributes:

File: sys/linux/socket_netlink_netfilter.txt

type netlink_msg_netfilter_t[SUBSYS, CMD, POLICY] {
    len len[parent, int32]
    type    const[CMD, int8]
    subsys  const[SUBSYS, int8]
    flags   flags[netlink_netfilter_msg_flags, int16]
    seq const[0, int32]
    pid const[0, int32]
    hdr nfgenmsg
    attrs   POLICY
} [align[4]]
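To make the layout concrete, here is a minimal Python sketch that packs the same fields by hand. It assumes a little-endian host such as linux/amd64, the function name build_nfnl_msg is hypothetical, and the numeric constants are copied from the Linux UAPI headers.

```python
import struct

# Values from the Linux UAPI headers (nfnetlink.h, nf_tables.h, netlink.h).
NFNL_SUBSYS_NFTABLES = 10
NFT_MSG_GETTABLE = 1
NFNETLINK_V0 = 0
NFPROTO_UNSPEC = 0
NLM_F_REQUEST = 0x1


def build_nfnl_msg(cmd, family, attrs=b"", flags=NLM_F_REQUEST, res_id=0):
    """Pack nlmsghdr + nfgenmsg + attrs, padded to a 4-byte boundary.

    Assumes a little-endian host such as linux/amd64.
    """
    # nfgenmsg: family (u8), version (u8), res_id (big-endian u16)
    nfgenmsg = struct.pack(">BBH", family, NFNETLINK_V0, res_id)
    payload = nfgenmsg + attrs
    # The nlmsghdr is 16 bytes; the len field does not count trailing padding.
    length = 16 + len(payload)
    # nlmsghdr: len (u32), type split into cmd (u8) and subsys (u8),
    # flags (u16), seq (u32), pid (u32)
    header = struct.pack("<IBBHII", length, cmd, NFNL_SUBSYS_NFTABLES, flags, 0, 0)
    msg = header + payload
    return msg + b"\x00" * (-len(msg) % 4)


msg = build_nfnl_msg(NFT_MSG_GETTABLE, NFPROTO_UNSPEC)
```

Read as a single 16-bit value, the type and subsys bytes form the usual (subsys << 8) | cmd netfilter message type, which is why the description splits them into two int8 constants.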

The nft_chain_id type represents chain ids, which can be provided to nftables instead of referring to chains by their usual names. From the chain policy in nf_tables_api.c, we can see its type, uint32_t, hence int32be in syzlang. As the comment points out, ids are chosen arbitrarily by the user, so there is no easy way to keep track of which ones have already been used in a given table. Unfortunately, syzlang currently doesn’t offer a way to store input data from previous programs, so we will most likely miss them.

Next, we can see some read-only nftables syscalls that retrieve information from the existing structures (tables, chains and rules):

sendmsg$NFT_MSG_GETTABLE(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETTABLE, nft_table_policy]], f flags[send_flags])
sendmsg$NFT_MSG_GETCHAIN(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETCHAIN, nft_chain_policy]], f flags[send_flags])
sendmsg$NFT_MSG_GETRULE(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETRULE, nft_rule_policy]], f flags[send_flags])
sendmsg$NFT_MSG_GETSET(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETSET, nft_set_policy]], f flags[send_flags])
sendmsg$NFT_MSG_GETSETELEM(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETSETELEM, nft_set_elem_list_policy]], f flags[send_flags])

These definitions will call sendmsg with parameters of specific types: a netfilter fd, a message body respecting the correct message type and nft policy, and finally a set of generic send flags. While these syscalls do work for simple messages, the following one was built to generate batch messages, which are made up of multiple submessages:

sendmsg$NFT_BATCH(fd sock_nl_netfilter, msg ptr[in, msghdr_netlink[nft_batch_msg]], f flags[send_flags])

The important part is that the second parameter msg is an input pointer to a netlink message of the nft_batch_msg type:

nft_batch_msg {
    begin   nft_nlmsghdr[NFNL_MSG_BATCH_BEGIN]
    msgs    nft_batch_message
    end nft_nlmsghdr[NFNL_MSG_BATCH_END]
} [packed]

nft_batch_message [
    NFT_MSG_NEWTABLE    netlink_msg_netfilter_t[NFNL_SUBSYS_NFTABLES, NFT_MSG_NEWTABLE, nft_table_policy]
    NFT_MSG_DELTABLE    netlink_msg_netfilter_t[NFNL_SUBSYS_NFTABLES, NFT_MSG_DELTABLE, nft_table_policy]
    NFT_MSG_NEWCHAIN    netlink_msg_netfilter_t[NFNL_SUBSYS_NFTABLES, NFT_MSG_NEWCHAIN, nft_chain_policy]
    NFT_MSG_DELCHAIN
...
] [varlen]

In short, we’re sending a batch of messages chosen at random among all message types (a variable-length array). In this case only create, delete and update operations are included.
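The batch framing can be sketched in Python as follows. This is an illustration under assumptions rather than syzkaller's own encoder: the helper names are hypothetical, the byte order assumes a little-endian host, and the table name "filter" is just an example value.

```python
import struct

# Values from the Linux UAPI headers.
NFNL_SUBSYS_NFTABLES = 10
NFNL_MSG_BATCH_BEGIN = 0x10
NFNL_MSG_BATCH_END = 0x11
NFT_MSG_NEWTABLE = 0
NFTA_TABLE_NAME = 1
NFPROTO_IPV4 = 2
NLM_F_REQUEST = 0x1
NLM_F_CREATE = 0x400


def pad4(data):
    """Pad to the 4-byte netlink alignment."""
    return data + b"\x00" * (-len(data) % 4)


def nlattr(attr_type, value):
    """One netlink attribute: len (u16), type (u16), value, padding."""
    return pad4(struct.pack("<HH", 4 + len(value), attr_type) + value)


def nlmsg(msg_type, flags, family, res_id, attrs=b""):
    """nlmsghdr (16 bytes) + nfgenmsg (4 bytes) + attributes."""
    body = struct.pack(">BBH", family, 0, res_id) + attrs
    return pad4(struct.pack("<IHHII", 16 + len(body), msg_type, flags, 0, 0) + body)


def build_batch(messages):
    """Wrap already-built nftables messages between BATCH_BEGIN and BATCH_END."""
    begin = nlmsg(NFNL_MSG_BATCH_BEGIN, NLM_F_REQUEST, 0, NFNL_SUBSYS_NFTABLES)
    end = nlmsg(NFNL_MSG_BATCH_END, NLM_F_REQUEST, 0, NFNL_SUBSYS_NFTABLES)
    return begin + b"".join(messages) + end


def walk(buf):
    """Yield the nlmsg_type of each message found in a buffer."""
    off = 0
    while off + 16 <= len(buf):
        length, msg_type = struct.unpack_from("<IH", buf, off)
        if length < 16:  # malformed header, stop instead of looping forever
            break
        yield msg_type
        off += (length + 3) & ~3


new_table = nlmsg((NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWTABLE,
                  NLM_F_REQUEST | NLM_F_CREATE, NFPROTO_IPV4, 0,
                  nlattr(NFTA_TABLE_NAME, b"filter\x00"))
batch = build_batch([new_table])
```

Walking the resulting buffer yields the begin marker, the NFT_MSG_NEWTABLE message and the end marker, in that order, which mirrors the begin / msgs / end fields of nft_batch_msg.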

Improving the current nftables descriptions

Now that we’ve got accustomed to the syntax, we can start looking for possible points of improvement. First of all, we see that the netlink message headers include the netfilter family to which the commands will be applied, such as IPv4, IPv6, ARP and so on. Since the existing description only ever sets it to 0, or NFPROTO_UNSPEC, we can extend it to include other families:

families = NFPROTO_IPV4, NFPROTO_IPV6, NFPROTO_ARP, NFPROTO_NETDEV, NFPROTO_BRIDGE, NFPROTO_UNSPEC, NFPROTO_INET

nfgenmsg_nft {
    nfgen_family    int8[families]
    version     const[NFNETLINK_V0, int8]
    res_id      const[NFNL_SUBSYS_NFTABLES, int16be]
} [align[4]]

Many small adjustments later, I started looking into the newer flags that may have been recently added. Since syzlang enums keep the same names as the ones in the Linux source, we can just search for them there and see if anything changed. I found that several enums were missing some flags, such as nft_chain_flags that lacked NFT_CHAIN_BINDING, used for binding jump rules to another chain.

While checking for expression flags, I noticed that the nft_inner and nft_last exprs were missing entirely from the nft_expr_policy array. In fact, nft_inner was added 2 years ago to support tunneling-encapsulated protocols like GRE and VXLAN. In practice, it is an expression capable of encapsulating other nft_payload and nft_meta expressions, meaning that it should perform the same sanity checks as the common expr registration routine found inside nf_tables_expr_parse. After defining their policies, I just added them to the list. Here is the inner policy for reference:

nft_inner_flags = NFT_INNER_HDRSIZE, NFT_INNER_LL, NFT_INNER_NH, NFT_INNER_TH

nft_inner_policy [
    NFTA_INNER_NUM      nlnetw[NFTA_INNER_NUM, int32be[0]]
    NFTA_INNER_FLAGS    nlnetw[NFTA_INNER_FLAGS, flags[nft_inner_flags, int32be]]
    NFTA_INNER_HDRSIZE  nlnetw[NFTA_INNER_HDRSIZE, int32be[0:64]]
    NFTA_INNER_TYPE     nlnetw[NFTA_INNER_TYPE, int32be[0:256]]
    NFTA_INNER_EXPR     nlnest[NFTA_INNER_EXPR, nft_expr_policy_inner]
] [varlen]

I also added x_tables support via the nft_match and nft_target policies, which required knowing all of the x_tables matches and targets. After countless copy & paste operations, I managed to group all their names and rev numbers, which ranged from 0 to 3. As for the names, they can be extracted from their xt_*.c files: uppercase names are targets, lowercase names are matches.
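The naming convention lends itself to a small helper. The sketch below is hypothetical: the function names are mine, and the names passed in are only examples, not the full list from the patch.

```python
from itertools import product

MAX_REVISION = 3  # revisions were observed to range from 0 to 3


def classify_xt(name):
    """Uppercase names are targets, lowercase names are matches."""
    return "target" if name.isupper() else "match"


def candidates(names):
    """Enumerate every (kind, name, revision) combination to describe."""
    return [(classify_xt(n), n, rev)
            for n, rev in product(names, range(MAX_REVISION + 1))]
```

For instance, candidates(["TCPMSS", "conntrack"]) produces eight entries: four revisions of a target and four revisions of a match.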

You can explore the complete patch in the commit that got merged into syzkaller.

Running syzkaller

Syzkaller is almost ready to start fuzzing our systems. It only requires a config file to keep track of our custom paths and preferences. Below are the most important bits of the config template:

workdir: syzkaller’s output folder,
kernel_obj: path to the kernel build directory,
image: Linux rootfs used by QEMU,
vm.kernel: path of the bzImage.

Additional options:

enable_syscalls: array of syscalls to fuzz (nftables requires the socket$nl_netfilter and sendmsg$NFT_BATCH syscalls),
cover_filter: only collect coverage from specific functions or files, expressed via regex,
suppressions: exclude certain uninteresting crash reports.

An example config file might look something like this:

{
        "target": "linux/amd64",
        "http": "127.0.0.1:56741",
        "workdir": "~/linuxk/syzkaller-data/workdir",
        "kernel_obj": "~/linuxk/linux-6.10.11",
        "image": "~/linuxk/syzkaller-data/bookworm.img",
        "sshkey": "~/linuxk/syzkaller-data/bookworm.id_rsa",
        "syzkaller": "~/linuxk/syzkaller",
        "procs": 4,
        "type": "qemu",
        "reproduce": false,
        "vm": {
                "count": 12,
                "kernel": "~/linuxk/linux-6.10.11/arch/x86/boot/bzImage",
                "cmdline": "net.ifnames=0 nokaslr kasan_multi_shot",
                "cpu": 2,
                "mem": 2048
        },
        "enable_syscalls": [
                "socket$nl_netfilter",
                "sendmsg$NFT_BATCH"
        ],
        "cover_filter": {
                "files": [
                        "^net/netfilter/nf_.*$",
                        "^net/netfilter/nft_.*$",
                        "^net/netfilter/core.c$",
                        "^net/ipv4/ip_.*$",
                        "^net/ipv4/tcp.*$",
                        "^net/ipv4/udp.c$",
                        "^net/core/dev.c$"
                ]
        },
        "suppressions": [
                "slab-use-after-free in __mutex_lock.cons",
                "no output from test machine",
                "executor NUM failed NUM times",
                "nft_payload_inner_init+0x214",
                "nft_inner_init+0x4bb",
                "null-ptr-deref in range"
        ]
}
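Since a missing comma is enough to make the manager reject the file, a quick sanity check before launching can save a round trip. The Python sketch below is only an approximation: check_config and REQUIRED_KEYS are hypothetical names, the file is treated as plain JSON, and the patterns are compiled with Python's regex engine, which is not the one syzkaller uses.

```python
import json
import re

# Keys picked from the template above; adjust to your own setup.
REQUIRED_KEYS = ("target", "workdir", "kernel_obj", "image", "syzkaller", "type", "vm")
NFT_SYSCALLS = {"socket$nl_netfilter", "sendmsg$NFT_BATCH"}


def check_config(text):
    """Return a list of problems; an empty list means none were found."""
    try:
        cfg = json.loads(text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON at line {err.lineno}: {err.msg}"]
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if k not in cfg]
    missing = NFT_SYSCALLS - set(cfg.get("enable_syscalls", []))
    problems += [f"syscall not enabled: {s}" for s in sorted(missing)]
    for pattern in cfg.get("cover_filter", {}).get("files", []):
        try:
            re.compile(pattern)
        except re.error as err:
            problems.append(f"bad cover_filter regex {pattern!r}: {err}")
    return problems
```

Running check_config(open("config.cfg").read()) prints nothing alarming when the list is empty; otherwise each entry points at something to fix before starting the manager.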

Finally, the fuzzing process can be initiated with the following command:

./bin/syz-manager -config config.cfg

Improving rule coverage

By looking at the upstream coverage reports from syzbot, I noticed that some rules were hardly being evaluated while fuzzing, so I decided to start looking for ways to improve their coverage.

I then brought my research to the MOCA24 conference. Unfortunately the talk is in Italian, so I’ve added a summary down below; the slides, however, are in English and should be quite verbose.

Slides download

Talk summary

Syzkaller is one of the best tools available for kernel fuzzing. It’s a coverage-guided fuzzer, meaning it optimizes code coverage by generating and testing a wide range of syscalls in an intelligent way. Unlike other tools like Trinity, which operate randomly, Syzkaller is highly scalable and focuses on maximizing the number of code paths it explores within the kernel.
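The difference between coverage-guided and purely random fuzzing can be illustrated with a toy loop. The Python sketch below is not how syzkaller is implemented: the target function, the byte-level mutator and all names are invented for the example, and real coverage comes from KCOV rather than from a returned set.

```python
import random


def target(data):
    """Toy program under test: returns the set of branch ids it executed."""
    hit = {0}
    if len(data) > 0 and data[0] == 0x4E:          # 'N'
        hit.add(1)
        if len(data) > 1 and data[1] == 0x46:      # 'F'
            hit.add(2)
            if len(data) > 2 and data[2] == 0x54:  # 'T'
                hit.add(3)
    return hit


def mutate(data, rng):
    """Overwrite one random byte, or append a new one."""
    buf = bytearray(data)
    if buf and rng.random() < 0.7:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    else:
        buf.append(rng.randrange(256))
    return bytes(buf)


def fuzz(iterations, seed=0):
    """Keep an input in the corpus only if it reaches new branches."""
    rng = random.Random(seed)
    corpus, coverage = [b""], set(target(b""))
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        new = target(candidate) - coverage
        if new:
            corpus.append(candidate)
            coverage |= new
    return corpus, coverage
```

Because inputs that unlock a new branch are kept and mutated further, nested conditions are reached step by step, whereas a random generator would have to guess all the bytes at once.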

Syzkaller Components

Syzkaller consists of several key components:

Manager: Oversees the other components, is written in Go, and interacts with VMs via RPC and SSH.
Fuzzer: Runs inside a virtual machine (VM) and generates programs based on system calls.
Executor: Executes the generated programs within the VMs.
Prog2c: Translates generated programs from the syzkaller program format into C code.
Syzcover: Extracts coverage data to analyze fuzzing results.

Networking stack fuzzing with syzkaller

Fuzzing the network stack with syzkaller involves generating packets and injecting them into the network stack of the operating system. Virtual devices such as TUN or TAP are used to simulate network traffic. One of the main challenges in analyzing nftables is that it primarily operates at Layer 3 (IP), whereas TAP devices operate at Layer 2 (Ethernet). This introduces additional complexity when configuring and fuzzing firewall rules.

The dataflow of networking fuzzing

During the process of fuzzing the network stack, syzkaller follows a precise dataflow to inject network packets into the kernel’s stack.

Packet generation: Syzkaller creates network packets (TCP, UDP, or other protocols) that operate at various layers of the stack, such as Layer 2 (Ethernet) or Layer 3 (IP).
Injection into the network stack: The generated packets are injected into the kernel’s networking stack through virtual devices like TUN or TAP. These devices simulate real network traffic, allowing syzkaller to send packets through the stack as if they were coming from an external device.
Processing and Routing: Once the packet enters the network stack, it traverses the various kernel layers and is either delivered locally or forwarded to another host, depending on the system’s routing configuration. If a packet is processed internally, it might be sent back to user space through the TUN interface.
Mutation and Coverage: Throughout this process, syzkaller collects coverage data from the kernel, analyzing which parts of the code were executed. The packets are then mutated to explore more parts of the kernel, maximizing the coverage.

This strategy simulates real-world scenarios where network packets are generated from external sources, testing the entire path a packet follows within the kernel.
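The layer 2 versus layer 3 distinction becomes clearer when the bytes are built by hand. The following Python sketch assembles an IPv4/UDP packet and wraps it in an Ethernet header, as a TAP device would expect. The addresses, ports, MAC values and function names are all hypothetical, and the UDP checksum is left at zero, which IPv4 permits.

```python
import struct


def checksum(data):
    """RFC 1071 Internet checksum over a byte string."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def ipv4_udp(src, dst, sport, dport, payload):
    """Layer 3 packet: IPv4 header (no options) + UDP header + payload."""
    udp = struct.pack("!HHHH", sport, dport, 8 + len(payload), 0) + payload
    # version/IHL, TOS, total length, id, flags/frag, TTL, proto=17 (UDP),
    # checksum placeholder, source address, destination address
    head = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + len(udp),
                       0, 0, 64, 17, 0, src, dst)
    head = head[:10] + struct.pack("!H", checksum(head)) + head[12:]
    return head + udp


def ethernet(dst_mac, src_mac, l3_packet):
    """Layer 2 frame: destination MAC, source MAC, EtherType 0x0800 (IPv4)."""
    return dst_mac + src_mac + struct.pack("!H", 0x0800) + l3_packet


# Hypothetical addresses and ports, chosen only for the example.
packet = ipv4_udp(bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]), 1234, 5678, b"nft")
frame = ethernet(b"\xaa" * 6, b"\xbb" * 6, packet)
```

A TUN device would take packet as is, while a TAP device needs frame, i.e. the same packet preceded by the 14-byte Ethernet header.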

Challenges in covering Nftables

Nftables allows the creation of rules to manage traffic going in and out of Linux devices. However, its programmability and configuration can be rather complex, especially when testing firewall rules and expressions. For instance, setting up tables, chains, and rules with the correct parameters requires a carefully configured system.

The proposed solution: two fuzzing strategies

To address these challenges, I proposed a solution that involves two main fuzzing strategies:

Option 1: Sending static UDP packets in the output chain

This first strategy is a more straightforward and relatively simple approach. Here’s how it works:

Generate static UDP packets: A static UDP packet is created, typically with a specific payload.
Send packets via local interface: The packet is sent through a local interface, for example to 127.0.0.1 or to the wildcard address 0.0.0.0.
Trigger Netfilter rules: Since the packet passes through the local firewall, it is processed by the rules defined in nftables, activating specific chains and rules for output (NF_INET_LOCAL_OUT). Syzkaller then collects coverage of the kernel portions handling the packet.

This method is limited because the packet’s content is static, reducing the randomness necessary to fully test more complex nft expressions. Each time the packet is sent, the content remains the same, meaning that more intricate rules, such as those parsing payloads or higher-level packet attributes, may not be thoroughly tested.

In summary, this option is useful for testing simple rules but does not fully exploit the fuzzing potential for more complex firewall rules.
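In user-space terms, Option 1 boils down to something like the Python sketch below. The payload, port and function name are hypothetical, and in the real setup the packet is sent by the fuzzer's generated program rather than by a script.

```python
import socket

STATIC_PAYLOAD = b"A" * 32      # hypothetical fixed payload
TARGET = ("127.0.0.1", 9999)    # hypothetical address and port


def send_static_udp(payload=STATIC_PAYLOAD, target=TARGET):
    """Send one UDP datagram and return the number of bytes handed to the kernel.

    Locally generated traffic traverses the output hook, so rules attached
    to an output chain get evaluated against this packet.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, target)
```

Every call sends exactly the same bytes, which is precisely the limitation discussed above: expressions that branch on packet contents always see the same input.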

Option 2: Using syz_emit_ethernet and the pseudo-syscall syz_batch_emit

The second option is a more advanced and flexible approach, using packet injection at the Ethernet layer and a custom pseudo-syscall to automatically set up nftables structures.

The pseudo-syscall syz_batch_emit

To further reduce setup complexity and improve fuzzing efficiency, I developed a custom pseudo-syscall called syz_batch_emit. This pseudo-syscall is designed to automate the creation of the necessary nftables structures (tables, chains, rules) that would otherwise need to be manually configured.

The injection process goes as follows:

Automatic creation of nftables structures: When syzkaller invokes syz_batch_emit, the pseudo-syscall automatically creates the required tables, chains, and rules for the firewall to work. This bypasses the need for manual setup, allowing the fuzzer to focus solely on the rule expressions.
Batch process: syz_batch_emit sends a batch of configuration commands to nftables, setting up a complete framework for fuzzing in a single operation, instead of requiring multiple individual commands.
Fuzzing expressions: Once the nftables framework is in place, syzkaller uses syz_emit_ethernet to inject mutated packets through these rules and test the complex expressions that nftables evaluates, such as parsing the payload and determining whether to accept, drop, or route the packet.

Although not recommended by the syzkaller maintainers, writing a new pseudo-syscall which depends on another pseudo-syscall significantly reduces the complexity of fuzzing more intricate nftables rules by automating the setup process and enabling syzkaller to generate highly variable and sophisticated packets.

Results and vulnerabilities found

Thanks to this approach, it was possible to improve the coverage of evaluation functions within nftables. This allowed discovering various security issues, including a null pointer dereference followed by a use-after-free like this one, which have already been patched in recent Linux versions. Here you can find the writeup.

The syzkaller grammar previously did not include the meta and inner expression policies.
By implementing them I quickly found several NPD bugs that result in UAF reads of size 4. Linux kernel devs later confirmed to me that this bug class has been extremely common in nftables over the years.
Only payload and meta exprs can be embedded inside inner exprs. When nft_expr_inner_parse initializes them, the expression type’s inner_ops are installed as the expression ops:

int nft_expr_inner_parse(const struct nft_ctx *ctx, const struct nlattr *nla,
                         struct nft_expr_info *info) {
    struct nlattr *tb[NFTA_EXPR_MAX + 1];
...
    const struct nft_expr_type *type;
...
    type = __nft_expr_type_get(ctx->family, tb[NFTA_EXPR_NAME]);
    info->ops = type->inner_ops;

The respective meta and payload inner init functions, nft_meta_inner_init and nft_payload_inner_init, do not perform sanity checks on netlink attributes, resulting in an NPD when fetching be32 values from them:

static int nft_meta_inner_init(const struct nft_ctx *ctx,
       const struct nft_expr *expr,
       const struct nlattr * const tb[])
{
    struct nft_meta *priv = nft_expr_priv(expr);
    unsigned int len;
// tb[NFTA_META_KEY] == NULL
    priv->key = ntohl(nla_get_be32(tb[NFTA_META_KEY]));

Not providing any of the following attributes leads to the crash:

meta: NFTA_META_KEY, NFTA_META_DREG
payload: NFTA_PAYLOAD_BASE, NFTA_PAYLOAD_OFFSET, NFTA_PAYLOAD_LEN, NFTA_PAYLOAD_DREG
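The bug class can be modeled in a few lines of Python. This is only an analogy under assumptions, not the kernel code or the actual patch: the attribute table tb is a dict, a missing attribute is None instead of a NULL pointer, the function names are hypothetical, and the attributes named above are grouped here by their NFTA_META_ / NFTA_PAYLOAD_ prefix.

```python
import struct

EINVAL = 22

# Attributes read by each inner init function, grouped by prefix.
MANDATORY = {
    "meta": ("NFTA_META_KEY", "NFTA_META_DREG"),
    "payload": ("NFTA_PAYLOAD_BASE", "NFTA_PAYLOAD_OFFSET",
                "NFTA_PAYLOAD_LEN", "NFTA_PAYLOAD_DREG"),
}


def get_be32(attr):
    """Stand-in for ntohl(nla_get_be32(attr)); fails when attr is missing."""
    return struct.unpack("!I", attr)[0]


def inner_init_unchecked(kind, tb):
    """Mirrors the buggy flow: reads attributes without checking presence."""
    for name in MANDATORY[kind]:
        get_be32(tb.get(name))  # raises TypeError on None, like the NULL deref
    return 0


def inner_init_checked(kind, tb):
    """Rejects the expression when a mandatory attribute is absent."""
    if any(tb.get(name) is None for name in MANDATORY[kind]):
        return -EINVAL
    return inner_init_unchecked(kind, tb)
```

With an empty attribute table, the unchecked variant blows up while the checked one returns -EINVAL, which is the kind of validation the regular expression init paths already perform.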

Furthermore, while the kernel jumps to make_task_dead after the protection fault, mutexes including nft_net->commit_mutex are held locked by the freed task via the previous nf_tables_valid_genid() run.

When the nftables subsystem is accessed later (e.g. by listing the ruleset with nft list ruleset), mutex_lock() calls mutex_optimistic_spin() to spin while waiting for the lock, and then mutex_can_spin_on_owner() -> owner_on_cpu() reads the owner->on_cpu field to ensure that the owner is actually running. This finally triggers a UAF, since the owner’s task struct has been freed.

File: /kernel/locking/mutex.c

static inline int mutex_can_spin_on_owner(struct mutex *lock)
{
    struct task_struct *owner;
    int retval = 1;

    // Access the owner task
    owner = __mutex_owner(lock);
    if (owner)
        // Dereference the on_cpu field
        retval = owner_on_cpu(owner);
    return retval;
}

The generated crash log clearly shows the root cause of the issue. Notice that the kernel hasn’t crashed; only the nft subsystem has stalled:

general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] PREEMPT SMP KASAN NOPTI
KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007]
CPU: 1 PID: 93980 Comm: syz-executor.3 Not tainted 6.8.9 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
RIP: 0010:nla_get_be32 include/net/netlink.h:1689 [inline]
RIP: 0010:nft_meta_inner_init+0x58/0x1a0 net/netfilter/nft_meta.c:842
Call Trace:
<TASK>
nft_inner_init+0x4b6/0x5f0 net/netfilter/nft_inner.c:343
nf_tables_newexpr net/netfilter/nf_tables_api.c:3265 [inline]
nf_tables_newrule+0xe2e/0x26b0 net/netfilter/nf_tables_api.c:4078
nfnetlink_rcv_batch+0x1385/0x1c30 net/netfilter/nfnetlink.c:519
nfnetlink_rcv_skb_batch net/netfilter/nfnetlink.c:639 [inline]
nfnetlink_rcv+0x3b7/0x420 net/netfilter/nfnetlink.c:657
netlink_unicast_kernel net/netlink/af_netlink.c:1341 [inline]
netlink_unicast+0x6e8/0x920 net/netlink/af_netlink.c:1367
netlink_sendmsg+0x887/0xd60 net/netlink/af_netlink.c:1908
sock_sendmsg_nosec net/socket.c:730 [inline]
__sock_sendmsg net/socket.c:745 [inline]
____sys_sendmsg+0xa0a/0xbd0 net/socket.c:2584
___sys_sendmsg+0x11d/0x1c0 net/socket.c:2638
__sys_sendmsg+0xfe/0x1d0 net/socket.c:2667
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xab/0x1b0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x79/0x81
RIP: 0033:0x7fccd20a41ed
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:nla_get_be32 include/net/netlink.h:1689 [inline]
RIP: 0010:nft_meta_inner_init+0x58/0x1a0 net/netfilter/nft_meta.c:842
==================================================================
BUG: KASAN: slab-use-after-free in owner_on_cpu include/linux/sched.h:2165 [inline]
BUG: KASAN: slab-use-after-free in mutex_can_spin_on_owner kernel/locking/mutex.c:409 [inline]
BUG: KASAN: slab-use-after-free in mutex_optimistic_spin kernel/locking/mutex.c:452 [inline]
BUG: KASAN: slab-use-after-free in __mutex_lock_common kernel/locking/mutex.c:612 [inline]
BUG: KASAN: slab-use-after-free in __mutex_lock.constprop.0+0x10ff/0x1130 kernel/locking/mutex.c:752
Read of size 4 at addr ffff88800d0c1774 by task syz-executor.3/93979

CPU: 1 PID: 93979 Comm: syz-executor.3 Tainted: G D 6.8.9 #3
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014
Call Trace:
<TASK>
owner_on_cpu include/linux/sched.h:2165 [inline]
mutex_can_spin_on_owner kernel/locking/mutex.c:409 [inline]
mutex_optimistic_spin kernel/locking/mutex.c:452 [inline]
__mutex_lock_common kernel/locking/mutex.c:612 [inline]
__mutex_lock.constprop.0+0x10ff/0x1130 kernel/locking/mutex.c:752
mutex_lock+0xd1/0xe0 kernel/locking/mutex.c:286
nft_rcv_nl_event+0x1ba/0x640 net/netfilter/nf_tables_api.c:11470
notifier_call_chain+0x101/0x2e0 kernel/notifier.c:93
blocking_notifier_call_chain kernel/notifier.c:388 [inline]
blocking_notifier_call_chain+0x65/0x90 kernel/notifier.c:376
netlink_release+0x130d/0x1610 net/netlink/af_netlink.c:795
__sock_release+0xb0/0x270 net/socket.c:659
sock_close+0x19/0x30 net/socket.c:1421
__fput+0x262/0xb70 fs/file_table.c:376
__fput_sync+0x45/0x50 fs/file_table.c:461
__do_sys_close fs/open.c:1554 [inline]
__se_sys_close fs/open.c:1539 [inline]
__x64_sys_close+0x8b/0x120 fs/open.c:1539
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xab/0x1b0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x79/0x81
RIP: 0033:0x7fccd20a0d0b
</TASK>

Allocated by task 93979 on cpu 1 at 439.759428s:
kernel_clone+0xe4/0x900 kernel/fork.c:2903
__do_sys_clone3+0x1d7/0x250 kernel/fork.c:3204
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0xab/0x1b0 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x79/0x81

Freed by task 88 on cpu 1 at 439.851215s:
slab_free_hook mm/slub.c:2121 [inline]
slab_free mm/slub.c:4299 [inline]
kmem_cache_free+0x97/0x220 mm/slub.c:4363
put_task_struct include/linux/sched/task.h:138 [inline]
delayed_put_task_struct+0x183/0x1d0 kernel/exit.c:229
rcu_do_batch kernel/rcu/tree.c:2190 [inline]
rcu_core+0x5ee/0x19d0 kernel/rcu/tree.c:2465
__do_softirq+0x187/0x575 kernel/softirq.c:553

The buggy address belongs to the object at ffff88800d0c1740
which belongs to the cache task_struct of size 5640
The buggy address is located 52 bytes inside of
freed 5640-byte region [ffff88800d0c1740, ffff88800d0c2d48)

page dumped because: kasan: bad access detected

Memory state around the buggy address:
ffff88800d0c1600: fb fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
ffff88800d0c1680: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
>ffff88800d0c1700: fc fc fc fc fc fc fc fc fa fb fb fb fb fb fb fb
^
ffff88800d0c1780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88800d0c1800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================

Here are some ideas to exploit similar bugs:

Race with exit() to double free the task (must bypass PF_EXITING checks in make_task_dead).
Find a way to hold the mutex and hopefully run into other more interesting UAFs.
Exploit incorrect reference counting caused by missing puts on, for instance, file descriptors, though it may take too long for the counters to wrap to 0. (fds themselves are actually not vulnerable: only atomic_t, int and similar counter types are unsafe.)

Conclusion

My proposed solution, with these two options, allows for a deeper exploration of nft fuzzing. The UDP output strategy can be useful as a first step for simple rule testing, while the combination of syz_emit_ethernet with syz_batch_emit allows for scaling and automating the fuzzing of complex rules, maximizing coverage and discovering deeper vulnerabilities in the Linux kernel.

These changes haven’t been merged yet as they imply drastically changing the network fuzzing interface, by picking the appropriate TAP device for networking and TUN for nftables.
