Integrating Nftables rules into Syzkaller

In recent years, more and more Linux-based operating systems have migrated their firewalls from the iptables codebase to nftables. It is a programmable and more performant alternative which, being an actual virtual machine, is also considerably more complex. Today most systems, Debian included, ship user-space tools that convert iptables rules into the corresponding nftables instructions and dynamically manage tables, chains and rules.

Because of its customizable and relatively new nature, nftables is frequently targeted by attackers looking for new 0-days to gain root privileges on the machine (local privilege escalation, LPE). For a better understanding of nftables internals, I recommend reading the first part of my previous blog post, where I explain how to talk to and exploit nftables.

One of the techniques employed by vulnerability researchers is syscall-based, coverage-driven fuzzing via the open-source platform syzkaller. To also cover networking subsystems that cannot be reached through classical syscalls, syzkaller was extended to inject frames through a virtual network device. However, the syzkaller code coverage reports show that some components of nftables are hardly ever exercised and are therefore effectively left out of the tests.

This research highlights the limitations of syzkaller in analyzing certain functions. It also demonstrates a new technique that optimizes the search for vulnerabilities in nftables, with the goal of better covering the firewall's attack surface. I will often reference the excellent article Looking for Remote Code Execution bugs in the Linux kernel to describe the interactions between syzkaller and the Linux network stack. Syzkaller uses grammars, written in its own description language (syzlang), to describe a subsystem and generate well-formed input messages as well as edge cases. Since nftables already has its own grammar, this research takes it as a starting point and extends it so that the full spectrum of firewall rules is included in the coverage.

Extending the nftables grammar

Configuring Syzkaller for development

I won’t go through the syzlang language itself, since it has already been covered several times (e.g. Syscall description syntax). Instead, I will show how to take an existing grammar and update it to support the most recent features, which should also help in understanding syzlang on a practical level.

We can go ahead and clone the syzkaller repo and follow the instructions in the docs to compile it. I suggest installing the latest Go version to prevent some annoying issues with environment variables later on. Remember to add the GOROOT, GOPATH and PATH variables to your favorite .rc file and source it. In this case, I chose the /usr/local/go installation path for simplicity:

echo 'export GOROOT=/usr/local/go
export GOPATH=$HOME/go
export PATH=$GOPATH/bin:$GOROOT/bin:$PATH' >> ~/.bashrc

source ~/.bashrc

While we’re at it, we can install some development prerequisites by running make install_prerequisites, which takes care of all the cross-platform build tools.

Once syzkaller is ready, we need two mainline Linux kernel source trees. The first one is only used to generate the *.const files containing the definitions of constant data, and isn’t meant to be compiled. The second one provides the base image for our fuzzing instances and must be compiled with, at least, the following kernel configs:

CONFIG_KCOV=y
CONFIG_KCOV_INSTRUMENT_ALL=y
CONFIG_KCOV_ENABLE_COMPARISONS=y
CONFIG_DEBUG_FS=y
CONFIG_DEBUG_INFO=y
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_USER_NS=y
CONFIG_CONFIGFS_FS=y
CONFIG_SECURITYFS=y
CONFIG_CMDLINE_BOOL=y
CONFIG_CMDLINE="net.ifnames=0"
# CONFIG_RANDOMIZE_BASE is not set
CONFIG_KASAN=y
CONFIG_KASAN_INLINE=y

While it may look trivial, we should also make sure to include the subsystems we are going to fuzz, in this case nftables:

CONFIG_NF_TABLES=y
CONFIG_NF_TABLES_INET=y
CONFIG_NF_TABLES_NETDEV=y
CONFIG_NF_TABLES_IPV4=y
CONFIG_NF_TABLES_ARP=y
CONFIG_NF_TABLES_IPV6=y
CONFIG_NF_TABLES_BRIDGE=y
CONFIG_NFT_NUMGEN=y
CONFIG_NFT_CT=y
CONFIG_NFT_FLOW_OFFLOAD=y
CONFIG_NFT_CONNLIMIT=y
CONFIG_NFT_LOG=y
CONFIG_NFT_LIMIT=y

… and all the other expressions.

Now let’s make sure that the build process works before making any changes. We will first generate the const files for each subsystem, then regenerate the code, and finally build the syzkaller binaries:

make extract TARGETOS=linux SOURCEDIR=$HOME/linuxk/linux-6.9.7-src
make generate
make

Notice how we target the source-only path in the extraction step. Since we’re trying to extract constant data from all subsystems and archs, if the kernel source isn’t aligned with the description files (some consts have been removed), we might have to do some manual work and delete the offending consts from the descriptions.

Fixing the flatc error during make generate

I found myself dealing with this error on Ubuntu where the installed flatbuffers version seemed to be incompatible with syzkaller.

flatc -o pkg/flatrpc --warnings-as-errors --gen-object-api --filename-suffix "" --go --gen-onefile --go-namespace flatrpc pkg/flatrpc/flatrpc.fbs
flatc: error: unknown commandline argument: --warnings-as-errors

By trial and error, I determined that the most recent working version is v2.0.8, which you can grab from the official flatbuffers project:

sudo apt remove flatbuffers-compiler-dev flatbuffers-compiler libflatbuffers-dev  libflatbuffers1
wget https://github.com/google/flatbuffers/archive/refs/tags/v2.0.8.tar.gz
tar -xvzf v2.0.8.tar.gz 
cd flatbuffers-2.0.8/
cmake -G "Unix Makefiles" .
make
sudo make install

If no further errors appear, we should be able to find our binaries in the syzkaller/bin/ and syzkaller/bin/linux_amd64/ directories.

Practically understanding syzlang

With a working development environment, I started looking at the current grammar for the nftables subsystem, located in sys/linux/socket_netlink_netfilter_nftables.txt. Although it’s a .txt file, it actually contains syzlang code, which can be nicely formatted with the syz-lang extension for VSCode. Right after the includes, we can spot two type definitions:

File: sys/linux/socket_netlink_netfilter_nftables.txt

type msghdr_nf_tables[CMD, POLICY] msghdr_netlink[netlink_msg_netfilter_t[NFNL_SUBSYS_NFTABLES, CMD, POLICY]]
# TODO: we should obtain them from somewhere, probably from other netlink messages,
# but we can't extract output netlink attributes.
type nft_chain_id int32be

The first one is used for simple netlink messages and takes two parameters: the message type and the corresponding object policy. msghdr_nf_tables is defined as a msghdr_netlink whose payload is a netlink_msg_netfilter_t. That type is defined in sys/linux/socket_netlink_netfilter.txt and describes the actual netlink message structure: message length, type, subsystem, flags and attributes:

File: sys/linux/socket_netlink_netfilter.txt

type netlink_msg_netfilter_t[SUBSYS, CMD, POLICY] {
    len len[parent, int32]
    type    const[CMD, int8]
    subsys  const[SUBSYS, int8]
    flags   flags[netlink_netfilter_msg_flags, int16]
    seq const[0, int32]
    pid const[0, int32]
    hdr nfgenmsg
    attrs   POLICY
} [align[4]]
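To see what netlink_msg_netfilter_t boils down to on the wire, here is a minimal C sketch that fills the same fields by hand. It assumes the Linux uapi headers (e.g. from linux-libc-dev). It is an illustration rather than syzkaller code, and nft_msg_hdr, nft_msg_type and nft_fill_hdr are hypothetical names. The detail worth noticing is how the 16-bit nlmsg_type is split: the subsystem goes in its high byte and the command in its low byte. On little-endian targets such as amd64, this is why the description lists the type byte before the subsys byte.

```c
#include <arpa/inet.h>                  /* htons */
#include <assert.h>                     /* static_assert */
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/netfilter.h>            /* NFPROTO_* */
#include <linux/netfilter/nfnetlink.h>  /* struct nfgenmsg, NFNL_SUBSYS_* */
#include <linux/netfilter/nf_tables.h>  /* NFT_MSG_* */

/* Same fields, in the same order, as netlink_msg_netfilter_t minus attrs. */
struct nft_msg_hdr {
    struct nlmsghdr nlh;   /* len, type, flags, seq, pid */
    struct nfgenmsg nfg;   /* family, version, res_id */
};
static_assert(sizeof(struct nft_msg_hdr) == NLMSG_HDRLEN + sizeof(struct nfgenmsg),
              "no padding between the two headers");

/* High byte: nfnetlink subsystem; low byte: subsystem-specific command. */
static uint16_t nft_msg_type(uint8_t subsys, uint8_t cmd)
{
    return (uint16_t)((subsys << 8) | cmd);
}

/* Fill the fixed part of an nftables message; payload_len bytes of
 * attributes are expected to follow it in the same buffer. */
static void nft_fill_hdr(struct nft_msg_hdr *h, uint8_t cmd, uint8_t family,
                         uint16_t flags, uint32_t payload_len)
{
    memset(h, 0, sizeof(*h));   /* seq and pid stay 0, as in the description */
    h->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct nfgenmsg) + payload_len);
    h->nlh.nlmsg_type = nft_msg_type(NFNL_SUBSYS_NFTABLES, cmd);
    h->nlh.nlmsg_flags = NLM_F_REQUEST | flags;
    h->nfg.nfgen_family = family;
    h->nfg.version = NFNETLINK_V0;
    h->nfg.res_id = htons(0);
}
```

For instance, nft_fill_hdr(&h, NFT_MSG_NEWTABLE, NFPROTO_INET, NLM_F_CREATE, 0) produces an nlmsg_type of 0x0a00, because NFNL_SUBSYS_NFTABLES is 10 and NFT_MSG_NEWTABLE is 0.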

The nft_chain_id type represents chain ids, which can be passed to nftables instead of referring to chains by their usual names. The chain policy in nf_tables_api.c shows that it is a 32-bit integer. Like most nftables attributes, it travels in network byte order, hence int32be in syzlang. The comment explains that these ids should ideally be obtained from other netlink messages, but syzkaller can’t extract output netlink attributes. Since ids are arbitrarily chosen by the user, there is no easy way to keep track of the ones already used in the same table. And because syzlang doesn’t offer a way to store input data from previous programs, we will most likely miss them.
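Whatever value the fuzzer picks, it has to reach the kernel as a big-endian 32-bit attribute. The following is a minimal C sketch of that encoding, assuming the standard struct nlattr layout from <linux/netlink.h>. CHAIN_ID_ATTR is a local stand-in for NFTA_CHAIN_ID (11 in recent uapi headers, to be double-checked against your tree), and put_attr_be32 is a hypothetical helper.

```c
#include <arpa/inet.h>     /* htonl */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Local stand-in for NFTA_CHAIN_ID from enum nft_chain_attributes
 * (assumed value; verify against the uapi nf_tables.h you use). */
#define CHAIN_ID_ATTR 11

/* Append one attribute carrying a 32-bit value in network byte order
 * (syzlang's int32be) at buf + off; returns the next 4-byte aligned offset. */
static size_t put_attr_be32(uint8_t *buf, size_t cap, size_t off,
                            uint16_t type, uint32_t val)
{
    struct nlattr nla = {
        .nla_len = NLA_HDRLEN + sizeof(uint32_t),
        .nla_type = type,
    };
    uint32_t be = htonl(val);   /* most significant byte first */

    assert(off + NLA_ALIGN(nla.nla_len) <= cap);
    memcpy(buf + off, &nla, sizeof(nla));
    memcpy(buf + off + NLA_HDRLEN, &be, sizeof(be));
    return off + NLA_ALIGN(nla.nla_len);
}
```

A program that wants to refer to a chain by id would have to emit the same value in both the NEWCHAIN and the later message. Nothing in the grammar ties the two together, which is exactly the limitation described above.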

Next, we can see some read-only nftables syscalls that retrieve information from the existing structures (tables, chains and rules):

sendmsg$NFT_MSG_GETTABLE(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETTABLE, nft_table_policy]], f flags[send_flags])
sendmsg$NFT_MSG_GETCHAIN(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETCHAIN, nft_chain_policy]], f flags[send_flags])
sendmsg$NFT_MSG_GETRULE(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETRULE, nft_rule_policy]], f flags[send_flags])
sendmsg$NFT_MSG_GETSET(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETSET, nft_set_policy]], f flags[send_flags])
sendmsg$NFT_MSG_GETSETELEM(fd sock_nl_netfilter, msg ptr[in, msghdr_nf_tables[NFT_MSG_GETSETELEM, nft_set_elem_list_policy]], f flags[send_flags])

These definitions call sendmsg with typed parameters: a netfilter netlink socket fd, a message body matching the given message type and nft policy, and a set of send flags. These syscalls work for simple messages. The following one, however, was built to generate batch messages, which are made up of multiple submessages:

sendmsg$NFT_BATCH(fd sock_nl_netfilter, msg ptr[in, msghdr_netlink[nft_batch_msg]], f flags[send_flags])

The important part is that the second parameter, msg, is an input pointer to a netlink message of the nft_batch_msg type:

nft_batch_msg {
    begin   nft_nlmsghdr[NFNL_MSG_BATCH_BEGIN]
    msgs    nft_batch_message
    end nft_nlmsghdr[NFNL_MSG_BATCH_END]
} [packed]

nft_batch_message [
    NFT_MSG_NEWTABLE    netlink_msg_netfilter_t[NFNL_SUBSYS_NFTABLES, NFT_MSG_NEWTABLE, nft_table_policy]
    NFT_MSG_DELTABLE    netlink_msg_netfilter_t[NFNL_SUBSYS_NFTABLES, NFT_MSG_DELTABLE, nft_table_policy]
    NFT_MSG_NEWCHAIN    netlink_msg_netfilter_t[NFNL_SUBSYS_NFTABLES, NFT_MSG_NEWCHAIN, nft_chain_policy]
    NFT_MSG_DELCHAIN
...
] [varlen]

In syzlang, square brackets define a union, and the [varlen] attribute makes its size match the selected option rather than the largest one. Each batch therefore wraps a message randomly chosen among the listed types, which only cover create, delete and update operations.
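At the byte level, nft_batch_msg is three netlink messages laid out back to back, each 4-byte aligned: a BATCH_BEGIN message, the actual nftables message and a BATCH_END message. The C sketch below builds such a batch holding one NFT_MSG_NEWTABLE. It assumes only the uapi netlink/nfnetlink/nf_tables headers, and all helper names are hypothetical. Note that the begin and end markers carry NFNL_SUBSYS_NFTABLES in res_id, which tells nfnetlink which subsystem the batch is meant for.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/netfilter.h>
#include <linux/netfilter/nfnetlink.h>
#include <linux/netfilter/nf_tables.h>

struct nlbuf {
    uint8_t data[4096];
    size_t len;              /* bytes of completed messages */
};

/* Open a message (nlmsghdr + nfgenmsg) at the end of the buffer. */
static struct nlmsghdr *msg_begin(struct nlbuf *b, uint16_t type, uint16_t flags,
                                  uint8_t family, uint16_t res_id)
{
    struct nlmsghdr *nlh;
    struct nfgenmsg *nfg;

    assert(b->len + NLMSG_SPACE(sizeof(struct nfgenmsg)) <= sizeof(b->data));
    nlh = (struct nlmsghdr *)(b->data + b->len);
    nfg = NLMSG_DATA(nlh);
    memset(nlh, 0, NLMSG_SPACE(sizeof(*nfg)));
    nlh->nlmsg_len = NLMSG_LENGTH(sizeof(*nfg));
    nlh->nlmsg_type = type;
    nlh->nlmsg_flags = NLM_F_REQUEST | flags;
    nfg->nfgen_family = family;
    nfg->version = NFNETLINK_V0;
    nfg->res_id = htons(res_id);
    return nlh;
}

/* Append a NUL-terminated string attribute to the open message. */
static void msg_put_str(struct nlbuf *b, struct nlmsghdr *nlh, uint16_t type, const char *s)
{
    size_t off = (size_t)((uint8_t *)nlh - b->data) + NLMSG_ALIGN(nlh->nlmsg_len);
    size_t plen = strlen(s) + 1;
    struct nlattr *nla;

    assert(off + NLA_ALIGN(NLA_HDRLEN + plen) <= sizeof(b->data));
    nla = (struct nlattr *)(b->data + off);
    memset(nla, 0, NLA_ALIGN(NLA_HDRLEN + plen));   /* zero the padding too */
    nla->nla_type = type;
    nla->nla_len = NLA_HDRLEN + plen;
    memcpy((uint8_t *)nla + NLA_HDRLEN, s, plen);
    nlh->nlmsg_len = NLMSG_ALIGN(nlh->nlmsg_len) + NLA_ALIGN(nla->nla_len);
}

/* Close the open message so the next one starts 4-byte aligned. */
static void msg_end(struct nlbuf *b, struct nlmsghdr *nlh)
{
    b->len += NLMSG_ALIGN(nlh->nlmsg_len);
}

/* BATCH_BEGIN, one NEWTABLE and BATCH_END: the shape of nft_batch_msg. */
static size_t build_newtable_batch(struct nlbuf *b, const char *table)
{
    struct nlmsghdr *nlh;

    b->len = 0;
    nlh = msg_begin(b, NFNL_MSG_BATCH_BEGIN, 0, NFPROTO_UNSPEC, NFNL_SUBSYS_NFTABLES);
    msg_end(b, nlh);

    nlh = msg_begin(b, (NFNL_SUBSYS_NFTABLES << 8) | NFT_MSG_NEWTABLE,
                    NLM_F_CREATE | NLM_F_ACK, NFPROTO_INET, 0);
    msg_put_str(b, nlh, NFTA_TABLE_NAME, table);
    msg_end(b, nlh);

    nlh = msg_begin(b, NFNL_MSG_BATCH_END, 0, NFPROTO_UNSPEC, NFNL_SUBSYS_NFTABLES);
    msg_end(b, nlh);
    return b->len;
}
```

The resulting buffer could then be passed to sendto() on a socket(AF_NETLINK, SOCK_RAW, NETLINK_NETFILTER). In practice this requires CAP_NET_ADMIN, for example inside a fresh user and network namespace.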

Improving the current nftables descriptions

Now that we’re familiar with the syntax, we can start looking for possible improvements. First of all, the netlink message headers include the netfilter family to which the commands apply, such as IPv4, IPv6, ARP and so on. Since it’s always set to 0 (NFPROTO_UNSPEC), we can extend it to cover the other families:

families = NFPROTO_IPV4, NFPROTO_IPV6, NFPROTO_ARP, NFPROTO_NETDEV, NFPROTO_BRIDGE, NFPROTO_UNSPEC, NFPROTO_INET
nfgenmsg_nft {
    nfgen_family    flags[families, int8]
    version     const[NFNETLINK_V0, int8]
    res_id      const[NFNL_SUBSYS_NFTABLES, int16be]
} [align[4]]

Many small adjustments later, I started looking for recently added flags. Since syzlang enums keep the same names as the ones in the Linux source, we can just search for them there and see if anything changed. I found that several enums were missing some flags. For example, nft_chain_flags lacked NFT_CHAIN_BINDING, which is used for binding jump rules to another chain.

While checking for expression flags, I noticed that the nft_inner and nft_last expressions were entirely missing from the nft_expr_policy array. In fact, nft_inner was added 2 years ago to support tunneling-encapsulated protocols like GRE and VxLAN. In practice, it is an expression capable of encapsulating other nft_payload and nft_meta expressions. This means it should perform the same sanity checks as the common expression registration routine found in nf_tables_expr_parse. After defining their policies, I simply added them to the list. Here is the inner policy for reference:

nft_inner_flags = NFT_INNER_HDRSIZE, NFT_INNER_LL, NFT_INNER_NH, NFT_INNER_TH

nft_inner_policy [
    NFTA_INNER_NUM      nlnetw[NFTA_INNER_NUM, int32be[0]]
    NFTA_INNER_FLAGS    nlnetw[NFTA_INNER_FLAGS, flags[nft_inner_flags, int32be]]
    NFTA_INNER_HDRSIZE  nlnetw[NFTA_INNER_HDRSIZE, int32be[0:64]]
    NFTA_INNER_TYPE     nlnetw[NFTA_INNER_TYPE, int32be[0:256]]
    NFTA_INNER_EXPR     nlnest[NFTA_INNER_EXPR, nft_expr_policy_inner]
] [varlen]
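The nlnest and nlnetw wrappers in this policy correspond to two netlink attribute shapes. nlnest is a nest whose payload is a list of further attributes, and nlnetw is a scalar carried in network byte order. As far as I can tell, syzkaller marks them with NLA_F_NESTED and NLA_F_NET_BYTEORDER respectively. The C sketch below mirrors that assumption. The INNER_* constants are local stand-ins for the kernel's NFTA_INNER_* and NFT_INNER_* values as I read them in the uapi header (Linux 6.2+), so verify them against your tree. The helper names are hypothetical.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>

/* Local mirrors of enum nft_inner_attributes and enum nft_inner_flags
 * (assumed values, check them against your uapi nf_tables.h). */
enum { INNER_NUM = 1, INNER_TYPE = 2, INNER_FLAGS = 3, INNER_HDRSIZE = 4, INNER_EXPR = 5 };
enum { INNER_F_HDRSIZE = 1 << 0, INNER_F_LL = 1 << 1, INNER_F_NH = 1 << 2, INNER_F_TH = 1 << 3 };

struct attrbuf {
    uint8_t data[256];
    size_t len;
};

/* Scalar in network byte order, flagged the way nlnetw is assumed to be. */
static void put_be32(struct attrbuf *b, uint16_t type, uint32_t val)
{
    struct nlattr nla = { .nla_len = NLA_HDRLEN + 4, .nla_type = type | NLA_F_NET_BYTEORDER };
    uint32_t be = htonl(val);

    assert(b->len + NLA_ALIGN(nla.nla_len) <= sizeof(b->data));
    memcpy(b->data + b->len, &nla, sizeof(nla));
    memcpy(b->data + b->len + NLA_HDRLEN, &be, sizeof(be));
    b->len += NLA_ALIGN(nla.nla_len);
}

/* Open a nested attribute (nlnest); its length is patched by nest_end(). */
static size_t nest_start(struct attrbuf *b, uint16_t type)
{
    struct nlattr nla = { .nla_len = NLA_HDRLEN, .nla_type = type | NLA_F_NESTED };
    size_t off = b->len;

    assert(off + NLA_HDRLEN <= sizeof(b->data));
    memcpy(b->data + off, &nla, sizeof(nla));
    b->len += NLA_HDRLEN;
    return off;
}

static void nest_end(struct attrbuf *b, size_t off)
{
    uint16_t len = (uint16_t)(b->len - off);   /* header + all children */

    memcpy(b->data + off + offsetof(struct nlattr, nla_len), &len, sizeof(len));
}

/* Attributes of one inner expression; the encapsulated payload/meta
 * expression would be appended between nest_start() and nest_end(). */
static size_t build_inner(struct attrbuf *b)
{
    size_t nest;

    b->len = 0;
    put_be32(b, INNER_NUM, 0);
    put_be32(b, INNER_TYPE, 0);
    put_be32(b, INNER_FLAGS, INNER_F_LL | INNER_F_NH);
    nest = nest_start(b, INNER_EXPR);
    nest_end(b, nest);
    return b->len;
}
```

Because the nest length is only known once its children are written, the header is emitted first and patched at the end. This is also why a nested expression inherits all the sanity checks of the regular expression parser.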

I also added x_tables support via the nft_match and nft_target policies, which required knowing all of the x_tables matches and targets. After countless copy & paste operations, I managed to collect all their names and revision numbers, which range from 0 to 3. The names can be extracted from the xt_*.c files. By convention, uppercase names are targets and lowercase names are matches.
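The naming convention can be applied mechanically when collecting names. The following is a hypothetical C helper, not part of syzkaller, that turns a source file name into an extension name and guesses its kind from the first letter. It is only a first approximation. Some files register both kinds (xt_mark.c, for instance, provides a MARK target and a mark match), so the names actually registered in each file remain the authoritative source.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Turn "xt_TCPMSS.c" into "TCPMSS" and report whether the naming
 * convention marks it as a target (uppercase) rather than a match
 * (lowercase). Returns false for anything that is not xt_*.c. */
static bool xt_name_from_file(const char *file, char *name, size_t cap, bool *is_target)
{
    const char *prefix = "xt_", *suffix = ".c";
    size_t flen, n;

    assert(file && name && is_target);
    flen = strlen(file);
    if (flen <= strlen(prefix) + strlen(suffix) ||
        strncmp(file, prefix, strlen(prefix)) != 0 ||
        strcmp(file + flen - strlen(suffix), suffix) != 0)
        return false;

    n = flen - strlen(prefix) - strlen(suffix);
    if (n + 1 > cap)
        return false;
    memcpy(name, file + strlen(prefix), n);
    name[n] = '\0';
    *is_target = isupper((unsigned char)name[0]) != 0;
    return true;
}
```

Running it over the file list of net/netfilter/ yields candidate names. Each one still needs a revision number between 0 and 3 in the nft_match or nft_target policy.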

You can explore the complete patch in the commit that got merged into syzkaller.

Running syzkaller

Syzkaller is almost ready to start fuzzing our systems. It only needs a config file holding our custom paths and preferences. Below are the most important fields of the config template:

  • workdir: syzkaller’s output folder,
  • kernel_obj: path to the kernel build directory,
  • image: Linux rootfs used by QEMU,
  • vm.kernel: path to the bzImage.

Additional options:

  • enable_syscalls: array of syscalls to fuzz; nftables requires the socket$nl_netfilter and sendmsg$NFT_BATCH syscalls,
  • cover_filter: only collect coverage from specific functions or files, expressed as regexes,
  • suppressions: exclude certain uninteresting crash reports.

An example config file might look something like this:

{
    "target": "linux/amd64",
    "http": "127.0.0.1:56741",
    "workdir": "~/linuxk/syzkaller-data/workdir",
    "kernel_obj": "~/linuxk/linux-6.10.11",
    "image": "~/linuxk/syzkaller-data/bookworm.img",
    "sshkey": "~/linuxk/syzkaller-data/bookworm.id_rsa",
    "syzkaller": "~/linuxk/syzkaller",
    "procs": 4,
    "type": "qemu",
    "reproduce": false,
    "vm": {
        "count": 12,
        "kernel": "~/linuxk/linux-6.10.11/arch/x86/boot/bzImage",
        "cmdline": "net.ifnames=0 nokaslr kasan_multi_shot",
        "cpu": 2,
        "mem": 2048
    },
    "enable_syscalls": [
        "socket$nl_netfilter",
        "sendmsg$NFT_BATCH"
    ],
    "cover_filter": {
        "files": [
            "^net/netfilter/nf_.*$",
            "^net/netfilter/nft_.*$",
            "^net/netfilter/core.c$",
            "^net/ipv4/ip_.*$",
            "^net/ipv4/tcp.*$",
            "^net/ipv4/udp.c$",
            "^net/core/dev.c$"
        ]
    },
    "suppressions": [
        "slab-use-after-free in __mutex_lock.cons",
        "no output from test machine",
        "executor NUM failed NUM times",
        "nft_payload_inner_init+0x214",
        "nft_inner_init+0x4bb",
        "null-ptr-deref in range"
    ]
}

Finally, the fuzzing process can be initiated with the following command:

./bin/syz-manager -config config.cfg

Improving rules coverage

Looking at the upstream coverage reports from syzbot, I noticed that some rules were hardly ever evaluated while fuzzing, so I started looking for ways to improve their coverage.

I then presented my research at the MOCA24 conference. Unfortunately, the talk is in Italian, so I’ve summarized it below. The slides, however, are in English and fairly detailed.

Slides download | MOCA24 presentation (ITA)

Talk summary

Syzkaller is one of the best tools available for kernel fuzzing. It’s a coverage-guided fuzzer, meaning it optimizes code coverage by generating and testing a wide range of syscalls in an intelligent way. Unlike other tools like Trinity, which operate randomly, Syzkaller is highly scalable and focuses on maximizing the number of code paths it explores within the kernel.

Syzkaller Components

Syzkaller consists of several key components:

  • Manager: Oversees the other components. It is written in Go and interacts with the VMs via RPC and SSH.
  • Fuzzer: Runs inside a virtual machine (VM) and generates programs based on system calls.
  • Executor: Executes the generated programs within the VMs.
  • Prog2c: Translates generated syzkaller programs into C code.
  • Syzcover: Extracts coverage data to analyze fuzzing results.

Networking stack fuzzing with syzkaller

Fuzzing the network stack with syzkaller involves generating packets and injecting them into the network stack of the operating system. Virtual devices such as TUN or TAP are used to simulate network traffic. One of the main challenges in analyzing nftables is that it primarily operates at Layer 3 (IP), whereas TAP devices operate at Layer 2 (Ethernet). This introduces additional complexity when configuring and fuzzing firewall rules.

The dataflow of networking fuzzing

During the process of fuzzing the network stack, syzkaller follows a precise dataflow to inject network packets into the kernel’s stack.

  1. Packet generation: Syzkaller creates network packets (TCP, UDP, or other protocols) that operate at various layers of the stack, such as Layer 2 (Ethernet) or Layer 3 (IP).
  2. Injection into the network stack: The generated packets are injected into the kernel’s networking stack through virtual devices like TUN or TAP. These devices simulate real network traffic, allowing syzkaller to send packets through the stack as if they were coming from an external device.
  3. Processing and Routing: Once the packet enters the network stack, it traverses the various kernel layers. Depending on the system’s routing configuration, it is either delivered locally or forwarded to another host. If a packet is processed internally, it might be sent back to user space through the TUN interface.
  4. Mutation and Coverage: Throughout this process, syzkaller collects coverage data from the kernel, analyzing which parts of the code were executed. The packets are then mutated to explore more parts of the kernel, maximizing the coverage.

This strategy simulates real-world scenarios where network packets are generated from external sources, testing the entire path a packet follows within the kernel.
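To make the packet generation step more concrete, here is a hand-rolled C sketch of the kind of frame that ends up being written to the TAP device. It consists of an Ethernet header, an IPv4 header with a valid checksum, and a UDP datagram. It is not syzkaller's code, since syzkaller generates these frames from its syzlang packet descriptions. build_udp_frame and the addresses used in the test are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void put16(uint8_t *p, uint16_t v) { p[0] = v >> 8; p[1] = v & 0xff; }
static void put32(uint8_t *p, uint32_t v) { put16(p, v >> 16); put16(p + 2, v & 0xffff); }

/* RFC 1071 Internet checksum over big-endian 16-bit words. */
static uint16_t csum16(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)p[i] << 8 | p[i + 1];
    if (len & 1)
        sum += (uint32_t)p[len - 1] << 8;
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Ethernet + IPv4 + UDP frame; addresses and ports in host order.
 * The UDP checksum is left at 0 ("not computed"), which IPv4 allows.
 * Returns the frame length, or 0 if it does not fit in cap. */
static size_t build_udp_frame(uint8_t *buf, size_t cap,
                              const uint8_t dst_mac[6], const uint8_t src_mac[6],
                              uint32_t src_ip, uint32_t dst_ip,
                              uint16_t sport, uint16_t dport,
                              const uint8_t *payload, size_t plen)
{
    size_t ip_len = 20 + 8 + plen, total = 14 + ip_len;
    uint8_t *ip, *udp;

    assert(payload != NULL || plen == 0);
    if (total > cap || ip_len > 0xffff)
        return 0;
    ip = buf + 14;
    udp = ip + 20;
    memset(buf, 0, total);
    memcpy(buf, dst_mac, 6);
    memcpy(buf + 6, src_mac, 6);
    put16(buf + 12, 0x0800);               /* EtherType: IPv4 */

    ip[0] = 0x45;                          /* version 4, IHL 5 (no options) */
    put16(ip + 2, (uint16_t)ip_len);       /* total length */
    ip[8] = 64;                            /* TTL */
    ip[9] = 17;                            /* protocol: UDP */
    put32(ip + 12, src_ip);
    put32(ip + 16, dst_ip);
    put16(ip + 10, csum16(ip, 20));        /* header checksum, field was 0 */

    put16(udp, sport);
    put16(udp + 2, dport);
    put16(udp + 4, (uint16_t)(8 + plen));  /* UDP length */
    if (plen)
        memcpy(udp + 8, payload, plen);
    return total;
}
```

Mutating such a frame mostly means changing header fields and payload bytes. If the IPv4 checksum is not recomputed afterwards, the kernel drops the packet before it ever reaches the netfilter hooks.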

Challenges in covering Nftables

Nftables allows the creation of rules to manage traffic going in and out of Linux devices. However, their programmability and configuration can be complex, especially when testing firewall rules and expressions. For instance, setting up tables, chains, and rules with the correct parameters requires a carefully configured system, adding to the complexity.

The proposed solution: two fuzzing strategies

To address these challenges, I proposed a solution that involves two main fuzzing strategies:

Option 1: Sending static UDP packets in the output chain

This first strategy is the simpler and more straightforward approach. Here’s how it works:

  1. Generate static UDP packets: A static UDP packet is created, typically with a specific payload.
  2. Send packets via local interface: The packet is sent to a local address, such as 127.0.0.1, or to 0.0.0.0, which Linux also delivers locally.
  3. Trigger Netfilter rules: Since the packet passes through the local firewall, it is processed by the rules defined in nftables, activating specific chains and rules for output (NF_INET_LOCAL_OUT). Syzkaller then collects coverage of the kernel portions handling the packet.

This method is limited because the packet’s content is static, reducing the randomness necessary to fully test more complex nft expressions. Each time the packet is sent, the content remains the same, meaning that more intricate rules, such as those parsing payloads or higher-level packet attributes, may not be thoroughly tested.

In summary, this option is useful for testing simple rules but does not fully exploit the fuzzing potential for more complex firewall rules.
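As a rough sketch of Option 1, the C snippet below sends a fixed UDP datagram to the loopback address. It assumes that an nftables chain on the output hook is already in place and does nothing to set it up. send_static_udp and the payload string are illustrative names, not part of syzkaller. Since the payload is a compile-time constant, the snippet also shows the limitation discussed above: every call exercises exactly the same packet.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send one fixed UDP datagram to 127.0.0.1:port. Locally generated
 * packets traverse the NF_INET_LOCAL_OUT hook, so rules in an output
 * chain see it; the payload never changes between calls. */
static ssize_t send_static_udp(uint16_t port)
{
    static const char payload[] = "syz-static-probe";
    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(port) };
    ssize_t n;
    int fd;

    assert(port != 0);
    dst.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;
    n = sendto(fd, payload, sizeof(payload) - 1, 0,
               (const struct sockaddr *)&dst, sizeof(dst));
    close(fd);
    return n;
}
```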

Option 2: Using syz_emit_ethernet and the pseudo-syscall syz_batch_emit

The second option is a more advanced and flexible approach, using packet injection at the Ethernet layer and a custom pseudo-syscall to automatically set up nftables structures.

The pseudo-syscall syz_batch_emit

To further reduce setup complexity and improve fuzzing efficiency, I developed a custom pseudo-syscall called syz_batch_emit. This pseudo-syscall is designed to automate the creation of the necessary nftables structures (tables, chains, rules) that would otherwise need to be manually configured.

It works as follows:

  • Automatic creation of nftables structures: When syzkaller invokes syz_batch_emit, the pseudo-syscall automatically creates the required tables, chains, and rules for the firewall. This bypasses the need for manual setup, allowing the fuzzer to focus solely on the rule expressions.
  • Batch process: syz_batch_emit sends a batch of configuration commands to nftables, setting up a complete framework for fuzzing in a single operation, instead of requiring multiple individual commands.
  • Fuzzing expressions: Once the nftables framework is in place, syzkaller uses syz_emit_ethernet to inject mutated packets through these rules and test the complex expressions that nftables evaluates, such as parsing the payload and determining whether to accept, drop, or route the packet.
  • Advantages of the pseudo-syscall: The pseudo-syscall significantly reduces the complexity of fuzzing more intricate nftables rules by automating the setup process and enabling syzkaller to generate highly variable and sophisticated packets.
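The source of syz_batch_emit is not part of this summary, so the following is only my hypothetical C sketch of its "automatic creation" step. It assembles, in memory, a batch with an inet table and a filter base chain, and leaves a slot where a fuzzer-generated NEWRULE would go. The chain is hooked at prerouting, since frames injected through the TAP device arrive as incoming traffic. All names (struct batch, build_scaffold, the table "syz" and the chain "pre") are invented for illustration. A real pseudo-syscall would also have to send the batch over NETLINK_NETFILTER and then inject the frame.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <linux/netlink.h>
#include <linux/netfilter.h>
#include <linux/netfilter/nfnetlink.h>
#include <linux/netfilter/nf_tables.h>

#define NFT_TYPE(cmd) ((NFNL_SUBSYS_NFTABLES << 8) | (cmd))

struct batch {
    uint8_t buf[1024];
    size_t len;               /* bytes of completed messages */
    struct nlmsghdr *cur;     /* message currently being filled */
};

static void msg_open(struct batch *b, uint16_t type, uint16_t flags,
                     uint8_t family, uint16_t res_id)
{
    struct nfgenmsg *nfg;

    assert(b->len + NLMSG_SPACE(sizeof(struct nfgenmsg)) <= sizeof(b->buf));
    b->cur = (struct nlmsghdr *)(b->buf + b->len);
    memset(b->cur, 0, NLMSG_SPACE(sizeof(*nfg)));
    b->cur->nlmsg_len = NLMSG_LENGTH(sizeof(*nfg));
    b->cur->nlmsg_type = type;
    b->cur->nlmsg_flags = NLM_F_REQUEST | flags;
    nfg = NLMSG_DATA(b->cur);
    nfg->nfgen_family = family;
    nfg->version = NFNETLINK_V0;
    nfg->res_id = htons(res_id);
}

static void msg_close(struct batch *b) { b->len += NLMSG_ALIGN(b->cur->nlmsg_len); }

/* Append an attribute to the open message and return it (for nests). */
static struct nlattr *attr(struct batch *b, uint16_t type, const void *data, size_t len)
{
    size_t off = (size_t)((uint8_t *)b->cur - b->buf) + NLMSG_ALIGN(b->cur->nlmsg_len);
    struct nlattr *nla;

    assert(off + NLA_ALIGN(NLA_HDRLEN + len) <= sizeof(b->buf));
    nla = (struct nlattr *)(b->buf + off);
    memset(nla, 0, NLA_ALIGN(NLA_HDRLEN + len));
    nla->nla_type = type;
    nla->nla_len = NLA_HDRLEN + len;
    if (len)
        memcpy((uint8_t *)nla + NLA_HDRLEN, data, len);
    b->cur->nlmsg_len = NLMSG_ALIGN(b->cur->nlmsg_len) + NLA_ALIGN(nla->nla_len);
    return nla;
}

static void attr_str(struct batch *b, uint16_t type, const char *s) { attr(b, type, s, strlen(s) + 1); }

static void attr_be32(struct batch *b, uint16_t type, uint32_t v)
{
    uint32_t be = htonl(v);
    attr(b, type, &be, sizeof(be));
}

/* Hypothetical scaffold: inet table + filter base chain at prerouting.
 * A fuzzer-generated NEWRULE would be appended before BATCH_END. */
static size_t build_scaffold(struct batch *b, const char *table, const char *chain)
{
    struct nlattr *hook;

    b->len = 0;
    msg_open(b, NFNL_MSG_BATCH_BEGIN, 0, NFPROTO_UNSPEC, NFNL_SUBSYS_NFTABLES);
    msg_close(b);

    msg_open(b, NFT_TYPE(NFT_MSG_NEWTABLE), NLM_F_CREATE, NFPROTO_INET, 0);
    attr_str(b, NFTA_TABLE_NAME, table);
    msg_close(b);

    msg_open(b, NFT_TYPE(NFT_MSG_NEWCHAIN), NLM_F_CREATE, NFPROTO_INET, 0);
    attr_str(b, NFTA_CHAIN_TABLE, table);
    attr_str(b, NFTA_CHAIN_NAME, chain);
    attr_str(b, NFTA_CHAIN_TYPE, "filter");
    hook = attr(b, NFTA_CHAIN_HOOK | NLA_F_NESTED, NULL, 0);
    attr_be32(b, NFTA_HOOK_HOOKNUM, NF_INET_PRE_ROUTING);
    attr_be32(b, NFTA_HOOK_PRIORITY, 0);
    hook->nla_len = (uint16_t)((uint8_t *)b->cur + b->cur->nlmsg_len - (uint8_t *)hook);
    msg_close(b);

    msg_open(b, NFNL_MSG_BATCH_END, 0, NFPROTO_UNSPEC, NFNL_SUBSYS_NFTABLES);
    msg_close(b);
    return b->len;
}
```

Keeping the table and chain fixed means the fuzzer's mutations are spent on the rule expressions and the injected packets. Those are the parts this approach is meant to cover.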

Results and discovered vulnerabilities

Thanks to this approach, it was possible to improve the coverage of evaluation functions within nftables. This allowed discovering various security issues, including null pointer dereferences and use-after-frees like this one, which have already been patched in recent Linux versions. Although these vulnerabilities are not extremely severe as they cannot be directly exploited, they highlight the importance of tools like Syzkaller for kernel-level security.

Conclusion

My proposed solution, with these two options, allows for a deeper exploration of nft fuzzing. The UDP output strategy can be useful as a first step for simple rule testing, while the combination of syz_emit_ethernet with syz_batch_emit allows for scaling and automating the fuzzing of complex rules, maximizing coverage and discovering deeper vulnerabilities in the Linux kernel.

These changes haven’t been merged yet as they imply drastically changing the network fuzzing interface, by picking the appropriate TAP device for networking and TUN for nftables.
