Poking around with nftables

Here's the nftables config for a VM host I'm currently setting up. I'm posting it because real-world nft documentation and examples are scarce.

#!/usr/sbin/nft -f
# vim:set ts=2:
#
# 9.9.9.9 is the server's own external IP
# 10.99.99.0/24 are internal IPs of virtual machines running on this machine
#   We perform SNAT for these so that they can open TCP connections to the
#   internet, which will appear to come from our external IP.
#   Additionally, we perform DNAT for these to map one port on the external IP
#   to port 22 on each VM.
# 42.42.42.42/29 is our office subnet which is supposed to be able to connect to all ports
# 1.3.3.7 and 1.3.3.8 are our backup servers which need to be able to connect to sshd.
#
# IPv6 is on the roadmap but not currently a priority. In any case, the syntax
# is not a lot different.

flush ruleset;

# The next few lines currently do not work because I got the syntax wrong. The
# documentation is inconclusive, so I need to read the bison grammar to figure
# out the proper syntax. Even then, the userspace utility 'nft' can segfault or
# hit assertion failures. The Debian wiki says nft can be considered stable at
# this point. I don't really think so.

#add map nat ssh_ports { type inet_service : ipv4_addr; }
#add element nat ssh_ports { 9992 : 10.99.99.2 };
#add element nat ssh_ports { 9993 : 10.99.99.3 };
#add rule ip nat prerouting dnat tcp dport map @ssh_ports : 22;

table nat {
  chain prerouting {
    type nat hook prerouting priority 0;
    #dnat tcp dport map @ssh_ports : 22; # does not work, workaround:
    dnat tcp dport map { 9992 : 10.99.99.2, 9993 : 10.99.99.3 } : 22;
  }
  chain postrouting {
    type nat hook postrouting priority 100;
    ip saddr 10.99.99.0/24 iifname virbr1 oifname eth0 snat 9.9.9.9 random,persistent;
  }
}

table filter {
  chain input {
    type filter hook input priority 0; 
    
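    # Using iif instead of iifname on lo is faster: iif compares the interface
    # index, iifname compares the name string. lo always exists, so its index is
    # stable. All other interfaces need iifname, because they might go away and
    # come back with a different index.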
    iif lo accept;
    iifname virbr1 accept; # bridge device where all VMs are connected.
    ct state established,related accept;

    tcp dport {80, 443} accept;
    ip protocol icmp accept;

    ip saddr {
       42.42.42.42/29,       # office subnet
       1.3.3.7,              # first backup server
       1.3.3.8,              # second backup server
    } accept;
    
    # by default, return an ICMP message if the packet wasn't accepted above.
    reject;
  }
# chain forward {
#   type filter hook forward priority 0;
# }
# chain output {
#   type filter hook output priority 0;
# }
}
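
One possible reason the commented-out named map fails is that it is added before table nat exists (the ruleset has just been flushed). A named map can also be declared inside the table block itself. The following is an untested sketch, not a verified fix: the "elements = ..." initializer and the map-plus-port form of dnat are assumptions about the nft version in use, and newer releases may want "dnat to" instead of plain "dnat".

```nft
table nat {
  # Named map: external port -> internal VM address (same pairs as above).
  map ssh_ports {
    type inet_service : ipv4_addr;
    elements = { 9992 : 10.99.99.2, 9993 : 10.99.99.3 }
  }
  chain prerouting {
    type nat hook prerouting priority 0;
    # Assumption: a named map is accepted wherever the anonymous one works.
    dnat tcp dport map @ssh_ports : 22;
  }
}
```

If this loads, adding another VM only means adding one element to ssh_ports, without touching the rule.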

As of 2016-07-14, the following segfaults on Debian jessie because of "flags interval", which is required for CIDR support. This seems to be a known bug. The workaround for now is to use an implicit (anonymous) set inside the rule, as in the config above, rather than defining a named one.

add set filter hosts_whitelist { type ipv4_addr; flags interval; };
add element filter hosts_whitelist { 
        42.42.42.42/29,     # office subnet
        1.3.3.7,            # first backup server
        1.3.3.8,            # second backup server
}
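
For completeness, this is roughly how the named set would be used once it loads: declared inside the table and referenced from the rule with @. A minimal sketch, assuming an nft build without the interval bug. The input chain is trimmed to the relevant lines, and the other rules from the full config are omitted.

```nft
table filter {
  set hosts_whitelist {
    type ipv4_addr;
    flags interval;   # needed for the CIDR entry
    elements = { 42.42.42.42/29, 1.3.3.7, 1.3.3.8 }
  }
  chain input {
    type filter hook input priority 0;
    # Replaces the anonymous "ip saddr { ... } accept" rule.
    ip saddr @hosts_whitelist accept;
    reject;
  }
}
```

Declaring the set in the same block as the chain means it already exists when the rule referencing it is parsed.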

Personal wishlist so far

It would be a plus to be able to annotate IPs, ports etc. with a label instead of a comment that is lost on load. These labels could show up in logging and make the "nft list" output clearer. Something like the following, though not necessarily with this syntax:

ip saddr {
   42.42.42.42/29 desc "office subnet",
   1.3.3.7 desc "backup server 1",
   1.3.3.8 desc "backup server 2"
}
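
Newer nft releases come close to this with a "comment" keyword on set elements, which "nft list" prints back. A sketch on a named set; I have not checked whether the jessie version accepts it, or how it behaves together with the "flags interval" bug above.

```nft
add set filter hosts_whitelist { type ipv4_addr; flags interval; };
add element filter hosts_whitelist {
  42.42.42.42/29 comment "office subnet",
  1.3.3.7 comment "backup server 1",
  1.3.3.8 comment "backup server 2"
}
```

This covers the listing side of the wish. It does not make the label available for logging.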

Opinions