General Agenda

From NFWS 2019

Tuesday

netfilter core infra listification

  • Who: Florian
  • Duration: 30 minutes

Basically, how to change

 hook(priv, skb, state)

to

 hook(priv, head, state)

The initial goal would be at least the ingress hook for the flow infra fast path. David says he'd take a patch that generates skb lists from GRO leftovers, so it's somewhat realistic that this will happen.

I started to dabble with this and I think I have a grasp on where all the problems are, so I plan to give a somewhat formal session about this.
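The signature change can be modelled in a few lines. This is only a toy sketch in Python, not kernel code: packets are plain dicts, the verdict strings and the helper names (listify, drop_port_23) are made up for illustration, and the real work would be in C on skb lists.

```python
ACCEPT, DROP = "accept", "drop"

def listify(hook):
    """Wrap a per-packet hook(priv, skb, state) into a hook(priv, head, state)
    that walks a whole list and returns the packets that were accepted."""
    def list_hook(priv, head, state):
        return [skb for skb in head if hook(priv, skb, state) == ACCEPT]
    return list_hook

def drop_port_23(priv, skb, state):
    # Hypothetical per-packet hook: drop telnet, accept everything else.
    return DROP if skb["dport"] == 23 else ACCEPT

list_hook = listify(drop_port_23)
remaining = list_hook(None, [{"dport": 22}, {"dport": 23}, {"dport": 80}], {})
```

The wrapper only shows the calling convention. The point of listification is that a natively list-aware hook can amortise per-call costs over the whole batch, which this wrapper does not do.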

nf test infra

  • Who: Florian
  • Duration: ?

Let's discuss how best to proceed here and improve coverage without fracturing and copy-pasting from libmnl examples.

IPv6 Segment Routing (SRv6)

  • Who: Ahmed
  • Duration: 40 minutes
  • Description: IPv6 Segment Routing (SRv6) leverages the source routing paradigm to steer a packet through an ordered list of instructions called "segments". SRv6 has built a strong ecosystem supported by network vendors, network operators, open source and academia. In this talk, we will present the SRv6 technology, use cases, standardization efforts, and deployment updates. We will also give an update on the SRv6 implementation in the Linux kernel across the various kernel releases. We will conclude with the foreseen opportunities for SRv6 in the container networking world.
    • SRv6 101
    • Deployment Use-cases
    • Service Chaining
    • Container networking

Lunch break

SRv6 support in Linux

  • Who: Ahmed
  • Duration: 20 minutes
  • Description:
    • IPv6 routing extension headers processing in Linux
    • SRv6 support in Linux
    • SRv6 Performance
    • SREXT

OSF in nftables, present and future

  • Who: Fernando
  • Duration: 15 minutes
  • Description: In 2000 Michal Zalewski released the second version of p0f (passive OS fingerprinting). Three years later OpenBSD integrated this system in its firewall and iptables adopted it a few months later. A year ago OSF was implemented in nftables. We will talk about the improvements over the iptables implementation and consider further improvements.

dict: A netfilter expression for dictionary lookups

  • Who: Brett, Dirk
  • Duration: ?
  • The nft_dict module, short for netfilter dictionary, provides an nft rule mechanism to do table lookups on environment metadata that is not present in the packet and not contained within the rule. In a firewall you often wish to block or manipulate packets based on things not immediately evident in the packet, but things that can often be calculated via other mechanisms.

Wednesday

xtables-addons features for nftables, what's useful?

  • Who: Pablo
  • Duration: 30 minutes
  • Description: An overview of what is available in xtables-addons, and a discussion on what would be good/useful to integrate into nftables.

Connection tracking for bridge

  • Who: Pablo
  • Duration: 30 minutes
  • Description: Bridge got native connection tracking support for the upcoming 5.3 release cycle. How it works, and caveats.

netfilter.org infrastructure

  • Who: Pablo
  • Duration: 20 minutes
  • Description: What we have now, where we can go to get better.

Lunch

Netfilter hardware offloads

  • Who: Pablo
  • Duration: 30 minutes
  • Description: Summary of upstreamed and ongoing work.

nft: Quote user-defined names

  • Who: Phil
  • Duration: 15min

We can't do

 nft add table ip hour

and currently not even

 nft add table ip '"hour"'

and yes, the double quoting is ugly but required on the command line.

If quoting is the way, we should quote all user-defined names on output as well so that

 nft list ruleset | nft -f -

works.

Enforce QUOTED_STRING for new things?

Are there better alternatives?
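As a rough sketch of the "quote on output" idea, the function below quotes a user-defined name when it collides with a reserved word, or always if requested. The keyword set is a small hypothetical subset chosen for illustration (the real list lives in the nft scanner), and quote_name is not an existing nft function.

```python
import re

# Hypothetical subset of reserved words, for illustration only.
KEYWORDS = {"hour", "table", "chain", "set", "map", "counter"}
PLAIN_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")

def quote_name(name, always=False):
    """Return the name as it would be printed in a listing."""
    needs_quotes = always or name in KEYWORDS or not PLAIN_NAME.match(name)
    return '"' + name + '"' if needs_quotes else name

listing = "table ip " + quote_name("hour") + " {\n}"
```

Quoting only on collision keeps existing output stable, but every new keyword changes which names get quoted. Quoting always (always=True) is uglier but stays stable across releases.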

Thursday

further improvements of iptables-nft

  • Who: Phil
  • Duration: 45min

Presentation and discussion of planned work:

  • ebtables broute table support
  • ebtables among match implementation
    • limited to homogeneous elements (for now)
  • eliminate needless table/chain creation for list commands
  • remove empty tables/chains after flush commands
    • This is not just an optimization: kernel modules get loaded and can't be removed unless one uses the nft tool to flush the ruleset
  • reduce caching
    • fetch rule cache only if needed
    • support caching individual tables or even chains only?
  • further merging of duplicated code

Support conntrack timeout policy on OVS

  • Who: Yi-Hung Wei
  • Duration: 20 mins
  • Slides

This presentation is about a use case of conntrack timeout policy in OVS. It covers the design issues that we encountered, followed by some discussion of supporting zone-based features in netfilter.

nft undefined behaviour situations

  • Who: Phil
  • Duration: 30min

Basic problem is that

 iptables-nft -t nat -A POSTROUTING ! -i lo -j MASQUERADE

behaves differently from the same rule in legacy iptables (it will never match).

While in nft correct behaviour is a matter of definition, iptables-nft aims at consistency with iptables-legacy. If nft_meta.c is correct, how to fix iptables-nft?
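The difference can be modelled as follows. This is a sketch under two assumptions that are not spelled out above: that legacy iptables compares a missing input interface as an empty name, and that the nft side aborts rule evaluation when the interface name cannot be loaded. Both function names are invented for the illustration.

```python
def legacy_not_iif(pkt_iifname, rule_iifname):
    """Assumed legacy behaviour: no input interface compares as ''."""
    name = pkt_iifname if pkt_iifname is not None else ""
    return name != rule_iifname

def nft_not_iif(pkt_iifname, rule_iifname):
    """Assumed nft behaviour: a failed iifname load ends rule evaluation,
    so the negated comparison is never reached."""
    if pkt_iifname is None:
        return False
    return pkt_iifname != rule_iifname

# In POSTROUTING there is no input interface, modelled here as None.
postrouting_legacy = legacy_not_iif(None, "lo")
postrouting_nft = nft_not_iif(None, "lo")
```

Under these assumptions the two variants only disagree when the interface is missing, which is exactly the POSTROUTING case.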

/etc/xtables/xtables.conf

  • Who: Phil
  • Duration: 10min
 Should I stay or should I go?
 If I stay there will be trouble,
 If I go there will be double?

nft icmp matches

  • Who: Phil
  • Duration: 15min

Maybe a brainstorming session might help in finally finding a solution for all the tests/py warnings.

lunch

Open questions from Florian

  • Who: Florian
  • Duration: 10m

Upper limits for transactions

We already enforce a (very large) upper cap on xtables blobs and on the number of base chains with no complaints so far.

I'd like to enforce an upper cap on the size of the kernel memory that can be pinned by userspace. Unlike iptables, the memory used by a transaction is not directly related to the ruleset representation's size. Also, the transaction memory lifetime is tied to the syscall/process. So, one alternative could be to use memory control groups. That leaves the question of also placing a max cap on ruleset memory.

Florian to add a large hardcoded limit, like iptables. Will also check wrt. memcg and extend nft to use memcg too.

conntrackd breakage with less repetitive conntrack IDs

Recently reported: conntrackd runs into problems with old flows not being expired after a recent change on how conntrack IDs are generated. The question is how to proceed here: we can make them repetitive again, only patch conntrackd, or e.g. provide a sysctl on how the IDs should be generated.

Pablo will fix this

8, 16, 32 bit integer types

There is a patch for nftables floating around that adds 3 integer types, for 8, 16 and 32 bit widths. In payload_init_raw(), the "proper" type is chosen depending on requested length. Clearly not optimal, is there a better solution?

Use case is:

< YmrDtnJu> fw: the idea is to add the layer4 protocol to a set: type ipv4_addr . ipv4_addr . inet_proto . integer16

 set test {
     type ipv4_addr . ipv4_addr . inet_proto . inet_service
     elements = { 127.0.0.1 . 127.0.0.2 . tcp . 22 }
 }

gives:

 nft add rule ip filter input ip saddr . ip daddr . ip protocol . @th,16,16 @test counter
 Error: can not use variable sized data types (integer) in concat expressions

... with patch, integer16 can be used instead of inet_service in set definition, and raw expression will use that instead of plain "integer".

Florian will check whether to just add a "transport dport" mnemonic that would use inet_service.

dnat/snat and altering both addr and port via maps

Background:

We want nft rules that replace the following ipvsadm rules and achieve the same function:

 # ipvsadm -A -t 10.167.181.254:8080 -s rr
 # ipvsadm -a -t 10.167.181.254:8080 -r 10.167.191.1:80 -m
 # ipvsadm -a -t 10.167.181.254:8080 -r 10.167.191.2:80 -m

nft map does not support "dnat IP:port" format. Multi-Dnat doesn't work:

 # nft add rule ip nat prerouting iifname eth1 \
     tcp dport 8080 dnat to numgen inc mod 2 map { 0 : 10.167.191.1, 1 : 10.167.191.2 } dnat : ip daddr map { 10.167.191.1 : 80, 10.167.191.2 : 80 }
 <cmdline>:1:171-174: Error: Statement after terminal statement has no effect
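What the failing rule is trying to express can be written down as a small model: a round-robin counter picks the backend address, and a second map picks the port for that address. This is only a sketch of the intended semantics, not of how nft evaluates maps. The function names are invented and the counter is assumed to start at 0.

```python
import itertools

BACKENDS = {0: "10.167.191.1", 1: "10.167.191.2"}
PORTS = {"10.167.191.1": 80, "10.167.191.2": 80}

def make_dnat():
    counter = itertools.count()  # models "numgen inc", assumed to start at 0
    def dnat():
        addr = BACKENDS[next(counter) % 2]  # "mod 2 map { 0 : ..., 1 : ... }"
        return addr, PORTS[addr]            # address and port chosen together
    return dnat

dnat = make_dnat()
targets = [dnat() for _ in range(3)]
```

The model makes the missing piece visible: the address and port have to come out of a single NAT statement, which is what a map with "IP:port" values would provide.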

implement CONNMARK --save-mark etc. in nftables

AFAICS this would need support for bitops with non-constant RHS: "nft add ... ct mark set ct mark & 0xffff0000 ^ (meta mark & 0xffff)". Other ideas?
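The arithmetic behind such a rule is simple once both operands are variable. The sketch below assumes 32-bit marks, a "keep" mask of 0xffff0000 on the conntrack mark and a "copy" mask of 0xffff on the packet mark. The function name and parameter names are made up for the example.

```python
def save_mark(ct_mark, meta_mark, keep_mask=0xffff0000, copy_mask=0x0000ffff):
    """Keep the upper bits of ct mark and fold in the lower bits of meta mark."""
    return ((ct_mark & keep_mask) ^ (meta_mark & copy_mask)) & 0xffffffff

new_ct_mark = save_mark(0x12345678, 0xabcdef01)
```

Because the two masks do not overlap, XOR and OR give the same result here. The hard part for nftables is that the right-hand side of the XOR comes from a register rather than from a constant.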

type conversion and casting

 nft ... meta mark set ip saddr
 Error: datatype mismatch: expected packet mark, expression has type IPv4 address

To be fair, meta mark jhash ip saddr works; it's an arbitrary example. Do we need a cast operator? Implicit conversion?

Todo: look at how other languages handle this (explicit vs. implicit). What to do when the RHS has a larger type, e.g. meta mark set ip6 saddr?

Take val[0]? val[3]? It might make sense to allow the user to specify this in some way, so not just the C equivalent of reg32 = reg64 (upper bits removed), but also reg32 = reg64 >> 32ul.
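The choices can be made concrete with a small sketch. It assumes val[0] is the most significant 32-bit word of the address. The helper names are invented for illustration and nothing here reflects an agreed nft syntax.

```python
import ipaddress

def words32(addr6):
    """Split an IPv6 address into four 32-bit words, val[0] most significant."""
    n = int(ipaddress.IPv6Address(addr6))
    return [(n >> shift) & 0xffffffff for shift in (96, 64, 32, 0)]

def narrow64(reg64, shift=0):
    """reg32 = reg64 >> shift, keeping only the low 32 bits of the result."""
    return (reg64 >> shift) & 0xffffffff

val = words32("2001:db8::1")
low = narrow64(0x1122334455667788)        # upper bits removed
high = narrow64(0x1122334455667788, 32)   # reg64 >> 32
```

For many IPv6 addresses the most significant word is just the common prefix, so silently picking val[0] would often yield the same mark for every host in a network.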

Integrating the network stack into XDP

  • Who: Steffen
  • Duration: 25 min

Partitioning the system into control and dataplane CPUs

  • Who: Steffen
  • Duration: 5 min


Load Balancing & Clustering in nft

  • Who: Laura García
  • Duration: 30 min

Some updates about the nftables load balancing core (nftlb), integration challenges to replace iptables completely and clustering multi-node requirements.

File:NFWS2019 - load balancing & clustering in nft.pdf

L7 Proxy offloading

  • Who: Abdessamad El Abbassi
  • Duration: 15 min

L7 proxy development and L7 protocols analysis offloading to kernel using nft subsystem.

Friday

Let's mind other business

  • Who: Éric Leblond
  • Duration: 15 min


Could Suricata and other sniffing software benefit from the flow table and other Netfilter offload mechanisms? The main issue there is that we need to process all packets.

nft: rule and ruleset optimizations

  • Who: Florian
  • Duration: 20m

see https://patchwork.ozlabs.org/patch/1101599/

ct state {established, related} => ct state established,related
ip saddr { 1.1.1.1 } => ip saddr 1.1.1.1
tcp dport 22-22 => tcp dport 22

ip saddr 1.1.1.1 accept
ip saddr 2.2.2.2 accept => ip saddr { 1.1.1.1, 2.2.2.2, 3.3.3.3} accept
ip saddr 3.3.3.3 accept
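Two of the transformations above are easy to sketch on a toy rule representation. The code below is not taken from the linked patch. Rules are modelled as (key, value, verdict) tuples, only consecutive rules with the same key and verdict are merged, and the "ct state" case is left out because it relies on bitmask semantics.

```python
def simplify(key, values):
    """Collapse degenerate ranges such as '22-22' into a single value."""
    out = []
    for v in values:
        lo, sep, hi = v.partition("-")
        out.append(lo if sep and lo == hi else v)
    return key, out

def render(key, values, verdict=""):
    """Print a single value bare and several values as an anonymous set."""
    body = values[0] if len(values) == 1 else "{ " + ", ".join(values) + " }"
    return " ".join(part for part in (key, body, verdict) if part)

def merge_rules(rules):
    """Merge runs of consecutive rules that differ only in the matched value."""
    merged = []
    for key, value, verdict in rules:
        if merged and merged[-1][0] == key and merged[-1][2] == verdict:
            merged[-1][1].append(value)
        else:
            merged.append((key, [value], verdict))
    return [render(k, vs, verdict) for k, vs, verdict in merged]

optimized = merge_rules([
    ("ip saddr", "1.1.1.1", "accept"),
    ("ip saddr", "2.2.2.2", "accept"),
    ("ip saddr", "3.3.3.3", "accept"),
])
```

Restricting the merge to adjacent rules with identical verdicts keeps the evaluation order, and therefore the ruleset semantics, unchanged.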

nftables command (in)consistencies

  • Who: Florian
  • Duration: 5 minutes
 02:14 < s34n> how do include a table flush in my nftables config file?
 02:14 < s34n> if the table doesn't exist yet, flush causes an error

'add table' with an existing table is fine, 'delete table' with a non-existing table is not. This is probably very easy to resolve, we just need to decide on an action item here. We could patch this in the kernel (just don't generate an error), we can fix it in userspace (don't send a command if the table doesn't exist), or we can add a new keyword and netlink flags to say 'delete but don't report error'. For add, we have create (gives an error); for delete, we also have flush but it has a different meaning.
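The asymmetry, and the third option, can be modelled with a toy ruleset. The class below is purely illustrative: delete_if_exists stands in for the proposed "delete but don't report error" keyword and does not correspond to an existing command.

```python
class Ruleset:
    """Toy model of table commands; only table names are tracked."""
    def __init__(self):
        self.tables = set()

    def add(self, name):
        self.tables.add(name)            # fine if the table already exists

    def create(self, name):
        if name in self.tables:
            raise ValueError("table exists: " + name)
        self.tables.add(name)

    def delete(self, name):
        if name not in self.tables:
            raise KeyError("no such table: " + name)
        self.tables.remove(name)

    def delete_if_exists(self, name):    # hypothetical new keyword
        self.tables.discard(name)

rs = Ruleset()
rs.add("filter")
rs.add("filter")       # idempotent
rs.delete("filter")
rs.add("filter")       # "add, then delete" never fails in a config file
rs.delete("filter")
```

The last two calls show a pattern that already works with today's semantics: adding the table first guarantees that the following delete or flush has something to act on.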


firewalld, libnftables, and json, oh my

This will focus on:

  • libnftables json support
  • what it looks like
  • how you can use it
  • how firewalld uses it
  • why it's awesome
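To give an idea of what it looks like, the sketch below builds a small command document in the shape described by the libnftables JSON schema: a top-level "nftables" array of command objects. The table and chain names are placeholders, and handing the string to libnftables is deliberately left out.

```python
import json

def add_table_and_chain(family, table, chain):
    """Build a JSON command list that adds a table and a regular chain."""
    return {
        "nftables": [
            {"add": {"table": {"family": family, "name": table}}},
            {"add": {"chain": {"family": family, "table": table, "name": chain}}},
        ]
    }

# "demo" and "input" are placeholder names for this example.
payload = json.dumps(add_table_and_chain("inet", "demo", "input"), indent=2)
```

For a caller such as firewalld the appeal is that rules are assembled as data structures instead of strings, so no quoting or escaping of nft syntax is needed.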

set concatenations with intervals

  • Who: Eric Garver
  • Duration: 10 min

nftables lacks support for set concatenations with intervals. This prevents us from supporting ipset equivalents like "hash:net,port". This was previously discussed during NFWS 2018. These types of sets are often used by firewalld users.

   # nft add set ip t my_set3 '{ type ipv4_addr . inet_service ; flags interval ; }'
   Error: concatenated types not supported in interval sets
   add set ip t my_set3 { type ipv4_addr . inet_service ; flags interval ; }
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
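The lookup semantics users expect from a "hash:net,port" style set can be sketched with a linear scan. This only illustrates what a match means. It says nothing about how an efficient kernel data structure would be built, and the class name is invented.

```python
import ipaddress

class NetPortSet:
    """Set of (network, port range) pairs with a naive linear lookup."""
    def __init__(self):
        self.elements = []

    def add(self, net, port_lo, port_hi=None):
        hi = port_lo if port_hi is None else port_hi
        self.elements.append((ipaddress.ip_network(net), port_lo, hi))

    def __contains__(self, key):
        addr, port = key
        ip = ipaddress.ip_address(addr)
        # Both fields of the concatenation must fall inside the same element.
        return any(ip in net and lo <= port <= hi
                   for net, lo, hi in self.elements)

s = NetPortSet()
s.add("10.0.0.0/8", 80)
s.add("192.168.1.0/24", 1024, 2047)
```

The difficulty is that each field is an interval of its own, so a single sorted tree over the concatenated key does not give correct results.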

match packets inside tunnels

Summary of what was discussed so far wrt. inner header matching, and ideas for implementation / solution proposals

Primary assumption: Middlebox case, i.e. we are forwarding

tc-u32 solution: OFFSET := [ plus int_value ] [ at 2*int_value ] [ mask u16_hex_value ] [ shift int_value ] [ eat ]

This is explicit manipulation of an offset value, either via an immediate value ("plus 42") or by loading an offset from the packet directly. "mask" and "shift" can be used for e.g. iph->ihl. "eat" pulls off the header, so offset 0 is then e.g. the TCP header. Problem: IPv6 extension headers. How do I skip to the IPPROTO_IPV6 nextheader value?
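The offset computation can be sketched as follows, assuming the usual reading of the grammar: a 16-bit big-endian value is loaded from position "at", masked, shifted right, and added to the immediate "plus" value. The function name is made up, and "eat" is not modelled.

```python
import struct

def u32_offset(pkt, plus=0, at=None, mask=0xffff, shift=0):
    """Return plus + ((16-bit value at 'at' & mask) >> shift)."""
    off = plus
    if at is not None:
        (half,) = struct.unpack_from("!H", pkt, at)
        off += (half & mask) >> shift
    return off

# Minimal IPv4 header start: version 4, ihl 5, followed by zero padding.
ipv4 = bytes([0x45, 0x00]) + bytes(18)
l4_offset = u32_offset(ipv4, at=0, mask=0x0f00, shift=6)
```

With mask 0x0f00 and shift 6 the ihl nibble is effectively multiplied by 4, giving the header length in bytes. This works because the length sits in one fixed field, which is not the case for a chain of IPv6 extension headers.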

What we need from kernel/nft eval loop point of view:

  1. starting point. Currently is one of skb->mac_header,nh,th for payload and skb->data for others (reject for instance)
  2. Type of what we're trying to match. This is implicit via family, inet (ipv4 or ipv6), arp, bridge (ethernet header), arbitrary l2 header (ingress hook, dependency added via device->type / device arphrd).

For inner header, we therefore need all of the following:

  1. how many bytes to skip in skb (additional offset) to move past L3 OR L4 header. Problem is that a fixed value is enough for some use cases (e.g. vxlan), but not others.
  2. What to expect as the next (encapsulated) data. Ether? IPv4? Either IPv4 or IPv6?

Examples:

 nft add rule filter forward inner ether in gre ..
 nft add rule filter forward inner inet in ipv6 ..
 nft add rule filter forward inner ip in ip ..
 nft add rule filter forward inner ipv6 in udp ..
 nft add rule filter forward inner ether in vxlan ...

So the "inner" keyword expects the next protocol type and the header it's supposed to skip. In some cases, userspace can just send a fixed value, e.g. in the vxlan and udp cases.

In other cases, e.g. ipv6 or ip, we can use pre-parsed info from nft_pktinfo, in other cases we might need additional kernel code to deal with non-fixed size headers.

The type of the inner header data allows us to know which type of base hook family to emulate, e.g. "ether" -> call nft_set_pktinfo_ipv4/6_validate depending on value of inner_eth->h_proto, or "inet" call nft_set_pktinfo_ipv4/6_validate based on current pkt->tprot (outer transport header).

nft_pktinfo struct is extended with "u16 inner_off;", nft_payload and nft_exthdr are patched to work by adding pkt->inner_off.
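A toy model of the proposed inner_off field might look like this. It assumes the extra offset is simply added to whatever base offset the payload expression would use anyway. The class and function are simplified stand-ins, not the kernel structures.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PktInfo:
    data: bytes          # linear packet bytes
    inner_off: int = 0   # extra bytes to skip, set by the "inner" expression

def payload_load(pkt: PktInfo, base: int, offset: int, length: int) -> Optional[bytes]:
    """Load 'length' bytes at base + inner_off + offset, or None if out of range."""
    start = base + pkt.inner_off + offset
    end = start + length
    if start < 0 or end > len(pkt.data):
        return None
    return pkt.data[start:end]

# 8 bytes of (zeroed) outer header followed by the start of an inner IPv4 header.
pkt = PktInfo(bytes(8) + b"\x45\x00\x00\x14")
outer_first = payload_load(pkt, 0, 0, 1)
pkt.inner_off = 8
inner_first = payload_load(pkt, 0, 0, 1)
```

The same expression reads different bytes depending on inner_off, which is why every expression that touches packet data would need an audit.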

Requires audit of other rules, e.g. synproxy and reject to DTRT (synproxy: skip, reject, drop only).

To avoid repeated use of "inner ...", we could enforce a chain-based scoping. Example:

  nft add rule filter forward ip saddr 1.2.3.4 udp dport 12341 \
      inner ip in ip jump mychain   # runs mychain, but on inner ipv4 header, i.e. "ip daddr 4.5.6.7" would load inner ip header daddr.

Other idea: an explicit "undo" expression, i.e.:

 pop inner ether in gre ip daddr 1.2/16 accept ip saddr 1.3/16 accept drop push inner accept

... might create problems, as removing first rule changes meaning of remaining rules.

Future Directions: Connection tracking support for inner packet

- requires conversion of conntrack to skb_header_pointer() or possibly more costly pskb_may_pull()
- Loop back packet to same hook family? Hard to do due to iptables, skb can handle only one nftct entry.
- Loop packet back to same table:  Recursively call nft_do_chain() from the "inner" expression.

BZ session

  • Who: Florian
  • Duration: 1h?

Let's walk through open bugs on bz.nf.org and decide on action items (Fix / Wontfix / "Florian please fix it"), and close tickets that are already resolved (if any).