Summary of the netfilter developer workshop, 26/27 November, 2001
======================================================================

Participants:

Rusty Russell <rusty@rustcorp.com.au>
Marc Boucher <mb@mbsi.ca>
Harald Welte <laforge@gnumonks.org>
Andras Kis-Szabo <kisza@sch.bme.hu>
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Balazs Scheidler <bazsi@balabit.hu>
Fabrice Marie <fabrice@celestix.com>
Jay Schulist <jschlst@samba.org>
Lennert Buytenhek <buytenh@gnu.org>
Michael Bellion <bellion@gmx.de>
Thomas Heinz <josef.k@mytomorrow.de>
Tommi Virtanen <tv@debian.org>
Gianni Tedesco <gianni@ecsc.co.uk>

1. Conntrack timeouts (ACK/FIN issue)

Rusty has a proposed patch which increases the CLOSE_WAIT timeout from 60
to 120 seconds. Jozsef has done some testing with the timeout increased to
5 days. Neither of the two solutions sounds good: 120 seconds is most
likely not long enough, and 5 days is too long (somebody reboots his box,
the connection is never closed, and the conntrack entries stay around for
5 days).

To find an appropriate value, we should gather some empirical statistics.
Jozsef is going to prepare a patch against the current stable kernel which
produces this output. The patch increases the timeout to some big value
(some hours or one day) and prints the remaining timeout at the time the
connection is closed in the other direction. By putting this patch into a
couple of setups, we should get statistical data about how long most
half-closed connections remain in this state before being closed.

2. TCP window tracking

The core team likes the code, and it should have gotten into the
mainstream kernel a long time ago. We definitely want to put it into the
kernel, but first want to make sure that we don't have any false positives
(out-of-window packets, ...). We need more debugging on when the
out-of-window packets appear. To debug this, people need to fully tcpdump
their traffic and then get an out-of-window log with exact timestamps (to
verify against the tcpdump logs).
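As a rough illustration of what such a window check decides, here is a
much-simplified user-space sketch (not the actual tracking code: the real
patch tracks both directions, window scaling, and acceptable ACK ranges,
and the function name here is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Much-simplified sketch of an in-window test: a non-empty segment
 * [seq, seq+len) is acceptable if it overlaps the receive window
 * [rcv_nxt, rcv_nxt + rcv_wnd).  Serial (wrap-safe) arithmetic is
 * used, since TCP sequence numbers wrap around. */
static bool seg_in_window(uint32_t seq, uint32_t len,
                          uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    bool end_after_window_start =
        (int32_t)(seq + len - rcv_nxt) > 0;
    bool start_before_window_end =
        (int32_t)(seq - (rcv_nxt + rcv_wnd)) < 0;
    return end_after_window_start && start_before_window_end;
}
```

A packet failing this kind of test would be the one to correlate, by
timestamp, against the full tcpdump capture.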
The patch will contain a debug mode and a normal mode. In normal mode
there are no sysctl values; in debug mode sysctls for all timeouts are
available.

3. newnat

The current newnat code (as in patch-o-matic) will get into the stable
kernel soon. All new conntrack/NAT helpers should be developed against the
new API.

In order to make life easier for protocols with dynamic port numbers
(h323, ...), we will remove the re-set of expectfn in alter_reply.

Additionally, the #ifdef CONFIG_IP_NF_FTP / #endif style construct in
ip_conntrack.h is going to be removed for all helpers. This means struct
ip_conntrack will have the same size regardless of which protocol helpers
are compiled in.

4. bridging firewall

Lennert has given a nice overview of his current work with regard to
bridging firewalls. We made a decision on how to move ahead:

- Lennert submits support for bridging packet filtering to the netfilter
  core team, which will put this patch in patch-o-matic and, after some
  testing, submit it to the main kernel. This patch is not allowed to
  create new members of struct sk_buff, for obvious reasons.

- Bridge support for NAT (which needs new sk_buff members) will stay
  outside of the 2.4.x kernels but exist as an incremental patch inside
  netfilter patch-o-matic.

5. Conntrack exemptions

In order to do exemptions to connection tracking, we would need to have a
table attached to the PRE_ROUTING hook with a priority before conntrack
gets executed. The table's name is 'notrack'. The table would set up the
nfct field of the skb to point to some dummy conntrack entry. The
connection tracking core would then check against this dummy entry before
setting nfct.

The state match is going to be extended with a --state UNTRACKED
extension, which checks nfct against the special dummy conntrack entry.

This feature will only get into patch-o-matic and 2.5.x.

6. Five hooked mangle table

We put the five hooked mangle table in the pending section of
patch-o-matic.
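In essence, the notrack scheme from point 5 is just a pointer comparison
against one shared dummy entry. A minimal user-space sketch of the idea
(all structure and function names here are hypothetical, not the kernel's):

```c
#include <stdbool.h>
#include <stddef.h>

struct dummy_conn { int unused; };

/* the one special dummy conntrack entry */
static struct dummy_conn untracked;

struct skb_sketch { struct dummy_conn *nfct; };

/* the 'notrack' table, running on PRE_ROUTING before conntrack,
 * points nfct at the dummy entry */
static void notrack_target(struct skb_sketch *skb)
{
    skb->nfct = &untracked;
}

/* the conntrack core only attaches a real entry if nfct is unset */
static void conntrack_in(struct skb_sketch *skb, struct dummy_conn *ct)
{
    if (skb->nfct == NULL)
        skb->nfct = ct;
}

/* --state UNTRACKED: compare against the special dummy entry */
static bool match_untracked(const struct skb_sketch *skb)
{
    return skb->nfct == &untracked;
}
```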
The mangle priority will be the same as currently. After checking that it
does not break old iptables binaries, we will submit it to the mainstream
kernel.

x removed unknown table (did we really fix the dropped-table problems?)

7. future of IP tables

Currently an IP table is a contiguous chunk of memory containing all the
rules, with lots of relative pointers inside. This has the advantage of
atomically replacing the whole table, but at a high cost: dynamic rule
changes are extremely expensive, and the kernel has to do lots of
checking. We consider this a mistake and want to change the structure for
2.5.x back to linked lists of rules. A table is thus a linked list of
chains, which are themselves linked lists of rules.

8. nfnetlink / ctnetlink / iptnetlink

Jay gave an overview of his work, which is greatly appreciated by the core
team. As for the future:

- iptnetlink will be rewritten to conform with the new 2.5.x internal
  linked-list data structure of iptables (see 7.)
- ctnetlink is aimed at later 2.4.x inclusion
- ulog and ip_queue will be ported to nfnetlink for 2.5.x

9. userspace tool iptables

There are a couple of problems with the current approach of iptables:

- the plugins are bound to command-line parsing (getopt, ...) and thus not
  flexible enough for alternative interfaces (GUIs, firewall
  languages, ...)
- libiptc is extremely low-level and of no use if you don't know about the
  plugins and their data structures
- the macro-based approach for the iptables / ip6tables shared code is not
  the best idea

Results:

- libiptc is going to disappear, since in 2.5.x all rule changes (of the
  new linked-list internal data structures) will be made using iptnetlink.
- the plugin handling and big parts of iptables will move into a new
  library called libiptables. This library can be used to query the
  available matches and targets as well as their parameters, valid
  values, help texts, ...
- multiple frontends (iptables style, ip/tc style, ...)
  will run on top of this high-level library. The iptables style will be
  supported by the netfilter project, other ones by third parties.

10. Transparent proxying

The transparent proxying shortcomings in the 2.4.x kernel are obvious.
Balazs and Rusty discussed possible implementation details and will
hopefully one day enlighten us about what exactly has been planned ;)

Another issue coming up in the discussion was a two-phase accept, with
which a transparent proxy could become even more transparent to the
application, especially in the case where the real server returns a
connection refused: currently this is not propagated to the client, but
results in a connection reset after the connection to the proxy has
already been established. Marc and Lennert are especially interested in
this, and they will take care of further discussion and implementation.

11. Debugging aids

Debugging complex rulesets within the different tables, policy routing,
etc. is extremely difficult. Ideally we would have some packet tracing
functionality, where a user could get a detailed log of what happened to
his packet: when the packet traversed which chain, which decision was
made, where it was altered in which way, which routing table made the
routing decision and where to, ...

Of course we cannot just do this for every packet, because most people
would want to run this in a production environment, where we would
produce tons of uninteresting logs per second. So we need some
classification for telling the packet tracing system which packets we
want to trace.

The proposed way is to add a debug table (priority before everything
else) which marks the packet in a certain way (_not_ nfmark) and then
have a special macro called at peculiar places in the network stack. The
macro generates event messages for every debug-marked packet. The event
messages are sent to userspace (netlink/syslog/...) for further analysis.

12. failover / state replication

Lots of people are interested in highly available firewalls (failover).
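The trace-event idea from section 11 can be sketched in a few lines of
user-space C (all names hypothetical; a real implementation would emit
via netlink or syslog rather than into a buffer):

```c
#include <stdio.h>
#include <string.h>

struct pkt_sketch { int debug_mark; };  /* set by the debug table, not nfmark */

static char trace_buf[256];

/* stand-in for a macro invoked at peculiar places in the stack:
 * emit an event only for debug-marked packets, so untraced traffic
 * pays almost nothing */
static void nf_trace(const struct pkt_sketch *p,
                     const char *place, const char *verdict)
{
    if (!p->debug_mark)
        return;
    size_t used = strlen(trace_buf);
    snprintf(trace_buf + used, sizeof(trace_buf) - used,
             "%s=%s;", place, verdict);
}
```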
At the moment, we can't do failover as soon as somebody does
conntrack/NAT. There are two basic approaches:

- full state replication from one master to multiple clients, using
  ctnetlink and a userspace process which distributes state changes to
  the slave boxes
  + reliable, solid solution
  + supports NAT out of the box
  - complex implementation
  - lots of overhead on the master

- poor man's failover, where all firewalls are connected to all
  interfaces and each one does its own tracking. Only one box has
  forwarding enabled; all other ones just do state tracking and enable
  forwarding once failover occurs.
  + extremely easy to implement
  + same amount of work on all machines, no performance degradation at all
  - problems when it comes to NAT
  - what to do after one machine is rebooted (initial sync?)
  - would need hubs on all interfaces

Currently there are no implementation plans (missing sponsor), although
Harald is thinking about implementing the poor man's approach once he has
some time ;)

13. multi-packet expectation causes

We have a problem with all of our conntrack and NAT helpers as soon as an
expectation cause (e.g. a PORT command) is spread over multiple packets.
Currently we just drop the packet, hoping that before retransmission the
sending TCP stack coalesces the packets. This also explains the high
number of 'partial match' messages in syslog on some machines.

x ipv6 connection tracking?
  - do we want an l3-independent conntrack?
  - how to do ipv4 <-> ipv6 NAT?
x how to deal with fragmented expectation causes (PORT spread over 2
  packets)?
  - alter the 'partial match' message to only report from the second time
    per connection

14. Organizational stuff

- patch-o-matic scoreboard?
  more feedback from the users about what is working for them or not
- testsuite
  Rusty will update the testsuite or delete it from CVS
- milestones
  we don't want milestones
- bugtracking system
  find a volunteer who will alter and install a suitable system on
  netfilter.gnumonks.org
- new homepage
  the page looks nice, but the way it was built from templates is
  unsuitable for our needs, since it requires proprietary Windows
  software (eek!). The new homepage will appear under www.netfilter.org /
  www.iptables.org, which are round-robin DNS entries for the three (or
  even more) sites.
- examples
  put a CGI on the new homepage and make an announcement asking people to
  contribute
- cvs snapshots
  ... are a good idea. Harald needs to write a small script for creating
  them
- CVS server
  will move to a separate machine at gnumonks.org and get rsync'ed to
  samba.org for public access. This way we can give more people CVS write
  access without needing more samba.org accounts.

--
Live long and prosper
- Harald Welte / laforge@gnumonks.org          http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M-
V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*)