Summary of the netfilter developer workshop, 26/27 November, 2001
======================================================================

Participants:

Rusty Russell <rusty@rustcorp.com.au>
Marc Boucher <mb@mbsi.ca>
Harald Welte <laforge@gnumonks.org>
Andras Kis-Szabo <kisza@sch.bme.hu>
Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
Balazs Scheidler <bazsi@balabit.hu>
Fabrice Marie <fabrice@celestix.com>
Jay Schulist <jschlst@samba.org>
Lennert Buytenhek <buytenh@gnu.org>
Michael Bellion <bellion@gmx.de>
Thomas Heinz <josef.k@mytomorrow.de>
Tommi Virtanen <tv@debian.org>
Gianni Tedesco <gianni@ecsc.co.uk>

1. Conntrack timeouts (ACK/FIN issue)

Rusty has a proposed patch which increases the CLOSE_WAIT timeout from 60
to 120 seconds. Jozsef has done some testing with the timeout increased to
5 days. Neither of the two solutions sounds good: 120 seconds is most
likely not long enough, and 5 days is too long (somebody reboots his box,
the connection is never closed, and the conntrack entries stay around for
5 days).

To find an appropriate value, we should gather some empirical statistics.
Jozsef is going to prepare a patch against the current stable kernel which
produces this output. The patch increases the timeout to some big value
(some hours or one day) and prints the remaining timeout at the time the
connection is closed in the other direction. By putting this patch into a
couple of setups, we should get statistical data about how long most
half-closed connections remain in this state before being closed.

2. TCP window tracking

The core team likes the code, and it should have gotten into the
mainstream kernel a long time ago. We definitely want to put it into the
kernel, but first want to make sure that we don't have any false positives
(out-of-window packets, ...). We need more debugging on when the
out-of-window packets appear. To debug this, people need to fully tcpdump
their traffic and then get an out-of-window log with exact timestamps (to
verify against the tcpdump logs).
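As a rough illustration of what such a window check decides, here is a
much-simplified user-space sketch (not the actual tracking code: the real
patch tracks both directions, window scaling, and acceptable ACK ranges,
and the function name here is hypothetical):

```c
#include <stdbool.h>
#include <stdint.h>

/* Much-simplified sketch of an in-window test: a non-empty segment
 * [seq, seq+len) is acceptable if it overlaps the receive window
 * [rcv_nxt, rcv_nxt + rcv_wnd).  Serial (wrap-safe) arithmetic is
 * used, since TCP sequence numbers wrap around. */
static bool seg_in_window(uint32_t seq, uint32_t len,
                          uint32_t rcv_nxt, uint32_t rcv_wnd)
{
    bool end_after_window_start =
        (int32_t)(seq + len - rcv_nxt) > 0;
    bool start_before_window_end =
        (int32_t)(seq - (rcv_nxt + rcv_wnd)) < 0;
    return end_after_window_start && start_before_window_end;
}
```

A packet failing this kind of test would be the one to correlate, by
timestamp, against the full tcpdump capture.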
The patch will contain a debug mode and a normal mode. In normal mode
there are no sysctl values; in debug mode sysctls for all timeouts are
available.

3. newnat

The current newnat code (as in patch-o-matic) will get into the stable
kernel soon. All new conntrack/NAT helpers should be developed against the
new API.

In order to make life easier for protocols with dynamic port numbers
(h323, ...), we will remove the re-set of expectfn in alter_reply.

Additionally, the #ifdef CONFIG_IP_NF_FTP / #endif style construct in
ip_conntrack.h is going to be removed for all helpers. This means struct
ip_conntrack will have the same size regardless of which protocol helpers
are compiled in.

4. bridging firewall

Lennert has given a nice overview of his current work with regard to
bridging firewalls. We made a decision on how to move ahead:

- Lennert submits support for bridging packet filtering to the netfilter
  core team, which will put this patch in patch-o-matic and, after some
  testing, submit it to the main kernel. This patch is not allowed to
  create new members of struct sk_buff, for obvious reasons.

- Bridge support for NAT (which needs new sk_buff members) will stay
  outside of the 2.4.x kernels but exist as an incremental patch inside
  netfilter patch-o-matic.

5. Conntrack exemptions

In order to do exemptions to connection tracking, we would need to have a
table attached to the PRE_ROUTING hook with a priority before conntrack
gets executed. The table's name is 'notrack'. The table would set up the
nfct field of the skb to point to some dummy conntrack entry. The
connection tracking core would then check against this dummy entry before
setting nfct.

The state match is going to be extended with a --state UNTRACKED
extension, which checks nfct against the special dummy conntrack entry.

This feature will only get into patch-o-matic and 2.5.x.

6. Five hooked mangle table

We put the five hooked mangle table in the pending section of
patch-o-matic.
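In essence, the notrack scheme from point 5 is just a pointer comparison
against one shared dummy entry. A minimal user-space sketch of the idea
(all structure and function names here are hypothetical, not the kernel's):

```c
#include <stdbool.h>
#include <stddef.h>

struct dummy_conn { int unused; };

/* the one special dummy conntrack entry */
static struct dummy_conn untracked;

struct skb_sketch { struct dummy_conn *nfct; };

/* the 'notrack' table, running on PRE_ROUTING before conntrack,
 * points nfct at the dummy entry */
static void notrack_target(struct skb_sketch *skb)
{
    skb->nfct = &untracked;
}

/* the conntrack core only attaches a real entry if nfct is unset */
static void conntrack_in(struct skb_sketch *skb, struct dummy_conn *ct)
{
    if (skb->nfct == NULL)
        skb->nfct = ct;
}

/* --state UNTRACKED: compare against the special dummy entry */
static bool match_untracked(const struct skb_sketch *skb)
{
    return skb->nfct == &untracked;
}
```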
The mangle priority will be the same as currently. After checking that it
does not break old iptables binaries, we will submit it to the mainstream
kernel.

x removed unknown table (did we really fix the dropped-table problems?)

7. future of IP tables

Currently an IP table is a contiguous chunk of memory containing all the
rules, with lots of relative pointers inside. This has the advantage of
atomically replacing the whole table, but at a high cost: dynamic rule
changes are extremely expensive, and the kernel has to do lots of
checking. We consider this a mistake and want to change the structure for
2.5.x back to linked lists of rules. A table is thus a linked list of
chains, which are themselves linked lists of rules.

8. nfnetlink / ctnetlink / iptnetlink

Jay gave an overview of his work, which is greatly appreciated by the core
team. As for the future:

- iptnetlink will be rewritten to conform with the new 2.5.x internal
  linked-list data structure of iptables (see 7.)
- ctnetlink is aimed at later 2.4.x inclusion
- ulog and ip_queue will be ported to nfnetlink for 2.5.x

9. userspace tool iptables

There are a couple of problems with the current approach of iptables:

- the plugins are bound to command-line parsing (getopt, ...) and thus not
  flexible enough for alternative interfaces (GUIs, firewall
  languages, ...)
- libiptc is extremely low-level and of no use if you don't know about the
  plugins and their data structures
- the macro-based approach for the iptables / ip6tables shared code is not
  the best idea

Results:

- libiptc is going to disappear, since in 2.5.x all rule changes (of the
  new linked-list internal data structures) will be made using iptnetlink.
- the plugin handling and big parts of iptables will move into a new
  library called libiptables. This library can be used to query the
  available matches and targets as well as their parameters, valid
  values, help texts, ...
- multiple frontends (iptables style, ip/tc style, ...)
  will run on top of this high-level library. The iptables style will be
  supported by the netfilter project, other ones by third parties.

10. Transparent proxying

The transparent proxying shortcomings in the 2.4.x kernel are obvious.
Balazs and Rusty discussed possible implementation details and will
hopefully one day enlighten us about what exactly has been planned ;)

Another issue coming up in the discussion was a two-phase accept, with
which a transparent proxy could become even more transparent to the
application, especially in the case where the real server returns a
connection refused: currently this is not propagated to the client, but
results in a connection reset after the connection to the proxy has
already been established. Marc and Lennert are especially interested in
this, and they will take care of further discussion and implementation.

11. Debugging aids

Debugging complex rulesets within the different tables, policy routing,
etc. is extremely difficult. Ideally we would have some packet tracing
functionality, where a user could get a detailed log of what happened to
his packet: when the packet traversed which chain, which decision was
made, where it was altered in which way, which routing table made the
routing decision and where to, ...

Of course we cannot just do this for every packet, because most people
would want to run this in a production environment, where we would
produce tons of uninteresting logs per second. So we need some
classification for telling the packet tracing system which packets we
want to trace.

The proposed way is to add a debug table (priority before everything
else) which marks the packet in a certain way (_not_ nfmark) and then
have a special macro called at peculiar places in the network stack. The
macro generates event messages for every debug-marked packet. The event
messages are sent to userspace (netlink/syslog/...) for further analysis.

12. failover / state replication

Lots of people are interested in highly available firewalls (failover).
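The trace-event idea from section 11 can be sketched in a few lines of
user-space C (all names hypothetical; a real implementation would emit
via netlink or syslog rather than into a buffer):

```c
#include <stdio.h>
#include <string.h>

struct pkt_sketch { int debug_mark; };  /* set by the debug table, not nfmark */

static char trace_buf[256];

/* stand-in for a macro invoked at peculiar places in the stack:
 * emit an event only for debug-marked packets, so untraced traffic
 * pays almost nothing */
static void nf_trace(const struct pkt_sketch *p,
                     const char *place, const char *verdict)
{
    if (!p->debug_mark)
        return;
    size_t used = strlen(trace_buf);
    snprintf(trace_buf + used, sizeof(trace_buf) - used,
             "%s=%s;", place, verdict);
}
```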
At the moment, we can't do failover as soon as somebody does
conntrack/NAT. There are two basic approaches:

- full state replication from one master to multiple clients, using
  ctnetlink and a userspace process which distributes state changes to
  the slave boxes
  + reliable, solid solution
  + supports NAT out of the box
  - complex implementation
  - lots of overhead on the master

- poor man's failover, where all firewalls are connected to all
  interfaces and each one does its own tracking. Only one box has
  forwarding enabled; all other ones just do state tracking and enable
  forwarding once failover occurs.
  + extremely easy to implement
  + same amount of work on all machines, no performance degradation at all
  - problems when it comes to NAT
  - what to do after one machine is rebooted (initial sync?)
  - would need hubs on all interfaces

Currently there are no implementation plans (missing sponsor), although
Harald is thinking about implementing the poor man's approach once he has
some time ;)

13. multi-packet expectation causes

We have a problem with all of our conntrack and NAT helpers as soon as an
expectation cause (e.g. a PORT command) is spread over multiple packets.
Currently we just drop the packet, hoping that before retransmission the
sending TCP stack coalesces the packets. This also explains the high
number of 'partial match' messages in syslog on some machines.

x ipv6 connection tracking?
  - do we want an l3-independent conntrack?
  - how to do ipv4 <-> ipv6 NAT?
x how to deal with fragmented expectation causes (PORT spread over 2
  packets)?
  - alter the 'partial match' message to only report from the second time
    per connection

14. Organizational stuff

- patch-o-matic scoreboard?
  more feedback from the users about what is working for them or not
- testsuite
  Rusty will update the testsuite or delete it from CVS
- milestones
  we don't want milestones
- bugtracking system
  find a volunteer who will alter and install a suitable system on
  netfilter.gnumonks.org
- new homepage
  the page looks nice, but the way it was built from templates is
  unsuitable for our needs, since it requires proprietary Windows
  software (eek!). The new homepage will appear under www.netfilter.org /
  www.iptables.org, which are round-robin DNS entries for the three (or
  even more) sites.
- examples
  put a CGI on the new homepage and make an announcement asking people to
  contribute
- cvs snapshots
  ... are a good idea. Harald needs to write a small script for creating
  them
- CVS server
  will move to a separate machine at gnumonks.org and get rsync'ed to
  samba.org for public access. This way we can give more people CVS write
  access without needing more samba.org accounts.

--
Live long and prosper
- Harald Welte / laforge@gnumonks.org          http://www.gnumonks.org/
============================================================================
GCS/E/IT d- s-: a-- C+++ UL++++$ P+++ L++++$ E--- W- N++ o? K- w--- O- M-
V-- PS+ PE-- Y+ PGP++ t++ 5-- !X !R tv-- b+++ DI? !D G+ e* h+ r% y+(*)