- Wrong integer type was specified.
- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.
- Logical OR where binary OR was expected.
- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.
- Properly assert the the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.
- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.
- Updated "EXAMPLES" section in SYSCTL manual page.
MFC after: 3 days
Sponsored by: Mellanox Technologies
otherwise bad consequences including a routing loop can occur.
Move pf_set_rt_ifp() earlier in state creation sequence and
inline it, cutting some extra code.
PR: 183997
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH
- Do not count global number of states and of src_nodes,
use uma_zone_get_cur() to obtain values.
- Struct pf_status becomes merely an ioctl API structure,
and moves to netpfil/pf/pf.h with its constants.
- V_pf_status is now of type struct pf_kstatus.
Submitted by: Kajetan Staszkiewicz <vegeta tuxpowered.net>
Sponsored by: InnoGames GmbH
These changes prevent sysctl(8) from returning proper output,
such as:
1) no output from sysctl(8)
2) erroneously returning ENOMEM with tools like truss(1)
or uname(1)
truss: can not get etype: Cannot allocate memory
there is an environment variable which shall initialize the SYSCTL
during early boot. This works for all SYSCTL types both statically and
dynamically created ones, except for the SYSCTL NODE type and SYSCTLs
which belong to VNETs. A new flag, CTLFLAG_NOFETCH, has been added to
be used in the case a tunable sysctl has a custom initialisation
function allowing the sysctl to still be marked as a tunable. The
kernel SYSCTL API is mostly the same, with a few exceptions for some
special operations like iterating childrens of a static/extern SYSCTL
node. This operation should probably be made into a factored out
common macro, hence some device drivers use this. The reason for
changing the SYSCTL API was the need for a SYSCTL parent OID pointer
and not only the SYSCTL parent OID list pointer in order to quickly
generate the sysctl path. The motivation behind this patch is to avoid
parameter loading cludges inside the OFED driver subsystem. Instead of
adding special code to the OFED driver subsystem to post-load tunables
into dynamically created sysctls, we generalize this in the kernel.
Other changes:
- Corrected a possibly incorrect sysctl name from "hw.cbb.intr_mask"
to "hw.pcic.intr_mask".
- Removed redundant TUNABLE statements throughout the kernel.
- Some minor code rewrites in connection to removing not needed
TUNABLE statements.
- Added a missing SYSCTL_DECL().
- Wrapped two very long lines.
- Avoid malloc()/free() inside sysctl string handling, in case it is
called to initialize a sysctl from a tunable, hence malloc()/free() is
not ready when sysctls from the sysctl dataset are registered.
- Bumped FreeBSD version to indicate SYSCTL API change.
MFC after: 2 weeks
Sponsored by: Mellanox Technologies
De-virtualize UMA zone pf_mtag_z and move to global initialization part.
The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.
MFC after: 1 week
The m_tag struct does not know about vnet context and the pf_mtag_free()
callback is called unaware of current vnet. This causes a panic.
Reviewed by: Nikos Vassiliadis, trociny@
- Use counter(9) for rt_pksent (former rt_rmx.rmx_pksent). This
removes another cache trashing ++ from packet forwarding path.
- Create zini/fini methods for the rtentry UMA zone. Via initialize
mutex and counter in them.
- Fix reporting of rmx_pksent to routing socket.
- Fix netstat(1) to report "Use" both in kvm(3) and sysctl(3) mode.
The change is mostly targeted for stable/10 merge. For head,
rt_pksent is expected to just disappear.
Discussed with: melifaro
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
race prone. Some just gather statistics, but some are later used in
different calculations.
A real problem was the race provoked underflow of the states_cur counter
on a rule. Once it goes below zero, it wraps to UINT32_MAX. Later this
value is used in pf_state_expires() and any state created by this rule
is immediately expired.
Thus, make fields states_cur, states_tot and src_nodes of struct
pf_rule be counter(9)s.
Thanks to Dennis for providing me shell access to problematic box and
his help with reproducing, debugging and investigating the problem.
Thanks to: Dennis Yusupoff <dyr smartspb.net>
Also reported by: dumbbell, pgj, Rambler
Sponsored by: Nginx, Inc.
- Removed pf_remove_src_node().
- Introduce pf_unlink_src_node() and pf_unlink_src_node_locked().
These function do not proceed with freeing of a node, just disconnect
it from storage.
- New function pf_free_src_nodes() works on a list of previously
disconnected nodes and frees them.
- Utilize new API in pf_purge_expired_src_nodes().
In collaboration with: Kajetan Staszkiewicz <kajetan.staszkiewicz innogames.de>
Sponsored by: InnoGames GmbH
Sponsored by: Nginx, Inc.
parts and global parts.
- Since global parts appeared to be only mutex initializations, just
abandon them and use MTX_SYSINIT() instead.
- Kill my incorrect VNET_FOREACH() iterator and instead use correct
approach with VNET_SYSINIT().
Submitted by: Nikos Vassiliadis <nvass gmx.com>
Reviewed by: trociny
Original log:
Make sure pd2 has a pointer to the icmp header in the payload; fixes
panic seen with some some icmp types in icmp error message payloads.
Obtained from: OpenBSD
Stricter state checking for ICMP and ICMPv6 packets: include the ICMP type
in one port of the state key, using the type to determine which
side should be the id, and which should be the type. Also:
- Handle ICMP6 messages which are typically sent to multicast
addresses but recieve unicast replies, by doing fallthrough lookups
against the correct multicast address. - Clear up some mistaken
assumptions in the PF code:
- Not all ICMP packets have an icmp_id, so simulate
one based on other data if we can, otherwise set it to 0.
- Don't modify the icmp id field in NAT unless it's echo
- Use the full range of possible id's when NATing icmp6 echoy
Difference with OpenBSD version:
- C99ify the new code
- WITHOUT_INET6 safe
Reviewed by: glebius
Obtained from: OpenBSD
in net, to avoid compatibility breakage for no sake.
The future plan is to split most of non-kernel parts of
pfvar.h into pf.h, and then make pfvar.h a kernel only
include breaking compatibility.
Discussed with: bz
to this event, adding if_var.h to files that do need it. Also, include
all includes that now are included due to implicit pollution via if_var.h
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
Before this change state creating sequence was:
1) lock wire key hash
2) link state's wire key
3) unlock wire key hash
4) lock stack key hash
5) link state's stack key
6) unlock stack key hash
7) lock ID hash
8) link into ID hash
9) unlock ID hash
What could happen here is that other thread finds the state via key
hash lookup after 6), locks ID hash and does some processing of the
state. When the thread creating state unblocks, it finds the state
it was inserting already non-virgin.
Now we perform proper interlocking between key hash locks and ID hash
lock:
1) lock wire & stack hashes
2) link state's keys
3) lock ID hash
4) unlock wire & stack hashes
5) link into ID hash
6) unlock ID hash
To achieve that, the following hacking was performed in pf_state_key_attach():
- Key hash mutex is marked with MTX_DUPOK.
- To avoid deadlock on 2 key hash mutexes, we lock them in order determined
by their address value.
- pf_state_key_attach() had a magic to reuse a > FIN_WAIT_2 state. It unlinked
the conflicting state synchronously. In theory this could require locking
a third key hash, which we can't do now.
Now we do not remove the state immediately, instead we leave this task to
the purge thread. To avoid conflicts in a short period before state is
purged, we push to the very end of the TAILQ.
- On success, before dropping key hash locks, pf_state_key_attach() locks
ID hash and returns.
Tested by: Ian FREISLICH <ianf clue.co.za>
- Add my copyright to files I've touched a lot this year.
- Add dash in front of all copyright notices according to style(9).
- Move $OpenBSD$ down below copyright notices.
- Remove extra line between cdefs.h and __FBSDID.