M_NOWAIT. Currently, the code allows for sleeping in the ioctl path
to guarantee allocation. However code also handles ENOMEM gracefully, so
propagate this error back to user-space, rather than sleeping while
holding the global pf mutex.
Reviewed by: glebius
Discussed with: bz
- Define schednetisr() to swi_sched.
- In the swi handler check if there is some data prepared,
and if true, then call pfsync_sendout(), however tell it
not to schedule swi again.
- Since now we don't obtain the pfsync lock in the swi handler,
don't use ifqueue mutex to synchronize queue access.
revision 1.128
date: 2009/08/16 13:01:57; author: jsg; state: Exp; lines: +1 -5
remove prototypes of a bunch of functions that had their implementations
removed in pfsync v5.
o Make the pfsync.ko actually usable. Before this change loading it
didn't register protosw, so was a nop. However, a module /boot/kernel
did confused users.
o Rewrite the way we are joining multicast group:
- Move multicast initialization/destruction to separate functions.
- Don't allocate memory if we aren't going to join a multicast group.
- Use modern API for joining/leaving multicast group.
- Now the utterly wrong pfsync_ifdetach() isn't needed.
o Move module initialization from SYSINIT(9) to moduledata_t method.
o Refuse to unload module, unless asked forcibly.
o Improve a bit some FreeBSD porting code:
- Use separate malloc type.
- Simplify swi sheduling.
This change is probably wrong from VIMAGE viewpoint, however pfsync
wasn't VIMAGE-correct before this change, too.
Glanced at by: bz
destroyed prior to pfsync_uninit(). To do this, move all the
initialization to the module_t method, instead of SYSINIT(9).
o Fix another panic after module unload, due to not clearing the
m_addr_chg_pf_p pointer.
o Refuse to unload module, unless being unloaded forcibly.
o Revert the sub argument to MODULE_DECLARE, to the stable/8 value.
This change probably isn't correct from viewpoint of VIMAGE, but
the module wasn't VIMAGE-correct before the change, as well.
Glanced at by: bz
revision 1.170
date: 2011/10/30 23:04:38; author: mikeb; state: Exp; lines: +6 -7
Allow setting big MTU values on the pfsync interface but not larger
than the syncdev MTU. Prompted by the discussion with and tested
by Maxim Bourmistrov; ok dlg, mpf
Consistently use sc_ifp->if_mtu in the MTU check throughout the
module. This backs out r228813.
value used in sys/ofed/include/linux/netdevice.h), so there will be no
buffer overruns in the rest of the inline functions in this file.
Reviewed by: kmacy
MFC after: 1 week
revision 1.122
date: 2009/05/13 01:01:34; author: dlg; state: Exp; lines: +6 -4
only keep track of the number of updates on tcp connections. state sync on
all the other protocols is simply pushing the timeouts along which has a
resolution of 1 second, so it isnt going to be hurt by pfsync taking up
to a second to send it over.
keep track of updates on tcp still though, their windows need constant
attention.
revision 1.120
date: 2009/04/04 13:09:29; author: dlg; state: Exp; lines: +5 -5
use time_uptime instead of time_second internally. time_uptime isnt
affected by adjusting the clock.
revision 1.175
date: 2011/11/25 12:52:10; author: dlg; state: Exp; lines: +3 -3
use time_uptime to set state creation values as time_second can be
skewed at runtime by things like date(1) and ntpd. time_uptime is
monotonic and therefore more useful to compare against.
revision 1.118
date: 2009/03/23 06:19:59; author: dlg; state: Exp; lines: +8 -6
wait an appropriate amount of time before giving up on a bulk update,
rather than giving up after a hardcoded 5 seconds (which is generally much
too short an interval for a bulk update).
pointed out by david@, eyeballed by mcbride@
revision 1.171
date: 2011/10/31 22:02:52; author: mikeb; state: Exp; lines: +2 -1
Don't forget to cancel bulk update failure timeout when destroying an
interface. Problem report and fix from Erik Lax, thanks!
Start a brief note of revisions merged from OpenBSD.
7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.
However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:
- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
conditions, for now these are:
- interface goes down
- carp(4) has problems with ip_output() or ip6_output()
- pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
is actual value added to advskew. The adjustment values for
particular error conditions are also configurable, and their
defaults are maximum advskew value, so a single failure bumps
demotion to maximum. This is for POLA compatibility, and should
satisfy most users.
- Demotion factor is a writable sysctl, so user can do
foot shooting, if he desires to.
of scheduling next run pfsync_bulk_update(), pfsync_bulk_fail()
was scheduled.
This lead to instant 100% state leak after first bulk update
request.
- After above fix, it appeared that pfsync_bulk_update() lacks
locking. To fix this, sc_bulk_tmo callout was converted to an
mtx one. Eventually, all pf/pfsync callouts should be converted
to mtx version, since it isn't possible to stop or drain a
non-mtx callout without risk of race.
- Add comment that callout_stop() in pfsync_clone_destroy() lacks
locking. Since pfsync0 can't be destroyed (yet), let it be here.
The root of problem is re-locking at the end of pfsync_sendout().
Several functions are calling pfsync_sendout() holding pointers
to pf data on stack, and these functions expect this data to be
consistent.
To fix this, the following approach was taken:
- The pfsync_sendout() doesn't call ip_output() directly, but
enqueues the mbuf on sc->sc_ifp's interfaces queue, that
is currently unused. Then pfsync netisr is scheduled. PF_LOCK
isn't dropped in pfsync_sendout().
- The netisr runs through queue and ip_output()s packets
on it.
Apart from fixing race, this also decouples stack, fixing
potential issues, that may happen, when sending pfsync(4)
packets on input path.
Reviewed by: eri (a quick review)
number of packets can be queued on sc, while we are in ip_output(), and then
we wipe the accumulated sc_len. On next pfsync_sendout() that would lead to
writing beyond our mbuf cluster.
to document where we are expecting to be called with a lock held to
more easily catch unnoticed code paths.
This does not neccessarily improve locking in pfsync, it just tries
to avoid the panics reported.
PR: kern/159390, kern/158873
Submitted by: pluknet (at least something that partly resembles
my patch ignoring other cleanup, which I only saw
too late on the 2nd PR)
MFC After: 3 days
and virtualization it is not helpful but complicates things.
Current state of art is to not virtualize these kinds of locks -
inp_group/hash/info/.. are all not virtualized either.
MFC after: 3 days