190 Commits

Author SHA1 Message Date
kevlo
7120d23af8 Add missing PF_UNLOCK in pf_test
Reviewed by:	bz
2011-10-30 14:55:00 +00:00
glebius
198f18300e Utilize new IF_DEQUEUE_ALL(ifq, m) macro in pfsyncintr() to reduce
contention on ifqueue lock.
2011-10-27 09:47:00 +00:00
glebius
846e388f82 Merge several fixes to bulk update processing from OpenBSD. Merged
revisions: 1.148, 1.149, 1.150. This makes number of states on
master/slave to be of a sane value.
2011-10-23 15:15:17 +00:00
glebius
df3649a331 Fix indentation, no code changed. 2011-10-23 15:10:15 +00:00
glebius
7df2cbe8bc - Fix a bad typo (FreeBSD specific) in pfsync_bulk_update(). Instead
of scheduling next run pfsync_bulk_update(), pfsync_bulk_fail()
  was scheduled.
  This lead to instant 100% state leak after first bulk update
  request.
- After above fix, it appeared that pfsync_bulk_update() lacks
  locking. To fix this, sc_bulk_tmo callout was converted to an
  mtx one. Eventually, all pf/pfsync callouts should be converted
  to mtx version, since it isn't possible to stop or drain a
  non-mtx callout without risk of race.
- Add comment that callout_stop() in pfsync_clone_destroy() lacks
  locking. Since pfsync0 can't be destroyed (yet), let it be here.
2011-10-23 15:08:18 +00:00
glebius
aaf1597b1d Fix from r226623 is not sufficient to close all races in pfsync(4).
The root of problem is re-locking at the end of pfsync_sendout().
Several functions are calling pfsync_sendout() holding pointers
to pf data on stack, and these functions expect this data to be
consistent.

To fix this, the following approach was taken:

- The pfsync_sendout() doesn't call ip_output() directly, but
  enqueues the mbuf on sc->sc_ifp's interfaces queue, that
  is currently unused. Then pfsync netisr is scheduled. PF_LOCK
  isn't dropped in pfsync_sendout().
- The netisr runs through queue and ip_output()s packets
  on it.

Apart from fixing race, this also decouples stack, fixing
potential issues, that may happen, when sending pfsync(4)
packets on input path.

Reviewed by:	eri (a quick review)
2011-10-23 14:59:54 +00:00
glebius
a1471140d1 Absense of M_WAITOK in malloc flags for UMA doesn't
equals presense of M_NOWAIT. Specify M_NOWAIT explicitly.

This fixes sleeping with PF_LOCK().
2011-10-23 10:13:20 +00:00
glebius
0b986dc6d5 Correct flag for uma_zalloc() is M_WAITOK. M_WAIT is an old and
deprecated flag from historical mbuf(9) allocator.

This is style only change.
2011-10-23 10:05:25 +00:00
glebius
904eece235 Fix a race: we should update sc_len before dropping the pf lock, otherwise a
number of packets can be queued on sc, while we are in ip_output(), and then
we wipe the accumulated sc_len. On next pfsync_sendout() that would lead to
writing beyond our mbuf cluster.
2011-10-21 22:28:15 +00:00
glebius
0391c17cc2 In FreeBSD ip_output() expects ip_len and ip_off in host byte order
PR:		kern/159029
2011-10-21 11:11:18 +00:00
bz
5203b21813 Fix recursive pf locking leading to panics. Splatter PF_LOCK_ASSERT()s
to document where we are expecting to be called with a lock held to
more easily catch unnoticed code paths.
This does not neccessarily improve locking in pfsync, it just tries
to avoid the panics reported.

PR:		kern/159390, kern/158873
Submitted by:	pluknet (at least something that partly resembles
		my patch ignoring other cleanup, which I only saw
		too late on the 2nd PR)
MFC After:	3 days
2011-10-19 13:13:56 +00:00
bz
5f647b5434 De-virtualize the pf_task_mtx lock. At the current state of pf locking
and virtualization it is not helpful but complicates things.

Current state of art is to not virtualize these kinds of locks -
inp_group/hash/info/.. are all not virtualized either.

MFC after:	3 days
2011-10-19 11:04:49 +00:00
bz
dd787aecc7 Adjust the PF_ASSERT() macro to what we usually use in the network stack:
PF_LOCK_ASSERT() and PF_UNLOCK_ASSERT().

MFC after:	3 days
2011-10-19 10:16:42 +00:00
bz
5be0c466b9 In the non-FreeBSD case we do not expect PF_LOCK and friends to do anything.
MFC after:	3 days
2011-10-19 10:08:58 +00:00
bz
c9c086db96 Pseudo interfaces should go at SI_SUB_PSEUDO. However at least
pfsync also depends on pf to be initialized already so pf goes at
FIRST and the interfaces go at ANY.
Then the (VNET_)SYSINIT startups for pf stays at SI_SUB_PROTO_BEGIN
and for pfsync we move to the later SI_SUB_PROTO_IF.

This is not ideal either but at least an order that should work for
the moment and can be re-fined with the VIMAGE merge, once this will
actually work with more than one network stack.

MFC after:	3 days
2011-10-19 10:04:24 +00:00
bz
677cad9768 Fix an obvious locking bug where we would lock again rather than unlock.
MFC after:	3 days
2011-10-19 09:34:40 +00:00
bz
21d5547b98 Fix a bug when NPFSYNC > 0 that on FreeBSD we would always return
and never remove state.

This fixes the problem some people are seeing that state is removed when pf
is loaded as a module but not in situations when compiled into the kernel.

Reported by:	many on freebsd-pf
Tested by:	flo
MFC after:	3 days
2011-10-19 08:57:17 +00:00
bz
d8751c7f52 Fix indentation in a loop and a tiny maze of #ifdefs for just the
__FreeBSD__ parts that had it wrong.

MFC after:	3 days
2011-10-19 08:37:48 +00:00
bz
e414ca1893 Use the correct byte order for the ip_divert(4) mbuf tag port meta
information in pf(4).

Submitted by:	Yaocl (chunlinyao gmail.com), forum post 145106
Approved by:	re (kib)
2011-08-25 09:38:33 +00:00
pluknet
28d7c80803 Fix build failure without BPF.
Reported by:	deeptech71 at gmail dot com
Approved by:	re (kib)
2011-08-17 13:02:50 +00:00
bz
e15f804c7b Update packet filter (pf) code to OpenBSD 4.5.
You need to update userland (world and ports) tools
to be in sync with the kernel.

Submitted by:	mlaier
Submitted by:	eri
2011-06-28 11:57:25 +00:00
rwatson
e9eb5d3b9c Add _mbuf() variants of various inpcb-related interfaces, including lookup,
hash install, etc.  For now, these are arguments are unused, but as we add
RSS support, we will want to use hashes extracted from mbufs, rather than
manually calculated hashes of header fields, due to the expensive of the
software version of Toeplitz (and similar hashes).

Add notes that it would be nice to be able to pass mbufs into lookup
routines in pf(4), optimising firewall lookup in the same way, but the
code structure there doesn't facilitate that currently.

(In principle there is no reason this couldn't be MFCed -- the change
extends rather than modifies the KBI.  However, it won't be useful without
other previous possibly less MFCable changes.)

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-06-04 16:33:06 +00:00
bz
90dbe667c5 Remove some further INET related symbols from pf to allow the module
to not only compile bu load as well for testing with IPv6-only kernels.
For the moment we ignore the csum change in pf_ioctl.c given the
pending update to pf45.

Reported by:	dru
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	20 days
2011-05-31 15:05:29 +00:00
rwatson
fdfdadb612 Decompose the current single inpcbinfo lock into two locks:
- The existing ipi_lock continues to protect the global inpcb list and
  inpcb counter.  This lock is now relegated to a small number of
  allocation and free operations, and occasional operations that walk
  all connections (including, awkwardly, certain UDP multicast receive
  operations -- something to revisit).

- A new ipi_hash_lock protects the two inpcbinfo hash tables for
  looking up connections and bound sockets, manipulated using new
  INP_HASH_*() macros.  This lock, combined with inpcb locks, protects
  the 4-tuple address space.

Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required.  As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.

A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb.  Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed.  In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup.  New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:

  INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
  INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb

Callers must pass exactly one of these flags (for the time being).

Some notes:

- All protocols are updated to work within the new regime; especially,
  TCP, UDPv4, and UDPv6.  pcbinfo ipi_lock acquisitions are largely
  eliminated, and global hash lock hold times are dramatically reduced
  compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
  may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
  is no longer available -- hash lookup locks are now held only very
  briefly during inpcb lookup, rather than for potentially extended
  periods.  However, the pcbinfo ipi_lock will still be acquired if a
  connection state might change such that a connection is added or
  removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
  due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
  callers to acquire hash locks and perform one or more lookups atomically
  with 4-tuple allocation: this is required only for TCPv6, as there is no
  in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
  locking, which relates to source address selection.  This needs
  attention, as it likely significantly reduces parallelism in this code
  for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
  somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
  is no longer sufficient.  A second check once the inpcb lock is held
  should do the trick, keeping the general case from requiring the inpcb
  lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
  which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
  undesirable, and probably another argument is required to take care of
  this (or a char array name field in the pcbinfo?).

This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics.  It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.

Reviewed by:    bz
Sponsored by:   Juniper Networks, Inc.
2011-05-30 09:43:55 +00:00
bz
26a10f2195 Make pf compile without INET support by adding #ifdef INETs and
correcting few #includes.

Reviewed by:	gnn
Sponsored by:	The FreeBSD Foundation
Sponsored by:	iXsystems
MFC after:	4 days
2011-04-27 19:34:01 +00:00
csjp
4f835d8864 Correct bogus initialization. It should be noted that this change
has been corrected in the vendor branch, but for now, silence clang
warnings.

Found by:	clang
Discussed with:	mlaier
MFC after:	1 week
2011-01-14 04:24:53 +00:00
rpaulo
c652fa1474 Ignore the return value of ADDCARRY(). 2010-10-13 17:16:08 +00:00
bz
b27a644a18 When using pf routing options, properly handle IP fragmentation
for interfaces with TSO enabled, otherwise one would see an extra
ICMP unreach, frag needed pre matching packet on lo0.
This syncs pf code to ip_output.c r162084.

PR:		kern/144311
Submitted by:	yongari via mlaier
Reviewed by:	eri
Tested by:	kib
MFC after:	8 days
2010-09-10 00:00:06 +00:00
delphij
40c18ac3ff Adapt OpenBSD pf's "sloopy" TCP state machine which is useful for Direct
Server Return mode, where not all packets would be visible to the load
balancer or gateway.

This commit should be reverted when we merge future pf versions.  The
benefit it would provide is that this version does not break any existing
public interface and thus won't be a problem if we want to MFC it to
earlier FreeBSD releases.

Discussed with:	mlaier
Obtained from:	OpenBSD
Sponsored by:	iXsystems, Inc.
MFC after:	1 month
2009-12-24 00:43:44 +00:00
mlaier
1142208531 Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.

Submitted by:	pjd
Reviewed by:	{krw,sthen,markus}@openbsd.org
MFC after:	3 days
2009-08-25 19:30:32 +00:00
mlaier
abdec3b9b7 If we cannot immediately get the pf_consistency_lock in the purge thread,
restart the scan after acquiring the lock the hard way.  Otherwise we might
end up with a dead reference.

Reported by:		pfsense
Reviewed by:		eri
Initial patch by:	eri
Tested by:		pfsense
Approved by:		re (kib)
2009-08-19 00:10:10 +00:00
rwatson
5c6699ad3d Many network stack subsystems use a single global data structure to hold
all pertinent statatistics for the subsystem.  These structures are
sometimes "borrowed" by kernel modules that require a place to store
statistics for similar events.

Add KPI accessor functions for statistics structures referenced by kernel
modules so that they no longer encode certain specifics of how the data
structures are named and stored.  This change is intended to make it
easier to move to per-CPU network stats following 8.0-RELEASE.

The following modules are affected by this change:

      if_bridge
      if_cxgb
      if_gif
      ip_mroute
      ipdivert
      pf

In practice, most of these statistics consumers should, in fact, maintain
their own statistics data structures rather than borrowing structures
from the base network stack.  However, that change is too agressive for
this point in the release cycle.

Reviewed by:	bz
Approved by:	re (kib)
2009-08-02 19:43:32 +00:00
rwatson
fb9ffed650 Merge the remainder of kern_vimage.c and vimage.h into vnet.c and
vnet.h, we now use jails (rather than vimages) as the abstraction
for virtualization management, and what remained was specific to
virtual network stacks.  Minor cleanups are done in the process,
and comments updated to reflect these changes.

Reviewed by:	bz
Approved by:	re (vimage blanket)
2009-08-01 19:26:27 +00:00
rwatson
57ca4583e7 Build on Jeff Roberson's linker-set based dynamic per-CPU allocator
(DPCPU), as suggested by Peter Wemm, and implement a new per-virtual
network stack memory allocator.  Modify vnet to use the allocator
instead of monolithic global container structures (vinet, ...).  This
change solves many binary compatibility problems associated with
VIMAGE, and restores ELF symbols for virtualized global variables.

Each virtualized global variable exists as a "reference copy", and also
once per virtual network stack.  Virtualized global variables are
tagged at compile-time, placing the in a special linker set, which is
loaded into a contiguous region of kernel memory.  Virtualized global
variables in the base kernel are linked as normal, but those in modules
are copied and relocated to a reserved portion of the kernel's vnet
region with the help of a the kernel linker.

Virtualized global variables exist in per-vnet memory set up when the
network stack instance is created, and are initialized statically from
the reference copy.  Run-time access occurs via an accessor macro, which
converts from the current vnet and requested symbol to a per-vnet
address.  When "options VIMAGE" is not compiled into the kernel, normal
global ELF symbols will be used instead and indirection is avoided.

This change restores static initialization for network stack global
variables, restores support for non-global symbols and types, eliminates
the need for many subsystem constructors, eliminates large per-subsystem
structures that caused many binary compatibility issues both for
monitoring applications (netstat) and kernel modules, removes the
per-function INIT_VNET_*() macros throughout the stack, eliminates the
need for vnet_symmap ksym(2) munging, and eliminates duplicate
definitions of virtualized globals under VIMAGE_GLOBALS.

Bump __FreeBSD_version and update UPDATING.

Portions submitted by:  bz
Reviewed by:            bz, zec
Discussed with:         gnn, jamie, jeff, jhb, julian, sam
Suggested by:           peter
Approved by:            re (kensmith)
2009-07-14 22:48:30 +00:00
bz
0808d0b1a6 After cleaning up rt_tables from vnet.h and cleaning up opt_route.h
a lot of files no longer need route.h either. Garbage collect them.
While here remove now unneeded vnet.h #includes as well.
2009-06-23 17:03:45 +00:00
bz
5ca915a0f9 All consumers of in_cksum.h have been properly #ifdefed already,
so do not include the file either as it would give as an extra
dependency on INET.
2009-06-10 11:19:34 +00:00
bz
b7ff2bdc20 After r193232 rt_tables in vnet.h are no longer indirectly dependent on
the ROUTETABLES kernel option thus there is no need to include opt_route.h
anymore in all consumers of vnet.h and no longer depend on it for module
builds.

Remove the hidden include in flowtable.h as well and leave the two
explicit #includes in ip_input.c and ip_output.c.
2009-06-08 19:57:35 +00:00
rwatson
f4934662e5 Move "options MAC" from opt_mac.h to opt_global.h, as it's now in GENERIC
and used in a large number of files, but also because an increasing number
of incorrect uses of MAC calls were sneaking in due to copy-and-paste of
MAC-aware code without the associated opt_mac.h include.

Discussed with:	pjd
2009-06-05 14:55:22 +00:00
zec
904747c9f9 V_loif is not an array but a pure pointer, so treat it as such.
Reviewed by:	bz
Approved by:	julian (mentor)
2009-06-01 21:29:54 +00:00
kmacy
24b38efdce Change if_output to take a struct route as its fourth argument in order
to allow passing a cached struct llentry * down to L2

Reviewed by:	rwatson
2009-04-16 20:30:28 +00:00
rwatson
4801e9aee9 Update stats in struct icmpstat and icmp6stat using four new
macros: ICMPSTAT_ADD(), ICMPSTAT_INC(), ICMP6STAT_ADD(), and
ICMP6STAT_INC(), rather than directly manipulating the fields
of these structures across the kernel.  This will make it
easier to change the implementation of these statistics,
such as using per-CPU versions of the data structures.

In on case, icmp6stat members are manipulated indirectly, by
icmp6_errcount(), and this will require further work to fix
for per-CPU stats.

MFC after:	3 days
2009-04-12 13:22:33 +00:00
rwatson
5dc0256f14 Update stats in struct udpstat using two new macros, UDPSTAT_ADD()
and UDPSTAT_INC(), rather than directly manipulating the fields
across the kernel.  This will make it easier to change the
implementation of these statistics, such as using per-CPU versions
of the data structures.

MFC after:	3 days
2009-04-12 11:42:40 +00:00
rwatson
692f8aa2fa Update stats in struct ipstat using four new macros, IPSTAT_ADD(),
IPSTAT_INC(), IPSTAT_SUB(), and IPSTAT_DEC(), rather than directly
manipulating the fields across the kernel.  This will make it easier
to change the implementation of these statistics, such as using
per-CPU versions of the data structures.

MFC after:	3 days
2009-04-11 23:35:20 +00:00
rwatson
b79ff9a30d Update stats in struct tcpstat using two new macros, TCPSTAT_ADD() and
TCPSTAT_INC(), rather than directly manipulating the fields across the
kernel.  This will make it easier to change the implementation of
these statistics, such as using per-CPU versions of the data structures.

MFC after:	3 days
2009-04-11 22:07:19 +00:00
bz
df2be82cec For all files including net/vnet.h directly include opt_route.h and
net/route.h.

Remove the hidden include of opt_route.h and net/route.h from net/vnet.h.

We need to make sure that both opt_route.h and net/route.h are included
before net/vnet.h because of the way MRT figures out the number of FIBs
from the kernel option. If we do not, we end up with the default number
of 1 when including net/vnet.h and array sizes are wrong.

This does not change the list of files which depend on opt_route.h
but we can identify them now more easily.
2009-02-27 14:12:05 +00:00
ed
ba4371ba78 Revert my previous two changes.
Even though the code seems to be FreeBSD kernel code, it isn't compiled
on FreeBSD. I could have known this, because I was a little amazed that
I couldn't find a prototype of pfopen()/pfclose() somewhere else,
because it isn't marked as static.

Apart from that, removing these functions wouldn't have been harmful
anyway, because there are some other strange things about them (the
implementation isn't consistent with the prototype at the top). Still,
it's better to leave it, because it makes merging code back to older
branches a little harder.

Requested by:	mlaier
2009-01-25 16:52:41 +00:00
ed
6b2f4a1972 Remove pfopen() and pfclose() entirely.
It turns out I was patching functions that weren't used by pf(4) anyway.
They still seem to use `struct proc *' instead of `struct thread *'.
They weren't listed in pf_cdevsw.
2009-01-25 14:39:15 +00:00
ed
4a0deaef86 Remove unneeded checking for invalid minor numbers from pf(4).
Because it is not possible to access the pf(4) character device through
any other device node as the one in devfs, there is no need to check for
unknown device minor numbers.

Approved by:	mlaier
2009-01-25 14:00:00 +00:00
qingli
ec826ad5c7 This main goals of this project are:
1. separating L2 tables (ARP, NDP) from the L3 routing tables
2. removing as much locking dependencies among these layers as
   possible to allow for some parallelism in the search operations
3. simplify the logic in the routing code,

The most notable end result is the obsolescent of the route
cloning (RTF_CLONING) concept, which translated into code reduction
in both IPv4 ARP and IPv6 NDP related modules, and size reduction in
struct rtentry{}. The change in design obsoletes the semantics of
RTF_CLONING, RTF_WASCLONE and RTF_LLINFO routing flags. The userland
applications such as "arp" and "ndp" have been modified to reflect
those changes. The output from "netstat -r" shows only the routing
entries.

Quite a few developers have contributed to this project in the
past: Glebius Smirnoff, Luigi Rizzo, Alessandro Cerri, and
Andre Oppermann. And most recently:

- Kip Macy revised the locking code completely, thus completing
  the last piece of the puzzle, Kip has also been conducting
  active functional testing
- Sam Leffler has helped me improving/refactoring the code, and
  provided valuable reviews
- Julian Elischer setup the perforce tree for me and has helped
  me maintaining that branch before the svn conversion
2008-12-15 06:10:57 +00:00
bz
74c06b7ee1 Like for tcp_subr.c in r186057 make the MD5 context a function local
variable in this copy of the code[1].

While here prefix the variables with 'pf_' to avoid file static global
variables with colliding names that are or will be virtualized.

Discussed with:	rwatson, silby [1]
2008-12-13 22:04:52 +00:00