7.x, 8.x and 9.x with pf(4) imports: pfsync(4) should suppress CARP
preemption, while it is running its bulk update.
However, reimplement the feature in more elegant manner, that is
partially inspired by newer OpenBSD:
- Rename term "suppression" to "demotion", to match with OpenBSD.
- Keep a global demotion factor, that can be raised by several
conditions, for now these are:
- interface goes down
- carp(4) has problems with ip_output() or ip6_output()
- pfsync performs bulk update
- Unlike in OpenBSD the demotion factor isn't a counter, but
is actual value added to advskew. The adjustment values for
particular error conditions are also configurable, and their
defaults are maximum advskew value, so a single failure bumps
demotion to maximum. This is for POLA compatibility, and should
satisfy most users.
- Demotion factor is a writable sysctl, so user can do
foot shooting, if he desires to.
of scheduling next run pfsync_bulk_update(), pfsync_bulk_fail()
was scheduled.
This lead to instant 100% state leak after first bulk update
request.
- After above fix, it appeared that pfsync_bulk_update() lacks
locking. To fix this, sc_bulk_tmo callout was converted to an
mtx one. Eventually, all pf/pfsync callouts should be converted
to mtx version, since it isn't possible to stop or drain a
non-mtx callout without risk of race.
- Add comment that callout_stop() in pfsync_clone_destroy() lacks
locking. Since pfsync0 can't be destroyed (yet), let it be here.
The root of problem is re-locking at the end of pfsync_sendout().
Several functions are calling pfsync_sendout() holding pointers
to pf data on stack, and these functions expect this data to be
consistent.
To fix this, the following approach was taken:
- The pfsync_sendout() doesn't call ip_output() directly, but
enqueues the mbuf on sc->sc_ifp's interfaces queue, that
is currently unused. Then pfsync netisr is scheduled. PF_LOCK
isn't dropped in pfsync_sendout().
- The netisr runs through queue and ip_output()s packets
on it.
Apart from fixing race, this also decouples stack, fixing
potential issues, that may happen, when sending pfsync(4)
packets on input path.
Reviewed by: eri (a quick review)
number of packets can be queued on sc, while we are in ip_output(), and then
we wipe the accumulated sc_len. On next pfsync_sendout() that would lead to
writing beyond our mbuf cluster.
to document where we are expecting to be called with a lock held to
more easily catch unnoticed code paths.
This does not neccessarily improve locking in pfsync, it just tries
to avoid the panics reported.
PR: kern/159390, kern/158873
Submitted by: pluknet (at least something that partly resembles
my patch ignoring other cleanup, which I only saw
too late on the 2nd PR)
MFC After: 3 days
and virtualization it is not helpful but complicates things.
Current state of art is to not virtualize these kinds of locks -
inp_group/hash/info/.. are all not virtualized either.
MFC after: 3 days
pfsync also depends on pf to be initialized already so pf goes at
FIRST and the interfaces go at ANY.
Then the (VNET_)SYSINIT startups for pf stays at SI_SUB_PROTO_BEGIN
and for pfsync we move to the later SI_SUB_PROTO_IF.
This is not ideal either but at least an order that should work for
the moment and can be re-fined with the VIMAGE merge, once this will
actually work with more than one network stack.
MFC after: 3 days
and never remove state.
This fixes the problem some people are seeing that state is removed when pf
is loaded as a module but not in situations when compiled into the kernel.
Reported by: many on freebsd-pf
Tested by: flo
MFC after: 3 days
hash install, etc. For now, these are arguments are unused, but as we add
RSS support, we will want to use hashes extracted from mbufs, rather than
manually calculated hashes of header fields, due to the expensive of the
software version of Toeplitz (and similar hashes).
Add notes that it would be nice to be able to pass mbufs into lookup
routines in pf(4), optimising firewall lookup in the same way, but the
code structure there doesn't facilitate that currently.
(In principle there is no reason this couldn't be MFCed -- the change
extends rather than modifies the KBI. However, it won't be useful without
other previous possibly less MFCable changes.)
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
to not only compile bu load as well for testing with IPv6-only kernels.
For the moment we ignore the csum change in pf_ioctl.c given the
pending update to pf45.
Reported by: dru
Sponsored by: The FreeBSD Foundation
Sponsored by: iXsystems
MFC after: 20 days
- The existing ipi_lock continues to protect the global inpcb list and
inpcb counter. This lock is now relegated to a small number of
allocation and free operations, and occasional operations that walk
all connections (including, awkwardly, certain UDP multicast receive
operations -- something to revisit).
- A new ipi_hash_lock protects the two inpcbinfo hash tables for
looking up connections and bound sockets, manipulated using new
INP_HASH_*() macros. This lock, combined with inpcb locks, protects
the 4-tuple address space.
Unlike the current ipi_lock, ipi_hash_lock follows the individual inpcb
connection locks, so may be acquired while manipulating a connection on
which a lock is already held, avoiding the need to acquire the inpcbinfo
lock preemptively when a binding change might later be required. As a
result, however, lookup operations necessarily go through a reference
acquire while holding the lookup lock, later acquiring an inpcb lock --
if required.
A new function in_pcblookup() looks up connections, and accepts flags
indicating how to return the inpcb. Due to lock order changes, callers
no longer need acquire locks before performing a lookup: the lookup
routine will acquire the ipi_hash_lock as needed. In the future, it will
also be able to use alternative lookup and locking strategies
transparently to callers, such as pcbgroup lookup. New lookup flags are,
supplementing the existing INPLOOKUP_WILDCARD flag:
INPLOOKUP_RLOCKPCB - Acquire a read lock on the returned inpcb
INPLOOKUP_WLOCKPCB - Acquire a write lock on the returned inpcb
Callers must pass exactly one of these flags (for the time being).
Some notes:
- All protocols are updated to work within the new regime; especially,
TCP, UDPv4, and UDPv6. pcbinfo ipi_lock acquisitions are largely
eliminated, and global hash lock hold times are dramatically reduced
compared to previous locking.
- The TCP syncache still relies on the pcbinfo lock, something that we
may want to revisit.
- Support for reverting to the FreeBSD 7.x locking strategy in TCP input
is no longer available -- hash lookup locks are now held only very
briefly during inpcb lookup, rather than for potentially extended
periods. However, the pcbinfo ipi_lock will still be acquired if a
connection state might change such that a connection is added or
removed.
- Raw IP sockets continue to use the pcbinfo ipi_lock for protection,
due to maintaining their own hash tables.
- The interface in6_pcblookup_hash_locked() is maintained, which allows
callers to acquire hash locks and perform one or more lookups atomically
with 4-tuple allocation: this is required only for TCPv6, as there is no
in6_pcbconnect_setup(), which there should be.
- UDPv6 locking remains significantly more conservative than UDPv4
locking, which relates to source address selection. This needs
attention, as it likely significantly reduces parallelism in this code
for multithreaded socket use (such as in BIND).
- In the UDPv4 and UDPv6 multicast cases, we need to revisit locking
somewhat, as they relied on ipi_lock to stablise 4-tuple matches, which
is no longer sufficient. A second check once the inpcb lock is held
should do the trick, keeping the general case from requiring the inpcb
lock for every inpcb visited.
- This work reminds us that we need to revisit locking of the v4/v6 flags,
which may be accessed lock-free both before and after this change.
- Right now, a single lock name is used for the pcbhash lock -- this is
undesirable, and probably another argument is required to take care of
this (or a char array name field in the pcbinfo?).
This is not an MFC candidate for 8.x due to its impact on lookup and
locking semantics. It's possible some of these issues could be worked
around with compatibility wrappers, if necessary.
Reviewed by: bz
Sponsored by: Juniper Networks, Inc.
safer for i386 because it can be easily over 4 GHz now. More worse, it can
be easily changed by user with 'machdep.tsc_freq' tunable (directly) or
cpufreq(4) (indirectly). Note it is intentionally not used in performance
critical paths to avoid performance regression (but we should, in theory).
Alternatively, we may add "virtual TSC" with lower frequency if maximum
frequency overflows 32 bits (and ignore possible incoherency as we do now).
o) Add 'octm', a trivial driver for the 10/100 management ports found on some
Octeon systems.
o) Make the Simple Executive's management port helper routines compile on
FreeBSD (namely by not doing math on void pointers.)
o) Add a cvmx_mgmt_port_sendm routine to the Simple Executive to send an mbuf
so there is only one copy in the transmit path, rather than having to first
copy the mbuf to an intermediate buffer and then copy that to the Simple
Executive's transmit ring.
o) Properly work out MII addresses of management ports on the Lanner MR-730.
XXX The MR-730 also needs some patches to the MII read/write routines, but
this is sufficient for now. Media detection will be fixed in the future
when I can spend more time reading the vendor-supplied patches.
facilities as well as support for the Octeon 2 family of SoCs.
XXX Note that with our antediluvian assembler, we can't support some Octeon 2
instructions and fall back to using the old ones instead.
o) Fix enumeration of PHY addresses on the MR-955.
o) Parse link state for the MR-730 using the Broadcom PHY support in the SDK.
It's not clear that this is entirely-correct, but it seems to work. Since
this board uses a BCM5482S, this may mean that we work correctly for copper
but not SFI, which is untested.
the miibus attached to octe interfaces.
o) Add an SMI/MDIO interface to the MV88E61XX and use it for the switch PHY on
the Lanner MR-320. An actual driver for the switch PHY will come later.
Note that for now it intercepts and fakes MII_BMSR reads to prevent the
miibus from talking to anything but the switch itself.
seem to be problems both with the on-board Ethernet interfaces and the em(4)
interfaces on PCI under FreeBSD.
Thanks to Lanner for providing access to hardware.
for interfaces with TSO enabled, otherwise one would see an extra
ICMP unreach, frag needed pre matching packet on lo0.
This syncs pf code to ip_output.c r162084.
PR: kern/144311
Submitted by: yongari via mlaier
Reviewed by: eri
Tested by: kib
MFC after: 8 days
library:
o) Increase inline unit / large function growth limits for MIPS to accommodate
the needs of the Simple Executive, which uses a shocking amount of inlining.
o) Remove TARGET_OCTEON and use CPU_CNMIPS to do things required by cnMIPS and
the Octeon SoC.
o) Add OCTEON_VENDOR_LANNER to use Lanner's allocation of vendor-specific
board numbers, specifically to support the MR320.
o) Add OCTEON_BOARD_CAPK_0100ND to hard-wire configuration for the CAPK-0100nd,
which improperly uses an evaluation board's board number and breaks board
detection at runtime. This board is sold by Portwell as the CAM-0100.
o) Add support for the RTC available on some Octeon boards.
o) Add support for the Octeon PCI bus. Note that rman_[sg]et_virtual for IO
ports can not work unless building for n64.
o) Clean up the CompactFlash driver to use Simple Executive macros and
structures where possible (it would be advisable to use the Simple Executive
API to set the PIO mode, too, but that is not done presently.) Also use
structures from FreeBSD's ATA layer rather than structures copied from
Linux.
o) Print available Octeon SoC features on boot.
o) Add support for the Octeon timecounter.
o) Use the Simple Executive's routines rather than local copies for doing reads
and writes to 64-bit addresses and use its macros for various device
addresses rather than using local copies.
o) Rename octeon_board_real to octeon_is_simulation to reduce differences with
Cavium-provided code originally written for Linux. Also make it use the
same simplified test that the Simple Executive and Linux both use rather
than our complex one.
o) Add support for the Octeon CIU, which is the main interrupt unit, as a bus
to use normal interrupt allocation and setup routines.
o) Use the Simple Executive's bootmem facility to allocate physical memory for
the kernel, rather than assuming we know which addresses we can steal.
NB: This may reduce the amount of RAM the kernel reports you as having if
you are leaving large temporary allocations made by U-Boot allocated
when starting FreeBSD.
o) Add a port of the Cavium-provided Ethernet driver for Linux. This changes
Ethernet interface naming from rgmxN to octeN. The new driver has vast
improvements over the old one, both in performance and functionality, but
does still have some features which have not been ported entirely and there
may be unimplemented code that can be hit in everyday use. I will make
every effort to correct those as they are reported.
o) Support loading the kernel on non-contiguous cores.
o) Add very conservative support for harvesting randomness from the Octeon
random number device.
o) Turn SMP on by default.
o) Clean up the style of the Octeon kernel configurations a little and make
them compile with -march=octeon.
o) Add support for the Lanner MR320 and the CAPK-0100nd to the Simple
Executive.
o) Modify the Simple Executive to build on FreeBSD and to build without
executive-config.h or cvmx-config.h. In the future we may want to
revert part of these changes and supply executive-config.h and
cvmx-config.h and access to the options contained in those files via
kernel configuration files.
o) Modify the Simple Executive USB routines to support getting and setting
of the USB PID.
Executive is a library that can be used by standalone applications and kernels
to abstract access to Octeon SoC and board-specific hardware and facilities.
The FreeBSD port to Octeon will be updated to use this where possible.
"Whitspace" churn after the VIMAGE/VNET whirls.
Remove the need for some "init" functions within the network
stack, like pim6_init(), icmp_init() or significantly shorten
others like ip6_init() and nd6_init(), using static initialization
again where possible and formerly missed.
Move (most) variables back to the place they used to be before the
container structs and VIMAGE_GLOABLS (before r185088) and try to
reduce the diff to stable/7 and earlier as good as possible,
to help out-of-tree consumers to update from 6.x or 7.x to 8 or 9.
This also removes some header file pollution for putatively
static global variables.
Revert VIMAGE specific changes in ipfilter::ip_auth.c, that are
no longer needed.
Reviewed by: jhb
Discussed with: rwatson
Sponsored by: The FreeBSD Foundation
Sponsored by: CK Software GmbH
MFC after: 6 days
it from machine/in_cksum.h. This definition prevents us from using
hand-tuned assembler versions of in_cksum.
# this fixes the modules build on arm for ipfilter.
This is a split merge because of non-uniform licensing of the DTC package
contents and the way these components will be used in the FreeBSD environment.
The original DTC package is composed of the following two major pieces:
1. sys/contrib/libfdt (BSD [dual] license)
2. contrib/dtc (GPLv2)
The libfdt component is going to be shared in all aspects of the environment:
- /boot/loader
- kernel
- dtc (the device tree compiler proper, userspace tool)
TRENDnet TEW-504UB/EU) idProduct didn't be decreased after loading the
firmware.
Pointed by: Steven Friedrich <freebsd at insightbb.com>
Reviewed by: sam
* new firmware
* untested support for 1000 and 6000 series
* bgscan support
* remove unnecessary RXON changes
* allow setting of country/regdomain by enforcing channel flags read
from the EEPROM
* suspend/resume fixes
* RF kill switch fixes
* LED adjustments
* several bus_dma*() related fixes
* addressed some LORs
* many other bug fixes
Submitted by: Bernhard Schmidt <bschmidt at techwires.net>
Obtained from: Brandon Gooch <jamesbrandongooch at gmail dot com> (LED
related changes), Benjamin Kaduk <kaduk at mit dot edu>
(LOR fixes), OpenBSD
Server Return mode, where not all packets would be visible to the load
balancer or gateway.
This commit should be reverted when we merge future pf versions. The
benefit it would provide is that this version does not break any existing
public interface and thus won't be a problem if we want to MFC it to
earlier FreeBSD releases.
Discussed with: mlaier
Obtained from: OpenBSD
Sponsored by: iXsystems, Inc.
MFC after: 1 month
- Do not map entire real mode memory (1MB). Instead, we map IVT/BDA and
ROM area separately. Most notably, ROM area is mapped as device memory
(uncacheable) as it should be. User memory is dynamically allocated and
free'ed with contigmalloc(9) and contigfree(9). Remove now redundant and
potentially dangerous x86bios_alloc.c. If this emulator ever grows to
support non-PC hardware, we may implement it with rman(9) later.
- Move all host-specific initializations from x86emu_util.c to x86bios.c and
remove now unnecessary x86emu_util.c. Currently, non-PC hardware is not
supported. We may use bus_space(9) later when the KPI is fixed.
- Replace all bzero() calls for emulated registers with more obviously named
x86bios_init_regs(). This function also initializes DS and SS properly.
- Add x86bios_get_intr(). This function checks if the interrupt vector is
available for the platform. It is not necessary for PC-compatible hardware
but it may be needed later. ;-)
- Do not try turning off monitor if DPMS does not support the state.
- Allocate stable memory for VESA OEM strings instead of just holding
pointers to them. They may or may not be accessible always. Fix a memory
leak of video mode table while I am here.
- Add (experimental) BIOS POST call for vesa(4). This function calls VGA
BIOS POST code from the current VGA option ROM. Some video controllers
cannot save and restore the state properly even if it is claimed to be
supported. Usually the symptom is blank display after resuming from suspend
state. If the video mode does not match the previous mode after restoring,
we try BIOS POST and force the known good initial state. Some magic was
taken from NetBSD (and it was taken from vbetool, I believe.)
- Add a loader tunable for vgapci(4) to give a hint to dpms(4) and vesa(4)
to identify who owns the VESA BIOS. This is very useful for multi-display
adapter setup. By default, the POST video controller is automatically
probed and the tunable "hw.pci.default_vgapci_unit" is set to corresponding
vgapci unit number. You may override it from loader but it is very unlikely
to be necessary. Unfortunately only AGP/PCI/PCI-E controllers can be
matched because ISA controller does not have necessary device IDs.
- Fix a long standing bug in state save/restore function. The state buffer
pointer should be ES:BX, not ES:DI according to VBE 3.0. If it ever worked,
that's because BX was always zero. :-)
- Clean up register initializations more clearer per VBE 3.0.
- Fix a lot of style issues with vesa(4).
x86emu to this new module.
This changeset also brings a fix for bugs introduced with the initial
x86emu commit, which prevents the user from using some display mode or
cause instant reboots during mode switch.
Submitted by: paradox <ddkprog yahoo com>