lookup {dst-ip|src-ip|dst-port|src-port|uid|jail} N
which searches the specified field in table N and sets tablearg
accordingly.
With dst-ip or src-ip the option replicates two existing options.
When used with other arguments, the option can be useful to
quickly dispatch traffic based on other fields.
Work supported by the Onelab project.
MFC after: 1 week
after ether_ifattach(), as ether_ifattach() initializes it with
ETHER_HDR_LEN.
While I'm here remove setting if_mtu, it's already handled in
ether_ifattach().
called and vge(4) used to drive auto-negotiation timer(mii_tick) in
vge_tick(). Therefore the mii_tick was not called for every hz such
that auto-negotiation complete was never handled in vge(4).
Use mii_pollstat to extract current negotiated speed/duplex instead
of mii_tick. The latter is valid only for auto-negotiation case.
While I'm here change the confusing function name vge_tick() to
vge_link_statchg().
is called in vge_init_lock(), vge(4) always used to reload EEPROM.
Also add more comment why vge(4) clears VGE_CHIPCFG0_PACPI bit.
While I'm here add missing new line in vge_reset().
controllers(VT613x), we assume the PHY address is 1.
Use the saved PHY address in MII register access routines and
remove accessing VGE_MIICFG register.
While I'm here save PCI express capability register which will be
used in near future.
record device specific bits. Remove vge_link and use vge_flags.
While here, move clearing link state before mii_mediachg() as
mii_mediachg() may affect link state.
if_alloc() of ifp. This fixes the panic reported in the PR, but
not the attach failure.
PR: kern/139079
Tested by: Steven Noonan <steven uplinklabs.net>
Reviewed by: thompsa
Approved by: ed (mentor)
MFC after: 2 weeks`
seems to work like a tag that indicates 'not list end' of queued
frames. Without having a VGE_TXDESC_Q bit indicates 'list end'. So
the last frame of multiple queued frames has no VGE_TXDESC_Q bit.
The hardware has peculiar behavior for VGE_TXDESC_Q bit handling.
If the VGE_TXDESC_Q bit of descriptor was set the controller would
fetch next descriptor. However if next descriptor's OWN bit was
cleared but VGE_TXDESC_Q was set, it could confuse controller.
Clearing VGE_TXDESC_Q bit for transmitted frames ensure correct
behavior.
o Separate TX/RX buffer DMA tag from TX/RX descriptor ring DMA tag.
o Separate RX buffer DMA tag from common buffer DMA tag. RX DMA
tag has different restriction compared to TX DMA tag.
o Add 40bit DMA address support.
o Adjust TX/RX descriptor ring alignment to 64 bytes from 256
bytes as documented in datasheet.
o Added check to ensure TX/RX ring reside within a 4GB boundary.
Since TX/RX ring shares the same high address register they
should have the same high address.
o TX/RX side bus_dmamap_load_mbuf_sg(9) support.
o Add lock assertion to vge_setmulti().
o Add RX spare DMA map to recover from DMA map load failure.
o Add optimized RX buffer handler, vge_discard_rxbuf which is
activated when vge(4) sees bad frames.
o Don't blindly update VGE_RXDESC_RESIDUECNT register. Datasheet
says the register should be updated only when number of
available RX descriptors are multiple of 4.
o Use __NO_STRICT_ALIGNMENT instead of defining VGE_FIXUP_RX which
is only set for i386 architecture. Previously vge(4) also
performed expensive copy operation to align IP header on amd64.
This change should give RX performance boost on amd64
architecture.
o Don't reinitialize controller if driver is already running. This
should reduce number of link state flipping.
o Since vge(4) drops a driver lock before passing received frame
to upper layer, make sure vge(4) is still running after
re-acquiring driver lock.
o Add second argument count to vge_rxeof(). The argument will
limit number of packets could be processed in RX handler.
o Rearrange vge_rxeof() not to allocate RX buffer if received
frame was bad packet.
o Removed if_printf that prints DMA map failure. This type of
message shouldn't be used in fast path of driver.
o Reduce number of allowed TX buffer fragments to 6 from 7. A TX
descriptor allows 7 fragments of a frame. However the CMZ field
of descriptor has just 3bits and the controller wants to see
fragment + 1 in the field. So if we have 7 fragments the field
value would be 0 which seems to cause unexpected results under
certain conditions. This change should fix occasional TX hang
observed on vge(4).
o Simplify vge_stat_locked() and add number of available TX
descriptor check.
o vge(4) controllers lack padding short frames. Make sure to fill
zero for the padded bytes. This closes unintended information
disclosure.
o Don't set VGE_TDCTL_JUMBO flag. Datasheet is not clear whether
this bit should be set by driver or write-back status bit after
transmission. At least vendor's driver does not set this bit so
remove it. Without this bit vge(4) still can send jumbo frames.
o Don't start driver when vge(4) know there are not enough RX
buffers.
o Remove volatile keyword in RX descriptor structure. This should
be handled by bus_dma(9).
o Collapse two 16bits member of TX/RX descriptor into single 32bits
member.
o Reduce number of RX descriptors to 252 from 256. The
VGE_RXDESCNUM is 16bits register but only lower 8bits are valid.
So the maximum number of RX descriptors would be 255. However
the number of should be multiple of 4 as controller wants to
update 4 RX descriptors at a time. This limits the maximum
number of RX descriptor to be 252.
Tested by: Dewayne Geraghty (dewayne.geraghty <> heuristicsystems dot com dot au)
Carey Jones (m.carey.jones <> gmail dot com)
Yoshiaki Kasahara (kasahara <> nc dor kyushu-u dot ac dotjp)
struct callout_cpu. From the comment in the file:
+ * There is one struct callout_cpu per cpu, holding all relevant
+ * state for the callout processing thread on the individual CPU.
+ * In particular:
+ * cc_ticks is incremented once per tick in callout_cpu().
+ * It tracks the global 'ticks' but in a way that the individual
+ * threads should not worry about races in the order in which
+ * hardclock() and hardclock_cpu() run on the various CPUs.
+ * cc_softclock is advanced in callout_cpu() to point to the
+ * first entry in cc_callwheel that may need handling. In turn,
+ * a softclock() is scheduled so it can serve the various entries i
+ * such that cc_softclock <= i <= cc_ticks .
Together with a smaller patch committed in september, this fixes a
bug that affects 8.0 with apps that rely on callouts to fire exactly
in the number of ticks specified (qemu among them).
Right now, callouts in 8.0 fire one tick late.
This was discussed in september with JeffR and jhb
MFC after: 3 days
- These revisions no longer have cable detection capability.
- The UDMA support bit of register 0x4b has been dropped without an
replacement.
- According to Linux it's crucial for working ATAPI DMA support to
also set the reserved bit 1 of regsiter 0x53 with these revisions.
MFC after: 1 week
native, i.e. big-endian, format and convert as appropriate like we
also do with the multibyte fields of the other pages. This fixes
the output of acd_describe() to match reality on big-endian machines
without breaking it on little-endian ones. While at it, also convert
the remaining multibyte fields of the pages read although they are
currently unused for consistency and in order to prevent possible
similar bugs in the future.
MFC after: 1 week
if (jailed(cred))
left. If you are running with a vnet (virtual network stack) those will
return true and defer you to classic IP-jails handling and thus things
will be "denied" or returned with an error.
Work around this problem by introducing another "jailed()" function,
jailed_without_vnet(), that also takes vnets into account, and permits
the calls, should the jail from the given cred have its own virtual
network stack.
We cannot change the classic jailed() call to do that, as it is used
outside the network stack as well.
Discussed with: julian, zec, jamie, rwatson (back in Sept)
MFC after: 5 days
kernel to boot from NFS. [1]
Note: this is not a full virtualization of nfsclient. It is only does
what advertised above and nothing more.
Requested by: public demand [1]
Tested by: kris, ..
MFC after: 5 days
is determined by MD_IMAGE_SIZE. A file system can be embedded
into the loader with /sys/tools/embed_mfs.sh.
Note that md.c is not included when MD_IMAGE_SIZE is not set.
sxlock, via the sx_{s, x}lock_sig() interface, or plain lockmgr), will
leave the waiters flag on forcing the owner to do a wakeup even when if
the waiter queue is empty.
That operation may lead to a deadlock in the case of doing a fake wakeup
on the "preferred" (based on the wakeup algorithm) queue while the other
queue has real waiters on it, because nobody is going to wakeup the 2nd
queue waiters and they will sleep indefinitively.
A similar bug, is present, for lockmgr in the case the waiters are
sleeping with LK_SLEEPFAIL on. In this case, even if the waiters queue
is not empty, the waiters won't progress after being awake but they will
just fail, still not taking care of the 2nd queue waiters (as instead the
lock owned doing the wakeup would expect).
In order to fix this bug in a cheap way (without adding too much locking
and complicating too much the semantic) add a sleepqueue interface which
does report the actual number of waiters on a specified queue of a
waitchannel (sleepq_sleepcnt()) and use it in order to determine if the
exclusive waiters (or shared waiters) are actually present on the lockmgr
(or sx) before to give them precedence in the wakeup algorithm.
This fix alone, however doesn't solve the LK_SLEEPFAIL bug. In order to
cope with it, add the tracking of how many exclusive LK_SLEEPFAIL waiters
a lockmgr has and if all the waiters on the exclusive waiters queue are
LK_SLEEPFAIL just wake both queues.
The sleepq_sleepcnt() introduction and ABI breakage require
__FreeBSD_version bumping.
Reported by: avg, kib, pho
Reviewed by: kib
Tested by: pho
this requires a small reordering of headers and a few #defines to
map functions not available in userland.
Remove a useless #ifndef block at the beginning of the file.
Introduce (temporarily) rn_init2(), see the comment in the code
for the proper long term change.
No ABI or functional change.
MFC after: 7 days
was made in r199543 to remove MTX_RECURSE. These routines can be
called in device attach phase(e.g. mii_phy_probe()) so checking
assertion here is not right as caller does not hold a driver lock.
used by ggatectl, flags are potentially useful).
Other parts are internal kernel data structures and should
not be visible to userland.
No API change involved.
MFC after: 3 days
The rewrite of the interrupt handler includes:
o loop until all pending interrupts are handled. This closes a
race condition.
o count the number of interrupt sources we handled so that we can
properly return FILTER_HANDLED or FILTER_STRAY when we break out
of the loop.
o When matching the interrupt source to the devices that have that
source pending, check only from the set of devices we found to
have a pending interrupt.
PR: kern/140947
MFC after: 3 days
management over the data endpoint causes communication to die.
Take this one step further and model it on the existing NetBSD quirk and import
other device IDs from them.
Obtained from: NetBSD
- cast the result of LEN() to int as this is the main usage.
- use LEN() in one place where it was forgotten.
- Document the use of a static variable in rw mode.
More small changes to follow.
MFC after: 7 days
are not allocated by the device driver. These resources should still appear
allocated from the system's perspective so that their assigned ranges are
not reused by other resource requests. The PCI bus driver has used a hack
to effect this for a while now where it uses rman_set_device() to assign
devices to the PCI bus when they are first encountered and later assigns
them to the actual device when a driver allocates a BAR. A few downsides of
this approach is that it results in somewhat confusing devinfo -r output as
well as not being very easily portable to other bus drivers.
This commit adds generic support for "reserved" resources to the resource
list API used by many bus drivers to manage the resources of child devices.
A resource may be reserved via resource_list_reserve(). This will allocate
the resource from the bus' parent without activating it.
resource_list_alloc() recognizes an attempt to allocate a reserved resource.
When this happens it activates the resource (if requested) and then returns
the reserved resource. Similarly, when a reserved resource is released via
resource_list_release(), it is deactivated (if it is active) and the
resource is then marked reserved again, but is left allocated from the
bus' parent. To completely remove a reserved resource, a bus driver may
use resource_list_unreserve(). A bus driver may use resource_list_busy()
to determine if a reserved resource is allocated by a child device or if
it can be unreserved.
The PCI bus driver has been changed to use this framework instead of
abusing rman_set_device() to keep track of reserved vs allocated resources.
Submitted by: imp (an older version many moons ago)
MFC after: 1 month
gptzfsboot. I got the segment and offset fields reversed in the structure,
but I also succeeded in crossing the assignments so the actual EDD packet
ended up correct.
MFC after: 1 week
safely allocate a heap region above 1MB. This enables {gpt,}zfsboot()
to allocate much larger buffers than before.
- Use a larger buffer (1MB instead of 128K) for temporary ZFS buffers. This
allows more reliable reading of compressed files in a raidz/raidz2 pool.
Submitted by: Matt Reimer mattjreimer of gmail
MFC after: 1 week
hw.bge.forced_collapse. hw.bge.forced_collapse affects all bge(4)
controllers on system which may not desirable behavior of the
sysctl node. Also allow the sysctl node could be modified at any
time.
Reviewed by: bde (initial version)
128 FIBs first and allocated more later if necessary. Remove now unused
definitions from the header file[1].
- Force sequential bus scanning. It seems parallel scanning is in fact
slower and causes more harm than good[1]. Adjust a comment to reflect that.
PR: kern/141269
Submitted by: Alexander Sack (asack at niksun dot com)[1]
Reviewed by: scottl
drivers. These add new hardware support, most importantly
the pch (i5 chipset) in the em driver. Also, both drivers
now have the simplified (and I hope improved) watchdog
code. The igb driver uses the new RX cleanup that I
first implemented in ixgbe.
em - version 6.9.24
igb - version 1.8.4
This adds new feature support for the 82599, a hardware
assist to LRO, doing this required a large revamp to the
RX cleanup code because the descriptor ring may not be
processed out of order, this necessitated the elimination
of global pointers.
Additionally, the RX routine now does not refresh mbufs
on every descriptor, rather it will do a range, and then
update the hardware pointer at that time. These are
performance oriented changes.
The TX side now has a cleaner simpler watchdog algorithm
as well, in TX cleanup a read of ticks is stored, that
can then be compared in local_timer to determine if
there is a hang.
Various other cleanups along the way, thanks to all who
have provided input and testing.
feature. These registers are reserved on controllers that have no
support for jumbo frame.
Only BCM5700 has mini ring so do not poke mini ring related
registers if controller is not BCM5700.
Reviewed by: marius