support for their new RAID adapter ARC-1214.
Many thanks to Areca for continuing to support FreeBSD.
Submitted by: 黃清隆 Ching-Lung Huang <ching2048 areca com tw>
MFC after: 2 weeks
Starting with firmware v7.5, the "Read TouchPad Modes" ($01) and "Read
Capabilities" ($02) commands changed: previously constant bytes now
carry variable information.
We now compare those bytes to expected constants only for firmware prior
to v7.5.
Tested by: Zeus Panchenko <zeus@gnu.org.ua>
MFC after: 1 week
* The warning message was:
'warning error: format string is not a string literal';
* Changed how make_dev is called, now a string literal
for formatting is used;
Approved by: adrian (mentor)
calls and turn it on.
- Do not allow to call them inside jail. [1]
Pointed out by: trasz [1]
Reviewed by: avg
Approved by: kib (mentor)
MFC after: 1 week
caused by use of an invalid kgss_gssd_handle during an upcall to
the gssd daemon when it has exited. This patch seems to avoid the
crashes by holding a reference count on the kgss_gssd_handle until
the upcall is done. It also adds a new mutex kgss_gssd_lock used to
make manipulation of kgss_gssd_handle SMP safe.
Tested by: Illias A. Marinos, Herbert Poeckl
Reviewed by: jhb
MFC after: 2 weeks
LUNs for the virtual processor device. This removes lots of CAM warnings,
and follows similar recent changes to tws(4) and twa(4) drivers.
Also fix case where CAM_REQ_CMP was getting OR'd with CAM_DEV_NOT_THERE
in the nonexistent LUN case, resulting in different CAM status (CAM_UA_TERMIO)
getting reported to CAM. This issue existing previously, but was more subtle
because it changed CAM_SEL_TIMEOUT to CAM_CMD_TIMEOUT.
Sponsored by: Intel
Reported and tested by: Willem Jan Withagen <wjw@digiware.nl>
MFC after: 1 week
This fixed panic where we hold mutex (process lock) and try to obtain sleepable
lock (vnode lock in expand_name()). The panic could occur when %I was used
in kern.corefile.
Additionally we avoid expand_name() overhead when coredumps are disabled.
Obtained from: WHEEL Systems
This fixes panic when listing sysctls on INVARIANTS-enabled kernel while
having wbwd loaded.
This panic was not fatal, at worst one additional space was printed.
Also sbuf_trim() makes some sense even if drain function is set. The drain
function is called only when buffer is to be expanded. So we could still trim
existing buffer before drain is called. In this case it worked just fine - the
trailing space was correctly trimmed.
Obtained from: WHEEL Systems
MFC after: 1 week
For now use 256 buckets and fnv_hash function. Use xor'ed 32-bit
s6_addr32 parts of in6_addr structure as a hash key. Update
in6_localip and in6_is_addr_deprecated to use hash table for fastest
lookup.
Sponsored by: Yandex LLC
Discussed with: dwmalone, glebius, bz
set.
As the checks don't require vnet context, this is fixed by setting
vnet after the checks.
PR: kern/160541
Submitted by: Nikos Vassiliadis (slightly different approach)
implement the BSM audit trail format. Rename the kernel versions of the
files to match the userspace filenames so that it's easier to work out
what they correspond to, and therefore ensure they are kept in-sync.
Obtained from: TrustedBSD Project
yields, specify the user priority for the yield. Otherwise, a
higher-priority (kernel) thread could fall into the priority-inversion
with the thread owning the mutex lock.
On single-processor machines or UP kernels, do not loop adaptively
when the next vnode cannot be locked, instead yield unconditionally.
Restructure the iteration initializer and the iterator to remove code
duplication. Put the code to fetch and lock a vnode next to the
current marker, into the mnt_vnode_next_active() function, and use it
instead of repeating the loop.
Reported by: hrs, rmacklem
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 3 days
was being copied from the wrong place. This patch fixes that.
This could cause access failures for mapped users, when the group
permissions were needed.
PR: 147998
Submitted by: Christopher Key (cjk32 at cam.ac.uk)
MFC after: 2 weeks
If virtio_setup_intr() failed during boot, we would hang in
taskqueue_free() -> taskqueue_terminate() for all the taskq
threads to terminate. This will never happen since the
scheduler is not running by this point.
Reported by: neel, grehan
Approved by: grehan (mentor)
Rather than trying to KASSERT for callers that invoke this on
IO tags, either do nothing (for write_8) or return ~0 (for read_8).
Using KASSERT here just makes bus.h too messy from both
polluting bus.h with systm.h (for any number of drivers that include
bus.h without first including systm.h) or ports that use bus.h
directly (i.e. libpciaccess) as reported by zeising@.
Also don't try to implement all of the other bus_space functions for
8 byte access since realistically only these two are needed for some
devices that expose 64-bit memory-mapped registers.
Put the amd64-specific functions here rather than sys/amd64/include/bus.h
so that we can keep this header unified for x86, as requested by mdf@
and tijl@.
Submitted by: Carl Delsey <carl.r.delsey@intel.com>
MFC after: 3 days
initialisation to be enabled (1) / disabled (0) defaults to enabled.
This is useful for devices which have a slow trim speed and are either
new or have otherwise already been wiped e.g. secure erase.
PR: kern/173116
Submitted by: Steven Hartland
Approved by: pjd (mentor)
making range consolidation much more effective particularly for small
deletes.
This reduces memory used by the free map as well as reducing the number
of bio requests down to geom required to process all deletes.
In tests this achieved a factor of 10 reduction of trim ranges / geom
call downs.
While I'm here correct the description of zio_vdev_io_start.
PR: kern/173254
Submitted by: Steven Hartland
Approved by: pjd (mentor)
date: 2009/03/31 01:21:29; author: dlg; state: Exp; lines: +9 -16
...
this also firms up some of the input parsing so it handles short frames a
bit better.
This actually fixes reading beyond mbuf data area in pfsync_input(), that
may happen at certain pfsync datagrams.
entering llentry_free(), and in case if we lose the race, we should simply
perform LLE_FREE_LOCKED(). Otherwise, if the race is lost by the thread
performing arptimer(), it will remove two references from the lle instead
of one.
Reported by: Ian FREISLICH <ianf clue.co.za>
Prior to r222417, setting `password' in loader.conf(5) did not prevent boot
but instead only prevented changes to boot options by prompting for password
if autoboot failed or the user interrupted the countdown sequence.
After r222417 the same machine with `password' set in loader.conf(5) would no
longer boot without _always_ entering the password.
This patch restores the old (8.x and older) functionality for password in
loader.conf(5) while adding a new bootlock_password feature to replace the
edge-case should anybody desire the regressed functionality (HINT: great for
PXE servers and/or private distributions).
loader.conf(5) was updated to be more clear with-respect to password setting
(previous text was misleading).
Documentation (loader.conf(5) and check-password.4th(8)) has been updated to
include notes on the new bootlock_password setting.
Special thanks to Alex Verbod for bringing this to my attention and helping to
refine the loader.conf(5) text.
PR: conf/170110
Submitted by: Vitaly Zakharov <ded3axap@gmail.com>
Reviewed by: Alexander Verbod <alexander.verbod@gmail.com>
but later after processing and freeing the tag, we need to jump back again
to the findpcb label. Since the fwd_tag pointer wasn't NULL we tried to
process and free the tag for second time.
Reported & tested by: Pawel Tyll <ptyll nitronet.pl>
MFC after: 3 days
I am not exactly sure about the naming due to lack of specs on AMD site,
but it is better to have some identification then none at all.
MFC after: 1 month
guest floating point state without having to know the
size of floating-point state.
Unstaticize fpurestore to allow the hypervisor to
save/restore guest state using fpusave/fpurestore
on the allocated FPU state area.
Reviewed by: kib
Obtained from: NetApp/bhyve
MFC after: 1 week
These must have been accidently copied from the if statement a few
lines later. Also remove parameter name from function prototype.
Approved by: grehan (mentor)
as r242694):
do better detection of when we have a better version of the tcp sequence
windows than our peer.
this resolves the last of the pfsync traffic storm issues ive been able to
produce, and therefore makes it possible to do usable active-active
statuful firewalls with pf.
This is an ongoing effort to provide runtime debug information
useful in the field that does not panic existing installations.
This gives us the flexibility needed when shipping images to a
potentially large audience with WITNESS enabled without worrying
about formerly non-fatal LORs hurting a release.
Sponsored by: iXsystems
In preparation for sysctl(8) growing the ability to only print
out boot/run-time tunables we need a way to differentiate between
RW sysctl nodes that tune a particular thing, or simply export
a stat that we want to allow the sysadmin to reset to 0 (or some
other value).
To do so, we add the CTLFLAG_STATS which should be OR'd into the
CTLFLAGs when exporting a "writable/resettable" statistic node via
sysctl.
kern_yield() is problematic than.
The owned mutex is the mount interlock, and it is in fact not needed
to guarantee the stability of the mount list of active vnodes, so fix
the the issue by only taking the mount interlock for MNT_REF and
MNT_REL operations.
While there, augment the unconditional yield by some amount of
spinning [1].
Reported and tested by: pho
Reviewed by: attilio
Submitted by: attilio [1]
MFC after: 3 days
cause kernel panics.
Add a flag to the bpf descriptor to indicate whether the hold buffer
is in use. In bpfread(), set the "hold buffer in use" flag before
dropping the descriptor lock during the call to bpf_uiomove().
Everywhere else the hold buffer is used or changed, wait while
the hold buffer is in use by bpfread(). Add a KASSERT in bpfread()
after re-acquiring the descriptor lock to assist uncovering any
additional hold buffer races.
an IBSS VAP to RUN.
An 11n IBSS was beaconing HTINFO/HTCAP IE's that didn't have any HT
information setup (like the HT TX/RX MCS bitmask.)
Tested:
* AR9280, IBSS - both a statically setup channel and a scanned channel
PR: kern/172955
hierarchy of the page table entries which map the specified address.
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
call. The function indicates a failure by the TRUE return value. To
be extra safe, assert that the return value from the following
vm_map_insert() indicates success.
Fix style issues in the nearby lines, reformulate the comment.
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
parameters in IBSSes.
IBSS was just being plainly ignored here even though aggressive mode
was 'on'.
This still doesn't fix the "why are the WME parameters reset upon
interface down/up" issue.
PR: kern/165969
is totally wrong.
If we parse the WME IE here, we'll be constantly updating the WME
configuration from each WME enabled IBSS node we see.
There's a separate issue where the WME configuration is blanked out
when the interface is brought up; the WME parameters aren't "sticky."
Also, ieee80211_init_neighbor() parses the ath IE, so doing it here
isn't required.
Sorry about the noise.
PR: kern/165969
The Adhoc support wasn't parsing and handling the ath specific and WME
IEs, thus the atheros vendor support and WME TXOP parameters aren't being
copied from the peer.
It copies the WME parameters from whichever adhoc node it decides to
associate to, rather than just having them be statically configured
per adhoc node. This may or may not be exactly "right", but it's certainly
going to be more convienent for people - they just have to ensure their
adhoc nodes are setup with correct WME parameters.
Since WME parameters aren't per-node but are configured on hardware TX
queues, if some nodes support WME and some don't - or perhaps, have
different WME parameters - things will get quite quirky.
So ensure that you configure your adhoc nodes with the same WME
parameters.
Secondly - the Atheros Vendor IE is parsed and operated on per-node, so
this should work out ok between nodes that do and don't do Atheros
extensions. Once you see a becaon from that node and you setup the
association state, it _should_ parse things correctly.
TODO:
* I do need to ensure that both adhoc setup paths are correctly updating
the IE stuff. Ie, if the adhoc node is created by a data frame instead
of a beacon frame, it'll come up with no WME/ath IE config. The next
beacon frame that it receives from that node will update the state.
I just need to sit down and better understand how that's suppose to
work in IBSS mode.
Tested:
* AR5416 <-> AR9280 - fast frames and the WME configuration both popped
up. (This is with a local HAL patch that enables the fast frames
capability on the AR5416 chipsets.)
PR: kern/165969
ctl_frontend_cam_sim.c: Coalesce cfcs_online() and cfcs_offline()
into a single function since these were
identical except for one line.
Make sure we hold the SIM lock around path
creation, and calling xpt_rescan().
scsi_ctl.c: In ctlfe_onoffline(), make sure we hold the
SIM lock around path creation and free
calls, as well as xpt_action().
In ctlfe_lun_enable(), hold the SIM lock
around path and peripheral operations that
require it.
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
The stageqdepth (global, over all staging queues) was being kept
incorrectly. It was being incremented whenever things were added,
but only decremented during a flush. During active fast frames activity
it wasn't being decremented, resulting in it always having a non-zero
value during normal fast-frames operation.
It was only used when checking if the aging queue should be checked;
we may as well just defer to each of those staging queue counters (which
look correct, thankfully.)
Whilst I'm here, add locking assertions in the staging queue add/remove
functions. The current crash shows that the staging queue has one frame,
but only has a tail pointer set (the head pointer being set to NULL.)
I'd like to grab a few more crashes where these locking assertions are
in place so I can narrow down the issue between "somehow locking is
messed up and things are racy" and "the stage queue head/tail pointer
manipulation logic is subtly wrong."
Tested:
* AR5416 STA, AR5413 AP; with FastFrames enabled in the AR5416 HAL.
PR: kern/174283
Committed with changes to support the following from loader.conf(5):
+ console="vidconsole comconsole" (not just console="comconsole")
+ boot_serial="anything" (not just boot_serial="YES")
+ boot_multicons="anything" (unsupported in originally-submitted patch)
PR: conf/121064
Submitted by: koitsu
Reviewed by: gcooper, adrian (co-mentor)
Approved by: adrian (co-mentor)
pointers and leave the stage queue flush routine to just do nothing
(since both head and tail here will be NULL.)
This should quieten the "stageq empty" panic where the stageq itself
is empty, but it won't fix the second KASSERT() here "staging queue empty"
as that's likely a different underlying problem.
PR: kern/174283
similar changes had to be made in various places throughout the machine-
independent virtual memory layer to support the new vm object type.
However, in most of these places, it's actually not the type of the vm
object that matters to us but instead certain attributes of its pages.
For example, OBJT_DEVICE, OBJT_MGTDEVICE, and OBJT_SG objects contain
fictitious pages. In other words, in most of these places, we were
testing the vm object's type to determine if it contained fictitious (or
unmanaged) pages.
To both simplify the code in these places and make the addition of future
vm object types easier, this change introduces two new vm object flags
that describe attributes of the vm object's pages, specifically, whether
they are fictitious or unmanaged.
Reviewed and tested by: kib
to head. I don't think the NFS client behaviour will change unless
the new "minorversion=1" mount option is used. It includes basic
NFSv4.1 support plus support for pNFS using the Files Layout only.
All problems detecting during an NFSv4.1 Bakeathon testing event
in June 2012 have been resolved in this code and it has been tested
against the NFSv4.1 server available to me.
Although not reviewed, I believe that kib@ has looked at it.
while doing a copyout. That can cause a panic, because copyout
can trigger VM faults, and we can't handle VM faults while holding
a mutex.
The solution here is to malloc a separate buffer to hold the OOA
queue entries, so that we don't risk a VM fault while filling up
the buffer and we don't have to drop the lock. The other solution
would be to wire the user's memory while filling their buffer with
copyout, but that would have been a little more complex.
Also fix a debugging parenthesis issue in ctl_abort_task() pointed
out by Chuck Tuffli.
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
drivers.
The bug occurrs when a userland process has the driver instance
open and the underlying device goes away. We get the devfs
callback that the device node has been destroyed, but not all of
the closes necessary to fully decrement the reference count on the
CAM peripheral.
The reason is that once devfs calls back and says the device has
been destroyed, it is moved off to deadfs, and devfs guarantees
that there will be no more open or close calls. So the solution
is to keep track of how many outstanding open calls there are on
the device, and just release that many references when we get the
callback from devfs.
scsi_pass.c,
scsi_enc.c,
scsi_enc_internal.h: Add an open count to the softc in these
drivers. Increment it on open and
decrement it on close.
When we get a devfs callback to say that
the device node has gone away, decrement
the peripheral reference count by the
number of still outstanding opens.
Make sure we don't access the peripheral
with cam_periph_unlock() after what might
be the final call to
cam_periph_release_locked(). The
peripheral might have been freed, and we
will be dereferencing freed memory.
scsi_ch.c,
scsi_sg.c: For the ch(4) and sg(4) drivers, add the
same changes described above, and in
addition, fix another bug that was
previously fixed in the pass(4) and enc(4)
drivers.
These drivers were calling destroy_dev()
from their cleanup routine, but that could
cause a deadlock because the cleanup
routine could be indirectly called from
the driver's close routine. This would
cause a deadlock, because the device node
is being held open by the active close
call, and can't be destroyed.
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
are used by NFSv4.1 for callbacks. A backchannel is a connection
established by the client, but used for RPCs done by the server
on the client (callbacks). As a result, this patch mixes some
client side calls in the server side and vice versa. Some
definitions in the .c files were extracted out into a file called
krpc.h, so that they could be included in multiple .c files.
This code has been in projects/nfsv4.1-client for some time.
Although no one has given it a formal review, I believe kib@
has taken a look at it.
The problem was a race condition between the EDT traversal used by
things like 'camcontrol devlist', and CAM peripheral driver
removal.
The EDT traversal code holds the CAM topology lock, and wants
to show devices that have been invalidated. It acquires a
reference to the peripheral to make sure the peripheral it is
examining doesn't go away.
However, because the peripheral removal code in camperiphfree()
drops the CAM topology lock to call the peripheral's destructor
routine, we can run into a situation where the EDT traversal
increments the peripheral reference count after free process is
already in progress. At that point, the reference count is
ignored, because it was 0 when we started the process.
Fix this race by setting a flag, CAM_PERIPH_FREE, that I previously
added and checked in xptperiphtraverse() and xptpdperiphtravsere(),
but failed to use. If the EDT traversal code sees that flag,
it will know that the peripheral free process has already started,
and that it should not access that peripheral.
Also, fix an inconsistency in the locking between
xptpdperiphtraverse() and xptperiphtraverse(). They now both
hold the CAM topology lock while calling the peripheral traversal
function.
cam_xpt.c: Change xptperiphtraverse() to hold the CAM topology
lock across calls to the traversal function.
Take out the comment in xptpdperiphtraverse() that
referenced the locking inconsistency.
cam_periph.c: Set the CAM_PERIPH_FREE flag when we are in the
process of freeing a peripheral driver.
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
- unp_zone: kern.ipc.maxsockets limit reached
- socket_zone: kern.ipc.maxsockets limit reached
- zone_mbuf: kern.ipc.nmbufs limit reached
- zone_clust: kern.ipc.nmbclusters limit reached
- zone_jumbop: kern.ipc.nmbjumbop limit reached
- zone_jumbo9: kern.ipc.nmbjumbo9 limit reached
- zone_jumbo16: kern.ipc.nmbjumbo16 limit reached
Note that those warnings are printed not often than every five minutes and can
be globally turned off by setting sysctl/tunable vm.zone_warnings to 0.
Discussed on: arch
Obtained from: WHEEL Systems
MFC after: 2 weeks
will be printed once the given zone becomes full and cannot allocate an
item. The warning will not be printed more often than every five minutes.
All UMA warnings can be globally turned off by setting sysctl/tunable
vm.zone_warnings to 0.
Discussed on: arch
Obtained from: WHEEL Systems
MFC after: 2 weeks
This is to allow debug images to be used without taking down the
system when non-fatal asserts are hit.
The following sysctls are added:
debug.kassert.warn_only: 1 = log, 0 = panic
debug.kassert.do_ktr: set to a ktr mask for logging via KTR
debug.kassert.do_log: 1 = log, 0 = quiet
debug.kassert.warnings: stats, number of kasserts hit
debug.kassert.log_panic_at:
number of kasserts before we actually panic, 0 = never
debug.kassert.log_pps_limit: pps limit for log messages
debug.kassert.log_mute_at: stop warning after N kasserts, 0 = never stop
debug.kassert.kassert: set this sysctl to trigger a kassert
Discussed with: scottl, gnn, marcel
Sponsored by: iXsystems
The XC900M acts as a Ubiquiti XR9 (and I _think_ SR9) by default;
it uses the same 900MHz<->2.4GHz downconverter mapping.
However it has an alternative frequency mapping which squeezes in a couple
more half/quarter rate channels. Since the default HAL doesn't support
fractional tuning (sub-1MHz) in 2.4GHz mode on the AR5413/AR5414, they
implement it using a jumper.
Datasheet: http://www.xagyl.com/download/XC900M_Datasheet.pdf
Thankyou to Xagyl Communications for the XC900M NICs and Edgar Martinez
for organising the donation.
Tested:
* XC900M <-> XC900M
* Ubiquiti XR9 <-> XC900M
TODO:
* Test against SR9 and GZ901 if possible (the IEEE channel<->frequency
mapping may not match up, thanks to the slightly different channels
involved)
EPROTONOSUPPORT if the address family is not supported.
- introduce pffinddomain() to find a domain by family and use it as
appropriate.
Reviewed by: glebius
id hash. If a state has been disconnected from id hash, its rule pointers
can no longer be dereferenced, and referenced memory can't be modified.
Thus, move rule statistics from pf_free_rule() to pf_unlink_rule() and
update them prior to releasing id hash slot lock.
Reported by: Ian FREISLICH <ianf cloudseed.co.za>
from pfsync:
- Call into pfsync_delete_state() holding the state lock.
- Set the state timeout to PFTM_UNLINKED after state has been moved
to the PFSYNC_S_DEL queue in pfsync.
Reported by: Ian FREISLICH <ianf cloudseed.co.za>
- As the comment report, CALLOUT_LOCAL_ALLOC cannot be checked
directly from the callout flags but might be checked by a cached
value. Hence, do so before to actually remove the callout, when
needed, in softclock_call_cc().
- In softclock_call_cc() also add a comment in the waiting and deferred
migration case explaining that the dereference should be safe
because of the migration dereference invariants.
Additively:
- In softclock_call_cc(), for the deferred migration case, move all the
accesses to callout structure after the comment stating the callout
must not be destroyed.
- For consistency with this last tweak, use cached c_flags for the
KASSERT() in the deferred migration case. It is not strictly necessary
but this way all the callout accesses happen after the above mentioned
comment, improving consistency.
Pointy hat to: me
Sponsored by: Isilon Systems / EMC Corporation
Reviewed by: kib
MFC after: 2 weeks
X-MFC: 243901
to map, and technically this isn't allowed.
Functionally, it works OK (at least on x86) to call bus_dmamap_load with
a NULL data pointer and zero length, so this is primarily for correctness
and consistency with other drivers.
While here, remove check in isci_io_request_construct for nseg==0.
Previously, bus_dmamap_load would pass nseg==1, even for case where
buffer is NULL and length = 0, which allowed CAM_DIR_NONE CCBs
to get processed. This check is not correct though, and needed to be
removed both for the changes elsewhere in this patch, as well as jeff's
preliminary bus_dmamap_load_ccb patch (which uncovered all of this in
the first place).
MFC after: 3 days
- Deembed scope id in L3 address in in6_lltable_dump().
- Simplify scope id recovery in rtsock routines.
- Remove embedded scope id handling in ndp(8) and route(8) completely.
from the callwheel. Calculate the cc->cc_next before removing the
callout, otherwise the code followed the invalid tailq links. After
this, make softclock_call_cc() return void, since it always return
cc->cc_next, which is immediately available to the softclock()
anyway. This also allows to eliminate a label under #ifdef SMP.
Remove the assignment of cc->cc_next from callout_cc_del(), since the
function is called with the callout already removed from callwheel.
If cancelling the migration, also clear the CALLOUT_DFRMIGRATION flag.
Postpone the free of the timeout(9) allocated callouts after the
migration checks are done.
Add some more strict asserts about the state of the callout in
callout_call_cc().
Reviewed by: attilio
Reported and tested by: pho (previous version)
MFC after: 2 weeks
callout is started before kern_setitimer() acquires process mutex, but
looses a race and kern_setitimer() gets the process mutex before the
callout. Then, assuming that new specified struct itimerval has
it_interval zero, but it_value non-zero, the callout, after it starts
executing again, clears p->p_realtimer.it_value, but kern_setitimer()
already rescheduled the callout.
As the result of the race, both p_realtimer is zero, and the callout
is rescheduled. Then, in the exit1(), the exit code sees that it_value
is zero and does not even try to stop the callout. This allows the
struct proc to be reused and eventually the armed callout is
re-initialized. The consequence is the corrupted callwheel tailq.
Use process mutex to interlock the callout start, which fixes the race.
Reported and tested by: pho
Reviewed by: jhb
MFC after: 2 weeks
- Check V_deembed_scopeid before checking if sa_family == AF_INET6.
- Fix scope id handing in route(8)[2] and ifconfig(8).
Reported by: rpaulo[1], Mateusz Guzik[1], peter[2]
over the active list. The mount interlock is not enough to guarantee
the validity of the tailq link pointers. The __mnt_vnode_next_active()
and __mnt_vnode_first_active() active lists iterators helper functions
did not provided the neccessary stability for the list, allowing the
iterators to pick garbage.
This was uncovered after the r243599 made the active list iterators
non-nop.
Since a vnode interlock is before the vnode_free_list_mtx, obtain the
vnode ilock in the non-blocking manner when under vnode_free_list_mtx,
and restart iteration after the yield if the lock attempt failed.
Assert that a vnode found on the list is active, and assert that the
helpers return the vnode with interlock owned.
Reported and tested by: pho
MFC after: 1 week
thought I've decided its overkill,a simple tuneable for
each RX and TX limit, and then init sets the ring values
based on that, should be sufficient.
More importantly, fix a bug causing a panic, when changing
the define style to IXGBE_LEGACY_TX a taskqueue init was
inadvertently set #ifdef when it should be #ifndef.
I couldn't think of a way to maintain the hardware TXQ locks _and_ layer
on top of that per-TXQ software queuing and any other kind of fine-grained
locks (eg per-TID, or per-node locks.)
So for now, to facilitate some further code refactoring and development
as part of the final push to get software queue ps-poll and u-apsd handling
into this driver, just do away with them entirely.
I may eventually bring them back at some point, when it looks slightly more
architectually cleaner to do so. But as it stands at the present, it's
not really buying us much:
* in order to properly serialise things and not get bitten by scheduling
and locking interactions with things higher up in the stack, we need to
wrap the whole TX path in a long held lock. Otherwise we can end up
being pre-empted during frame handling, resulting in some out of order
frame handling between sequence number allocation and encryption handling
(ie, the seqno and the CCMP IV get out of sequence);
* .. so whilst that's the case, holding the lock for that long means that
we're acquiring and releasing the TXQ lock _inside_ that context;
* And we also acquire it per-frame during frame completion, but we currently
can't hold the lock for the duration of the TX completion as we need
to call net80211 layer things with the locks _unheld_ to avoid LOR.
* .. the other places were grab that lock are reset/flush, which don't happen
often.
My eventual aim is to change the TX path so all rejected frame transmissions
and all frame completions result in any ieee80211_free_node() calls to occur
outside of the TX lock; then I can cut back on the amount of locking that
goes on here.
There may be some LORs that occur when ieee80211_free_node() is called when
the TX queue path fails; I'll begin to address these in follow-up commits.
which dumps out the actual options being used by an NFS mount.
This will be used to implement a "-m" option for nfsstat(1).
Reviewed by: alfred
MFC after: 2 weeks
This brand of controllers expects that the number of
contexts specified in the input slot context points
to an active endpoint context, else it refuses to
operate.
- Ring the correct doorbell when streams mode is used.
- Wrap one or two long lines.
Tested by: Markus Pfeiffer (DragonFlyBSD)
MFC after: 1 week
Programming the low bits has a side-effect if unmasking the pin if it is
not disabled. So if an interrupt was pending then it would be delivered
with the correct new vector but to the incorrect old LAPIC.
This fix could be made clearer by preserving the mask bit while
programming the low bits and then explicitly resetting the mask bit
after all the programming is done.
Probability to trip over the fixed bug could be increased by bootverbose
because printing of the interrupt information in ioapic_assign_cpu
lengthened the time window during which an interrupt could arrive while
a pin is masked.
Reported by: Andreas Longwitz <longwitz@incore.de>
Tested by: Andreas Longwitz <longwitz@incore.de>
MFC after: 12 days
Also, make it explicit that V_XATTRDIR is not properly supported in gfs
code yet.
The bad code was plain incorrect: (a) it spoiled handling of v_usecount
reaching zero and (b) it leaked v_holdcnt.
The ugly code employs potentially unsafe locking tricks.
Ideally we should separate vnode lifecycle and gfs node lifecycle.
A gfs node should have its own reference count where its child nodes
should be accounted.
PR: kern/151111
Reviewed by: kib
MFC after: 13 days
... to avoid any races or inconsistencies.
This should fix a regression introduced in r243404.
Also, remove a stale comment that has not been true for quite a while
now.
Pointyhat to: avg
Teested by: trociny, emaste, dumbbell (earlier version)
MFC after: 1 week
src/sys/{bsm,security/audit}. There are a few tweaks to help with the
FreeBSD build environment that will be merged back to OpenBSM. No
significant functional changes appear on the kernel side.
Obtained from: TrustedBSD Project
Sponsored by: The FreeBSD Foundation (auditdistd)
enforcing the TXOP and TBTT limits:
* Frames which will overlap with TBTT will not TX;
* Frames which will exceed TXOP will be filtered.
This is not enabled by default; it's intended to be enabled by the
TDMA code on 802.11n capable chipsets.
the revamped sysctl code did not work, and needed a change. This
makes the limit get set at the time that all sysctl stats are
created and is actually more elegant imho anyway.
TX hot path by getting rid of index calculations and simply
managing pointers. Much of the creative code is due to my
coworker here at Intel, Alex Duyck, thanks Alex!
Also, this whole series of patches was given the critical
eye of Gleb Smirnoff and is all the better for it, thanks
Gleb!
- add a limit for both RX and TX, change the default to 256
- change the sysctl usage to be common, and now to be called
during init for each ring.
- the TX limit is not yet used, but the changes in the last
patch in this series uses the value.
- the motivation behind these changes is to improve data
locality in the final code.
- rxeof interface changes since it now gets limit from the
ring struct
Fix path handling for *at() syscalls.
Before the change directory descriptor was totally ignored,
so the relative path argument was appended to current working
directory path and not to the path provided by descriptor, thus
wrong paths were stored in audit logs.
Now that we use directory descriptor in vfs_lookup, move
AUDIT_ARG_UPATH1() and AUDIT_ARG_UPATH2() calls to the place where
we hold file descriptors table lock, so we are sure paths will
be resolved according to the same directory in audit record and
in actual operation.
Sponsored by: FreeBSD Foundation (auditdistd)
Reviewed by: rwatson
MFC after: 2 weeks
defines (at Gleb's request). Also, change the defines around
the old transmit code to IXGBE_LEGACY_TX, I do this to make
it possible to define this regardless of the OS level (it is
not defined by default). There are also a couple changed
comments for clarity.
Currently when we discover that trail file is greater than configured
limit we send AUDIT_TRIGGER_ROTATE_KERNEL trigger to the auditd daemon
once. If for some reason auditd didn't rotate trail file it will never
be rotated.
Change it by sending the trigger when trail file size grows by the
configured limit. For example if the limit is 1MB, we will send trigger
on 1MB, 2MB, 3MB, etc.
This is also needed for the auditd change that will be committed soon
where auditd may ignore the trigger - it might be ignored if kernel
requests the trail file to be rotated too quickly (often than once a second)
which would result in overwriting previous trail file.
Sponsored by: FreeBSD Foundation (auditdistd)
MFC after: 2 weeks
Currently on each record write we call VFS_STATFS() to get available space
on the file system as well as VOP_GETATTR() to get trail file size.
We can assume that trail file is only updated by the audit worker, so instead
of asking for file size on every write, get file size on trail switch only
(it should be zero, but it's not expensive) and use global variable audit_size
protected by the audit worker lock to keep track of trail file's size.
This eliminates VOP_GETATTR() call for every write. VFS_STATFS() is satisfied
from in-memory data (mount->mnt_stat), so shouldn't be expensive.
Sponsored by: FreeBSD Foundation (auditdistd)
MFC after: 2 weeks
these are FCOE stats (fiber channel over ethernet), something that
FreeBSD does not yet have, they were mistaken for flow control by
the implementor I believe. Secondly, the real flow control stats
are oddly named with a 'link' tag on the front, it was requested
by my validation engineer to make these stats have the same name as
the igb driver for clarity and that seemed reasonable to me.
Remove redundant call to AUDIT_ARG_UPATH1().
Path will be remembered by the following NDINIT(AUDITVNODE1) call.
Sponsored by: FreeBSD Foundation (auditdistd)
MFC after: 2 weeks
multiqueue code, this functionality has proven to be more
trouble than it was worth. Thanks to Gleb for a second
critical look over my code and help in the patches!
* Global IPFW_DYN_LOCK() is changed to per-bucket mutex.
* State expiration is done in ipfw_tick every second.
* No expiration is done on forwarding path.
* hash table resize is done automatically and does not flush all states.
* Dynamic UMA zone is now allocated per each VNET
* State limiting is now done via UMA(9) api.
Discussed with: ipfw
MFC after: 3 weeks
Sponsored by: Yandex LLC
- Add "fdt addr" subcommand that lets you specify preloaded blob address
- Do not pre-initialize blob for "fdt addr"
- Do not try to load dtb every time fdt subcommand is issued,
do it only once
- Change the way DTB is passed to kernel. With introduction of "fdt addr"
actual blob address can be not virtual but physical or reside in
area higher then 64Mb. ubldr should create copy of it in kernel area
and pass pointer to this newly allocated buffer which is guaranteed to work
in kernel after switching on MMU.
- Convert memreserv FDT info to "memreserv" property of root node
FDT uses /memreserve/ data to notify OS about reserved memory areas.
Technically it's not real property, it's just data blob, sequence
of <start, size> pairs where both start and size are 64-bit integers.
It doesn't fit nicely with OF API we use in kernel, so in order to unify
thing ubldr converts this data to "memreserve" property using the same
format for addresses and sizes as /memory node.
It returns memory regions restricted from being used by kernel. These
regions are dfined in "memreserve" property of root node in the same
format as "reg" property of /memory node
embryonic connection has been setup and never attempt to abort a tid
before this is done. This fixes a bad race where a listening socket is
closed when the driver is in the middle of step (b) here. The symptom
of this were "ARP miss" errors from the driver followed by tid leaks.
A hardware-offloaded passive open works this way:
a) A SYN "hits" the TCAM entry for a server tid and the chip delivers it
to the queue associated with the server tid (say, queue A). It waits
for a response from the driver telling it what to do.
b) The driver decides it is ok to proceed. It adds the new tid to the
list of embryonic connections associated with the server tid and then
hands off the SYN to the kernel's syncache to make sure that the kernel
okays it too. If it does then the driver provides an L2 table entry,
queue id (say, queue B), etc. and instructs the chip to send the SYN/ACK
response.
c) The chip delivers a status to queue B depending on how the third step
of the 3-way handshake goes. The driver removes the tid from its list
of embryonic connections and either expands the syncache entry or
destroys the tid. In any case all subsequent messages for the new tid
will be delivered to queue B, not queue A. Anything running in queue B
knows that the L2 entry has long been setup and the new flag is of no
interest from here on. If the listener is closed it will deal with
so_comp as normal.
MFC after: 1 week
for bridge interface.
- If we found a collision we can break the loop - only one collision is
possible and one is exactly enough to need to renegerate.
Obtained from: WHEEL Systems
MFC after: 1 week
variable as they may overflow on i386/PAE and i386 with > 2GB RAM.
Use 64bit quad_t instead. It has broader kernel infrastructure support
with TUNABLE_QUAD_FETCH() and qmin/qmax() than other available types.
Pointed out by: alc, bde
but LDFLAGS is not (yet) passed on to the linker (via SYSTEM_LD et al).
Do so now. As such, any kernel configuration can now define linker
flags by setting LDFLAGS as normal and not have to revert to hacks
like setting DEBUG for flags that do not relate to debugging (see
sys/powerpc/conf/MPC85XX).
Make the following interface changes to my beastie boot menu:
+ Move boot options to a submenu
+ Add a new "Boot Single" menu item
+ Make "Boot" item and new "Boot Single" item reverse when boot_single is set
+ Add new "Load Defaults" item (in new "Boot Options" submenu) for overridding
loader.conf(5) provided values with system defaults.
Reviewed by: adrian (co-mentor)
Approved by: adrian (co-mentor)
Bring several definitions required for newer ext4 features.
Rename EXT2F_COMPAT_HTREE to EXT2F_COMPAT_DIRHASHINDEX since it
is not being used yet and the new name is more compatible with
NetBSD and Linux.
This change is purely cosmetic and has no effect on the real
code.
Obtained from: NetBSD
MFC after: 3 days
When a file is first being written, the dynamic block reallocation
(implemented by ext2_reallocblks) relocates the file's blocks
so as to cluster them together into a contiguous set of blocks on
the disk.
When the cluster crosses the boundary into the first indirect block,
the first indirect block is initially allocated in a position
immediately following the last direct block. Block reallocation
would usually destroy locality by moving the indirect block out of
the way to keep the data blocks contiguous.
The issue was diagnosed long ago by Bruce Evans on ffs and surfaced
on ext2fs when block reallocaton was ported. This is only a partial
solution based on the similarities with FFS. We still require more
review of the allocation details that vary in ext2fs.
Reported by: bde
MFC after: 1 week
kernel memory, whichever is lower. The overall mbuf related memory
limit must be set so that mbufs (and clusters of various sizes)
can't exhaust physical RAM or KVM.
The limit is set to half of the physical RAM or KVM (whichever is
lower) as the baseline. In any normal scenario we want to leave
at least half of the physmem/kvm for other kernel functions and
userspace to prevent it from swapping too easily. Via a tunable
kern.maxmbufmem the limit can be upped to at most 3/4 of physmem/kvm.
At the same time divorce maxfiles from maxusers and set maxfiles to
physpages / 8 with a floor based on maxusers. This way busy servers
can make use of the significantly increased mbuf limits with a much
larger number of open sockets.
Tidy up ordering in init_param2() and check up on some users of
those values calculated here.
Out of the overall mbuf memory limit 2K clusters and 4K (page size)
clusters to get 1/4 each because these are the most heavily used mbuf
sizes. 2K clusters are used for MTU 1500 ethernet inbound packets.
4K clusters are used whenever possible for sends on sockets and thus
outbound packets. The larger cluster sizes of 9K and 16K are limited
to 1/6 of the overall mbuf memory limit. When jumbo MTU's are used
these large clusters will end up only on the inbound path. They are
not used on outbound, there it's still 4K. Yes, that will stay that
way because otherwise we run into lots of complications in the
stack. And it really isn't a problem, so don't make a scene.
Normal mbufs (256B) weren't limited at all previously. This was
problematic as there are certain places in the kernel that on
allocation failure of clusters try to piece together their packet
from smaller mbufs.
The mbuf limit is the number of all other mbuf sizes together plus
some more to allow for standalone mbufs (ACK for example) and to
send off a copy of a cluster. Unfortunately there isn't a way to
set an overall limit for all mbuf memory together as UMA doesn't
support such a limiting.
NB: Every cluster also has an mbuf associated with it.
Two examples on the revised mbuf sizing limits:
1GB KVM:
512MB limit for mbufs
419,430 mbufs
65,536 2K mbuf clusters
32,768 4K mbuf clusters
9,709 9K mbuf clusters
5,461 16K mbuf clusters
16GB RAM:
8GB limit for mbufs
33,554,432 mbufs
1,048,576 2K mbuf clusters
524,288 4K mbuf clusters
155,344 9K mbuf clusters
87,381 16K mbuf clusters
These defaults should be sufficient for even the most demanding
network loads.
MFC after: 1 month
accept queues a new socket/connection may be added to the queue
due to a race on the ACCEPT_LOCK.
The submitted patch is slightly changed in comments, teardown
and locking order and extended with KASSERT's.
Submitted by: Vijay Singh <vijju.singh-at-gmail-dot-com>
Found by: His team.
MFC after: 1 week
now this works for non-debug and debug builds.
* Add a comment reminding me (or someone) to audit all of the relevant
math to ensure there's no weird wrapping issues still lurking about.
But yes, this does seem to be mostly working.
Pointy-hat-to: adrian, yet again
is in capability mode.
- Add VN_OPEN_NOCAPCHECK flag for vn_open_cred() to will ne converted into
NOCAPCHECK namei flag.
This functionality will be used to enable core dumps for sandboxed processes.
Reviewed by: rwatson
Obtained from: WHEEL Systems
MFC after: 2 weeks
to himself. For example abort(3) at first tries to do kill(getpid(), SIGABRT)
which was failing in capability mode, so the code was failing back to exit(1).
Reviewed by: rwatson
Obtained from: WHEEL Systems
MFC after: 2 weeks