This helps fix a -Wshadow issue with exp(3) with tests/sys/acct/acct_test,
which include math.h, which in turn defines exp(3)
MFC after: 2 weeks
Tested with: clang, gcc 4.2.1, gcc 4.9
Sponsored by: Dell EMC Isilon
dropped then reacquired due to using M_WAITOK, which opens a window in
which the tty device can disappear. Check for this and return ENXIO
back up the call chain so that callers can cope.
This closes a race where TF_GONE would get set while buffers were being
allocated as part of ttydev_open(), causing a subsequent call to
ttydevsw_modem() later in ttydev_open() to assert.
Reported by: pho
Reviewed by: kib
after tty_timedwait() returns an error only if the error is EWOULDBLOCK;
other errors cause an immediate return. This fixes the case of the tty
disappearing while in tty_drain().
Reported by: pho
m_pulldown()
m_pulldown() only needs to determine if a mbuf is writable if it is going to
copy data into the data region of an existing mbuf. It does this to create a
contiguous data region in a single mbuf from multiple mbufs in the chain. If
the requested memory region is already contiguous and nothing needs to
change, the mbuf does not need to be writeable.
Submitted by: Brian Mueller <bmueller@panasas.com>
Reviewed by: bz
MFC after: 1 week
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D9053
drain timeout handling to historical freebsd behavior.
The primary reason for these changes is the need to have tty_drain() call
ttydevsw_busy() at some reasonable sub-second rate, to poll hardware that
doesn't signal an interrupt when the transmit shift register becomes empty
(which includes virtually all USB serial hardware). Such hardware hangs
in a ttyout wait, because it never gets an opportunity to trigger a wakeup
from the sleep in tty_drain() by calling ttydisc_getc() again, after
handing the last of the buffered data to the hardware.
While researching the history of changes to tty_drain() I stumbled across
some email describing the historical BSD behavior of tcdrain() and close()
on serial ports, and the ability of comcontrol(1) to control timeout
behavior. Using that and some advice from Bruce Evans as a guide, I've
put together these changes to implement the hardware polling and restore
the historical timeout behaviors...
- tty_drain() now calls ttydevsw_busy() in a loop at 10 Hz to accomodate
hardware that requires polling for busy state.
- The "new historical" behavior for draining during close(2) is retained:
the drain timeout is "1 second without making any progress". When the
1-second timeout expires, if the count of bytes remaining in the tty
layer buffer is smaller than last time, the timeout is extended for
another second. Unfortunately, the same logic cannot be extended all
the way down to the hardware, because the interface to that layer is a
simple busy/not-busy indication.
- Due to the previous point, an application that needs a guarantee that
all data has been transmitted must use TIOCDRAIN/tcdrain(3) before
calling close(2).
- The historical behavior of honoring the drainwait setting for TIOCDRAIN
(used by tcdrain(3)) is restored.
- The historical kern.drainwait sysctl to control the global default
drainwait time is restored, but is now named kern.tty_drainwait.
- The historical default drainwait timeout of 300 seconds is restored.
- Handling of TIOCGDRAINWAIT and TIOCSDRAINWAIT ioctls is restored
(this also makes the comcontrol(1) drainwait verb work again).
- Manpages are updated to document these behaviors.
Reviewed by: bde (prior version)
biowait() will otherwise race with completions of such BIOs. In-tree code
only calls biowait() on BIOs that do not specify a handler, so this change
should not have any functional impact.
Reviewed by: mav
MFC after: 1 month
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D9070
Add a MSG_MOREOTOCOME message flag. When this flag is set, sosend*
set PRUS_MOREOTOCOME when invoking the protocol send method. The aio
worker tasks for sending on a socket set this flag when there are
additional write jobs waiting on the socket buffer.
Reviewed by: adrian
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D8955
sys/ptrace.h includes sys/signal.h, which includes sys/_sigset.h.
Note that sys/_sigset.h only defines osigset_t if COMPAT_43 was defined.
Two lines later, sys/ptrace.h includes machine/reg.h, which in case of
powerpc, includes opt_compat.h.
After the include headers reordering in r311345, we have sys/ptrace.h
included before sys/sysproto.h.
If COMPAT_43 was requested in the kernel config, the result is that
sys/_sigset.h does not define osigset_t, but sys/sysproto.h sees
COMPAT_43 and uses osigset_t.
Fix this by explicitely including opt_compat.h to cover the whole
kern/kern_exec.c scope.
Sponsored by: The FreeBSD Foundation
Right now size of the structure is 472 bytes on amd64, which is
already large and stack allocations are indesirable. With the ino64
work, MNAMELEN is increased to 1024, which will make it impossible to have
struct statfs on the stack.
Extracted from: ino64 work by gleb
Discussed with: mckusick
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Upon each execve, we allocate a KVA range for use in copying data to the
new image. Pages must be faulted into the range, and when the range is
freed, the backing pages are freed and their mappings are destroyed. This
is a lot of needless overhead, and the exec_map management becomes a
bottleneck when many CPUs are executing execve concurrently. Moreover, the
number of available ranges is fixed at 16, which is insufficient on large
systems and potentially excessive on 32-bit systems.
The new allocator reduces overhead by making exec_map allocations
persistent. When a range is freed, pages backing the range are marked clean
and made easy to reclaim. With this change, the exec_map is sized based on
the number of CPUs.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8921
returns memory which must be freed, regardless of the error. Assign
NULL to *buf in case we are not going to allocate any memory due to
invalid mode.
Reported and tested by: pho
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks (together with r310638)
Differential revision: https://reviews.freebsd.org/D9042
a symlink and an autofs mount request. The crash was caused by namei()
calling bcopy() with a negative length, caused by numeric underflow:
in lookup(), in the relookup path, the ni_pathlen was decremented too
many times. The bug was introduced in r296715.
Big thanks to Alex Deiter for his help with debugging this.
Reviewed by: kib@
Tested by: Alex Deiter <alex.deiter at gmail.com>
MFC after: 1 month
Instead of spuriously re-reading the lock value, read it once.
This change also has a side effect of fixing a performance bug:
on failed _mtx_obtain_lock, it was possible that re-read would find
the lock is unowned, but in this case the primitive would make a trip
through turnstile code.
This is diff reduction to a variant which uses atomic_fcmpset.
Discussed with: jhb (previous version)
Tested by: pho (previous version)
and prison enforcement. Do it on the caller buffer directly.
Besides eliminating memory copies, this change also removes large
structure from the kernel stack.
Extracted from: ino64 work by gleb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
- iflib - add checksum in place support (mmacy)
- iflib - initialize IP for TSO (going to be needed for e1000) (mmacy)
- iflib - move isc_txrx from shared context to softc context (mmacy)
- iflib - Normalize checks in TXQ drainage. (shurd)
- iflib - Fix queue capping checks (mmacy)
- iflib - Fix invalid assert, em can need 2 sentinels (mmacy)
- iflib - let the driver determine what capabilities are set and what
tx csum flags are used (mmacy)
- add INVARIANTS debugging hooks to gtaskqueue enqueue (mmacy)
- update bnxt(4) to support the changes to iflib (shurd)
Some other various, sundry updates. Slightly more verbose changelog:
Submitted by: mmacy@nextbsd.org
Reviewed by: shurd
mFC after:
Sponsored by: LimeLight Networks and Dell EMC Isilon
All hash sizes are power-of-2, but the compiler does not know that for sure
and 'foo % size' forces doing a division.
Store the size - 1 and use 'foo & hash' instead which allows mere shift.
This argument is not a bitmask of flags, but only accepts a single value.
Fail with EINVAL if an invalid value is passed to 'flag'. Rename the
'flags' argument to getmntinfo(3) to 'mode' as well to match.
This is a followup to r308088.
Reviewed by: kib
MFC after: 1 month
closed by r310302 for knote().
If KN_INFLUX | KN_SCAN flags are set for the note passed to knote() or
knote_fork(), i.e. the knote is scanned, we might erronously clear
INFLUX when finishing notification. For normal knote() it was fixed
in r310302 simply by remembering the fact that we do not own
KN_INFLUX, since there we own knlist lock and scan thread cannot clear
KN_INFLUX until we drop the lock. For knote_fork(), the situation is
more complicated, e must drop knlist lock AKA the process lock, since
we need to register new knotes.
Change KN_INFLUX into counter and allow shared ownership of the
in-flux state between scan and knote_fork() or knote(). Both in-flux
setters need to ensure that knote is not dropped in parallel. Added
assert about kn_influx == 1 in knote_drop() verifies that in-flux state
is not shared when knote is destroyed.
Since KBI of the struct knote is changed by addition of the int
kn_influx field, reorder kn_hook and kn_hookid to fill pad on LP64
arches [1]. This keeps sizeof(struct knote) to same 128 bytes as it
was before addition of kn_influx, on amd64.
Reviewed by: markj
Suggested by: markj [1]
Tested by: pho (previous version)
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D8898
accepting the wrong state and printing warning. Do not obliterate
kl_lock and kl_unlock pointers, they are often useful for post-mortem
analysis.
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
X-Differential revision: https://reviews.freebsd.org/D8898
There is no need to do two allocations per kqueue timer. Gather all
data needed by the timer callout into the structure and allocate it at
once.
Use the structure to preserve the result of timer2sbintime(), to not
perform repeated 64bit calculations in callout.
Remove tautological casts.
Remove now unused p_nexttime [1].
Noted by: markj [1]
Reviewed by: markj (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
X-MFC note: do not remove p_nexttime
Differential revision: https://reviews.freebsd.org/D8901
The removal of TAILQ_FOREACH_SAFE introduced a small race: when the last
thread on a sleepqueue is awoken, it reclaims the sleepqueue and may begin
executing on a different CPU before sleepq_resume_thread() returns. This
leaves a window during which it may go back to sleep and incorrectly be
awoken again by the caller of sleepq_broadcast().
Reported and tested by: pho
MFC after: 3 days
Sponsored by: Dell EMC Isilon
pause() uses a spin loop to simulate a sleep during early boot. However,
we only need this for thread0 to get far enough in the boot process to
enable timers (at which point pause() can sleep). For other kthreads,
sleeping in pause() is ok as the callout will be scheduled and will
eventually fire once thread0 initializes timers.
Tested by: Steven Kargl
Sleuthing by: markj
MFC after: 1 week
Sponsored by: Netflix
For notes in KN_INFLUX|KN_SCAN state, the influx bit is set by a
parallel scan. When knote() reports event for the vnode filters,
which require kqueue unlocked, it unconditionally sets and then clears
influx to keep note around kqueue unlock. There, do not clear influx
flag if a scan set it, since we do not own it, instead we prevent scan
from executing by holding knlist lock.
The knote_fork() function has somewhat similar problem, it might set
KN_INFLUX for scanned note, drop kqueue and list locks, and then clear
the flag after relock. A solution there would be different enough, as
well as the test program, so close the reported issue first.
Reported and test case provided by: yjh0502@gmail.com
PR: 214923
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Apparently stdatomic.h implementation for gcc 4.2 on sparc64 does not
work properly. This effectively reverts r251803.
Reported and tested by: lidl
Discussed with: ed
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This way it becomes possible to graph a property for all instances of a
single driver. For example, graphing the number of packets across all
USB controllers, the amount of dropped packets on all NICs, etc.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D8775
Sysctls like kern.eventtimer.et.*.quality currently embed the name of
the clock device. This is problematic for the Prometheus metrics
exporter for two reasons:
- Some of those clocks have dashes in their names, which Prometheus
doesn't allow to be used in metric names.
- It doesn't allow for extracting the same property of all clocks on the
system from within a single query.
Attach these nodes to have a label, so that the Prometheus metrics
exporter gives these metric a uniform name with the name of the clock
attached as a label.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D8775