115476 Commits

Author SHA1 Message Date
jtl
be1c33dfcc The TCPPCAP debugging feature caches recently-used mbufs for use in
debugging TCP connections. This commit provides a mechanism to free those
mbufs when the system is under memory pressure.

Because this will result in lost debugging information, the behavior is
controllable by a sysctl. The default setting is to free the mbufs.

Reviewed by:	gnn
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D6931
Input from:	novice_techie.com
2016-07-06 16:17:13 +00:00
nwhitehorn
89d01c24d1 Replace a number of conflations of mp_ncpus and mp_maxid with either
mp_maxid or CPU_FOREACH() as appropriate. This fixes a number of places in
the kernel that assumed CPU IDs are dense in [0, mp_ncpus) and would try,
for example, to run tasks on CPUs that did not exist or to allocate too
few buffers on systems with sparse CPU IDs in which there are holes in the
range and mp_maxid > mp_ncpus. Such circumstances generally occur on
systems with SMT, but on which SMT is disabled. This patch restores system
operation at least on POWER8 systems configured in this way.

There are a number of other places in the kernel with potential problems
in these situations, but where sparse CPU IDs are not currently known
to occur, mostly in the ARM machine-dependent code. These will be fixed
in a follow-up commit after the stable/11 branch.

PR:		kern/210106
Reviewed by:	jhb
Approved by:	re (glebius)
2016-07-06 14:09:49 +00:00
hselasky
6716d2f9c5 Fix regression issue with XHCI on 32-bit ARMv7 Armada-38x. Make sure
"struct xhci_dev_ctx_addr" fits into a single 4K page until further.

Approved by:	re (hrs)
MFC after:	1 week
2016-07-06 10:57:04 +00:00
bz
a1ab1fdac6 Only set the ipfilter running state to 'not running' if we are
doing the teardown.  ipf_destroy_all() may free ipfmain in case
of ipf_dynamic_softc being true, thus we are avoiding a possible
memory modified after free as well.

Reported by:	Coverity
Coverity CID:	1357320
Approved by:	re (hrs)
MFC after:	10 days
2016-07-06 10:29:29 +00:00
cem
5cdf98a966 ioat(4): Block asynchronous work during HW reset
Fix the race between ioat_reset_hw and ioat_process_events.

HW reset isn't protected by a lock because it can sleep for a long time
(40.1 ms).  This resulted in a race where we would process bogus parts
of the descriptor ring as if it had completed.  This looked like
duplicate completions on old events, if your ring had looped at least
once.

Block callout and interrupt work while reset runs so the completion end
of things does not observe indeterminate state and process invalid parts
of the ring.

Start the channel with a manually implemented ioat_null() to keep other
submitters quiesced while we wait for the channel to start (100 us).

r295605 may have made the race between ioat_reset_hw and
ioat_process_events wider, but I believe it already existed before that
revision.  ioat_process_events can be invoked by two asynchronous
sources: callout (softclock) and device interrupt.  Those could race
each other, to the same effect.

Reviewed by:	markj
Approved by:	re
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D7097
2016-07-05 20:53:32 +00:00
cem
74e2a82c2d ioat(4): Serialize ioat_reset_hw invocations
Reviewed by:	markj
Approved by:	re
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D7097
2016-07-05 20:52:35 +00:00
cem
daf01694f3 ioat(4): Split timer into poll and shrink functions
Poll should happen quickly, while shrink should happen infrequently.

Protect is_completion_pending with submit_lock.

Reviewed by:	markj
Approved by:	re
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D7097
2016-07-05 20:51:52 +00:00
glebius
2b2f782761 The paradigm of a callout is that it has three consequent states:
not scheduled -> scheduled -> running -> not scheduled. The API and the
manual page assume that, some comments in the code assume that, and looks
like some contributors to the code also did. The problem is that this
paradigm isn't true. A callout can be scheduled and running at the same
time, which makes API description ambigouous. In such case callout_stop()
family of functions/macros should return 1 and 0 at the same time, since it
successfully unscheduled future callout but the current one is running.
Before this change we returned 1 in such a case, with an exception that
if running callout was migrating we returned 0, unless CS_MIGRBLOCK was
specified.

With this change, we now return 0 in case if future callout was unscheduled,
but another one is still in action, indicating to API users that resources
are not yet safe to be freed.

However, the sleepqueue code relies on getting 1 return code in that case,
and there already was CS_MIGRBLOCK flag, that covered one of the edge cases.
In the new return path we will also use this flag, to keep sleepqueue safe.

Since the flag CS_MIGRBLOCK doesn't block migration and now isn't limited to
migration edge case, rename it to CS_EXECUTING.

This change fixes panics on a high loaded TCP server.

Reviewed by:	jch, hselasky, rrs, kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D7042
2016-07-05 18:47:17 +00:00
glebius
0417c9be8f Compile in the kassert_panic() function with INVARIANT_SUPPORT
option, not INVARIANTS.  The function is required if we want
to load in a module that is compiled with INVARIANTS.

Reviewed by:	jhb
Approved by:	re (gjb)
2016-07-05 18:34:34 +00:00
markj
d6ae30f932 Ensure that spinlock sections are balanced even after a panic.
vpanic() uses spinlock_enter() to disable interrupts before dumping core.
However, when the scheduler is stopped and INVARIANTS is not configured,
thread_lock() does not acquire a spinlock section, while thread_unlock()
releases one. This can result in interrupts staying enabled while the
kernel dumps core, complicating post-mortem analysis of the crash.

Approved by:	re (gjb)
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2016-07-05 17:59:04 +00:00
rwatson
0696f7cf29 Call audit hooks to capture vnode attributes for three file-descriptor
method implementations: fstat(2), close(2), and poll(2).  This change
synchronises auditing here with similar auditing for VFS-specific system
calls such as stat(2) that audit more complete vnode information.

Sponsored by:	DARPA, AFRL
Approved by:	re (kib)
MFC after:	1 week
2016-07-05 16:37:01 +00:00
emaste
c769866ed1 add description for debug.elf{32,64}_legacy_coredump sysctl
Approved by:	re (kib)
MFC after:	1 week
Sponsored by:	The FreeBSD Foundation
2016-07-05 14:46:06 +00:00
kib
b028e9a2d9 Clarify the vnode_destroy_vobject() logic handling for already terminated
objects.

Assert that there is no new waiters for the already terminated objects.
Old waiters should have been notified by the termination calling
vnode_pager_dealloc() (old/new are with regard of the lock acquisition
interval).

Only clear the vp->v_object for the case of already terminated object,
since other branches call vnode_pager_dealloc(), which should clear
the pointer.  Assert this.

Tested by:	pho
Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
2016-07-05 11:21:02 +00:00
jhibbits
d88fefb346 Remove SoC-specific integrations from dTSEC, to make it SoC agnostic.
This will allow a single kernel to run on all SoCs supported by the dTSEC driver.

Approved by:	re@(gjb)
2016-07-05 06:16:42 +00:00
jhibbits
3720bcbd30 Unbreak the LBC driver, broken with the large RMan and 36-bit physical address changes.
Remove the use of fdt_data_to_res(), and instead construct the resources
manually.  Additionally, avoid the 32-bit size limitation of fdt_data_get(), by
building physical addresses manually from the lbc ranges property.

Approved by:	re@(gjb)
2016-07-05 06:14:23 +00:00
np
58667ff11c cxgbe(4): Changes to the CPL-handler registration mechanism and code
related to "shared" CPLs.

a) Combine t4_set_tcb_field and t4_set_tcb_field_rpl into a single
function.  Allow callers to direct the response to any iq.  Tidy up
set_ulp_mode_iscsi while there to use names from t4_tcb.h instead of
magic constants.

b) Remove all CPL handler tables from struct adapter.  This reduces its
size by around 2KB.  All handlers are now registered at MOD_LOAD instead
of attach or some kind of initialization/activation.  The registration
functions do not need an adapter parameter any more.

c) Add per-iq handlers to deal with CPLs whose destination cannot be
determined solely from the opcode.  There are 2 such CPLs in use right
now: SET_TCB_RPL and L2T_WRITE_RPL.  The base driver continues to send
filter and L2T_WRITEs over the mgmtq and solicits the reply on fwq.
t4_tom (including the DDP code) now uses the port's ctrlq to send
L2T_WRITEs and SET_TCB_FIELDs and solicits the reply on an ofld_rxq.
fwq and ofld_rxq have different handlers that know what kind of tid to
expect in the reply.  Update t4_write_l2e and callers to to support any
wrq/iq combination.

Approved by:	re@ (kib@)
Sponsored by:	Chelsio Communications
2016-07-05 01:29:24 +00:00
truckman
9e6e43be95 Fix a race condition between the main thread in aqm_pie_cleanup() and the
callout thread that can cause a kernel panic.  Always do the final cleanup
in the callout thread by passing a separate callout function for that task
to callout_reset_sbt().

Protect the ref_count decrement in the callout with DN_BH_WLOCK().  All
other ref_count manipulation is protected with this lock.

There is still a tiny window between ref_count reaching zero and the end
of the callout function where it is unsafe to unload the module.  Fixing
this would require the use of callout_drain(), but this can't be done
because dummynet holds a mutex and callout_drain() might sleep.

Remove the callout_pending(), callout_active(), and callout_deactivate()
calls from calculate_drop_prob().  They are not needed because this callout
uses callout_init_mtx().

Submitted by:	Rasool Al-Saadi <ralsaadi@swin.edu.au>
Approved by:	re (gjb)
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D6928
2016-07-05 00:53:01 +00:00
hselasky
1649e59d99 Fix interrupt loop when switching from USB device to USB host mode by
clearing all endpoint interrupt bits.

PR:		210736
Approved by:	re (glebius)
MFC after:	1 week
2016-07-04 17:12:22 +00:00
emaste
bca2b7caa7 boot1.efi: fix assignment / comparison expression
PR:		210706
Submitted by:	David Binderman <dcb314@hotmail.com>
Approved by:	re (kib)
MFC after:	1 week
2016-07-04 16:50:21 +00:00
kib
2ff8e9c636 Provide helper macros to detect 'non-silent SBDRY' state and to
calculate appropriate return value for stops.  Simplify the code by
using them.

Fix typo in sig_suspend_threads().  The thread which sleep must be
aborted is td2. (*)

In issignal(), when handling stopping signal for thread in
TD_SBDRY_INTR state, do not stop, this is wrong and fires assert.
This is yet another place where execution should be forced out of
SBDRY-protected region.  For such case, return -1 from issignal() and
translate it to corresponding error code in sleepq_catch_signals().
Assert that other consumers of cursig() are not affected by the new
return value. (*)

Micro-optimize, mostly VFS and VOP methods, by avoiding calling the
functions when SIGDEFERSTOP_NOP non-change is requested. (**)

Reported and tested by:	pho (*)
Requested by:	bde (**)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Approved by:	re (gjb)
2016-07-03 18:19:48 +00:00
kib
c31bb3499e Remove racy assert. The thread which changes vnode usecount from 0 to 1
does it under the vnode interlock, but the interlock is not owned by the
asserting thread.  As result, we might read increased use counter but also
still see VI_OWEINACT.

In collaboration with: nwhitehorn
Hardware donated by: IBM LTC
Sponsored by:	The FreeBSD Foundation (kib)
Approved by:	re (gjb)
2016-07-03 01:56:48 +00:00
kib
fb6d455926 Change type of the 'dead' variable to boolean.
Requested by:	alc
MFC after:	1 week
Approved by:	re (gjb)
2016-07-03 00:08:17 +00:00
np
d08d8c2ff3 cxgbe(4): Avoid a NULL dereference while dumping the L2 table. Entries
used by switching filters that rewrite L2 information do not have any
associated ifnet.

Approved by:	re@ (gjb@)
Sponsored by:	Chelsio Communications
2016-07-01 23:18:49 +00:00
nwhitehorn
1a2acce994 Clean up some FDT-related code in the PowerPC bootloader, improving error
checking and robustness. Prevents errors and crashes in FDT commands on
PowerMac G5 systems.

Approved by:	re (gjb)
2016-07-01 21:09:30 +00:00
kib
38d067a317 When a process knote was attached to the process which is already exiting,
the knote is activated immediately.  If the exit1() later activates
knotes, such knote is attempted to be activated second time.  Detect
the condition by zeroed kn_ptr.p_proc pointer, and avoid excessive
activation.

Before r302235, such knotes were removed from the knlist immediately
upon activation.

Reported by:	truckman
Sponsored by:	The FreeBSD Foundation
Approved by:	re (gjb)
2016-07-01 20:11:28 +00:00
adrian
d6284376d0 [net80211] teach AMRR to log the initial MCS rate as "MCS X"
Otheriwse it logs it as the rate value, which is 0x80 (MCS flag) + MCS,
which isn't that helpful.

Approved by:	re (gjb)
2016-07-01 19:58:13 +00:00
hselasky
e531a038fb Fix detection of USB device disconnects in USB host mode when the USB
device is connected directly to the USB port of the DWC OTG, in this
case a RPI-zero.

PR:		210695
Approved by:	re (gjb)
MFC after:	1 week
2016-07-01 07:27:33 +00:00
gjb
cce5cf3ef4 Update 11.0 to ALPHA6.
Approved by:	re (implicit)
Sponsored by:	The FreeBSD Foundation
2016-07-01 00:00:35 +00:00
bz
2af4ea8fc2 In case of the global eventhandler make sure the current VNET
is still operational before doing any work;  otherwise we might
run into, e.g., destroyed locks.

PR:		210724
Reported by:	olevole olevole.ru
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
Obtained from:	projects/vnet
Approved by:	re (gjb)
2016-06-30 19:32:45 +00:00
bz
0c1171f994 Virtualise ipfilter.
Split initializzation an teardown into module (global state) and VNET
(per virtual network stack) parts.  Virtualise global state, which is
not "const".

Cleanup eventhandlers, so that we can make use of the passed in argument
to get the vnet state from the ifp;  disable the "cloner" event as it is
too early, has no state, and can fire before initialisation (see comment
in the source).

Handle the dynamic sysctls specially.  The problem is that "ipmain"
is the virtualized struct, but the fields used for the sysctls are
hanging off memory allocated and attached to the virtualized "ipmain"
thus standard VNET macros and sysctl handling do not work.
We still say it is VNET sysctls to get the proper protection checks
in the VIMAGE case;  to solve the problem of accessing the right bit
of memory hanging of each per-VNET ipmain, we use a dedicated handler
function wrapping around sysctl_ipf_int() undoing the base calculation
from kern_sysctl.c and then adding the passed-in offset into the right
struct depending on handler.  A bit of a mess exposing VNET-internals
this way but the only way to keep the code without having to massively
restructure ipf internals.

Approved by:		re (hrs)
Sponsored by:		The FreeBSD Foundation
Obtained from:		projects/vnet
MFC after:		2 weeks
Reviewed by:		cy
Differential Revision:	https://reviews.freebsd.org/D7000
2016-06-30 15:01:07 +00:00
mav
b281d573a9 Revert r299454 and r299448.
Those changes were found confusing FreeBSD libc ACL code, that doesn't
differentiate ACL for directories and files, and report ACLs for all
directories created after those patches as non-trivial.  On the other
side these changes were considered wrong from POSIX and NFSv4 points of
view.  Until further investigation done upstream, revert those changes
locally in preparation for FreeBSD 11.0 release.

Approved by:	re (hrs)
2016-06-30 14:55:49 +00:00
tuexen
557adfd043 This patch fixes two bugs related to the setting of the I-Bit
for SCTP DATA and I-DATA chunks.
* For fragmented user messages, set the I-Bit only on the last
  fragment.
* When using explicit EOR mode, set the I-Bit on the last
  fragment, whenever SCTP_SACK_IMMEDIATELY was set in snd_flags
  for any of the send() calls.

Approved by:	re (hrs)
MFC after:	1 week
2016-06-30 06:06:35 +00:00
wma
eaf77c9cc6 ARM, ARM64: Workaround for buf_ring reordering
This patch offers a workaround to buf_ring reordering
    visible on armv7 and armv8. This is supposed to be
    removed once new buf_ring implementation is integrated
    into the tree.

    Obtained from:         Semihalf
    Reviewed by:           alc,emaste
    Differential Revision: https://reviews.freebsd.org/D6986
    Approved by:           re (gjb)
2016-06-30 05:18:37 +00:00
wma
2da75cc60b ARM64: fix DMAP calculation
Use arithmetic operators instead of logical. This fixes
    DMAP ranges calculation for ThunderX Dual Socket.

    Obtained from:         Semihalf
    Sponsored by:          Cavium
    Reviewed by:           zbb
    Differential Revision: https://reviews.freebsd.org/D7023
    Approved by:           re (gjb)
2016-06-30 04:58:19 +00:00
bz
c79242bce1 Move the ipfw_log_bpf() calls from global module initialisation to
per-VNET initialisation and virtualise the interface cloning to
allow a dedicated ipfw log interface per VNET.

Approved by:		re (gjb)
MFC after:		2 weeks
Sponsored by:		The FreeBSD Foundation
2016-06-30 01:33:14 +00:00
bz
47f08657c2 Remove unused global variables as well as unused memory
allocations from ipfilter in preparation for VNET support.

Suggested by:		cy (see D7000)
Sponsored by:		The FreeBSD Foundation
MFC after:		2 weeks
Approved by:		re (gjb)
2016-06-30 01:32:12 +00:00
gonzo
62a7c9548d Fixed FreeBSD/mips MALTA support for QEMU
Recource management functions in GT PCI controller driver
treated memory/IO resources as KSEG1 addresses, later during
activation these values would be increased by KSEG1 base again
rendering the address invalid and causing "bus error" trap.

Actual logic was converted to use real physical addresses,
so mapping takes place only during activation.

Submitted by:	Aleksandr Rybalko <ray@FreeBSD.org>
Approved by:	re (gjb)
2016-06-29 23:33:44 +00:00
bdrewery
3ef66a1afe WITH_META_MODE: Avoid false-positive error due to missing .meta with build commands.
Sponsored by:	EMC / Isilon Storage Division
Approved by:	re (blanket, META_MODE)
2016-06-29 22:39:22 +00:00
sobomax
fa8fbeaaf4 1.Improve handling around last compressed block of the file, which is
necessary because CLOOP format lacks explicit EOF or length, so that
  in the presence of padding or when the CLOOP is put onto a larger
  partition upper level provider size may be larger. Bound amount
  of extra data that we might touch to the max length of the compressed
  block and detect zero-padding in the last cluster, which when
  sector is all-zero might cause us to emit bogus I/O error after
  decompression of that fails. To not make code any more complicated
  that it needs to be deal with it in lazy-manner, i.e. when we
  first access that specific cluster.

  This change also fixes stupid mistake in the LZMA code, inherited
  from geom_lzma, which does not share length of the output buffer
  buffer with the decompression routine, so that in the presence
  of corrupted or purposedly tailored data may easily cause heap
  overflow and kernel memory corruption.

  Beef up validation of the CLOOP TOC by checking that lengths of
  all but the last compressed clusters match upper limit set by
  the decompressor and improve some error diagnostic output while
  I am here.

2.Add kern.geom.uzip.attach_to tunable to artifically limit
  attaching uzip to certain devices in the dev tree only.

    For example the following only makes us attaching to the
    GPT labels:

    kern.geom.uzip.attach_to="gpt/*"

3.Add kern.geom.uzip.noattach_to, which does opposite to the (2)
  above, i.e. prevents geom_uzip from tasting / attaching to
  providers matching some pattern. By default we don't attach
  to our own kind, i.e. kern.geom.uzip.noattach_to="*.uzip".
  It saves us quite some CPU cycles, esp on low-end embedded
  systems.

Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D7013
2016-06-29 18:19:05 +00:00
avos
19e196315e net80211: fix LOR/deadlock in ieee80211_ff_node_cleanup().
Add new lock for stageq (part of ieee80211_superg structure) and
ni_tx_superg (part of ieee80211_node structure);
drop com_lock protection where it is used to protect them.

While here, drop duplicate OPACKETS counter incrementation.

ni_tx_ampdu is not protected with it (however, it is also used without
locking in other places; probably, it requires some other solution
to be thread-safe).

Tested with RTL8188CUS (AP) and RTL8188EU (STA).

NOTE: Since this change breaks KBI, all wireless drivers need to be
recompiled.

Reviewed by:	adrian
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D6958
2016-06-29 17:25:46 +00:00
sbruno
7fe30bec23 Correct PERSISTENT RESERVE OUT command and populate scsi_cmd->length.
PR:	202625
Submitted by:	niakrisn@gmail.com
Reviewed by:	scottl kenm
Approved by:	re (gjb)
MFC after:	2 weeks
2016-06-29 16:41:37 +00:00
nwhitehorn
032d51d92b Fix fat-fingering: #if AIM should have been #ifdef AIM to avoid failures on
Book-E kernels.

Approved by:	re (gjb)
Pointy hat to:	nwhitehorn
2016-06-29 16:34:56 +00:00
nwhitehorn
74554ccb4a Do not rely on firmware having pre-enabled the MMU in a reasonable way for
late boot: enable it explicitly after installing the page tables. If booting
from an FDT, also make sure to escape the firmware's MMU context early
before overwriting firmware page tables.

Approved by:	re (gjb)
2016-06-29 14:40:43 +00:00
smh
b24634094a Allow ZFS ARC min / max to be tuned at runtime
Prior to this change ZFS ARC min / max could only be changed using
boot time tunables, this allows the values to be tuned at runtime
using the sysctls:
* vfs.zfs.arc_max
* vfs.zfs.arc_min

When adjusting ZFS ARC minimum the memory used  will only reduce
to the new minimum given memory pressure.

Reviewed by:	allanjude
Approved by:	re (gjb)
MFC after:	2 weeks
Relnotes:	yes
Sponsored by:	Multiplay
Differential Revision:	https://reviews.freebsd.org/D5907
2016-06-29 07:55:45 +00:00
np
ecbf2c3532 cxgbe(4): Do not bring up an interface when IFCAP_TOE is enabled on it.
The interface's queues are functional after VI_INIT_DONE (which is short
of interface-up) and that's all that's needed for t4_tom to communicate
with the chip.

Approved by:	re@ (gjb@)
Sponsored by:	Chelsio Communications
2016-06-29 06:55:30 +00:00
cem
eaa90873e9 USB: Add Garmin FR230 device quirk (broken INQUIRY)
PR:		210544
Reviewed by:	hps
Approved by:	re
2016-06-29 06:42:20 +00:00
bz
2acea814a2 Several device drivers call if_alloc() and then do further checks and
will cal if_free() in case of conflict, error, ..
if_free() however sets the VNET instance from the ifp->if_vnet which
was not yet initialized but would only in if_attach(). Fix this by
setting the curvnet from where we allocate the interface in if_alloc().
if_attach() will later overwrite this as needed. We do not set the home_vnet
early on as we only want to prevent the if_free() panic but not change any
of the other housekeeping, e.g., triggered through ifioctl()s.

Reviewed by:	brooks
Approved by:	re (gjb)
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D7010
2016-06-29 05:21:25 +00:00
sbruno
c6d7fbc03e Revert svn r302253 at the request/review of Ken M. This commit is
incorrect.

PR:		202625
Approved by:	re (implicit)
2016-06-28 18:32:15 +00:00
sbruno
90baee121f Correct PERSISTENT RESERVE OUT command and populate scsi_cmd->length.
PR:		202625
Submitted by:	niakrisn@gmail.com
Reviewed by:	scottl
Approved by:	re (hrs)
MFC after:	2 weeks
2016-06-28 18:08:47 +00:00
kib
42d92058c3 Currently the ntptime code and resettodr() are Giant-locked. In
particular, the Giant is supposed to protect against parallel
ntp_adjtime(2) invocations.  But, for instance, sys_ntp_adjtime() does
copyout(9) under Giant and then examines time_status to return syscall
result.  Since copyout(9) could sleep, the syscall result might be
inconsistent.

Another and more important issue is that if PPS is configured,
hardpps(9) is executed without any protection against the parallel
top-level code invocation. Potentially, this may result in the
inconsistent state of the ntptime state variables, but I cannot say
how serious such distortion is. The non-functional splclock() call in
sys_ntp_adjtime() protected against clock interrupts calling hardpps()
in the pre-SMP era.

Modernize the locking. A mutex protects ntptime data.  Due to the
hardpps() KPI legitimately serving from the interrupt filters (and
e.g. uart(4) does call it from filter), the lock cannot be sleepable
mutex if PPS_SYNC is defined.  Otherwise, use normal sleepable mutex
to reduce interrupt latency.

Reviewed by:	  imp, jhb
Sponsored by:	  The FreeBSD Foundation
Approved by:	  re (gjb)
Differential revision:	https://reviews.freebsd.org/D6825
2016-06-28 16:43:23 +00:00