13810 Commits

Author SHA1 Message Date
delphij
a0741a7553 MFC r285424 (ian):
Use the monotonic (uptime) counter rather than time-of-day to measure
elapsed time between ntp_adjtime() clock offset adjustments.  This
eliminates spurious frequency steering after a large clock step (such
as a 1970->2015 step on a system with no battery-backed clock hardware).

This problem was discovered after the import of ntpd 4.2.8, which does
things in a slightly different (but still correct) order than the 4.2.4
we had previously.  In particular, 4.2.4 would step the clock then
immediately after use ntp_adjtime() to set the frequency and offset to
zero, which captured the post-step time-of-day as a side effect.  In
4.2.8, ntpd sets frequency and offset to zero before any initial clock
step, capturing the time as 1970-ish, then when it next calls
ntp_adjtime() it's with a non-zero offset measurement. This non-zero
value gets multiplied by the apparent 45-year interval, which blows up
into a completely bogus frequency steer.  That gets clamped to 500ppm,
but that's still enough to make the clock drift so fast that ntpd has
to keep stepping it every few minutes to compensate.

Approved by:	re (gjb)
2015-07-15 19:11:43 +00:00
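The failure mode described in a0741a7553 can be illustrated with a toy model. This is a minimal sketch, assuming a simplified steering rule in which the correction is proportional to offset times elapsed interval; the names `elapsed_s`, `freq_steer_ppm` and `TOY_GAIN` are hypothetical, and the real kernel PLL arithmetic is different. Only the 500ppm clamp comes from the commit message.

```c
#include <stdint.h>

#define MAX_FREQ_PPM 500        /* clamp quoted in the commit message */
#define TOY_GAIN     1000000    /* hypothetical scaling, not the kernel's */

/* Elapsed seconds between two readings of the same clock. */
int64_t elapsed_s(int64_t last, int64_t now)
{
    return now - last;
}

/*
 * Toy steering rule: the correction grows with offset * interval and is
 * then clamped to +/-MAX_FREQ_PPM. The real kernel PLL math differs.
 */
int64_t freq_steer_ppm(int64_t offset_us, int64_t interval_s)
{
    int64_t raw = offset_us * interval_s / TOY_GAIN;

    if (raw > MAX_FREQ_PPM)
        return MAX_FREQ_PPM;
    if (raw < -MAX_FREQ_PPM)
        return -MAX_FREQ_PPM;
    return raw;
}
```

With time-of-day stamps of 0 (1970) and 1420000000 (roughly 2015), a 1000us offset saturates the clamp at 500ppm. With uptime stamps of 100 and 164 the same offset yields no steer at all in this model, which is why measuring the interval on the monotonic counter avoids the problem.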
kib
969c638bab MFC r284887:
Handle errors from background write of the cylinder group blocks.

MFC r284927:
Simplify code.

Approved by:	re (gjb)
2015-07-11 19:11:40 +00:00
avg
97017b18a6 MFC r284297: several lockstat improvements 2015-07-01 10:15:49 +00:00
kib
febe4e6e91 MFC r284495:
Keep a vnode which is freed but still owing inactivation, on the active list.
This closes a race where such vnode is not msync-ed until reboot.
2015-07-01 06:54:25 +00:00
kib
6499094e0c MFC r284719:
Only take previous buffer queue lock (olock) when needed for REMFREE
in binsfree().
2015-06-30 05:53:15 +00:00
neel
c85aee0195 MFC r279444:
Allow passthrough devices to be hinted.

MFC r279683:
When ICW1 is issued the edge sense circuit is reset which means that
following an initialization a low-to-high transition is necessary to
generate an interrupt.

MFC r279925:
Add -p parameter to list PCI device to pass through to the guest.

MFC r281559:
Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.

MFC r280447:
When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.

MFC r280725:
Move legacy interrupt allocation for virtio devices to common code.

MFC r280775:
Fix the RTC device model to operate correctly in 12-hour mode.

MFC r280929:
Fix "MOVS" instruction memory to MMIO emulation.

MFC r280968:
Display instruction bytes and %rip prior to aborting due to an instruction
emulation error.

MFC r281145:
Enhance the support for Group 1 Extended opcodes for CMP, AND, OR instructions.

MFC r281542:
Initialize 'error' before use (Coverity IDs 1249748, 1249747, 1249751, 1249749)

MFC r281561:
Prior to aborting due to an ioport error, it is always interesting to see what
the guest's %rip is.

MFC r281611:
If the number of guest vcpus is less than '1' then flag it as an error.

MFC r281612:
Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.

MFC r281630:
Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.

MFC r281879:
Missing break in switch case (Coverity ID 1292499)

MFC r281946:
Don't allow guest to modify readonly bits in the PCI config 'status' register.

MFC r281987:
STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.

MFC r282206:
Implement the century byte in the RTC.
2015-06-28 01:21:55 +00:00
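For the 12-hour RTC fix (r280775) in c85aee0195, the following sketch shows the hour encoding an MC146818-style RTC uses in 12-hour mode: hours run 1 to 12 and bit 7 of the hours byte marks PM. It assumes binary (not BCD) data mode, and the function name `toy_rtc_hour_12h` is hypothetical; this is not the bhyve device-model code.

```c
#include <stdint.h>

#define RTC_HOUR_PM 0x80    /* bit 7 of the hours byte flags PM */

/*
 * Convert a 24-hour value (0..23) to the 12-hour register encoding,
 * assuming binary data mode. Midnight and noon both read as 12.
 */
uint8_t toy_rtc_hour_12h(unsigned hour24)
{
    unsigned h = hour24 % 12;

    if (h == 0)
        h = 12;
    if (hour24 >= 12)
        h |= RTC_HOUR_PM;
    return (uint8_t)h;
}
```

Under these assumptions midnight encodes as 12, noon as 12 with the PM bit set, and 13:00 as 1 with the PM bit set.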
tuexen
7e040df666 When using KTRACE, set a variable to the appropriate value and don't
leave it initialized at NULL.
Since the affected functions were moved from sys/kern/uipc_syscalls.c
to sys/netinet/sctp_syscalls.c it was not possible to MFC r284613.
Therefore, this is a direct commit with the corresponding changes of r284613.

Reported by:	Coverity
CID:		1018058, 1018060
2015-06-22 06:06:38 +00:00
trasz
e1055c772b MFC r282213:
Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).

MFC r282901:

Build GENERIC with RACCT/RCTL support by default.  Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.

Note those two are MFC-ed together, because the latter one changes the
name of RACCT_DISABLED option to RACCT_DEFAULT_TO_DISABLED.  Should have
committed the renaming separately...

Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
2015-06-21 06:28:26 +00:00
markj
93e43c433d MFC r284127:
witness: don't warn about matrix inconsistencies without holding the mutex

Lock order checking is done without the witness mutex held, so multiple
threads that are racing to establish a new lock order may read matrix
entries that are in an inconsistent state. Don't print a warning in this
case, but instead just redo the check after taking the witness lock.
2015-06-21 00:36:02 +00:00
kib
9d6c0060b8 MFC r284178:
Add barriers when updating and reading th_generation.

MFC r284256:
Tweaks for r284178.
2015-06-18 13:46:32 +00:00
ken
1d2632e58b MFC, r284192:
------------------------------------------------------------------------
  r284192 | ken | 2015-06-09 15:39:38 -0600 (Tue, 09 Jun 2015) | 102 lines

  Add support for reading MAM attributes to camcontrol(8) and libcam(3).

  MAM is Medium Auxiliary Memory and is most commonly found as flash
  chips on tapes.

  This includes support for reading attributes and decoding most
  known attributes, but does not yet include support for writing
  attributes or reporting attributes in XML format.

  libsbuf/Makefile:
  	Add subr_prf.c for the new sbuf_hexdump() function.  This
  	function is essentially the same as the existing hexdump function.

  libsbuf/Symbol.map:
  	Add a new shared library minor version, and include the
  	sbuf_hexdump() function.

  libsbuf/Version.def:
  	Add version 1.4 of the libsbuf library.

  libutil/hexdump.3:
  	Document sbuf_hexdump() alongside hexdump(3), since it is
  	essentially the same function.

  camcontrol/Makefile:
  	Add attrib.c.

  camcontrol/attrib.c:
  	Implementation of READ ATTRIBUTE support for camcontrol(8).

  camcontrol/camcontrol.8:
  	Document the new 'camcontrol attrib' subcommand.

  camcontrol/camcontrol.c:
  	Add the new 'camcontrol attrib' subcommand.

  camcontrol/camcontrol.h:
  	Add a function prototype for scsiattrib().

  share/man/man9/sbuf.9:
  	Document the existence of sbuf_hexdump() and point users to
  	the hexdump(3) man page for more details.

  sys/cam/scsi/scsi_all.c:
  	Add a table of known attributes, text descriptions and
  	handler functions.

  	Add a new scsi_attrib_sbuf() function along with a number
  	of other related functions that help decode attributes.

  	scsi_attrib_ascii_sbuf() decodes ASCII format attributes.

  	scsi_attrib_int_sbuf() decodes binary format attributes, and
  	will pass them off to scsi_attrib_hexdump_sbuf() if they're
  	bigger than 8 bytes.

  	scsi_attrib_vendser_sbuf() decodes the vendor and drive
  	serial number attribute.

  	scsi_attrib_volcoh_sbuf() decodes the Volume Coherency
  	Information attribute that LTFS writes out.

  sys/cam/scsi/scsi_all.h:
  	Add a number of attribute-related structure definitions and
  	other defines.

  	Add function prototypes for all of the functions added in
  	scsi_all.c.

  sys/kern/subr_prf.c:
  	Add a new function, sbuf_hexdump().  This is the same as
  	the existing hexdump(9) function, except that it puts the
  	result in an sbuf.

  	This also changes subr_prf.c so that it can be compiled in
  	userland for inclusion in libsbuf.

  	We should work to change this so that the kernel hexdump
  	implementation is a wrapper around sbuf_hexdump() with a
  	statically allocated sbuf with a drain.  That will require
  	a drain function that goes to the kernel printf() buffer
  	that can take a non-NUL terminated string as input.
  	That is because an sbuf isn't NUL-terminated until it is
  	finished, and we don't want to finish it while we're still
  	using it.

  	We should also work to consolidate the userland hexdump and
  	kernel hexdump implementations, which are currently
  	separate.  This would also mean making applications that
  	currently link in libutil link in libsbuf.

  sys/sys/sbuf.h:
  	Add the prototype for sbuf_hexdump(), and add another copy
  	of the hexdump flag values if they aren't already defined.

  	Ideally the flags should be defined in one place but the
  	implementation makes it difficult to do properly.  (See
  	above.)

  Sponsored by:	Spectra Logic Corporation

------------------------------------------------------------------------
2015-06-16 02:31:11 +00:00
delphij
201165c60b MFC r283889,r283891:
Clear p_stops when doing PT_DETACH and PROCFS_CTL_DETACH.

Without this, if a process was being traced by truss(1), which
uses different p_stops bits than gdb(1), the latter would
misbehave because of the unexpected bits.

Reported by:	jceel
Submitted by:	sef
Sponsored by:	iXsystems, Inc.
2015-06-15 18:16:23 +00:00
jhb
7b9316ff50 MFC 283546:
Add KTR tracing for some MI ptrace events.
2015-06-13 16:15:43 +00:00
kib
bcf18ca2ab Add chunk missed in the r284199. 2015-06-10 02:44:56 +00:00
kib
6f94b985b6 MFC r283602:
Prevent dounmount() from acting on the freed (although type-stable)
memory by changing the interface to require the mount point to be
referenced.

MFC r283629:
Add missed {}.
2015-06-10 02:27:00 +00:00
kib
2187b2a06e MFC r283601:
Add V_MNTREF flag, to indicate that caller of vn_start*_write() already
owns a reference on the mount point, and the functions can consume it.
2015-06-10 02:20:58 +00:00
kib
49520bf551 MFC r283600:
Perform SU cleanup in the AST handler.  Do not sleep waiting for SU cleanup
while owning vnode lock.

On MFC, for KBI stability, td_su member was moved to the end of the
struct thread.
2015-06-10 02:04:02 +00:00
asomers
fe9825745f MFC r283115
Properly null-terminate strings in a kernel dump header.  A version string
longer than 192 bytes will cause the version field of a dump header to
overflow. strncpy doesn't null terminate it, so savecore will print a
corrupted info file. Using strlcpy fixes the bug.
2015-06-09 19:41:16 +00:00
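The strncpy pitfall fixed in fe9825745f can be reproduced in a few lines. The sketch below uses a hypothetical helper, `copy_terminated`, that mimics strlcpy semantics so the example builds on systems whose libc lacks strlcpy; it is not the dump-header code itself.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper with strlcpy-like semantics: copy at most
 * dstsize - 1 bytes and always NUL-terminate when dstsize is non-zero.
 * Returns the length of src so callers can detect truncation.
 */
size_t copy_terminated(char *dst, const char *src, size_t dstsize)
{
    size_t srclen = strlen(src);

    if (dstsize != 0) {
        size_t n = srclen < dstsize - 1 ? srclen : dstsize - 1;

        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}
```

Copying "abcdef" into a 4-byte field with strncpy fills all four bytes and leaves no terminator, so a later read as a C string runs past the field. The helper stores "abc" plus a terminator and returns 6, which signals that truncation occurred.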
kib
8adebdc1fb MFC r283735:
Remove several write-only variables.
2015-06-05 08:36:25 +00:00
kib
0a8cb339c3 MFC r283745:
Do not raise priority of the idle thread on signal delivery.
2015-06-05 08:26:38 +00:00
emaste
d77c39991b MFC r259438 by pjd: Fix syscalls that can be loaded as kernel modules
They were not given the flag allowing to call them from capability
  mode sandbox.

And regenerate init_sysent.c

Sponsored by:	The FreeBSD Foundation
2015-06-03 18:33:47 +00:00
emaste
02441119bc MFC r261220 by csjp: Allow sigwait(2) in capabilities mode.
It's common for multi-threaded processes to create a thread for
  the purpose of synchronously processing signals. Allow such processes to
  utilize a capabilities sandbox.
2015-06-03 13:12:08 +00:00
emaste
e6fee85e82 MFC r259436,259437 by pjd: Allow for pselect(2) in capability mode. 2015-06-03 13:10:25 +00:00
emaste
d97b35fcad Regen for r283940. 2015-06-03 11:39:29 +00:00
emaste
2c4dd777cb MFC r257736 (by pjd):
- Remove mac_get_fd/mac_set_fd - those are not syscalls. The
    __mac_get_fd() and __mac_set_fd() syscalls are listed earlier.
  - Correct typo in syscall name. It should be sched_rr_get_interval,
    not sched_rr_getinterval.
2015-06-03 11:36:47 +00:00
kib
80880ee480 MFC r283320:
Always obey thread request to not stop on non-boundary.
2015-05-30 08:54:42 +00:00
markj
9bbe24351d MFC r281915:
Make vpanic() externally visible.

MFC r281916:
Fix DTrace's panic() action.
2015-05-29 04:01:39 +00:00
kib
0d8ee7566b MFC r282708:
On exec, single-threading must be enforced before arguments space is
allocated from exec_map.
2015-05-24 07:32:02 +00:00
ian
b92e1caf1c MFC r279728, r279729, r279756, r279773, r282424, r281367:
Add mutex support to the pps_ioctl() API in the kernel.

  Add PPS support to USB serial drivers.

  Use correct mode variable for PPS support.

  Switch polarity of USB serial PPS events.

  The ftdi "get latency" and "get bitmode" device commands are read
  operations, not writes.

  Implement a mechanism for making changes in the kernel<->driver PPS
  interface without breaking ABI or API compatibility with existing drivers.

  Bump version number to indicate the new PPS ABI version changes in the
  pps_state structure.
2015-05-24 00:53:43 +00:00
ian
132e8a35de MFC r274711:
Stop using early_putc immediately after configuring console with cninit()
2015-05-23 22:34:25 +00:00
kib
09f1502b69 MFC r282690:
Call uma_reclaim() from the additional pagedaemon thread to reclaim kmem
arena address space.
2015-05-23 09:14:29 +00:00
kib
b453b29575 MFC r282944:
Decrement p_boundary_count in the single-threading thread, during making
other thread runnable.  This guarantees that upon return from the
thread_single_end(), p_boundary_count is zero.
2015-05-22 08:11:31 +00:00
ae
511909eef4 MFC r282594:
m_dup() is supposed to give a writable copy of an mbuf chain. It uses
  m_dup_pkthdr(), that uses M_COPYFLAGS mask to copy m_flags field.
  If original mbuf chain has M_RDONLY flag, its copy also will have it.
  Reset this flag explicitly.
2015-05-21 08:28:35 +00:00
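The flag handling described in 511909eef4 amounts to a mask followed by an explicit clear. A minimal sketch, assuming made-up flag values and hypothetical `TOY_` names in place of the real mbuf definitions:

```c
#include <stdint.h>

/* Hypothetical flag bits; the real mbuf flag values differ. */
#define TOY_M_PKTHDR   0x01u
#define TOY_M_RDONLY   0x02u
#define TOY_M_PRIVATE  0x04u    /* stands in for a flag that is not copied */

/* Stand-in for the copy mask: note that it includes the read-only bit. */
#define TOY_COPYFLAGS  (TOY_M_PKTHDR | TOY_M_RDONLY)

/*
 * Flags for a writable duplicate: inherit the copyable bits, then
 * clear the read-only bit explicitly, as the commit message describes.
 */
uint32_t toy_dup_flags(uint32_t src_flags)
{
    uint32_t flags = src_flags & TOY_COPYFLAGS;

    flags &= ~TOY_M_RDONLY;
    return flags;
}
```

Without the explicit clear, a read-only source chain would produce a duplicate that is also marked read-only, which contradicts the stated purpose of m_dup().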
hselasky
e136fd019b MFC r280495:
Implement a simple OID number garbage collector. Given the increasing
number of dynamically created and destroyed SYSCTLs during runtime it
is very likely that the current new OID number limit of 0x7fffffff can
be reached. Especially if dynamic OID creation and destruction results
from automatic tests. Additional changes:

- Optimize the typical use case by decrementing the next automatic OID
sequence number instead of incrementing it. This saves searching time
when inserting new OIDs into a fresh parent OID node.

- Add simple check for duplicate non-automatic OID numbers.
2015-05-21 06:30:44 +00:00
kib
0f07927a1e MFC r282679:
Do not return from thread_single(SINGLE_BOUNDARY) until all stopped
threads are guaranteed to be removed from the processors.
2015-05-16 09:13:56 +00:00
rmacklem
ad5b8ec6b4 MFC: r281960
MAXBSIZE defines both the largest UFS block size and the
largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger read/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
are fixed as well.
2015-05-14 22:50:07 +00:00
avg
a8e9a4b88b MFC r275576: remove opensolaris cyclic code, replace with high-precision callouts 2015-05-11 07:54:39 +00:00
mav
8ef9c9b39e MFC r281860: Make AIO to not allocate pbufs for unmapped I/O like r281825.
While there, make few more performance optimizations.

On 40-core system doing many 512-byte AIO reads from array of raw SSDs
this change removes lock congestions inside pbuf allocator and devfs,
and bottleneck on single AIO completion taskqueue thread.  It improves
peak AIO performance from ~600K to ~1.3M IOPS.
2015-05-06 21:08:16 +00:00
mav
721a41ae3e MFC r281825: Rewrite physio() to not allocate pbufs for unmapped I/O.
pbufs are a limited resource, and their allocator is not SMP-scalable.
So instead of always allocating pbuf to immediately convert it to bio,
allocate bio just here.  If buffer needs kernel mapping, then pbuf is
still allocated, but used only as a source of KVA and storage for a list
of held pages.

On 40-core system doing many 512-byte reads from user level to array of
raw SSDs this change removes huge lock congestion inside pbuf allocator.
It improves peak performance from ~300K to ~1.2M IOPS.  On my previous
24-core system this problem also existed, but was less serious.
2015-05-06 21:06:32 +00:00
kib
e245530f98 MFC r282085:
Partially revert r255986: do not call VOP_FSYNC() when helping
bufdaemon in getnewbuf(), do use buf_flush().  The difference is that
bufdaemon uses TRYLOCK to get buffer locks, which allows calls to
getnewbuf() while another buffer is locked.
2015-05-04 08:16:32 +00:00
kib
d84bf0ab22 MFC r282084:
Fix locking for oshmctl() and shmsys().
2015-05-04 08:13:05 +00:00
mav
bd39d936df MFC r281026, r281108, r281109:
Make ZFS ARC track both KVA usage and fragmentation.

Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches certain threshold (3/4 on i386 or 16/17 otherwise).  FreeBSD has
even less KVA, but had no such limit on archs with direct map as amd64.
As result, on machines with a lot of RAM, during load with very small user-
space memory pressure, such as `zfs send`, it was possible to reach state,
when there is enough both physical RAM and KVA (I've seen up to 25-30%),
but no continuous KVA range to allocate even single 128KB I/O request.

Address this situation from two sides:
 - restore KVA usage limitations in a way the most close to Illumos;
 - introduce new requirement for KVA fragmentation, specifying that we
should have at least one sequential KVA range of zfs_max_recordsize bytes.

Experiments show that first limitation done alone is not sufficient.  On
machine with 64GB of RAM it is sometimes needed to drop up to half of ARC
size to get at least one 1MB KVA chunk.  Statically limiting ARC to half
of KVA/RAM is too strict, so second limitation makes it to work in cycles:
accumulate trash up to certain critical mass, do massive spring-cleaning,
and then start littering again.
2015-05-03 07:13:14 +00:00
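The two limits described in bd39d936df can be combined into a single predicate. This is a rough sketch only: the function name and parameters are hypothetical, the inputs are assumed to be supplied by the caller, and the actual ARC code is organised differently. The 3/4 and 16/17 thresholds and the "one contiguous range of the maximum record size" requirement are taken from the commit message.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: should the cache give memory back?
 *  - usage limit: KVA use has reached 3/4 (i386) or 16/17 (otherwise)
 *  - fragmentation limit: no free contiguous KVA run can hold one
 *    maximum-size record
 */
bool toy_arc_should_shrink(uint64_t kva_used, uint64_t kva_total,
    uint64_t largest_free_run, uint64_t max_recordsize, bool is_i386)
{
    uint64_t num = is_i386 ? 3 : 16;
    uint64_t den = is_i386 ? 4 : 17;

    /* Divide before multiplying to avoid overflow on large totals. */
    if (kva_used >= kva_total / den * num)
        return true;
    return largest_free_run < max_recordsize;
}
```

The second test is what captures the situation in the message: plenty of KVA free in total, yet no single range large enough for one I/O request.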
rmacklem
fba63dddeb MFC: r281562
File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.
2015-04-30 12:39:24 +00:00
kib
bcef2e5533 MFC r272290 (by mjg):
Use bzero instead of explicitly zeroing stuff in do_execve.
2015-04-27 12:54:04 +00:00
kib
d0bd361bd7 MFC r281696:
Initialize td_sel in the thread_init().

PR:	199518
2015-04-25 08:06:21 +00:00
kib
441214c038 MFC r281883:
Handle incorrect ELF images specifying size for PT_GNU_STACK not being
multiple of page size.
2015-04-25 08:03:36 +00:00
bz
cb88cb87a1 MFC r280786:
Try to unbreak !SMP kernels broken in r280785 (head), r281657 by using
  the proper macros to access cc_cpu.

Requested by:	jmallett
Pointyhat to:	rrs
2015-04-24 07:52:21 +00:00
kib
4d7f19696a MFC r281003:
Speed up symbol lookup for the amd64 kernel modules.
2015-04-23 07:32:28 +00:00
kib
62ec6c0a01 MFC r281548:
Implement support for binary to request specific stack size for the
initial thread.
2015-04-22 10:57:00 +00:00
pluknet
8ed3961fe9 Fix r281843 mis-merge.
Reported by:	Thomas Mueller tmueller at sysgo com
2015-04-22 10:25:08 +00:00