Use the monotonic (uptime) counter rather than time-of-day to measure
elapsed time between ntp_adjtime() clock offset adjustments. This
eliminates spurious frequency steering after a large clock step (such
as a 1970->2015 step on a system with no battery-backed clock hardware).
This problem was discovered after the import of ntpd 4.2.8, which does
things in a slightly different (but still correct) order than the 4.2.4
we had previously. In particular, 4.2.4 would step the clock then
immediately after use ntp_adjtime() to set the frequency and offset to
zero, which captured the post-step time-of-day as a side effect. In
4.2.8, ntpd sets frequency and offset to zero before any initial clock
step, capturing the time as 1970-ish, then when it next calls
ntp_adjtime() it's with a non-zero offset measurement. This non-zero
value gets multiplied by the apparent 45-year interval, which blows up
into a completely bogus frequency steer. That gets clamped to 500ppm,
but that's still enough to make the clock drift so fast that ntpd has
to keep stepping it every few minutes to compensate.
Approved by: re (gjb)
Allow passthrough devices to be hinted.
MFC r279683:
When ICW1 is issued the edge sense circuit is reset which means that
following an initialization a low-to-high transistion is necesary to
generate an interrupt.
MFC r279925:
Add -p parameter to list PCI device to pass through to the guest.
MFC r281559:
Fix handling of BUS_PROBE_NOWILDCARD in 'device_probe_child()'.
MFC r280447:
When fetching an instruction in non-64bit mode, consider the value of the
code segment base address.
MFC r280725:
Move legacy interrupt allocation for virtio devices to common code.
MFC r280775:
Fix the RTC device model to operate correctly in 12-hour mode.
MFC r280929:
Fix "MOVS" instruction memory to MMIO emulation.
MFC r280968:
Display instruction bytes and %rip prior to aborting due to an instruction
emulation error.
MFC r281145:
Enhance the support for Group 1 Extended opcodes for CMP, AND, OR instructions.
MFC r281542:
Initialize 'error' before use (Coverity IDs 1249748, 1249747, 1249751, 1249749)
MFC r281561:
Prior to aborting due to an ioport error, it is always interesting to see what
the guest's %rip is.
MFC r281611:
If the number of guest vcpus is less than '1' then flag it as an error.
MFC r281612:
Prefer 'vcpu_should_yield()' over checking 'curthread->td_flags' directly.
MFC r281630:
Relax the check on which vectors can be delivered through the APIC. According
to the Intel SDM vectors 16 through 255 are allowed to be delivered via the
local APIC.
MFC r281879:
Missing break in switch case (Coverity ID 1292499)
MFC r281946:
Don't allow guest to modify readonly bits in the PCI config 'status' register.
MFC r281987:
STOS/STOSB/STOSW/STOSD/STOSQ instruction emulation.
MFC r282206:
Implement the century byte in the RTC.
leave it initialized at NULL.
Since the affected functions where moved from sys/kern/uipc_syscalls.c
to sys/netinet/sctp_syscalls.c it was not possible to MFC r284613.
Therefore, this is a direct commit with the corresponding changes of r284613.
Reported by: Coverity
CID: 1018058, 1018060
Add kern.racct.enable tunable and RACCT_DISABLED config option.
The point of this is to be able to add RACCT (with RACCT_DISABLED)
to GENERIC, to avoid having to rebuild the kernel to use rctl(8).
MFC r282901:
Build GENERIC with RACCT/RCTL support by default. Note that it still
needs to be enabled by adding "kern.racct.enable=1" to /boot/loader.conf.
Note those two are MFC-ed together, because the latter one changes the
name of RACCT_DISABLED option to RACCT_DEFAULT_TO_DISABLED. Should have
committed the renaming separately...
Relnotes: yes
Sponsored by: The FreeBSD Foundation
witness: don't warn about matrix inconsistencies without holding the mutex
Lock order checking is done without the witness mutex held, so multiple
threads that are racing to establish a new lock order may read matrix
entries that are in an inconsistent state. Don't print a warning in this
case, but instead just redo the check after taking the witness lock.
------------------------------------------------------------------------
r284192 | ken | 2015-06-09 15:39:38 -0600 (Tue, 09 Jun 2015) | 102 lines
Add support for reading MAM attributes to camcontrol(8) and libcam(3).
MAM is Medium Auxiliary Memory and is most commonly found as flash
chips on tapes.
This includes support for reading attributes and decoding most
known attributes, but does not yet include support for writing
attributes or reporting attributes in XML format.
libsbuf/Makefile:
Add subr_prf.c for the new sbuf_hexdump() function. This
function is essentially the same function.
libsbuf/Symbol.map:
Add a new shared library minor version, and include the
sbuf_hexdump() function.
libsbuf/Version.def:
Add version 1.4 of the libsbuf library.
libutil/hexdump.3:
Document sbuf_hexdump() alongside hexdump(3), since it is
essentially the same function.
camcontrol/Makefile:
Add attrib.c.
camcontrol/attrib.c:
Implementation of READ ATTRIBUTE support for camcontrol(8).
camcontrol/camcontrol.8:
Document the new 'camcontrol attrib' subcommand.
camcontrol/camcontrol.c:
Add the new 'camcontrol attrib' subcommand.
camcontrol/camcontrol.h:
Add a function prototype for scsiattrib().
share/man/man9/sbuf.9:
Document the existence of sbuf_hexdump() and point users to
the hexdump(3) man page for more details.
sys/cam/scsi/scsi_all.c:
Add a table of known attributes, text descriptions and
handler functions.
Add a new scsi_attrib_sbuf() function along with a number
of other related functions that help decode attributes.
scsi_attrib_ascii_sbuf() decodes ASCII format attributes.
scsi_attrib_int_sbuf() decodes binary format attributes, and
will pass them off to scsi_attrib_hexdump_sbuf() if they're
bigger than 8 bytes.
scsi_attrib_vendser_sbuf() decodes the vendor and drive
serial number attribute.
scsi_attrib_volcoh_sbuf() decodes the Volume Coherency
Information attribute that LTFS writes out.
sys/cam/scsi/scsi_all.h:
Add a number of attribute-related structure definitions and
other defines.
Add function prototypes for all of the functions added in
scsi_all.c.
sys/kern/subr_prf.c:
Add a new function, sbuf_hexdump(). This is the same as
the existing hexdump(9) function, except that it puts the
result in an sbuf.
This also changes subr_prf.c so that it can be compiled in
userland for includsion in libsbuf.
We should work to change this so that the kernel hexdump
implementation is a wrapper around sbuf_hexdump() with a
statically allocated sbuf with a drain. That will require
a drain function that goes to the kernel printf() buffer
that can take a non-NUL terminated string as input.
That is because an sbuf isn't NUL-terminated until it is
finished, and we don't want to finish it while we're still
using it.
We should also work to consolidate the userland hexdump and
kernel hexdump implemenatations, which are currently
separate. This would also mean making applications that
currently link in libutil link in libsbuf.
sys/sys/sbuf.h:
Add the prototype for sbuf_hexdump(), and add another copy
of the hexdump flag values if they aren't already defined.
Ideally the flags should be defined in one place but the
implemenation makes it difficult to do properly. (See
above.)
Sponsored by: Spectra Logic Corporation
------------------------------------------------------------------------
Clear p_stops when doing PT_DETACH and PROCFS_CTL_DETACH.
Without this, if a process was being traced by truss(1), which
uses different p_stops bits than gdb(1), the latter would
misbehave because of the unexpected bits.
Reported by: jceel
Submitted by: sef
Sponsored by: iXsystems, Inc.
Prevent dounmount() from acting on the freed (although type-stable)
memory by changing the interface to require the mount point to be
referenced.
MFC r283629:
Add missed {}.
Perform SU cleanup in the AST handler. Do not sleep waiting for SU cleanup
while owning vnode lock.
On MFC, for KBI stability, td_su member was moved to the end of the
struct thread.
Properly null-terminate strings in a kernel dump header. A version string
longer than 192 bytes will cause the version field of a dump header to
overflow. strncpy doesn't null terminate it, so savecore will print a
corrupted info file. Using strlcpy fixes the bug.
It's common for multi-threaded processes to create a thread for
the purpose of synchronously processing signals. Allow such processes to
utilize a capabilities sandbox.
- Remove mac_get_fd/mac_set_fd - those are not syscalls. The
__mac_get_fd() and __mac_set_fd() syscalls are listed earlier.
- Correct typo in syscall name. It should be sched_rr_get_interval,
not sched_rr_getinterval.
Add mutex support to the pps_ioctl() API in the kernel.
Add PPS support to USB serial drivers.
Use correct mode variable for PPS support.
Switch polarity of USB serial PPS events.
The ftdi "get latency" and "get bitmode" device commands are read
operations, not writes.
Implement a mechanism for making changes in the kernel<->driver PPS
interface without breaking ABI or API compatibility with existing drivers.
Bump version number to indicate the new PPS ABI version changes in the
pps_state structure.
Decrement p_boundary_count in the single-threading thread, during making
other thread runnable. This guarantees that upon return from the
thread_single_end(), p_boundary_count is zero.
m_dup() is supposed to give a writable copy of an mbuf chain. It uses
m_dup_pkthdr(), that uses M_COPYFLAGS mask to copy m_flags field.
If original mbuf chain has M_RDONLY flag, its copy also will have it.
Reset this flag explicitly.
Implement a simple OID number garbage collector. Given the increasing
number of dynamically created and destroyed SYSCTLs during runtime it
is very likely that the current new OID number limit of 0x7fffffff can
be reached. Especially if dynamic OID creation and destruction results
from automatic tests. Additional changes:
- Optimize the typical use case by decrementing the next automatic OID
sequence number instead of incrementing it. This saves searching time
when inserting new OIDs into a fresh parent OID node.
- Add simple check for duplicate non-automatic OID numbers.
MAXBSIZE defines both the largest UFS block size and the
largest size for a buffer in the buffer cache. This patch
defines a new constant MAXBCACHEBUF, which is the largest
size for a buffer in the buffer cache. Having a separate
constant allows MAXBCACHEBUF to be set larger than MAXBSIZE
on a per-architecture basis, so that NFS can do larger read/writes
for these architectures. It modifies sys/param.h so that BKVASIZE
can also be set on a per-architecture basis.
A couple of cases where NFS used MAXBSIZE instead of NFS_MAXBSIZE
is fixed as well.
While there, make few more performance optimizations.
On 40-core system doing many 512-byte AIO reads from array of raw SSDs
this change removes lock congestions inside pbuf allocator and devfs,
and bottleneck on single AIO completion taskqueue thread. It improves
peak AIO performance from ~600K to ~1.3M IOPS.
pbufs is a limited resource, and their allocator is not SMP-scalable.
So instead of always allocating pbuf to immediately convert it to bio,
allocate bio just here. If buffer needs kernel mapping, then pbuf is
still allocated, but used only as a source of KVA and storage for a list
of held pages.
On 40-core system doing many 512-byte reads from user level to array of
raw SSDs this change removes huge lock congestion inside pbuf allocator.
It improves peak performance from ~300K to ~1.2M IOPS. On my previous
24-core system this problem also existed, but was less serious.
Partially revert r255986: do not call VOP_FSYNC() when helping
bufdaemon in getnewbuf(), do use buf_flush(). The difference is that
bufdaemon uses TRYLOCK to get buffer locks, which allows calls to
getnewbuf() while another buffer is locked.
Make ZFS ARC track both KVA usage and fragmentation.
Even on Illumos, with its much larger KVA, ZFS ARC steps back if KVA usage
reaches certain threshold (3/4 on i386 or 16/17 otherwise). FreeBSD has
even less KVA, but had no such limit on archs with direct map as amd64.
As result, on machines with a lot of RAM, during load with very small user-
space memory pressure, such as `zfs send`, it was possible to reach state,
when there is enough both physical RAM and KVA (I've seen up to 25-30%),
but no continuous KVA range to allocate even single 128KB I/O request.
Address this situation from two sides:
- restore KVA usage limitations in a way the most close to Illumos;
- introduce new requirement for KVA fragmentation, specifying that we
should have at least one sequential KVA range of zfs_max_recordsize bytes.
Experiments show that first limitation done alone is not sufficient. On
machine with 64GB of RAM it is sometimes needed to drop up to half of ARC
size to get at leats one 1MB KVA chunk. Statically limiting ARC to half
of KVA/RAM is too strict, so second limitation makes it to work in cycles:
accumulate trash up to certain critical mass, do massive spring-cleaning,
and then start littering again.
File systems that do not use the buffer cache (such as ZFS) must
use VOP_FSYNC() to perform the NFS server's Commit operation.
This patch adds a mnt_kern_flag called MNTK_USES_BCACHE which
is set by file systems that use the buffer cache. If this flag
is not set, the NFS server always does a VOP_FSYNC().
This should be ok for old file system modules that do not set
MNTK_USES_BCACHE, since calling VOP_FSYNC() is correct, although
it might not be optimal for file systems that use the buffer cache.