in-progress count and the vnode. Prior to r188331, we always acquired
the vnode lock before incrementing the object's paging-in-progress count.
Now, we increment it before attempting to acquire the vnode lock with
LK_NOWAIT, but we never sleep acquiring the vnode lock while we have the
count incremented.
Reviewed by: kib
MFC after: 3 days
Before this an earlier writes to a ZVOL opened without FSYNC could get to
ZIL after later writes to the same ZVOL opened with FSYNC. Fix this by
replicating functionality of ZPL (zv_sync_cnt equivalent to z_sync_cnt),
marking all log records sync if anybody opened the ZVOL with FSYNC.
MFC after: 2 weeks
And put them under HN_IFSTART_SUPPORT, which is by default on until
we whack the if_start related bits from base system.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8392
Upstream the BUF_TRACKING and FULL_BUF_TRACKING buffer debugging code.
This can be handy in tracking down what code touched hung bios and bufs
last. The full history is especially useful, but adds enough bloat that
it shouldn't be enabled in release builds.
Function names (or arbitrary string constants) are tracked in a
fixed-size ring in bufs. Bios gain a pointer to the upper buf for
tracking. SCSI CCBs gain a pointer to the upper bio for tracking.
Reviewed by: markj
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D8366
Previously these were only declared under #ifdef SMP in <machine/smp.h>.
However, these variables are defind in pmap.c unconditionally, and efirt.c
references them unconditionally. This fixes non-SMP kernel builds.
Discussed with: kib
MFC after: 1 week
A grant-table user-space device will allow user-space applications to map
and share grants (Xen way to share memory) among Xen domains. This grant
table user-space device has been tested with the QEMU Qdisk Xen backed.
Submitted by: jaggi
Reviewed by: royger
Differential review: https://reviews.freebsd.org/D7293
Add a reference count to xenisrc. This is required for implementation of
unmap-notifications in the grant table userspace device (gntdev). We need to
hold a reference to the event channel port, in case the user deallocates the
port before we send the notification.
Submitted by: jaggi
Reviewed by: royger
Differential review: https://reviews.freebsd.org/D7429
first EFI device if we can't find the one from which the image was loaded.
Reviewed by: allanjude,imp,jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D6780
- Move the SYSINIT to DRIVER/SECOND, i.e. after the vm_guest becomes
determistic.
- Minor style changes.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8370
Mainly because the host side only set TCPCS and IPCS even for
UDP datagrams.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8369
And use large default temporary channel packer buffer; we really
don't want it to be expanded at run time.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8367
Summary:
The hardware does not expose a classic SMBus interface.
Instead it has a lower level interface that can express a far richer
I2C protocol than what smbus offers. However, the interface does not
provide a way to explicitly generate the I2C stop and start conditions.
It's only possible to request that the stop condition is generated
after transferring the next byte in either direction. So, at least
one data byte must always be transferred.
Thus, some I2C sequences are impossible to generate, e.g., an equivalent
of smbus quick command (<start>-<slave addr>-<r/w bit>-<stop>).
At the same time isl(4) and cyapa(4) are moved to iicbus and now they use
iicbus_transfer for communication. Previously they used smbus_trans()
interface that is not defined by the SMBus protocol and was implemented
only by ig4(4). In fact, that interface was impossible to implement
for the typical SMBus controllers like intpm(4) or ichsmb(4) where
a type of the SMBus command must be programmed.
The plan is to remove smbus_trans() and all its uses.
As an aside, the smbus_trans() method deviates from the standard,
but perhaps backwards, FreeBSD convention of using 8-bit slave
addresses (shifted by 1 bit to the left). The method expects
7-bit addresses.
There is a user facing consequence of this change.
A user must now provide device hints for isl and cyapa that specify an iicbus to use
and a slave address on it.
On Chromebook hardware where isl and cyapa devices are commonly found
it is also possible to use a new chromebook_platform(4) driver that
automatically configures isl and cyapa devices. There is no need to
provide the device hints in that case,
Right now smbus(4) driver tries to discover all slaves on the bus.
That is very dangerous. Fortunately, the probing code uses smbus_trans()
to do its job, so it is really enabled for ig4 only.
The plan is to remove that auto-probing code and smbus_trans().
Tested by: grembo, Matthias Apitz <guru@unixarea.de> (w/o
chromebook_platform)
Discussed with: grembo, imp
Reviewed by: wblock (docs)
MFC after: 1 month
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D8172
It is possible that wrmsr in amd_stop_pmc() causes an overflow in a counter
that it disables. In that case a non-maskable interrupt is generated. The
interrupt handler code was written in such a way that it would re-enable the
counter. That would lead to an unexpected interrupt later on.
This problem was easy to reproduce with
$ pmcstat -T -P instructions -t $pid
if the target process is sufficiently busy and there are context switches from
time to time. There would be a lot of interrupts to "race" with amd_stop_pmc()
called during the context switches. The problem affected only AMD processors.
While there, trace whether amd_intr() claimed an interrupt.
Reviewed by: jhb
MFC after: 2 weeks
Convert vm_fault_hold()'s Boolean variables that are only used
internally to "bool". Add a comment describing why the one
remaining "boolean_t" was not converted.
Reviewed by: kib
MFC after: 8 days
(gpt)zfsboot will read one-time boot directives from a special ZFS pool
area. The area was previously described as "Boot Block Header", but
currently it is know as Pad2, marked as reserved and is zeroed out on
pool creation. The new code interprets data in this area, if any, using
the same format as boot.config. The area is immediately wiped out.
Failure to parse the directives results in a reboot right after the
cleanup. Otherwise the boot sequence proceeds as usual.
zfsbootcfg writes zfsboot arguments specified on its command line to the
Pad2 area of a disk identified by vfs.zfs.boot.primary_pool and
vfs.zfs.boot.primary_vdev kenv variables that are set by loader during
boot. Please see the manual page for more.
Thanks to all who reviewed, contributed and made suggestions! There are
many potential improvements to the feature, please see the review for
details.
Reviewed by: wblock (docs)
Discussed with: jhb, tsoome
MFC after: 3 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D7612
not a bitfield. For the intended usage - being passed either MNT_WAIT,
or MNT_NOWAIT - this shouldn't introduce any changes in behaviour.
Reviewed by: jhb@
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D8373
The CHANSTS register is a split 64-bit register on CBDMA units before
hardware v3.3. If a torn read happens during ioat_process_events(),
software cannot know when to stop completing descriptors correctly.
So, just use the device-pushed main memory channel status instead.
Remove the ioat_get_active() seatbelt as well. It does nothing if the
completion address is valid.
Sponsored by: Dell EMC Isilon
There are 4 independent knobs in T5+ chips to include or exclude PAUSE
frames from the "total frames" and "multicast frames" counters in either
direction. This change lets the driver deal with any combination of
these settings.
but never released. Since no real hardware was released with this ID,
just drop it from the aacraid driver. This paves the path for future
drivers for hardware that actually has this ID.
Submitted by: Scott Benesh from Microsemi.
Differential Revision: https://reviews.freebsd.org/D8377
MFC After: 3 days
data structure sizes when mounting and reloading UFS/FFS
filesystems by using a u_long rather than an int for the size.
Reported by: Mariusz Zaborski <oshogbo@>
MFC after: 1 week
It allows to avoid extra GEOM providers flapping without significant need.
Since GEOM got resize support, we don't need to reopen provider to get new
size. If provider was orphaned and no longer valid, ZFS should already
know that, and in such case reopen should be done in full as expected.
MFC after: 2 weeks
In case of vdev detach, causing top level mirror vdev destruction, leaf
vdev changes its GUID to one of the destroyed mirror, that creates race
condition when GUID in vdev label may not match one in the pool config.
This change replicates logic nuance of vdev_validate() by adding special
exception, matching the vdev GUID against the top level vdev GUID.
Since this exception is not completely reliable (may give false positives
if we fail to erase label on detached vdev), use it only as last resort.
Quick way to reproduce this scenario now is detach vdev from a pool with
enabled autoextend. During vdev detach autoextend logic tries to reopen
remaining vdev, that always fails now since in-memory configuration is
already updated, while on-disk labels are not yet.
MFC after: 2 weeks
Just using vm_paddr_t value with all bits set.
That should work as long as the type is unsigned.
While there, fix a couple of whitespace issues nearby.
MFC after: 1 week
X-MFC with: r307903
we have to refresh it ... always. This fixes problems reported in NetMap
with em(4) devices after conversion to extended descriptor format in
svn r293331.
Submitted by: luigi@
Reported by: franco@opnsense.org
MFC after: 2 days
The userspace case was already handled by pmap_allocpte(). For kernel
VA, page table page must exist, and demote cannot fail, so we need to
just call pmap_demote_pde(). Also note that due to the machine AS
layout, promotions in the KVA on i386 are highly unlikely, so this
change is mostly for completeness.
Reviewed by: alc, markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D8323
which also use buffer cache.
Most important addition to the code is the handling of filesystems
where the block size is less than the machine page size, which might
require reading several buffers to validate single page.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
volume limits. In particular:
- Assert that usemap_alloc() and usemap_free() cluster number argument
is valid.
- In chainlength(), return 0 if cluster start is after the max cluster.
- In chainlength(), cut the calculated cluster chain length at the max
cluster.
- For true paranoia, after the pm_inusemap is calculated in
fillinusemap(), reset all bits in the array for clusters after the
max cluster, as in-use.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Use the same logic to calculate the nominal CPU frequency from the P-state
MSRs on family 0x12, 0x15, and 0x16 CPUs as is used for family 0x10.
Family 0x14 was included in the original patch in the PR but I left that
out as the BIOS writer's guide for family 0x14 CPUs show a different layout
for the relevant MSR and include a different formulate for calculating the
frequency.
While here, simplify a few expressions and print out the family of
unsupported CPUs in hex rather than decimal.
PR: 212020
Submitted by: Anthony Jenkins <Scoobi_doo@yahoo.com>
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7587
Reject attempts to read from or memory map offsets in /dev/mem that are
beyond the maximum-supported physical address of the current CPU.
Reviewed by: kib
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D7408
- use PCI_VENDOR and PCI_DEVICE ids from a publicly allocated range
(thanks to RedHat)
- export memory pool information through PCI registers
- improve mechanism for configuring passthrough on different hypervisors
Code is from Vincenzo Maffione as a follow up to his GSOC work.
illumos/illumos-gate@260af64db7260af64db7https://www.illumos.org/issues/3746
From the original change log:
It was possible for a reference to be added even with the lock held, and
for references added just after a lock release to be lost.
This bug was also independently found and reported in wesunsolve.net
issues 6985013 6995524.
In zrl_add(), always use an atomic operation to update the refcount.
The mutex in the ZRL only guarantees that wakeups occur for waiters on the
lock. It offers no protection against concurrent updates of the refcount.
The only refcount transition that is safe to perform without an atomic
operation is from ZRL_LOCKED back to 0, since this can only be performed
by the thread which has the ZRL locked.
Authored by: Will Andrews <will@freebsd.org>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Reviewed by: Pavel Zakharov <pavel.zakha@gmail.com>
Reviewed by: Yuri Pankov <yuri.pankov@gmail.com>
Reviewed by: Justin T. Gibbs <gibbs@scsiguy.com>
Approved by: Matt Ahrens <mahrens@delphix.com>
Author: Youzhong Yang <yyang@mathworks.com>
PR: 204037
MFC after: 1 week
This paves way for more chimney sending buffer reorganization.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D8343
directory create and delete operations. If it ever finds a directory
with a link count less than 2, it panics. Thus, an rm -rf that
encounters a directory with a link count below 2 causes a kernel
panic. The proposed fix is to return the error EINVAL rather than
panicing. The effect is that the requested operation is not done,
but the system continues to run. At a more convenient later time,
the filesystem can be unmounted and cleaned (with fsck or journal
run). Once cleaned, the operation can be rerun to successful
completion.
This fix takes that approach. The panic message has been converted
into a uprintf(9) to provide the user with the inode number and
filesystem mount point of the offending directory and EINVAL is
returned for the operation.
The long (three year) delay in fixing this problem occurred because
the bug was misclassified when originally assigned and only this week
was found during a sweep of old unresolved bug reports.
PR: 180894
Reviewed by: kib
MFC after: 2 weeks
EFER_NXE is set in the EFER MSR by initializecpu() and must be set on all
CPUs in the system. When PG_NX support was added to PAE on i386, the
block to enable EFER_NXE was placed in a section of initializecpu() that
only runs if 'cpu == CPU_686'. During early boot, locore does an
initial pass to set cpu that sets it to CPU_686 on all CPUs later than
a Pentium. Later, printcpuinfo() adjusts the 'cpu' variable on
PII and later CPUs to one of CPU_PII, CPU_PIII, or CPU_P4. However,
printcpuinfo() is called after initializecpu() on the BSP, so the BSP
would enable EFER_NXE and pg_nx. The APs execute initializecpu() much
later after printcpuinfo() has run. The end result on a modern CPU was
that cpu was set to CPU_PIII when the APs invoked initializecpu(), so
they did not enable EFER_NXE. As a result, the APs would fault when
trying to access any pages marked with PG_NX set.
When booting a 2 CPU PAE kernel in bhyve this manifested as a hang before
single user mode. The attempt to execute /bin/init tried to copy out
the exec strings (argv, etc.) to a non-executable mapping while running
on the AP. The instruction kept faulting due to invalid bits in the PTE
in an infinite loop.
Fix this by moving the code to enable EFER_NXE out of the switch statement
on 'cpu' and always doing it if 'amd_feature' supports AMDID_NX.
MFC after: 2 weeks
Add missing fields ('sr' and 'mc_tls') to 'struct sigcontext'.
The kernel doesn't use 'struct sigcontext' but instead uses 'ucontext_t'
which includes 'mcontext_t' in 'struct sigframe' to build the signal frame.
As a result, this change is not an ABI change but simply making
'struct sigcontext' correct. Note that 'struct sigcontext' is only used
for "Traditional BSD style" signal handlers.
While here, rename the 'xxx' field to '__spare__' to match 'mcontext_t'.
Sponsored by: DARPA, AFRL
introduced with r261242. The useful and expected soisconnected()
call is done in tcp_do_segment().
Has been found as part of unrelated PR:212920 investigation.
Improve slightly (~2%) the maximum number of TCP accept per second.
Tested by: kevin.bowling_kev009.com, jch
Approved by: gnn, hiren
MFC after: 1 week
Sponsored by: Verisign, Inc
Differential Revision: https://reviews.freebsd.org/D8072
them. Previously this would walk past the end of the array and print
whatever happened to be after the trapframe struct.
MFC after: 1 week
Sponsored by: DARPA, AFRL
And use it for vmbus channel logging, which can log the channel
owner's name properly, instead of vmbus0.
Submitted by: QianYue You <t-youqi microsoft com>
MFC after: 1 week
Sponsored by: Microsoft
The directly following m_defrag() call can wait, so there is no reason this
call can't as well.
Reported by: Coverity
CID: 1353551
Sponsored by: Dell EMC Isilon
FICL definitions not in ficl/ficl32 files broke this generally. This
makes that stuff conditional on BOOT_FORTH. Also, move definitions
related to the architecture (FICL_CPUARCH and friends) into
Makefile.ficl that all parts of the tree that include files with ficl
need to include (but only if MK_FORTH == yes). In addition, had to fix
library ordering issue with LIBSTAND to keep it last. Without boot
forth, there's no references to memset to bring in memset.o from
libstand.a to satisfy libgeliboot.a's use of it. Listing libstand last
solves this issue (and it's the proper place for libstand to boot).
PLL1 is used by the cpu core, allowing changing freq is needed for cpufreq.
The factors table contains all the frequencies in the operating point table
present in the DTS.
MFC after: 1 week
these show a 9-10% reduction in user and system time for a buildworld -j48.
Obtained from: ABT Systems Ltd
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
* On arm64 we need to use the ${MACHINE_CPUARCH} subdirectory.
* env.c is only needed when using forth so only build it there.
Sponsored by: ABT Systems Ltd
hardware that supports the mp extensions. If so it should use the broadcast
tlb invalidate instructions as other CPUs or devices may need to know about
the invalidation.
To simplify the code have the compiler optimise out the else case when not
builing for Cortex-A8.
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D8092
To achieve that the whole svm_softc is allocated with contigmalloc now.
It would be more effient to de-embed those arrays and allocate only them
with contigmalloc.
Previously, if malloc(9) used non-contiguous pages for the arrays, then
random bits in physical pages next to the first page would be used to
determine permissions for I/O port and MSR accesses. That could result
in a guest dangerously modifying the host hardware configuration.
One example is that sometimes NMI watchdog driver in a Linux guest
would be able to configure a performance counter on a host system.
The counter would generate an interrupt and if hwpmc(4) driver is loaded
on the host, then the interrupt would be delivered as an NMI.
Discussed with: jhb
Reviewed by: grehan
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D8321
loss event but not use or obay the recommendations i.e. values set by it in some
cases.
Here is an attempt to solve that confusion by following relevant RFCs/drafts.
Stack only sets congestion window/slow start threshold values when there is no
CC module availalbe to take that action. All CC modules are inspected and
updated when needed to take appropriate action on loss.
tcp_stacks/fastpath module has been updated to adapt these changes.
Note: Probably, the most significant change would be to not bring congestion
window down to 1MSS on a loss signaled by 3-duplicate acks and letting
respective CC decide that value.
In collaboration with: Matt Macy <mmacy at nextbsd dot org>
Discussed on: transport@ mailing list
Reviewed by: jtl
MFC after: 1 month
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D8225
In r304435, ip_output() was changed to use the result of the route
lookup to decide whether the outgoing packet was a broadcast or
not. This introduced a regression on interfaces where
IFF_BROADCAST was not set (e.g. point-to-point links), as the
algorithm could incorrectly treat the destination address as a
broadcast address, and ip_output() would subsequently drop the
packet as broadcasting on a non-IFF_BROADCAST interface is not
allowed.
Differential Revision: https://reviews.freebsd.org/D8303
Reviewed by: jtl
Reported by: ambrisko
MFC after: 2 weeks
X-MFC-With: r304435
Sponsored by: Dell EMC Isilon
- Make !KDB config buildable.
- Simplify interface to nmi_handle_intr() by evaluating panic_on_nmi
in one place, namely nmi_call_kdb(). This allows to remove do_panic
argument from the functions, and to remove i386/amd64 duplication of
the variable and sysctl definitions. Note that now NMI causes
panic(9) instead of trap_fatal() reporting and then panic(9),
consistently for NMIs delivered while CPU operated in ring 0 and 3.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
environment variable to allow conditional compilation based on EFI
being present or not. Provide efi-setenv, efi-getenv, and
efi-unsetenv, though those need improvement. Move the efi definition
to libefi (but include a reference so they get included).
Push-Pull Two Wire interface is a almost compatible iic like bus used
in sun6i SoC. It's only use is to communicate with the power management IC.
Reviewed by: jmcneill
MFC after: 1 week
Relnotes: yes
contiguous regions in an mbuf chain.
If the payload of an mbuf ends at a page boundary count_mbuf_nsegs would
incorrectly consider the next mbuf's payload physically contiguous based
solely on a KVA comparison.
MFC after: 1 week
Sponsored by: Chelsio Communications