Previously, debug exceptions were only enabled on the boot CPU if
DDB was enabled in the dbg_monitor_init() function. APs also called
this function, but since mp_machdep.c doesn't include opt_ddb.h, the
APs ended up calling an empty stub defined in <machine/debug_monitor.h>
instead of the real function. Also, if DDB was not enabled in the kernel,
the boot CPU would not enable debug exceptions.
Fix this by adding a new dbg_init() function that always clears the OS
lock to enable debug exceptions which the boot CPU and the APs call.
This function also calls dbg_monitor_init() to enable hardware breakpoints
from DDB on all CPUs if DDB is enabled. Eventually base support for
hardware breakpoints/watchpoints will need to move out of the DDB-only
debug_monitor.c for use by userland debuggers.
Reviewed by: andrew
Differential Revision: https://reviews.freebsd.org/D12001
Only fetch the VFP state from the CPU if the thread whose registers are
being requested is the current thread. If a stopped thread's registers
are being fetched by a debugger, the saved state in the PCB is already
valid.
Reviewed by: andrew
MFC after: 1 week
generic driver with minimal feature support for a large number of chips.
More featureful per-chip drivers might exist (especially out-of-tree) and
those should win the bidding even if they use BUS_PROBE_DEFAULT.
the current state to determine whether to generate a link-state change
notification. This fixes a bug introduced in r321063 that caused the
driver to sometimes skip these notifications.
Reported by: Jason Eggleston @ LLNW
MFC after: 3 days
Sponsored by: Chelsio Communications
Added in r238189, ARM_MANY_BOARD adds support for multiple ARM boards in
a single kernel. Include it for LINT builds to avoid duplicate symbol
errors when linking with lld.
Sponsored by: The FreeBSD Foundation
The encrypted kernel dump code writes data in blocks of this size. A buffer
size of 4096 allows encrypted dumps to work with 4Kn drives.
Reviewed by: cem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11870
removes the only reference to atrtc_set() from outside of atrtc.c, so make
it static.
The xen timer driver registers as a realtime clock with 1us resolution. In
the past that resulted in only the xen timer's clock_settime() getting
called, so it would call atrtc_set() to set the hardware clock as well. As
of r32090, the clock_settime() method of all registered realtime clocks gets
called, so the xen driver no longer needs to chain-call the lower-resolution
driver.
Thanks to royger@ for talking me through the xen stuff, and for testing.
M_PMC is defined in sys/dev/hwpmc/hwpmc_mod.c, and the LINT kernel build
fails when linking with lld due to a duplicate symbol error.
Sponsored by: The FreeBSD Foundation
TCP connections (order of tens of thousands), with predominantly Transmits.
Choice to perform receive operations either in IThread or Taskqueue Thread.
Submitted by:Vaishali.Kulkarni@cavium.com
MFC after:5 days
cpumask it probably means we are unable to sent interrupts to CPUs outside
the map. As such only return the current CPU when it's within the mask
otherwise return the first valid CPU.
This is needed on ThunderX as, in a dual socket configuration, we are
unable to send MSI/MSI-X interrupts between sockets.
Reviewed by: mmel
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11957
in the VM area structure in the LinuxKPI when doing mmap() and that
unsupported bits are masked away.
While at it fix some redundant use of parenthesing inside some related
macros.
Found by: KrishnamRaju ErapaRaju <Krishna2@chelsio.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
Such drivers attach to a vgapci bus rather than directly to a pci bus. For
the rest of the LinuxKPI to work correctly in this case, we override the
vgapci bus' ivars with those of the grandparent.
Reviewed by: hselasky
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11932
We can remove some unnecessary object radix tree lookups by using the
object memq to iterate over pages in the specified range. This does not,
however, eliminate the lookup needed in vm_page_free_toq() to remove each
tree entry.
Reviewed by: alc, kib (previous revision)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11945
When the mps(4) and mpr(4) drivers need to reinitialize the
firmware, they sometimes need to reallocate all of the memory
allocated by the driver. The reallocation happens whenever the IOC
Facts change. That should only happen after a firmware upgrade.
If the reinitialization happens as a result of a timed out command
sent to the card, the command that timed out and triggered the
reinit may have been freed if iocfacts_allocate() reallocated all
memory. If the caller attempts to access the command after that,
the kernel will panic because the caller will be dereferencing
freed memory.
The solution is to set a flag in the softc when we reallocate,
and avoid dereferencing the command strucure if we've reallocated.
The changes are largely the same in both drivers, since mpr(4) is a
derivative of mps(4).
o In iocfacts_allocate(), if the IOC Facts have changed and we
need to reallocate, set the REALLOCATED flag in the softc.
o Change wait_command() to take a struct mps_command ** instead of
a struct mps_command *. This allows us to NULL out the caller's
command pointer if we have to reinit the controller and the data
structures get reallocated. (The REALLOCATED flag will be set
in the softc if that has happened.)
o In every place that calls wait_command(), make sure we handle
the case where the command is NULL after the call.
o The mpr(4) driver has mpr_request_polled() which can also
reinitialize the card. Also check for reallocation there.
Reviewed by: scottl, slm
MFC after: 1 week
Sponsored by: Spectra Logic
New version is not compatible on supervisor mode with v1.9.1
(previous version).
Highlights:
o BBL (Berkeley Boot Loader) provides no initial page tables
anymore allowing us to choose VM, to build page tables manually
and enable MMU in S-mode.
o SBI interface changed.
o GENERIC kernel.
FDT is now chosen standard for RISC-V hardware description.
DTB is now provided by Spike (golden model simulator). This
allows us to introduce GENERIC kernel. However, description
for console and timer devices is not provided in DTB, so move
these devices temporary to nexus bus.
o Supervisor can't access userspace by default. Solution is to
set SUM (permit Supervisor User Memory access) bit in sstatus
register.
o Compressed extension is now turned on by default.
o External GCC 7.1 compiler used.
o _gp renamed to __global_pointer$
o Compiler -march= string is now in use allowing us to choose
required extensions (compressed, FPU, atomic, etc).
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11800
This patch modifies function ofw_fdt_setprop (called by OF_setprop),
so that it can add property, when replacing is not possible.
Adding property is needed to fixup FDT's that have missing
properties.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11879
after r319757.
1) Correct the return value from __wait_event_common() from 1 to 0 in
case the timeout is specified as MAX_SCHEDULE_TIMEOUT. In the other
case __ret is zero and will be substituted in the last part of the
macro with the appropriate value before return.
2) Make sure the "timeout" argument is casted to "int" before
evaluating negativity. Else the signedness of a "long" might be
checked instead of the signedness of an integer.
3) The wait_event() function should not have a return value.
Found by: KrishnamRaju ErapaRaju <Krishna2@chelsio.com>
MFC after: 1 week
Sponsored by: Mellanox Technologies
handles a timeout value of MAX_SCHEDULE_TIMEOUT which basically means there
is no timeout. This is a regression issue after r319757.
While at it change the type of returned variable from "long" to "int" to
match the actual return type.
MFC after: 1 week
Sponsored by: Mellanox Technologies
Introduce a new define to take int account the xAPIC ID limit, for
systems where x2APIC is not available/reliable.
Also change some of the usages of the APIC ID to use an unsigned int
(which is the correct storage type to deal with x2APIC IDs as found in
x2APIC MADT entries).
This allows booting FreeBSD on a box with 256 CPUs and APIC IDs up to
295:
FreeBSD/SMP: Multiprocessor System Detected: 256 CPUs
FreeBSD/SMP: 1 package(s) x 64 core(s) x 4 hardware threads
Package HW ID = 0
Core HW ID = 0
CPU0 (BSP): APIC ID: 0
CPU1 (AP/HT): APIC ID: 1
CPU2 (AP/HT): APIC ID: 2
CPU3 (AP/HT): APIC ID: 3
[...]
Core HW ID = 73
CPU252 (AP): APIC ID: 292
CPU253 (AP/HT): APIC ID: 293
CPU254 (AP/HT): APIC ID: 294
CPU255 (AP/HT): APIC ID: 295
Submitted by: kib (previous version)
Relnotes: yes
MFC after: 1 month
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D11913
So that MAX_APIC_ID can be bumped without wasting memory.
Note that the usage of MAX_APIC_ID in the SRAT parsing forces the
parser to allocate memory directly from the phys_avail physical memory
array, which is not the best approach probably, but I haven't found
any other way to allocate memory so early in boot. This memory is not
returned to the system afterwards, but at least it's sized according
to the maximum APIC ID found in the MADT table.
Sponsored by: Citrix Systems R&D
MFC after: 1 month
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D11912
Populate the lapics arrays and call cpu_add/lapic_create in the setup
phase instead. Also store the max APIC ID found in the newly
introduced max_apic_id global variable.
This is a requirement in order to make the static arrays currently
using MAX_LAPIC_ID dynamic.
Sponsored by: Citrix Systems R&D
MFC after: 1 month
Reviewed by: kib
Differential revision: https://reviews.freebsd.org/D11911
The amd64 build of boot2 was failing with gcc 6.3.0 due to being more
than 1 kB too large. It was apparently generating a .eh_frame section
which was not being removed by objcopy -S. The .eh_frame section seems
to be mandatory per the amd64 ABI, but boot2 is compiled for i386 (uses
-m32), and therefore should be optional in this context. Suppress
generation of .eh_frame with the -fno-asynchronous-unwind-tables flag to
gcc. This saves 1348 bytes (the limit is 7680 bytes).
Reviewed by: dim, imp
Approved by: markj (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11928
key_msg2sp() is used for parsing data from setsockopt(IP[V6]_IPSEC_POLICY)
call. This socket option is usually used to configure IPsec bypass for
socket. Only privileged user can set this socket option.
The message syntax is described here
http://www.kame.net/newsletter/20021210/
and our libipsec is usually used to create the correct request.
Add additional checks:
* that sadb_x_ipsecrequest_len is not out of bounds of user supplied buffer
* that src/dst's sa_len is the same
* that 2*sa_len is not out of bounds of user supplied buffer
* that 2*sa_len fits into bounds of sadb_x_ipsecrequest
Reported by: Ilja van Sprundel
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11796
reduces diff between amd64 and i386. Also, it fixes a regression introduced
in r322076, i.e., identify_hypervisor() failed to identify some hypervisors.
This function assumes cpu_feature2 is already initialized.
Reported by: dexuan
Tested by: dexuan
geom_bsd, geom_mbr and geom_sunlabel have been obsolete since Marcel
Moolenaar's geom_part was in FreeBSD 7. They haven't been in GENERIC
since FreeBSD 8. Add warning when used.
geom_vol_ffs has been obsolete since ufs support to geom_label was
committed in FreeBSD 5. It hasn't been in GENERIC since FreeBSD 5.
Add warning when used.
geom_fox has been obsolete since gmultipath was committed in FreeBSD 7.
(no warning added, since this is a very obscure class).
These will all be removed in FreeBSD 12.
MFC After: 3 days
Differential Revision: https://reviews.freebsd.org/D11935
Note: Classes will be removed after MFC
New flag 0x4 can be configured in net.enc.[in|out].ipsec_bpf_mask.
When it is set, if_enc(4) additionally captures a packet via BPF after
invoking pfil hook. This may be useful for debugging.
MFC after: 2 weeks
Sponsored by: Yandex LLC
Differential Revision: https://reviews.freebsd.org/D11804
is in VF or SRIOV mode typically in a virtual machine environment.
Submitted by: Sepherosa Ziehau <sephe@dragonflybsd.org>
MFC after: 3 days
Sponsored by: Mellanox Technologies
How network VF works with hn(4) on Hyper-V in transparent mode:
- Each network VF has a cooresponding hn(4).
- The network VF and the it's cooresponding hn(4) have the same hardware
address.
- Once the network VF is attached, the cooresponding hn(4) waits several
seconds to make sure that the network VF attach routing completes, then:
o Set the intersection of the network VF's if_capabilities and the
cooresponding hn(4)'s if_capabilities to the cooresponding hn(4)'s
if_capabilities. And adjust the cooresponding hn(4) if_capable and
if_hwassist accordingly. (*)
o Make sure that the cooresponding hn(4)'s TSO parameters meet the
constraints posed by both the network VF and the cooresponding hn(4).
(*)
o The network VF's if_input is overridden. The overriding if_input
changes the input packet's rcvif to the cooreponding hn(4). The
network layers are tricked into thinking that all packets are
neceived by the cooresponding hn(4).
o If the cooresponding hn(4) was brought up, bring up the network VF.
The transmission dispatched to the cooresponding hn(4) are
redispatched to the network VF.
o Bringing down the cooresponding hn(4) also brings down the network
VF.
o All IOCTLs issued to the cooresponding hn(4) are pass-through'ed to
the network VF; the cooresponding hn(4) changes its internal state
if necessary.
o The media status of the cooresponding hn(4) solely relies on the
network VF.
o If there are multicast filters on the cooresponding hn(4), allmulti
will be enabled on the network VF. (**)
- Once the network VF is detached. Undo all damages did to the
cooresponding hn(4) in the above item.
NOTE:
No operation should be issued directly to the network VF, if the
network VF transparent mode is enabled. The network VF transparent mode
can be enabled by setting tunable hw.hn.vf_transparent to 1. The network
VF transparent mode is _not_ enabled by default, as of this commit.
The benefit of the network VF transparent mode is that the network VF
attachment and detachment are transparent to all network layers; e.g. live
migration detaches and reattaches the network VF.
The major drawbacks of the network VF transparent mode:
- The netmap(4) support is lost, even if the VF supports it.
- ALTQ does not work, since if_start method cannot be properly supported.
(*)
These decisions were made so that things will not be messed up too much
during the transition period.
(**)
This does _not_ need to go through the fancy multicast filter management
stuffs like what vlan(4) has, at least currently:
- As of this write, multicast does not work in Azure.
- As of this write, multicast packets go through the cooresponding hn(4).
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11803
unable to automatically find alternate superblocks. This checkin
places the information needed to find alternate superblocks to the
end of the area reserved for the boot block.
Filesystems created with a newfs of this vintage or later will
create the recovery information. If you have a filesystem created
prior to this change and wish to have a recovery block created for
your filesystem, you can do so by running fsck in forground mode
(i.e., do not use the -p or -y options). As it starts, fsck will
ask ``SAVE DATA TO FIND ALTERNATE SUPERBLOCKS'' to which you should
answer yes.
Discussed with: kib, imp
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11589
vm_page_grab() on consecutive page indices. Besides simplifying the code
in the caller, vm_page_grab_pages() allows for batching optimizations.
For example, the current implementation replaces calls to vm_page_lookup()
on consecutive page indices by cheaper calls to vm_page_next().
Reviewed by: kib, markj
Tested by: pho (an earlier version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11926
Since the cache controller nodes fixup is added to the platform code,
this patch aligns it to the Linux device tree representation.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11884
Updating PL310 sotfware context sc_io_coherent field in
platform_pl310_init() routine for Armada 38x helps to avoid
using 'arm,io-coherent' property, which is by default not present
in the device tree node in Linux.
This way another step for DT unification between two operating
systems is done. The improvemnt will also work after enabling
PLATFORM for Marvell ARMv7 SoCs.
Reviewed by: andrew, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11883
Since the timers' base frequency setting is added to the platform code,
this patch removes clock-frequency properties from global
and twd timers, aligning both to the Linux device tree.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11882
Instead of using 'clock-frequency' device tree property for global/twd
mpcore timers of Armada 38x SoCs, set it in platform_late_init stage
with arm_tmr_change_frequency() function.
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11881
Before this patch function ofw_bus_find_compatible was using
memory allocations in order to find compatible node and the property's
length. This way there was always a suited buffer for property,
however this approach had also disadvantages - ofw_bus_find_compatible
couldn't be used when malloc is not available, e.g. during fdt fixup stage.
In order to remove the usage limitation of ofw_bus_find_compatible(),
this patch modifies the function to use ofw_bus_node_is_compatible()
(instead of the one without _int suffix), which uses a fixed
buffer on stack instead of dynamic allocations.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11880
Sometimes it's convenient to provide fixup to many boards
that use the same SoC family (eg. Marvell Armada 38x).
Instead of putting multiple entries in fdt_fixup_table,
use one entry which refers to all boards with given SoC.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11878
Because fdt_get_ranges can process now multiple 'ranges' entries,
restoring the ranges from original Linux device trees is possible.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11877
This patch makes possible to boot with up to 8 ranges in soc.
Dynamic allocation cannot be used, because ftd_get_ranges
function is called early, when malloc is not available.
Change is required for the alignment of Marvell Armada 38x
device trees present in sys/gnu/dts/arm - originally
the platform has 6 entries in simple-bus 'ranges'.
Submitted by: Patryk Duda <pdk@semihalf.com>
Reviewed by: manu, nwhitehorn, cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11876
transfers in their probe() or attach() routines, and that doesn't work
when the low-level controller requires interrupts to be functional.
The DS133x family of chips is nearly identical to the DS1307 and support
for them should be added to that driver, then the ds133x driver can be
deleted. The s35390a driver just needs a non-trivial workover. In both
cases that work will be done and committed separately.
This is an import of Alexander Bluhm's OpenBSD commit r1.60,
the first chunk had to be modified because on OpenBSD the
'cut' declaration is located elsewhere.
Upstream report by Jingmin Zhou:
https://marc.info/?l=openbsd-pf&m=150020133510896&w=2
OpenBSD commit message:
Use a 32 bit variable to detect integer overflow when searching for
an unused nat port. Prevents a possible endless loop if high port
is 65535 or low port is 0.
report and analysis Jingmin Zhou; OK sashan@ visa@
Quoted from: https://cvsweb.openbsd.org/cgi-bin/cvsweb/src/sys/net/pf_lb.c
PR: 221201
Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: OpenBSD via ElectroBSD
MFC after: 1 week
libefivar expects opening /dev/efi to indicate if the we can make efi
runtime calls. With a null routine, it was always succeeding leading
efi_variables_supported() to return the wrong value. Only succeed if
we have an efi_runtime table. Also, while I'm hear, out of an
abundance of caution, add a likely redundant check to make sure
efi_systbl is not NULL before dereferencing it. I know it can't be
NULL if efi_cfgtbl is non-NULL, but the compiler doesn't.
LinuxKPI workqueue wrappers reported "successful" cancellation for works
already completed in normal way. This change brings reported status and
real cancellation fact into sync. This required for drm-next operation.
Reviewed by: hselasky (earlier version)
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D11904
If mly_user_command fails to allocate a command slot it jumps to an 'out'
label used for error handling. The error handling code checks for a data
buffer in 'mc->mc_data' to free before checking if 'mc' is NULL. Fix by
just returning directly if we fail to allocate a command and only using
the 'out' label for subsequent errors when there is actual cleanup to
perform.
PR: 217747
Reported by: PVS-Studio
Reviewed by: emaste
MFC after: 1 week
p1003_1b.aio_listio_max is now a tunable. Its value is reflected in the
sysctl of the same name, and the sysconf(3) variable _SC_AIO_LISTIO_MAX.
Its value will be bounded from below by the compile-time constant
AIO_LISTIO_MAX and from above by the compile-time constant
MAX_AIO_QUEUE_PER_PROC and the tunable vfs.aio.max_aio_queue.
Reviewed by: jhb, kib
MFC after: 3 weeks
Relnotes: yes
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D11601
Also improve the formatting of the corresponding KASSERT message.
Based on the submission by: Svyatoslav <razmyslov@viva64.com>
Found by: PVS-Studio
PR: 217741
Reviewed by: emaste
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
input errors in the mlx5en(4) driver. This improves the sysadmin view of
physical port errors.
Submitted by: gallatin@
MFC after: 1 week
Sponsored by: Mellanox Technologies
The m_defrag() function can only defrag mbuf chains which have a valid
mbuf packet header. In r291699 when the mlx4en(4) driver was converted
into using BUSDMA(9), the call to m_defrag() was moved after the part
of the transmit routine which strips the header from the mbuf chain.
This effectivly disabled the mbuf defrag mechanism and such packets
simply got dropped.
This patch removes the stripping of mbufs from a chain and loads all
mbufs using busdma. If busdma finds there are no segments, unload
the DMA map and free the mbuf right away, because that means all
data in the mbuf has been inlined in the TX ring. Else proceed
as usual.
Add a per-ring rounter for the number of defrag attempts and
make sure the oversized_packets counter gets zeroed while at it.
The counters are per-ring to avoid excessive cache misses in the
TX path.
Submitted by: mjoras@
Differential Revision: https://reviews.freebsd.org/D11683
MFC after: 1 week
Sponsored by: Mellanox Technologies
illumos/illumos-gate@d28671a3b0d28671a3b0https://www.illumos.org/issues/8373
The code that writes ZIL blocks uses dmu_tx_assign(TXG_WAIT) to assign
a transaction to a transaction group. That seems to be logically
incorrect as writing of the ZIL block does not introduce any new dirty
data. Also, when there is a lot of dirty data, the call can introduce
significant delays into the ZIL commit path, thus affecting all
synchronous writes. Additionally, ARC throttling may affect the ZIL
writing.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 2 weeks
illumos/illumos-gate@79c2b812ee79c2b812eehttps://www.illumos.org/issues/8491
The zpool checkpoint feature in DxOS added a new field in the uberblock.
The Multi-Modifier Protection Pull Request from ZoL adds two new fields in the
uberblock (Reference: https://github.com/zfsonlinux/zfs/pull/6279).
As these two changes come from two different sources and once upstreamed and
deployed will introduce an incompatibility with each other we want
to upstream a change that will reserve the padding for both of them so
integration goes smoothly and everyone gets both features.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Olaf Faaland <faaland1@llnl.gov>
Approved by: Gordon Ross <gwr@nexenta.com>
Author: Serapheim Dimitropoulos <serapheim@delphix.com>
MFC after: 3 weeks
illumos/illumos-gate@267ae6c3a8267ae6c3a8https://www.illumos.org/issues/7915
l2arc_evict() is strictly serialized with respect to
l2arc_write_buffers() and l2arc_write_done(). Normally, l2arc_evict()
and l2arc_write_buffers() are called from the same thread, so they can
not be concurrent. Also, l2arc_write_buffers() uses zio_wait() on the
parent zio of all cache zio-s. That ensures that l2arc_write_done()
is completed before l2arc_write_buffers() returns. Finally, if a
cache device is removed, then l2arc_evict() is called under SCL_ALL in
the exclusive mode. That ensures that it can not be concurrent with
the normal L2ARC accesses to the device (including writing and
evicting buffers). Given the above, some checks and actions in
l2arc_evict() do not make sense. For instance, it must never
encounter the write head header let alone remove it from the buffer
list.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Matthew Ahrens <mahrens@delphix.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 2 weeks
illumos/illumos-gate@dcb6872c56dcb6872c56https://www.illumos.org/issues/8126
The sync thread is concurrently modifying dn_phys->dn_nlevels
while dbuf_dirty() is trying to assert something about it, without
holding the necessary lock. We need to move this assertion further down
in the function, after we have acquired the dn_struct_rwlock.
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Serapheim Dimitropoulos <serapheim@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 2 weeks
illumos/illumos-gate@9b195260e29b195260e2https://www.illumos.org/issues/8426
abd_copy_from_buf and abd_cmp_buf do not modify their void *buf arguments, so
qualify them with const.
abd_copy_from_buf_off and abd_cmp_buf_off already had that type for the
corresponding arguments.
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 2 weeks
illumos/illumos-gate@77b171372e77b171372ehttps://www.illumos.org/issues/7600
At present, the kernel side code seems to blindly rollback to whatever happens
to be the latest snapshot at the time when the rollback task is processed.
The expected target's name should be passed to the kernel driver and the sync
task should validate that the target exists and that it is the latest snapshot
indeed.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 3 weeks
illumos/illumos-gate@42418f9e7342418f9e73https://www.illumos.org/issues/8377
The problem is that when dsl_bookmark_destroy_check() is executed from open
context (the pre-check), it fills in dbda_success based on the existence of the
bookmark.
But the bookmark (or containing filesystem as in this case) can be destroyed
before we get to syncing context. When we re-run dsl_bookmark_destroy_check()
in syncing
context, it will not add the deleted bookmark to dbda_success, intending for
dsl_bookmark_destroy_sync() to not process it. But because the bookmark is
still in dbda_success
from the open-context call, we do try to destroy it.
The fix is that dsl_bookmark_destroy_check() should not modify dbda_success
when called from open context.
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 2 weeks
illumos/illumos-gate@b7edcb9408b7edcb9408https://www.illumos.org/issues/8378
The problem is that zfs_get_data() supplies a stale zgd_bp to dmu_sync(), which
we then nopwrite against.
zfs_get_data() doesn't hold any DMU-related locks, so after it copies db_blkptr
to zgd_bp, dbuf_write_ready()
could change db_blkptr, and dbuf_write_done() could remove the dirty record.
dmu_sync() then sees the stale
BP and that the dbuf it not dirty, so it is eligible for nop-writing.
The fix is for dmu_sync() to copy db_blkptr to zgd_bp after acquiring the
db_mtx. We could still see a stale
db_blkptr, but if it is stale then the dirty record will still exist and thus
we won't attempt to nopwrite.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
MFC after: 2 weeks
FreeBD note: the essence of this change was committed to FreeBSD in
r314274. This commit catches up with differences between what was
committed to FreeBSD and what was committed to OpenZFS, mainly more
logical variable names.
illumos/illumos-gate@16a7e5ac1116a7e5ac11https://www.illumos.org/issues/7910
It seems that the change in issue #6950 resurrected the problem that was
earlier fixed by the change in issue #5219.
Please also see the following FreeBSD bug report:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=216178
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Andriy Gapon <avg@FreeBSD.org>
MFC after: 2 weeks
This also involves adding a quirk table as TRIM is broken for some
Kingston eMMC devices, though. Compared to ERASE (declared "legacy"
in the eMMC specification v5.1), TRIM has the advantage of operating
on write sectors rather than on erase sectors, which typically are
of a much larger size. Thus, employing TRIM, we don't need to fiddle
with coalescing BIO_DELETE requests that are also of (write) sector
units into erase sectors, which might not even add up in all cases.
- For some SanDisk iNAND devices, the CMD38 argument, e. g. ERASE,
TRIM etc., has to be specified via EXT_CSD[113], which now is also
handled via a quirk.
- My initial understanding was that for eMMC partitions, the granularity
should be used as erase sector size, e. g. 128 KB for boot partitions.
However, rereading the relevant parts of the eMMC specification v5.1,
this isn't actually correct. So drop the code which used partition
granularities for delmaxsize and stripesize. For the most part, this
change is a NOP, though, because a) for ERASE, mmcsd_delete() used
the erase sector size unconditionally for all partitions anyway and
b) g_disk_limit() doesn't actually take the stripesize into account.
- Take some more advantage of mmcsd_errmsg() in mmcsd(4) for making
error codes human readable.
No need to set any fields in the cloned device. devfs uses symlinks,
so the adev entries returned won't be presented to the drivers. Since
we don't save copies, nothing else will see them. This code came from
the old compat code, and it appears to be obsolete or never needed.
Submitted by: kib@
Differential Review: https://reviews.freebsd.org/D11919
All ndaX and ndaXpY nodes will appear as nvdX and nvdXpY as well
(through symlinks in devfs via the normal disk aliasing mechanism in
GEOM).
Differential Revision: https://reviews.freebsd.org/D11873
Implement disk_add_alias to allow aliases to be added to disks. All
disk have a primary name (say "foo") can also have secondary names
(say "bar") such that all instances of "foo" also have a "bar"
alias. So if you have foo0, foo0p1, foo1, foo1s1 and foo1s1a nodes
created by the foo driver and gpart, device nodes bar0, bar0p1, bar1,
bar1s1 and bar1s1a will appear as symlinks back to the original nodes.
This generalizes to multiple aliases. However, since the unit number
follows the primary name, multiple device drivers can't create the
same aliases unless those drives coorinate the unit number space (eg
you couldn't add an alias 'disk' to both 'da' and 'ada' because it's
possible to have da0 and ada0, because 'disk0' is ambiguous).
Differential Revision: https://reviews.freebsd.org/D11873
When we're creating new providers for each of the partitions, add
aliases to the geom before we create the provider so when geom_dev
tastes the provider, the aliases are in place so the proper /dev
entries are created. So foo5p6 gets created as an alias for bar5p6
when foo is an alias for bar in the geom we're partitioning with
g_part. This also copies aliases from the container geom (eg disk) to
the label geom (the disk with GPT partitioning) so that aliases nest
properly.
Differential Revision: https://reviews.freebsd.org/D11873
Add an alias name list to geoms. Use them in geom_dev to create
aliases. Previously, geom_dev would create an device node for the name
of the geom. Now, additional nodes are created pointing back to the
primary node with make_dev_alias_p. Aliases must be in place on the
geom before any tasting occurs.
Differential Revision: https://reviews.freebsd.org/D11873
in the flush_queue:
1 2 3 4 5 6 7 8 9 10
and another 10 bio's go into the flush queue after only the first five
bio's are removed from the flush queue, the queue should look like:
6 7 8 9 10 11 12 13 14 15 16 17 18 19 20,
but because of the bug we end up with
6 11 12 13 14 15 16 17 18 19 20 7 8 9 10.
So the sequence of the bio's is damaged in the flush queue (and
therefore in the journal on disk !). This error can be triggered by
ffs_snapshot() when a block is read with readblock() and gjournal finds
this block in the broken flush queue before it goes to the correct
active queue.
The fix is to place all new blocks at the end of the queue.
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Discussed with: kib
MFC after: 1 week
system having over 4GB RAM. That's due to:
1) the limit being u_int instead of u_long like vm.kmem_size (the limit is
half of vm.kmem_size by default for amd64);
2) sysctl handler g_journal_cache_limit_sysctl() using u_int instead of u_long.
The fix is to replace u_int with u_long for the kern.geom.journal.cache.limit
sysctl variable.
PR: 198500
Submitted by: Dr. Andreas Longwitz <longwitz@incore.de>
Reported by: Eugene Grosbein
Discussed with: kib
MFC after: 1 week
While there, switch to FreeBSD internal callout active status.
Reviewed by: markj, hselasky
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D11900
o Replace __riscv64 with (__riscv && __riscv_xlen == 64)
This is required to support new GCC 7.1 compiler.
This is compatible with current GCC 6.1 compiler.
RISC-V is extensible ISA and the idea here is to have built-in define
per each extension, so together with __riscv we will have some subset
of these as well (depending on -march string passed to compiler):
__riscv_compressed
__riscv_atomic
__riscv_mul
__riscv_div
__riscv_muldiv
__riscv_fdiv
__riscv_fsqrt
__riscv_float_abi_soft
__riscv_float_abi_single
__riscv_float_abi_double
__riscv_cmodel_medlow
__riscv_cmodel_medany
__riscv_cmodel_pic
__riscv_xlen
Reviewed by: ngie
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11901
handle cases where they can only run on a single domain.
To allow all devices access to this set we need to move reading the domain
earlier in the boot as it was previously handled in the CPU driver, however
this is too late for the GICv3 ITS driver.
Sponsored by: DARPA, AFRL
but it was broken since r273800 (and r278522, its MFC to stable/10) because
identify_cpu() is called too late, i.e., after init_param1().
MFC after: 3 days
libefi/time.c is mix of different styles, this update does cleanup.
Also fix 0 versus NULL, and zero the tv structure for case we get error
from UEFI firmware.
Reviewed by: imp
Differential Revision: https://reviews.freebsd.org/D11861
This patch moves code necessary for the fmtdev functionality from
loader to libefi, allowing other applications to make use of it
Submitted by: Eric McCorkle
Differential Revision: https://reviews.freebsd.org/D11862
precision. These timers are already displayed in microseconds in the
sysctl MIB. Add variables to track these tunables while here.
MFC after: 3 days
Sponsored by: Chelsio Communications
multiple ITS devices, however we only want a single ITS device to be
configured on each CPU. To fix this only enable ITS when the node matches
the CPUs node.
Sponsored by: DARPA, AFRL
used to support the dual package ThunderX where we need to send MSI/MSI-X
interrupts to the same package as the device the interrupt came from.
Sponsored by: DARPA, AFRL
They are defined by XSI or newer SUS.
This is a follow-up to r318780.
Reported by: jbeich
Obtained from: DragonflyBSD commit e08b3836c962
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This patch adds additional EFI utility functions to convert errno
values to EFI_STATUS errors, as well as EFI times to UNIX times.
Submitted by: Eric McCorkle
Differential Revision: https://reviews.freebsd.org/D11858
This patch moves some EFI ZFS functions from loader to libefi,
allowing them to be used by anything that links against libefi.
Submitted by: Eric McCorkle
Differential Revision: https://reviews.freebsd.org/D11855
This patch adds definitions and utility code for creating EFI drivers
using the EFI_DRIVER_BINDING_PROTOCOL.
Submitted by: Eric McCorkle
Differential Revision: https://reviews.freebsd.org/D11852
Introduce hw.nvme.use_nvd tunable. This tunable allows both nvd and
nda to be installed in the kernel, while allowing only one of them to
create devices. This is an all-or-nothing setting, and you can't
change it after boot-time. However, it will allow easier A/B testing.
Differential Revision: https://reviews.freebsd.org/D11825
Some C wrappers for x86 instructions do not touch global memory and only act
on their arguments; they can be marked __pure2, aka __const__. Without this
annotation, Clang 3.9.1 is not intelligent enough on its own to grok that
these functions are __const__.
Submitted by: Anton Rang <anton.rang AT isilon.com>
Sponsored by: Dell EMC Isilon
The previous limit of 24 was somewhat restrictive, and with this change
ceil(log2(sizeof(struct pfs_node))) is the same as before in both the ILP32
and LP64 models, so the malloc zone used for allocations of struct pfs_node
is the same as before.
Approved by: des
This was submitted by Rogiel Sulzbach (thank you!) but has a few last-minute
changes by me, mostly where the code interfaces to my still-utterly-deficient
imx6_ccm clocks implementation. So blame me for any mistakes.
Submitted by: Rogiel Sulzbach <rogiel@rogiel.com>
Differential Revision: https://reviews.freebsd.org/D11177
debug (cudbg) code, hooked up to the main driver via an ioctl.
The ioctl can be used to collect the chip's internal state in a
compressed dump file. These dumps can be decoded with the "view"
component of cudbg.
Obtained from: Chelsio Communications
MFC after: 2 months
Sponsored by: Chelsio Communications
My change had good intentions, but the implementation was incorrect:
- printf was returning the number of characters in the format string
plus the NUL, but failed in two regards implementation wise:
-- the pathological case, printf(""), wasn't being handled properly since
the pointer is always incremented, so the value returned would be
off-by-one.
-- printf(3) reports the number of characters printed post-conversion via
vfprintf, etc.
- putchar(3) should return the character printed or EOF, not the number
of characters output to the screen.
My goal in making the change (again) was to increase parity, but as bde
pointed out these are freestanding functions, so they don't have to
conform to libc/POSIX. I argued that the functions should be named
differently since the implementation is different enough to warrant it
and to allow boot2 code to be usable when linked against sys/boot and
libstand and other libraries in base. I have no interest in pushing
this change forward more though, as the original concern I had behind
the change with zfsboottest was resolved in r321849 and r321852. The
next person that updates the toolchain gets to deal with the
inconsistency if it's flagged by a newer compiler.
MFC after: 1 month
Reported by: ed, markj
This patch fixes an interopability issue between FreeBSD and non-FreeBSD
systems when the connection establishment is aborted. Refer to the
initial commit in Linux, drivers/infiniband/core/cm.c,
for a more detailed description.
Obtained from: Linux
MFC after: 3 days
Sponsored by: Mellanox Technologies
Code inspection reveals the busdma unload and free functions
do not write to the belonging dma tag and does not need to be
serialized. This allows mlx5_fwp_free() to be called from
software interrupt context.
MFC after: 3 days
Sponsored by: Mellanox Technologies
POSIX equivalents
Both printf and putchar return int, not void.
This will allow code that leverages the libcalls and checks/rely on the
return type to interchangeably between loader code and non-loader
code.
MFC after: 1 month
- Store the symbol table contents in an anonymous swap-backed object. Have
mmap(/dev/ksyms) map that object, and stop mapping the symbol table into
the calling process in ksyms_open(). Previously we would cache a pointer
to the pmap of the opening process, and mmap(/dev/ksyms) would create a
mapping using the physical address found by a pmap lookup at the initial
mapping address. However, this assumes that the cached pmap is valid,
which may not be the case. [1]
- Remove the ksyms ioctl interface. It appears to have been added to work
around a limitation in libelf that no longer exists; see r321842.
Moreover, the interface is difficult to support and isn't present in
illumos. Since ksyms was added specifically to support lockstat(1), it
is expected that this removal won't have any real impact.
- Simplify ksyms_read() to avoid unnecessary copying.
- Don't call the device handle destructor if we fail to capture a snapshot
of the kernel's symbol table. devfs will do that for us.
Reported by: Ilja van Sprundel <ivansprundel@ioactive.com> [1]
Reviewed by: kib (previous revision)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11789
"br" or "bridge" where - according to the terminology outlined in
comments of bridge.h and mmcbr_if.m around since their addition in
r163516 - the bus is meant and used instead. Some of these instances
are also rather old, while those in e. g. mmc_subr.c are as new as
r315430 and were caused by choosing mmc_wait_for_request(), i. e. the
one pre-r315430 outliner existing in mmc.c, as template for function
parameters in mmc_subr.c inadvertently. This correction translates to
renaming "brdev" to "busdev" and "mmcbr" to "mmcbus" respectively as
appropriate.
While at it, also rename "reqdev" to just "dev" in mmc_subr.[c,h]
for consistency with was already used in mmm.c pre-r315430, again
modulo mmc_wait_for_request() that is.
- Remove comment lines from bridge.h incorrectly suggesting that there
would be a MMC bridge base class driver.
- Update comments in bridge.h regarding the star topology of SD and SDIO;
since version 3.00 of the SDHCI specification, for eSD and eSDIO bus
topologies are actually possible in form of so called "shared buses"
(in some subcontext later on renamed to "embedded" buses).
as taking a register number, and that would get multiplied by 4 to make
a register address. But the header file that consumers have to reference
this stuff publishes register addresses, not numbers. So now everything
works in terms of register addresses.
Note that the HDMI init code was writing into the wrong register before
this change. Apparently whatever it wrote to was harmless, and apparently
HDMI was working because uboot had set up the right bits.
wide enough to hold the full 64-bit dev_t. Instead use the "dev" field in
the "linux_cdev" structure to store and lookup this value.
While at it remove superfluous use of parenthesis inside the
MAJOR(), MINOR() and MKDEV() macros in the LinuxKPI.
MFC after: 1 week
Sponsored by: Mellanox Technologies
and arm64 so move any truncation to the caller.
Submitted by: Mihai Carabas <mihai.carabas@gmail.com>
X-Differential Revision: https://reviews.freebsd.org/D10213
Similar to r321899, reduce sv_maxuser by one page inside of CloudABI.
This ensures that the stack, the vDSO and any allocations cannot touch
the top page of user virtual memory.
Considering that CloudABI userspace is completely oblivious to virtual
memory layout, don't bother making this conditional based on the CPU of
the running system.
Reviewed by: kib, truckman
Differential Revision: https://reviews.freebsd.org/D11808
Since traditional types for the macros values are int, remove the
cookie trick and just split the dev_t at the word boundary.
Reported by: Victor Stinner <victor.stinner@gmail.com>
PR: 221048
Sponsored by: The FreeBSD Foundation
when a signal is not intended to be sent.
The variable holding the signal number to send is left uninitialized,
which sometimes triggers invalid signal checks.
For NMI, a return to usermode without ast processing is done. On the
other hand, for spurious dtrace probe interrupt it is usermode which
triggered the interrupt, so handle it through userret() as any other
fault.
Reported by: Nils Beyer <nbe@renzel.net>
PR: 221151
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
from the top of user memory to one page lower on machines with the
Ryzen (AMD Family 17h) CPU. This pushes ps_strings and the stack
down by one page as well. On Ryzen there is some sort of interaction
between code running at the top of user memory address space and
interrupts that can cause FreeBSD to either hang or silently reset.
This sounds similar to the problem found with DragonFly BSD that
was fixed with this commit:
https://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/b48dd28447fc8ef62fbc963accd301557fd9ac20
but our signal trampoline location was already lower than the address
that DragonFly moved their signal trampoline to. It also does not
appear to be related to SMT as described here:
https://www.phoronix.com/forums/forum/hardware/processors-memory/955368-some-ryzen-linux-users-are-facing-issues-with-heavy-compilation-loads?p=955498#post955498
"Hi, Matt Dillon here. Yes, I did find what I believe to be a
hardware issue with Ryzen related to concurrent operations. In a
nutshell, for any given hyperthread pair, if one hyperthread is
in a cpu-bound loop of any kind (can be in user mode), and the
other hyperthread is returning from an interrupt via IRETQ, the
hyperthread issuing the IRETQ can stall indefinitely until the
other hyperthread with the cpu-bound loop pauses (aka HLT until
next interrupt). After this situation occurs, the system appears
to destabilize. The situation does not occur if the cpu-bound
loop is on a different core than the core doing the IRETQ. The
%rip the IRETQ returns to (e.g. userland %rip address) matters a
*LOT*. The problem occurs more often with high %rip addresses
such as near the top of the user stack, which is where DragonFly's
signal trampoline traditionally resides. So a user program taking
a signal on one thread while another thread is cpu-bound can cause
this behavior. Changing the location of the signal trampoline
makes it more difficult to reproduce the problem. I have not
been because the able to completely mitigate it. When a cpu-thread
stalls in this manner it appears to stall INSIDE the microcode
for IRETQ. It doesn't make it to the return pc, and the cpu thread
cannot take any IPIs or other hardware interrupts while in this
state."
since the system instability has been observed on FreeBSD with SMT
disabled. Interrupts to appear to play a factor since running a
signal-intensive process on the first CPU core, which handles most
of the interrupts on my machine, is far more likely to trigger the
problem than running such a process on any other core.
Also lower sv_maxuser to prevent a malicious user from using mmap()
to load and execute code in the top page of user memory that was made
available when the shared page was moved down.
Make the same changes to the 64-bit Linux emulator.
PR: 219399
Reported by: nbe@renzel.net
Reviewed by: kib
Reviewed by: dchagin (previous version)
Tested by: nbe@renzel.net (earlier version)
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11780
When all instances of a lock type are destroyed (for example, after a
module unload), the corresponding witness entry remains associated with
that lock type. In this case, we shouldn't panic if a new instance of the
lock type is created and its lock class does not match that recorded in the
witness entry.
Reviewed by: jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11788
According to the PCI Local Specification rev. 3.0 in case of a 64-bit
BAR both the low and the high parts of the register should be set to
~0 before attempting to read back the size.
So far I have found no single device that has problems with the
previous approach, but I think it's better to stay on the safe size.
This commit should not introduce any functional change.
MFC after: 3 weeks
Sponsored by: Citrix Systems R&D
Reviewed by: jhb
Differential revision: https://reviews.freebsd.org/D11750
The removed release stores are not needed since stores are totally
ordered on i386 and amd64.
Reviewed by: alc, kib (previous revision)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11790
'skip', which denote, respectively, the largest number of blocks that can be
managed by a subtree of that height, and one less than the number of nodes
in a subtree of that height. This change removes the 'skip' argument from
those functions because 'skip' can be trivially computed from 'radius'.
This change also redefines 'skip' so that it denotes the number of nodes in
the subtree, and so changes loop upper bound tests from '<= skip' to '<
skip' to account for the change.
The 'skip' field is also removed from the blist struct.
The self-test program is changed so that the print command includes the
cursor value in the output.
Submitted by: Doug Moore <dougm@rice.edu>
MFC after: 1 week
Linux specific things to the native fdescfs file system.
Unlike FreeBSD, the Linux fdescfs is a directory containing a symbolic
links to the actual files, which the process has open.
A readlink(2) call on this file returns a full path in case of regular file
or a string in a special format (type:[inode], anon_inode:<file-type>, etc..).
As well as in a FreeBSD, opening the file in the Linux fdescfs directory is
equivalent to duplicating the corresponding file descriptor.
Here we have mutually exclusive requirements:
- in case of readlink(2) call fdescfs lookup() method should return VLNK
vnode otherwise our kern_readlink() fail with EINVAL error;
- in the other calls fdescfs lookup() method should return non VLNK vnode.
For what new vnode v_flag VV_READLINK was added, which is set if fdescfs has beed
mounted with linrdlnk option an modified kern_readlinkat() to properly handle it.
For now For Linux ABI compatibility mount fdescfs volume with linrdlnk option:
mount -t fdescfs -o linrdlnk null /compat/linux/dev/fd
Reviewed by: kib@
MFC after: 1 week
Relnotes: yes
This prepares for the upcoming transparent VF support.
MFC after: 3 days
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D11708
Don't enable the oscillator when it is found to be stopped at init time,
just let the first setting of valid time start it. But still report a dead
battery if it's stopped at init time.
Don't force the chip into 24hr mode, just cope with whatever mode it is
already in.
Schedule the clock_settime() callbacks to align the RTC clock to top of
second when setting it.
subr_rtc code, switch from CLOCKF_SETTIME_NO_TS to CLOCKF_SETTIME_NO_ADJ
so that we get fed a timestamp, but it's not adjusted to compensate for
inaccuracy in setting time.
Update the use of the B_CACHE flag (since the May 1999 commit
that made it the correct test here).
Reported by: Andreas Longwitz <longwitz@incore.de>
Reviewed by: kib
Tested by: Peter Holm
MFC after: 1 week
Atomic updates to v_wire_count are a significant source of contention, so
combine multiple updates into one in this easy case. Also remove an old
printf that gets executed if the page is shared-busied, which is a case
that will lead to a panic anyway.
Reviewed by: alc, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11791
In this case we shouldn't assume that the thread has a valid frame pointer.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11787
This change adds two new tunables, allowing to control serialization for
read and write NFS requests separately. It does not change the default
behavior since there are too many factors to consider, but gives additional
space for further experiments and tuning.
The main motivation for this change is very low write speed in case of ZFS
with sync=always or when NFS clients requests sychronous operation, when
every separate request has to be written/flushed to ZIL, and requests are
processed one at a time. Setting vfs.nfsd.fha.write=0 in that case allows
to increase ZIL throughput by several times by coalescing writes and cache
flushes. There is a worry that doing it may increase data fragmentation
on disks, but I suppose it should not happen for pool with SLOG.
MFC after: 1 week
Sponsored by: iXsystems, Inc.
from enc_hhook().
This should solve the problem when pf is used with if_enc(4) interface,
and outbound packet with existing PCB checked by pf, and this leads to
deadlock due to pf does its own PCB lookup and tries to take rlock when
wlock is already held.
Now we pass PCB pointer if it is known to the pfil hook, this helps to
avoid extra PCB lookup and thus rlock acquiring is not needed.
For inbound packets it is safe to pass NULL, because we do not held any
PCB locks yet.
PR: 220217
MFC after: 3 weeks
Sponsored by: Yandex LLC
request that their clock_settime() methods be called at a given offset
from top-of-second. This adds a timeout_task to the rtc_instance so that
each clock can be separately added to taskqueue_thread with the scheduling
it prefers, instead of looping through all the clocks at once with a
single task on taskqueue_thread. If a driver doesn't call clock_schedule()
the default is the old behavior: clock_settime() is queued immediately.
The motivation behind this is that I was on the path of adding identical
code to a third RTC driver to figure out a delta to top-of-second and
sleep for that amount of time because writing the the RTC registers resets
the hardware's concept of top-of-second. (Sometimes it's not top-of-second,
some RTC clocks tick over a half second after you set their time registers.)
Worst-case would be to sleep for almost a full second, which is a rude thing
to do on a shared task queue thread.
over the scheduling precision than 'ticks' can offer, and because sometimes
you're already working with sbintime_t units and it's dumb to convert them
to ticks just so they can get converted back to sbintime_t under the hood.
Since device can pass multiple frames in a single payload temporary
Rx buffer was big enough to hold all of them; now the driver can
concatenate a single frame from multiple payloads.
The Rx buffer size may be configured via tunable (dev.rtwn.%d.rx_buf_size).
Tested with:
- rtl8188cus, rtl8188eu and rtl8821au (STA mode).
- (by kevlo) rtl8192cu and rtl8188eu.
PR: 218527
Reviewed by: kevlo
Differential Revision: https://reviews.freebsd.org/D11705
the informational print functions. Collapse the debug API a bit to be
more generic and not require as much code duplication. While here, fix
a bug in MPS that was already fixed in MPR.
between 12/24 hour mode. Also fix conversion between 12 and 24 hour mode.
It's not as easy as adding/subtracting 12, because the clock doesn't roll
over 11->0, it rolls over 12->1; 0 isn't a valid hour in AM/PM mode.
Don't enable the oscillator when it is found to be stopped at init time,
just let the first setting of valid time start it. But still report a dead
battery if it's stopped at init time.
Don't force the chip into 24hr mode, just cope with whatever mode it is
already in.
Align the RTC clock to top of second when setting it.
Resource allocation for parent device does not look good by itself, but
attempt to allocate them for unrelated device just does not end up good.
On Asus X99-E WS/USB3.1 system reporting ISA bridge via both PCI and ACPI
this reported to cause kernel panic on shutdown due to messed resources:
https://bugs.freenas.org/issues/25237.
MFC after: 1 week
Do the allocation before requesting the IOCFacts message. This triggers
the LSI firmware to recognize the multiqueue should be enabled if available.
Multiqueue isn't used by the driver yet, but this also fixes a problem with
the cached IOCFacts not matching latter checks, leading to potential problems
with error recovery.
As a side-effect, fetch the driver tunables as early as possible.
Reviewed by: slm
Obtained from: Netflix
Differential Revision: D9243
all the chips in the NXP PCA212x and PCA/PCF85xx series. In addition to
supporting more chips, this driver uses the countdown timer on the chips as
a fractional seconds counter, giving it a resolution of about 15 milliseconds.
No functional change.
This is handy for FreeBSD derivatives that want to modify the value of
MAXPATHLEN, but not the kld_file_stat ABI.
Submitted by: Siddhant Agarwal <sagarwal AT isilon.com>
Sponsored by: Dell EMC Isilon
When an NFS mount is hung against an unresponsive NFS server, the "umount -f"
option can be used to dismount the mount. Unfortunately, "umount -f" gets
hung as well if a "umount" without "-f" has already been done. Usually,
this is because of a vnode lock being held by the "umount" for the mounted-on
vnode.
This patch adds kernel code so that a new "-N" option can be added to "umount",
allowing it to avoid getting hung for this case.
It adds two flags. One indicates that a forced dismount is about to happen
and the other is used, along with setting mnt_data == NULL, to handshake
with the nfs_unmount() VFS call.
It includes a slight change to the interface used between the client and
common NFS modules, so I bumped __FreeBSD_version to ensure both modules are
rebuilt.
Tested by: pho
Reviewed by: kib
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D11735
Use them in some existing code that is vulnerable to roundoff errors.
The existing constant SBT_1NS is a honeypot, luring unsuspecting folks into
writing code such as long_timeout_ns*SBT_1NS to generate the argument for a
sleep call. The actual value of 1ns in sbt units is ~4.3, leading to a
large roundoff error giving a shorter sleep than expected when multiplying
by the trucated value of 4 in SBT_1NS. (The evil honeypot aspect becomes
clear after you waste a whole day figuring out why your sleeps return early.)
Currently in Virtio driver without TSO/GSO features enabled, the max scatter
gather segments for the TX path can be 4, which limits the support for 9K JUMBO
frames. 9K JUMBO frames results in more than 4 scatter gather segments and
virtio driver fails to send the frame down to host OS. With TSO/GSO feature
enabled max scatter gather segments can be 64, then 9K JUMBO frames are fine,
this is making virtio driver to support JUMBO frames only with TSO/GSO.
Increasing the VTNET_MIN_TX_SEGS which is the case for non TSO/GSO to 32 to
support upto 64K JUMBO frames to Host.
Submitted by: Lohith Bellad <lohithbsd@gmail.com>
Reviewed by: adrian
Differential Revision: https://reviews.freebsd.org/D8803
If the nfsrpc_createlayoutrpc() call in nfsrpc_getcreatelayout() fails,
the code used nfhpp when it might be set NULL. This patch checks for
the error cases (laystat != 0) and avoids using nfhpp for the failure case.
This would only affect NFSv4.1 mounts with the "pnfs" option.
Found while testing the "umount -N" patch not yet in head.
MFC after: 2 weeks
pmap_remap_vm_attr() function requires indexes to
pte2_attr_tab as the arguments (VM_MEMATTR_).
Mistakenly, instead of them, actual values from the
table were used (PTE2_ATTR_), when applying
work-around for Marvell Armada 38x SoCs.
Submitted by: Marcin Wojtas (mw@semihalf.com)
Reported by: Rafal Kozik (rk@semihalf.com)
Reviewed by: cognet (mentor)
Approved by: cognet (mentor)
Obtained from: Semihalf
Differential Revision: https://reviews.freebsd.org/D11704
The ksyms(4) device was added specifically for use by lockstat(1), which
as a DTrace consumer must run as root.
Discussed with: emaste
MFC after: 3 days
The TEX index is selected using (TEX0 C B) bits
from the L2 descriptor. Use correct index by masking
and shifting those bits accordingly.
Differential Revision: https://reviews.freebsd.org/D11703
queue lock when the uppoer stack is called inside TCP_LRO
Submitted by: Kevin Bowling <kevin.bowling@kev009.com>
Reviewed by: erj
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D11724
were redundant and not being used to set anything up.
Submitted by: Matt Macy <mmacy@mattmacy.io>
Reported by: Jeb Cramer <cramerj@intel.com>
Sponsored by: Limelight Networks
This patch defines a macro that checks for MNTK_UNMOUNTF and replaces
explicit checks with this macro. It has no effect on semantics, but
prepares the code for a future patch where there will also be a
NFS specific flag for "forced dismount about to occur".
Suggested by: kib
MFC after: 2 weeks
New kern.lognosys values are
1 - log to ctty
2 - log to console
3 - log to both.
Inspired by: eugen
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
This largely reverts FreeBSD SVN change 289937 from October 25th, 2015.
The intent of that change was to keep loop IDs persistent across
chip reinits.
The problem is that the change turned on the PREVLOOP /
PREV_ADDRESS bit (bit 7 in Firmware Options 2), which tells the
Qlogic chip to not participate in the loop if it can't get the
requested loop address. It also turned off soft addressing on 2400
(4Gb) and newer controllers.
The isp(4) driver defaults to loop address 0, and the tape drives
I have tested default to loop address 0 if hard addressing is turned
on. So when hard loop addressing is turned on on the drive, the isp(4)
driver just refuses to participate in the loop.
The solution is to largely revert that change. I left some elements
in place that are related to virtual ports, since they were new.
This does work with IBM tape drives with hard and soft addressing
turned on. I have tested it with 4Gb, 8Gb, and 16Gb controllers.
sys/dev/isp.c:
Largely revert FreeBSD SVN change 289937. I left the
ispmbox.h changes in place.
Don't use the PREV_ADDRESS bit on initialization. It tells
the chip to not participate if it can't get the requested
loop ID.
Do use soft addressing on 2400 and newer chips.
Use hard addressing when the user has requested a specific
initiator ID. (hint.isp.X.iid=N in /boot/loader.conf)
Leave some of the virtual port options from that change in
place, but don't turn on the PREV_ADDRESS bit.
Reviewed by: mav
MFC after: 3 days
Sponsored by: Spectra Logic
this to be too restrictive. We need to have both broadcast and unicast
enabled for loader to work. Set them in all cases to ensure this is true.
This allows the Cavium ThunderX 2s in the netperf cluster to netboot using
a USB NIC.
PR: 221001
Reviewed by: emaste, tsoome
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D11732
flowtable anymore (as flowtable was never considered to be useful in
the forwarding path).
Reviewed by: np
Differential Revision: https://reviews.freebsd.org/D11448
The attached patch lets adaasync() set ADA_STATE_WCACHE based on
ADA_FLAG_CAN_WCACHE instead of ADA_FLAG_CAN_RAHEAD.
This fixes a regression introduced in r300207 which changed
the flag names.
PR: 220948
Submitted by: Fabian Keil <fk@fabiankeil.de>
Obtained from: ElectroBSD
MFC after: 1 week
Reduce the use of local copies of switch register data.
The switch now works with the upstream dsa node (i.e. the upstream DTS).
Tested on: ClearFog Pro (88E6176), SG-3100 (88E6141)
Sponsored by: Rubicon Communications, LLC (Netgate)
for embedded slots. Fail in the sdhci(4) initialization for slot type
shared, which is completely unsupported by this driver at the moment. [1]
For Intel eMMC controllers, taking the embedded slot type into account
obsoltes setting SDHCI_QUIRK_ALL_SLOTS_NON_REMOVABLE so remove these quirk
entries.
- Hide the 1.8 V VDD capability when the slot is detected as non-embedded,
as the SDHCI specification explicitly states that 1.8 V VDD is applicable
to embedded slots only. [2]
- Define some easy bits of the SDHCI specification v4.20. [3]
- Don't leak bus_dma(9) resources in failure paths of sdhci_init_slot().
Obtained from: DragonFlyBSD 65704a46 [1], 7ba10b88 [2], 0df14648 [3]
r307901 was reverted in r321480, restoring an incorrect block
delimitation bug present in the original cc_cubic commit. Restore
only the bugfix (brace addition) from r307901.
CID: 1090182
Approved by: sbruno
Usually it is sufficient to use iicbus_transfer_excl(), or one of the
higher-level convenience functions that use it, to reserve the bus for the
duration of each register access. Occasionally it is important that a
series of accesses or read-modify-write operations must be done without any
other intervening access to the device, to prevent corrupting state.
Without support for nested request/release, slave device drivers would have
to stop using high-level convenience functions and resort to working with
arrays of iic_msg structs just for a few operations (often involving
one-time device setup or infrequent configuration changes).
The changes here appear large from a glance at the diff, but in fact they're
nearly trivial, and the large diff is because of changes in indentation and
the re-wrapping of comments caused by that. One notable change is that
iicbus_release_bus() now ignores the IICBUS_CALLBACK(IIC_RELEASE_BUS) return
value. The old error handling left the bus in a kind of limbo state where
it was still owned at the iicbus layer, but drivers rarely check the return
of the release call, and it's unclear what they would do to recover from an
error return anyway. No existing low-level drivers return any kind of error
from IIC_RELEASE_BUS except one EINVAL for "you don't own the bus", to which
the right response is probably to carry on with the process of releasing the
reference to the bus anyway.
on i2c devices, where the "register" can be any length.
Many (perhaps most) common i2c devices are organized as a collection of
(usually 1-byte-wide) registers, and are accessed by first writing a 1-byte
register index/offset number, then by reading or writing the data.
Generally there is an auto-increment feature so the when multiple bytes
are read or written, multiple contiguous registers are accessed.
Most existing slave device drivers allocate an array of iic_msg structures,
fill in all the transfer info, and invoke iicbus_transfer(). These new
functions commonize all that and reduce register access to a simple call
with a few arguments.
Suppose that a file on NFS has partially filled last page, and this
page is dirty. NFS VOP_PAGEOUT() method only marks the the page clean
up to the block of the last written byte, leaving other blocks dirty.
Also any page which erronously exists in the vnode vm_object past EOF
is also left marked as dirty.
With the introduction of the buf-cache coherent pager, each pass of
syncer over the object with such page results in creation of B_DELWRI
buffer due to VOP_WRITE() call. This buffer is noted on next syncer
pass, which results e.g. a visible manifestation of shutdown never
finishing vnode sync. Note that before buf-cache coherency commit, a
dirty page might left never synced to server if a partial writes
occur.
Fix this by clearing dirty bits after EOF. Only blocks of the partial
page which are completely after EOF are marked clean, to avoid
possible user data loss.
Reported by: mav
Reviewed by: alc, markj
Tested by: mav, pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D11697
The CloudABI specification has had some minor changes over the last half
year. No substantial features have been added, but some features that
are deemed unnecessary in retrospect have been removed:
- mlock()/munlock():
These calls tend to be used for two different purposes: real-time
support and handling of sensitive (cryptographic) material that
shouldn't end up in swap. The former use case is out of scope for
CloudABI. The latter may also be handled by encrypting swap.
Removing this has the advantage that we no longer need to worry about
having resource limits put in place.
- SOCK_SEQPACKET:
Support for SOCK_SEQPACKET is rather inconsistent across various
operating systems. Some operating systems supported by CloudABI (e.g.,
macOS) don't support it at all. Considering that they are rarely used,
remove support for the time being.
- getsockname(), getpeername(), etc.:
A shortcoming of the sockets API is that it doesn't allow you to
create socket(pair)s, having fake socket addresses associated with
them. This makes it harder to test applications or transparently
forward (proxy) connections to them.
With CloudABI, we're slowly moving networking connectivity into a
separate daemon called Flower. In addition to passing around socket
file descriptors, this daemon provides address information in the form
of arbitrary string labels. There is thus no longer any need for
requesting socket address information from the kernel itself.
This change also updates consumers of the generated code accordingly.
Even though system calls end up getting renumbered, this won't cause any
problems in practice. CloudABI programs always call into the kernel
through a kernel-supplied vDSO that has the numbers updated as well.
Obtained from: https://github.com/NuxiNL/cloudabi
The PCTRIE macro will be shortly applied in a situation where
LOOKUP_LE is not needed.
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
X-Differential revision: https://reviews.freebsd.org/D11435
* While there clean up alignments and line wrapping in existing
definitions for rs API in if_iwmreg.h
Obtained from: dragonflybsd.git 085e37a042bdb17081e495e46919359ce43aa118