In particular, reset the DF_QUIET flag when detaching from a device so
that a driver that marks a device quiet doesn't dictate policy for a
different driver that may claim the device in the future.
Reviewed by: rpokala, wblock
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7803
- Certain pic_assign_cpu, e.g. msi_assign_cpu can have quite a long
call chain. For msi_assign_cpu, mutex makes complex PCI bridge
drivers more tricky, e.g. sleep can note be called, etc, it will
be pretty tricky for upcoming Hyper-V PCI bridge driver for PCI
pass-through.
- It is not used on any hot code path nor non-sleepable context, so
sx should have the same effect as mutex.
PIC list is still protected by mutex to keep suspend/resume work.
Discussed with: jhb
Reviewed by: jhb
MFC after: 3 weeks
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7784
* Don't do RTS/CTS - experiments show that we get ACK frames for each of them
and this ends up causing the timestamps to look all funny.
* Set the HAL_TXDESC_POS bit, so the AR9300 HAL sets up the hardware to return
location and CSI information.
- evdev_set_methods call is not required if actual methods are no-ops
- evdev_set_serial is also optional if there is no meaningful input device
identifier
- evdev_set_id on the other hand is mandatory, so set virtual bus with
dummy vendor/product/version
Suggested by: Vladimir Kondratiev
Add generic evdev support to touchscreen part of ti_adc: two absolute
coordinates + button touch to indicate pen position. Pressure value
reporting is not implemented yet.
Tested on: Beaglebone Black + 4DCAPE-43T + tslib
evdev is a generic input event interface compatible with Linux
evdev API at ioctl level. It allows using unmodified (apart from
header name) input evdev drivers in Xorg, Wayland, Qt.
This commit has only generic kernel API. evdev support for individual
hardware drivers like ukbd, ums, atkbd, etc. will be committed later.
Project was started by Jakub Klama as part of GSoC 2014. Jakub's
evdev implementation was later used as a base, updated and finished
by Vladimir Kondratiev.
Submitted by: Vladimir Kondratiev <wulf@cicgroup.ru>
Reviewed by: adrian, hans
Differential Revision: https://reviews.freebsd.org/D6998
The flag specifies that the block which uses FPU must be executed in
critical section, i.e. take no context switches, and does not need an
FPU save area during the execution.
It is intended to be applied around fast and short code pathes where
save area allocation is impossible or undesirable, due to context or
due to the relative cost of calculation vs. allocation.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Rename registers as in the manual.
Do a hard reset of the controller before a soft one.
Since DMA is always used remove dependancy on allwinner_soc_family, it was used
to differentiate SoC as the fdt compatible string were the same.
Tested on A10, A20, H3 and A64.
Reviewed by: jmcneill
Differential Revision: https://reviews.freebsd.org/D6868
Move PMAP_TS_REFERENCED_MAX out of the various pmap implementations and
into vm/pmap.h, and describe what its purpose is. Eliminate the archaic
"XXX" comment about its value. I don't believe that its exact value, e.g.,
5 versus 6, matters.
Update the arm64 and riscv pmap implementations of pmap_ts_referenced()
to opportunistically update the page's dirty field.
On amd64, use the PDE value already cached in a local variable rather than
dereferencing a pointer again and again.
Reviewed by: kib, markj
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D7836
An array of bucket locks is added.
All modifications still require the global cache_lock to be held for
writing. However, most readers only need the relevant bucket lock and in
effect can run concurrently to the writer as long as they use a
different lock. See the added comment for more details.
This is an intermediate step towards removal of the global lock.
Reviewed by: kib
Tested by: pho
Our shim for Solaris random_get_bytes() uses read_random(), that looks
reasonable, since it guaranties reliably seeded random data. On the other
side Solaris random_get_pseudo_bytes() does not provide this guarantie,
and its original Solaris implementation is equivalent to our arc4rand(),
using software crypto without stressing slower hardware RNG.
If wait4() or wait6() return 0 because of WNOHANG, the status, rusage and
wrusage information should not be returned.
PR: 212048
Reported by: Casey Lucas
MFC after: 2 weeks
that the latter can easily determine what the trap type actually is
after callers are fixed to encode the type unambigously.
ddb currently barely understands breakpoints, and it treats all
non-breakpoints as single-step traps. This works OK for stopping
after every instruction when single-stepping, but is broken for
single-stepping with a count > 1 (especially with a large count).
ddb needs to stop on the first non-single-step trap while single-
stepping. Otherwise, ddb doesn't even stop the first time for
fatal traps and external breakpoints like the one in kdb_enter().
The lower vnode is already referenced and nodeget is supposed to consume
the reference. Thus the extra vref call was causing a leak.
Reported by: pho
Reviewed by: kib
MFC after: 1 week
Trying to build a MIPS platform that uses INTRNG needs this
for this to work right in gpiobusvar.h :
#ifdef INTRNG
struct intr_map_data_gpio {
struct intr_map_data hdr;
...
};
#endif
* change the HT_RC_2_MCS to do MCS0..23
* Use it when looking up the ht20/ht40 array for bits-per-symbol
* add a clk_to_psec (picoseconds) routine, so we can get sub-microsecond
accuracy for the math
* .. and make that + clk_to_usec public, so higher layer code that is
returning clocks (eg the ANI diag routines, some upcoming locationing
experiments) can be converted to microseconds.
Whilst here, add a comment in ar5416 so i or someone else can revisit the
latency values.
On big endian hardware that uses 1 byte bool a type mismatch of bool vs int will
cause the least signifcant byte of db_cmd_loop_done to be set, but the MSB to be
read, and read as 0. This causes ddb to stay in an infinite loop.
MFC after: 1 week
Split the QUEUE_MACRO_DEBUG into QUEUE_MACRO_DEBUG_TRACE and
QUEUE_MACRO_DEBUG_TRASH.
Add the debug macrso QMD_IS_TRASHED() and QMD_SLIST_CHECK_PREVPTR().
Document these in queue.3.
Reviewed by: emaste
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D3984
mounts in almost all cases instead of in most cases. Don't override
DOINGASYNC() by any condition except IO_SYNC.
Fix previous sprinking of DOINGASYNC() checks. Don't override IO_SYNC
by DOINGASYNC(). In ffs_write() and ffs_extwrite(), there were
intentional overrides that just broke O_SYNC of data. In
ffs_truncate(), there are 5 calls to ffs_update(), 4 with
apparently-unintentional overrides and 1 without; this had no effect
due to the main async mount hack descibed below.
Fix 1 place in ffs_truncate() where the caller's IO_ASYNC was overridden
for the soft updates case too (to do a delayed write instead of a sync
write). This is supposed to be the only change that affects anything
except async mounts.
In ffs_update(), remove the 19 year old efficiency hack of ignoring
the waitfor flag for async mounts, so that fsync() almost works for
async mounts. All callers are supposed to be fixed to not ask for a
sync update unless they are for fsync() or [I]O_SYNC operations.
fsync() now almost works for async mounts. It used to sync the data
but not the most important metdata (the inode). It still doesn't sync
associated directories.
This gave 10-20% fewer writes for my makeworld benchmark with async
mounted tmp and obj directories from an already small number.
Style fixes:
- in ffs_balloc.c, remove rotted quadruplicated comments about the
simplest part of the DOING*() decisions and rearrange the nearly-
quadruplicated code to be more nearly so.
- in ufs_vnops.c, use a consistent style with less negative logic and
no manual "optimization" of || to | in DOING*() expressions.
Reviewed by: kib (previous version)
This will be required for SMP support on MIPS Malta platform.
Reviewed by: adrian
Sponsored by: DARPA, AFRL
Sponsored by: HEIF5
Differential Revision: https://reviews.freebsd.org/D7835
In vm86.c, fix 2 (rarely used) cases where the return code lost the
single-step indicator. While here, fix 2 misspellings of PSL_T as
PSL_TF (TF is the CPU manufacturer's spelling, but we use T).
In trap.c, turn T_PROTFLT and T_STKFLT into T_TRCTRAP if
vm86_emulate() asked for this (it does this when the instruction is
being traced and was successully emulated). In the kernel case, we
used to deliver the trap as SIGTRAP to the process, where it always
terminated the process; now we deliver the trap as T_TRCTRAP to kdb,
where it normally gives single-stepping. In the user case, the only
difference is that we now clear PSL_T and initialize ucode properly.
Reviewed by: kib
Some statuses, such as "ATA pass through information available", are part
part of absolutely normal operation and do not worth reporting.
MFC after: 2 weeks
truncation failed.
Doing so resulted in inconsistent state of the ufs dirhash with regard
to the actual directory inode state, and could lead to spurious ENOENT
errors for lookups of existing files in production kernels, or
assertion failures in the debugging kernels.
Change the logic of calling ufsdirhash_dirtrunc() to be same as in
ufs_direnter(). Execute UFS_TRUNCATE() first, log error, and only do
dirtrunc() if UFS_TRUNCATE() succeeded.
Note that the problem was exacerbated by the bug in the
flush_newblk_dep() function (see r305599), which caused in the spurios
errors from ffs_sync() and then ffs_truncate().
In collaboration with: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The buffer lock is retried on failed LK_SLEEPFAIL attempt, and error
from the failed attempt is irrelevant. But since there is path after
retry which does not clear error, it is possible to return spurious
error from the function.
The issue resulted in a spurious failure of softdep_sync_buf(),
causing further spurious failure of ffs_sync().
In collaboration with: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
This change is formally not needed, since i_endoff not used in all
code paths after the call to ufs_direnter(), and i_endoff is
recalculated by the next lookup. But having the value correct makes
the reasoning about code simpler.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
cleared since nothing prevents completion of the parallel quotaoff.
There is nothing to sync in this case, and no reason to panic.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
operations. Instead of upgrading, assert that the lock is exclusive.
Explain the cause in comments.
This effectively reverts r209367.
Tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
going to re-read inodes.
Secondary write initiators, e.g. ufs_inactive(), might need to start a
write while owning the vnode lock. Since the suspended state
established by /dev/ufssuspend prevents them from entering
vn_start_secondary_write(), we get deadlock otherwise.
Note that it is arguably not very useful to re-read inodes after
/dev/ufssuspend suspension, because the suspension does not block
readers, and other threads might read existing files in parallel with
suspension owner (for now, only growfs(8)) operations. This
effectively means that suspension owner cannot safely modify existing
inodes, and then there is no sense in re-reading. But keep the code
enabled for now.
Reported and tested by: pho
Reviewed by: mckusick
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The GUID string provided by hypervisor has leading and trailing braces,
while our GUID string does not have braces at all. Both braces should
be ignored, when the GUID strings are compared.
Submitted by: Hongjiang Zhang <honzhan microsoft com>
Modified by: sephe
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7809
While I'm here, pull up the channel callback related code too.
MFC after: 1 week
Sponsored by: Microsoft
Differential Revision: https://reviews.freebsd.org/D7805
The DS_FIELD_LARGE_BLOCKS macro has been unused since the integration of
this patch:
commit ca0cc3918a1789fa839194af2a9245f801a06b1a
Author: Matthew Ahrens <mahrens@delphix.com>
Date: Fri Jul 24 09:53:55 2015 -0700
5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
This patch simply removes this macro from dsl_dataset.h.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Author: Matthew Ahrens <mahrens@delphix.com>
When changing zfs_arc_max (e.g. as zdb does), it may be set to less
than the default arc_c_min. arc_c_min should decrease to not be more than
arc_c_max, but it doesn't; therefore tuning of arc_c_max is ineffective.
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Author: Matthew Ahrens <mahrens@delphix.com>
openzfs/openzfs@608764bead
The cxgbev/cxlv driver supports Virtual Function devices for Chelsio
T4 and T4 adapters. The VF devices share most of their code with the
existing PF4 driver (cxgbe/cxl) and as such the VF device driver
currently depends on the PF4 driver.
Similar to the cxgbe/cxl drivers, the VF driver includes a t4vf/t5vf
PCI device driver that attaches to the VF device. It then creates
child cxgbev/cxlv devices representing ports assigned to the VF.
By default, the PF driver assigns a single port to each VF.
t4vf_hw.c contains VF-specific routines from the shared code used to
fetch VF-specific parameters from the firmware.
t4_vf.c contains the VF-specific PCI device driver and includes its
own attach routine.
VF devices are required to use a different firmware request when
transmitting packets (which in turn requires a different CPL message
to encapsulate messages). This alternate firmware request does not
permit chaining multiple packets in a single message, so each packet
results in a firmware request. In addition, the different CPL message
requires more detailed information when enabling hardware checksums,
so parse_pkt() on VF devices must examine L2 and L3 headers for all
packets (not just TSO packets) for VF devices. Finally, L2 checksums
on non-UDP/non-TCP packets do not work reliably (the firmware trashes
the IPv4 fragment field), so IPv4 checksums for such packets are
calculated in software.
Most of the other changes in the non-VF-specific code are to expose
various variables and functions private to the PF driver so that they
can be used by the VF driver.
Note that a limited subset of cxgbetool functions are supported on VF
devices including register dumps, scheduler classes, and clearing of
statistics. In addition, TOE is not supported on VF devices, only for
the PF interfaces.
Reviewed by: np
MFC after: 2 months
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7599
If a packet contains the Ethernet header (14 bytes) in the first mbuf
and the payload (IP + UDP + data) in the second mbuf, then the attempt
to fetch the l3hdr will return a NULL pointer. The first loop iteration
will drop len to zero and exit the loop without setting 'p'. However,
the desired data is at the start of the second mbuf, so the correct
behavior is to loop around and let the conditional set 'p' to m_data of
the next mbuf (and leave offset as 0).
Reviewed by: np
Sponsored by: Chelsio Communications
the data cache to the point of unification. This is the point where the
two caches are unified to a single unified cache so cleaning past here
is just extra unneeded work.
This was noticed when investigating r305545.
Reported by: bz
Obtained from: ABT Systems Ltd
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
page is non-executable the contents of the i-cache are unimportant so this
call is just adding unneeded overhead when inserting pages.
While doing research using gem5 with an O3 pipeline and 1k/32k/1M iTLB/L1
iCache/L2 Bjoern Zeeb (bz@) observed a fairly high rate of calls into
arm64_icache_sync_range() from pmap_enter() along with a high number of
instruction fetches and iTLB/iCache hits.
Limiting the calls to arm64_icache_sync_range() to only executable pages,
we observe the iTLB and iCache Hit going down by about 43%. These numbers
are quite misleading when looked at alone as at the same time instructions
retired were reduced by 19.2% and instruction fetches were reduced by 38.8%.
Overall this reduced the runtime of the test program by 22.4%.
On Juno hardware, in steady-state, running the same test, using the cycle
count to determine runtime, we do see a reduction of up to 28.9% in runtime.
While these numbers certainly depend on the program executed, we expect an
overall performance improvement.
Reported by: bz
Obtained from: ABT Systems Ltd
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Due to reading initialized variable, FIS receive area was always allocated
as 256 bytes, suitable for command-based switching, instead of 4096 bytes,
required for FIS-based switching. This caused memory corruption in case of
port multipliers used on FBS-capable HBAs (Marvell).
MFC after: 1 week
More changes to MIPS may be required, as commented in D7692, but this
revision aims to restore MIPS INTRNG functionality so we can move on
with working interrupts.
Reported by: yamori813@yahoo.co.jp
Tested by: mizhka (on BCM), sgalabov (on Mediatek)
Reviewed by: adrian, nwhitehorn (older version)
Sponsored by: Smartcom - Bulgaria AD
Differential Revision: https://reviews.freebsd.org/D7692
Let drivers for Alpine CCU, NB and Serdes take care of internal SoC configuration.
Obtained from: Semihalf
Submitted by: Michal Stanek <mst@semihalf.com>
Sponsored by: Annapurna Labs
Reviewed by: imp,wma
Differential Revision: https://reviews.freebsd.org/D7566
This commit adds drivers for Alpine Cache Coherency Unit
and North Bridge Service whose task is to configure
the system fabric and enable cache coherency.
Obtained from: Semihalf
Submitted by: Michal Stanek <mst@semihalf.com>
Sponsored by: Annapurna Labs
Reviewed by: wma
Differential Revision: https://reviews.freebsd.org/D7565
pmap_early_io_map()/pmap_early_io_unmap(), if used in pairs, should be used in
the form:
pmap_early_io_map()
..do stuff..
pmap_early_io_unmap()
Without other allocations in the middle. Without reclaiming memory this can
leave large holes in the device space.
While here, make a simple change to the unmap loop which now permits it to unmap
multiple TLB entries in the range.
Such errors can occur as the result of a write error or because the disk
backing the mirror element was removed. They result in a generation ID bump
on all active elements of the mirror, so we can safely disconnect the mirror
component rather than destroy it.
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D7750
These are useful for testing changes to I/O error handling, and for
reproducing existing bugs in a controlled manner. The fail points are
g_mirror_regular_request_read
g_mirror_regular_request_write
g_mirror_sync_request_read
g_mirror_sync_request_write
g_mirror_metadata_write
They all effectively allow one to inject an error value into the bio_error
field of a corresponding BIO request as it is being completed.
MFC after: 2 weeks
Sponsored by: EMC / Isilon Storage Division
have the serious problem of not actually attaching the hardware they
are driving at the bus level. This causes creator(4) and machfb(4)
to attach and drive the very same hardware in parallel when both
syscons(4) and vt(4) as well as their associated hardware drivers
are built into a kernel, i. e. GENERIC, at the same time.
Also, syscons(4) and its drivers still are way superior to vt(4) and
its equivalents; unlike the syscons(4) counterparts the vt(4) drivers
don't provide hardware acceleration resulting in considerably slower
screen drawing, creator_vt(4) doesn't provide a /dev/fb node as
required by the Xorg sunffb(4) etc. In theory, vt_ofwfb(4) should be
able to handle more devices than machfb(4). However, testing shows
that it hardly works with any hardware machfb(4) isn't also able to
drive, making vt(4) and vt_ofwfb(4) not favorable for the time being
from that perspective either.
MFC after: 3 days
Use C99 designators to set the value of each slot and the nitems macro to
check for valid entries. In the process, switch to indexing by signal
number rather than signal-1 for improved clarity.
Obtained from: CheriBSD (a6053c5abf03a5f53bbfcdd3a26429383f67e09f)
Sponsored by: DARPA, AFRL
Reviewed by: kib
The previous code was forcing an expensive walk in vop_stdvptocnp,
which was causing performance issues on highly contended zfs.
No objections: kib
MFC after: 2 weeks
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register. This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.
Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.
Reviewed by: imp, wblock (earlier version)
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7751
This driver supports two bindings:
- cpufreq-dt: systems which share clock and voltage across all CPUs
- arm_big_little_dt: systems which share clock and voltage across all
CPUs in a single cluster
Reviewed by: andrew, imp
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D7741
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.
This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7667
This allows a pass through device to be reset to a normal device driver
on the host and reused on the host. ppt devices are now always active in
some I/O MMU domain when the I/O MMU is active, either the host domain
or the domain of a VM they are attached to.
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7666
This is required on my system, which loads nvidia, vmm, and zfs, and 48M is
no longer enough for that. nvidia-driver's recent update increased its size
by several megabytes.
Reviewed by: jhb
MFC after: 1 week
Other files including pci_host_generic.h failed to compile
due to missing declaration of enum pci_id_type.
Obtained from: Semihalf
Submitted by: Michal Stanek <mst@semihalf.com>
Sponsored by: Annapurna Labs
Reviewed by: wma
Differential Revision: https://reviews.freebsd.org/D7561
Split usbd_xfer_status() check:
- Check xfer length: must be longer, than Rx descriptor size.
- Check frame size: must be shorter than xfer length.
- Discard too short frames.
Tested with WUSB54GC, STA/MONITOR modes.
The upstream change introduced a new load state, SPA_LOAD_CREATE,
and vdev_geom code needs to be aware of it.
Tested by: cy
MFC after: 1 week
X-MFC with: r305331
This adds bhnd(4) bus-level support for querying backplane interrupt vector
routing, and delegating machine/bridge-specific interrupt handling to the
concrete bhnd(4) driver implementation.
On bhndb(4) bridged PCI devices, we provide the PCI/MSI interrupt directly
to attached cores.
On MIPS devices, we report a backplane interrupt count of 0, effectively
disabling the bus-level interrupt assignment. This allows mips/broadcom
to temporarily continue using hard-coded MIPS IRQs until bhnd_mips PIC
support is implemented.
Reviewed by: mizhka
Approved by: adrian (mentor, implicit)
Broadcom Intensi-fi chipsets provided a common set of IP cores; on PCI/PCIe
devices, the USB11 host controller is left floating.
Approved by: adrian (mentor, implicit)
This patch adds driver implementation for BHND USB core. Driver has been
imported from ZRouter project with small adaptions for FreeBSD 11.
Also it's enabled for BroadCom MIPS74k boards by default. It's fully tested
on Asus boards (RT-N16: external USB, RT-N53: USB bus between SoC and WiFi
chips).
Reviewed by: adrian (mentor), ray
Approved by: adrian (mentor)
Obtained from: ZRouter
Differential Revision: https://reviews.freebsd.org/D7781
by reviving the SX control request lock and refining which lock
protects the common scratch area in "struct usb_device".
The SX control request lock was removed by r246759 because it caused a
lock order reversal with the USB enumeration lock inside
usbd_transfer_setup() as a function of r246616. It was thought that
reducing the number of locks would resolve the LOR, but because some
USB device drivers use usbd_do_request_flags() inside callback
functions, like in taskqueues, a deadlock may occur when these are
drained from device_detach(). By restoring the SX control request
lock usbd_do_request_flags() is allowed to complete its execution
when a USB device driver is detaching. By using the SX control request
lock to protect the scratch area, the LOR introduced by r246616 is
also resolved.
Bump the FreeBSD version while at it to force recompilation of all USB
kernel modules.
Found by: avos@
MFC after: 1 week
like we've done elsewhere, e.g., amd64.
As an optimization to the machine-independent layer, change the machine-
dependent pmap_ts_referenced() so that it updates the page's dirty field
if a modified bit is found while counting reference bits. This
opportunistic update can be performed at low cost and can eliminate the
need for some future calls to pmap_is_modified() by the machine-
independent layer.
MFC after: 3 weeks
survived multiple world and kernel builds, and of poudriere building full
package sets.
I have observed a 3% reduction in buildworld times with superpages enabled,
however further testing is needed to see if this is observed in other
workloads.
Obtained from: ABT Systems Ltd
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
warning:
sys/netinet/igmp.c:546:21: error: implicit conversion from 'int' to 'char' changes value from 148 to -108 [-Werror,-Wconstant-conversion]
p->ipopt_list[0] = IPOPT_RA; /* Router Alert Option */
~ ^~~~~~~~
sys/netinet/ip.h:153:19: note: expanded from macro 'IPOPT_RA'
#define IPOPT_RA 148 /* router alert */
^~~
This is because ipopt_list is an array of char, so IPOPT_RA is wrapped
to a negative value. It would be nice to change ipopt_list to an array
of u_char, but it changes the signature of the public struct ipoption,
so add an explicit cast to suppress the warning.
Reviewed by: imp
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7777
sys/dev/usb/serial/uplcom.c:543:29: error: implicit conversion from 'int' to 'int8_t' (aka 'signed char') changes value from 192 to -64 [-Werror,-Wconstant-conversion]
if (uplcom_pl2303_do(udev, UT_READ_VENDOR_DEVICE, UPLCOM_SET_REQUEST, 0x8484, 0, 1)
~~~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~~~~
sys/dev/usb/usb.h:179:53: note: expanded from macro 'UT_READ_VENDOR_DEVICE'
#define UT_READ_VENDOR_DEVICE (UT_READ | UT_VENDOR | UT_DEVICE)
~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~
This is because UT_READ is 0x80, so the int8_t argument is wrapped to a
negative value. Fix this by using uint8_t instead.
Reviewed by: imp, hselasky
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7776
Since negative entries are managed with a LRU list, a hit requires a
modificaton.
Currently the code tries to upgrade the global lock if needed and is
forced to retry the lookup if it fails.
Provide a dedicated lock for use when the cache is only shared-locked.
Reviewed by: kib
MFC after: 1 week
This fixes bhnd(4) nvram handling on devices that map SPROM CSRs via PCI
configuration space.
The probe method previously required that a bhnd(4) device be attached to the
parent bridge; now that the bhnd_nvram device is always attached first, this
unnecessary sanity check always failed.
Approved by: adrian (mentor, implicit)
On BCM4321 chipsets, both PCI and PCIe cores are included, with one of
the cores potentially left floating.
Since the PCI core appears first in the device table, and the PCI
profiles appear first in the resource configuration tables, this resulted in
incorrectly matching and using the PCI/v1 resource configuration on PCIe
devices, rather than the correct PCIe/v1 profile.
Approved by: adrian (mentor, implicit)
Adds support for probing and initializing bhndb(4) bridge state using
the bhnd_erom API, ensuring that full bridge configuration is available
*prior* to actually attaching and enumerating the bhnd(4) child device,
allowing us to safely allocate bus-level agent/device resources during
bhnd(4) bus enumeration.
- Add a bhnd_erom_probe() method usable by bhndb(4). This is an analogue
to the existing bhnd_erom_probe_static() method, and allows the bhndb
bridge to discover the best available erom parser class prior to newbus
probing of its children.
- Add support for supplying identification hints when probing erom
devices. This is required on early EXTIF-only chipsets, where chip
identification registers are not available.
- Migrate bhndb over to the new bhnd_erom API, using bhnd_core_info
records rather than bridged bhnd(4) device_t references to determine
the bridged chipsets' capability/bridge configuration.
- The bhndb parent (e.g. if_bwn) is now required to supply a hardware
priority table to the bridge. The default table is currently sufficient
for our supported devices.
- Drop the two-pass attach approach we used for compatibility with bhndb(4) in
the bhnd(4) bus drivers, and instead perform bus enumeration immediately,
and allocate bridged per-child bus-level resources during that enumeration.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7768
The pager getpages interface allows the caller to bound the number of
readahead and readbehind pages, and vm_fault_hold() makes use of this
feature. These bounds were ignored after r305056, causing the swap pager
to potentially page in more than the specified number of pages.
Reported and reviewed by: alc
X-MFC with: r305056
This defines a new bhnd_erom_if API, providing a common interface to device
enumeration on siba(4) and bcma(4) devices, for use both in the bhndb bridge
and SoC early boot contexts, and migrates mips/broadcom over to the new API.
This also replaces the previous adhoc device enumeration support implemented
for mips/broadcom.
Migration of bhndb to the new API will be implemented in a follow-up commit.
- Defined new bhnd_erom_if interface for bhnd(4) device enumeration, along
with bcma(4) and siba(4)-specific implementations.
- Fixed a minor bug in bhndb that logged an error when we attempted to map the
full siba(4) bus space (18000000-17FFFFFF) in the siba EROM parser.
- Reverted use of the resource's start address as the ChipCommon enum_addr in
bhnd_read_chipid(). When called from bhndb, this address is found within the
host address space, resulting in an invalid bridged enum_addr.
- Added support for falling back on standard bus_activate_resource() in
bhnd_bus_generic_activate_resource(), enabling allocation of the bhnd_erom's
bhnd_resource directly from a nexus-attached bhnd(4) device.
- Removed BHND_BUS_GET_CORE_TABLE(); it has been replaced by the erom API.
- Added support for statically initializing bhnd_erom instances, for use prior
to malloc availability. The statically allocated buffer size is verified both
at runtime, and via a compile-time assertion (see BHND_EROM_STATIC_BYTES).
- bhnd_erom classes are registered within a module via a linker set, allowing
mips/broadcom to probe available EROM parser instances without creating a
strong reference to bcma/siba-specific symbols.
- Migrated mips/broadcom to bhnd_erom_if, replacing the previous MIPS-specific
device enumeration implementation.
Approved by: adrian (mentor)
Differential Revision: https://reviews.freebsd.org/D7748
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.
Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714
sys/dev/cxgb/cxgb_sge.c:2873:44: error: implicit conversion from 'int'
to 'char' changes value from 128 to -128 [-Werror,-Wconstant-conversion]
*mtod(m, char *) = CPL_ASYNC_NOTIF;
~ ^~~~~~~~~~~~~~~
This is because CPL_ASYNC_NOTIF is 0x80, so the plain char argument is
wrapped to a negative value. Fix this by using uint8_t instead.
Reviewed by: np
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7772
not page aligned. To do this, use the ld script gnu ld installs on my
system.
This is imperfect: LDFLAGS_BIN and LD_FLAGS_BIN describe different
things. The loader script could be better named and take into account
other architectures. And having two different mechanisms to do
basically the same thing needs study. However, it's blocking forward
progress on lld, so I'll work in parallel to sort these out.
Differential Revision: https://reviews.freebsd.org/D7409
Reviewed by: emaste
sys/dev/ppbus/ppb_1284.c:296:46: error: implicit conversion from 'int'
to 'char' changes value from 144 to -112 [-Werror,-Wconstant-conversion]
if ((error = do_peripheral_wait(bus, SELECT | nBUSY, 0))) {
~~~~~~~~~~~~~~~~~~ ~~~~~~~^~~~~~~
sys/dev/ppbus/ppb_1284.c:785:48: error: implicit conversion from 'int'
to 'char' changes value from 240 to -16 [-Werror,-Wconstant-conversion]
if (do_1284_wait(bus, nACK | SELECT | PERROR | nBUSY,
~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
sys/dev/ppbus/ppb_1284.c:786:29: error: implicit conversion from 'int'
to 'char' changes value from 240 to -16 [-Werror,-Wconstant-conversion]
nACK | SELECT | PERROR | nBUSY)) {
~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~
This is because nBUSY is 0x80, so the plain char argument is wrapped to
a negative value. Fix this in a minimal fashion, by using uint8_t in a
few places.
Reviewed by: emaste
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D7771
The linker script CONSTRUCTORS keyword is only meaningful "when linking
object file formats which do not support arbitrary sections, such as
ECOFF and XCOFF"[1] and is ignored for other object file formats.
LLVM's lld does not yet accept (and ignore) CONSTRUCTORS, so just remove
CONSTRUCTORS from the linker script as it has no effect.
[1] https://sourceware.org/binutils/docs/ld/Output-Section-Keywords.html
Using a benchmark which has 32 threads creating 2 million files in the
same directory, on a machine with 16 CPU cores, I observed poor
performance. I noticed that dmu_tx_hold_zap() was using about 30% of
all CPU, and doing dnode_hold() 7 times on the same object (the ZAP
object that is being held).
dmu_tx_hold_zap() keeps a hold on the dnode_t the entire time it is
running, in dmu_tx_hold_t:txh_dnode, so it would be nice to use the
dnode_t that we already have in hand, rather than repeatedly calling
dnode_hold(). To do this, we need to pass the dnode_t down through
all the intermediate calls that dmu_tx_hold_zap() makes, making these
routines take the dnode_t* rather than an objset_t* and a uint64_t
object number. In particular, the following routines will need to have
analogous *_by_dnode() variants created:
dmu_buf_hold_noread()
dmu_buf_hold()
zap_lookup()
zap_lookup_norm()
zap_count_write()
zap_lockdir()
zap_count_write()
This can improve performance on the benchmark described above by 100%,
from 30,000 file creations per second to 60,000. (This improvement is on
top of that provided by working around the object allocation issue. Peak
performance of ~90,000 creations per second was observed with 8 CPUs;
adding CPUs past that decreased performance due to lock contention.) The
CPU used by dmu_tx_hold_zap() was reduced by 88%, from 340 CPU-seconds
to 40 CPU-seconds.
Sponsored by: Intel Corp.
Closes#109
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Ned Bass <bass6@llnl.gov>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Matthew Ahrens <mahrens@delphix.com>
openzfs/openzfs@d3e523d489
This resolves two 'zfs recv' issues. First, when receiving into an
existing filesystem, a snapshot created during the receive process is
not added to the guid->dataset map for the stream, resulting in failed
lookups for deduped streams when a WRITE_BYREF record refers to a
snapshot received earlier in the stream. Second, the newly created
snapshot was also not set properly, referencing the snapshot before the
new receiving dataset rather than the existing filesystem.
Closes#159
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Author: Chris Williamson <chris.williamson@delphix.com>
openzfs/openzfs@b09697c8c1
zap_lockdir() / zap_unlockdir() should take a "void *tag" argument which
tags the hold on the zap. This will help diagnose programming errors
which misuse the hold on the ZAP.
Sponsored by: Intel Corp.
Closes#108
Reviewed by: Pavel Zakharov <pavel.zakharov@delphix.com>
Reviewed by: Steve Gonczi <steve.gonczi@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Author: Matthew Ahrens <mahrens@delphix.com>
openzfs/openzfs@0780b3eab5
7230 add assertions to dmu_send_impl() to verify that stream includes BEGIN and END records
illumos/illumos-gate@12b90ee2d3https://github.com/illumos/illumos-gate/commit/12b90ee2d3b10689fc45f4930d2392f5f
e1d9cfa
https://www.illumos.org/issues/7230
A test failure occurred where a send stream had only a BEGIN record. This
should not be possible if the send returns without error. Prevented this from
happening in the future by adding an assertion to dmu_send_impl() to verify
that if the function returns 0 (success) both a BEGIN and END record are
present. Did this by adding flags to dmu_sendarg_t (indicating whether BEGIN o
r
END records sent), having dump_record() set flags appropriately, adding VERIFY
statement to dmu_send_impl().
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matt Krantz <matt.krantz@delphix.com>
illumos/illumos-gate@0f7643c737https://github.com/illumos/illumos-gate/commit/0f7643c7376dd69a08acbfc9d1d7d548b
10c846a
https://www.illumos.org/issues/7090
When write I/Os are issued, they are issued in block order but the ZIO pipelin
e
will drive them asynchronously through the allocation stage which can result i
n
blocks being allocated out-of-order. It would be nice to preserve as much of
the logical order as possible.
In addition, the allocations are equally scattered across all top-level VDEVs
but not all top-level VDEVs are created equally. The pipeline should be able t
o
detect devices that are more capable of handling allocations and should
allocate more blocks to those devices. This allows for dynamic allocation
distribution when devices are imbalanced as fuller devices will tend to be
slower than empty devices.
The change includes a new pool-wide allocation queue which would throttle and
order allocations in the ZIO pipeline. The queue would be ordered by issued
time and offset and would provide an initial amount of allocation of work to
each top-level vdev. The allocation logic utilizes a reservation system to
reserve allocations that will be performed by the allocator. Once an allocatio
n
is successfully completed it's scheduled on a given top-level vdev. Each top-
level vdev maintains a maximum number of allocations that it can handle
(mg_alloc_queue_depth). The pool-wide reserved allocations (top-levels *
mg_alloc_queue_depth) are distributed across the top-level vdevs metaslab
groups and round robin across all eligible metaslab groups to distribute the
work. As top-levels complete their work, they receive additional work from the
pool-wide allocation queue until the allocation queue is emptied.
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Sebastien Roy <sebastien.roy@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: George Wilson <george.wilson@delphix.com>
7086 ztest attempts dva_get_dsize_sync on an embedded blockpointer
illumos/illumos-gate@926549256bhttps://github.com/illumos/illumos-gate/commit/926549256b71acd595f69b236779ff6b7
8fa08ef
https://www.illumos.org/issues/7086
In dbuf_dirty(), we need to grab the dn_struct_rwlock before looking at the
db_blkptr, to prevent it from being changed by syncing context.
Otherwise we may see that ztest got a segfault from this stack:
libzpool.so.1`dva_get_dsize_sync+0x98(872f000, b32b240, fed7811b, 0, b4cda20,
0)
libzpool.so.1`bp_get_dsize+0x60(872f000, b32b240, 0, 97cb780, 9d4c1a8, 0)
libzpool.so.1`dbuf_dirty+0x9b3(ce0a100, 97cb780, 9, fecd2530)
libzpool.so.1`dmu_buf_will_dirty+0xc3(ce0a100, 97cb780, ea293d6c, 1)
libzpool.so.1`zap_lockdir+0x1a0(8aaa3c0, 1, 0, 97cb780, 1, 1)
libzpool.so.1`zap_remove_norm+0x30(8aaa3c0, 1, 0, 8728b10, 0, 97cb780)
libzpool.so.1`zap_remove+0x29(8aaa3c0, 1, 0, 8728b10, 97cb780, a)
ztest_replay_remove+0x225(ea294588, 8728ae8, 0, 38010000, 0, 0)
ztest_remove+0x9f(ea294588, ea293f50, 4, 3)
ztest_object_init+0x78(ea294588, ea293f50, 4e0, 1)
ztest_dmu_object_alloc_free+0x71(ea294588, 13)
ztest_dmu_objset_create_destroy+0x224(80cef08, 13, 0, 805d36c, 9017ad44, 0)
ztest_execute+0x89(a, 807c720, 13, 0)
ztest_thread+0xea(13, 0, 0, 0)
libc.so.1`_thrp_setup+0x88(f0983240)
libc.so.1`_lwp_start(f0983240, 0, 0, 0, 0, 0)
Looking into it a bit, we see that this is an embedded blockpointer, so
BP_GET_NDVAS should have returned 0:
b32b240::blkptr
EMBEDDED [L0 ZAP_OTHER] et=0 LZ4 size=200L/4aP birth=80L
Instead, it looks like another thread is modifying this blockpointer:
b32b240::ugrep | ::whatis
f47a0e0c is in [ stack tid=0x19f ]
ebd6ec40 is in [ stack tid=0x226 ]
ea293bd0 is in [ stack tid=0x244 ]
ea293be4 is in [ stack tid=0x244 ]
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Matthew Ahrens <mahrens@delphix.com>
7072 zfs fails to expand if lun added when os is in shutdown state
illumos/illumos-gate@c39a2aae1ec39a2aae1ehttps://www.illumos.org/issues/7072
upstream:
38733 zfs fails to expand if lun added when os is in shutdown state
DLPX-36910 spares and caches should not display expandable space
DLPX-39262 vdev_disk_open spam zfs_dbgmsg buffer
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
Author: George Wilson <george.wilson@delphix.com>
illumos/illumos-gate@dcbf3bd6a1dcbf3bd6a1https://www.illumos.org/issues/6950
When reading compressed data from disk, the ARC should keep the compressed
block cached and only decompress it when consumers access the block. The
uncompressed data should be short-lived allowing the ARC to cache a much larger
amount of data. The DMU would also maintain a smaller cache of uncompressed
blocks to minimize the impact of decompressing frequently accessed blocks.
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: George Wilson <george.wilson@delphix.com>