The implementation of ZFS refcount_t uses the emulated illumos mutex
(the sx lock) and the waiting memory allocation when ZFS_DEBUG is
enabled. This makes refcount_t unsuitable for use in GEOM g_up
thread where sleeping is prohibited.
When importing the ABD change I modified vdev_geom using illumos
vdev_disk as an example. As a result, I added a call to abd_return_buf
in vdev_geom_io_intr. The latter is called on g_up thread while the
former uses refcount_t.
This change fixes the problem by deferring the abd_return_buf call to
the previously unused vdev_geom_io_done that is called on a ZFS zio
taskqueue thread where sleeping is allowed.
A side bonus of this change is that now a vdev zio has a pointer
to its corresponding bio while the zio is active.
Reported by: Shawn Webb <shawn.webb@hardenedbsd.org>
Tested by: Shawn Webb <shawn.webb@hardenedbsd.org>
MFC after: 1 week
X-MFC with: r320156
This finishes what r245164 started and makes open(..., O_APPEND) work again
after r299753.
- Pass ioflags, incl. IO_APPEND, down to the direct write backend (r245164
added it to only the bio backend).
- (r299753 changed the WRONLY backend from bio to direct.)
PR: 220185
Reported by: Ben RUBSON <ben.rubson at gmail.com>
Reviewed by: bapt@, rmacklem@
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11348
The summary of changes is as follows..
Generic changes::
- Added configure support [2].
- Check for lchmod filesystem support with create_file(..); for
testcases that require lchmod, skip the testcase -- otherwise
use chmod directly [1].
- Added Travis CI integration [2].
- Added utimensat testcases [1].
Linux support::
- Fixed Linux support to pass on later supported versions of
Fedora/Ubuntu [2].
- Conditionally enable posix_fallocate(2) support [2].
OSX support::
- Fixed compilation on OSX [2].
- Added partial OSX support (the test run isn't fully green yet)
[2].
MFC after: 2 months
Obtained from: https://github.com/pjd/pjdfstest/tree/0.1
Relnotes: yes
Submitted by: asomers [1], ngie [2]
Tested with: UFS, ZFS
The summary of changes is as follows..
Generic changes::
- Added configure support [2].
- Check for lchmod filesystem support with create_file(..); for
testcases that require lchmod, skip the testcase -- otherwise
use chmod directly [1].
- Added Travis CI integration [2].
- Added utimensat testcases [1].
Linux support::
- Fixed Linux support to pass on later supported versions of
Fedora/Ubuntu [2].
- Conditionally enable posix_fallocate(2) support [2].
OSX support::
- Fixed compilation on OSX [2].
- Added partial OSX support (the test run isn't fully green yet)
[2].
Obtained from: https://github.com/pjd/pjdfstest/tree/0.1
Submitted by: asomers [1], ngie [2]
- Including bsd.own.mk isn't required since no MK_<foo> knobs are being
manipulated.
- Update documentation to note that ${FILES} is installed via bsd.progs.mk,
not bsd.prog.mk.
MFC after: 1 month
- TESTSDIR doesn't need to be specified after r289158.
- Including bsd.own.mk isn't required since no MK_<foo> knobs are being
manipulated.
- TESTS_SUBDIRS should be written out in an append format, one entry
per line, to provide a better, more conflict resistant example.
MFC after: 1 month
In jemalloc 5, there are no longer chunks, and as configured on
FreeBSD (the "retain" option defaults to false), the mmap()
requests are precisely sized for the specific needs, which means
the virtual memory overhead should be lower for small applications.
Reviewed by: jasone, ian
Differential Revision: https://reviews.freebsd.org/D11366
a hint.
Right now, for non-fixed mmap(2) calls, addr is de-facto interpreted
as the absolute minimal address of the range where the mapping is
created. The VA allocator only allocates in the range [addr,
VM_MAXUSER_ADDRESS]. This is too restrictive, the mmap(2) call might
unduly fail if there is no free addresses above addr but a lot of
usable space below it.
Lift this implementation limitation by allocating VA in two passes.
First, try to allocate above addr, as before. If that fails, do the
second pass with less restrictive constraints for the start of
allocation by specifying minimal allocation address at the max bss
end, if this limit is less than addr.
One important case where this change makes a difference is the
allocation of the stacks for new threads in libthr. Under some
configuration conditions, libthr tries to hint kernel to reuse the
main thread stack grow area for the new stacks. This cannot work by
design now after grow area is converted to stack, and there is no
unallocated VA above the main stack. Interpreting requested stack
base address as the hint provides compatibility with old libthr and
with (mis-)configured current libthr.
Reviewed by: alc
Tested by: dim (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
the "effective MSS" for the connection. The chip expects it this way.
Submitted by: Krishnamraju Eraparaju @ Chelsio
MFC after: 3 days
Sponsored by: Chelsio Communications
If a peripheral driver (e.g. da, sa, cd) is added or removed from the
peripheral driver list while an unrelated peripheral driver instance (e.g.
da0, sa5, cd2) is going away and is inside camperiphfree(), we could
dereference an invalid pointer.
When peripheral drivers are added or removed (see periphdriver_register()
and periphdriver_unregister()), the peripheral driver array is resized
and existing entries are moved.
Although we hold the topology lock while we traverse the peripheral driver
list, we retain a pointer to the location of the peripheral driver pointer
and then drop the topology lock. So we are still vulnerable to the list
getting moved around while the lock is dropped.
To solve the problem, cache a copy of the peripheral driver pointer. If
its storage location in the list changes while we have the lock dropped, it
won't have any effect.
This doesn't solve the issue that peripheral drivers ("da", "cd", as opposed
to individual instances like "da0", "cd0") are not generally part of a
reference counting scheme to guard against deregistering them while there
are instances active. The caller (generally the person unloading a module)
has to be aware of active drivers and not unload something that is in use.
sys/cam/cam_periph.c:
In camperiphfree(), cache a pointer to the peripheral driver
instance to avoid holding a pointer to an invalid memory location
in the event that the peripheral driver list changes while we have
the topology lock dropped.
PR: kern/219701
Submitted by: avg
MFC after: 3 days
Sponsored by: Spectra Logic
Without the allocation length set, the target will either reject
the command or complete it without transferring any data.
This fixes the REPORT ZONES command for SCSI ZBC protocol devices,
as well as ATA ZAC protocol devices that are behind a SCSI to ATA
translation layer. (LSI/Broadcom's 12Gb SAS adapters translate ZBC
commands to ZAC commands.) Those are Host Aware and Host Managed SMR
drives.
This will fix REPORT ZONE commands sent to the da(4) driver via the
GEOM bio interface and zonectl, and REPORT ZONE commands sent from
camcontrol(8).
Note that in the case of camcontrol(8), we currently only send
SCSI ZBC commands to native SCSI protocol devices, not ATA devices
behind a SAT layer.
sys/cam/scsi/scsi_da.c:
Fill in the length field in scsi_zbc_in().
MFC after: 3 days
Sponsored by: Spectra Logic
and "next_skip" variables. The "skip" value in struct blist has long been
a 64-bit quantity but various functions have implicitly truncated this
value to 32 bits. Now, all arithmetic involving the "skip" value is 64
bits wide. (This should allow us to relax the size limit on a swap device
in the swap pager.)
Maintain the ability to test this allocator as a user-space application by
including <stdbool.h>.
Remove an unused variable from blst_radix_print().
Reviewed by: kib, markj
MFC after: 4 weeks
Differential Revision: https://reviews.freebsd.org/D11358
changed from %d to a long-width type.
Use uintmax_t casting and %ju to futureproof the format string against
potential changes with either the #define or the implementation-specific
definition for offsetof(..).
The fields exist on all versions of the filesystem and using them is a mount
option on linux. For FreeBSD, the corresponding i_uid and i_gid are always
long enough so use them by default.
Reviewed by: Fedor Uporov
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D11354
isn't supported
This will make the error message reported in bug 220023 a bit more
intuitive for end-users that don't have access to the source code to
decode the procstat->type argument.
MFC after: 1 month
MFC with: r316286
PR: 220023
Without disabling interrupts it's possible for another thread to preempt
and update the registers post-read (tlb1_read_entry) or pre-write
(tlb1_write_entry), and confuse the kernel with mixed register states.
MFC after: 2 weeks
Only amd64 (because of i386) needs 32-bit time_t compat now, everything else is
64-bit time_t. Rather than checking on all 64-bit time_t archs, only check the
oddball amd64/i386.
Reviewed By: emaste, kib, andrew
Differential Revision: https://reviews.freebsd.org/D11364
RPI1-B, Alix and APU2 boards as well as NanoBSD with the following message:
vnode_pager_generic_getpages_done: I/O read error 5
Seems the breakage was because it was missed to include acr in glabel update.
Reported by: Peter Blok <pblok@bsd4all.org>,
madpilot, imp and trasz.
Reviewed by: trasz
Tested by: Peter Blok and madpilot.
MFC after: 3 days.
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D11365