When using a kernel built with the GZIO config option, dumpon -z can be
used to configure gzip compression using the in-kernel copy of zlib.
This is useful on systems with large amounts of RAM, which require a
correspondingly large dump device. Recovery of compressed dumps is also
faster since fewer bytes need to be copied from the dump device.
Because we have no way of knowing the final size of a compressed dump
until it is written, the kernel will always attempt to dump when
compression is configured, regardless of the dump device size. If the
dump is aborted because we run out of space, an error is reported on
the console.
savecore(8) is modified to handle compressed dumps and save them to
vmcore.<index>.gz, as it does when given the -z option.
A new rc.conf variable, dumpon_flags, is added. Its value is added to
the boot-time dumpon(8) invocation that occurs when a dump device is
configured in rc.conf.
Reviewed by: cem (earlier version)
Discussed with: def, rgrimes
Relnotes: yes
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11723
In r322258 I made p1003_1b.aio_listio_max a tunable. However, further
investigation shows that there was never any good reason for that limit to
exist in the first place. It's used in two completely different ways:
* To size a UMA zone, which globally limits the number of concurrent
aio_suspend calls.
* To artifically limit the number of operations in a single lio_listio call.
There doesn't seem to be any memory allocation associated with this limit.
This change does two things:
* Properly names aio_suspend's UMA zone, and sizes it based on a new constant.
* Eliminates the artifical restriction on lio_listio. Instead, lio_listio
calls will now be limited by the more generous max_aio_queue_per_proc. The
old p1003_1b.aio_listio_max is now an alias for
vfs.aio.max_aio_queue_per_proc, so sysconf(3) will still work with
_SC_AIO_LISTIO_MAX.
Reported by: bde
Reviewed by: jhb
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D12120
While here cache-align chains.
This shortens longest found chain during poudriere -j 80 from 32 to 16.
Pushing this higher up will probably require allocation on boot.
VBAD type.
FFS ffs_write() VOP catches such vnodes and panics, other VOPs do not
check for the type and their behaviour is really undefined. The
comment claims that this support was done for 'badsect' to flag bad
sectors, we do not have such facility in kernel anyway.
Reported by: Dmitry Vyukov <dvyukov@google.com>
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
- allocate value for new AT_HWCAP2 auxiliary vector on all platforms.
- expand 'struct sysentvec' by new 'u_long *sv_hwcap2', in exactly
same way as for AT_HWCAP.
MFC after: 1 month
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D12699
The end of the loop must re-lookup the next buf since the bufobj lock
is dropped in the loop body. If the lookup fails, the loop is restarted.
This mechanism non-obviously also terminates the loop when the end of
the buf list is reached. Split up the two loops termination cases to
make the code a bit less fragile. No functional change intended.
Reviewed by: kib
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12730
1) shorten the fast path by pushing the lockstat probe to the slow path
2) test for kernel panic only after it turns out we will have to spin,
in particular test only after we know we are not recursing
MFC after: 1 week
The previous limit of just one page is hit by ps.
The entire mechanism should be reworked, if not whacked. It seems the intent
is to reduce kernel dos-ability - some handlers wire the amount of memory
passed here. Handlers should probably stop wiring in the first place or in
the worst case indicate they are doing so so that the check is done only if
necessary. It should also probably be a counter, not a lock.
MFC after: 1 week
Not only this lock doesn't play any role here, dirtying it slows down
other things a little bit as giant-held checks (e.g. DROP_GIANT) are
spread all over the kernel.
MFC after: 1 week
All of the kernel dump implementations keep track of the current offset
("dumplo") within the dump device. However, except for textdumps, they
all write the dump sequentially, so we can reduce code duplication by
having the MI code keep track of the current offset. The new
dump_append() API can be used to write at the current offset.
This is needed to implement support for kernel dump compression in the
MI kernel dump code.
Also simplify dump_encrypted_write() somewhat: use dump_write() instead
of duplicating its bounds checks, and get rid of the redundant offset
tracking.
Reviewed by: cem
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D11722
mbpool existed to support NICs with memory interfaces and all remaining
comsumers were removed earlier this year with NATM.
Reviewed by: jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D10513
MNT_VNODE_FOREACH_ALL() is supposed to avoid returning doomed vnodes,
but the VI_DOOMED check it used was done without the vnode interlock
held, so it could race with a concurrent vgone().
Submitted by: Don Morris <don.morris@isilon.com>
Reviewed by: kib, mckusick
MFC after: 1 week
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12704
- Fix it by replacing m_cat() with m_prev->m_next = m_new
(m_cat() will try to append data - as a result, there will be no
fragmentation).
- Move some constants out of the loop.
Was previously tested with D4077.
Differential Revision: https://reviews.freebsd.org/D4090
The stop drops process lock, which allows the signal mask to be
changed and our selected signal might become blocked, i.e. should be
returned to the process queue instead of delivery.
Also, for the existing check of the process no longer having an
attached debugger, we should not loose the signal, but requeue it.
Reported and tested by: bdrewery
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Split two conditions into separate asserts. Print additional details,
like the signal number and action value.
Reviewed by: jhb
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
In r324542 I neglected to reset the first and last fields of struct
unrhdr. This causes a tmpfs to fail the unr(9) consistency checks with
DIAGNOSTIC on. Fix this by resetting the fields by calling init_unrhdr.
While here, change a loop to use TAILQ_FOREACH_SAFE since it is more
readable and equally fast.
Reported by: David Wolfskill <david@catwhisker.org>
Approved by: rstone (mentor)
Sponsored by: Dell EMC Isilon
/compat/linux/path before /path. Stop following symbolic links when
looking up /compat/linux/path so dead symbolic links aren't ignored.
This allows syscalls like readlink(2) and lstat(2) to work on such links.
And open(2) will return an error now instead of trying /path.
"optimization". First, sendfile(..., SF_NOCACHE) frees pages without
checking whether those pages are mapped. This can leave the system
with mappings to free or repurposed pages. Second, a page can be
busied between the time of the current busy test and acquiring the
object lock. Essentially, the test performed before the object lock
is acquired can only be regarded as an optimization to short-circuit
further work on the page. It cannot, however, be relied upon to prove
that it is safe to free the page. Third, when sendfile(..., SF_NOCACHE)
was originally implemented, vm_page_deactivate_noreuse() did not yet
exist. Use vm_page_deactivate_noreuse() instead of vm_page_deactivate(),
because it comes closer to freeing the page.
In collaboration with: glebius
Discussed with: gallatin, kib, markj
X-MFC after: r324448
The manipulations done by mountcheckdirs() are not that useful during
the unmount, they can bring about unexpected security consequences.
Thic change effectively reverts the change in r73241.
The change also allows to simplify the handling of rootvnode global
variable.
Discussed with: mckusick, mjg, kib
Reviewed by: trasz
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D12366
posix_fallocate is logically equivalent to writing zero blocks to the
desired file size and there is no reason to prevent calling it in
capability mode. posix_fallocate already checked for the CAP_WRITE
right, so we merely need to list it in capabilities.conf.
Reviewed by: allanjude
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D12640
Previously before you could call unrhdr_delete you needed to
individually free every allocated unit. It is useful to be able to tear
down the unr without having to go through this process, as it is
significantly faster than freeing the individual units.
Reviewed by: cem, lidl
Approved by: rstone (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12591
all cache the last system time (uptime + boottime). Only the format
differs. Do not re-calculate the bintime and simply use the value
used to calculate the microtime and nanotime.
Group all the updates under the relevant comment. Remove obsoleted
XXX part.
Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after: 1 week
Sendfile() should match the error checking order of send() which
is currently:
SBS_CANTSENDMORE
so_error
SS_ISCONNECTED
Submitted by: Jason Eggleston <jason@eggnet.com>
Reviewed by: glebius
MFC after: 2 weeks
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12633
o Fall back to default m_ext free mech, using function pointer in
m_ext_free, and remove sf_ext_free() called directly from mbuf code.
Testing on modern CPUs showed no regression.
o Provide internally used flag EXT_FLAG_SYNC, to mark that I/O uses
SF_SYNC flag. Lack of the flag allows us not to dereference
ext_arg2, saving from a cache line miss.
o Create function sendfile_free_page() that later will be used, for
multi-page mbufs. For now compiler will inline it into
sendfile_free_mext().
In collaboration with: gallatin
Differential Revision: https://reviews.freebsd.org/D12615
in mb_free_ext() always use fields from the original refcount holding
mbuf (see. r296242) mbuf. Cuts another cache miss from mb_free_ext().
However, treat EXT_EXTREF mbufs differently, since they are different -
they don't have a refcount holding mbuf.
Provide longer comments in m_ext declaration to explain this change
and change from r296242.
In collaboration with: gallatin
Differential Revision: https://reviews.freebsd.org/D12615
All of these arguments are stored in m_ext, so there is no reason
to pass them in the argument list. Not all functions need the second
argument, some don't even need the first one. The second argument
lives in next cache line, so not dereferencing it is a performance
gain. This was discovered in sendfile(2), which will be covered by
next commits.
The second goal of this commit is to bring even more flexibility
to m_ext mbufs, allowing to create more fields in m_ext, opaque to
the generic mbuf code, and potentially set and dereferenced by
subsystems.
Reviewed by: gallatin, kbowling
Differential Revision: https://reviews.freebsd.org/D12615
properly dump all the sendqueues and not just the first one
History:
It appears that in the commit which introduced the code,
r165272, the array indexes of "sq_blocked[0]" and "td_name[i]"
were interchanged. In r180927 "td_name[i]" was corrected to
"td_name[0]", but "sq_blocked[0]" was left unchanged.
PR: 222624
Discussed with: kmacy @
MFC after: 1 week
Sponsored by: Mellanox Technologies
appearing only where the code explicitly set it, but since much of the
data was not initialized, '-1' appeared other places too, and led to
panics. Clear the allocated data before initializing nonzero values by
allocating with M_ZERO.
Submitted by: Doug Moore <dougm@rice.edu>
Reported by: Oleg V. Nauman <oleg@theweb.org.ua>, cy
Tested by: Oleg V. Nauman <oleg@theweb.org.ua>
MFC after: 1 week
X-MFC with: r324420
Differential Revision: https://reviews.freebsd.org/D12627
nodes to allocate for the blist, and to initialize them. The computation
can be done much more quickly by identifying the terminating node, if any,
at every level of the tree and then summing the number of nodes at each
level that precedes the topmost terminator. The initialization can also be
done quickly, since settings at the root mark the tree as all-allocated, and
only a few terminator nodes need to be marked in the rest of the tree.
Eliminate blst_radix_init, and perform its two functions more simply in
blist_create.
The allocation of the blist takes places in two pieces, but there's no good
reason to do so, when a single allocation is sufficient, and simpler.
Allocate the blist struct, and the array of nodes associated with it, with a
single allocation.
Submitted by: Doug Moore <dougm@rice.edu>
Reviewed by: markj (an earlier version)
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D11968
The detach case is slightly complicated by the fact that some in-kernel
consumers may want to know before a device detaches (so they can release
related resources, stop using the device, etc), but the detach can fail. So
there are pre- and post-detach notifications for those consumers who need to
handle all cases.
A couple salient comments from the review, they amount to some helpful
documentation about these events, but there's currently no good place for
such documentation...
Note that in the current newbus locking model, DETACH_BEGIN and
DETACH_COMPLETE/FAILED sequence of event handler invocation might interweave
with other attach/detach events arbitrarily. The handlers should be prepared
for such situations.
Also should note that detach may be called after the parent bus knows the
hardware has left the building. In-kernel consumers have to be prepared to
cope with this race.
Differential Revision: https://reviews.freebsd.org/D12557
When the EVENTHANDLER(9) subsystem was created, it was a documented feature
that an eventhandler callback function could safely deregister itself. In
r200652 that feature was inadvertantly broken by adding drain-wait logic to
eventhandler_deregister(), so that it would be safe to unload a module upon
return from deregistering its event handlers.
There are now 145 callers of EVENTHANDLER_DEREGISTER(), and it's likely many
of them are depending on the drain-wait logic that has been in place for 8
years. So instead of creating a separate eventhandler_drain() and adding it
to some or all of those 145 call sites, this creates a separate
eventhandler_drain_nowait() function for the specific purpose of
deregistering a callback from within the running callback.
Differential Revision: https://reviews.freebsd.org/D12561
connection was reset by the remote end, sendfile() would just report
ENOTCONN instead of ECONNRESET.
Submitted by: Jason Eggleston <jason@eggnet.com>
Reviewed by: glebius
Sponsored by: Limelight Networks
Differential Revision: https://reviews.freebsd.org/D12575
Lookups of the sort are rare compared to regular ones and succesfull ones
result in removing entries from the cache.
In the current code buckets are rlocked and a trylock dance is performed,
which can fail and cause a restart. Fixing it will require a little bit
of surgery and in order to keep the code maintaineable the 2 cases have
to split.
MFC after: 1 week