In r286570 (MFV of r277426) an unprotected write to b_flags to
set the compression mode was introduced. This would open a race
window where data is partially decompressed, modified, checksummed
and written to the pool, resulting in pool corruption due to the
partial decompression.
Prevent this by reintroducing b_compress
illumos/illumos-gate@d4cd038c92
Illumos issues:
6214 zpools going south
https://www.illumos.org/issues/6214
Rewrite the ZFS prefetch code to detect only forward, sequential
streams.
The following kstats have been added:
kstat.zfs.misc.arcstats.sync_wait_for_async
How many sync reads have waited for async read
to complete. (less is better)
kstat.zfs.misc.arcstats.demand_hit_predictive_prefetch
How many demand read didn't have to wait for I/O
because of predictive prefetch. (more is better)
zfetch kstats have been similified to hits, misses, and max_streams,
with max_streams representing times when we were not able to create
new stream because we already have the maximum number of sequences
for a file.
The sysctl variable/loader tunable vfs.zfs.zfetch.block_cap have been
replaced by vfs.zfs.zfetch.max_distance, which controls maximum bytes
to prefetch per stream.
illumos/illumos-gate@cf6106c8a0
Illumos ZFS issues:
5987 zfs prefetch code needs work
https://www.illumos.org/issues/5987
The pci bus driver handles the power state, it also manages
configuration state saving and restoring for its child devices. Thus a
PCI device driver does not have to worry about those things. In fact, I
observe a hard system hang when trying to suspend a system with active
radeonkms driver where both the bus driver and radeonkms driver try to
do the same thing. I suspect that it could be because of an access to a
PCI configuration register after the device is placed into D3 state.
Reviewed by: dumbbell, jhb
MFC after: 13 days
Differential Revision: https://reviews.freebsd.org/D3561
All requests arriving for processing after OFFLINE flag set are rejected
with BUSY status. Races around OFFLINE flag setting are closed by calling
taskqueue_drain_all().
in the routine, which queues an ERROR chunk, instead on relyinh
on the callers to do so. Since one caller missed this, this actially
fixes a bug.
MFC after: 1 week
running thread.
It is currently implemented only on amd64 and i386; on these
architectures, it is implemented by raising an NMI on the CPU on which
the target thread is currently running. Unlike stack_save_td(), it may
fail, for example if the thread is running in user mode.
This change also modifies the kern.proc.kstack sysctl to use this function,
so that stacks of running threads are shown in the output of "procstat -kk".
This is handy for debugging threads that are stuck in a busy loop.
Reviewed by: bdrewery, jhb, kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3256
since on amd64 the first argument to a function is generally not on the
stack.
Revert an old DTrace bug fix to some code that assumed that
sizeof(struct amd64_frame) == 16.
Reviewed by: jhb, kib
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3255
5930 fasttrap_pid_enable() panics when prfind() fails in forking process
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Author: Bryan Cantrill <bryan@joyent.com>
illumos/illumos-gate@9df7e4e12e
The only operation which is prevented by the hold is the kernel stack
swapout for the faulted thread, which should be fine to allow.
Remove useless checks for NULL curproc or curproc->p_vmspace from the
trap_pfault() wrappers on x86 and powerpc.
Reviewed by: alc (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
CTL HA functionality was originally implemented by Copan many years ago,
but large part of the sources was never published. This change includes
clean room implementation of the missing code and fixes for many bugs.
This code supports dual-node HA with ALUA in four modes:
- Active/Unavailable without interlink between nodes;
- Active/Standby with second node handling only basic LUN discovery and
reservation, synchronizing with the first node through the interlink;
- Active/Active with both nodes processing commands and accessing the
backing storage, synchronizing with the first node through the interlink;
- Active/Active with second node working as proxy, transfering all
commands to the first node for execution through the interlink.
Unlike original Copan's implementation, depending on specific hardware,
this code uses simple custom TCP-based protocol for interlink. It has
no authentication, so it should never be enabled on public interfaces.
The code may still need some polishing, but generally it is functional.
Relnotes: yes
Sponsored by: iXsystems, Inc.
Auto-tuning threshold discussions aside, it turns out that if you want
to lower this on say, rather memory-packed machines, you either set maxusers
or kern.maxfiles, or you set it in sysctl. The former is a non-exact
way to tune this; the latter doesn't actually affect anything in the
startup scripts.
This first occured because I wondered why the hell screen would take upwards
of 10 seconds to spawn a new screen. I then found python doing the same
thing during fork/exec of child processes - it calls close() on each FD
up to the current openfiles limit. On a 1TB machine this is like, 26 million
FDs per process. Ugh.
So:
* This allows it to be set early in /boot/loader.conf;
* It can be used to work around the ridiculous situation of
screen, python, etc doing a close() on potentially millions of FDs
even though you only have four open.
Tested:
* 4GB, 32GB, 64GB, 128GB, 384GB, 1TB systems with autotune, ensuring
screen and python forking doesn't result in some pretty hilariously
bad behaviour.
TODO:
* Note that the default login.conf sets openfiles-cur to unlimited,
effectively obeying kern.maxfilesperproc. Perhaps we should fix
this.
* .. and even if we do, we need to also ensure that daemons get
a soft limit of something reasonable and capped - they can request
more FDs themselves.
MFC after: 1 week
Sponsored by: Norse Corp, Inc.
named node, open(2) cannot create directories. But do allow the flag
combination to succeed if the directory already exists.
Declare the open("name", O_DIRECTORY | O_CREAT | O_EXCL) always
invalid for the same reason, since open(2) cannot create directory.
Note that there is an argument that O_DIRECTORY | O_CREAT should be
invalid always, regardless of the target directory existence or
O_EXCL. The current fix is conservative and allows the call to
succeed in the situation where it succeeded before the patch.
Reported by: Tom Ridge <freebsd@tom-ridge.com>
Reviewed by: rwatson
PR: 202892
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
* Fail when the length passed in is 0
* Remove an unneeded increment of the count on success
* Return ENAMETOOLONG when the input pointer is too long
Sponsored by: ABT Systems Ltd
mapped address without valid pte installed, when parallel wiring of
the entry happen. The entry must be copy on write. If entry is COW
but was already copied, and parallel wiring set
MAP_ENTRY_IN_TRANSITION, vm_fault() would sleep waiting for the
MAP_ENTRY_IN_TRANSITION flag to clear. After that, the fault handler
is restarted and vm_map_lookup() or vm_map_lookup_locked() trip over
the check. Note that this is race, if the address is accessed after
the wiring is done, the entry does not fault at all.
There is no reason in the current kernel to disallow write access to
the COW wired entry if the entry permissions allow it. Initially this
was done in r24666, since that kernel did not supported proper
copy-on-write for wired text, which was fixed in r199869. The r251901
revision re-introduced the r24666 fix for the current VM.
Note that write access must clear MAP_ENTRY_NEEDS_COPY entry flag by
performing COW. In reverse, when MAP_ENTRY_NEEDS_COPY is set in
vmspace_fork(), the MAP_ENTRY_USER_WIRED flag is cleared. Put the
assert stating the invariant, instead of returning the error.
Reported and debugging help by: peter
Reviewed by: alc
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
locking and doesn't sleep. Flag the consumer we create as such. In
addition, decrement the in flight index when we have an out of memory
error after having incremented it previously. This would have
prevented swapoff from working if the swap pager ever hit a resource
shortage trying to swap out something (the swap in path always waits
for a bio, so won't have this issue). Simplify the close logic by
abandoning the use of private and initializing the index to 1 and
dropping that reference when we previously set private.
Also, set sw_id only while sw_dev_mtx is held. This should only affect
swapping to a vnode, as opposed to a geom whose close always sets it to
NULL with sw_dev_mtx held.
Differential Review: https://reviews.freebsd.org/D3547
vendor supplied device trees contain the needed properties for us to select
the correct uart to use as the kernel console.
An example of this would be to add the following to loader.conf.
hw.fdt.console="/smb/uart@f7113000"
The intention of this is slightly different than the existing
hw.uart.console option. The new option will mean the boot serial
configuration will be derived from the device node, while the existing
option expects the user to configure all this themselves.
Further work is planned to allow the uart configuration to be set based on
the stdout-path property devicetree bindings.
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D3559
Certain VM guest types (VMware, Xen) do not support MSI, so pci_alloc_msix()
always fails. isci(4) was not properly detecting the allocation failure,
and would try to proceed with MSIx resource initialization rather than
reverting to INTx.
Reported and tested by: Bradley W. Dutton (brad-fbsd-stable@duttonbros.com)
MFC after: 3 days
Sponsored by: Intel
BIOS always enables PCI busmaster on the isci device, which effectively
worked around this omission. But when passing the isci device through
to a guest VM, the hypervisor will disable busmaster and isci will not
work without calling pci_enable_busmaster().
MFC after: 3 days
Sponsored by: Intel
RANDOM_LOADABLE and RANDOM_YARROW's definitions from opt_random.h to
opt_global.h
This unbreaks `make depend` in sys/modules with multiple drivers (tmpfs, etc)
after r286839
X-MFC with: r286839
Reviewed by: imp
Submitted by: lwhsu
Differential Revision: D3486
so that there is only one place where pages are freed and only one place
where pages are moved to the tail of the queue.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
This is a subtle use-after-free race that results in some very undesirable
hang behaviour.
Reviewed by: pkelsey
Obtained from: Kip Macy, NextBSD (91a9bd1dbb)
The filedesc lock is only needed if ioctls caps are present, which is a
rare situation. This is a step towards reducing the scope of the filedesc
lock.
no option but to use the smbios information to fill in the blanks.
It's a good thing UGA is a protocol of the past and GOP has all the
info we need.
Anyway, the logic has been tweaked a little to get the easier bits
of information up front. This includes the resolution and the frame
buffer address. Then we look at the smbios information and define
expected values as well as the missing bits (frame buffer offset and
stride). If the values obtained match the expect values, we fill in
the blanks and return. Otherwise we use the existing detection logic
to figure it out.
Rename the environment variables from uga_framebuffer abd uga_stride
to hw.efifb.address and hw.efifb.stride. The latter names are more
in line with other variable names.
We currently have hardcoded settings for:
1. Mid-2007 iMac (iMac7,1)
2. Late-2007 MacBook (MacBook3,1)
store should have release semantics and will have due to the dsb above it
so add a comment to explain this. [1]
While here update the code to not reload the current thread, it's already
in a register, we just need to not trash it.
Suggested by: kib [1]
Sponsored by: ABT Systems Ltd
because the RSS hash may need to be recalculated.
Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn>
Differential Revision: https://reviews.freebsd.org/D3564
This is preparation for possibility to open/close media several times
per LUN life cycle. While there, rename variables to reduce confusion.
As additional bonus this allows to open read-only media, such as ZFS
snapshots.
the size of the name cache hash table (mapping file names to vnodes)
and the vnode hash table (mapping mount point and inode number to vnode).
An appropriate locking strategy is the key to changing hash table sizes
while they are in active use.
Reviewed by: kib
Tested by: Peter Holm
Differential Revision: https://reviews.freebsd.org/D2265
MFC after: 2 weeks
striking a delicate balance between exhaustive searching and
banking on assumptions. The environment variables can be used
as a fall-back anyway. With this change, all known and tested
Macs with only UGA should have a working console out of the
box... for now...
pages will have left the inactive queue before the page daemon performs
its next scan. Also, ignore references to pages from terminated objects.
This allows the clean pages to be freed a little sooner.
Move some comments to their proper place, i.e., next to the code that
they describe, and update other nearby comments.
Reviewed by: kib
Sponsored by: EMC / Isilon Storage Division
hardware to perform address translation for us. These are useful to help
track down what caused us to enter the debugger.
Sponsored by: ABT Systems Ltd
o Unlike xor, in Jenkins hash every bit of input affects virtually
every bit of output, thus salting the hash actually works. With
xor salting only provides a false sense of security, since if
hash(x) collides with hash(y), then of course, hash(x) ^ salt
would also collide with hash(y) ^ salt. [1]
o Jenkins provides much better distribution than xor, very close to
ideal.
TCP connection setup/teardown benchmark has shown a 10% increase
with default hash size, and with bigger hashes that still provide
possibility for collisions. With enormous hash size, when dataset is
by an order of magnitude smaller than hash size, the benchmark has
shown 4% decrease in performance decrease, which is expected and
acceptable.
Noticed by: Jeffrey Knockel <jeffk cs.unm.edu> [1]
Benchmarks by: jch
Reviewed by: jch, pkelsey, delphij
Security: strengthens protection against hash collision DoS
Sponsored by: Nginx, Inc.
Some places in our network stack already have const
arguments (like if_output() routines and LLE functions).
Code using ifa_ifwith (and similar functins) along with
LLE/_output functions is currently bound to use tricks
like __DECONST(). Provide a cleaner way by making sockaddr
lookup key really constant.
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3464
in the frame buffer when we flip pixels. Allow the detection
to be bypassed by setting the uga_framebuffer and uga_stride
variables. The kernel console works fine even when we can't
detect pixel changes in the frame buffer, which indicates
that the problem could be with reading from the frame buffer
and not writing to it.
nlge(4) is supposed to deprecate rge(4) for Broadcom XLR when it was
introduced 5 years ago.
rge doesn't build on -CURRENT due to MII changes. All the XLR kernel confs
use nlge. Let's get rid of the old driver for FreeBSD 11. We can use
10-STABLE or SVN to go back and look at the old driver if needed.
Differential Revision: https://reviews.freebsd.org/D3339
Submitted by: kevin.bowling@kev009.com
Gleaned from a public header file. 5402 and 5404 look like they may be
used on embedded devices. 5478 and 5488 are switch PHYs. 5754 change is just
to note a product alias.
Differential Revision: https://reviews.freebsd.org/D3338
Submitted by: kevin.bowling@kev009.com
Coredump notes depend on being able to invoke dump routines twice; once
in a dry-run mode to get the size of the note, and another to actually
emit the note to the corefile.
When a note helper emits a different length section the second time
around than the length it requested the first time, the kernel produces
a corrupt coredump.
NT_PROCSTAT_FILES output length, when packing kinfo structs, is tied to
the length of filenames corresponding to vnodes in the process' fd table
via vn_fullpath. As vnodes may move around during dump, this is racy.
So:
- Detect badly behaved notes in putnote() and pad underfilled notes.
- Add a fail point, debug.fail_point.fill_kinfo_vnode__random_path to
exercise the NT_PROCSTAT_FILES corruption. It simply picks random
lengths to expand or truncate paths to in fo_fill_kinfo_vnode().
- Add a sysctl, kern.coredump_pack_fileinfo, to allow users to
disable kinfo packing for PROCSTAT_FILES notes. This should avoid
both FILES note corruption and truncation, even if filenames change,
at the cost of about 1 kiB in padding bloat per open fd. Document
the new sysctl in core.5.
- Fix note_procstat_files to self-limit in the 2nd pass. Since
sometimes this will result in a short write, pad up to our advertised
size. This addresses note corruption, at the risk of sometimes
truncating the last several fd info entries.
- Fix NT_PROCSTAT_FILES consumers libutil and libprocstat to grok the
zero padding.
With suggestions from: bjk, jhb, kib, wblock
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3548
operations that map a single page that has an associated vm_page_t.
This does not permit mapping larger regions (such as a PCI memory
BAR) and it does not permit mapping addresses beyond the top of RAM
(such as a 64-bit BAR located above the top of RAM).
Instead of using a single OBJT_DEVICE object and passing the physaddr via
the offset as a hack, create a new sglist and OBJT_SG object for each
mmap request. The requested memory attribute is applied to the object
thus affecting all pages mapped by the request.
Reviewed by: hselasky, np
MFC after: 1 week
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D3386
delete a logic volume on status change which is NOT what we want here.
The original code is correct in that when the volume changes status
the driver will only delete the volume if the status is one of the
fatal errors. A drive failure in a mirrored volume is NOT a situtation
where the volume should dissapear.
Reported on freebsd-scsi@:
https://lists.freebsd.org/pipermail/freebsd-scsi/2015-September/006800.html
MFC after: 3 days
PCI BARs does not necessarily correspond to the upper-left
most pixel. Scan the frame buffer for which byte changed
when changing the pixel at (0,0).
Use the same technique to determine the stride. Except for
changing the pixel at (0,0), we change the pixel at (0,1).
PR: 202730
Tested by: hartzell (at) alerce.com
so they were disabled during DTS transition. Though there are
no standard devices/drivers on them people might use iic(4) userland
interface to access these buses.
Originally it was added in order to prevent trashing of objects with
INVARIANTS enabled. The same effect is now provided with mere UMA_ZONE_NOFREE.
This reverts r286921.
Discussed with: kib
Objects obtained from such zones are supposed to retain type stability,
which was violated by aforementioned trashing.
This is a follow-up to r284861.
Discussed with: kib
broke in two ways. One, the pacing variable was accessed in multiple
threads in an unsafe way. Two, since large numbers of I/O could come
down from the buf layer at one time, large numbers of allocation
failures could happen all at once, resulting in a huge pace value that
would limit I/Os to 10 IOPS for minutes (or even hours) at a
time. While a real solution to these problems requires substantial
work (to go to a no-allocation after the first model, or to have some
way to wait for more memory with some kind of reserve for pager and
swapper requests), it is relatively easy to make this simplistic
pacing less pathological.
Move to using a volatile variable with loads and stores. While this is
a little racy, losing the race is safe: either you get memory and
proceed, or you don't and queue. Second, sleep for 1ms (or one tick, whichever
is larger) instead of 100ms. This removes the artificial 10 IOPS limit
while still easing up on new I/Os during memory shortages. Remove
tying the amount of time we do this to the number of failed requests
and do it only as long as we keep failing requests.
Finally, to avoid needless recursion when memory is tight (start ->
g_io_deliver() -> g_io_request() -> start -> ... until we use 1/2 the
stack), don't do direct dispatch while pacing. This should be a rare
event (not steady state) so the performance hit here is worth the
extra safety of not starving g_down() with directly dispatched I/O.
Differential Review: https://reviews.freebsd.org/D3546
Resetting some generations of the I/OAT hardware (just BDXDE for now)
resets the corresponding MSI-X registers. So, teardown and
re-initialize interrupts after resetting the hardware.
Reviewed by: jimharris
Approved by: markj (mentor)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3549
and exit events. procfs stop events for system call tracing report these
values (argument count for system call entry and code for system call exit),
but ptrace() does not provide this information. (Note that while the system
call code can be determined in an ABI-specific manner during system call
entry, it is not generally available during system call exit.)
The values are exported via new fields at the end of struct ptrace_lwpinfo
available via PT_LWPINFO.
Reviewed by: kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D3536
If net.link.bridge.pfil_bridge is set we can end up thinking we're forwarding in
pf_test6() because the rcvif and the ifp (output interface) are different.
In that case we're bridging though, and the rcvif the the bridge member on which
the packet was received and ifp is the bridge itself.
If we'd set dir to PF_FWD we'd end up calling ip6_forward() which is incorrect.
Instead check if the rcvif is a member of the ifp bridge. (In other words, the
if_bridge is the ifp's softc). If that's the case we're not forwarding but
bridging.
PR: 202351
Reviewed by: eri
Differential Revision: https://reviews.freebsd.org/D3534
SoC is used in the HiKey board from 96boards.
Currently on the SD card is working on the HiKey, as such devices 0 and 2
will need to be disabled, for example by adding the following to
loader.conf:
hint.hisi_dwmmc.0.disabled=1
hint.hisi_dwmmc.2.disabled=1
Relnotes: yes (Hikey board booting)
Sponsored by: ABT Systems Ltd
particular, this invalidates the knote kn_link linkage, making the
SLIST_FOREACH() loop accessing undefined values (e.g. trashed by
QUEUE_MACRO_DEBUG). If the knote is freed by other thread when kq
lock is released or when influx is cleared, e.g. by knote_scan() for
kqueue owning the knote, the iteration step would access freed memory.
Use SLIST_FOREACH_SAFE() to fix iteration.
Diagnosed by: avg
Tested by: avg, lstewart, pawel
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Explain why it is fine to not check for M_NOWAIT failures in
kqueue_register(). Remove unneeded check for NULL result from
waitable allocation in kqueue_scan(). uma_free(9) handles NULL
argument correctly, remove checks for NULL. Remove useless cast and
adjust style in knote_alloc().
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
of the D_NEWBLK kinds of dependencies (i.e. D_ALLOCDIRECT and
D_ALLOCINDIR), which can exhaust kmem.
Handle excess of D_NEWBLK in the same way as excess of D_INODEDEP and
D_DIRREM, by scheduling ast to flush dependencies, after the thread,
which created new dep, left the VFS/FFS innards. For D_NEWBLK, the
only way to get rid of them is to do full sync, since items are
attached to data blocks of arbitrary vnodes. The check for D_NEWBLK
excess in softdep_ast_cleanup_proc() is unlocked.
For 32bit arches, reduce the total amount of allowed dependencies by
two. It could be considered increasing the limit for 64 bit platforms
with direct maps.
Reported and tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
The typo was introduced in r278469 / 344ecf88af.
As a result of the bug there was a timing window where callout_reset()
would fail to cancel a concurrent execution of a callout that is about
to start and would schedule the callout again.
The callout would fire more times than it is scheduled.
That would happen even if the callout is initialized with a lock.
For example, the bug triggered the "Stray timeout" assertion in
taskqueue_timeout_func().
MFC after: 5 days
This makes it possible to analyze the performance of the new ZFS
write throttle with dtrace
PR: 200316
Submitted by: Lacey Powers <lacey.leanne@gmail.com>
Reviewed by: avg, smh, delphij (no objection)
Approved by: bapt (mentor)
MFC after: 1 month
Sponsored by: ScaleEngine Inc.
Differential Revision: https://reviews.freebsd.org/D3472
command called 'uga' to show whether UGA is implemented by the
firmware and what the settings are. It also includes filling
the efi_fb structure from the UGA information when GOP isn't
implemented by the firmware.
Since UGA does not provide information about the stride, we
set the stride to the horizontal resolution. This is likely
not correct and we should determine the stride by trial and
error. For now, this should show something on the console
rather than nothing.
Refactor this file to maximize code reuse.
PR: 202730
pins, they specify the bank and the pin in two separated cells.
This allow the use of vendor's DTS definitions by adding a gpio map
routine that copes with that.
scheduler types. It was intended to be used there, compare with the
min value, and with the test for correctness in ksched_setscheduler().
Note that P1B_PRIO_MAX and RTP_PRIO_MAX do have the same numerical
values, the change is cosmetical.
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
kernel configuration to A20.
There are other boards (namely the banana pi) that use exactly the same
devices.
Additionally, we are moving from static FDT support (DTB compiled
in-kernel) to DTB passed to kernel by the boot loader (ubldr). The u-boot
for these boards are already available on ports and as the crochet support
for these boards isn't committed yet, this should not bring any issues.
Discussed with: ian
it helps only the TCP timers callout(9) usage. As the benefit for
others callout(9) usages did not reach a consensus the historical
usage should prevail.
Differential Revision: https://reviews.freebsd.org/D3078
workaround for a callout(9) issue, it turns out it is instead the right
way to use callout in mpsafe mode without using callout_drain().
r284245 commit message:
Fix a callout race condition introduced in TCP timers callouts with r281599.
In TCP timer context, it is not enough to check callout_stop() return value
to decide if a callout is still running or not, previous callout_reset()
return values have also to be checked.
Differential Revision: https://reviews.freebsd.org/D2763
command has the following sub-commands:
list - list all possible modes (paged)
get - return the current mode
set <mode> - set the current mode to <mode>
Previously such LUNs were silently ignored. But while they indeed unable
to process most of SCSI commands, some, like RTPG, they still can.
MFC after: 1 month
r286951 by reinstating changes in r274628.
In l2arc_compress_buf(), we allocate a buffer to stash away the compressed
data in 'cdata', allocated of l2hdr->b_asize bytes.
We then ask zio_compress_data() to compress the buffer, b_l1hdr.b_tmp_cdata,
which is of l2hdr->b_asize bytes, and have the compressed size (or original
size, if compress didn't gain enough) stored in csize.
To pad the buffer to fit the optimal write size, we round up the compressed
size to L2 device's vdev_ashift.
Illumos code rounds up the size by at most SPA_MINBLOCKSIZE. Because we
know csize <= b_asize, and b_asize is integer multiple of SPA_MINBLOCKSIZE,
we are guaranteed that the rounded up csize would be <= b_asize. However,
this is not necessarily true when we round up to 1 << vdev_ashift, because
it could be larger than SPA_MINBLOCKSIZE.
So, in the worst case scenario, we are overwriting at most
(1 << vdev_ashift - SPA_MINBLOCKSIZE)
bytes of memory next to the compressed data buffer.
Andriy's original change in r274628 reorganized the code a little bit,
by moving the padding to after we determined that the compression was
beneficial. At which point, we would check rounded size against the
allocated buffer size, and the buffer overrun would not be possible.
as parent. In the case of a send or receive, the curproc would be the
userland application that issues the ioctl. This would trigger an assertion
failure introduced in Solaris compatibility shims in r196458 when kernel is
compiled with INVARIANTS.
Fix this by using p0 (proc0 or kernel) as the parent thread when creating
the kernel threads.
This mirrors the basic IPv4 implementation - IPv6 packets under RSS
now are checked for a correct RSS hash and if one isn't provided,
it's done in software.
This only handles the initial receive - it doesn't yet handle
reinjecting / rehashing packets after being decapsulated from
various tunneling setups. That'll come in some follow-up work.
For non-RSS users, this is almost a giant no-op.
It does change a couple of ipv6 methods to use const mbuf * instead of
mbuf * but it doesn't have any functional changes.
So, the following now occurs:
* If the NIC doesn't do any RSS hashing, it's all done in software.
Single-queue, non-RSS NICs will now have the RX path distributed
into multiple receive netisr queues.
* If the NIC provides the wrong hash (eg only IPv6 hash when we needed
an IPv6 TCP hash, or IPv6 UDP hash when we expected IPv6 hash)
then the hash is recalculated.
* .. if the hash is recalculated, it'll end up being injected into
the correct netisr queue for v6 processing.
Submitted by: Tiwei Bie <btw@mail.ustc.edu.cn>
Differential Revision: https://reviews.freebsd.org/D3504
sign on every directory exported via NFSv4 with NFSv4 ACLs enabled.
Reviewed by: rmacklem@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3502
Being clang the default compiler, we were always giving precedence to
the __has_attribute check. Unfortunately clang generally doesn't support
the new attributes (alloc_size was briefly supported and then reverted)
so we were always doing both checks. Give the precedence to GCC as that is
the working case now.
Do the same for __has_builtin() for consistency.
As part of this, clean up tlb1_init(), since bootinfo is always NULL here just
eliminate the loop altogether.
Also, fix a bug in mmu_booke_mapdev_attr() where it's possible to map a larger
immediately following a smaller page, causing the mappings to overlap. Instead,
break up the new mapping into smaller chunks. The downside to this is that it
uses more precious TLB1 entries, which, on smaller chips (e500v2) it could cause
problems with TLB1 being out of space (e500v2 only has 16 TLB1 entries).
Obtained from: Semihalf (partial)
Sponsored by: Alex Perez/Inertial Computing
in Marvell terms) to 32768. 32768 looks overkill but it will
ensure correct DMAed update. This change addresses occasional
watchdog timeouts reported on 10.2-RELEASE.
Tested by: Johann Hugo <jhugo@meraka.csir.co.za>
MFC after: 2 weeks
This was added in r51337 as part of the implementation of
madvise(MADV_DONTNEED). Its objective was to ensure that the page daemon
would eventually reclaim other unreferenced pages (i.e., unreferenced pages
not touched by madvise()) from the active queue.
Now that the pagedaemon performs steady scanning of the active page queue,
this weighted handling is unnecessary. Instead, always "cache" clean pages
by moving them to the head of the inactive page queue. This simplifies the
implementation of vm_page_advise() and eliminates the fragmentation that
resulted from the distribution of pages among multiple queues.
Suggested by: alc
Reviewed by: alc
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3401
Go ahead and defined -D_STANDALONE for all targets (only strictly
needed for some architecture, but harmless on those it isn't required
for). Also add -msoft-float to all architectures uniformly rather
that higgley piggley like it is today.
Differential Revision: https://reviews.freebsd.org/D3496
only gpiobus configured via FDT is supported. Bus enumeration is
supported. Devices are created for each device found. 1-Wire
temperature controllers are supported, but other drivers could be
written. Temperatures are polled and reported via a sysctl. Errors
are reported via sysctl counters. Mis-wired bus detection is included
for more trouble shooting. See ow(4), owc(4) and ow_temp(4) for
details of what's supported and known issues.
This has been tested on Raspberry Pi-B, Pi2 and Beagle Bone Black
with up to 7 devices.
Differential Revision: https://reviews.freebsd.org/D2956
Relnotes: yes
MFC after: 2 weeks
Reviewed by: loos@ (with many insightful comments)
The crop/drop-ovl fragment scrub modes are not very useful and likely to confuse
users into making poor choices.
It's also a fairly large amount of complex code, so just remove the support
altogether.
Users who have 'scrub fragment crop|drop-ovl' in their pf configuration will be
implicitly converted to 'scrub fragment reassemble'.
Reviewed by: gnn, eri
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D3466
do not do what one would expect by name. Prefix them with "udp_"
to at least obviously limit the scope.
This is a non-functional change.
Reviewed by: gnn, rwatson
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D3505
To make it easier to understand how Capsicum interacts with linkat() and
renameat(), rename the rights to CAP_{LINK,RENAME}AT_{SOURCE,TARGET}.
This also addresses a shortcoming in Capsicum, where it isn't possible
to disable linking to files stored in a directory. Creating hardlinks
essentially makes it possible to access files with additional rights.
Reviewed by: rwatson, wblock
Differential Revision: https://reviews.freebsd.org/D3411
The __alloc_size and __alloc_align need to be defined to
nothingness for lint, but the existing check is deficient
and allows attributes with working __has_attrubute() to
slip through.
connectivity interact with the net80211 stack.
Historical background: originally wireless devices created an interface,
just like Ethernet devices do. Name of an interface matched the name of
the driver that created. Later, wlan(4) layer was introduced, and the
wlanX interfaces become the actual interface, leaving original ones as
"a parent interface" of wlanX. Kernelwise, the KPI between net80211 layer
and a driver became a mix of methods that pass a pointer to struct ifnet
as identifier and methods that pass pointer to struct ieee80211com. From
user point of view, the parent interface just hangs on in the ifconfig
list, and user can't do anything useful with it.
Now, the struct ifnet goes away. The struct ieee80211com is the only
KPI between a device driver and net80211. Details:
- The struct ieee80211com is embedded into drivers softc.
- Packets are sent via new ic_transmit method, which is very much like
the previous if_transmit.
- Bringing parent up/down is done via new ic_parent method, which notifies
driver about any changes: number of wlan(4) interfaces, number of them
in promisc or allmulti state.
- Device specific ioctls (if any) are received on new ic_ioctl method.
- Packets/errors accounting are done by the stack. In certain cases, when
driver experiences errors and can not attribute them to any specific
interface, driver updates ic_oerrors or ic_ierrors counters.
Details on interface configuration with new world order:
- A sequence of commands needed to bring up wireless DOESN"T change.
- /etc/rc.conf parameters DON'T change.
- List of devices that can be used to create wlan(4) interfaces is
now provided by net.wlan.devices sysctl.
Most drivers in this change were converted by me, except of wpi(4),
that was done by Andriy Voskoboinyk. Big thanks to Kevin Lo for testing
changes to at least 8 drivers. Thanks to pluknet@, Oliver Hartmann,
Olivier Cochard, gjb@, mmoll@, op@ and lev@, who also participated in
testing.
Reviewed by: adrian
Sponsored by: Netflix
Sponsored by: Nginx, Inc.
being serviced return 0 (fail) but it is applicable only
mpsafe callouts. Thanks to hselasky for finding this.
Differential Revision: https://reviews.freebsd.org/D3078 (Updated)
Submitted by: hselasky
Reviewed by: jch
was invalid. Don't trigger a mount failure (which by default means
a panic), but instead just move on to the next directive in the
configuration. This typically has us ask for the root mount.
PR: 163245
While here update the list of devices id to match the one in linux 3.8.13
Reviewed by: dumbbell
Differential Revision: https://reviews.freebsd.org/D3489
Keep a couple of old macros that will be removed lated when the rest of the code
will be updated to 3.8.13 equivalent.
Chase the renamed macros
Reviewed by: dumbbell
Differential Revision: https://reviews.freebsd.org/D3487
its_cmd_send() can be called by multiple CPUs simultaneously.
After the command is pushed to ITS command ring the completion
status is polled using global pointer to the next free ring slot.
Use copied pointer and provide correct locking to avoid spurious
pointer value when concurrent access occurs.
Obtained from: Semihalf
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3436
be used with any SoC specific drivers, for example a ThunderX nic driver
would use something like the following in files.arm64:
arm64/cavium/thunder_nic.c optional soc_cavm_thunderx thndr_nic
Reviewed by: imp
Sponsored by: ABT Systems Ltd
Differential Revision: https://reviews.freebsd.org/D3479
Convert filemon_lock and struct filemon* lock to sx(9), rather than a
self-rolled reader-writer lock, and hold it for the entire time needed.
At least filemon_lock_write() was not checking for active readers when
it would successfully return with the write lock "held". This led to
a race with reading entries from filemon_inuse as they were removed. This
could be seen with QUEUE_MACRO_DEBUG enabled, causing -1 to be read as an
entry rather than a valid struct filemon*.
Fixing filemon_lock_write() to check readers was insufficient to fix the
races.
sx(9) was used as the lock could be held while taking proctree_lock and sleeping
in fo_write.
Sponsored by: EMC / Isilon Storage Division
MFC after: 2 weeks
correctly handle trying to access an invalid address in the debugger.
While here document that the breakpoint handler is supposed to fall
through to the following case.
Sponsored by: ABT Systems Ltd
If ARMv7 boots in HYP mode, switch to SVC32.
Reviewed by: ian
Submitted by: Wojciech Macek <wma@semihalf.com>
Jakub Palider <jpa@semihalf.com>
Obtained from: Semihalf
Sponsored by: Annapurna Labs
Differential Revision: https://reviews.freebsd.org/D1810
it may involve a pmap operation that iterates over the page's PV list, so
unnecessarily holding the page lock is undesirable.
MFC after: 1 week
Sponsored by: EMC / Isilon Storage Division
OpenBSD pf 4.5).
Fix argument ordering to memcpy as well as the size of the copy in the
(theoretical) case that pfi_buffer_cnt should be greater than ~_max.
This fix the failure when you hit the self table size and force it to be
resized.
MFC after: 3 days
Sponsored by: Rubicon Communications (Netgate)
I/OAT is also referred to as Crystal Beach DMA and is a Platform Storage
Extension (PSE) on some Intel server platforms.
This driver currently supports DMA descriptors only and is part of a
larger effort to upstream an interconnect between multiple systems using
the Non-Transparent Bridge (NTB) PSE.
For now, this driver is only built on AMD64 platforms. It may be ported
to work on i386 later, if that is desired. The hardware is exclusive to
x86.
Further documentation on ioat(4), including API documentation and usage,
can be found in the new manual page.
Bring in a test tool, ioatcontrol(8), in tools/tools/ioat. The test
tool is not hooked up to the build and is not intended for end users.
Submitted by: jimharris, Carl Delsey <carl.r.delsey@intel.com>
Reviewed by: jimharris (reviewed my changes)
Approved by: markj (mentor)
Relnotes: yes
Sponsored by: Intel
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3456
in the parent, we will inherit the pmcids but cannot execute any operations
on them in the child. The reason for this is that pmc_find_pmc() only
tries to find the current process on the owners hash list, but given the
child does not own the attachment, we cannot find it.
Thus, in case the initial lookup fails, try to find the pmc_process state
affiliated with the child process, lookup the pmc from there using the
row index, and get the owner process from that pmc.
Then continue as normal and lookup the pmc context of the owner (process).
This allows us to call, e.g., pmc_start() in the child process before we
start the work there, but to collect the accumulated results later in
the parent.
Sponsored by: DARPA,AFRL
Obtained from: L41
Tested by: rwatson, L41
MFC after: 4 weeks
Reviewed by: gnn
Differential Revision: https://reviews.freebsd.org/D2052
reason this didn't result in an unclean shutdown is that devfs ignores
MNT_FORCE flag.
Reviewed by: kib@
MFC after: 1 month
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3467
FreeBSD porting notes:
- only kernel-side changes are merged
- the new ioctl is not actually implemented yet
- thus, the goal is to synchronize DMU code
illumos/illumos-gate@2bcf0248e9https://www.illumos.org/issues/5692
we would like to expose the number of hole (sparse) blocks in a file.
this can be useful to for example if you want to fill in the holes with
some data; knowing the number of holes in advances allows you to report
progress on hole filling. We could use SEEK_HOLE to do that but it would
be O(n) where n is the number of holes present in the file.
Author: Max Grossman <max.grossman@delphix.com>
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>