neighbors, and is used in a way so that if entries a and b cannot be
merged, we consider them twice, first not-merging a with its successor
b, and then not-merging b with its predecessor a. This change replaces
vm_map_simplify_entry with vm_map_try_merge_entries, which compares
two adjacent entries only, and uses it to avoid duplicated
merge-checks.
Tested by: pho
Reviewed by: alc
Approved by: markj (implicit)
Differential Revision: https://reviews.freebsd.org/D20814
Some places only take the interlock to hold the vnode, which was a requiremnt
before they started being manipulated with atomics. Use the newly introduced
vholdnz to bump the count.
Reviewed by: kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21358
Move pcpu KVA out of .bss into dynamically allocated VA at
pmap_bootstrap(). This avoids demoting superpage mapping .data/.bss.
Also it makes possible to use pmap_qenter() for installation of
domain-local pcpu page on NUMA configs.
Refactor pcpu and IST initialization by moving it to helper functions.
Reviewed by: markj
Tested by: pho
Discussed with: jeff
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21320
All these stacks are used only once (doublefault, boot) or very rare
(mce).
Reviewed by: markj
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D21320
Store stack_guard_page * PAGE_SIZE into the gap->next_read field at
the time of the stack creation. This makes the used guard size
consistent between stack creation and stack grow time.
Suggested by: alc
Reviewed by: alc, markj
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Differential revision: https://reviews.freebsd.org/D21384
While it worked with the kenrel, it wasn't working with the loader.
It failed to handle dependencies correctly. The reason for that is
that we never created a nvme module with the DRIVER_MODULE, but
instead a nvme_pci and nvme_ahci module. Create a real nvme module
that nvd can be dependent on so it can import the nvme symbols it
needs from there.
Arguably, nvd should just be a simple child of nvme, but transitioning
to that (and winning that argument given why it was done this way) is
beyond the scope of this change.
Reviewed by: jhb@
Differential Revision: https://reviews.freebsd.org/D21382
All existing callers guarantee that the page does not have a
pre-existing dequeue pending. Thus, if the page is dequeued before
pqbatch_submit() acquires the page queue lock, we do not need to do
anything since vm_page_dequeue_complete() takes care of clearing all
page queue state flags for us.
With this change, vm_page_pqbatch_submit() has the nice property that it
does not directly modify any fields in the page structure.
Reviewed by: alc, kib
Tested by: pho (part of a larger change)
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21372
It will become useful for the page daemon to be able to directly create
a batch queue entry for a page, and without modifying the page
structure. Rename vm_pqbatch_submit_page() to vm_page_pqbatch_submit()
to keep the namespace consistent. No functional change intended.
Reviewed by: alc, kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21369
After all the changes, its dynamic scope is same as for MNTK_UNMOUNT,
but to allow the syncer vnode to be re-installed on unmount failure.
But the case of syncer was already handled by using the VV_FORCEINSMQ
flag for quite some time.
Tested by: pho
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Strip comments from the NOTES.armv[57] files as is done for other
NOTES files when building the corresponding LINT configs. Without
this, the LINT configs contained the NO_UNIVERSE comment from the
NOTES.armv[57] files.
Reviewed by: imp
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21264
As part of marching gcc 4.2.1 out of the tree, turn off -Werror on gcc 4.2.1
compiles by default. It generates too many false positives and breaks CI
for no benefit.
Discussed on: arch@
Reviewed by: jhb@, emaste@, pfg@
Differential Revision: https://reviews.freebsd.org/D21378
This removes the last consumer of the modified zlib originally
bundled with Paul's PPP implementation, which will be removed
in a follow up commit.
PR: 229763
Differential Revision: https://reviews.freebsd.org/D21186
The Linux lockdep API assumes LA_LOCKED semantic in lockdep_assert_held(),
meaning that either a shared lock or write lock is Ok. On the other hand,
the timeout code uses lc_assert() with LA_XLOCKED, and we need both to
work.
For mutexes, because they can not be shared (this is unique among all lock
classes, and it is unlikely that we would add new lock class anytime soon),
it is easier to simply extend mtx_assert to handle LA_LOCKED there, despite
the change itself can be viewed as a slight abstraction violation.
Reviewed by: mjg, cem, jhb
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D21362
queues, don't try to tear them down in the ctrlr_destroy
path. Otherwise, we dereference queue structures that are NULL and we
trap.
This fix is incomplete: we leak IRQ and MSI resources when this
happens. That's preferable to a crash but still should be fixed.
machine/regnum.h ends up being included by sys/procfs.h and sys/ptrace.h via
machine/reg.h. Many of the regnum definitions are too short and too generic
to be exposing to any userland application including one of these two
headers. Moreover, these actively cause build failures in googletest
(template <typename T1 ...> expanding to template <typename 9 ...>).
Hide the definitions behind _KERNEL or _WANT_MIPS_REGNUM, and patch all of
the userland consumers to define as needed.
Discussed with: imp, jhb
Reviewed by: imp, jhb
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D21330
load and people who pull in nvme/nvd from modules can't load nvd.ko
since it depends on nvme, not nvme_foo. The duplicate doesn't matter
since kldxref properly handles that case.
Turn off bus master after we detach the device (to match the prior
order). Release MSI after we're done detaching and have turned off
all the interrupts. Otherwise this may cause problems as other threads
race nvme_detach. This more closely matches the old order.
Reviewed by: mav@
Use pointer arithmetic (as now done in makefs, and in NetBSD) instead of
taking the address of array element. No functional change, but this
makes it easier to compare different versions of this file.
Reviewed by: kib
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21365
After r351243 when ALTQ was enabled in the kernel, the inline functions
in ifq.h would not have full type information as if_var.h was not
included.
Given usb_ethernet.h already includes all the various headers (which)
is the cause of the problem here, add if_var.h to it. This fixes the
builds again.
Reported by: CI system, e.g. FreeBSD-head-aarch64-LINT
Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE)
on a file in a file system that does not have its own VOP_IOCTL(), the
lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since
ENOTTY is not listed as an error return by either the lseek(2) man page
nor the POSIX draft for lseek(2).
This was discussed on freebsd-current@ here:
http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA
This trivial patch maps ENOTTY to EINVAL for lseek(SEEK_DATA/SEEK_HOLE).
Reviewed by: markj
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D21300
This streams out an XML document over several GDB packets describing all
threads in the system; their ids, name, and any loosely defined "extra info"
we feel like including. For now, I have included a string version of the run
state, similar to some of the DDB logic to stringify thread state.
The benefit of supporting this in addition to the qfThreadInfo/qsThreadInfo
packing is that in this mode, the host gdb does not ask for every thread's
"qThreadExtraInfo," saving per-thread round-trips on "info threads."
To use this feature, (k)gdb needs to be built with the --with-expat option.
I would encourage enabling this option by default in our GDB port, if it is
not already.
Finally, there is another optional attribute you can specify per-thread
called a "handle." Handles are arbitrarily long sequences of bytes,
represented in the XML as hexadecimal. It is unclear to me how or if GDB
actually uses handles for anything. So I have left them out.
Specifically, use 'const' for the key passed to the 'setkey' method
and 'const' for the 'iv' passed to the 'reinit' method.
Reviewed by: cem
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D21347
They follow the conventions set by rw and sx lock probes. There is
an additional lockstat:::lockmgr-disown probe.
Update lockstat(1) to report on contention and hold events for
lockmgr locks. Document the new probes in dtrace_lockstat.4, and
deduplicate some of the existing probe descriptions.
Reviewed by: mjg
MFC after: 1 week
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21355
null_nodeget which follows almost always finds the target vnode in the hash,
avoiding insmntque1 altogether. Should it be needed, it already checks if the
lock needs to be upgraded.
Reviewed by: kib
Tested by: pho
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D20244
Intel has created RST and many laptops from vendors like Lenovo and Asus. It's a
mechanism for creating multiple boot devices under windows. It effectively hides
the nvme drive inside of the ahci controller. The details are supposed to be a
trade secret. However, there's a reverse engineered Linux driver, and this
implements similar operations to allow nvme drives to attach. The ahci driver
attaches nvme children that proxy the remapped resources to the child. nvme_ahci
is just like nvme_pci, except it doesn't do the PCI specific things. That's
moved into ahci where appropriate.
When the nvme drive is remapped, MSI-x interrupts aren't forwarded (the linux
driver doesn't know how to use this either). INTx interrupts are used
instead. This is suboptimal, but usually sufficient for the laptops these parts
are in.
This is based loosely on https://www.spinics.net/lists/linux-ide/msg53364.html
submitted, but not accepted by, Linux. It was written by Dan Williams. These
changes were written from scratch by Olivier Houchard.
Submitted by: cognet@ (Olivier Houchard)
Nvme drives can be attached in a number of different ways. Separate out the PCI
attachment so that we can have other attachment types, like ahci and various
types of NVMeoF.
Submitted by: cognet@
If device is unplugged from the system (CSTS register reads return
0xffffffff), it makes no sense to send any more recovery requests or
expect any responses back. If there is a detach call in such state,
just stop all activity and free resources. If there is no detach
call (hot-plug is not supported), rely on normal timeout handling,
but when it trigger controller reset, do not wait for impossible and
quickly report failure.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
The original code came from a desire to minimize the number of updates
to v_wire_count, which prior to r329187 was updated using atomics.
However, there is no significant benefit to batching today, so simply
allocate pages using VM_ALLOC_WIRED and rely on system accounting.
Reviewed by: jhb
Differential Revision: https://reviews.freebsd.org/D21323
With r349546, it is a responsibility of the writer to clear PIPE_DIRECTW
after pinned data has been read. In particular, once a reader has
drained this data, there is a small window where the pipe is empty but
PIPE_DIRECTW is set. pipe_poll() was using the presence of PIPE_DIRECTW
to determine whether to return POLLIN, so in this window it would
claim that data was available to read when this was not the case.
Fix this by modifying several checks for PIPE_DIRECTW to instead look
at the number of residual bytes in data pinned by a direct writer. In
some cases we really do want to check for PIPE_DIRECTW, since the
presence of this flag indicates that any attempt to write to the pipe
will block on the existing direct writer.
Bisected and test case provided by: mav
Tested by: pho
Reviewed by: kib
MFC after: 3 days
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21333
There is no need to duplicate this file when it can be trivially
shared (just exposing sections previously under #ifdef _KERNEL).
MFC with: r351273
Differential Revision: The FreeBSD Foundation
- Add a vm_pagequeue_remove() function to physically remove a page
from its queue and update the queue length.
- Remove vm_page_pagequeue_lockptr() and let vm_page_pagequeue()
return NULL for dequeued pages.
- Avoid unnecessarily reloading the queue index if vm_page_dequeue()
loses a race with a concurrent queue operation.
- Correct an always-true assertion: vm_page_dequeue() may be called
from the page allocator with the page unlocked. The assertion
m->order == VM_NFREEORDER simply tests whether the page has been
removed from the vm_phys free lists; instead, check whether the
page belongs to an object.
Reviewed by: kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21341
It is useful for testing purposes to be able to drain UMA caches, so
do not limit the sysctl to DIAGNOSTIC kernels.
MFC after: 1 week
Sponsored by: Netflix
As of r332974 the page daemon does not requeue pages during a scan
of the active queue, so there is not much value in doing so here
either.
Reviewed by: alc, dougm, kib
MFC after: 1 week
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21343
an retransmission of the initial SYN (with data) would
cause us to strip the SYN and decrement/increase offset/len
which then caused us a -1 offset and a panic.
Reported by: Larry Rosenman
(Michael Tuexen helped me debug this at the IETF)
Disable ar9300_swap_tx_desc() for the moment. It is an unused
function only tried to compile on big endian systems.
Found by: s390x buildkernel
MFC after: 3 months
seqc_consistent_* return bool, not seqc. [0]
While here annotate the rarely true condition - it is expected to run
into it on vare occasion (compared to the other case).
Reported by: oshogbo [0]
Sponsored by: The FreeBSD Foundation
There is no reason to duplicate this file when it can be trivially
shared (just exposing one section previously under #ifdef _KERNEL).
Reviewed by: imp, cem
MFC with: r351273
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21346
This fixes possible double call of fail_fn, for example on hot removal.
It also allows ctrlr_fn to safely return NULL cookie in case of failure
and not get useless ns_fn or fail_fn call with NULL cookie later.
MFC after: 2 weeks
Otherwise the mutex needs to be dropped when copying out the midistat
sbuf, leading to a race which allows one to read kernel memory beyond
the end of the sbuf buffer.
Reported and tested by: pho
Security: CVE-2019-5612
Summary:
Reduce the diff between AIM and Book-E even more. This also cleans up
vmparam.h significantly.
Reviewed by: luporl
Differential Revision: https://reviews.freebsd.org/D21301
order to have struct mii_data available. However, it only really needs
a forward declaration of struct mii_data for use in pointer form for
the return type of a function prototype.
Custom kernel configuration that have usb and fdt enabled, but no miibus,
end up with compilation failures because miibus_if.h will not get
generated.
Due to the above, the following changes have been made to usb_ethernet.h:
* remove the inclusion of mii headers
* forward-declare struct mii_data
* include net/ifq.h to satify the need for complete struct ifqueue
Reviewed by: ian
Obtained from: Juniper Networks, Inc.
Differential Revision: https://reviews.freebsd.org/D21293
We suffer at least one round trip ACK latency every command / packet that
GDB has to send and receive, and the response format for 'info threads'
supports packing many threads IDs into a single packet, so do so.
Adds and uses a new API, gdb_txbuf_has_capacity(), which checks for a
certain number of bytes available in the outgoing txbuf.
On an example amd64 VM, the number of RTTs to transmit this list is reduced
by a factor of 110x. This is especially beneficial with recent GDB, which
seems to request the list at least twice during attach.
VM_OBJECT_DROP/VM_OBJECT_PICKUP to handle functions that are called with
uncertain lock state.
Reviewed by: kib, markj
Tested by: pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21310
When tun/tap were merged, appropriate MODULE_VERSION should have been added
for things like modfind(2) to continue to do the right thing with the old
names.
Reported by: jhb
Return an empty string when the location is unknown instead of the
string "unknown". This ensures that all location entries are of
the form key=val.
Suggested by: imp
Approved by: jhb (mentor)
MFC after: 1 week
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21326
nd_buf is used to buffer headers (for both the kernel dump itself and
for EKCD) before the final call to netdump_dumper(), which flushes
residual data in nd_buf. As a result, a small portion of the residual
data would be corrupted. This manifests when kernel dump compression
is enabled since both zstd and zlib detect the corruption during
decompression.
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D21294
ti,dual-volt property say that the eMMC support 1.8V and 3.3V not 3.0V
Use the correct caps for the mmc stack.
Note that the MMCHS_SD_CAPA register can only be written once after bootup
so if one is using a u-boot compiled with eMMC support (this is the default)
this code is a no-op but just in case someone have u-boot compiled without
eMMC support this make eMMC works when the kernel is booted.
MFC after: 1 week
The fdt node for this driver is a simple-mfd and syscon compatible one
meaning that simplemfd will be the driver attached for it. The gpio driver
is attached to the 'gpio' subnode so use syscon_get_handle_default to
obtain the handle of the syscon from the parent device and use this
to read/write to the memory region.
MFC after: 1 week
fs-specific part of vfs_statfs routines only fill in small portion of the
structure. Previous code was always copying everything at a higher layer to
acoomodate it and this patch does the same.
'df' (no arguments) worked fine because the caller uses mnt_stat itself as the
target buffer, making all the copying a no-op for its own case.
'df /' and similar use a different consumer which passes its own buffer and
this is where you can run into trouble.
Reported by: cy
Fixes: r351193
Sponsored by: The FreeBSD Foundation
This is similar to checks for td_sx_slocks and td_rw_rlocks.
Although td_lk_slocks is an implementation detail, it still makes sense
to validate it.
MFC after: 1 week
Sponsored by: Panzura
Without this patch, when an application performed lseek(SEEK_DATA/SEEK_HOLE)
on a file in a file system that does not have its own VOP_IOCTL(), the
lseek(2) fails with errno ENOTTY. This didn't seem appropriate, since
ENOTTY is not listed as an error return by either the lseek(2) man page
nor the POSIX draft for lseek(2).
A discussion on freebsd-current@ seemed to indicate that implementing
a trivial algorithm that returns the offset argument for FIOSEEKDATA and
returns the file's size for FIOSEEKHOLE was the preferred fix.
http://docs.FreeBSD.org/cgi/mid.cgi?CAOtMX2iiQdv1+15e1N_r7V6aCx_VqAJCTP1AW+qs3Yg7sPg9wA
The Linux kernel appears to implement this trivial algorithm as well.
This patch adds a vop_stdioctl() that implements this trivial algorithm.
It returns errors consistent with vn_bmap_seekhole() and, as such, will
still return ENOTTY for non-regular files.
I have proposed a separate patch that maps errors not described by the
lseek(2) man page nor POSIX draft to EINVAL. This patch is under separate
review.
Reviewed by: kib
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D21299
NUMA domain that the pages describe. Patch original from gallatin.
Reviewed by: kib
Tested by: pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21252
Add helper function for synchronous execution of HCI commands at probe
stage and use this function to check firmware state of Intel Wireless
8260/8265 bluetooth devices found in many post 2016 year laptops.
Attempt to initialize FreeBSD bluetooth stack while such a device is in
bootloader mode locks the adapter hardly so it requires power on/off
cycle to restore.
This change blocks ng_ubt attachment unless operational firmware is
loaded thus preventing the lock up.
PR: 237083
Reviewed by: hps, emax
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D21071
Suppose that a binary was executed from tmpfs mount, and the text
vnode was reclaimed while the binary was still running. It is
possible during even the normal operations since tmpfs vnode'
vm_object has swap type, and no references on the vnode is held. Also
assume that the text vnode was revived for some reason. Then, on the
process exit or exec, unmapping of the text mapping tries to remove
the text reference from the vnode, but since it went from
recycle/instantiation cycle, there is no reference kept, and assertion
in VOP_UNSET_TEXT_CHECKED() triggers.
Fix this by keeping a use reference on the tmpfs vnode for each exec
reference. This prevents the vnode reclamation while executable map
entry is active.
Do it by adding per-mount flag MNTK_TEXT_REFS that directs
vop_stdset_text() to add use ref on first vnode text use, and
per-vnode VI_TEXT_REF flag, to record the need on unref in
vop_stdunset_text() on last vnode text use going away. Set
MNTK_TEXT_REFS for tmpfs mounts.
Reported by: bdrewery
Tested by: sbruno, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
Require the vnode to be locked for the VOP_UNSET_TEXT() call. This
will be used by the following bug fix for a tmpfs issue.
Tested by: sbruno, pho (previous version)
Sponsored by: The FreeBSD Foundation
MFC after: 1 week
The struct is already populated on each mount (and remount). Fields are either
constant or not used by filesystem in the first place.
Some infrequently used functions use it to avoid having to allocate a new buffer
and are left alone.
The current code results in an avoidable copying single-threaded and significant
cache line bouncing multithreaded
While here deduplicate initial filling of the struct.
Reviewed by: kib
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21317
Move fast entropy source registration to the earlier
SI_SUB_RANDOM:SI_ORDER_FOURTH and move random_harvestq_prime after that.
Relocate the registration routines out of the much later randomdev module
and into random_harvestq.
This is necessary for the fast random sources to actually register before we
perform random_harvestq_prime() early in the kernel boot.
No functional change.
Reviewed by: delphij, markjm
Approved by: secteam(delphij)
Differential Revision: https://reviews.freebsd.org/D21308
- add support for 'output-low', 'output-high', 'output-low' and
'output-enable' properties. These are use in RK3288 DT files
- add support for RK3288
- to reduce overall file size, use local macros for initialization
of pinctrl description structures.
MFC after: 2 weeks
- Properly handle IIC_M_NOSTOP and IIC_M_NOSTART flags.
- add polling mode, so driver can be used even if interrupts are not
enabled (this is necessary for proper support of PMICs).
- add support for RK3288
MFC after: 2 weeks
If simple multifuction device also provides syscon interface, its
childern should be able to consume it. Due to this:
- declare coresponding method in syscon interface
- implement it in simple multifunction device driver
MFC after: 1 week
NUMA aware boot time memory allocator that will be used to allocate early
domain correct structures. Code partially submitted by gallatin.
Reviewed by: gallatin, kib
Tested by: pho
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21251
The mistake came about like this: the first attempt to commit was blocked by
a pre-commit hook due to missing SVN tags. svn revert doesn't delete new
files, I guess. While reapplying the fixed diff, the non-empty target file
was just concatenated with the new contents? Ugh. :-(
This regression was introduced in the r326169 Linux v4.9 Infiniband upgrade.
Restore the functionality.
Reviewed by: hselasky
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21298
- move allproc lock into the func, it is of no use prior to it
- the code would lock p1 and p2 while holding allproc to partially
construct it after it gets added to the list. instead we can do the
work prior to adding anything.
- protect lastpid with procid_lock
As a side effect we do less work with allproc held.
Sponsored by: The FreeBSD Foundation
The limit is almost never reached. Do the check only on failure to see if
we can override it.
No change in user-visible behavior.
Sponsored by: The FreeBSD Foundation
Code doing this is commented with a claim that these IDs are occupied by
daemons, but that's demonstrably false. To an extent the range is used by init
and kernel processes (and on sufficiently big machines it indeed is fully
populated).
On a sample box 40-way box the highest id in the range is 63. On a different one
it is 23. Just use the range.
Sponsored by: The FreeBSD Foundation
If vn_lock() failed, then the function returned the error but the vnode
obtained via zfs_zget() was never released.
MFC after: 10 days
Sponsored by: Panzura
This allows use of the shared _net_inet_sdp in more than one compilation
unit. (Nothing in-tree uses this today, but some of Isilon's out-of-tree
SDP enhancements add sysctls below the node.)
Sponsored by: Dell EMC Isilon
Commit message by Jake:
The iflib_register function exists to allocate and setup some common
structures used by both iflib_device_register and iflib_pseudo_register.
There is no associated cleanup function used to undo the steps taken in
this function.
Both iflib_device_deregister and iflib_pseudo_deregister have some of
the necessary steps scattered in their flow. However, most of the
necessary cleanup is not done during the error path of
iflib_device_register and iflib_pseudo_register.
Some examples of missed cleanup include:
the ifp pointer is not free'd during error cleanup
the STATE and CTX locks are not destroyed during error cleanup
the vlan event handlers are not removed during error cleanup
media added to the ifmedia structure is not removed
the kobject reference is never deleted
Additionally, when initializing the kobject class reference counter is
increased even though kobj_init already increases it. This results in
the class never being free'd again because the reference count would
never hit zero even after all driver instances are unloaded.
To aid in proper cleanup, implement an iflib_deregister function that
goes through the reverse steps taken by iflib_register.
Call this function during the error cleanup for iflib_device_register
and iflib_pseudo_register. Additionally call the function in the
iflib_device_deregister and iflib_pseudo_deregister functions near the
end of their flow. This helps reduce code duplication and ensures that
proper steps are taken to cleanup allocations and references in both the
regular and error cleanup flows.
Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
Submitted by: Jacob Keller <jacob.e.keller@intel.com>
Reviewed by: shurd@, erj@
MFC after: 3 days
Sponsored by: Intel Corporation
Differential Revision: https://reviews.freebsd.org/D21005
NTB Tool driver is meant for testing NTB hardware driver functionalities,
such as doorbell interrupts, link events, scratchpad registers and memory
windows. This is a port of ntb_tool driver from Linux. It has been
verified on top of AMD and PLX NTB HW drivers.
Submitted by: Arpan Palit <arpan.palit@amd.com>
Cleaned up by: mav
MFC after: 2 weeks
Relnotes: yes
Differential Revision: https://reviews.freebsd.org/D18819
It is unused, the ABI was broken in r322969, and it is broken by design
(more than MDNPAD md devices can exist and there is no way to retreive
them with this interface).
mdconfig(8) was converted to use libgeom to obtain this information
in r157160 and any other consumers of MDIOCLIST should likewise be
converted.
Reviewed by: emaste
Relnotes: yes
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D18936
Since r349596 the syscon driver will map the memory. Since the gpio/pinctrl
controller wants it too set it to SHAREABLE.
This fix the gpio controller for attaching and so consumer can use it.
Now the sdhci_xenon driver can detect presence of an sdcard and attach
the mmc driver.
MFC after: 1 week
In addition to pagedaemon initiating OOM, also do it from the
vm_fault() internals. Namely, if the thread waits for a free page to
satisfy page fault some preconfigured amount of time, trigger OOM.
These triggers are rate-limited, due to a usual case of several
threads of the same multi-threaded process to enter fault handler
simultaneously. The faults from pagedaemon threads participate in the
calculation of OOM rate, but are not under the limit.
Reviewed by: markj (previous version)
Tested by: pho
Discussed with: alc
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Differential revision: https://reviews.freebsd.org/D13671
The FUSE_LISTXATTR operation always returns the full list of a file's
extended attributes, in all namespaces. There's no way to filter the list
server-side. However, currently FreeBSD's fusefs driver sends a namespace
string with the FUSE_LISTXATTR request. That behavior was probably copied
from fuse_vnop_getextattr, which has an attribute name argument. It's
been there ever since extended attribute support was added in r324620. This
commit removes it.
Reviewed by: cem
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D21280
doing so adds more flexibility with less redundant code.
Reviewed by: jhb, markj, kib
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D21250
When the byte range for copy_file_range(2) doesn't go to EOF on the
output file and there is a hole in the input file, a hole must be
"punched" in the output file. This is done by writing a block of bytes
all set to 0.
Without this patch, the write is done unconditionally which means that,
if the output file already has a hole in that byte range, a unneeded data block
of all 0 bytes would be allocated.
This patch adds code to check for a hole in the output file, so that it can
skip doing the write if there is already a hole in that byte range of
the output file. This avoids unnecessary allocation of blocks to the
output file.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D21155
This adds safety net for the case of misconfigured NTB with too big
memory window, for which we may be unable to allocate a memory buffer,
which does not make much sense for the network interface. While there,
fix the code to really work with asymmetric window sizes setup.
This makes driver just print warning message on boot instead of hanging
if too large memory window is configured.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
In r350842 I've switched the bus pass to resource so it matches the other
clock drivers but this cannot work as this drivers is meant to match
the dts node '/clocks' and if we don't do it at this pass simplebus is
catching this node and we cannot attach.
This solve booting on Allwinner boards that are still using /clocks (A20 SoC)
MFC after: 3 days
Fix the reported boot failures and revert r350510.
Note this commit is effectively merging ACPICA 20190703 again and applying
an upstream patch.
https://github.com/acpica/acpica/commit/73f6372
Tested by: scottl
illumos/illumos-gate@811964cd9f811964cd9fhttps://www.illumos.org/issues/10406
The large dnode changes from 8423 caused problems in zfs recv for a legacy
stream. This manifests when attempting to mount the received stream, but the
problem is in the receive code. We missed the following commit from ZoL which
fixes this.
commit da2feb42fb
Author: Tom Caputi <tcaputi@datto.com>
Date: Thu Jun 28 17:55:11 2018 -0400
Fix 'zfs recv' of non large_dnode send streams
Currently, there is a bug where older send streams without the
DMU_BACKUP_FEATURE_LARGE_DNODE flag are not handled correctly.
The code in receive_object() fails to handle cases where
drro->drr_dn_slots is set to 0, which is always the case when the
sending code does not support this feature flag. This patch fixes
the issue by ensuring that that a value of 0 is treated as
DNODE_MIN_SLOTS.
Author: Tom Caputi <tcaputi@datto.com>
MFC after: 3 weeks
X-MFC after: r351074
8423 8199 7432 Implement large_dnode pool feature
8423 Implement large_dnode pool feature
8199 multi-threaded dmu_object_alloc()
7432 Large dnode pool feature
llumos/illumos-gate@54811da5ac54811da5achttps://www.illumos.org/issues/8423https://www.illumos.org/issues/8199https://www.illumos.org/issues/7432
ZoL issues:
Improved dnode allocation #6564
Clean up large dnode code #6262
Fix dnode_hold() freeing dnode behavior #8172
Fix dnode allocation race #6414, #6439
Partial: Raw sends must be able to decrease nlevels #6821, #6864
Remove unnecessary txg syncs from receive_object() Closes#7197
This updates FreeBSD large_dnode code (that was imported from ZoL) to a version
that was committed to illumos. It has some cleanups, improvements and fixes
comparing to what we have in FreeBSD now. I think that the most significant
update is 8199 multi-threaded dmu_object_alloc().
Obtained from: illumos
MFC after: 3 weeks
This restores parity with AMD NTB driver. Though without any drivers
supporting more then one peer and respective KPI modification to pass
peer index to most of the calls this addition is pretty useless now.
MFC after: 2 weeks
The only thing blocking UMA_MD_SMALL_ALLOC from working on 64-bit booke
powerpc was a missing check in pmap_kextract(). Adding DMAP handling into
pmap_kextract(), we can now use UMA_MD_SMALL_ALLOC. This should improve
performance and stability a bit, since DMAP is always mapped in TLB1, so
this relieves pressure on TLB0.
MFC after: 3 weeks
expression howmany(BBSIZE, PAGE_SIZE), where BBSIZE is the size of the
boot block area. That can be less than 2 if PAGE_SIZE is big.
swapon(8) has an option to trim (delete) all the blocks of a device at
startup. However, if the first of those blocks is a bsd label, then
trimming those blocks is destructive. Change swapon to leave the
first BBSIZE bytes untrimmed.
Update manual pages to reflect changes in how swapon and how it may be
used, espeically in association with savecore.
Reviewed by: alc
Approved by: markj (mentor)
MFC after: 3 days
Differential Revision: https://reviews.freebsd.org/D21191
No functional change.
Add a verbose comment giving an example side-by-side comparison between the
prior and Concurrent modes of Fortuna, and why one should believe they
produce the same result.
The intent is to flip this on by default prior to 13.0, so testing is
encouraged. To enable, add the following to loader.conf:
kern.random.fortuna.concurrent_read="1"
The intent is also to flip the default blockcipher to the faster Chacha-20
prior to 13.0, so testing of that mode of operation is also appreciated.
To enable, add the following to loader.conf:
kern.random.use_chacha20_cipher="1"
Approved by: secteam(implicit)
enough that unification will have to wait for the next pass.
Reviewed by: jhb (verbal OK on irc)
Differential Revision: https://reviews.freebsd.org/D21248
specific, and only builds there. Likewise the module is built there. Move it to
the x86-only files.x86.
Reviewed by: jhb (verbal OK on irc)
Differential Revision: https://reviews.freebsd.org/D21248
Apart from one MD file, ACPI is a x86 implementation, not specific to either
i386 or amd64, so put it into files.x86. Other architectures include fewer
files for the same options, so it can't move into the MI files file.
Reviewed by: jhb (verbal OK on irc)
Differential Revision: https://reviews.freebsd.org/D21248
VIA Padlock support is for VIA C3, C7 and Eden processors, which are 64bit x86
processors.
Reviewed by: jhb (verbal OK on irc)
Differential Revision: https://reviews.freebsd.org/D21248
The HPT drivers are all x86 only. Move them to files.x86. Because of the way we
run uudecode, we can use $M instead of needing entries for them in separate
files.
Reviewed by: jhb (verbal OK on irc)
Differential Revision: https://reviews.freebsd.org/D21248
Move all the identical x86 lines to files.x86. The non-identical ones should be
unified and moved as well, but that would require additional changes that would
need a more careful review and may not be MFCable, so I'll do them
separately. I'll delete the mildly snarky comment when things are unified.
Reviewed by: jhb (verbal OK on irc)
Differential Revision: https://reviews.freebsd.org/D21248
In FUSE protocol 7.9, the size of the FUSE_GETATTR request has increased.
However, the fusefs driver is currently not sending the additional fields.
In our implementation, the additional fields are always zero, so I there
haven't been any test failures until now. But fusefs-lkl requires the
request's length to be correct.
Fix this bug, and also enhance the test suite to catch similar bugs.
PR: 239830
MFC after: 2 weeks
MFC-With: 350665
Sponsored by: The FreeBSD Foundation
Namespace Optimal I/O Boundary field added in NVMe 1.3 and Namespace
Preferred Write Granularity added in 1.4 allow upper layers to align
I/Os for improved SSD performance and endurance.
I don't have hardware reportig those yet, but NPWG could probably be
reported by bhyve.
MFC after: 2 weeks
Sponsored by: iXsystems, Inc.
This patch makes the DRM graphics driver in ports usable on aarch64.
Submitted by: Greg V <greg@unrelenting.technology>
Differential Revision: https://reviews.freebsd.org/D21008
MFC after: 1 week
Sponsored by: Mellanox Technologies
The Zstd format bumps the CLOOP major number to 4 to avoid incompatibility
with older systems. Support in geom_uzip(4) is conditional on the ZSTDIO
kernel option, which is enabled in amd64 GENERIC, but not all in-tree
configurations.
mkuzip(8) was modified slightly to always initialize the nblocks + 1'th
offset in the CLOOP file format. Previously, it was only initialized in the
case where the final compressed block happened to be unaligned w.r.t.
DEV_BSIZE. The "Fake" last+1 block change in r298619 means that the final
compressed block's 'blen' was never correct unless the compressed uzip image
happened to be BSIZE-aligned. This happened in about 1 out of every 512
cases. The zlib and lzma decompressors are probably tolerant of extra trash
following the frame they were told to decode, but Zstd complains that the
input size is incorrect.
Correspondingly, geom_uzip(4) was modified slightly to avoid trashing the
nblocks + 1'th offset when it is known to be initialized to a good value.
This corrects the calculated final real cluster compressed length to match
that printed by mkuzip(8).
mkuzip(8) was refactored somewhat to reduce code duplication and increase
ease of adding other compression formats.
* Input block size validation was pulled out of individual compression
init routines into main().
* Init routines now validate a user-provided compression level or select
an algorithm-specific default, if none was provided.
* A new interface for calculating the maximal compressed size of an
incompressible input block was added for each driver. The generic code
uses it to validate against MAXPHYS as well as to allocate compression
result buffers in the generic code.
* Algorithm selection is now driven by a table lookup, to increase ease of
adding other formats in the future.
mkuzip(8) gained the ability to explicitly specify a compression level with
'-C'. The prior defaults -- 9 for zlib and 6 for lzma -- are maintained.
The new zstd default is 9, to match zlib.
Rather than select lzma or zlib with '-L' or its absense, respectively, a
new argument '-A <algorithm>' is provided to select 'zlib', 'lzma', or
'zstd'. '-L' is considered deprecated, but will probably never be removed.
All of the new features were documented in mkuzip.8; the page was also
cleaned up slightly.
Relnotes: yes
With support for multiple namespaces and multiple ports in NVMe there is
now a need for reliable unique namespace identification alike to SCSI.
MFC after: 1 weeks
Sponsored by: iXsystems, Inc.
The DRM drivers use the lockdep assertion macros with spinlock_t locks
which are backed by mutexes, not sx locks. This causes compile
failures since you can't use sx_assert with a mutex. Instead, change
the lockdep macros to use lock_class methods. This works by assuming
that each LinuxKPI locking primitive embeds a FreeBSD lock as its
first structure and uses a cast to get to the underlying 'struct
lock_object'.
Reviewed by: hselasky
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D20992
Follow-up on r322318 and r322319 and remove the deprecated modules.
Shift some now-unused kernel files into userspace utilities that incorporate
them. Remove references to removed GEOM classes in userspace utilities.
Reviewed by: imp (earlier version)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D21249
Since ipvoly is used for checksum calculation, part of original IP
header is zeroed. This part includes ip_ttl field, that can be used
later in IP_MINTTL socket option handling.
PR: 239799
MFC after: 1 week
tcpratelimit isn't supported as there's now atomic_add_64, so add it to the exclusion list
Add comment for why PPC_PROBE_CHIPSET is on the list
Remove UKBD_DFLT_KEYMAP now that ukbd works on all platforms.
files.x86 is for the parts of the system that are common to both i386 and amd64
due too their nature. First up, to get the ball rolling, is fdc, the floppy disk
support. It works only on amd64 and i386 these days, and that's unlikely to
change.
Reviewed by: jhb, cem (earlier versrions)
Differential Revision: https://reviews.freebsd.org/D21210
Move the floppy driver to the x86 specific notes file.
Reviewed by: jhb, manu, jhibbits, emaste
Differential Revision: https://reviews.freebsd.org/D21208
x86 needs sc, as does sparc64. powerpc doesn't use it by default, but some old
powermac notebooks do not work with vt yet for reasons unknonw. Even so, I've
removed it from powerpc LINT. It's not in daily use there, and the intent is to
100% switch to vt now that it works for that platform to limit support burden.
All the other architectures omit some or all of the screen savers from their
lint config. Move them to the x86 NOTES files and remove the exclusions. This
reduces slightly the number of savers sparc64 compiles, but since they are in
GENERIC, the overage is adequate and if someone reaelly wants to sort them out
in sparc64 they can sweat the details and the testing.
Reviewed by: jhb (earlier version), manu (earlier version), jhibbits
Differential Revision: https://reviews.freebsd.org/D21233
Avoid empty structs, that have undefined behavior in C99 and
make compilers complain about it
(empty struct has size 0 in C, size 1 in C++).
Reviewed by: jhibbits
Differential Revision: https://reviews.freebsd.org/D21231
Fixed trap handler logic, in order to make it save FPU registers,
if FPU is enabled, before enabling VSX. Without this change, FPU
register contents were being lost when set before VSX was enabled.
illumos/illumos-gate@892586e8a1892586e8a1https://www.illumos.org/issues/6585
In any pool without the extensible dataset feature flag already enabled,
creating a dataset with dedup set to use one of the new checksums would result
in the following panic as soon as any data was added:
panic[cpu0]/thread=ffffff0006761c40: feature_get_refcount(spa, feature,
&refcount) != 48 (0x30 != 0x30), file: ../../common/fs/zfs/zfeature.c line 390
ffffff0006761830 fffffffffba8fbdd ()
ffffff0006761890 zfs:feature_do_action+11a ()
ffffff00067618c0 zfs:spa_feature_incr+1e ()
ffffff0006761920 zfs:dmu_object_zapify+b7 ()
ffffff00067619b0 zfs:dsl_dataset_activate_feature+97 ()
ffffff0006761a20 zfs:dsl_dataset_sync+ba ()
ffffff0006761ab0 zfs:dsl_pool_sync+153 ()
ffffff0006761b70 zfs:spa_sync+26e ()
ffffff0006761c20 zfs:txg_sync_thread+227 ()
ffffff0006761c30 unix:thread_start+8 ()
Inspection showed that feature->fi_feature was 7, which is the value of
SPA_FEATURE_EXTENSIBLE_DATASET in the spa_feature enum.
Testing shows that the panic can be prevented by explicitly setting extensible
dataset as a dependency for the sha512, edonr, and skein feature flags.
Alternatively, the new checksums code could possibly be changed to obviate the
need for the dependency.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Richard Laager <rlaager@wiktel.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: ilovezfs <ilovezfs@icloud.com>
Note that FreeBSD does not support ednor yet.
MFC after: 2 weeks
The race was introduced in r337669, the large dnode feature import from
ZoL. The problem was debugged by ZoL developers and then,
independently, on FreeBSD.
The fix is an early proposal by Brian Behlendorf:
50f32ed74e
This fix never went into ZoL. A larger change that was committed later
included a different solution because of the re-worked code.
Ideally, we want to revert this fix and re-synchronize FreeBSD large
dnode code with that in illumos (or newer ZoL). illumos has a later
import of the feature from ZoL that does not have the bug.
PR: 236480
Obtained from: Brian Behlendorf <behlendorf1@llnl.gov>
Submitted by: ncrogers@gmail.com (patch adaptation)
Reported by: ncrogers@gmail.com
Tested by: ncrogers@gmail.com,
Dennis Noordsij <dennis.noordsij@alumni.helsinki.fi>,
Julien Cigar <julien@perdition.city>
MFC after: 10 days
This is part 2 of r347078, pulling the page directory out of the Book-E
pmap. This breaks KBI for anything that uses struct pmap (such as vm_map)
so any modules that access this must be rebuilt.