r332341 introduced OF_getencprop_alloc_multi that should be used
instead of OF_getencprop_alloc to get multi-cell properties.
Fix example to reflect this change.
The nvlist_append_{bool,number,string,nvlist,descriptor}_array() functions
allows to dynamically extend array stored in the nvlist.
Submitted by: Mindaugas Rasiukevicius <rmind@netbsd.org>
All information which are need for those operations is already stored in
the cookie.
We decided not to bump libnv version because this API is not used yet in the
base system.
Reviewed by: pjd
Most kernel memory that is allocated after boot does not need to be
executable. There are a few exceptions. For example, kernel modules
do need executable memory, but they don't use UMA or malloc(9). The
BPF JIT compiler also needs executable memory and did use malloc(9)
until r317072.
(Note that a side effect of r316767 was that the "small allocation"
path in UMA on amd64 already returned non-executable memory. This
meant that some calls to malloc(9) or the UMA zone(9) allocator could
return executable memory, while others could return non-executable
memory. This change makes the behavior consistent.)
This change makes malloc(9) return non-executable memory unless the new
M_EXEC flag is specified. After this change, the UMA zone(9) allocator
will always return non-executable memory, and a KASSERT will catch
attempts to use the M_EXEC flag to allocate executable memory using
uma_zalloc() or its variants.
Allocations that do need executable memory have various choices. They
may use the M_EXEC flag to malloc(9), or they may use a different VM
interfact to obtain executable pages.
Now that malloc(9) again allows executable allocations, this change also
reverts most of r317072.
PR: 228927
Reviewed by: alc, kib, markj, jhb (previous version)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D15691
callbacks to perform additional cleanup actions at the time a socket is
closed.
Michio Honda presented a use for this at BSDCan 2018.
(See https://www.bsdcan.org/2018/schedule/events/965.en.html .)
Submitted by: Michio Honda <micchie at sfc.wide.ad.jp> (previous version)
Reviewed by: lstewart (previous version)
Differential Revision: https://reviews.freebsd.org/D15706
There are risks associated with waiting on a preemptible epoch section.
Change the name to make them not be the default and document the issue
under CAVEATS.
Reported by: markj
Currently 'man -k iflib' would find you the right pages for iflib
documentation, namely iflibdd(9) and iflibdi(9) but 'man iflib' would leave
you in the dark. This allows both approaches to find the relevant
documentation.
Reviewed by: kmacy, shurd
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D15219
Prior to this change the manual page documented ifdi_queues_alloc which has
been replaced by separate methods for tx and rx queues.
Reviewed by: kmacy, shurd
Sponsored by: iXsystems, Inc.
Differential Revision: https://reviews.freebsd.org/D15218
Half of implementations always failed (returned (-1)) and they were
previously used in only one place.
Reviewed by: kib, andrew
Obtained from: CheriBSD
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15102
Also remove the commented out documentation. The documentation arrived
with the import of the copy.9 manpage. I suspect the implementations
came from NetBSD while bootstrapping the Arm and MIPS ports.
Reviewed by: andrew, jmallett
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D15108
Add a new "interleave" allocation policy which stripes pages across
domains with a stride or width keeping contiguity within a multi-page
region.
Move the kernel to the dedicated numbered cpuset #2 making it possible
to assign kernel threads and memory policy separately from user. This
also eliminates the need for the complicated interrupt binding code.
Add a sysctl API for viewing and manipulating domainsets. Refactor some
of the cpuset_t manipulation code using the generic bitset type so that
it can be used for both. This probably belongs in a dedicated subr file.
Attempt to improve the include situation.
Reviewed by: kib
Discussed with: jhb (cpuset parts)
Tested by: pho (before review feedback)
Sponsored by: Netflix, Dell/EMC Isilon
Differential Revision: https://reviews.freebsd.org/D14839
Forwarded packets passed through PFIL_OUT, which made it difficult for
firewalls to figure out if they were forwarding or producing packets. This in
turn is an issue for pf for IPv6 fragment handling: it needs to call
ip6_output() or ip6_forward() to handle the fragments. Figuring out which was
difficult (and until now, incorrect).
Having pfil distinguish the two removes an ugly piece of code from pf.
Introduce a new variant of the netpfil callbacks with a flags variable, which
has PFIL_FWD set for forwarded packets. This allows pf to reliably work out if
a packet is forwarded.
Reviewed by: ae, kevans
Differential Revision: https://reviews.freebsd.org/D13715
altq(4) to match altq(9). This makes preserving the history section as the
author of ALTQ easier in the history section, rather than calling it a framework
in the description & a system in the history.
Add a history section to altq(4) and extend the history section in altq(9)
Approved by: bcr (mentor)
MFC after: 5 days
Differential Revision: https://reviews.freebsd.org/D14774
Remove how to format K&R stuff. The project hasn't been using it in
new code for a long time. It's so obsolete, we don't need a statement
to never use it. Add a statement requesting that comments about
parameters be preserved when converting to ASNI style, per Kirk.
Differential Revision: https://reviews.freebsd.org/D14051
- Add fdt_pinctrl(4) with general information for the driver
- Add fdt_pinctrl(9) with fdt_pinctrl KPI description
Reviewed by: ian, manu, wblock
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14235
camelCase tends to be preferred for function identifiers, while
internal_underscores are preferred for variable identifiers. This convention
makes it a little bit easier to eyeball whether variable/function usage is
correct.
The optional commas for final table values are preferred to reduce chances
for error.
The intent of this guideline is to avoid creating global variables in module
scope. Its main purpose is to serve as a reminder that variables at module
scope also need to be declared.
We want to avoid global variables in general, but this is easier to mess up
when designing things in the module scope.
VirtIO V1 provides configuration in multiple VENDOR capabilities so this
allows all of the configuration to be discovered.
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14325
This covers the lua style guidelines we've generally agreed on so far. It
will be revised as work continues and we run into more scenarios that need
specified.
Discussed with: cem, jilles
Differential Revision: https://reviews.freebsd.org/D14423
Additionally, move the overflow check logic out to WOULD_OVERFLOW() for
consumers to have a common means of testing for overflowing allocations.
WOULD_OVERFLOW() should be a secondary check -- on 64-bit platforms, just
because an allocation won't overflow size_t does not mean it is a sane size
to request. Callers should be imposing reasonable allocation limits far,
far, below overflow.
Discussed with: emaste, jhb, kp
Sponsored by: Dell EMC Isilon
Similar to calloc() the mallocarray() function checks for integer
overflows before allocating memory.
It does not zero memory, unless the M_ZERO flag is set.
Reviewed by: pfg, vangyzen (previous version), imp (previous version)
Obtained from: OpenBSD
Differential Revision: https://reviews.freebsd.org/D13766
This copies changes from NetBSD into FreeBSD's man page. I compared the
proposed changes against FreeBSD headers and modified them to match.
PR: 214602
Submitted by: fehmi noyan isi <fnoyanisi@yahoo.com>
Add atomic_load_<type> and atomic_store_<type>, and explain why they
exist.
Define the synchronizes-with relationship and its effects.
Reorder and revise some of the existing text. For example, more
precisely describe when ordinary accesses are atomic.
Reviewed by: jhb, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D13522
the expected default board_vendor value on MIPS SoCs.
This is required by bwn(4) to differentiate between single-band and
dual-band device variants that otherwise share a common chip ID.
Approved by: adrian (mentor, implicit)
Sponsored by: The FreeBSD Foundation
Add missing support for specifying I/O control flags during core reset,
and resolve a number of siba(4)-specific reset issues:
- Add missing check for target reject flags in siba_is_hw_suspended().
- Remove incorrect wait on SIBA_TMH_BUSY when modifying any target state
register; this should only be done when waiting for initiated
transactions to clear.
- Add missing wait on SIBA_IM_BY when asserting SIBA_IM_RJ.
- Overwrite any previously set SIBA_TML_REJ flag when bringing the core
out of reset. This fixes a lockup that occured when we brought up a core
(after reboot) that had previously been placed into RESET by siba_bwn(4).
Approved by: adrian (mentor, implicit)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D13039
This includes a number of copyedits for the inline code documentation
comments, updates to the existing bhnd(4), bhndb(4), bcma(4), and siba(4)
man pages, and new man pages for bhnd_chipc(4), bhnd_pmu(4), bhndb_pci(4),
bhnd(9), and bhnd_erom(9).
Approved by: adrian (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D13021
The old description has been inaccurate since at least 243271, if not
before.
Submitted by: will
Reviewed by: kib
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D13108
xlint is currently a fossil. We have much more useful and alive tools
to do now what xlint did twenty years ago.
I did not cleared some stuff which makes lint operational, in
sys/x86/include and sys/sys, but I might do it as followup. The
x86/include/ucontext.h and _types.h hacks made to please lint was the
main reason for my initial proposal to classify xlint as obsolete and
to remove it.
Also I do not intend to clear sccs ids.
Reviewed by: bapt, brooks, emaste, jhb, pfg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D13015
This introduces a facility to EVENTHANDLER(9) for explicitly defining a
reference to an event handler list. This is useful since previously all
invokers of events had to do a locked traversal of the global list of
event handler lists in order to find the appropriate event handler list.
By keeping a pointer to the appropriate list an invoker can avoid this
traversal completely. The pointer is initialized with SYSINIT(9) during
the eventhandler stage. Users registering interest in events do not need
to know if the event is backed by such a list, since the list is added
to the global list of lists. As with lists that are not pre-defined it
is safe to register for the events before the list has been created.
This converts the process_* and thread_* events to using the new
facility, as these are events whose locked traversals end up showing up
significantly in ports build workflows (and presumably other workflows
with many short lived threads/procs). It may be advantageous to convert
other events to using the new facility.
The el_flags field is now unused, but leave it be so that this revision
can be MFC'd.
Reviewed by: bdrewery, markj, mjg
Approved by: rstone (mentor)
In collaboration with: ian
MFC after: 4 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12814
fine when a lot of different flows to be ciphered/deciphered are involved.
However, when a software crypto driver is used, there are
situations where we could benefit from making crypto(9) multi threaded:
- a single flow is to be ciphered: only one thread is used to cipher it,
- a single ESP flow is to be deciphered: only one thread is used to
decipher it.
The idea here is to call crypto(9) using a new mode (CRYPTO_F_ASYNC) to
dispatch the crypto jobs on multiple threads, if the underlying crypto
driver is working in synchronous mode.
Another flag is added (CRYPTO_F_ASYNC_KEEPORDER) to make crypto(9)
dispatch the crypto jobs in the order they are received (an additional
queue/thread is used), so that the packets are reinjected in the network
using the same order they were posted.
A new sysctl net.inet.ipsec.async_crypto can be used to activate
this new behavior (disabled by default).
Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu>
Reviewed by: ae, jmg, jhb
Differential Revision: https://reviews.freebsd.org/D10680
Sponsored by: Stormshield
Mention per-location total order, out of thin air, and torn writes
guarantees. Mention C11 standard' memory model and one most important
FreeBSD additional requirement, that is aligned ordinary loads and
stores are atomic on processors.
The text is introductional and informal. Reference the C11 and
C++1{1,4,7} standards for authorative description.
In collaboration with: alc
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
mbpool existed to support NICs with memory interfaces and all remaining
comsumers were removed earlier this year with NATM.
Reviewed by: jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D10513
Previously before you could call unrhdr_delete you needed to
individually free every allocated unit. It is useful to be able to tear
down the unr without having to go through this process, as it is
significantly faster than freeing the individual units.
Reviewed by: cem, lidl
Approved by: rstone (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12591
All of these arguments are stored in m_ext, so there is no reason
to pass them in the argument list. Not all functions need the second
argument, some don't even need the first one. The second argument
lives in next cache line, so not dereferencing it is a performance
gain. This was discovered in sendfile(2), which will be covered by
next commits.
The second goal of this commit is to bring even more flexibility
to m_ext mbufs, allowing to create more fields in m_ext, opaque to
the generic mbuf code, and potentially set and dereferenced by
subsystems.
Reviewed by: gallatin, kbowling
Differential Revision: https://reviews.freebsd.org/D12615
When the EVENTHANDLER(9) subsystem was created, it was a documented feature
that an eventhandler callback function could safely deregister itself. In
r200652 that feature was inadvertantly broken by adding drain-wait logic to
eventhandler_deregister(), so that it would be safe to unload a module upon
return from deregistering its event handlers.
There are now 145 callers of EVENTHANDLER_DEREGISTER(), and it's likely many
of them are depending on the drain-wait logic that has been in place for 8
years. So instead of creating a separate eventhandler_drain() and adding it
to some or all of those 145 call sites, this creates a separate
eventhandler_drain_nowait() function for the specific purpose of
deregistering a callback from within the running callback.
Differential Revision: https://reviews.freebsd.org/D12561
during drain operations. When an sbuf is configured to use this feature by way
of the SBUF_DRAINTOEOR sbuf_new() flag, top-level sections started with
sbuf_start_section() create a record boundary marker that is used to avoid
flushing partial records.
Reviewed by: cem,imp,wblock
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D8536
it automatically after it runs.
The config_intrhook mechanism allows a driver to stall the boot process
until device(s) required for booting are available, by not allowing system
inits to proceed until all intrhook functions have been unregistered.
Virtually all existing code simply unregisters from within the hook function
when it gets called.
This new function makes that common usage more convenient. Instead of
allocating and filling in a struct, passing it to a function that might (in
theory) fail, and checking the return code, now a driver can simply call
this cannot-fail routine, passing just the intrhook function and its arg.
Differential Revision: https://reviews.freebsd.org/D11963
Implement disk_add_alias to allow aliases to be added to disks. All
disk have a primary name (say "foo") can also have secondary names
(say "bar") such that all instances of "foo" also have a "bar"
alias. So if you have foo0, foo0p1, foo1, foo1s1 and foo1s1a nodes
created by the foo driver and gpart, device nodes bar0, bar0p1, bar1,
bar1s1 and bar1s1a will appear as symlinks back to the original nodes.
This generalizes to multiple aliases. However, since the unit number
follows the primary name, multiple device drivers can't create the
same aliases unless those drives coorinate the unit number space (eg
you couldn't add an alias 'disk' to both 'da' and 'ada' because it's
possible to have da0 and ada0, because 'disk0' is ambiguous).
Differential Revision: https://reviews.freebsd.org/D11873
over the scheduling precision than 'ticks' can offer, and because sometimes
you're already working with sbintime_t units and it's dumb to convert them
to ticks just so they can get converted back to sbintime_t under the hood.
The benefit of BIT_FLS() is that ffsl() can be implemented with a
count leading zeros instruction which is more widespread available.
Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after: 1 week
stack modules.
It adds support for mangling symbols exported by a module by prepending
a string to them. (This avoids overlapping symbols in the kernel linker.)
It allows the use of a macro as the module name in the DECLARE_MACRO()
and MACRO_VERSION() macros.
It allows the code to register stack aliases (e.g. both a generic name
["default"] and version-specific name ["default_10_3p1"]).
With these changes, it is trivial to compile TCP stack modules with
the name defined in the Makefile and to load multiple versions of the
same stack simultaneously. This functionality can be used to enable
side-by-side testing of an old and new version of the same TCP stack.
It also could support upgrading the TCP stack without a reboot.
Reviewed by: gnn, sjg (makefiles only)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D11086
Update the documentation to catch up with r273174, which renamed
getenv -> kern_getenv
setenv -> kern_setenv
unsetenv -> kern_unsetenv
Leave the old links in place to support finger memory.
MFC after: 3 days
Sponsored by: Dell EMC
This function permits a range of one scatter/gather list to be appended to
another sglist. This can be used to construct a scatter/gather list that
reorders or duplicates ranges from one or more existing scatter/gather
lists.
Sponsored by: Chelsio Communications
Attempt to catch up to the KPI changes from r292373, and perform
some other tidying while in the area.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D10579
- move sbuf_bcopyin(9) and sbuf_copyin(9) near sbuf_new_for_sysctl(9), as
all three functions are kernel-only APIs.
- add #ifdef _KERNEL around sbuf_*copyin and sbuf_new_for_sysctl(9) to
make it visually clear that they are kernel-only APIs.
MFC after: 2 months
Sponsored by: Dell EMC Isilon
This shortens the column count on many lines considerably.
While here, add "(void)" to sbuf_new_auto(3) for consistency with style(9)
recommendations.
MFC after: 2 months
Sponsored by: Dell EMC Isilon
igor:
- Fix typos.
- Delete trailing whitespace.
manlint:
- Use .Fo/.Fc/.Fa when describing functions.
- Use .Xr.
- Fill in SEE ALSO section.
- Fix .Dt use: the section was specified incorrectly and the name
had a lowercase character.
- Continue new sentences on new lines.
Miscellaneous:
- Remove unnecessary quotes around "SEE ALSO" section headers.
- Sprinkle .Dv use in spots with constants.
Reported by: igor, make manlint
Sponsored by: Dell EMC Isilon
Add missing section number when referring to PCI_IOV_*INIT(9) with .Xr
from the other corresponding manpage.
MFC after: 1 week
Reported by: make manlint
Sponsored by: Dell EMC Isilon
- Expand a contraction [1].
- Add a missing section number when referring to uma(9) with .Xr .
MFC after: 1 week
Reported by: igor [1], make manlint [2]
Sponsored by: Dell EMC Isilon
man(1) has some logic to use two spaces after a full stop, which is
useful for spotting sentence breaks in monospace fonts. However,
this logic is very simple, treating almost all '.' characters as
end-of-sentence markers, unless followed by certain other
characters. For example, '.,' is not end-of-sentence, and neither
is ".) ", but ".)" at the end of a line triggers the sentence-end
detection.
Apply a zero-width space to a few instances of this in share/man,
and also supply a missing full stop for an instance that occurred at
the end of a sentence.
Leave untouched several instances that are at the end of a sentence
or list element.
Reported by: 0mp (ieee80211.9)
I moved this branch from github to a private server, and pulled from the
wrong one when committing r315280, so I failed to include two recent commits.
Thankfully, they were only cosmetic and were included in the review.
Specifically:
Add documentation, polish comments, and improve style(9).
Tested by: pho (r315280)
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9791
sbuf_hexdump(9) should be linked to sbuf(9), not hexdump(3). Another
review will be posted to deduplicate the sbuf_hexdump reference in
in hexdump(3) or at the very least make the information less duplicative.
MFC after: 1 week
X-MFC with: r313437
Sponsored by: Dell EMC Isilon
to stdout in the non-kernel case and to the console+log
in the kernel case. For the kernel case it hooks the
putbuf() machinery underneath printf(9) so that the buffer
is written completely atomically and without a copy into
another temporary buffer. This is useful for fixing
compound console/log messages that become broken and
interleaved when multiple threads are competing for the
console.
Reviewed by: ken, imp
Sponsored by: Netflix
compile options. Remove doxygen pointers to now deleted files. Remove
EISA and VME as examples in bus_space.9.
Retained EISA mode code for IO PIC and MPTABLES because that's not
EISA bus, per se, and some people have abused EISA to mean "EISA-like
behavior as opposed to ISA" rather than using it for EISA add-in
cards.
Relnotes: yes
Replace archaic "busses" with modern form "buses."
Intentionally excluded:
* Old/random drivers I didn't recognize
* Old hardware in general
* Use of "busses" in code as identifiers
No functional change.
http://grammarist.com/spelling/buses-busses/
PR: 216099
Reported by: bltsrc at mail.ru
Sponsored by: Dell EMC Isilon
These primitives give the caller the read value if the exchange attempt
failed which saves an explicit reload for cmpset loops.
The man page was partially submitted by kib.
Reviewed by: kib (previous version), jhb (previous version)
I'm currently working on writing a metrics exporter for the Prometheus
monitoring system to provide access to sysctl metrics. Prometheus and
sysctl have some structural differences:
- sysctl is a tree of string component names.
- Prometheus uses a flat namespace for its metrics, but allows you to
attach labels with values to them, so that you can do aggregation.
An initial version of my exporter simply translated
hw.acpi.thermal.tz1.temperature
to
sysctl_hw_acpi_thermal_tz1_temperature_celcius
while we should ideally have
sysctl_hw_acpi_thermal_temperature_celcius{thermal_zone="tz1"}
allowing you to graph all thermal zones on a system in one go.
The change presented in this commit adds support for accomplishing this,
by providing the ability to attach labels to nodes. In the example I
gave above, the label "thermal_zone" would be attached to "tz1". As this
is a feature that will only be used very rarely, I decided to not change
the KPI too aggressively.
Discussed on: hackers@
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D8775
VFP code to store the old context, with lazy loading of the new context
when needed.
FPU_KERN_NOCTX is missing as this is unused in the crypto code this has
been tested with, and I am unsure on the requirements of the UEFI
Runtime Services.
Reviewed by: kib
Obtained from: ABT Systeems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8276
- Add m_getclr(9) symlink to ObsoleteFiles.inc (removed in r295481).
- Add const qualifiers in m_dup(), m_dup_pkthdr() and m_tag_copy_chain()
(r286450).
- Fix m_dup_pkthdr() definition (it's not the same as m_move_pkthdr()).
MFC after: 5 days
function from restarting the timer.
Commonly taskqueue_enqueue_timeout() is called from within the task
function itself without any checks for teardown. Then it can happen
the timer stays active after the return of taskqueue_drain_timeout(),
because the timeout and task is drained separately.
This patch factors out the teardown flag into the timeout task itself,
allowing existing code to stay as-is instead of applying a teardown
flag to each and every of the timeout task consumers.
Add assert to taskqueue_drain_timeout() which prevents parallel
execution on the same timeout task.
Update manual page documenting the return value of
taskqueue_enqueue_timeout().
Differential Revision: https://reviews.freebsd.org/D8012
Reviewed by: kib, trasz
MFC after: 1 week
In particular, reset the DF_QUIET flag when detaching from a device so
that a driver that marks a device quiet doesn't dictate policy for a
different driver that may claim the device in the future.
Reviewed by: rpokala, wblock
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7803
The flag specifies that the block which uses FPU must be executed in
critical section, i.e. take no context switches, and does not need an
FPU save area during the execution.
It is intended to be applied around fast and short code pathes where
save area allocation is impossible or undesirable, due to context or
due to the relative cost of calculation vs. allocation.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register. This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.
Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.
Reviewed by: imp, wblock (earlier version)
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7751
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.
This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7667
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.
Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714
alternate TCP stack in other then the closed state (pre-listen/connect).
The idea is that *if* that is supported by the alternate stack, it
is asked if its ok to switch. If it approves the "handoff" then we
allow the switch to happen. Also the fini() function now gets a flag
to tell if you are switching away *or* the tcb is destroyed. The
init() call into the alternate stack is moved to the end so the
tcb is more fully formed before the init transpires.
Sponsored by: Netflix Inc.
Differential Revision: D6790
The PCI_IOV option creates character devices in /dev/iov for each PF
device driver that registers support for creating VFs. By default the
character device is named after the PF device (e.g. /dev/iov/foo0).
This change adds a variant of pci_iov_attach() called pci_iov_attach_name()
that allows the name of the /dev/iov entry to be specified by the
driver.
Reviewed by: rstone
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7400
callout_when(9). See the man page update for the description of the
intended use.
Tested by: pho
Reviewed by: jhb, bjk (man page updates)
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7137
capabilities. It was removed in r243624 and r254804/r271006
respectively.
This file and mbuf(9) needs updates for other offloading
capabilities(i.e. CSUM_SCTP and CSUM_TSO).
not scheduled -> scheduled -> running -> not scheduled. The API and the
manual page assume that, some comments in the code assume that, and looks
like some contributors to the code also did. The problem is that this
paradigm isn't true. A callout can be scheduled and running at the same
time, which makes API description ambigouous. In such case callout_stop()
family of functions/macros should return 1 and 0 at the same time, since it
successfully unscheduled future callout but the current one is running.
Before this change we returned 1 in such a case, with an exception that
if running callout was migrating we returned 0, unless CS_MIGRBLOCK was
specified.
With this change, we now return 0 in case if future callout was unscheduled,
but another one is still in action, indicating to API users that resources
are not yet safe to be freed.
However, the sleepqueue code relies on getting 1 return code in that case,
and there already was CS_MIGRBLOCK flag, that covered one of the edge cases.
In the new return path we will also use this flag, to keep sleepqueue safe.
Since the flag CS_MIGRBLOCK doesn't block migration and now isn't limited to
migration edge case, rename it to CS_EXECUTING.
This change fixes panics on a high loaded TCP server.
Reviewed by: jch, hselasky, rrs, kib
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7042
Add accessor functions to toggle the state per VNET.
The base system (vnet0) will always enable itself with the normal
registration. We will share the registered protocol handlers in all
VNETs minimising duplication and management.
Upon disabling netisr processing for a VNET drain the netisr queue from
packets for that VNET.
Update netisr consumers to (de)register on a per-VNET start/teardown using
VNET_SYS(UN)INIT functionality.
The change should be transparent for non-VIMAGE kernels.
Reviewed by: gnn (, hiren)
Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6691
specific order. VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.
Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.
Reviewed by: gnn, tuexen (sctp), jhb (kernel.h)
Obtained from: projects/vnet
MFC after: 2 weeks
X-MFC: do not remove pr_destroy
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6652
sglist_count_vmpages() determines the number of segments required for
a buffer described by an array of VM pages. sglist_append_vmpages()
adds the segments described by such a buffer to an sglist. The latter
function is largely pulled from sglist_append_bio(), and
sglist_append_bio() now uses sglist_append_vmpages().
Reviewed by: kib
Sponsored by: Chelsio Communications
Add a pair of bus methods that can be used to "map" resources for direct
CPU access using bus_space(9). bus_map_resource() creates a mapping and
bus_unmap_resource() releases a previously created mapping. Mappings are
described by 'struct resource_map' object. Pointers to these objects can
be passed as the first argument to the bus_space wrapper API used for bus
resources.
Drivers that wish to map all of a resource using default settings
(for example, using uncacheable memory attributes) do not need to change.
However, drivers that wish to use non-default settings can now do so
without jumping through hoops.
First, an RF_UNMAPPED flag is added to request that a resource is not
implicitly mapped with the default settings when it is activated. This
permits other activation steps (such as enabling I/O or memory decoding
in a device's PCI command register) to be taken without creating a
mapping. Right now the AGP drivers don't set RF_ACTIVE to avoid using
up a large amount of KVA to map the AGP aperture on 32-bit platforms.
Once RF_UNMAPPED is supported on all platforms that support AGP this
can be changed to using RF_UNMAPPED with RF_ACTIVE instead.
Second, bus_map_resource accepts an optional structure that defines
additional settings for a given mapping.
For example, a driver can now request to map only a subset of a resource
instead of the entire range. The AGP driver could also use this to only
map the first page of the aperture (IIRC, it calls pmap_mapdev() directly
to map the first page currently). I will also eventually change the
PCI-PCI bridge driver to request mappings of the subset of the I/O window
resource on its parent side to create mappings for child devices rather
than passing child resources directly up to nexus to be mapped. This
also permits bridges that do address translation to request suitable
mappings from a resource on the "upper" side of the bus when mapping
resources on the "lower" side of the bus.
Another attribute that can be specified is an alternate memory attribute
for memory-mapped resources. This can be used to request a
Write-Combining mapping of a PCI BAR in an MI fashion. (Currently the
drivers that do this call pmap_change_attr() directly for x86 only.)
Note that this commit only adds the MI framework. Each platform needs
to add support for handling RF_UNMAPPED and thew new
bus_map/unmap_resource methods. Generally speaking, any drivers that
are calling rman_set_bustag() and rman_set_bushandle() need to be
updated.
Discussed on: arch
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D5237
translate the pci rid to a controller ID. The translation could be based
on the 'msi-map' OFW property, a similar ACPI option, or hard-coded for
hardware lacking the above options.
Reviewed by: wma
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Add a new get_id interface to pci and pcib. This will allow us to both
detect failures, and get different PCI IDs.
For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.
For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.
A follow up commit will be made to remove pci_get_rid.
Reviewed by: jhb, rstone (previous version)
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6239
detect failures, and get different PCI IDs.
For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.
For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.
A follow up commit will be made to remove pci_get_rid.
Reviewed by: jhb, rstone
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6239
the bio to a pristine state should you wish to re-use it for another
I/O without freeing it. In the bast, a simple bzero was done to do
this, but that may not be sufficient in the future when the bio may
contain state that's not part of the documented API. Besides, it makes
the code clearer as to the intent...
Noticed by: smh@
bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Compared to the r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
It will be used for the upcoming LRO hash table initialization.
And probably will be useful in other cases, when M_WAITOK can't
be used.
Reviewed by: jhb, kib
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6138
bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Reviewed by: wblock (manpage)
Differential Revision: https://reviews.freebsd.org/D5519
Specifically, mention that rman_get_bustag/handle/virtual are valid after
a resource is activated. Also, mention the wrapper API that accepts a
struct resource instead of a bus tag and handle.
The BUS_RESCAN() method rescans a single bus device checking for devices
that have been added or removed from the bus. A new 'rescan' command is
added to devctl(8) to trigger a rescan.
Differential Revision: https://reviews.freebsd.org/D6016
This is a minor follow-up to r297422, prompted by a Coverity warning. (It's
not a real defect, just a code smell.) OSD slot array reservations are an
array of pointers (void **) but were cast to void* and back unnecessarily.
Keep the correct type from reservation to use.
osd.9 is updated to match, along with a few trivial igor fixes.
Reported by: Coverity
CID: 1353811
Sponsored by: EMC / Isilon Storage Division
This corrects an error in r296947 in that it is not possible to assert
which thread holds a shared (or read) lock, but it is possible to assert
that one is held. Just not very useful.
MFC after: 1 week
Submitted by: wblock, jhb
Reviewed by: kib (earlier version), jhb, wblock
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5659
SYSCTL_COUNTER_U64_ARRAY() macro.
- Add proper asserts to the SYSCTL_COUNTER_U64_ARRAY() macro that checks
the size of the first element of the array.
- Add an example to the counter(9) manual page how to use the
SYSCTL_COUNTER_U64_ARRAY() macro.
- Add some missing symbolic links for counter(9) while at it.
This is several year's worth of fail point upgrades done at EMC Isilon. They
are interdependent enough that it makes sense to put a single diff up for them.
Primarily, we added:
- Changing all mainline execution paths to be lockless, which lets us use fail
points in more sleep-sensitive areas, and allows more parallel execution
- A number of additional commands, including 'pause' that lets us do some
interesting deterministic repros of race conditions
- The ability to dump the stacks of all threads sleeping on a fail point
- A number of other API changes to allow marking up the fail point's context in
the code, and firing callbacks before and after execution
- A man page update
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: cem (earlier version), jhb, kib, pho
With feedback from: bdrewery
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5427
Since m_cat() may copy data from the second mbuf chain into the last mbuf
of the first chain, it may free the first mbuf of the second chain. Thus,
the second chain is not guaranteed to be valid after m_cat() returns.
Reviewed by: glebius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5497
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167. It has been a compat shim ever
since. It's time for the compat shim to go.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: sephe
Differential Revision: https://reviews.freebsd.org/D5131