VirtIO V1 provides configuration in multiple VENDOR capabilities so this
allows all of the configuration to be discovered.
Reviewed by: jhb
MFC after: 2 weeks
Differential Revision: https://reviews.freebsd.org/D14325
This covers the lua style guidelines we've generally agreed on so far. It
will be revised as work continues and we run into more scenarios that need
specified.
Discussed with: cem, jilles
Differential Revision: https://reviews.freebsd.org/D14423
Additionally, move the overflow check logic out to WOULD_OVERFLOW() for
consumers to have a common means of testing for overflowing allocations.
WOULD_OVERFLOW() should be a secondary check -- on 64-bit platforms, just
because an allocation won't overflow size_t does not mean it is a sane size
to request. Callers should be imposing reasonable allocation limits far,
far, below overflow.
Discussed with: emaste, jhb, kp
Sponsored by: Dell EMC Isilon
Similar to calloc() the mallocarray() function checks for integer
overflows before allocating memory.
It does not zero memory, unless the M_ZERO flag is set.
Reviewed by: pfg, vangyzen (previous version), imp (previous version)
Obtained from: OpenBSD
Differential Revision: https://reviews.freebsd.org/D13766
This copies changes from NetBSD into FreeBSD's man page. I compared the
proposed changes against FreeBSD headers and modified them to match.
PR: 214602
Submitted by: fehmi noyan isi <fnoyanisi@yahoo.com>
Add atomic_load_<type> and atomic_store_<type>, and explain why they
exist.
Define the synchronizes-with relationship and its effects.
Reorder and revise some of the existing text. For example, more
precisely describe when ordinary accesses are atomic.
Reviewed by: jhb, kib
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D13522
the expected default board_vendor value on MIPS SoCs.
This is required by bwn(4) to differentiate between single-band and
dual-band device variants that otherwise share a common chip ID.
Approved by: adrian (mentor, implicit)
Sponsored by: The FreeBSD Foundation
Add missing support for specifying I/O control flags during core reset,
and resolve a number of siba(4)-specific reset issues:
- Add missing check for target reject flags in siba_is_hw_suspended().
- Remove incorrect wait on SIBA_TMH_BUSY when modifying any target state
register; this should only be done when waiting for initiated
transactions to clear.
- Add missing wait on SIBA_IM_BY when asserting SIBA_IM_RJ.
- Overwrite any previously set SIBA_TML_REJ flag when bringing the core
out of reset. This fixes a lockup that occured when we brought up a core
(after reboot) that had previously been placed into RESET by siba_bwn(4).
Approved by: adrian (mentor, implicit)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D13039
This includes a number of copyedits for the inline code documentation
comments, updates to the existing bhnd(4), bhndb(4), bcma(4), and siba(4)
man pages, and new man pages for bhnd_chipc(4), bhnd_pmu(4), bhndb_pci(4),
bhnd(9), and bhnd_erom(9).
Approved by: adrian (mentor)
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D13021
The old description has been inaccurate since at least 243271, if not
before.
Submitted by: will
Reviewed by: kib
MFC after: 3 weeks
Sponsored by: Spectra Logic Corp
Differential Revision: https://reviews.freebsd.org/D13108
xlint is currently a fossil. We have much more useful and alive tools
to do now what xlint did twenty years ago.
I did not cleared some stuff which makes lint operational, in
sys/x86/include and sys/sys, but I might do it as followup. The
x86/include/ucontext.h and _types.h hacks made to please lint was the
main reason for my initial proposal to classify xlint as obsolete and
to remove it.
Also I do not intend to clear sccs ids.
Reviewed by: bapt, brooks, emaste, jhb, pfg
Sponsored by: The FreeBSD Foundation
Differential revision: https://reviews.freebsd.org/D13015
This introduces a facility to EVENTHANDLER(9) for explicitly defining a
reference to an event handler list. This is useful since previously all
invokers of events had to do a locked traversal of the global list of
event handler lists in order to find the appropriate event handler list.
By keeping a pointer to the appropriate list an invoker can avoid this
traversal completely. The pointer is initialized with SYSINIT(9) during
the eventhandler stage. Users registering interest in events do not need
to know if the event is backed by such a list, since the list is added
to the global list of lists. As with lists that are not pre-defined it
is safe to register for the events before the list has been created.
This converts the process_* and thread_* events to using the new
facility, as these are events whose locked traversals end up showing up
significantly in ports build workflows (and presumably other workflows
with many short lived threads/procs). It may be advantageous to convert
other events to using the new facility.
The el_flags field is now unused, but leave it be so that this revision
can be MFC'd.
Reviewed by: bdrewery, markj, mjg
Approved by: rstone (mentor)
In collaboration with: ian
MFC after: 4 weeks
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12814
fine when a lot of different flows to be ciphered/deciphered are involved.
However, when a software crypto driver is used, there are
situations where we could benefit from making crypto(9) multi threaded:
- a single flow is to be ciphered: only one thread is used to cipher it,
- a single ESP flow is to be deciphered: only one thread is used to
decipher it.
The idea here is to call crypto(9) using a new mode (CRYPTO_F_ASYNC) to
dispatch the crypto jobs on multiple threads, if the underlying crypto
driver is working in synchronous mode.
Another flag is added (CRYPTO_F_ASYNC_KEEPORDER) to make crypto(9)
dispatch the crypto jobs in the order they are received (an additional
queue/thread is used), so that the packets are reinjected in the network
using the same order they were posted.
A new sysctl net.inet.ipsec.async_crypto can be used to activate
this new behavior (disabled by default).
Submitted by: Emeric Poupon <emeric.poupon@stormshield.eu>
Reviewed by: ae, jmg, jhb
Differential Revision: https://reviews.freebsd.org/D10680
Sponsored by: Stormshield
Mention per-location total order, out of thin air, and torn writes
guarantees. Mention C11 standard' memory model and one most important
FreeBSD additional requirement, that is aligned ordinary loads and
stores are atomic on processors.
The text is introductional and informal. Reference the C11 and
C++1{1,4,7} standards for authorative description.
In collaboration with: alc
Sponsored by: The FreeBSD Foundation (kib)
MFC after: 1 week
mbpool existed to support NICs with memory interfaces and all remaining
comsumers were removed earlier this year with NATM.
Reviewed by: jhb
Sponsored by: DARPA, AFRL
Differential Revision: https://reviews.freebsd.org/D10513
Previously before you could call unrhdr_delete you needed to
individually free every allocated unit. It is useful to be able to tear
down the unr without having to go through this process, as it is
significantly faster than freeing the individual units.
Reviewed by: cem, lidl
Approved by: rstone (mentor)
Sponsored by: Dell EMC Isilon
Differential Revision: https://reviews.freebsd.org/D12591
All of these arguments are stored in m_ext, so there is no reason
to pass them in the argument list. Not all functions need the second
argument, some don't even need the first one. The second argument
lives in next cache line, so not dereferencing it is a performance
gain. This was discovered in sendfile(2), which will be covered by
next commits.
The second goal of this commit is to bring even more flexibility
to m_ext mbufs, allowing to create more fields in m_ext, opaque to
the generic mbuf code, and potentially set and dereferenced by
subsystems.
Reviewed by: gallatin, kbowling
Differential Revision: https://reviews.freebsd.org/D12615
When the EVENTHANDLER(9) subsystem was created, it was a documented feature
that an eventhandler callback function could safely deregister itself. In
r200652 that feature was inadvertantly broken by adding drain-wait logic to
eventhandler_deregister(), so that it would be safe to unload a module upon
return from deregistering its event handlers.
There are now 145 callers of EVENTHANDLER_DEREGISTER(), and it's likely many
of them are depending on the drain-wait logic that has been in place for 8
years. So instead of creating a separate eventhandler_drain() and adding it
to some or all of those 145 call sites, this creates a separate
eventhandler_drain_nowait() function for the specific purpose of
deregistering a callback from within the running callback.
Differential Revision: https://reviews.freebsd.org/D12561
during drain operations. When an sbuf is configured to use this feature by way
of the SBUF_DRAINTOEOR sbuf_new() flag, top-level sections started with
sbuf_start_section() create a record boundary marker that is used to avoid
flushing partial records.
Reviewed by: cem,imp,wblock
MFC after: 2 weeks
Sponsored by: Netflix, Inc.
Differential Revision: https://reviews.freebsd.org/D8536
it automatically after it runs.
The config_intrhook mechanism allows a driver to stall the boot process
until device(s) required for booting are available, by not allowing system
inits to proceed until all intrhook functions have been unregistered.
Virtually all existing code simply unregisters from within the hook function
when it gets called.
This new function makes that common usage more convenient. Instead of
allocating and filling in a struct, passing it to a function that might (in
theory) fail, and checking the return code, now a driver can simply call
this cannot-fail routine, passing just the intrhook function and its arg.
Differential Revision: https://reviews.freebsd.org/D11963
Implement disk_add_alias to allow aliases to be added to disks. All
disk have a primary name (say "foo") can also have secondary names
(say "bar") such that all instances of "foo" also have a "bar"
alias. So if you have foo0, foo0p1, foo1, foo1s1 and foo1s1a nodes
created by the foo driver and gpart, device nodes bar0, bar0p1, bar1,
bar1s1 and bar1s1a will appear as symlinks back to the original nodes.
This generalizes to multiple aliases. However, since the unit number
follows the primary name, multiple device drivers can't create the
same aliases unless those drives coorinate the unit number space (eg
you couldn't add an alias 'disk' to both 'da' and 'ada' because it's
possible to have da0 and ada0, because 'disk0' is ambiguous).
Differential Revision: https://reviews.freebsd.org/D11873
over the scheduling precision than 'ticks' can offer, and because sometimes
you're already working with sbintime_t units and it's dumb to convert them
to ticks just so they can get converted back to sbintime_t under the hood.
The benefit of BIT_FLS() is that ffsl() can be implemented with a
count leading zeros instruction which is more widespread available.
Submitted by: Sebastian Huber <sebastian.huber@embedded-brains.de>
MFC after: 1 week
stack modules.
It adds support for mangling symbols exported by a module by prepending
a string to them. (This avoids overlapping symbols in the kernel linker.)
It allows the use of a macro as the module name in the DECLARE_MACRO()
and MACRO_VERSION() macros.
It allows the code to register stack aliases (e.g. both a generic name
["default"] and version-specific name ["default_10_3p1"]).
With these changes, it is trivial to compile TCP stack modules with
the name defined in the Makefile and to load multiple versions of the
same stack simultaneously. This functionality can be used to enable
side-by-side testing of an old and new version of the same TCP stack.
It also could support upgrading the TCP stack without a reboot.
Reviewed by: gnn, sjg (makefiles only)
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D11086
Update the documentation to catch up with r273174, which renamed
getenv -> kern_getenv
setenv -> kern_setenv
unsetenv -> kern_unsetenv
Leave the old links in place to support finger memory.
MFC after: 3 days
Sponsored by: Dell EMC
This function permits a range of one scatter/gather list to be appended to
another sglist. This can be used to construct a scatter/gather list that
reorders or duplicates ranges from one or more existing scatter/gather
lists.
Sponsored by: Chelsio Communications
Attempt to catch up to the KPI changes from r292373, and perform
some other tidying while in the area.
Reviewed by: kib
Differential Revision: https://reviews.freebsd.org/D10579
- move sbuf_bcopyin(9) and sbuf_copyin(9) near sbuf_new_for_sysctl(9), as
all three functions are kernel-only APIs.
- add #ifdef _KERNEL around sbuf_*copyin and sbuf_new_for_sysctl(9) to
make it visually clear that they are kernel-only APIs.
MFC after: 2 months
Sponsored by: Dell EMC Isilon
This shortens the column count on many lines considerably.
While here, add "(void)" to sbuf_new_auto(3) for consistency with style(9)
recommendations.
MFC after: 2 months
Sponsored by: Dell EMC Isilon
igor:
- Fix typos.
- Delete trailing whitespace.
manlint:
- Use .Fo/.Fc/.Fa when describing functions.
- Use .Xr.
- Fill in SEE ALSO section.
- Fix .Dt use: the section was specified incorrectly and the name
had a lowercase character.
- Continue new sentences on new lines.
Miscellaneous:
- Remove unnecessary quotes around "SEE ALSO" section headers.
- Sprinkle .Dv use in spots with constants.
Reported by: igor, make manlint
Sponsored by: Dell EMC Isilon
Add missing section number when referring to PCI_IOV_*INIT(9) with .Xr
from the other corresponding manpage.
MFC after: 1 week
Reported by: make manlint
Sponsored by: Dell EMC Isilon
- Expand a contraction [1].
- Add a missing section number when referring to uma(9) with .Xr .
MFC after: 1 week
Reported by: igor [1], make manlint [2]
Sponsored by: Dell EMC Isilon
man(1) has some logic to use two spaces after a full stop, which is
useful for spotting sentence breaks in monospace fonts. However,
this logic is very simple, treating almost all '.' characters as
end-of-sentence markers, unless followed by certain other
characters. For example, '.,' is not end-of-sentence, and neither
is ".) ", but ".)" at the end of a line triggers the sentence-end
detection.
Apply a zero-width space to a few instances of this in share/man,
and also supply a missing full stop for an instance that occurred at
the end of a sentence.
Leave untouched several instances that are at the end of a sentence
or list element.
Reported by: 0mp (ieee80211.9)
I moved this branch from github to a private server, and pulled from the
wrong one when committing r315280, so I failed to include two recent commits.
Thankfully, they were only cosmetic and were included in the review.
Specifically:
Add documentation, polish comments, and improve style(9).
Tested by: pho (r315280)
MFC after: 2 weeks
Sponsored by: Dell EMC
Differential Revision: https://reviews.freebsd.org/D9791
sbuf_hexdump(9) should be linked to sbuf(9), not hexdump(3). Another
review will be posted to deduplicate the sbuf_hexdump reference in
in hexdump(3) or at the very least make the information less duplicative.
MFC after: 1 week
X-MFC with: r313437
Sponsored by: Dell EMC Isilon
to stdout in the non-kernel case and to the console+log
in the kernel case. For the kernel case it hooks the
putbuf() machinery underneath printf(9) so that the buffer
is written completely atomically and without a copy into
another temporary buffer. This is useful for fixing
compound console/log messages that become broken and
interleaved when multiple threads are competing for the
console.
Reviewed by: ken, imp
Sponsored by: Netflix
compile options. Remove doxygen pointers to now deleted files. Remove
EISA and VME as examples in bus_space.9.
Retained EISA mode code for IO PIC and MPTABLES because that's not
EISA bus, per se, and some people have abused EISA to mean "EISA-like
behavior as opposed to ISA" rather than using it for EISA add-in
cards.
Relnotes: yes
Replace archaic "busses" with modern form "buses."
Intentionally excluded:
* Old/random drivers I didn't recognize
* Old hardware in general
* Use of "busses" in code as identifiers
No functional change.
http://grammarist.com/spelling/buses-busses/
PR: 216099
Reported by: bltsrc at mail.ru
Sponsored by: Dell EMC Isilon
These primitives give the caller the read value if the exchange attempt
failed which saves an explicit reload for cmpset loops.
The man page was partially submitted by kib.
Reviewed by: kib (previous version), jhb (previous version)
I'm currently working on writing a metrics exporter for the Prometheus
monitoring system to provide access to sysctl metrics. Prometheus and
sysctl have some structural differences:
- sysctl is a tree of string component names.
- Prometheus uses a flat namespace for its metrics, but allows you to
attach labels with values to them, so that you can do aggregation.
An initial version of my exporter simply translated
hw.acpi.thermal.tz1.temperature
to
sysctl_hw_acpi_thermal_tz1_temperature_celcius
while we should ideally have
sysctl_hw_acpi_thermal_temperature_celcius{thermal_zone="tz1"}
allowing you to graph all thermal zones on a system in one go.
The change presented in this commit adds support for accomplishing this,
by providing the ability to attach labels to nodes. In the example I
gave above, the label "thermal_zone" would be attached to "tz1". As this
is a feature that will only be used very rarely, I decided to not change
the KPI too aggressively.
Discussed on: hackers@
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D8775
VFP code to store the old context, with lazy loading of the new context
when needed.
FPU_KERN_NOCTX is missing as this is unused in the crypto code this has
been tested with, and I am unsure on the requirements of the UEFI
Runtime Services.
Reviewed by: kib
Obtained from: ABT Systeems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D8276
- Add m_getclr(9) symlink to ObsoleteFiles.inc (removed in r295481).
- Add const qualifiers in m_dup(), m_dup_pkthdr() and m_tag_copy_chain()
(r286450).
- Fix m_dup_pkthdr() definition (it's not the same as m_move_pkthdr()).
MFC after: 5 days
function from restarting the timer.
Commonly taskqueue_enqueue_timeout() is called from within the task
function itself without any checks for teardown. Then it can happen
the timer stays active after the return of taskqueue_drain_timeout(),
because the timeout and task is drained separately.
This patch factors out the teardown flag into the timeout task itself,
allowing existing code to stay as-is instead of applying a teardown
flag to each and every of the timeout task consumers.
Add assert to taskqueue_drain_timeout() which prevents parallel
execution on the same timeout task.
Update manual page documenting the return value of
taskqueue_enqueue_timeout().
Differential Revision: https://reviews.freebsd.org/D8012
Reviewed by: kib, trasz
MFC after: 1 week
In particular, reset the DF_QUIET flag when detaching from a device so
that a driver that marks a device quiet doesn't dictate policy for a
different driver that may claim the device in the future.
Reviewed by: rpokala, wblock
MFC after: 2 weeks
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7803
The flag specifies that the block which uses FPU must be executed in
critical section, i.e. take no context switches, and does not need an
FPU save area during the execution.
It is intended to be applied around fast and short code pathes where
save area allocation is impossible or undesirable, due to context or
due to the relative cost of calculation vs. allocation.
Sponsored by: The FreeBSD Foundation
MFC after: 2 weeks
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register. This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.
Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.
Reviewed by: imp, wblock (earlier version)
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7751
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.
This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.
Reviewed by: imp
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7667
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.
Reviewed by: alc, imp, kib
Differential Revision: https://reviews.freebsd.org/D7714
alternate TCP stack in other then the closed state (pre-listen/connect).
The idea is that *if* that is supported by the alternate stack, it
is asked if its ok to switch. If it approves the "handoff" then we
allow the switch to happen. Also the fini() function now gets a flag
to tell if you are switching away *or* the tcb is destroyed. The
init() call into the alternate stack is moved to the end so the
tcb is more fully formed before the init transpires.
Sponsored by: Netflix Inc.
Differential Revision: D6790
The PCI_IOV option creates character devices in /dev/iov for each PF
device driver that registers support for creating VFs. By default the
character device is named after the PF device (e.g. /dev/iov/foo0).
This change adds a variant of pci_iov_attach() called pci_iov_attach_name()
that allows the name of the /dev/iov entry to be specified by the
driver.
Reviewed by: rstone
MFC after: 1 month
Sponsored by: Chelsio Communications
Differential Revision: https://reviews.freebsd.org/D7400
callout_when(9). See the man page update for the description of the
intended use.
Tested by: pho
Reviewed by: jhb, bjk (man page updates)
Sponsored by: The FreeBSD Foundation
MFC after: 1 month
X-Differential revision: https://reviews.freebsd.org/D7137
capabilities. It was removed in r243624 and r254804/r271006
respectively.
This file and mbuf(9) needs updates for other offloading
capabilities(i.e. CSUM_SCTP and CSUM_TSO).
not scheduled -> scheduled -> running -> not scheduled. The API and the
manual page assume that, some comments in the code assume that, and looks
like some contributors to the code also did. The problem is that this
paradigm isn't true. A callout can be scheduled and running at the same
time, which makes API description ambigouous. In such case callout_stop()
family of functions/macros should return 1 and 0 at the same time, since it
successfully unscheduled future callout but the current one is running.
Before this change we returned 1 in such a case, with an exception that
if running callout was migrating we returned 0, unless CS_MIGRBLOCK was
specified.
With this change, we now return 0 in case if future callout was unscheduled,
but another one is still in action, indicating to API users that resources
are not yet safe to be freed.
However, the sleepqueue code relies on getting 1 return code in that case,
and there already was CS_MIGRBLOCK flag, that covered one of the edge cases.
In the new return path we will also use this flag, to keep sleepqueue safe.
Since the flag CS_MIGRBLOCK doesn't block migration and now isn't limited to
migration edge case, rename it to CS_EXECUTING.
This change fixes panics on a high loaded TCP server.
Reviewed by: jch, hselasky, rrs, kib
Approved by: re (gjb)
Differential Revision: https://reviews.freebsd.org/D7042
Add accessor functions to toggle the state per VNET.
The base system (vnet0) will always enable itself with the normal
registration. We will share the registered protocol handlers in all
VNETs minimising duplication and management.
Upon disabling netisr processing for a VNET drain the netisr queue from
packets for that VNET.
Update netisr consumers to (de)register on a per-VNET start/teardown using
VNET_SYS(UN)INIT functionality.
The change should be transparent for non-VIMAGE kernels.
Reviewed by: gnn (, hiren)
Obtained from: projects/vnet
MFC after: 2 weeks
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6691
specific order. VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.
Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.
Reviewed by: gnn, tuexen (sctp), jhb (kernel.h)
Obtained from: projects/vnet
MFC after: 2 weeks
X-MFC: do not remove pr_destroy
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6652
sglist_count_vmpages() determines the number of segments required for
a buffer described by an array of VM pages. sglist_append_vmpages()
adds the segments described by such a buffer to an sglist. The latter
function is largely pulled from sglist_append_bio(), and
sglist_append_bio() now uses sglist_append_vmpages().
Reviewed by: kib
Sponsored by: Chelsio Communications
Add a pair of bus methods that can be used to "map" resources for direct
CPU access using bus_space(9). bus_map_resource() creates a mapping and
bus_unmap_resource() releases a previously created mapping. Mappings are
described by 'struct resource_map' object. Pointers to these objects can
be passed as the first argument to the bus_space wrapper API used for bus
resources.
Drivers that wish to map all of a resource using default settings
(for example, using uncacheable memory attributes) do not need to change.
However, drivers that wish to use non-default settings can now do so
without jumping through hoops.
First, an RF_UNMAPPED flag is added to request that a resource is not
implicitly mapped with the default settings when it is activated. This
permits other activation steps (such as enabling I/O or memory decoding
in a device's PCI command register) to be taken without creating a
mapping. Right now the AGP drivers don't set RF_ACTIVE to avoid using
up a large amount of KVA to map the AGP aperture on 32-bit platforms.
Once RF_UNMAPPED is supported on all platforms that support AGP this
can be changed to using RF_UNMAPPED with RF_ACTIVE instead.
Second, bus_map_resource accepts an optional structure that defines
additional settings for a given mapping.
For example, a driver can now request to map only a subset of a resource
instead of the entire range. The AGP driver could also use this to only
map the first page of the aperture (IIRC, it calls pmap_mapdev() directly
to map the first page currently). I will also eventually change the
PCI-PCI bridge driver to request mappings of the subset of the I/O window
resource on its parent side to create mappings for child devices rather
than passing child resources directly up to nexus to be mapped. This
also permits bridges that do address translation to request suitable
mappings from a resource on the "upper" side of the bus when mapping
resources on the "lower" side of the bus.
Another attribute that can be specified is an alternate memory attribute
for memory-mapped resources. This can be used to request a
Write-Combining mapping of a PCI BAR in an MI fashion. (Currently the
drivers that do this call pmap_change_attr() directly for x86 only.)
Note that this commit only adds the MI framework. Each platform needs
to add support for handling RF_UNMAPPED and thew new
bus_map/unmap_resource methods. Generally speaking, any drivers that
are calling rman_set_bustag() and rman_set_bushandle() need to be
updated.
Discussed on: arch
Reviewed by: cem
Differential Revision: https://reviews.freebsd.org/D5237
translate the pci rid to a controller ID. The translation could be based
on the 'msi-map' OFW property, a similar ACPI option, or hard-coded for
hardware lacking the above options.
Reviewed by: wma
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Add a new get_id interface to pci and pcib. This will allow us to both
detect failures, and get different PCI IDs.
For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.
For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.
A follow up commit will be made to remove pci_get_rid.
Reviewed by: jhb, rstone (previous version)
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6239
detect failures, and get different PCI IDs.
For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.
For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.
A follow up commit will be made to remove pci_get_rid.
Reviewed by: jhb, rstone
Obtained from: ABT Systems Ltd
Sponsored by: The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D6239
the bio to a pristine state should you wish to re-use it for another
I/O without freeing it. In the bast, a simple bzero was done to do
this, but that may not be sufficient in the future when the bio may
contain state that's not part of the documented API. Besides, it makes
the code clearer as to the intent...
Noticed by: smh@
bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Compared to the r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
It will be used for the upcoming LRO hash table initialization.
And probably will be useful in other cases, when M_WAITOK can't
be used.
Reviewed by: jhb, kib
Sponsored by: Microsoft OSTC
Differential Revision: https://reviews.freebsd.org/D6138
bus_get_cpus() returns a specified set of CPUs for a device. It accepts
an enum for the second parameter that indicates the type of cpuset to
request. Currently two valus are supported:
- LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
the device when DEVICE_NUMA is enabled)
- INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)
For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL. INTR_CPUS is mapped to 'all_cpus'
by default. The idea is that INTR_CPUS should always return a valid set.
Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).
The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.
The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled. They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.
Reviewed by: wblock (manpage)
Differential Revision: https://reviews.freebsd.org/D5519
Specifically, mention that rman_get_bustag/handle/virtual are valid after
a resource is activated. Also, mention the wrapper API that accepts a
struct resource instead of a bus tag and handle.
The BUS_RESCAN() method rescans a single bus device checking for devices
that have been added or removed from the bus. A new 'rescan' command is
added to devctl(8) to trigger a rescan.
Differential Revision: https://reviews.freebsd.org/D6016
This is a minor follow-up to r297422, prompted by a Coverity warning. (It's
not a real defect, just a code smell.) OSD slot array reservations are an
array of pointers (void **) but were cast to void* and back unnecessarily.
Keep the correct type from reservation to use.
osd.9 is updated to match, along with a few trivial igor fixes.
Reported by: Coverity
CID: 1353811
Sponsored by: EMC / Isilon Storage Division
This corrects an error in r296947 in that it is not possible to assert
which thread holds a shared (or read) lock, but it is possible to assert
that one is held. Just not very useful.
MFC after: 1 week
Submitted by: wblock, jhb
Reviewed by: kib (earlier version), jhb, wblock
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5659
SYSCTL_COUNTER_U64_ARRAY() macro.
- Add proper asserts to the SYSCTL_COUNTER_U64_ARRAY() macro that checks
the size of the first element of the array.
- Add an example to the counter(9) manual page how to use the
SYSCTL_COUNTER_U64_ARRAY() macro.
- Add some missing symbolic links for counter(9) while at it.
This is several year's worth of fail point upgrades done at EMC Isilon. They
are interdependent enough that it makes sense to put a single diff up for them.
Primarily, we added:
- Changing all mainline execution paths to be lockless, which lets us use fail
points in more sleep-sensitive areas, and allows more parallel execution
- A number of additional commands, including 'pause' that lets us do some
interesting deterministic repros of race conditions
- The ability to dump the stacks of all threads sleeping on a fail point
- A number of other API changes to allow marking up the fail point's context in
the code, and firing callbacks before and after execution
- A man page update
Submitted by: Matthew Bryan <matthew.bryan@isilon.com>
Reviewed by: cem (earlier version), jhb, kib, pho
With feedback from: bdrewery
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D5427
Since m_cat() may copy data from the second mbuf chain into the last mbuf
of the first chain, it may free the first mbuf of the second chain. Thus,
the second chain is not guaranteed to be valid after m_cat() returns.
Reviewed by: glebius
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D5497
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167. It has been a compat shim ever
since. It's time for the compat shim to go.
Submitted by: Howard Su <howard0su@gmail.com>
Reviewed by: sephe
Differential Revision: https://reviews.freebsd.org/D5131
Summary:
Many instances of bus_alloc_resource() simply use 0 and ~0 for start and end to
denote 'anywhere' with a given count. To clean this up, introduce a
bus_alloc_resource_anywhere() convenience function.
Bump __FreeBSD_version for the new API.
Reviewed By: jhb
Differential Revision: https://reviews.freebsd.org/D5370
Summary:
Migrate to using the semi-opaque type rman_res_t to specify rman resources. For
now, this is still compatible with u_long.
This is step one in migrating rman to use uintmax_t for resources instead of
u_long.
Going forward, this could feasibly be used to specify architecture-specific
definitions of resource ranges, rather than baking a specific integer type into
the API.
This change has been broken out to facilitate MFC'ing drivers back to 10 without
breaking ABI.
Reviewed By: jhb
Sponsored by: Alex Perez/Inertial Computing
Differential Revision: https://reviews.freebsd.org/D5075
control algorithm options. The argument is variable length and is opaque
to TCP, forwarded directly to the algorithm's ctl_output method.
Provide new includes directory netinet/cc, where algorithm specific
headers can be installed.
The new API doesn't yet have any in tree consumers.
The original code written by lstewart.
Reviewed by: rrs, emax
Sponsored by: Netflix
Differential Revision: https://reviews.freebsd.org/D711
This API has no in-tree consumers at the moment but is useful to at least
one out-of-tree consumer, and naturally complements existing vnode refcount
functions (vholdl(9), vdropl(9)).
Obtained from: kib (sys/ portion)
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4947
Differential Revision: https://reviews.freebsd.org/D4953
Immediate problem fixed by the new KPI is the long-standing race
between device creation and assignments to cdev->si_drv1 and
cdev->si_drv2, which allows the window where cdevsw methods might be
called with si_drv1,2 fields not yet set. Devices typically checked
for NULL and returned spurious errors to usermode, and often left some
methods unchecked.
The new function interface is designed to be extensible, which should
allow to add more features to make_dev_s(9) without inventing yet
another name for function to create devices, while maintaining KPI and
even KBI backward-compatibility.
Reviewed by: hps, jhb
Sponsored by: The FreeBSD Foundation
MFC after: 3 weeks
Differential revision: https://reviews.freebsd.org/D4746
While here, explicitly note the requirement that the BAR(s) must be
allocated prior to calling pci_alloc_msix().
Reviewed by: andrew, emaste
MFC after: 1 week
Differential Revision: https://reviews.freebsd.org/D4688
exhausted.
It is possible for a bug in the code (or, theoretically, even unusual
network conditions) to exhaust all possible mbufs or mbuf clusters.
When this occurs, things can grind to a halt fairly quickly. However,
we currently do not call mb_reclaim() unless the entire system is
experiencing a low-memory condition.
While it is best to try to prevent exhaustion of one of the mbuf zones,
it would also be useful to have a mechanism to attempt to recover from
these situations by freeing "expendable" mbufs.
This patch makes two changes:
a) The patch adds a generic API to the UMA zone allocator to set a
function that should be called when an allocation fails because the
zone limit has been reached. Because of the way this function can be
called, it really should do minimal work.
b) The patch uses this API to try to free mbufs when an allocation
fails from one of the mbuf zones because the zone limit has been
reached. The function schedules a callout to run mb_reclaim().
Differential Revision: https://reviews.freebsd.org/D3864
Reviewed by: gnn
Comments by: rrs, glebius
MFC after: 2 weeks
Sponsored by: Juniper Networks
o With new KPI consumers can request contiguous ranges of pages, and
unlike before, all pages will be kept busied on return, like it was
done before with the 'reqpage' only. Now the reqpage goes away. With
new interface it is easier to implement code protected from race
conditions.
Such arrayed requests for now should be preceeded by a call to
vm_pager_haspage() to make sure that request is possible. This
could be improved later, making vm_pager_haspage() obsolete.
Strenghtening the promises on the business of the array of pages
allows us to remove such hacks as swp_pager_free_nrpage() and
vm_pager_free_nonreq().
o New KPI accepts two integer pointers that may optionally point at
values for read ahead and read behind, that a pager may do, if it
can. These pages are completely owned by pager, and not controlled
by the caller.
This shifts the UFS-specific readahead logic from vm_fault.c, which
should be file system agnostic, into vnode_pager.c. It also removes
one VOP_BMAP() request per hard fault.
Discussed with: kib, alc, jeff, scottl
Sponsored by: Nginx, Inc.
Sponsored by: Netflix
These helper functions can be used to read in or write a buffer from or to
an arbitrary process' address space. Without them, this can only be done
using proc_rwmem(), which requires the caller to fill out a uio. This is
onerous and results in code duplication; the new functions provide a simpler
interface which is sufficient for most existing callers of proc_rwmem().
This change also adds a manual page for proc_rwmem() and the new functions.
Reviewed by: jhb, kib
Differential Revision: https://reviews.freebsd.org/D4245
It was allowed before, but make it very explicit it is allowed now. And
prefer 'bool' to older types that were used for the same purpose -- int and
boolean_t.
Like with the C99 fixed-width types, use common sense when changing old
code.
No igor regressions.
Suggested by: bde <20151205031713.T3286@besplex.bde.org>
Reviewed by: glebius, davide, bapt (earlier versions)
Reviewed by: imp
Feedback from: julian
Sponsored by: EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D4384
like the various d_*_t typedefs since it declared a function pointer rather
than a function. Add a new d_priv_dtor_t typedef that declares the function
and can be used as a function prototype. The previous typedef wasn't
useful outside of the cdevpriv implementation, so retire it.
The name d_priv_dtor_t was chosen to be more consistent with cdev methods
since it is commonly used in place of d_close_t even though it is not a
direct pointer in struct cdevsw.
Reviewed by: kib, imp
MFC after: 1 month
Differential Revision: https://reviews.freebsd.org/D4340
critical section.
uma_zalloc_arg()/uma_zalloc_free() may acquire a sleepable lock on the
zone. The malloc() family of functions may call uma_zalloc_arg() or
uma_zalloc_free().
The malloc(9) man page currently claims that free() will never sleep.
It also implies that the malloc() family of functions will not sleep
when called with M_NOWAIT. However, it is more correct to say that
these functions will not sleep indefinitely. Indeed, they may acquire
a sleepable lock. However, a developer may overlook this restriction
because the WITNESS check that catches attempts to call the malloc()
family of functions within a critical section is inconsistenly
applied.
This change clarifies the language of the malloc(9) man page to clarify
the restriction against calling the malloc() family of functions
while in a critical section or holding a spin lock. It also adds
KASSERTs at appropriate points to make the enforcement of this
restriction more consistent.
PR: 204633
Differential Revision: https://reviews.freebsd.org/D4197
Reviewed by: markj
Approved by: gnn (mentor)
Sponsored by: Juniper Networks
new return codes of -1 were mistakenly being considered "true". Callout_stop
now returns -1 to indicate the callout had either already completed or
was not running and 0 to indicate it could not be stopped. Also update
the manual page to make it more consistent no non-zero in the callout_stop
or callout_reset descriptions.
MFC after: 1 Month with associated callout change.
Igor has many less complaints now. I think the two remaining are bogus, but I
am also not sure why Igor is producing them.
The page still needs more work.
Sponsored by: EMC / Isilon Storage Division
should be used by TCP for sure in its cleanup of the IN-PCB (will be coming shortly).
Sponsored by: Netflix Inc.
Differential Revision: https://reviews.freebsd.org/D4076
Add S8, S16, S32, and U32 types; add SYSCTL*() macros for them, as well
as for the existing 64-bit types. (While SYSCTL*QUAD and UQUAD macros
already exist, they do not take the same sort of 'val' parameter that
the other macros do.)
Clean up the documented "types" in the sysctl.9 document. (These are
macros and thus not real types, but the manual page documents intent.)
The sysctl_add_oid(9) arg2 has been bumped from intptr_t to intmax_t to
accommodate 64-bit types on 32-bit pointer architectures.
This is just the kernel support piece; the userspace sysctl(1) support
will follow in a later patch.
Submitted by: Ravi Pokala <rpokala@panasas.com>
Reviewed by: cem
Relnotes: no
Sponsored by: Panasas
Differential Revision: https://reviews.freebsd.org/D4091
PCI-Express capability registers (that is, PCI config registers in the
standard PCI config space belonging to the PCI-Express capability
register set).
Note that all of the current PCI-e registers are either 16 or 32-bits,
so only widths of 2 or 4 bytes are supported.
Reviewed by: imp
MFC after: 1 week
Sponsored by: Chelsio
Differential Revision: https://reviews.freebsd.org/D4088