Commit Graph

2503 Commits

Author SHA1 Message Date
John Baldwin
71499f6a2d Make device_quiet() an attachment property.
In particular, reset the DF_QUIET flag when detaching from a device so
that a driver that marks a device quiet doesn't dictate policy for a
different driver that may claim the device in the future.

Reviewed by:	rpokala, wblock
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7803
2016-09-12 18:06:42 +00:00
Konstantin Belousov
cf1c47763f Add FPU_KERN_NOCTX flag to the fpu_kern_enter() function on amd64.
The flag specifies that the block which uses FPU must be executed in
critical section, i.e. take no context switches, and does not need an
FPU save area during the execution.

It is intended to be applied around fast and short code pathes where
save area allocation is impossible or undesirable, due to context or
due to the relative cost of calculation vs. allocation.

Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2016-09-11 09:14:07 +00:00
John Baldwin
da0fc9250c Reset PCI pass through devices via PCI-e FLR during VM start and end.
Add routines to trigger a function level reset (FLR) of a PCI-express
device via the PCI-express device control register.  This also includes
support routines to wait for pending transactions to complete as well
as calculating the maximum completion timeout permitted by a device.

Change the ppt(4) driver to reset pass through devices before attaching
to a VM during startup and before detaching from a VM during shutdown.

Reviewed by:	imp, wblock (earlier version)
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7751
2016-09-06 21:15:35 +00:00
John Baldwin
64414cc00f Update the I/O MMU in bhyve when PCI devices are added and removed.
When the I/O MMU is active in bhyve, all PCI devices need valid entries
in the DMAR context tables. The I/O MMU code does a single enumeration
of the available PCI devices during initialization to add all existing
devices to a domain representing the host. The ppt(4) driver then moves
pass through devices in and out of domains for virtual machines as needed.
However, when new PCI devices were added at runtime either via SR-IOV or
HotPlug, the I/O MMU tables were not updated.

This change adds a new set of EVENTHANDLERS that are invoked when PCI
devices are added and deleted. The I/O MMU driver in bhyve installs
handlers for these events which it uses to add and remove devices to
the "host" domain.

Reviewed by:	imp
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7667
2016-09-06 20:17:54 +00:00
Mark Johnston
dbbaf04f1e Remove support for idle page zeroing.
Idle page zeroing has been disabled by default on all architectures since
r170816 and has some bugs that make it seemingly unusable. Specifically,
the idle-priority pagezero thread exacerbates contention for the free page
lock, and yields the CPU without releasing it in non-preemptive kernels. The
pagezero thread also does not behave correctly when superpage reservations
are enabled: its target is a function of v_free_count, which includes
reserved-but-free pages, but it is only able to zero pages belonging to the
physical memory allocator.

Reviewed by:	alc, imp, kib
Differential Revision:	https://reviews.freebsd.org/D7714
2016-09-03 20:38:13 +00:00
John Baldwin
72ac509e46 Remove warning about pci_addr_t being different sizes.
pci_addr_t has always been 64-bits since r163805.

MFC after:	1 week
2016-09-01 21:30:12 +00:00
Kevin Lo
a951a78821 Update a comment to reflect r305051. 2016-08-30 08:34:49 +00:00
Mariusz Zaborski
6ccb0d8756 Bump date in the man page. 2016-08-27 18:08:25 +00:00
Mariusz Zaborski
7dcd0f0e7b Introduce cnv man page.
Submitted by:		Adam Starak <starak.adam@gmail.com>
Reviewed by:		cem@, wblock@
Differential Revision:	https://reviews.freebsd.org/D7249
2016-08-27 13:47:52 +00:00
Randall Stewart
587d67c008 Here we update the modular tcp to be able to switch to an
alternate TCP stack in other then the closed state (pre-listen/connect).
The idea is that *if* that is supported by the alternate stack, it
is asked if its ok to switch. If it approves the "handoff" then we
allow the switch to happen. Also the fini() function now gets a flag
to tell if you are switching away *or* the tcb is destroyed. The
init() call into the alternate stack is moved to the end so the
tcb is more fully formed before the init transpires.

Sponsored by:	Netflix Inc.
Differential Revision:	D6790
2016-08-16 15:11:46 +00:00
Edward Tomasz Napierala
37e56c6efe Remove lockmgr_waiters(9) and BUF_LOCKWAITERS(9); they were not used
for anything.

Reviewed by:	kib@
MFC after:	1 month
Differential Revision:	https://reviews.freebsd.org/D7420
2016-08-05 13:53:28 +00:00
John Baldwin
0aee83cc1d Permit the name of the /dev/iov entry to be set by the driver.
The PCI_IOV option creates character devices in /dev/iov for each PF
device driver that registers support for creating VFs.  By default the
character device is named after the PF device (e.g. /dev/iov/foo0).
This change adds a variant of pci_iov_attach() called pci_iov_attach_name()
that allows the name of the /dev/iov entry to be specified by the
driver.

Reviewed by:	rstone
MFC after:	1 month
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D7400
2016-08-03 17:09:12 +00:00
Eric van Gyzen
71aa6fbfe9 Fix two return types in the cpuset(9) and bitset(9) man pages
The *_FFS() and *_COUNT() functions return int, not size_t.

MFC after:	3 days
Sponsored by:	Dell Inc.
2016-07-29 21:12:48 +00:00
Konstantin Belousov
a9e182e895 Extract the calculation of the callout fire time into the new function
callout_when(9).  See the man page update for the description of the
intended use.

Tested by:	pho
Reviewed by:	jhb, bjk (man page updates)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 month
X-Differential revision:	https://reviews.freebsd.org/D7137
2016-07-28 08:57:01 +00:00
Konstantin Belousov
90b581f2cc Implement mtx_trylock_spin(9).
Discussed with:	bde
Reviewed by:	jhb
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
Differential revision:	https://reviews.freebsd.org/D7192
2016-07-23 05:30:55 +00:00
Pyun YongHyeon
f2e1e9bd53 Belatedly remove CSUM_IP_FRAGS and CSUM_FRAGMENT offloading
capabilities.  It was removed in r243624 and r254804/r271006
respectively.
This file and mbuf(9) needs updates for other offloading
capabilities(i.e. CSUM_SCTP and CSUM_TSO).
2016-07-11 06:49:56 +00:00
Gleb Smirnoff
d153eeee97 The paradigm of a callout is that it has three consequent states:
not scheduled -> scheduled -> running -> not scheduled. The API and the
manual page assume that, some comments in the code assume that, and looks
like some contributors to the code also did. The problem is that this
paradigm isn't true. A callout can be scheduled and running at the same
time, which makes API description ambigouous. In such case callout_stop()
family of functions/macros should return 1 and 0 at the same time, since it
successfully unscheduled future callout but the current one is running.
Before this change we returned 1 in such a case, with an exception that
if running callout was migrating we returned 0, unless CS_MIGRBLOCK was
specified.

With this change, we now return 0 in case if future callout was unscheduled,
but another one is still in action, indicating to API users that resources
are not yet safe to be freed.

However, the sleepqueue code relies on getting 1 return code in that case,
and there already was CS_MIGRBLOCK flag, that covered one of the edge cases.
In the new return path we will also use this flag, to keep sleepqueue safe.

Since the flag CS_MIGRBLOCK doesn't block migration and now isn't limited to
migration edge case, rename it to CS_EXECUTING.

This change fixes panics on a high loaded TCP server.

Reviewed by:	jch, hselasky, rrs, kib
Approved by:	re (gjb)
Differential Revision:	https://reviews.freebsd.org/D7042
2016-07-05 18:47:17 +00:00
Jonathan T. Looney
cf3c688cc9 Document support for alternate TCP stacks.
Differential Revision:	https://reviews.freebsd.org/D6940
Reviewed by:	hiren
Approved by:	re (gjb)
Sponsored by:	Juniper Networks
2016-06-28 13:37:01 +00:00
John Baldwin
2ab0398d94 Add pci_get_max_payload() to fetch the PCI-express maximum payload size.
Approved by:	re (gjb)
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
Differential Revision:	https://reviews.freebsd.org/D6951
2016-06-24 17:26:42 +00:00
Edward Tomasz Napierala
16e3675494 Fix a bunch of "xref refers to *this* page" igor(1) warnings.
MFC after:	1 month
2016-06-09 06:55:00 +00:00
Edward Tomasz Napierala
646fa38767 Fix typos.
MFC after:	1 month
2016-06-08 10:38:00 +00:00
Edward Tomasz Napierala
c8e10ea4ab Fix some trailing whitespaces.
MFC after:	1 month
2016-06-08 10:26:17 +00:00
Edward Tomasz Napierala
7851d429a6 Fix a bunch of "sentence not on new line" warnings in section 9.
MFC after:	1 month
2016-06-08 09:19:47 +00:00
Bjoern A. Zeeb
484149def8 Introduce a per-VNET flag to enable/disable netisr prcessing on that VNET.
Add accessor functions to toggle the state per VNET.
The base system (vnet0) will always enable itself with the normal
registration. We will share the registered protocol handlers in all
VNETs minimising duplication and management.
Upon disabling netisr processing for a VNET drain the netisr queue from
packets for that VNET.

Update netisr consumers to (de)register on a per-VNET start/teardown using
VNET_SYS(UN)INIT functionality.

The change should be transparent for non-VIMAGE kernels.

Reviewed by:	gnn (, hiren)
Obtained from:	projects/vnet
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6691
2016-06-03 13:57:10 +00:00
Mark Johnston
4a07671ebd Remove the BUGS entry in memguard's man page.
UMA refcounting is gone as of r296243, so this bug no longer exists. In
particular, it's now possible to guard mbuf clusters with memguard.
2016-06-01 22:34:21 +00:00
Bjoern A. Zeeb
3f58662dd9 The pr_destroy field does not allow us to run the teardown code in a
specific order.  VNET_SYSUNINITs however are doing exactly that.
Thus remove the VIMAGE conditional field from the domain(9) protosw
structure and replace it with VNET_SYSUNINITs.
This also allows us to change some order and to make the teardown functions
file local static.
Also convert divert(4) as it uses the same mechanism ip(4) and ip6(4) use
internally.

Slightly reshuffle the SI_SUB_* fields in kernel.h and add a new ones, e.g.,
for pfil consumers (firewalls), partially for this commit and for others
to come.

Reviewed by:		gnn, tuexen (sctp), jhb (kernel.h)
Obtained from:		projects/vnet
MFC after:		2 weeks
X-MFC:			do not remove pr_destroy
Sponsored by:		The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6652
2016-06-01 10:14:04 +00:00
Bryan Drewery
f0c619b22f Be more clear about LOCKLEAF being exclusive and add LOCKSHARED. 2016-05-23 21:29:57 +00:00
John Baldwin
20fee1093e Add sglist functions for working with arrays of VM pages.
sglist_count_vmpages() determines the number of segments required for
a buffer described by an array of VM pages. sglist_append_vmpages()
adds the segments described by such a buffer to an sglist.  The latter
function is largely pulled from sglist_append_bio(), and
sglist_append_bio() now uses sglist_append_vmpages().

Reviewed by:	kib
Sponsored by:	Chelsio Communications
2016-05-20 23:28:43 +00:00
John Baldwin
cc981af204 Add new bus methods for mapping resources.
Add a pair of bus methods that can be used to "map" resources for direct
CPU access using bus_space(9).  bus_map_resource() creates a mapping and
bus_unmap_resource() releases a previously created mapping.  Mappings are
described by 'struct resource_map' object.  Pointers to these objects can
be passed as the first argument to the bus_space wrapper API used for bus
resources.

Drivers that wish to map all of a resource using default settings
(for example, using uncacheable memory attributes) do not need to change.
However, drivers that wish to use non-default settings can now do so
without jumping through hoops.

First, an RF_UNMAPPED flag is added to request that a resource is not
implicitly mapped with the default settings when it is activated.  This
permits other activation steps (such as enabling I/O or memory decoding
in a device's PCI command register) to be taken without creating a
mapping.  Right now the AGP drivers don't set RF_ACTIVE to avoid using
up a large amount of KVA to map the AGP aperture on 32-bit platforms.
Once RF_UNMAPPED is supported on all platforms that support AGP this
can be changed to using RF_UNMAPPED with RF_ACTIVE instead.

Second, bus_map_resource accepts an optional structure that defines
additional settings for a given mapping.

For example, a driver can now request to map only a subset of a resource
instead of the entire range.  The AGP driver could also use this to only
map the first page of the aperture (IIRC, it calls pmap_mapdev() directly
to map the first page currently).  I will also eventually change the
PCI-PCI bridge driver to request mappings of the subset of the I/O window
resource on its parent side to create mappings for child devices rather
than passing child resources directly up to nexus to be mapped.  This
also permits bridges that do address translation to request suitable
mappings from a resource on the "upper" side of the bus when mapping
resources on the "lower" side of the bus.

Another attribute that can be specified is an alternate memory attribute
for memory-mapped resources.  This can be used to request a
Write-Combining mapping of a PCI BAR in an MI fashion.  (Currently the
drivers that do this call pmap_change_attr() directly for x86 only.)

Note that this commit only adds the MI framework.  Each platform needs
to add support for handling RF_UNMAPPED and thew new
bus_map/unmap_resource methods.  Generally speaking, any drivers that
are calling rman_set_bustag() and rman_set_bushandle() need to be
updated.

Discussed on:	arch
Reviewed by:	cem
Differential Revision:	https://reviews.freebsd.org/D5237
2016-05-20 17:57:47 +00:00
John Baldwin
886b793d84 Remove dangling references to rman_await_resource().
This function was removed when RF_TIMESHARE was removed a couple of years
ago.

MFC after:	3 days
2016-05-20 01:17:38 +00:00
Warner Losh
c6755605f6 Per Ravi Pokala's suggestion, rewrite the g_reset_bio description to
be clearer. It also describes it with more nuance. Add missing MLINKS
noticed by trasz@. Bump the date.
2016-05-17 17:08:13 +00:00
Eitan Adler
cef367e6a1 Don't repeat the the word 'the'
(one manual change to fix grammar)

Confirmed With: db
Approved by: secteam (not really, but this is a comment typo fix)
2016-05-17 12:52:31 +00:00
Andrew Turner
1e43b18c4b Add a pcib interface for use by interrupt controllers that need to
translate the pci rid to a controller ID. The translation could be based
on the 'msi-map' OFW property, a similar ACPI option, or hard-coded for
hardware lacking the above options.

Reviewed by:	wma
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2016-05-16 09:31:44 +00:00
Andrew Turner
d7be980dbe Re-commit r299467 having fixed the build:
Add a new get_id interface to pci and pcib. This will allow us to both
detect failures, and get different PCI IDs.

For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.

For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.

A follow up commit will be made to remove pci_get_rid.

Reviewed by:    jhb, rstone (previous version)
Obtained from:  ABT Systems Ltd
Sponsored by:   The FreeBSD Foundation
Differential Revision:  https://reviews.freebsd.org/D6239
2016-05-16 09:15:50 +00:00
Sepherosa Ziehau
dfdc9a05c6 atomic: Add testandclear on i386/amd64
Reviewed by:	kib
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6381
2016-05-16 07:19:33 +00:00
Conrad Meyer
f41be0f076 Revert r299467 to fix the kernel build.
$ svn merge -c -299467 .

Approved by:	build being broken for six hours
2016-05-11 23:00:12 +00:00
Andrew Turner
9a36a337ff Add a new get_id interface to pci and pcib. This will allow us to both
detect failures, and get different PCI IDs.

For the former the interface returns an int to signal an error. The ID is
returned at a uintptr_t * argument.

For the latter there is a type argument that allows selecting the ID type.
This only specifies a single type, however a MSI type will be added
to handle the need to find the ID the hardware passes to the ARM GICv3
interrupt controller.

A follow up commit will be made to remove pci_get_rid.

Reviewed by:	jhb, rstone
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D6239
2016-05-11 17:07:29 +00:00
John Baldwin
a7b81566bc Add a missing section to a cross-reference.
While here, add bus_space(9) to the SEE ALSO section.
2016-05-10 16:13:54 +00:00
Warner Losh
1f1bf434c6 Bump date. Forgotten in r299312. 2016-05-10 04:01:04 +00:00
Warner Losh
7043b9898f Document g_reset_bio(). This is long overdue. g_reset_bio will reset
the bio to a pristine state should you wish to re-use it for another
I/O without freeing it. In the bast, a simple bzero was done to do
this, but that may not be sufficient in the future when the bio may
contain state that's not part of the documented API. Besides, it makes
the code clearer as to the intent...

Noticed by: smh@
2016-05-10 03:57:47 +00:00
John Baldwin
8d791e5af1 Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
an enum for the second parameter that indicates the type of cpuset to
request.  Currently two valus are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
   the device when DEVICE_NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to 'all_cpus'
by default.  The idea is that INTR_CPUS should always return a valid set.

Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).

The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled.  They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.

Compared to the r298933, this version uses 'struct _cpuset' in
<sys/bus.h> instead of 'cpuset_t' to avoid requiring <sys/param.h>
(<sys/_cpuset.h> still requires <sys/param.h> for MAXCPU even though
<sys/_bitset.h> does not after recent changes).
2016-05-09 20:50:21 +00:00
Sepherosa Ziehau
f8ce3dfaf1 kern: Add phashinit_flags(), which allows malloc(M_NOWAIT)
It will be used for the upcoming LRO hash table initialization.
And probably will be useful in other cases, when M_WAITOK can't
be used.

Reviewed by:	jhb, kib
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D6138
2016-05-03 07:17:13 +00:00
John Baldwin
8a08b7d36b Revert bus_get_cpus() for now.
I really thought I had run this through the tinderbox before committing,
but many places need <sys/types.h> -> <sys/param.h> for <sys/bus.h> now.
2016-05-03 01:17:40 +00:00
John Baldwin
bc153c692f Add a new bus method to fetch device-specific CPU sets.
bus_get_cpus() returns a specified set of CPUs for a device.  It accepts
an enum for the second parameter that indicates the type of cpuset to
request.  Currently two valus are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to
   the device when DEVICE_NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For systems that do not support NUMA (or if it is not enabled in the kernel
config), LOCAL_CPUS fails with EINVAL.  INTR_CPUS is mapped to 'all_cpus'
by default.  The idea is that INTR_CPUS should always return a valid set.

Device drivers which want to use per-CPU interrupts should start using
INTR_CPUS instead of simply assigning interrupts to all available CPUs.
In the future we may wish to add tunables to control the policy of
INTR_CPUS (e.g. should it be local-only or global, should it ignore
SMT threads or not).

The x86 nexus driver exposes the internal set of interrupt CPUs from the
the x86 interrupt code via INTR_CPUS.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable
LOCAL_CPUS set when _PXM exists and DEVICE_NUMA is enabled.  They also and
the global INTR_CPUS set from the nexus driver with the per-domain set from
_PXM to generate a local INTR_CPUS set for child devices.

Reviewed by:	wblock (manpage)
Differential Revision:	https://reviews.freebsd.org/D5519
2016-05-02 18:00:38 +00:00
Warren Block
7c64ddd5b0 Spelling fixes supplied by pfg@, detected with codespell, plus
additional misspellings detected by igor.

MFC after:	1 week
2016-05-01 22:00:41 +00:00
John Baldwin
9aa021d416 Add some notes about the implicit resource mapping for activated resources.
Specifically, mention that rman_get_bustag/handle/virtual are valid after
a resource is activated.  Also, mention the wrapper API that accepts a
struct resource instead of a bus tag and handle.
2016-04-28 18:23:18 +00:00
John Baldwin
9a1be350ea Document RF_PREFETCHABLE. 2016-04-28 18:01:25 +00:00
John Baldwin
55c661caad Document PCI_RES_BUS as a possible resource type. 2016-04-28 17:50:16 +00:00
John Baldwin
de70bf980b Remove a stale reference to the removed RF_TIMESHARE flag. 2016-04-28 17:48:52 +00:00
John Baldwin
a907c6914c Add a new rescan method to the bus interface.
The BUS_RESCAN() method rescans a single bus device checking for devices
that have been added or removed from the bus.  A new 'rescan' command is
added to devctl(8) to trigger a rescan.

Differential Revision:	https://reviews.freebsd.org/D6016
2016-04-27 16:29:03 +00:00