transparent layering and better fragmentation.
- Normalize functions that allocate memory to use kmem_*
- Those that allocate address space are named kva_*
- Those that operate on maps are named kmap_*
- Implement recursive allocation handling for kmem_arena in vmem.
Reviewed by: alc
Tested by: pho
Sponsored by: EMC / Isilon Storage Division
support cache flush and write barrier commands.
sys/dev/xen/blkfront/block.h:
Add per-command flag that specifies that the I/O queue must
be frozen after this command is dispatched. This is used
to implement "single-stepping".
Remove the unused per-command flag that indicates a polled
command.
Add block device instance flags to record backend features.
Add a block device instance flag to indicate the I/O queue
is frozen until all outstanding I/O completes.
Enhance the queue API to allow the number of elements in a
queue to be interrogated.
Prefer "inline" to "__inline".
sys/dev/xen/blkfront/blkfront.c:
Formalize queue freeze semantics by adding methods for both
global and command-associated queue freezing.
Provide mechanism to freeze the I/O queue until all outstanding
I/O completes. Use this to implement barrier semantics
(BIO_ORDERED) when the backend does not support
BLKIF_OP_WRITE_BARRIER commands.
Implement BIO_FLUSH as either a BLKIF_OP_FLUSH_DISKCACHE
command or a 0 byte write barrier. Currently, all publicly
available backends perform a diskcache flush when processing
barrier commands, and this frontend behavior matches what
is done in Linux.
Simplify code by using new queue length API.
Report backend features during device attach and via sysctl.
Submitted by: Roger Pau Monné
Submitted by: gibbs (Merge with new driver queue API, sysctl support)
only re-enable I/O when all reasons have cleared.
sys/dev/xen/blkfront/block.h:
In the block front driver softc, replace the boolean
XBDF_FROZEN flag with a count of commands and driver global
issues that freeze the I/O queue. So long xbd_qfrozen_cnt
is non-zero, I/O is halted.
Add flags to xbd_flags for tracking grant table entry and
free command resource shortages. Each of these classes can
increment xbd_qfrozen_cnt at most once.
Add a command flag (XBDCF_ASYNC_MAPPING) that is set whenever
the initial mapping attempt of a command fails with EINPROGRESS.
sys/dev/xen/blkfront/blkfront.c:
In xbd_queue_cb(), use new XBDCF_ASYNC_MAPPING flag to definitively
know if an async bus dmamap load has occurred.
Add xbd_freeze() and xbd_thaw() helper methods for managing
xbd_qfrozen_cnt and use them to implement all queue freezing logic.
Add missing "thaw" to restart I/O processing once grant references
become available.
Sponsored by: Spectra Logic Corporation
scheme for defining inline command queuing functions.
Prefer enums to #defines.
sys/dev/xen/blkfront/block.h
Replace inline function generation performed by the
XBDQ_COMMAND_QUEUE() macro with single instances of each
inline function (init, enqueue, dequeue, remove). This was
made possible by using queue indexes instead of bit flags
in the command structure, and passing the index enum as
an argument to the functions.
Improve panic/assert messages in the queue functions.
Combine queue data and stats into a single data structure
and declare an array of them instead of each queue individually.
Convert command flags, softc state, and softc flags to enums.
sys/dev/xen/blkfront/blkfront.c
Mechanical adjustments for new queue api.
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
In netif_free(), call ifmedia_removeall() after ether_ifdetach()
so that bpf listeners are detached, any link state processing
is completed, and there is no chance for external reference to media
information.
Suggested by: yongari
MFC after: 1 week
Xen host side can handle after defragmentation.
This prevents the driver from throwing away too long TSO chains and
improves the performance on Amazon AWS instances with 10GigE virtual
interfaces to the normally expected throughput.
Submitted by: cperciva (earlier version)
Reviewed by: cperciva
Tested by: cperciva
MFC after: 1 week
Remove local, and incorrect, definition for the value of an invalid
grant reference.
Extract ring cleanup code into xbd_free_ring() function for
symetry with xbd_alloc_ring(). This process also eliminated
an initialized but unused variable.
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
o Group functions by by their functionality.
o Remove superfluous declarations.
o Remove more unused (#ifdef'd out) code.
Sponsored by: Spectra Logic Corporation
o This driver is the "xbd" driver, not the "blkfront", "blkif", "xbf", or
"xb" driver. Use the "xbd_" naming conventions for all functions,
structures, and constants.
o The prevailing convention for structure fields in this driver is to
prefix them with an abreviation of the structure type. Update
"recently added" fields to match this style.
o Remove unused data structures.
o Remove superfluous casts.
o Make a pass over the whole driver and bring it closer to
style(9) conformance.
Sponsored by: Spectra Logic Corporation
MFC after: 1 week
dev/xen/netfront:
In netif_free(), properly stop the interface and drain any pending
timers prior to disconnecting from the backend device.
Remove all media and detach our interface object from the system
prior to deleting it.
PR: kern/176471
Submitted by: Roger Pau Monne <roger.pau@citrix.com>
Reviewed by: gibbs
MFC after: 1 week
__func__ and add some missing ones.
- Remove a stale comment.
- Remove unused NUM_ELEMENTS macro.
- Remove extra empty lines.
- Use DEVMETHOD_END.
- Use NULL rather than 0 for pointers.
MFC after: 3 days
- Replace incorrect function names in printf(9) strings with __func__.
- Make xctrl_shutdown_reasons table const.
- Use nitems() rather than rolling an own version.
- Use DEVMETHOD_END.
- Use NULL rather than 0 for pointers.
MFC after: 3 days
In particular, do not lock Giant conditionally when calling into the
filesystem module, remove the VFS_LOCK_GIANT() and related
macros. Stop handling buffers belonging to non-mpsafe filesystems.
The VFS_VERSION is bumped to indicate the interface change which does
not result in the interface signatures changes.
Conducted and reviewed by: attilio
Tested by: pho
a da(4) instance going away while GEOM is still probing it.
In this case, the GEOM disk class instance has been created by
disk_create(), and the taste of the disk is queued in the GEOM
event queue.
While that event is queued, the da(4) instance goes away. When the
open call comes into the da(4) driver, it dereferences the freed
(but non-NULL) peripheral pointer provided by GEOM, which results
in a panic.
The solution is to add a callback to the GEOM disk code that is
called when all of its resources are cleaned up. This is
implemented inside GEOM by adding an optional callback that is
called when all consumers have detached from a provider, and the
provider is about to be deleted.
scsi_cd.c,
scsi_da.c: In the register routine for the cd(4) and da(4)
routines, acquire a reference to the CAM peripheral
instance just before we call disk_create().
Use the new GEOM disk d_gone() callback to register
a callback (dadiskgonecb()/cddiskgonecb()) that
decrements the peripheral reference count once GEOM
has finished cleaning up its resources.
In the cd(4) driver, clean up open and close
behavior slightly. GEOM makes sure we only get one
open() and one close call, so there is no need to
set an open flag and decrement the reference count
if we are not the first open.
In the cd(4) driver, use cam_periph_release_locked()
in a couple of error scenarios to avoid extra mutex
calls.
geom.h: Add a new, optional, providergone callback that
is called when a provider is about to be deleted.
geom_disk.h: Add a new d_gone() callback to the GEOM disk
interface.
Bump the DISK_VERSION to version 2. This probably
should have been done after a couple of previous
changes, especially the addition of the d_getattr()
callback.
geom_disk.c: Add a providergone callback for the disk class,
g_disk_providergone(), that calls the user's
d_gone() callback if it exists.
Bump the DISK_VERSION to 2.
geom_subr.c: In g_destroy_provider(), call the providergone
callback if it has been provided.
In g_new_geomf(), propagate the class's
providergone callback to the new geom instance.
blkfront.c: Callers of disk_create() are supposed to pass in
DISK_VERSION, not an explicit disk API version
number. Update the blkfront driver to do that.
disk.9: Update the disk(9) man page to include information
on the new d_gone() callback, as well as the
previously added d_getattr() callback, d_descr
field, and HBA PCI ID fields.
MFC after: 5 days
XenServer configurations that advertise the multi-page ring extension,
but only allow a single page of ring space.
sys/dev/xen/blkfront/blkfront.c:
If only one page of ring space is being used, do not publish
in the XenStore the number of pages in use (1), via either
of the supported multi-page ring extension schemes.
Single page operation is the same with or without the
ring-page extension being negotiated. Relying on the
legacy behavior avoids an incompatible difference in how
the two ring-page extension schemes that are out in the
wild, deal with the base case of a single page. The
Amazon/Red Hat drivers use the same XenStore variable as
if the extension was not negotiated. The Citrix drivers
assume the new ring reference XenStore variables will be
available
Reported by: Oliver Schonefeld <schonefeld@ids-mannheim.de>
MFC after: 3 days
remaining drivers that haven't been converted have various problems or
complexities that will be dealt with later. This list includes:
hptrr, hptmv, hpt27xx - device aggregation across multiple parents
drm - want to talk to the maintainer first
tsec, sec - Openfirmware devices, not sure if changes are warranted
fatm - Done except for unused testing code
usb - want to talk to the maintainer first
ce, cp, ctau, cx - Significant driver changes needed to convey parent info
There are also devices tucked into architecture subtrees that I'll leave
for the respective maintainers to deal with.
devices that are unplugged via QEMU.
sys/dev/xen/blkback/blkback.c:
Toolstack initiated closures change the frontend's state
to Closing. The backend must change to Closing as well,
even if we can't actually close yet, in order for the
frontend to notice and start the closing process.
MFC after: 3 days
The previous code did not limit the I/O request size based on
the maximum number of segments supported by the back-end. In
current practice, since the only back-end supporting chained
requests is the FreeBSD implementation, this limit was never
exceeded.
sys/dev/xen/blkfront/block.h:
Add two macros, XBF_SEGS_TO_SIZE() and XBF_SIZE_TO_SEGS(),
to centralize the logic of reserving a segment to deal with
non-page-aligned I/Os.
sys/dev/xen/blkfront/blkfront.c:
o When negotiating transfer parameters, limit the
max_request_size we use and publish, if it is greater
than the maximum, unaligned, I/O we can support with
the number of segments advertised by the backend.
o Don't unilaterally reduce the I/O size published to
the disk layer by a single page. max_request_size
is already properly limited in the transfer parameter
negotiation code.
o Fix typos in printf strings:
"max_requests_segments" -> "max_request_segments"
"specificed" -> "specified"
MFC after: 1 day
FreeBSD's front and back Xen blkif interface drivers.
sys/dev/xen/blkfront/block.h:
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkback/blkback.c:
Replace FreeBSD specific multi-page ring impelementation with
support for both the Citrix and Amazon/RedHat versions of this
extension.
sys/dev/xen/blkfront/blkfront.c:
o Add a per-instance sysctl tree that exposes all negotiated
transport parameters (ring pages, max number of requests,
max request size, max number of segments).
o In blkfront_vdevice_to_unit() add a missing return statement
so that we properly identify the unit number for high numbered
xvd devices.
sys/dev/xen/blkback/blkback.c:
o Add static dtrace probes for several events in this driver.
o Defer connection shutdown processing until the front-end
enters the closed state. This avoids prematurely tearing
down the connection when buggy front-ends transition to the
closing state, even though the device is open and they
veto the close request from the tool stack.
o Add nodes for maximum request size and the number of active
ring pages to the exising, per-instance, sysctl tree.
o Miscelaneous style cleanup.
sys/xen/interface/io/blkif.h:
o Add extensive documentation of the XenStore nodes used to
implement the blkif interface.
o Document the startup sequence between a front and back driver.
o Add structures and documenatation for the "discard" feature
(AKA Trim).
o Cleanup some definitions related to FreeBSD's request
number/size/segment-limit extension.
sys/dev/xen/blkfront/blkfront.c:
sys/dev/xen/blkback/blkback.c:
sys/xen/xenbus/xenbusvar.h:
Add the convenience function xenbus_get_otherend_state() and
use it to simplify some logic in both block-front and block-back.
MFC after: 1 day
netback.c: Add missing VM includes.
xen/xenvar.h,
xen/xenpmap.h: Move some XENHVM macros from <machine/xen/xenpmap.h> to
<machine/xen/xenvar.h> on i386 to match the amd64 headers.
conf/files: Add netback to the build.
Submitted by: jhb
MFC after: 3 days
share/man/man4/Makefile,
share/man/man4/xnb.4,
sys/dev/xen/netback/netback.c,
sys/dev/xen/netback/netback_unit_tests.c:
Rewrote the netback driver for xen to attach properly via newbus
and work properly in both HVM and PVM mode (only HVM is tested).
Works with the in-tree FreeBSD netfront driver or the Windows
netfront driver from SuSE. Has not been extensively tested with
a Linux netfront driver. Does not implement LRO, TSO, or
polling. Includes unit tests that may be run through sysctl
after compiling with XNB_DEBUG defined.
sys/dev/xen/blkback/blkback.c,
sys/xen/interface/io/netif.h:
Comment elaboration.
sys/kern/uipc_mbuf.c:
Fix page fault in kernel mode when calling m_print() on a
null mbuf. Since m_print() is only used for debugging, there
are no performance concerns for extra error checking code.
sys/kern/subr_scanf.c:
Add the "hh" and "ll" width specifiers from C99 to scanf().
A few callers were already using "ll" even though scanf()
was handling it as "l".
Submitted by: Alan Somers <alans@spectralogic.com>
Submitted by: John Suykerbuyk <johns@spectralogic.com>
Sponsored by: Spectra Logic
MFC after: 1 week
Reviewed by: ken
At the moment grab and ungrab methods of all console drivers are no-ops.
Current intended meaning of the calls is that the kernel takes control of
console input. In the future the semantics may be extended to mean that
the calling thread takes full ownership of the console (e.g. console
output from other threads could be suspended).
Inspired by: bde
MFC after: 2 months
If I interpret the C standard correctly, the storage specifier should be
placed before the inline keyword. While at it, replace __inline by
inline in the files affected.
one. Interestingly, these are actually the default for quite some time
(bus_generic_driver_added(9) since r52045 and bus_generic_print_child(9)
since r52045) but even recently added device drivers do this unnecessarily.
Discussed with: jhb, marcel
- While at it, use DEVMETHOD_END.
Discussed with: jhb
- Also while at it, use __FBSDID.
The SYSCTL_NODE macro defines a list that stores all child-elements of
that node. If there's no SYSCTL_DECL macro anywhere else, there's no
reason why it shouldn't be static.
back-end features.
sys/dev/xen/netfront/netfront.c:
o Add xn_query_features() which reads the XenStore and
records the TSO, LRO, and chained ring-request support
of the backend.
o Rename xn_configure_lro() to xn_configure_features() and
use this routine to manage the setup of TSO, LRO, and
checksum offload.
o In create_netdev(), initialize if_capabilities and
if_hwassist to the capabilities found on all backends.
Delegate configuration of if_capenable and the TSO flag
if if_hwassist to xn_configure_features().
Reported by: Hugo Silva (fix inspired by patch provided)
Approved by: re
MFC after: 1 week
PV devices with the ioemu attribute set.
sys/dev/xen/netfront/netfront.c:
o If a mac address for the interface cannot be found
in the front-side XenStore tree, look for an entry
in the back-side tree. With ioemu devices, the
emulator does not populate the front side tree and
neither does Xend.
o Return an error rather than panic when an attach
attempt fails.
Reported by: Janne Snabb (fix inspired by patch provided)
PR: kern/154302
Approved by: re
Sponsored by: BQ Internet
sys/dev/xen/netfront/netfront.c:
o Implement netfront_suspend(), a specialized suspend
handler for the netfront driver. This routine simply
disables the carrier so the driver is idle during
system suspend processing.
o Fix a leak when re-initializing LRO during a link reset.
o In netif_release_tx_bufs(), when cleaning up the grant
references for our TX ring, use gnttab_end_foreign_access_ref
instead of attempting to grant the page again.
o In netif_release_tx_bufs(), we do not track mbufs associated
with mbuf chains, but instead just free each mbuf directly.
Use m_free(), not m_freem(), to avoid double frees of mbufs.
o Refactor some code to enhance clarity.
Approved by: re
MFC after: 1 week
Sponsored by: BQ Internet
sys/dev/xen/blkfront/block.h:
sys/dev/xen/blkfront/blkfront.c:
Remove now unused blkif_vdev_t from the blkfront soft.
sys/dev/xen/blkfront/blkfront.c:
o In blkfront_suspend(), indicate the desire to suspend
by changing the softc connected state to SUSPENDED, and
then wait for any I/O pending on the remote peer to
drain. Cancel suspend processing if I/O does not
drain within 30 seconds.
o Enable and update blkfront_resume(). Since I/O is
drained prior to the suspension of the VM, the complicated
recovery process performed by other Xen blkfront
implementations is avoided. We simply tear down the
connection to our old peer, and then re-connect.
o In blkif_initialize(), fix a resource leak and botched
return if we cannot allocate shadow memory for our
requests.
o In blkfront_backend_changed(), correct our response to
the XenbusStateInitialised state. This state indicates
that our backend peer has published sufficient data for
blkfront to publish ring information and other XenStore
data, not that a connection can occur. Blkfront now
will only perform connection processing in response to
the XenbusStateConnected state. This corrects an issue
where blkfront connected before the backend was ready
during resume processing.
Approved by: re
MFC after: 1 week