Commit Graph

412 Commits

Author SHA1 Message Date
Roger Pau Monné
83c2fa73e6 xen-blkfront: fix memory leak in xbd_connect error path
If gnttab_grant_foreign_access() fails for any of the indirection
pages, the code breaks out of both the loops without freeing the local
variable indirectpages, causing a memory leak.
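
The shape of this fix can be sketched in user-space C. This is only an illustration under stated assumptions: demo_alloc, demo_free, demo_setup_pages and the fail_at parameter are hypothetical names, and malloc(3) stands in for whatever allocator the driver uses. It is not the xbd_connect code.

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative counter of outstanding allocations, used to spot leaks. */
static int demo_live_allocs;

static void *
demo_alloc(size_t size)
{
	void *p = malloc(size);

	if (p != NULL)
		demo_live_allocs++;
	return (p);
}

static void
demo_free(void *p)
{
	if (p != NULL) {
		demo_live_allocs--;
		free(p);
	}
}

/*
 * Hypothetical stand-in for the setup loop: "granting" page i fails when
 * i == fail_at. On failure the temporary array is released before
 * returning; on success ownership passes to the caller.
 */
static int *
demo_setup_pages(int npages, int fail_at)
{
	int *pages;
	int i;

	assert(npages > 0);
	pages = demo_alloc((size_t)npages * sizeof(*pages));
	if (pages == NULL)
		return (NULL);
	for (i = 0; i < npages; i++) {
		if (i == fail_at)
			goto fail;
		pages[i] = i;
	}
	return (pages);
fail:
	/* Error path: free the local allocation instead of leaking it. */
	demo_free(pages);
	return (NULL);
}
```

A single exit label keeps the cleanup in one place, so a break or goto out of the loop cannot skip the free.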

Submitted by:		Pratyush Yadav <pratyush@freebsd.org>
Differential Review:	https://reviews.freebsd.org/D16136
2018-07-30 11:27:51 +00:00
Roger Pau Monné
8b19549b0e xen-blkfront: fix length check
Length is an unsigned integer, so checking against < 0 doesn't make
sense. While there also make clear that a length of 0 always succeeds.
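
A minimal sketch of the idea, assuming a hypothetical helper demo_length_ok and a capacity parameter that are not part of the driver: with an unsigned length, only the zero case and the upper bound are worth testing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* size_t is unsigned, so a "len < 0" comparison can never be true. */
static_assert((size_t)-1 > 0, "size_t must be unsigned");

/*
 * Hypothetical helper for illustration only; this is not the blkfront code.
 */
static bool
demo_length_ok(size_t len, size_t capacity)
{
	/* A zero-length request always succeeds. */
	if (len == 0)
		return (true);
	return (len <= capacity);
}
```

Compilers typically flag the dead comparison with a warning such as -Wtype-limits in GCC.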

Submitted by:		Pratyush Yadav <pratyush@freebsd.org>
Differential Review:	https://reviews.freebsd.org/D16045
2018-07-30 11:15:20 +00:00
Roger Pau Monné
3653af112f xen: attach the PV CPU if no CPU device is present
When booted as PVHv2, there's no ACPI CPU object, so attach the PV CPU
device in order to take its place.

This is required in case some device or driver tries to poke at the
PCPU device field.

Sponsored by: Citrix Systems R&D
2018-07-19 08:00:52 +00:00
Roger Pau Monné
fa60904232 xen: do not limit PV console usage to PV guests
The Xen PV console is also available to HVM and PVHv2 guests, so don't
limit the console usage to PV guests only.

Sponsored by: Citrix Systems R&D
2018-07-19 07:58:24 +00:00
Roger Pau Monné
cfa0b7b82f xen: remove direct usage of HYPERVISOR_start_info
HYPERVISOR_start_info is only available to PV and PVHv1 guests; HVM
and PVHv2 guests get this data from HVM parameters that are fetched
using a hypercall.

Instead provide a set of helper functions that should be used to fetch
this data. The helper functions have different implementations
depending on whether FreeBSD is running as PVHv1 or HVM/PVHv2 guest
type.

This helps to clean up generic Xen code by removing quite a lot of
xen_pv_domain and xen_hvm_domain macro usages.

Sponsored by:	Citrix Systems R&D
2018-07-19 07:54:45 +00:00
Roger Pau Monné
d9b664fd45 xen-netback: fix LOR
lock order reversal: (sleepable after non-sleepable)
 1st 0xfffffe00357ff538 xnb_softc (xen netback softc lock) @ /usr/src/sys/dev/xen/netback/netback.c:1069
 2nd 0xffffffff81fdccb0 intrsrc (intrsrc) @ /usr/src/sys/x86/x86/intr_machdep.c:224

There's no need to hold the lock since the cleaning of the interrupt
cannot happen in parallel due to the XNBF_IN_SHUTDOWN flag being set.
Note that the locking in netback needs some improvement or
clarification.

While there also remove a double newline.

Sponsored by:   Citrix Systems R&D
2018-06-26 14:07:11 +00:00
Roger Pau Monné
de06f02ea4 xen: check if there are clients waiting in gnttab_end_foreign_access_references
Without a call to check_free_callbacks() clients waiting for grant
references would not be woken up even when there are sufficient grant
references available.

The check was likely left out as a mistake when the function was first
added.

Note that other functions used to free grant references already call
check_free_callbacks.

Submitted by:		pratyush
Reviewed by:		royger
Differential review:	https://reviews.freebsd.org/D15899
2018-06-21 15:47:47 +00:00
Roger Pau Monné
791ca5907a xen/evtchn: fix LOR in evtchn device
Remove the device from the list before unbinding it. Doing it in this
order allows calling xen_intr_unbind without holding the bind_mutex
lock.

Sponsored by:	Citrix Systems R&D
2018-05-24 10:20:42 +00:00
Roger Pau Monné
e2e4a0e02a xen-blkback: don't unbind the interrupt while holding the lock
There's no need to perform the interrupt unbind while holding the
blkback lock, and doing so leads to the following LOR:

lock order reversal: (sleepable after non-sleepable)
 1st 0xfffff8000802fe90 xbbd1 (xbbd1) @ /usr/src/sys/dev/xen/blkback/blkback.c:3423
 2nd 0xffffffff81fdf890 intrsrc (intrsrc) @ /usr/src/sys/x86/x86/intr_machdep.c:224
stack backtrace:
#0 0xffffffff80bdd993 at witness_debugger+0x73
#1 0xffffffff80bdd814 at witness_checkorder+0xe34
#2 0xffffffff80b7d798 at _sx_xlock+0x68
#3 0xffffffff811b3913 at intr_remove_handler+0x43
#4 0xffffffff811c63ef at xen_intr_unbind+0x10f
#5 0xffffffff80a12ecf at xbb_disconnect+0x2f
#6 0xffffffff80a12e54 at xbb_shutdown+0x1e4
#7 0xffffffff80a10be4 at xbb_frontend_changed+0x54
#8 0xffffffff80ed66a4 at xenbusb_back_otherend_changed+0x14
#9 0xffffffff80a2a382 at xenwatch_thread+0x182
#10 0xffffffff80b34164 at fork_exit+0x84
#11 0xffffffff8101ec9e at fork_trampoline+0xe

Reported by:    Nathan Friess <nathan.friess@gmail.com>
Sponsored by:   Citrix Systems R&D
2018-05-24 10:19:54 +00:00
Roger Pau Monné
b3a5ba30e5 dev/xenstore: prevent transaction hijacking
The user-space xenstore device is currently lacking a check to make
sure that the caller is only using transaction ids currently assigned
to it. This allows users of the xenstore device to hijack transactions
not started by them, although the scope is limited to transactions
started by the same domain.
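
The missing check can be pictured as a per-handle ownership test. The sketch below is an assumption-laden illustration: struct demo_handle, demo_trans_start, demo_trans_owned and the fixed-size table are hypothetical and do not describe how the xenstore device stores its state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	DEMO_MAX_TRANS	8

/* Hypothetical per-open-handle state: the transaction ids it started. */
struct demo_handle {
	uint32_t	trans[DEMO_MAX_TRANS];
	unsigned int	ntrans;
};

/* Record a transaction id as belonging to this handle. */
static bool
demo_trans_start(struct demo_handle *h, uint32_t id)
{
	assert(h != NULL);
	if (h->ntrans == DEMO_MAX_TRANS)
		return (false);
	h->trans[h->ntrans++] = id;
	return (true);
}

/* Only requests using an id started by the same handle are accepted. */
static bool
demo_trans_owned(const struct demo_handle *h, uint32_t id)
{
	unsigned int i;

	assert(h != NULL);
	for (i = 0; i < h->ntrans; i++) {
		if (h->trans[i] == id)
			return (true);
	}
	return (false);
}
```

A request carrying an id that fails demo_trans_owned would be rejected before being forwarded.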

Tested by:      Nathan Friess <nathan.friess@gmail.com>
Sponsored by:   Citrix Systems R&D
2018-05-24 10:18:31 +00:00
Roger Pau Monné
5d7476948f dev/xenstore: add support for watches
Allow user-space applications to register watches using the xenstore
device.  This is needed in order to run toolstack operations on
domains different than the one where xenstore is running (in which
case the device is not used, since the connection to xenstore is done
using a plain socket).

Tested by:      Nathan Friess <nathan.friess@gmail.com>
Sponsored by:   Citrix Systems R&D
2018-05-24 10:17:49 +00:00
Roger Pau Monné
7c743c89a0 xenstore: don't wait with the PCATCH flag
Due to the current synchronous xenstore implementation in FreeBSD, we
cannot return from xs_read_reply without reading a reply, or else the
ring gets out of sync and the next request will read the previous
reply and crash due to the type mismatch. A proper solution involves
making use of the req_id field in the message and allowing multiple
in-flight messages at the same time on the ring.

Remove the PCATCH flag so that signals don't interrupt the wait.

Tested by:      Nathan Friess <nathan.friess@gmail.com>
Sponsored by:   Citrix Systems R&D
2018-05-24 10:17:03 +00:00
Roger Pau Monné
5f8f664619 xenstore: remove the suspend sx lock
There's no need to prevent suspend while doing xenstore transactions,
callers of transactions are supposed to be prepared for a transaction
to fail.

This fixes a bug that could be triggered from the xenstore user-space
device, since starting a transaction from user-space would result in
returning there with a sx lock held, that causes a WITNESS check to
trigger.

Tested by:      Nathan Friess <nathan.friess@gmail.com>
Sponsored by:   Citrix Systems R&D
2018-05-24 10:16:11 +00:00
Roger Pau Monné
ffe4446b33 xen-blkback: do not use state 3 (XenbusStateInitialised)
Linux will not connect to a backend that's in state 3
(XenbusStateInitialised), it needs to be in state 2
(XenbusStateInitWait) for Linux to attempt to connect to the backend.

The protocol seems to suggest that the backend should indeed wait in
state 2 for the frontend to connect, which makes state 3 unusable for
disk backends.

Also make sure blkback will connect to the frontend if the frontend
reaches state 3 (XenbusStateInitialised) before blkback has processed
the results from the hotplug script (Submitted by Nathan Friess).

MFC after:	1 week
2018-05-22 08:51:16 +00:00
Matt Macy
d7c5a620e2 ifnet: Replace if_addr_lock rwlock with epoch + mutex
Run on LLNW canaries and tested by pho@

gallatin:
Using a 14-core, 28-HTT single socket E5-2697 v3 with a 40GbE MLX5
based ConnectX 4-LX NIC, I see an almost 12% improvement in received
packet rate, and a larger improvement in bytes delivered all the way
to userspace.

With the host receiving 64 streams of netperf -H $DUT -t UDP_STREAM -- -m 1,
I see, using nstat -I mce0 1, before the patch:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  4.98  0.00  4.42 0.00 4235592      33 83.80  4720653 2149771 1235 247.32
  4.73  0.00  4.20 0.00 4025260      33 82.99  4724900 2139833 1204 247.32
  4.72  0.00  4.20 0.00 4035252      33 82.14  4719162 2132023 1264 247.32
  4.71  0.00  4.21 0.00 4073206      33 83.68  4744973 2123317 1347 247.32
  4.72  0.00  4.21 0.00 4061118      33 80.82  4713615 2188091 1490 247.32
  4.72  0.00  4.21 0.00 4051675      33 85.29  4727399 2109011 1205 247.32
  4.73  0.00  4.21 0.00 4039056      33 84.65  4724735 2102603 1053 247.32

After the patch:

InMpps OMpps InGbs OGbs     err TCP Est  %CPU syscalls     csw  irq GBfree
  5.43  0.00  4.20 0.00 3313143      33 84.96  5434214 1900162 2656 245.51
  5.43  0.00  4.20 0.00 3308527      33 85.24  5439695 1809382 2521 245.51
  5.42  0.00  4.19 0.00 3316778      33 87.54  5416028 1805835 2256 245.51
  5.42  0.00  4.19 0.00 3317673      33 90.44  5426044 1763056 2332 245.51
  5.42  0.00  4.19 0.00 3314839      33 88.11  5435732 1792218 2499 245.52
  5.44  0.00  4.19 0.00 3293228      33 91.84  5426301 1668597 2121 245.52

Similarly, netperf reports 230Mb/s before the patch, and 270Mb/s after the patch.

Reviewed by:	gallatin
Sponsored by:	Limelight Networks
Differential Revision:	https://reviews.freebsd.org/D15366
2018-05-18 20:13:34 +00:00
Roger Pau Monné
2602ef7cfa xen: fix gntdev
The current interface to the gntdev in FreeBSD is wrong, and mostly worked
by luck before the PTI FreeBSD fixes, when kernel and user-space
were sharing the same page tables.

On FreeBSD ioctls have the size of the passed struct encoded in the
ioctl number, because the generic ioctl handler in the OS takes care
of copying the data from user-space to kernel space, and then calls
the device specific ioctl handler. Thus using ioctl structs with
variable sizes is not possible.

The fix is to turn the array of structs at the end of
ioctl_gntdev_alloc_gref and ioctl_gntdev_map_grant_ref into pointers,
that can be properly accessed from the kernel gntdev driver using the
copyin/copyout functions. Note that this is exactly how it's done for
the privcmd driver.
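
The layout change can be sketched in portable C. Everything named demo_* below is hypothetical, and memcpy(3) merely stands in for copyin(9) so the example runs in user-space; it is not the gntdev or privcmd code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct demo_ref {
	uint32_t	domid;
	uint32_t	ref;
};

/*
 * Fixed-size ioctl argument: sizeof() no longer depends on count because
 * the entries live in a separate caller-owned array.
 */
struct demo_map_args {
	uint32_t	 count;
	struct demo_ref	*refs;
};

/* User-space stand-in for copyin(9); a real driver would not use memcpy. */
static int
demo_copyin(const void *uaddr, void *kaddr, size_t len)
{
	memcpy(kaddr, uaddr, len);
	return (0);
}

/* Copy the array in explicitly, then sum the refs as a visible result. */
static int
demo_map_handler(const struct demo_map_args *args, uint32_t *sum_out)
{
	struct demo_ref *refs;
	uint32_t i, sum = 0;

	assert(args != NULL && sum_out != NULL);
	if (args->count == 0 || args->refs == NULL)
		return (-1);
	refs = calloc(args->count, sizeof(*refs));
	if (refs == NULL)
		return (-1);
	if (demo_copyin(args->refs, refs, args->count * sizeof(*refs)) != 0) {
		free(refs);
		return (-1);
	}
	for (i = 0; i < args->count; i++)
		sum += refs[i].ref;
	free(refs);
	*sum_out = sum;
	return (0);
}
```

Because the struct size is constant, the generic ioctl layer can copy it in full, and only the handler needs to know how many entries follow the pointer.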

Sponsored by:   Citrix Systems R&D
2018-05-02 10:19:17 +00:00
Ed Maste
315fbaeca2 Correct pseudo misspelling in sys/ comments
contrib code and #define in intel_ata.h unchanged.
2018-02-23 18:15:50 +00:00
Pedro F. Giffuni
ac2fffa4b7 Revert r327828, r327949, r327953, r328016-r328026, r328041:
Uses of mallocarray(9).

The use of mallocarray(9) has rocketed the required swap to build FreeBSD.
This is likely caused by the allocation size attributes which put extra pressure
on the compiler.

Given that most of these checks are superfluous we have to choose better
where to use mallocarray(9). We still have more uses of mallocarray(9) but
hopefully this is enough to bring swap usage to a reasonable level.

Reported by:	wosch
PR:		225197
2018-01-21 15:42:36 +00:00
Pedro F. Giffuni
26c1d774b5 dev: make some use of mallocarray(9).
Focus on code where we are doing multiplications within malloc(9). None of
these is likely to overflow, however the change is still useful as some
static checkers can benefit from the allocation attributes we use for
mallocarray.

This initial sweep only covers malloc(9) calls with M_NOWAIT. No good
reason but I started doing the changes before r327796 and at that time it
was convenient to make sure the surrounding code could handle NULL values.
2018-01-13 22:30:30 +00:00
Pedro F. Giffuni
718cf2ccb9 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 2-Clause license, however the tool I
was using misidentified many licenses so this was mostly a manual - error
prone - task.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.
2017-11-27 14:52:40 +00:00
Pedro F. Giffuni
7282444b10 sys/dev: further adoption of SPDX licensing ID tags.
Mainly focus on files that use BSD 3-Clause license.

The Software Package Data Exchange (SPDX) group provides a specification
to make it easier for automated tools to detect and summarize well known
opensource licenses. We are gradually adopting the specification, noting
that the tags are considered only advisory and do not, in any way,
supersede or replace the license texts.

Special thanks to Wind River for providing access to "The Duke of
Highlander" tool: an older (2014) run over FreeBSD tree was useful as a
starting point.
2017-11-20 19:36:21 +00:00
Ian Lepore
c82d887d47 Stop calling atrtc_set() from the xen timer clock_settime() method. That
removes the only reference to atrtc_set() from outside of atrtc.c, so make
it static.

The xen timer driver registers as a realtime clock with 1us resolution.  In
the past that resulted in only the xen timer's clock_settime() getting
called, so it would call atrtc_set() to set the hardware clock as well.  As
of r32090, the clock_settime() method of all registered realtime clocks gets
called, so the xen driver no longer needs to chain-call the lower-resolution
driver.

Thanks to royger@ for talking me through the xen stuff, and for testing.
2017-08-11 19:02:11 +00:00
Jason A. Harmening
eb36b1d0bc Clean up MD pollution of bus_dma.h:
--Remove special-case handling of sparc64 bus_dmamap* functions.
  Replace with a more generic mechanism that allows MD busdma
  implementations to generate inline mapping functions by
  defining WANT_INLINE_DMAMAP in <machine/bus_dma.h>.  This
  is currently useful for sparc64, x86, and arm64, which all
  implement non-load dmamap operations as simple wrappers
  around map objects which may be bus- or device-specific.

--Remove NULL-checked bus_dmamap macros.  Implement the
  equivalent NULL checks in the inlined x86 implementation.
  For non-x86 platforms, these checks are a minor pessimization
  as those platforms do not currently allow NULL maps.  NULL
  maps were originally allowed on arm64, which appears to have
  been the motivation behind adding arm[64]-specific barriers
  to bus_dma.h, but that support was removed in r299463.

--Simplify the internal interface used by the bus_dmamap_load*
  variants and move it to bus_dma_internal.h

--Fix some drivers that directly include sys/bus_dma.h
  despite the recommendations of bus_dma(9)

Reviewed by:	kib (previous revision), marius
Differential Revision:	https://reviews.freebsd.org/D10729
2017-07-01 05:35:29 +00:00
Ryan Libby
98018db419 netfront.c: avoid gcc variably-modified warning
gcc produces a "variably modified X at file scope" warning for
structures that use these size definitions.  I think the definitions are
actually fine but can be rephrased with the __CONST_RING_SIZE macro more
cleanly anyway.

Reviewed by:	markj, royger
Approved by:	markj (mentor)
Sponsored by:	Dell EMC Isilon
Differential revision:	https://reviews.freebsd.org/D11417
2017-06-30 22:14:22 +00:00
Colin Percival
c74415ed3b Skip setting the MTU in the netfront driver (xn# devices) if the new MTU
is the same as the old MTU.  In particular, on Amazon EC2 "T2" instances
without this change, the network interface is reinitialized every 30
minutes due to the MTU being (re)set when a new DHCP lease is obtained,
causing packets to be dropped, along with annoying syslog messages about
the link state changing.
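
A minimal sketch of the early return, assuming a hypothetical struct demo_ifnet with a reinit counter in place of the real interface state; it is not the netfront SIOCSIFMTU handler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interface state for illustration. */
struct demo_ifnet {
	int	mtu;
	int	reinit_count;	/* times the interface was reinitialised */
};

static void
demo_set_mtu(struct demo_ifnet *ifp, int new_mtu)
{
	assert(ifp != NULL);
	/* Nothing to do if the MTU is unchanged: skip the costly reinit. */
	if (ifp->mtu == new_mtu)
		return;
	ifp->mtu = new_mtu;
	ifp->reinit_count++;
}
```

With this guard a DHCP renewal that repeats the current MTU leaves the link untouched.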

As a side note, the behaviour this commit fixes was responsible for
exposing the locking problems fixed via r318523 and r318631.

Maintainers of other network interface drivers may wish to consider making
the corresponding change; the handling of SIOCSIFMTU does not seem to
exhibit a great deal of consistency between drivers.

MFC after:	1 week
2017-06-02 07:03:31 +00:00
Roger Pau Monné
477a40c74f xen/netfront: don't drop the RX lock in xn_rxeof
Since netfront uses different locks for the RX and TX paths there's no need to
drop the RX lock before calling if_input.

Suggested by:	jhb
Tested by:	cperciva
Sponsored by:	Citrix Systems R&D
MFC with:	r318523
2017-05-22 11:33:44 +00:00
Roger Pau Monné
bf319173f2 xen/netfront: don't drop the ring RX lock with inconsistent ring state
Make sure the RX ring lock is only released when the state of the ring is
consistent, or else concurrent calls to xn_rxeof might get an inconsistent ring
state and thus some packets might be processed twice.

Note that this is not very common, and could only happen when an interrupt is
delivered while in xn_ifinit.

Reported by:	cperciva
Tested by:	cperciva
MFC after:	1 week
Sponsored by:	Citrix Systems R&D
2017-05-19 08:19:51 +00:00
Roger Pau Monné
e5d27b37e3 xen/blkfront: correctly detach a disk with active users
Call disk_gone when the backend switches to the "Closing" state and blkfront
still has pending users. This allows the disk to be detached, and will call
into xbd_closing by itself when the geom layout cleanup has finished.

Reported by:		bapt
Tested by:		manu
Reviewed by:		bapt
Sponsored by:		Citrix Systems R&D
MFC after:		1 week
Differential revision:	https://reviews.freebsd.org/D10772
2017-05-19 08:11:15 +00:00
Gleb Smirnoff
6286dc78d4 Remove unneeded include of vm_phys.h.
2017-04-17 16:51:04 +00:00
Kevin Lo
dd4b1792c7 Don't initialize if_output to ether_output(), ether_ifattach() does it for
us already.  While here, remove NOTYET code since if_watchdog is no longer
used.

Reviewed by:	royger
MFC after:	3 days
2017-03-24 01:23:07 +00:00
Roger Pau Monné
a81683c371 xen/netfront: fix inbound packet flags for checksum offload
Currently netfront is setting the flags of inbound packets with the checksum
not present (offloaded) to (CSUM_IP_CHECKED | CSUM_IP_VALID | CSUM_DATA_VALID |
CSUM_PSEUDO_HDR). According to the mbuf(9) man page this is not the correct
combination of flags, it should instead be (CSUM_DATA_VALID |
CSUM_PSEUDO_HDR).

Reviewed by:		Wei Liu <wei.liu2@citrix.com>
MFC after:		2 weeks
Sponsored by:		Citrix Systems R&D
Differential revision:	https://reviews.freebsd.org/D9831
2017-03-07 09:18:52 +00:00
Roger Pau Monné
41716b8d51 xenstore: fix suspension when using the xenstore device
Lock the xenstore request mutex when suspending user-space processes, in order
to prevent any process from holding this lock when going into suspension, or
else the xenstore suspend process is going to deadlock.

Submitted by:		Liuyingdong <liuyingdong@huawei.com>
Reviewed by:		royger
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D9638
2017-03-07 09:17:48 +00:00
Roger Pau Monné
8dee0e9bd6 xen: add support for canceled suspend
When running on Xen, it's possible that a suspend request to the hypervisor
fails (return from HYPERVISOR_suspend different than 0). This means that the
suspend hasn't succeeded, and the resume procedure needs to properly handle this
case.

First of all, when such a situation happens there's no need to reset the vector
callback, hypercall page, shared info, event channels or grant table, because
their state is preserved. Also, the PV drivers don't need to be reset to the
initial state, since the connection with the backend has not been interrupted.

Submitted by:		Liuyingdong <liuyingdong@huawei.com>
Reviewed by:		royger
MFC after:		2 weeks
Differential revision:	https://reviews.freebsd.org/D9635
2017-03-07 09:16:51 +00:00
Roger Pau Monné
b8aa60db3d xen/gntdev: prevent unsynchronized accesses to the map entry
vm_map_lookup_done should only be called when the gntdev has finished poking at
the entry.

Reported by:	alc
Reviewed by:	alc
MFC after:	1 week
Sponsored by:	Citrix Systems R&D
2017-02-27 15:31:15 +00:00
Warner Losh
28586889c2 Convert PCIe Hot Plug to using pci_request_feature
Convert PCIe hot plug support over to asking the firmware, if any, for
permission to use the HotPlug hardware. Implement pci_request_feature
for ACPI. All other host PCI connections default to allowing all valid
feature requests.

Sponsored by: Netflix
2017-02-25 06:11:59 +00:00
Alan Somers
65244d585f Fix the xnb(4) unit tests
One test was inadvertently expecting a bug in the kernel's sscanf
implementation circa 2012. I don't know when that bug got fixed.

Reported by:	royger
Reviewed by:	royger
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D9766
2017-02-23 16:54:30 +00:00
Alan Somers
407d708cc7 Misc Coverity fixes in xnb(4)
Most of these are null pointer dereferences or missing error checks in the
unit tests. One is a missing error check in xnb_attach_failed. None can
cause real problems in running systems.

Reported by:	Coverity
CIDs:		1092469 1092468 1092467 2092466 1092465 1092512 1092511 1092510
CIDs:		1092510 1092509 1092508 1092507
Reviewed by:	royger
MFC after:	3 weeks
Sponsored by:	Spectra Logic Corp
Differential Revision:	https://reviews.freebsd.org/D9234
2017-02-23 16:31:04 +00:00
Roger Pau Monné
d908d2ef5e xen/gntdev: use UOFF_TO_IDX instead of OFF_TO_IDX
The Xen grant table device treats the mmap offset parameter as an unsigned
type, and as such it must use the newly introduced UOFF_TO_IDX.

Sponsored by:   Citrix Systems R&D
MFC after:      2 weeks
X-MFC-with:     r313690
2017-02-23 13:14:28 +00:00
Roger Pau Monné
de7d5ac603 xen/timer: mark the Xen PV timer as not safe for suspension
Note that the timer itself fully supports suspension, but due to the lack of
ordering during the resume process FreeBSD cannot guarantee that the timer is
resumed before any device attempts to use it.

Submitted by:		Liuyingdong <liuyingdong@huawei.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D9639
2017-02-22 09:22:17 +00:00
Olivier Houchard
36ea572167 In the netfront_rxq struct, we should use NET_RX_RING_SIZE, not
NET_TX_RING_SIZE.

Reviewed by:	royger
2017-01-03 17:24:56 +00:00
Roger Pau Monné
ca7af67ac9 xen: fix IPI setup with EARLY_AP_STARTUP
Current Xen IPI setup functions require that the caller provide a device in
order to obtain the name of the interrupt from it. With early AP startup this
device is no longer available at the point where IPIs are bound, and a KASSERT
would trigger:

panic: NULL pcpu device_t
cpuid = 0
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xffffffff82233a20
vpanic() at vpanic+0x186/frame 0xffffffff82233aa0
kassert_panic() at kassert_panic+0x126/frame 0xffffffff82233b10
xen_setup_cpus() at xen_setup_cpus+0x5b/frame 0xffffffff82233b50
mi_startup() at mi_startup+0x118/frame 0xffffffff82233b70
btext() at btext+0x2c

Fix this by no longer requiring the presence of a device in order to bind IPIs,
and simply use the "cpuX" format where X is the CPU identifier in order to
describe the interrupt.

Reported by:            sbruno, cperciva
Tested by:              sbruno
X-MFC-With:             r310177
Sponsored by:           Citrix Systems R&D
2016-12-22 16:09:44 +00:00
Dimitry Andric
085def3f0a In xbd_connect(), use correct scanf conversion specifiers for the
feature_barrier and feature_flush variables.  Otherwise, adjacent
variables on the stack, such as sector_size, may be overwritten, with
disastrous results.
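
The hazard is that a conversion specifier wider than its destination makes scanf write past the variable. The sketch below shows matching specifiers only; demo_parse_features and the chosen types (int and uint64_t) are assumptions for illustration, not the declarations used in xbd_connect.

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical parser: each specifier matches the pointed-to type exactly,
 * "%d" for int * and SCNu64 for uint64_t *, so no neighbouring stack
 * variable can be overwritten.
 */
static int
demo_parse_features(const char *text, int *feature_barrier,
    uint64_t *sectors)
{
	assert(text != NULL && feature_barrier != NULL && sectors != NULL);
	if (sscanf(text, "%d %" SCNu64, feature_barrier, sectors) != 2)
		return (-1);
	return (0);
}
```

Using the <inttypes.h> macros for fixed-width types avoids guessing whether a type is long or long long on a given platform.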

Note that I did not see a good reason to revert the addition of zero
checks introduced in r310013.  Better safe than sorry.

PR:		215209
Tested by:	royger
MFC after:	3 days
2016-12-14 19:28:19 +00:00
Colin Percival
93954c2da3 Check that blkfront devices have a non-zero number of sectors and a
non-zero sector size.  Such a device would be a virtual disk of zero
bytes; clearly not useful, and not something we should try to attach.

As a fortuitous side effect, checking that these values are non-zero
here results in them not *becoming* zero later in the function.  This
odd behaviour began with r309124 (clang 3.9.0) but is challenging to
debug; making any changes to this function whatsoever seems to affect
the llvm optimizer behaviour enough to make the unexpected zeroing of
the sector_size variable cease.

PR:		215209
Security:	The potential for variables to unexpectedly become zero
		has worrying consequences for security in general, but
		not so much in this particular context.
2016-12-13 06:54:13 +00:00
Roger Pau Monné
78eb32933b xen: add a grant-table user-space device
A grant-table user-space device will allow user-space applications to map
and share grants (Xen way to share memory) among Xen domains. This grant
table user-space device has been tested with the QEMU Qdisk Xen backend.

Submitted by:		jaggi
Reviewed by:		royger
Differential review:	https://reviews.freebsd.org/D7293
2016-10-31 13:12:58 +00:00
Roger Pau Monné
b2fd6999db xen/netfront: fix statistics
Fix the statistics used by netfront.

Reported by:    Trond.Endrestol@ximalas.info
Submitted by:   ae
Reviewed by:    royger, Wei Liu <wei.liu2@citrix.com>
MFC after:	4 weeks
PR:		213439
2016-10-31 11:31:11 +00:00
Roger Pau Monné
3c9d594089 xen-netfront: improve the logic when handling nic features from ioctl
Simplify the logic involved in changing the nic features on the fly, and
only reset the frontend when really needed (when changing RX features). Also
don't return from the ioctl until the interface has been properly
reconfigured.

While there, make sure XN_CSUM_FEATURES is used consistently.

Reported by:	julian
MFC after:	5 days
X-MFC-with:	r303488
Sponsored by:	Citrix Systems R&D
2016-08-05 15:48:56 +00:00
Roger Pau Monné
339690b541 xen-netfront: fix trying to send packets with disconnected netfront
In certain circumstances xn_txq_mq_start might be called with num_queues ==
0 during the resume phase after a migration, which can trigger a KASSERT.
Fix this by making sure the carrier is on before trying to transmit, or else
return that the queues are full.
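
A sketch of the guard, assuming a hypothetical demo_pick_txq helper and assuming ENOBUFS is the error used to report full queues (the message does not name the errno value). It is not the xn_txq_mq_start code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical queue selection: refuse to transmit unless the carrier is
 * on and at least one queue exists.
 */
static int
demo_pick_txq(uint32_t flowid, unsigned int num_queues, bool carrier_on,
    unsigned int *queue)
{
	assert(queue != NULL);
	/* Guard first: flowid % 0 would raise a divide fault. */
	if (!carrier_on || num_queues == 0)
		return (ENOBUFS);
	*queue = flowid % num_queues;
	return (0);
}
```

Checking the carrier state before the modulo means a disconnected frontend is reported as busy rather than crashing.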

Just as a note, I haven't been able to reproduce this crash on my test
systems, but I still think it's possible and worth fixing.

Reported by:		Karl Pielorz <kpielorz_lst@tdx.co.uk>
Sponsored by:		Citrix Systems R&D
MFC after:		5 days
Reviewed by:		Wei Liu <wei.liu2@citrix.com>
Differential revision:	https://reviews.freebsd.org/D7349
2016-07-29 16:33:45 +00:00
Roger Pau Monné
b8d1a37638 xen/timer: re-introduce the inittodr call in the resume path
r298930 removed the inittodr call, but it seems like this prevents
"calcru: runtime went backwards ..." messages from occasionally appearing
when resuming from migration.

Reported by:	Karl Pielorz <kpielorz@tdx.co.uk>
Sponsored by:	Citrix Systems R&D
2016-06-09 16:15:01 +00:00
Roger Pau Monné
6567125372 xen-netfront: fix initialization
A couple of mostly cosmetic fixes for the final initialization of netfront:

 - Switch to "connected" state before starting to kick the rings.
 - Correctly use "rxq" in the initialization loop (previously rxq was not
   updated in the loop, and netfront would kick np->rxq[N] several times).
 - Declare and define xn_connect as static, it's not used outside of this
   file.

Reviewed by:		Wei Liu <wei.liu2@citrix.com>
Sponsored by:		Citrix Systems R&D
Differential revision:	https://reviews.freebsd.org/D6657
2016-06-06 15:01:24 +00:00
Roger Pau Monné
3e0522bc8f xen-blkback: fix error path on failed attach
The current error path in case of failure during attach/initialization is
not correct and leaves blkback in a stuck state. This is due to blkback
waiting for blkfront to switch to state XenbusStateClosed, but if blkfront
never attached (because the guest is not even started) it cannot possibly
make it to that state.

Instead just wait for the frontend to be in a state different than
XenbusStateConnected in order to proceed with the shutdown. Also, it is
wrong to call xbb_detach directly because it destroys the lock which can
still be used by xbb_frontend_changed.

Sponsored by: Citrix Systems R&D
2016-06-03 11:39:35 +00:00
Roger Pau Monné
de0bad0001 blkback: add support for hotplug scripts
Hotplug scripts are needed in order to use fancy disk configurations in xl,
like iSCSI disks. The job of hotplug scripts is to locally attach the disk
and present it to blkback as a block device or a regular file.

This change introduces a new xenstore node in the blkback hierarchy, called
"physical-device-path". This is a straight replacement for the "params" node,
which was used before.

Hotplug scripts will need to read the "params" node, perform whatever
actions are necessary and then write the "physical-device-path" node. The
hotplug script is also in charge of detaching the disk once the domain has
been shutdown.

Sponsored by: Citrix Systems R&D
2016-06-03 11:38:52 +00:00
Roger Pau Monné
bf7b50db15 xen-netfront: use callout_reset_curcpu instead of callout_reset
This should help distribute the load of the callbacks.

Suggested by:	hps
Sponsored by:	Citrix Systems R&D
2016-06-02 14:25:10 +00:00
Roger Pau Monné
c2d12e5e0d xen-netfront: perform an interface reset when changing options
The PV backend will only pick the new options when the interface is detached
and reattached again, so perform a full reset when changing options. This is
very fast, and should not be noticeable by the user.

Reviewed by:		Wei Liu <wei.liu2@citrix.com>
Sponsored by:		Citrix Systems R&D
Differential revision:	https://reviews.freebsd.org/D6658
2016-06-02 11:21:00 +00:00
Roger Pau Monné
d039b0700b xen-netfront: release grant references used for the shared rings
Just calling gnttab_end_foreign_access_ref doesn't free the references,
instead call gnttab_end_foreign_access with a NULL page argument in order to
have the grant references freed. The code that maps the ring
(xenbus_map_ring) already uses gnttab_grant_foreign_access which takes care
of allocating a grant reference.

Reviewed by:		Wei Liu <wei.liu2@citrix.com>
Sponsored by:		Citrix Systems R&D
Differential revision:	https://reviews.freebsd.org/D6608
2016-06-02 11:19:16 +00:00
Roger Pau Monné
c21b47d8c9 xen-netfront: fix two hotplug related issues
This patch fixes two issues seen on hot-unplug. The first one is a panic
caused by calling ether_ifdetach after freeing the internal netfront queue
structures. ether_ifdetach will call xn_qflush, and this needs to be done
before freeing the queues. This prevents the following panic:

Fatal trap 9: general protection fault while in kernel mode
cpuid = 2; apic id = 04
instruction pointer	= 0x20:0xffffffff80b1687f
stack pointer	        = 0x28:0xfffffe009239e770
frame pointer	        = 0x28:0xfffffe009239e780
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 0 (thread taskq)
[ thread pid 0 tid 100015 ]
Stopped at      strlen+0x1f:    movq    (%rcx),%rax
db> bt
Tracing pid 0 tid 100015 td 0xfffff800038a6000
strlen() at strlen+0x1f/frame 0xfffffe009239e780
kvprintf() at kvprintf+0xfa0/frame 0xfffffe009239e890
vsnprintf() at vsnprintf+0x31/frame 0xfffffe009239e8b0
kassert_panic() at kassert_panic+0x5a/frame 0xfffffe009239e920
__mtx_lock_flags() at __mtx_lock_flags+0x164/frame 0xfffffe009239e970
xn_qflush() at xn_qflush+0x59/frame 0xfffffe009239e9b0
if_detach() at if_detach+0x17e/frame 0xfffffe009239ea10
netif_free() at netif_free+0x97/frame 0xfffffe009239ea30
netfront_detach() at netfront_detach+0x11/frame 0xfffffe009239ea40
[...]

Another panic can be triggered by hot-plugging a NIC:

Fatal trap 18: integer divide fault while in kernel mode
cpuid = 0; apic id = 00
instruction pointer	= 0x20:0xffffffff80902203
stack pointer	        = 0x28:0xfffffe00508d3660
frame pointer	        = 0x28:0xfffffe00508d36a0
code segment		= base 0x0, limit 0xfffff, type 0x1b
			= DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags	= interrupt enabled, resume, IOPL = 0
current process		= 2960 (ifconfig)
[ thread pid 2960 tid 100088 ]
Stopped at      xn_txq_mq_start+0x33:   divl    %esi,%eax
db> bt
Tracing pid 2960 tid 100088 td 0xfffff8000850aa00
xn_txq_mq_start() at xn_txq_mq_start+0x33/frame 0xfffffe00508d36a0
ether_output() at ether_output+0x570/frame 0xfffffe00508d3720
arprequest() at arprequest+0x433/frame 0xfffffe00508d3820
arp_ifinit() at arp_ifinit+0x49/frame 0xfffffe00508d3850
xn_ioctl() at xn_ioctl+0x1a2/frame 0xfffffe00508d3890
in_control() at in_control+0x882/frame 0xfffffe00508d3910
ifioctl() at ifioctl+0xda1/frame 0xfffffe00508d39a0
kern_ioctl() at kern_ioctl+0x246/frame 0xfffffe00508d3a00
sys_ioctl() at sys_ioctl+0x171/frame 0xfffffe00508d3ae0
amd64_syscall() at amd64_syscall+0x2db/frame 0xfffffe00508d3bf0
Xfast_syscall() at Xfast_syscall+0xfb/frame 0xfffffe00508d3bf0
--- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x8011e185a, rsp =
0x7fffffffe478, rbp = 0x7fffffffe4c0 ---

This is caused by marking the driver as active before it's fully
initialized, and thus calling xn_txq_mq_start with num_queues set to 0.
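
The divide fault in the second trace is consistent with a modulo by a zero queue count. The following is a minimal sketch of that failure mode with a defensive guard. `pick_txq` and its parameters are hypothetical names, not the driver's code, and the commit message attributes the problem to initialization order, so the guard is only illustrative:

```c
#include <stdint.h>

/*
 * Hypothetical helper: map a flow identifier to a TX queue index.
 * Returns -1 when no queues are configured yet, instead of
 * evaluating flowid % 0, which raises a divide fault on x86.
 */
int
pick_txq(uint32_t flowid, uint32_t num_queues)
{
	if (num_queues == 0)
		return (-1);	/* interface not fully initialized */
	return ((int)(flowid % num_queues));
}
```

With four queues, flow 7 maps to queue 3; with zero queues the helper reports an error rather than dividing.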

Reviewed by:		Wei Liu <wei.liu2@citrix.com>
Sponsored by:		Citrix Systems R&D
Differential revision:	https://reviews.freebsd.org/D6646
2016-06-02 11:18:02 +00:00
Roger Pau Monné
da695b059d xen-netfront: switch to using an interrupt handler
In order to use custom taskqueues we would have to mask the interrupt, which
is basically what is already done for an interrupt handler, or else we risk
losing interrupts. This switches netfront to the same interrupt handling
that was done before multiqueue support was added.

Reviewed by:	Wei Liu <wei.liu2@citrix.com>
Sponsored by:	Citrix Systems R&D
2016-06-02 11:16:35 +00:00
Roger Pau Monné
2568ee6747 xen-netfront: always keep the Rx ring full of requests
This is based on Linux commit 1f3c2eba1e2d866ef99bb9b10ade4096e3d7607c from
David Vrabel:

A full Rx ring only requires 1 MiB of memory.  This is not enough memory
that it is useful to dynamically scale the number of Rx requests in the ring
based on traffic rates, because:

a) Even the full 1 MiB is a tiny fraction of a typically modern Linux
   VM (for example, the AWS micro instance still has 1 GiB of memory).

b) Netfront would have used up to 1 MiB already even with moderate
   data rates (there was no adjustment of target based on memory
   pressure).

c) Small VMs are going to typically have one VCPU and hence only one
   queue.

Keeping the ring full of Rx requests handles bursty traffic better than
trying to converge on an optimal number of requests to keep filled.
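
The 1 MiB figure can be checked with simple arithmetic. This sketch assumes a ring of 256 Rx slots, each backed by one 4 KiB page; both numbers are assumptions for illustration and are not stated in the commit message:

```c
#include <stddef.h>

#define RX_RING_SLOTS	256u	/* assumed number of Rx requests in the ring */
#define RX_PAGE_SIZE	4096u	/* assumed size of the page behind each request */

/* Memory held when every slot in the ring has a posted buffer. */
size_t
rx_ring_full_bytes(void)
{
	return ((size_t)RX_RING_SLOTS * RX_PAGE_SIZE);
}
```

Under these assumptions a full ring holds exactly 1 MiB per queue.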

Reviewed by:	Wei Liu <wei.liu2@citrix.com>
Sponsored by:	Citrix Systems R&D
2016-06-02 11:14:26 +00:00
Roger Pau Monné
d9a66b6ded xen-netfront: fix receiving TSO packets
Currently FreeBSD is not properly fetching the TSO information from the Xen
PV ring, and thus the received packets didn't have all the necessary
information, like the segment size or even the TSO flag set.

Sponsored by: Citrix Systems R&D
2016-06-02 11:12:11 +00:00
John Baldwin
fdce57a042 Add an EARLY_AP_STARTUP option to start APs earlier during boot.
Currently, Application Processors (non-boot CPUs) are started by
MD code at SI_SUB_CPU, but they are kept waiting in a "pen" until
SI_SUB_SMP at which point they are released to run kernel threads.
SI_SUB_SMP is one of the last SYSINIT levels, so APs don't enter
the scheduler and start running threads until fairly late in the
boot.

This change moves SI_SUB_SMP up to just before software interrupt
threads are created allowing the APs to start executing kernel
threads much sooner (before any devices are probed).  This allows
several initialization routines that need to perform initialization
on all CPUs to now perform that initialization in one step rather
than having to defer the AP initialization to a second SYSINIT run
at SI_SUB_SMP.  It also permits all CPUs to be available for
handling interrupts before any devices are probed.

This last feature fixes a problem with interrupt vector exhaustion.
Specifically, in the old model all device interrupts were routed
onto the boot CPU during boot.  Later after the APs were released at
SI_SUB_SMP, interrupts were redistributed across all CPUs.

However, several drivers for multiqueue hardware allocate N interrupts
per CPU in the system.  In a system with many CPUs, just a few drivers
doing this could exhaust the available pool of interrupt vectors on
the boot CPU as each driver was allocating N * mp_ncpu vectors on the
boot CPU.  Now, drivers will allocate interrupts on their desired CPUs
during boot meaning that only N interrupts are allocated from the boot
CPU instead of N * mp_ncpu.

Some other bits of code can also be simplified as smp_started is
now true much earlier and will now always be true for these bits of
code.  This removes the need to treat the single-CPU boot environment
as a special case.

As a transition aid, the new behavior is available under a new kernel
option (EARLY_AP_STARTUP).  This will allow the option to be turned off
if need be during initial testing.  I plan to enable this on x86 by
default in a followup commit in the next few days and to have all
platforms moved over before 11.0.  Once the transition is complete,
the option will be removed along with the !EARLY_AP_STARTUP code.

These changes have only been tested on x86.  Other platform maintainers
are encouraged to port their architectures over as well.  The main
things to check for are any uses of smp_started in MD code that can be
simplified and SI_SUB_SMP SYSINITs in MD code that can be removed in
the EARLY_AP_STARTUP case (e.g. the interrupt shuffling).

PR:		kern/199321
Reviewed by:	markj, gnn, kib
Sponsored by:	Netflix
2016-05-14 18:22:52 +00:00
Roger Pau Monné
107cfbb743 xen-netfront: fix feature detection
Current netfront code relies on xs_scanf returning a value < 0 on error,
which is not right, xs_scanf returns a positive value on error.
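
The bug pattern is a sign check against a function that reports failure with a positive value. A small sketch with a stand-in function follows. `fake_read_feature` is hypothetical and only mimics the return convention described above (0 on success, positive errno value on failure); it is not the xenstore API:

```c
#include <errno.h>

/*
 * Hypothetical stand-in for a call that returns 0 on success and a
 * positive errno value on failure.
 */
int
fake_read_feature(int present, int *value)
{
	if (!present)
		return (ENOENT);
	*value = 1;
	return (0);
}

/* Returns 1 if the feature is enabled, 0 if it is absent or unreadable. */
int
feature_enabled(int present)
{
	int value = 0;

	/*
	 * A "< 0" test would never fire here because errors are positive,
	 * so treat any non-zero return as failure.
	 */
	if (fake_read_feature(present, &value) != 0)
		return (0);
	return (value);
}
```

Checking `!= 0` works for both conventions, which makes it the safer test when the sign of the error value is in doubt.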

MFC after:	3 days
Tested by:	Stephen Jones <StephenJo@LivingComputerMuseum.org>
Sponsored by:	Citrix Systems R&D
2016-05-12 16:18:02 +00:00
Roger Pau Monné
4ea0b4ad1a xen/resume: only send BITMAP IPIs if CPUs > 1
This is quite harmless on HEAD, but it's worse on stable/10 where
lapic_ipi_vectored is the local APIC native IPI implementation. On
stable/10 cpu_ops.ipi_vectored should be used instead.

MFC after:	5 days
Sponsored by:	Citrix Systems R&D
2016-05-11 10:10:25 +00:00
Edward Tomasz Napierala
084d207584 Remove misc NULL checks after M_WAITOK allocations.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2016-05-10 10:26:07 +00:00
Roger Pau Monné
288b2385b8 xen/privcmd: fix integer truncation in IOCTL_PRIVCMD_MMAPBATCH
The size field in the XENMEM_add_to_physmap_range is an uint16_t, and the
privcmd driver was doing an implicit truncation of an int into an uint16_t
when filling the hypercall parameters.

Fix this by adding a loop and making sure privcmd splits ioctl request into
2^16 chunks when issuing the hypercalls.
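
The splitting loop can be sketched as follows. A count of 65536 stored into a uint16_t truncates to 0, so the request has to be cut into pieces that fit. `count_chunks` is a hypothetical helper, and the cap of UINT16_MAX per call is an assumption of this sketch; the exact chunk size used by the driver is not shown in the commit message:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sketch: count how many calls are needed when each call
 * carries its size in a uint16_t field.
 */
size_t
count_chunks(size_t total)
{
	size_t calls = 0;

	while (total > 0) {
		/* Never let the per-call size exceed what uint16_t can hold. */
		uint16_t chunk = (total > UINT16_MAX) ?
		    UINT16_MAX : (uint16_t)total;

		/* A real caller would issue the hypercall for `chunk` here. */
		total -= chunk;
		calls++;
	}
	return (calls);
}
```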

Reported and tested by:	Marcin Cieslak <saper@saper.info>
Sponsored by:		Citrix Systems R&D
2016-05-06 16:44:46 +00:00
Alan Somers
8907f744ff Improve performance and functionality of the bitstring(3) api
Two new functions are provided, bit_ffs_at() and bit_ffc_at(), which allow
for efficient searching of set or cleared bits starting from any bit offset
within the bit string.

Performance is improved by operating on longs instead of bytes and using
ffsl() for searches within a long. ffsl() is a compiler builtin in both
clang and gcc for most architectures, converting what was a brute force
while loop search into a couple of instructions.
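
The word-at-a-time search can be sketched as below. This is a portable approximation written for illustration, not the code in sys/sys/bitstring.h. `first_set_at` is a hypothetical name, and the sketch relies on the GCC/Clang builtin `__builtin_ffsl`:

```c
#include <limits.h>
#include <stddef.h>

#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)

/*
 * Hypothetical sketch: return the index of the first set bit at or
 * after `start` in a bitmap of `nbits` bits, or -1 if there is none.
 * Works one long at a time and uses the compiler builtin for the
 * search inside a word.
 */
long
first_set_at(const unsigned long *map, size_t start, size_t nbits)
{
	size_t word, nwords;
	unsigned long w;

	if (start >= nbits)
		return (-1);
	nwords = (nbits + BITS_PER_LONG - 1) / BITS_PER_LONG;
	word = start / BITS_PER_LONG;

	/* Mask off the bits below `start` in the first word. */
	w = map[word] & (~0UL << (start % BITS_PER_LONG));
	for (;;) {
		if (w != 0) {
			size_t bit = word * BITS_PER_LONG +
			    (size_t)__builtin_ffsl((long)w) - 1;
			return (bit < nbits ? (long)bit : -1);
		}
		if (++word == nwords)
			return (-1);
		w = map[word];
	}
}
```

Each loop iteration examines a whole long, so an empty region of the bitmap is skipped in a handful of instructions rather than bit by bit.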

All of the bitstring(3) API continues to be contained in the header file.
Some of the functions are large enough that perhaps they should be uninlined
and moved to a library, but that is beyond the scope of this commit.

sys/sys/bitstring.h:
        Convert the majority of the existing bit string implementation from
        macros to inline functions.

        Properly protect the implementation from inadvertent macro expansion
        when included in a user's program by prefixing all private
        macros/functions and local variables with '_'.

        Add bit_ffs_at() and bit_ffc_at(). Implement bit_ffs() and
        bit_ffc() in terms of their "at" counterparts.

        Provide a kernel implementation of bit_alloc(), making the full API
        usable in the kernel.

        Improve code documentation.

share/man/man3/bitstring.3:
        Add pre-existing API bit_ffc() to the synopsis.

        Document new APIs.

        Document the initialization state of the bit strings
        allocated/declared by bit_alloc() and bit_decl().

        Correct documentation for bitstr_size(). The original code comments
        indicate the size is in bytes, not "elements of bitstr_t". The new
        implementation follows this lead. Only hastd assumed "elements"
        rather than bytes and it has been corrected.

etc/mtree/BSD.tests.dist:
tests/sys/Makefile:
tests/sys/sys/Makefile:
tests/sys/sys/bitstring.c:
        Add tests for all existing and new functionality.

include/bitstring.h
	Include all headers needed by sys/bitstring.h

lib/libbluetooth/bluetooth.h:
usr.sbin/bluetooth/hccontrol/le.c:
        Include bitstring.h instead of sys/bitstring.h.

sbin/hastd/activemap.c:
        Correct usage of bitstr_size().

sys/dev/xen/blkback/blkback.c
        Use new bit_alloc.

sys/kern/subr_unit.c:
        Remove hard-coded assumption that sizeof(bitstr_t) is 1.  Get rid of
        unrb.busy, which caches the number of bits set in unrb.map.  When
        INVARIANTS are disabled, nothing needs to know that information.
        collapse_unr can be adapted to use bit_ffs and bit_ffc instead.
        Eliminating unrb.busy saves memory, simplifies the code, and
        provides a slight speedup when INVARIANTS are disabled.

sys/net/flowtable.c:
        Use the new kernel implementation of bit_alloc, instead of hacking
        the old libc-dependent macro.

sys/sys/param.h
        Update __FreeBSD_version to indicate availability of new API

Submitted by:   gibbs, asomers
Reviewed by:    gibbs, ngie
MFC after:      4 weeks
Sponsored by:   Spectra Logic Corp
Differential Revision:  https://reviews.freebsd.org/D6004
2016-05-04 22:34:11 +00:00
Roger Pau Monné
2ed46a6f7d xen/pvclock: set the correct resolution for the Xen PV clock
The Xen PV clock has a resolution of 1ns, so set the resolution to the
highest one that FreeBSD supports, which is 1us.

MFC after:	2 weeks
Sponsored by:	Citrix Systems R&D
2016-05-04 13:49:59 +00:00
Pedro F. Giffuni
453130d9bf sys/dev: minor spelling fixes.
Most affect comments, very few have user-visible effects.
2016-05-03 03:41:25 +00:00
Roger Pau Monné
6e2a4a5f48 xen/control: improve suspend/resume
Implement several small improvements to the suspend/resume Xen sequence:

 - Call the power_suspend_early event before stopping all processes.
 - Stop all processes. This was done implicitly previously by putting all
   the CPUs in a known IPI handler.
 - Warm up the timecounter.
 - Re-initialize the time of day register.

Sponsored by: Citrix Systems R&D
2016-05-02 18:23:48 +00:00
Roger Pau Monné
f8af716b04 xen/time: fix PV clock resolution
The current resolution of the Xen PV clock is too high, which causes an
adjustment of 5s to be applied to it. Reduce the resolution to be the same
as the RTC plus one, so it's always selected as the best source when
available on x86.

Also don't reset the clock on resume, it's pointless and discards any
previous adjustments.

Sponsored by: Citrix Systems R&D
2016-05-02 16:16:08 +00:00
Roger Pau Monné
eac636b0ce xen/time: allow Dom0 to set the host time
Dom0 should be able to set the host time. This is implemented by first
writing to the RTC (as would be done on bare metal), and then using the
XENPF_settime64 hypercall in order to force Xen to update the wallclock
shared page of all domains.

Sponsored by: Citrix Systems R&D
2016-05-02 16:15:28 +00:00
Roger Pau Monné
af5bb69c5a xen/timer: remove the timer setup loop
With the removal of the usage of the VCPU_SSHOTTMR_future flag, now
all errors from xentimer_vcpu_start_timer should be considered fatal, and
the loop is no longer needed since in case of setting the timer in the past
we will get an event interrupt right away (instead of returning ETIME).

Sponsored by:	Citrix Systems R&D
MFC after:	2 weeks
2016-05-02 16:13:55 +00:00
Roger Pau Monné
04423b622a xen/x86: don't lose event interrupts
On slow platforms with unreliable TSC, such as QEMU emulated machines,
it is possible for the FreeBSD kernel to request the next event in the
past. In that case, in the current implementation of
xentimer_vcpu_start_timer, we simply return -ETIME. To be precise Xen
returns -ETIME and we pass it on. As a consequence we need to loop
around the function to make sure that the timer is properly set.

Instead it is better to always ask the hypervisor for a timer event,
even if the timeout is past. To do that, remove the VCPU_SSHOTTMR_future
flag.

Submitted by:	Stefano Stabellini <sstabellini@kernel.org>
Reviewed by:	royger
MFC after:	2 weeks
2016-05-02 16:13:11 +00:00
Pedro F. Giffuni
057b4402bf sys/dev: extend use of the howmany() macro when available.
We have a howmany() macro in the <sys/param.h> header that is
convenient to re-use as it makes things easier to read.
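
For readers unfamiliar with it, howmany() is a ceiling division. A short sketch, with the macro defined locally in case the header does not provide it; `pages_for` is a hypothetical helper used only to show the call:

```c
#include <stddef.h>

/* Ceiling division; same shape as the macro in <sys/param.h>. */
#ifndef howmany
#define howmany(x, y)	(((x) + ((y) - 1)) / (y))
#endif

/* Hypothetical example: number of pages needed to hold `len` bytes. */
size_t
pages_for(size_t len, size_t page_size)
{
	return (howmany(len, page_size));
}
```

This replaces open-coded expressions such as `(len + page_size - 1) / page_size` with a name that states the intent.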
2016-04-26 15:03:15 +00:00
John Baldwin
652523175b Associate device_t objects with ACPI handles via PCI_CHILD_ADDED().
Previously, the ACPI PCI bus driver did a single pass over the devices in
the namespace that were a child of a given PCI bus to associate the
PCI bus-enumerated device_t devices with the corresponding ACPI handles.
However, this meant that handles were only established at runtime for devices
found during the initial PCI bus scan.

PCI_IOV adds devices that show up after the initial PCI bus scan, and coming
changes to add a bus rescan can also add devices after the initial scan.

This change adds a pci_child_added() callback to the ACPI PCI bus that walks
the namespace to find the ACPI handle for each device that is added.  Using
a callback means that the handle is correctly set for any device no matter
how it is added (initial scan, IOV, or a bus rescan).
2016-04-07 17:15:16 +00:00
John Baldwin
b406166f66 Remove a redundant check.
cpu_suspend_map is always empty if smp_started is false.

Sponsored by:	Netflix
2016-04-05 00:10:07 +00:00
Alexander Motin
d5d7399d2d Pass through some new block device features.
MFC after:	1 month
2016-04-03 11:18:20 +00:00
Sepherosa Ziehau
6dd38b8716 tcp/lro: Use tcp_lro_flush_all in device drivers to avoid code duplication
And factor out tcp_lro_rx_done, which deduplicates the same logic with
netinet/tcp_lro.c

Reviewed by:	gallatin (1st version), hps, zbb, np, Dexuan Cui <decui microsoft com>
Sponsored by:	Microsoft OSTC
Differential Revision:	https://reviews.freebsd.org/D5725
2016-04-01 06:28:33 +00:00
John Baldwin
cbc4d2db75 Remove taskqueue_enqueue_fast().
taskqueue_enqueue() was changed to support both fast and non-fast
taskqueues 10 years ago in r154167.  It has been a compat shim ever
since.  It's time for the compat shim to go.

Submitted by:	Howard Su <howard0su@gmail.com>
Reviewed by:	sephe
Differential Revision:	https://reviews.freebsd.org/D5131
2016-03-01 17:47:32 +00:00
Gleb Smirnoff
56a5f52e80 New way to manage reference counting of mbuf external storage.
The m_ext.ext_cnt pointer becomes a union. It can now hold the refcount
value itself. To indicate this, the m_ext.ext_flags flag EXT_FLAG_EMBREF is used.
The first mbuf to attach a cluster stores the refcount. The further mbufs
to reference the cluster point at refcount in the first mbuf. The first
mbuf is freed only when the last reference is freed.
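
The idea of a union that holds either the count or a pointer to it can be sketched in plain C. Every name below (`sk_buf`, `SK_EMBREF`, the `sk_*` functions) is hypothetical, the counter is not atomic, and deferred freeing of the first buffer is left out; this is an illustration of the layout, not the mbuf code:

```c
#include <stddef.h>

#define SK_EMBREF	0x1	/* hypothetical flag: count lives in this buffer */

struct sk_buf {
	int flags;
	union {
		unsigned int count;	/* valid when SK_EMBREF is set */
		unsigned int *countp;	/* otherwise points at the shared count */
	} ref;
};

/* Locate the counter regardless of which union member is in use. */
unsigned int *
sk_counter(struct sk_buf *b)
{
	return ((b->flags & SK_EMBREF) ? &b->ref.count : b->ref.countp);
}

/* The first buffer to attach the storage holds the count itself. */
void
sk_attach_first(struct sk_buf *b)
{
	b->flags = SK_EMBREF;
	b->ref.count = 1;
}

/* Later buffers share the storage by pointing at the existing count. */
void
sk_attach_shared(struct sk_buf *b, struct sk_buf *other)
{
	b->flags = 0;
	b->ref.countp = sk_counter(other);
	(*b->ref.countp)++;
}

/* Drop one reference; returns the number of references left. */
unsigned int
sk_release(struct sk_buf *b)
{
	return (--(*sk_counter(b)));
}
```

Because the count sits inside the first buffer, counters of unrelated buffers no longer end up packed together in a separate allocation.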

The benefit over refcounts stored in separate slabs is that now refcounts
of different, unrelated mbufs do not share a cache line.

For EXT_EXTREF mbufs the zone_ext_refcnt is no longer needed, and m_extadd()
becomes void, making widely used M_EXTADD macro safe.

For EXT_SFBUF mbufs the sf_ext_ref() is removed, which was an optimization
exactly against the cache aliasing problem with regular refcounting.

Discussed with:		rrs, rwatson, gnn, hiren, sbruno, np
Reviewed by:		rrs
Differential Revision:	https://reviews.freebsd.org/D5396
Sponsored by:		Netflix
2016-03-01 00:17:14 +00:00
Colin Percival
bcccdfa37b Don't dereference a pointer immediately after determining that it is
equal to NULL. [1]

While I'm here, s/xb/xbd/ (the name changed a long time ago but this
instance wasn't corrected).

Reported by:	PVS-Studio [1]
2016-02-14 13:42:16 +00:00
Roger Pau Monné
8f28a42ee7 xen-netfront: remove useless NULL check in netif_free
xn_ifp is allocated in create_netdev with if_alloc(IFT_ETHER).
According to the current arrangement it can't be NULL.

Coverity ID:		1349805
Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Sponsored by:		Citrix Systems R&D
Differential revision:	https://reviews.freebsd.org/D5252
2016-02-11 11:57:12 +00:00
Roger Pau Monné
d4dae2b1fb xen-netfront: rearrange error paths in setup_txqs
Coverity spotted double free errors in error path. Fix that by
removing the extraneous calls.

Coverity ID:		1349798
Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Sponsored by:		Citrix Systems R&D
Differential revision:	https://reviews.freebsd.org/D5251
2016-02-11 11:53:32 +00:00
Roger Pau Monné
7803499440 xen-netfront: remove pointless assignment in xn_ioctl
The variable error is assigned to 0 before entering the switch.
Assigning error to 0 before break pointlessly overwrites the real error
value that should be returned.

Coverity ID:		1304974
Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Sponsored by:		Citrix Systems R&D
Differential revision:	https://reviews.freebsd.org/D5250
2016-02-11 11:50:31 +00:00
Roger Pau Monné
96375eac94 xen-netfront: add multiqueue support
Add support for multiple TX and RX queue pairs. The default number of queues
is set to 4, but can be easily changed from the sysctl node hw.xn.num_queues.

Also heavily refactor netfront driver: break out a bunch of helper
functions and different structures. Use threads to handle TX and RX.
Remove some dead code and fix quite a few bugs as I go along.

Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Sponsored by:		Citrix Systems R&D
Relnotes:		Yes
Differential Revision:	https://reviews.freebsd.org/D4193
2016-01-20 15:02:43 +00:00
Colin Percival
cbb261aec7 Add two more assertions to catch busdma problems. Each segment provided
by busdma to the blkfront driver must be an integer number of sectors,
and must be aligned in memory on a "sector" boundary.
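
A check of this kind can be sketched as follows. The 512-byte sector size and the names `SK_SECTOR_SIZE`, `segment_is_sector_sized` and `check_segment` are assumptions of this sketch, and it uses the userland `assert` rather than the kernel's assertion macro:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SK_SECTOR_SIZE	512u	/* assumed sector size for this sketch */

/* Returns 1 when a segment is sector-aligned and a whole number of sectors long. */
int
segment_is_sector_sized(uintptr_t addr, size_t len)
{
	return ((addr % SK_SECTOR_SIZE) == 0 && (len % SK_SECTOR_SIZE) == 0);
}

/* Debug-style check in the spirit of the assertions described above. */
void
check_segment(uintptr_t addr, size_t len)
{
	assert(segment_is_sector_sized(addr, len));
}
```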

Having these assertions yesterday would have made finding the bug fixed
in r293698 somewhat easier.
2016-01-11 21:02:30 +00:00
Roger Pau Monné
1522652230 xen: fix dropping bitmap IPIs during resume
Current Xen resume code clears all pending bitmap IPIs on resume, which is
not correct. Instead re-inject bitmap IPI vectors on resume to all CPUs in
order to acknowledge any pending bitmap IPIs.

Sponsored by:		Citrix Systems R&D
MFC after:		2 weeks
2015-11-18 18:11:19 +00:00
Roger Pau Monné
a55a04a892 xen-blkfront: add support for unmapped IO
Using unmapped IO is really beneficial when running inside of a VM,
since it avoids IPIs to other vCPUs in order to invalidate the
mappings.

This patch adds unmapped IO support to blkfront. The following test
results have been obtained when running on a Xen host without HAP:

PVHVM
     3165.84 real      6354.17 user      4483.32 sys
PVHVM with unmapped IO
     2099.46 real      4624.52 user      2967.38 sys

This is because when running using shadow page tables TLB flushes and
range invalidations are much more expensive, so using unmapped IO
provides a very important performance boost.

Sponsored by:	Citrix Systems R&D
MFC after:	2 weeks
X-MFC-with:	r290610

dev/xen/blkfront/blkfront.c:
 - Add and announce support for unmapped IO.
2015-11-09 12:22:44 +00:00
Roger Pau Monné
d5b4f139f5 xen-netfront: remove unused header files
Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Sponsored by:		Citrix Systems R&D
Differential Revision:	https://reviews.freebsd.org/D4079
2015-11-05 14:37:17 +00:00
Simon J. Gerraty
ce8df48b73 Do not FALLTHROUGH for SIOC{ADD,DEL}MULTI
ifmedia_ioctl() returns EINVAL

Differential Revision:	3897
Submitted by:	aronen@juniper.net
Reviewed by:	marcel
2015-10-30 17:12:15 +00:00
Roger Pau Monné
f4576dd975 x86/dma_bounce: revert r289834 and r289836
The new load_ma implementation can cause dereferences when used with
certain drivers, back it out until the reason is found:

Fatal trap 12: page fault while in kernel mode
cpuid = 11; apic id = 03
fault virtual address   = 0x30
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff808a2d22
stack pointer           = 0x28:0xfffffe07cc737710
frame pointer           = 0x28:0xfffffe07cc737790
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 13 (g_down)
trap number             = 12
panic: page fault
cpuid = 11
KDB: stack backtrace:
#0 0xffffffff80641647 at kdb_backtrace+0x67
#1 0xffffffff80606762 at vpanic+0x182
#2 0xffffffff806067e3 at panic+0x43
#3 0xffffffff8084eef1 at trap_fatal+0x351
#4 0xffffffff8084f0e4 at trap_pfault+0x1e4
#5 0xffffffff8084e82f at trap+0x4bf
#6 0xffffffff80830d57 at calltrap+0x8
#7 0xffffffff8063beab at _bus_dmamap_load_ccb+0x1fb
#8 0xffffffff8063bc51 at bus_dmamap_load_ccb+0x91
#9 0xffffffff8042dcad at ata_dmaload+0x11d
#10 0xffffffff8042df7e at ata_begin_transaction+0x7e
#11 0xffffffff8042c18e at ataaction+0x9ce
#12 0xffffffff802a220f at xpt_run_devq+0x5bf
#13 0xffffffff802a17ad at xpt_action_default+0x94d
#14 0xffffffff802c0024 at adastart+0x8b4
#15 0xffffffff802a2e93 at xpt_run_allocq+0x193
#16 0xffffffff802c0735 at adastrategy+0xf5
#17 0xffffffff80554206 at g_disk_start+0x426
Uptime: 2m29s
2015-10-26 14:50:35 +00:00
Roger Pau Monné
ee74891fc7 blkfront: add support for unmapped IO
Using unmapped IO is really beneficial when running inside of a VM,
since it avoids IPIs to other vCPUs in order to invalidate the
mappings.

This patch adds unmapped IO support to blkfront. The following test
results have been obtained when running on a Xen host without HAP:

PVHVM
     3165.84 real      6354.17 user      4483.32 sys
PVHVM with unmapped IO
     2099.46 real      4624.52 user      2967.38 sys

This is because when running using shadow page tables TLB flushes and
range invalidations are much more expensive, so using unmapped IO
provides a very important performance boost.

Sponsored by:	Citrix Systems R&D
MFC after:	2 weeks
X-MFC-with:	r289834
2015-10-23 15:46:42 +00:00
Roger Pau Monné
3778878d7c netfront: fix LINT-NOIP
r289587 broke LINT-NOIP kernels because the lro and queued local variables
are defined but not used. Add preprocessor guards around them.

Reported by:	emaste
Sponsored by:	Citrix Systems R&D
2015-10-21 13:53:07 +00:00
Roger Pau Monné
2f9ec994bc xen: Code cleanup and small bug fixes
xen/hypervisor.h:
 - Remove unused helpers: MULTI_update_va_mapping, is_initial_xendomain,
   is_running_on_xen
 - Remove unused define CONFIG_X86_PAE
 - Remove unused variable xen_start_info: note that it's used in pcifront
   which is not built at all
 - Remove forward declaration of HYPERVISOR_crash

xen/xen-os.h:
 - Remove unused define CONFIG_X86_PAE
 - Drop unused helpers: test_and_clear_bit, clear_bit,
   force_evtchn_callback
 - Implement a generic version (based on ofed/include/linux/bitops.h) of
   set_bit and test_bit and prefix them by xen_ to avoid any use by other
   code than Xen. Note that it would be worth investigating a generic
   implementation in FreeBSD.
 - Replace barrier() by __compiler_membar()
 - Replace cpu_relax() by cpu_spinwait(): it's exactly the same as rep;nop
   = pause

xen/xen_intr.h:
 - Move the prototype of xen_intr_handle_upcall into it: used by all
   platforms

x86/xen/xen_intr.c:
 - Use BITSET* for the enabledbits: avoids using custom helpers
 - test_bit/set_bit has been renamed to xen_test_bit/xen_set_bit
 - Don't export the variable xen_intr_pcpu

dev/xen/blkback/blkback.c:
 - Fix the string format when XBB_DEBUG is enabled: host_addr is typed
   uint64_t

dev/xen/balloon/balloon.c:
 - Remove set but not used variable
 - Use the correct type for frame_list: xen_pfn_t represents the frame
   number on any architecture

dev/xen/control/control.c:
 - Return BUS_PROBE_WILDCARD in xs_probe: Returning 0 in a probe callback
   means the driver can handle this device. If by any chance xenstore is the
   first driver, every new device whose driver is unset will use
   xenstore.

dev/xen/grant-table/grant_table.c:
 - Remove unused cmpxchg
 - Drop unused include opt_pmap.h: Doesn't exist on ARM64 and it doesn't
   contain anything required for the code on x86

dev/xen/netfront/netfront.c:
 - Use the correct type for rx_pfn_array: xen_pfn_t represents the frame
   number on any architecture

dev/xen/netback/netback.c:
 - Use the correct type for gmfn: xen_pfn_t represents the frame number on
   any architecture

dev/xen/xenstore/xenstore.c:
 - Return BUS_PROBE_WILDCARD in xctrl_probe: Returning 0 in a probe callback
   means the driver can handle this device. If by any chance xenstore is the
   first driver, every new device whose driver is unset will use xenstore.

Note that with these changes, x86/include/xen/xen-os.h no longer contains
arch-specific code. However, a new series will add some helpers that differ
between x86 and ARM64, so I've kept the headers for now.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3921
Sponsored by:		Citrix Systems R&D
2015-10-21 10:44:07 +00:00
Roger Pau Monné
4955cbf300 xen-netfront: use "netfront" in lock description
Missed from r289585.

Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3937
Sponsored by:		Citrix Systems R&D
2015-10-19 15:34:24 +00:00
Roger Pau Monné
1a2928b740 xen-netfront: fix netfront create_dev error path
The failure path for allocating rx grant refs should not try to free tx
grant refs because tx grant refs were allocated after that. Also fix the
error path for xen_net_read_mac.

Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3891
Sponsored by:		Citrix Systems R&D
2015-10-19 14:47:37 +00:00
Roger Pau Monné
b31a0d731b xen-netfront: no need to set if_output
This is redundant because ether_ifattach will set that field.

Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3918
Sponsored by:		Citrix Systems R&D
2015-10-19 14:37:17 +00:00
Roger Pau Monné
08c9c2e0a1 xen-netfront: remove a bunch of FreeBSD version checks
We're way beyond FreeBSD 7 at this point.

Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3892
Sponsored by:		Citrix Systems R&D
2015-10-19 14:34:45 +00:00
Roger Pau Monné
177e3f1366 xen-netfront: remove XN_LOCK_{INIT,DESTROY}
Multiqueue feature will make the number of queues dynamic, so XN_LOCK_INIT
won't be that useful. Remove the macro and call mtx_init directly.

XN_LOCK_DESTROY is just dead code.

Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3890
Sponsored by:		Citrix Systems R&D
2015-10-19 14:26:40 +00:00
Roger Pau Monné
9a7f9feaf5 xen-netfront: clean up netfront stats structure
Rename it with netfront_ prefix and purge a bunch of unused fields.

Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3889
Sponsored by:		Citrix Systems R&D
2015-10-19 14:22:57 +00:00
Roger Pau Monné
d0f3a8b902 xen-netfront: purge page flipping support
Currently neither Linux nor FreeBSD netback supports page flipping. NetBSD
still supports that. It is unclear how many people actually use page
flipping, but page flipping is supposed to be slower than copying nowadays.
It will also shatter frontend / backend address space.

Overall this feature is more of a burden than a benefit.

Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3888
Sponsored by:		Citrix Systems R&D
2015-10-19 14:20:06 +00:00
Roger Pau Monné
17374b6c3b xen-netfront: delete all trailing white spaces
Submitted by:		Wei Liu <wei.liu2@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3886
Sponsored by:		Citrix Systems R&D
2015-10-19 14:12:15 +00:00
Roger Pau Monné
a231723cc0 xen/console: Introduce a new console driver for Xen guest
The current Xen console driver is crashing very quickly when using it on
an ARM guest. This is because the console lock is recursive and it may
lead to recursion on the tty lock and/or corrupt the ring pointer.

Furthermore, the console lock is not always taken where it should be and has
to be released too early because of the way the console has been designed.

Over the years, code has been modified to support various new features but
the driver has not been reworked.

This new driver has been rewritten with the idea of only having a small set
of specific functions to write either via the shared ring or the hypercall
interface.

Note that HVM support has been left aside for now because it requires
additional features which are not yet supported. A follow-up patch will be
sent with HVM guest support.

List of items that may be good to have but not mandatory:
 - Avoid flushing for each character written when using the tty
 - Support multiple consoles

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3698
Sponsored by:		Citrix Systems R&D
2015-10-08 16:39:43 +00:00
Roger Pau Monné
1a52c10530 Update Xen headers from 4.2 to 4.6
Pull the latest headers for Xen which allow us to add support for ARM and
use new features in FreeBSD.

This is a verbatim copy of xen/include/public, so every header which
doesn't exist anymore in the Xen repositories has been dropped.

Note the interface version hasn't been bumped; that will be done in a
follow-up. The update does, however, require fixes in the code to compile:

 - sys/xen/xen_intr.h: evtchn_port_t is already defined in the headers so
   drop it.

 - {amd64,i386}/include/intr_machdep.h: NR_EVENT_CHANNELS now depends on
   xen/interface/event_channel.h, so include it.

 - {amd64,i386}/{amd64,i386}/support.S: It's not necessary to include
   machine/intr_machdep.h. This also fixes compilation with the
   new headers.

 - dev/xen/blkfront/blkfront.c: The typedef for blkif_request_segment has
   been dropped, so directly use struct blkif_request_segment.

Finally, modify xen/interface/xen-compat.h to throw a preprocessing error if
__XEN_INTERFACE_VERSION__ is not set. This allows us to catch any file
where xen/xen-os.h is not correctly included.

Submitted by:		Julien Grall <julien.grall@citrix.com>
Reviewed by:		royger
Differential Revision:	https://reviews.freebsd.org/D3805
Sponsored by:		Citrix Systems R&D
2015-10-06 11:29:44 +00:00
Zbigniew Bodek
18c72666ce Add domain support to PCI bus allocation
When the system has more than a single PCI domain, the bus numbers
are not unique, thus they cannot be used for "pci" device numbering.
Change bus numbers to -1 (i.e. to-be-determined automatically)
wherever the code did not care about domains.

Reviewed by:   jhb
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3406
2015-09-16 23:34:51 +00:00
Marcelo Araujo
f0c2f5e202 Code cleanup unused-but-set-variable spotted by gcc.
Reviewed by:	royger
Approved by:	bapt (mentor)
Differential Revision:	D3476
2015-08-25 15:34:28 +00:00
Roger Pau Monné
f8f1bb83f7 xen: allow disabling PV disks and nics
Introduce two new loader tunables that can be used to disable PV disks and
PV nics at boot time. They default to 0 and should be set to 1 (or any
number different than 0) in order to disable the PV devices:

hw.xen.disable_pv_disks=1
hw.xen.disable_pv_nics=1

In /boot/loader.conf will disable both PV disks and nics.

Sponsored by:	Citrix Systems R&D
Tested by:	Karl Pielorz <kpielorz_lst@tdx.co.uk>
MFC after:	1 week
2015-08-21 15:53:08 +00:00
John Baldwin
3ebe4c01f7 Remove another remnant of PV domU support and assume that we always run
with an automatically translated physmap under XEN.

Reviewed by:	royger (earlier version)
Differential Revision:	https://reviews.freebsd.org/D3325
2015-08-14 18:38:39 +00:00
John Baldwin
3c790178c5 Remove some more vestiges of the Xen PV domu support. Specifically,
use vtophys() directly instead of vtomach() and retire the no-longer-used
headers <machine/xenfunc.h> and <machine/xenvar.h>.

Reported by:	bde (stale bits in <machine/xenfunc.h>)
Reviewed by:	royger (earlier version)
Differential Revision:	https://reviews.freebsd.org/D3266
2015-08-06 17:07:21 +00:00
Colin Percival
aaebf69062 Add support for Xen blkif indirect segment I/Os. This makes it possible for
the blkfront driver to perform I/Os of up to 2 MB, subject to support from
the blkback to which it is connected and the initiation of such large I/Os
by the rest of the kernel.  In practice, the I/O size is increased from 40 kB
to 128 kB.

The changes to xen/interface/io/blkif.h consist merely of merging updates
from the upstream Xen repository.

In dev/xen/blkfront/block.h we add some convenience macros and structure
fields used for indirect-page I/Os: The device records its negotiated limit
on the number of indirect pages used, while each I/O command structure gains
permanently allocated page(s) for indirect page references and the Xen grant
references for those pages.

In dev/xen/blkfront/blkfront.c we now check in xbd_queue_cb whether a request
is small enough to handle without an indirection page, and either follow the
previous behaviour or use new code for issuing an indirect segment I/O.  In
xbd_connect we read the size of indirect segment I/Os supported by the backend
and select the maximum size we will use; then allocate the pages and Xen grant
references for each I/O command structure.  In xbd_free those grants and pages
are released.
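
The direct-versus-indirect decision and the page accounting described above
can be sketched in a few lines of C. This is only an illustration: the names
prefixed with sketch_ are hypothetical, and the 4 kB page size, the limit of
11 direct segments, and the 8-byte segment descriptor are assumptions rather
than values quoted from blkif.h.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE	4096u	/* assumption: 4 kB pages */
#define SKETCH_MAX_DIRECT_SEGS	11u	/* assumption: direct segments per request */

/* Assumed layout of one segment descriptor (8 bytes on common ABIs). */
struct sketch_segment {
	uint32_t gref;		/* grant reference for the data page */
	uint8_t  first_sect;	/* first sector used within the page */
	uint8_t  last_sect;	/* last sector used within the page */
};

/* Number of segment descriptors that fit in one indirect page. */
static unsigned int
sketch_segs_per_indirect_page(void)
{
	return (SKETCH_PAGE_SIZE / sizeof(struct sketch_segment));
}

/* Indirect pages needed for nsegs segments; 0 means a direct request. */
static unsigned int
sketch_indirect_pages(unsigned int nsegs)
{
	unsigned int per_page = sketch_segs_per_indirect_page();

	assert(nsegs > 0);
	if (nsegs <= SKETCH_MAX_DIRECT_SEGS)
		return (0);
	/* Round up to whole indirect pages. */
	return ((nsegs + per_page - 1) / per_page);
}
```

Under these assumptions one indirect page describes 512 segments, that is
512 x 4 kB = 2 MB, which is consistent with the upper bound mentioned at the
top of this message.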

A new loader tunable, hw.xbd.xbd_enable_indirect, can be set to 0 in order to
disable this functionality; it works by pretending that the backend does not
support this feature.  Some backends exhibit a loss of performance with large
I/Os, so users may wish to test with and without this functionality enabled.
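
For instance, a line such as the following in /boot/loader.conf should
disable indirect segment I/Os from the next boot onwards (the tunable name is
the one given in this message; the file location is the usual place for
loader tunables):

```
hw.xbd.xbd_enable_indirect=0
```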

Reviewed by:	royger
MFC after:	3 days
Relnotes:	yes
2015-07-30 03:50:01 +00:00
Mateusz Guzik
8a08cec166 Create a dedicated function for ensuring that cdir and rdir are populated.
Previously several places were doing it on their own, partially
incorrectly (e.g. without the filedesc locked) or even actively harmful
by populating jdir or assigning rootvnode without vrefing it.

Reviewed by:	kib
2015-07-11 16:22:48 +00:00
Roger Pau Monné
6a8e9695ba netfront: preserve configuration across migrations
Try to preserve the xn configuration when migrating. This is not always
possible since the backend might not have the same set of options
available, in which case we will try to preserve as many as possible.

MFC after:    2 weeks
PR:           183139
Reported by:  mcdouga9@egr.msu.edu
Sponsored by: Citrix Systems R&D
2015-07-03 12:09:05 +00:00
Colin Percival
7a70e15235 Rename mksegarray to xbd_mksegarray for consistency with other function
names in this file.

Submitted by:	royger
2015-06-23 06:50:03 +00:00
Colin Percival
ad935ed241 Garbage collect comments and a macro which related to the pre-r284296
support for a "segment block" extension in FreeBSD's Xen blkfront/blkback
drivers.

This commit should not result in any functional changes.
2015-06-21 06:52:03 +00:00
Colin Percival
91fb36cfa8 Move the bus_dma_tag creation and per-transaction data allocation from
xbd_initialize to xbd_connect.  Both of these initialization steps need
to know what the maximum possible I/O size will be, and when we gain
support for indirect segment I/Os we won't know that value until we
reach xbd_connect.  Since none of this data is used before xbd_connect
completes, moving the initialization is harmless.

This commit should not result in any functional changes.
2015-06-21 05:36:58 +00:00
Colin Percival
0115209538 If we fail to allocate memory, pass ENOMEM as the error code, not the
"error" variable (which is always zero at this point).
2015-06-21 05:32:56 +00:00
Colin Percival
d0ecc14d49 Refactor xbd_queue_cb, extracting the code which converts bus_dma segments
into blkif segments, and moving it into a new function.  This will be used
by upcoming support for indirect-segment blkif requests.

This commit should not result in any functional changes.
2015-06-20 00:02:03 +00:00
Colin Percival
d33a1217bd Minor clean up to xbd_queue_cb:
* nsegs must be at most BLKIF_MAX_SEGMENTS_PER_REQUEST (since we specify
  that limit to bus_dma_tag_create), so KASSERT that rather than silently
  adjusting the request.
* block_segs is now a synonym for nsegs, so garbage collect that variable.
* nsegs is never read during or after the while loop, so remove the dead
  decrement from the loop.

These were all left behind from the pre-r284296 support for a "segment
block" extension.
2015-06-19 22:40:58 +00:00
Roger Pau Monné
112cacaee4 xen-blk{front/back}: remove broken FreeBSD extensions
The FreeBSD extension adds a new request type, called blkif_segment_block
which has a size of 112 bytes for both i386 and amd64. This is fine on
amd64, since requests have a size of 112B there also. But this is not true
for i386, where requests have a size of 108B. So on i386 we basically
overrun the ring slot when queuing a request of type blkif_segment_block_t,
which is very bad.
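
The size mismatch can be illustrated with a small check. This is a sketch
only: the sizes are the ones quoted in this message, while the helper names
are hypothetical and not part of blkfront or blkback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sizes as quoted in the commit message. */
#define SKETCH_REQ_SIZE_AMD64	112u	/* ring request size on amd64 */
#define SKETCH_REQ_SIZE_I386	108u	/* ring request size on i386 */
#define SKETCH_SEGBLOCK_SIZE	112u	/* blkif_segment_block size */

/* Hypothetical helper: does a payload fit in one ring slot? */
static bool
sketch_fits_in_slot(size_t payload, size_t slot)
{
	assert(slot > 0);
	return (payload <= slot);
}

/* Hypothetical helper: bytes written past the end of the slot. */
static size_t
sketch_overrun(size_t payload, size_t slot)
{
	return (sketch_fits_in_slot(payload, slot) ? 0 : payload - slot);
}
```

With these numbers the segment block fits exactly on amd64 but overruns the
slot by 4 bytes on i386.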

Remove this extension (including a cleanup of the public blkif.h header
file) from blkfront and blkback.

Sponsored by: Citrix Systems R&D
Tested-by: cperciva
2015-06-12 07:50:34 +00:00
Jung-uk Kim
fd90e2ed54 CALLOUT_MPSAFE has lost its meaning since r141428, i.e., for more than ten
years for head.  However, it is continuously misused as the mpsafe argument
for callout_init(9).  Deprecate the flag and clean up callout_init() calls
to make them more consistent.

Differential Revision:	https://reviews.freebsd.org/D2613
Reviewed by:	jhb
MFC after:	2 weeks
2015-05-22 17:05:21 +00:00
Roger Pau Monné
dbf82bde19 netfront: wait for backend to connect before sending ARP
Netfront has to wait for the backend to switch to state XenbusStateConnected
before sending the ARP request, or else the backend might not be connected
and thus the packet will be lost.

Sponsored by: Citrix Systems R&D
MFC after: 1 week
2015-05-14 16:29:11 +00:00
Roger Pau Monné
0df8b29da3 xen: introduce a newbus function to allocate unused memory
In order to map memory from other domains when running on Xen FreeBSD uses
unused physical memory regions. Until now this memory has been allocated
using bus_alloc_resource, but this is not completely safe as we can end up
using unreclaimed MMIO or ACPI regions.

Fix this by introducing a new newbus method that can be used by Xen drivers
to request unused memory regions. On amd64 we make sure this memory
comes from regions above 4GB in order to prevent clashes with MMIO/ACPI
regions. On i386 there's nothing we can do, so just fall back to the
previous mechanism.

Sponsored by: Citrix Systems R&D
Tested by: Gustau Pérez <gperez@entel.upc.edu>
2015-05-08 14:48:40 +00:00
John Baldwin
ed95805e90 Remove support for Xen PV domU kernels. Support for HVM domU kernels
remains.  Xen is planning to phase out support for PV upstream since it
is harder to maintain and has more overhead.  Modern x86 CPUs include
virtualization extensions that support HVM guests instead of PV guests.
In addition, the PV code was i386 only and not as well maintained recently
as the HVM code.
- Remove the i386-only NATIVE option that was used to disable certain
  components for PV kernels.  These components are now standard as they
  are on amd64.
- Remove !XENHVM bits from PV drivers.
- Remove various shims required for XEN (e.g. PT_UPDATES_FLUSH, LOAD_CR3,
  etc.)
- Remove duplicate copy of <xen/features.h>.
- Remove unused, i386-only xenstored.h.

Differential Revision:	https://reviews.freebsd.org/D2362
Reviewed by:	royger
Tested by:	royger (i386/amd64 HVM domU and amd64 PVH dom0)
Relnotes:	yes
2015-04-30 15:48:48 +00:00
Marcelo Araujo
d8edb414c9 Remove unused variable.
Differential Revision:	D2333
Reviewed by:		royger
2015-04-20 17:30:13 +00:00
Roger Pau Monné
df62b8a25f xen: add a handler for the debug interrupt
Handle the VIRQ_DEBUG signal and print a stack trace of each vCPU on the Xen
console. This is only used for debug purposes and is triggered by the
administrator of the Xen host.

Sponsored by: Citrix Systems R&D
MFC after: 1 week
2015-03-30 07:09:07 +00:00
Roger Pau Monné
691a22f94e netback: disable GSO
The current GSO implementation in netback is broken and causes errors on the
guest tx path. Until this is fixed, disable GSO in order to have a working
netback.

Sponsored by: Citrix Systems R&D
Discussed with: gibbs
2015-02-28 15:21:06 +00:00
Gleb Smirnoff
c2d9c6f035 Use m_getjcl() instead of old mbuf(9) KPIs.
Tested by:	royger
2015-02-27 19:12:35 +00:00
Gleb Smirnoff
49e6be9c3d Previous versions of mbufq were fine when initialized by M_ZERO, while
the new one requires explicit initialization.

Reported by:	royger
2015-02-23 18:55:26 +00:00
Gleb Smirnoff
c578b6aca0 Provide a set of inline functions to manage simple mbuf(9) queues, based
on queue(3)'s STAILQ.  Utilize them in cxgb(4) and Xen, deleting home
grown implementations.

Sponsored by:	Netflix
Sponsored by:	Nginx, Inc.
2015-02-19 01:19:42 +00:00
Roger Pau Monné
f79cdf2998 xen: fix xenstore dev
Xenstore user-space device has two problems currently:
 - It does not correctly handle concurrent clients, because it's storing
   each client data in dev->si_drv1.
 - It does not correctly free this data when the client closes the device.

In order to solve both of these issues store the per-client data using
cdevpriv, which also comes with a hook in order to perform the necessary
cleanup on device close.

While there also make the device eternal.

Sponsored by: Citrix Systems R&D
Reported and Tested by: thompsa
MFC after: 2 weeks
2015-02-16 09:53:43 +00:00
Bryan Venteicher
d3ccddf3ce Generalized parts of the XEN timer code into a generic pvclock
KVM clock shares the same data structures between the guest and the host
as Xen so it makes sense to just have a single copy of this code.

Differential Revision: https://reviews.freebsd.org/D1429
Reviewed by:	royger (earlier version)
MFC after:	1 month
2015-02-04 08:26:43 +00:00
Xin LI
25baf019f1 Use the common codepath to handle SIOCGIFADDR.
Before this change, the code handled SIOCGIFADDR the same
way as SIOCSIFADDR, which involves a full arp_ifinit, et al.  That
should be unnecessary for the SIOCGIFADDR case.

Differential Revision: https://reviews.freebsd.org/D1508
Reviewed by:	glebius
MFC after:	2 weeks
2015-01-13 05:32:51 +00:00
Robert Watson
2a8c860fe3 In order to reduce use of M_EXT outside of the mbuf allocator and
socket-buffer implementations, introduce a return value for MCLGET()
(and m_cljget() that underlies it) to allow the caller to avoid testing
M_EXT itself.  Update all callers to use the return value.

With this change, very few network device drivers remain aware of
M_EXT; the primary exceptions lie in mbuf-chain pretty printers for
debugging, and in a few cases, custom mbuf and cluster allocation
implementations.

NB: This is a difficult-to-test change as it touches many drivers for
which I don't have physical devices.  Instead we've gone for intensive
review, but further post-commit review would definitely be appreciated
to spot errors where changes could not easily be made mechanically,
but were largely mechanical in nature.

Differential Revision:	https://reviews.freebsd.org/D1440
Reviewed by:	adrian, bz, gnn
Sponsored by:	EMC / Isilon Storage Division
2015-01-06 12:59:37 +00:00
Hans Petter Selasky
f515135ff9 Remove duplicate pci_driver class declaration. 2015-01-02 08:57:36 +00:00
Roger Pau Monné
1093cd82e0 xen: convert the Grant-table code to a NewBus device
This allows the Grant-table code to attach directly to the xenpv bus,
allowing us to remove the grant-table initialization done in xenpv.

Sponsored by: Citrix Systems R&D
2014-12-10 11:35:41 +00:00
Roger Pau Monné
0767e98a2d xen: move grant table code
Move the grant table code into the dev/xen folder in preparation for turning
it into a device using the newbus interface. This is just code motion, no
functional changes.

Sponsored by: Citrix Systems R&D
2014-12-10 11:21:52 +00:00
Roger Pau Monné
f35b3592e6 xen: create a new PCI bus override
When running as a Xen PVH Dom0 we need to add custom buses that override
some of the functionality present in the ACPI PCI Bus and the PCI Bus. We
currently override the ACPI PCI Bus, but not the PCI Bus, so add a new
override for the PCI Bus and share the generic functions between them.

Reported by: David P. Discher <dpd@dpdtech.com>
Sponsored by: Citrix Systems R&D

conf/files.amd64:
 - Add the new files.

x86/xen/xen_pci_bus.c:
 - Generic file that contains the PCI overrides so they can be used by the
   several PCI specific buses.

xen/xen_pci.h:
 - Prototypes for the generic override functions.

dev/xen/pci/xen_pci.c:
 - Xen specific override for the PCI bus.

dev/xen/pci/xen_acpi_pci.c:
 - Xen specific override for the ACPI PCI bus.
2014-12-09 18:03:25 +00:00
Warner Losh
40e6bdaf1e opt_global.h is included automatically in the build. No need to
explicitly include it in these places.

Sponsored by: Netflix
2014-11-18 17:06:56 +00:00
Roger Pau Monné
ddcc16cf2f netback: change xnb naming convention
Current FreeBSD netback names the interface with xnb<device unit>, but
this is not suitable for usage with the Xen toolstack, which expects
something similar to <prefix><domid><handle>. In order to solve this,
change the netback naming convention to use xnb<domid>.<handle>.
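
The resulting name format can be sketched as below. The helper is
hypothetical and only shows the string layout; it is not the code used in
netback.c.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: build an interface name of the form
 * xnb<domid>.<handle>.  Returns the snprintf(3) result, i.e. the
 * length the full name requires.
 */
static int
sketch_xnb_name(char *buf, size_t len, unsigned int domid,
    unsigned int handle)
{
	assert(buf != NULL && len > 0);
	return (snprintf(buf, len, "xnb%u.%u", domid, handle));
}
```

For example, domid 3 and handle 0 would give the name xnb3.0.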

Sponsored by: Citrix Systems R&D

dev/xen/netback/netback.c:
 - Change netback to use the nomenclature stated above.
2014-10-22 17:09:12 +00:00
Roger Pau Monné
bf7313e3b7 xen: implement the privcmd user-space device
This device is only attached to privileged domains, and allows the
toolstack to interact with Xen. The two functions of the privcmd
interface are to allow the execution of hypercalls from user-space, and
the mapping of foreign domain memory.

Sponsored by: Citrix Systems R&D

i386/include/xen/hypercall.h:
amd64/include/xen/hypercall.h:
 - Introduce a function to make generic hypercalls into Xen.

xen/interface/xen.h:
xen/interface/memory.h:
 - Import the new hypercall XENMEM_add_to_physmap_range used by
   auto-translated guests to map memory from foreign domains.

dev/xen/privcmd/privcmd.c:
 - This device has the following functions:
   - Allow user-space applications to make hypercalls into Xen.
   - Allow user-space applications to map memory from foreign domains,
     this is accomplished using the newly introduced hypercall
     (XENMEM_add_to_physmap_range).

xen/privcmd.h:
 - Public ioctl interface for the privcmd device.

x86/xen/hvm.c:
 - Remove declaration of hypercall_page, now it's declared in
   hypercall.h.

conf/files:
 - Add the privcmd device to the build process.
2014-10-22 17:07:20 +00:00
Roger Pau Monné
5779d8ad57 xen: import a proper event channel user-space device
The user-space event channel device is used by applications to receive
and send event channel interrupts. This device is based on the Linux
evtchn device.

Sponsored by: Citrix Systems R&D

xen/evtchn/evtchn_dev.c:
 - Remove the old event channel device, which was already disabled in
   the build system.

dev/xen/evtchn/evtchn_dev.c:
 - Import a new event channel device based on the one present in
   Linux.
 - This device allows the following operations:
   - Bind VIRQ event channels (ioctl).
   - Bind regular event channels (ioctl).
   - Create and bind new event channels (ioctl).
   - Unbind event channels (ioctl).
   - Send notifications to event channels (ioctl).
   - Reset the device shared memory ring (ioctl).
   - Unmask event channels (write).
   - Receive event channel upcalls (read).
 - The new code is MP safe, and can be used concurrently.

conf/files:
 - Add the new device to the build system.
2014-10-22 16:57:11 +00:00
Hans Petter Selasky
f0188618f2 Fix multiple incorrect SYSCTL arguments in the kernel:
- Wrong integer type was specified.

- Wrong or missing "access" specifier. The "access" specifier
sometimes included the SYSCTL type, which it should not, except for
procedural SYSCTL nodes.

- Logical OR where binary OR was expected.

- Properly assert the "access" argument passed to all SYSCTL macros,
using the CTASSERT macro. This applies to both static- and dynamically
created SYSCTLs.

- Properly assert the data type for both static and dynamic
SYSCTLs. In the case of static SYSCTLs we only assert that the data
pointed to by the SYSCTL data pointer has the correct size, hence
there is no easy way to assert types in the C language outside a
C-function.

- Rewrote some code which doesn't pass a constant "access" specifier
when creating dynamic SYSCTL nodes, which is now a requirement.

- Updated "EXAMPLES" section in SYSCTL manual page.
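
As an illustration of the "logical OR where binary OR was expected" item,
the following sketch shows why the two operators differ when combining flag
bits. The flag values and function names are illustrative assumptions, not
the definitions from sys/sysctl.h.

```c
#include <assert.h>

/* Illustrative flag bits; assumed values, not taken from the headers. */
#define SKETCH_FLAG_RD		0x80000000u
#define SKETCH_FLAG_MPSAFE	0x00040000u

/* Buggy combination: logical OR collapses any non-zero input to 1. */
static unsigned int
sketch_combine_wrong(unsigned int a, unsigned int b)
{
	return (a || b);
}

/* Intended combination: bitwise OR keeps every flag bit. */
static unsigned int
sketch_combine_right(unsigned int a, unsigned int b)
{
	return (a | b);
}

/* Check whether all bits of flag are present in flags. */
static int
sketch_has_flag(unsigned int flags, unsigned int flag)
{
	assert(flag != 0);
	return ((flags & flag) == flag);
}
```

With the logical operator the result is simply 1, so neither flag bit
survives; the bitwise operator yields a value with both bits set.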

MFC after:	3 days
Sponsored by:	Mellanox Technologies
2014-10-21 07:31:21 +00:00
Roger Pau Monné
59adbba20f xen: fix blkback pushing responses before releasing internal resources
Fix a problem where the blockback driver could run out of requests,
despite the fact that we allocate enough request and reqlist
structures to satisfy the maximum possible number of requests.

The problem was that we were sending responses back to the other
end (blockfront) before freeing resources. The Citrix Windows
driver is pretty aggressive about queueing, and would queue more I/O
to us immediately after we sent responses to it. We would run into
a resource shortage and stall out I/O until we freed resources.

It isn't clear whether the request shortage condition was an
indirect cause of the I/O hangs we've been seeing between Windows
with the Citrix PV drivers and FreeBSD's blockback, but the above
problem is certainly a bug.

Sponsored by: Spectra Logic
Submitted by: ken
Reviewed by: royger

dev/xen/blkback/blkback.c:
 - Break xbb_send_response() into two sub-functions,
   xbb_queue_response() and xbb_push_responses().
   Remove xbb_send_response(), because it is no longer
   used.

 - Adjust xbb_complete_reqlist() so that it calls the
   two new functions, and holds the mutex around both
   calls.  The mutex ensures that another context
   can't come along and push responses before we've
   freed our resources.

 - Change xbb_release_reqlist() so that it requires
   the mutex to be held instead of acquiring the mutex
   itself.  Both callers could easily hold the mutex
   while calling it, and one really needs to hold the
   mutex during the call.

 - Add two new counters, accessible via sysctl
   variables.  The first one counts the number of
   I/Os that are queued and waiting to be pushed
   (reqs_queued_for_completion).  The second one
   (reqs_completed_with_error) counts the number of
   requests we've completed with an error status.
2014-09-30 17:41:16 +00:00
Roger Pau Monné
22c1633270 xen/balloon: fix accounting of current memory pages on PVH
Using realmem on PVH is not reliable, since in this case the realmem value
is computed from Maxmem, which contains the highest memory address found. Use
HYPERVISOR_start_info->nr_pages instead, which is set by the hypervisor and
contains the exact number of memory pages assigned to the domain.

Sponsored by: Citrix Systems R&D
2014-09-30 17:38:21 +00:00
Roger Pau Monné
557077b5fc xen: add xenstored user-space device
This device is used by the user-space daemon that runs xenstore
(xenstored). It allows xenstored to map the xenstore memory page, and
reports the event channel xenstore is using.

Sponsored by: Citrix Systems R&D

dev/xen/xenstore/xenstored_dev.c:
 - Add the xenstored character device that's used to map the xenstore
   memory into user-space, and to report the event channel used by
   xenstore.

conf/files:
 - Add the device to the build process.
2014-09-30 17:37:26 +00:00
Roger Pau Monné
45ce037de2 xen: convert the xenstore user-space char device to a newbus device
Convert the xenstore user-space device (/dev/xen/xenstore) to a device
using the newbus interface. This allows us to make the device
initialization dependent on the initialization of xenstore itself in
the kernel.

Sponsored by: Citrix Systems R&D

dev/xen/xenstore/xenstore.c:
 - Convert to a newbus device, this removes the xs_dev_init function.

xen/xenstore/xenstore_internal.h:
 - Remove xs_dev_init prototype.

dev/xen/xenstore/xenstore.c:
 - Don't call xs_dev_init anymore, the device will attach itself when
   xenstore is started.
2014-09-30 17:31:04 +00:00
Roger Pau Monné
1d84e2b3c8 xen: defer xenstore initialization until xenstored is started
The xenstore related devices in the kernel cannot be started until
xenstored is running, which will happen later in the Dom0 case. If
start_info_t doesn't contain a valid xenstore event channel, defer all
xenstore related devices attachment to later.

Sponsored by: Citrix Systems R&D

dev/xen/xenstore/xenstore.c:
 - Prevent xenstore from trying to attach its descendant devices if
   xenstore is not initialized.
 - Add a callback in the xenstore interrupt filter that will trigger
   the plug of xenstore descendant devices on the first received
   interrupt. This interrupt is generated when xenstored attaches to
   the event channel, and serves as a notification that xenstored is
   running.
2014-09-30 17:27:56 +00:00
Roger Pau Monné
a6aedc5d49 xen: move xenstore devices
Move xenstore related devices (xenstore.c and xenstore_dev.c) from
xen/xenstore to dev/xen/xenstore. This is just code motion, no
functional changes.

Sponsored by: Citrix Systems R&D
2014-09-30 17:14:11 +00:00
Roger Pau Monné
ae3078d9e4 xen: make xen balloon a driver that depends on xenstore
This is done so we can prevent the Xen Balloon driver from attaching
before xenstore is setup.

Sponsored by: Citrix Systems R&D

dev/xen/balloon/balloon.c:
 - Make xen balloon a driver that depends on xenstore.
2014-09-30 16:53:08 +00:00
Hans Petter Selasky
9fd573c39d Improve transmit sending offload, TSO, algorithm in general.
The current TSO limitation feature only takes the total number of
bytes in an mbuf chain into account and does not limit by the number
of mbufs in a chain. Some kinds of hardware are limited by two
factors. One is the fragment length and the second is the fragment
count. Both of these limits need to be taken into account when doing
TSO. Else some kinds of hardware might have to drop completely valid
mbuf chains because they cannot be loaded into the given hardware's DMA
engine. The new way of doing TSO limitation has been made backwards
compatible as input from other FreeBSD developers and will use
defaults for values not set.
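
The three limits can be pictured with a small check like the one below. It
is a sketch under stated assumptions: the structure, field, and function
names are hypothetical, and the code only tests whether a chain of fragment
lengths respects the limits; it does not model how the stack sizes TSO
requests to stay within them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of a device's TSO limits. */
struct sketch_tso_limits {
	size_t max_bytes;	/* total bytes per TSO request */
	size_t max_frags;	/* maximum fragment (mbuf) count */
	size_t max_frag_len;	/* maximum bytes per fragment */
};

/* Return true if the chain respects all three limits. */
static bool
sketch_tso_chain_fits(const struct sketch_tso_limits *lim,
    const size_t *frag_len, size_t nfrags)
{
	size_t total = 0;
	size_t i;

	assert(lim != NULL && (frag_len != NULL || nfrags == 0));
	if (nfrags > lim->max_frags)
		return (false);		/* too many fragments */
	for (i = 0; i < nfrags; i++) {
		if (frag_len[i] > lim->max_frag_len)
			return (false);	/* one fragment is too long */
		total += frag_len[i];
	}
	return (total <= lim->max_bytes);	/* byte limit checked last */
}
```

A chain can therefore be rejected for its fragment count or fragment length
even when its total byte count is within the limit, which is the case the
byte-only check could not express.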

Reviewed by:	adrian, rmacklem
Sponsored by:	Mellanox Technologies
MFC after:	1 week
2014-09-22 08:27:27 +00:00
Gleb Smirnoff
c8dfaf382f Mechanically convert to if_inc_counter(). 2014-09-19 03:51:26 +00:00
Hans Petter Selasky
72f3100047 Revert r271504. A new patch to solve this issue will be made.
Suggested by:	adrian @
2014-09-13 20:52:01 +00:00