Commit Graph

105400 Commits

Author SHA1 Message Date
Alexander Motin
ebef81ecea Add unmapped I/O support to ata(4) driver.
Main problem there was PIO mode support, that required KVA mapping.
Handle that case using recently added pmap_quick_enter_page(9) KPI,
mapping data pages to KVA one at a time.
2015-08-07 14:38:26 +00:00
Alexander Motin
3301406331 Add more ifdefs to fix build with GCC after r286406. 2015-08-07 14:12:51 +00:00
Gleb Smirnoff
7a3151955b Fix !MWL_DEBUG build. 2015-08-07 12:34:20 +00:00
Gleb Smirnoff
79d2c5e857 Change KPI of how device drivers that provide wireless connectivity interact
with the net80211 stack.

Historical background: originally wireless devices created an interface,
just like Ethernet devices do. Name of an interface matched the name of
the driver that created. Later, wlan(4) layer was introduced, and the
wlanX interfaces become the actual interface, leaving original ones as
"a parent interface" of wlanX. Kernelwise, the KPI between net80211 layer
and a driver became a mix of methods that pass a pointer to struct ifnet
as identifier and methods that pass pointer to struct ieee80211com. From
user point of view, the parent interface just hangs on in the ifconfig
list, and user can't do anything useful with it.

Now, the struct ifnet goes away. The struct ieee80211com is the only
KPI between a device driver and net80211. Details:

- The struct ieee80211com is embedded into drivers softc.
- Packets are sent via new ic_transmit method, which is very much like
  the previous if_transmit.
- Bringing parent up/down is done via new ic_parent method, which notifies
  driver about any changes: number of wlan(4) interfaces, number of them
  in promisc or allmulti state.
- Device specific ioctls (if any) are received on new ic_ioctl method.
- Packets/errors accounting are done by the stack. In certain cases, when
  driver experiences errors and can not attribute them to any specific
  interface, driver updates ic_oerrors or ic_ierrors counters.

Details on interface configuration with new world order:
- A sequence of commands needed to bring up wireless DOESN"T change.
- /etc/rc.conf parameters DON'T change.
- List of devices that can be used to create wlan(4) interfaces is
  now provided by net.wlan.devices sysctl.

Most drivers in this change were converted by me, except of wpi(4),
that was done by Andriy Voskoboinyk. Big thanks to Kevin Lo for testing
changes to at least 8 drivers. Thanks to Olivier Cochard, gjb@, mmoll@,
op@ and lev@, who also participated in testing. Details here:

https://wiki.freebsd.org/projects/ifnet/net80211

Still, drivers: ndis, wtap, mwl, ipw, bwn, wi, upgt, uath were not
tested. Changes to mwl, ipw, bwn, wi, upgt are trivial and chances
of problems are low. The wtap wasn't compilable even before this change.
But the ndis driver is complex, and it is likely to be broken with this
commit. Help with testing and debugging it is appreciated.

Differential Revision:	D2655, D2740
Sponsored by:	Nginx, Inc.
Sponsored by:	Netflix
2015-08-07 11:43:14 +00:00
Andrew Turner
d1b2133d03 Attach dwmmc to the ofwbus, som devicetrees place it here.
Sponsored by:	ABT Systems Ltd
2015-08-07 08:57:58 +00:00
Andrew Turner
a336c37514 Stop including machine/fdt.h, it's unneeded, and purposefully
unimplemented on arm64.

Sponsored by:	ABT Systems Ltd
2015-08-07 08:54:50 +00:00
Marcelo Araujo
8b3ae99560 Wrap some unused functions with notyet, it is necessary to be able to
build the modules/ctl directly.
Remove a dead MALLOC_DEFINE.

Differential Revision:	D3329
Reviewed by:		mav
Sponsored by:		gandi.net
2015-08-07 08:30:43 +00:00
Konstantin Belousov
347e9d5495 Minor style cleanup of the code surrounding r286404.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-07 08:24:12 +00:00
Konstantin Belousov
9b34965019 The condition to use direct processing for the unmapped bio is
reverted.  We can do direct processing when g_io_check() does not need
to perform transient remapping of the bio, otherwise the thread has to
sleep.

Reviewed by:	mav (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-08-07 08:13:34 +00:00
Konstantin Belousov
c3f62b5352 Remove unused i386 header privatespace.h. For the native kernel, its
use was removed in r173592 (Nov 2007), yet Xen PV bits continued
referencing the privatespace structure, and were removed in r282274
(Apr 2015).

Discussed with:	jhb
Sponsored by:	The FreeBSD Foundation
2015-08-07 05:59:58 +00:00
Alan Cox
e0a63baae4 Introduce a sysctl for reporting the number of fully populated reservations. 2015-08-06 21:27:50 +00:00
Ian Lepore
a046541623 Return the current ftdi bitbang mode with the UFTDIIOC_GET_BITMODE ioctl.
The ftdi chip itself has a "get bitmode" command that doesn't actually
return the current bitmode, just a snapshot of the gpio lines.  The chip
apparently has no way to provide the current bitmode.

This implements the functionality at the driver level.  The driver starts
out assuming the chip is in UART mode (which it will be, coming out of
reset) and keeps track of every successful set-bitmode operation so that
it can always return the current mode with UFTDIIOC_GET_BITMODE.
2015-08-06 19:47:04 +00:00
Ian Lepore
fc43ff0865 Add support to the uftdi driver for reading and writing the serial eeprom
that can be attached to the chips, via ioctl() calls.
2015-08-06 19:29:26 +00:00
Konstantin Belousov
a8bf83d618 Formally pair store_rel(&smp_started) with load_acq(&smp_started).
The expected semantic is to have misc. data, e.g. CPU bitmaps, visible
in the BSP after smp_started is written by the last started AP, which
formally requires acquire barrier on the load.  The change is mostly
nop due to the ordered behaviour of the x86 CPUs.

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-06 18:02:54 +00:00
Pawel Jakub Dawidek
5ee9ea19fe After crypto_dispatch() bio might be already delivered and destroyed,
so we cannot access it anymore. Setting an error later lead to memory
corruption.

Assert that crypto_dispatch() was successful. It can fail only if we pass a
bogus crypto request, which is a bug in the program, not a runtime condition.

PR:		199705
Submitted by:	luke.tw
Reviewed by:	emaste
MFC after:	3 days
2015-08-06 17:13:34 +00:00
John Baldwin
3c790178c5 Remove some more vestiges of the Xen PV domu support. Specifically,
use vtophys() directly instead of vtomach() and retire the no-longer-used
headers <machine/xenfunc.h> and <machine/xenvar.h>.

Reported by:	bde (stale bits in <machine/xenfunc.h>)
Reviewed by:	royger (earlier version)
Differential Revision:	https://reviews.freebsd.org/D3266
2015-08-06 17:07:21 +00:00
John Baldwin
fada4adf95 The changes that introduced fo_mmap() treated all character device
mappings as if MAP_SHARED was always present since in general MAP_PRIVATE
is not permitted for character devices.  However, there is one exception
in that MAP_PRIVATE mappings are permitted for /dev/zero.

Only require a writable file descriptor (FWRITE) for shared, writable
mappings of character devices.  vm_mmap_cdev() will reject any private
mappings for other devices.

Reviewed by:	kib
Reported by:	sbruno (broke qemu cross-builds), peter
Differential Revision:	https://reviews.freebsd.org/D3316
2015-08-06 16:50:37 +00:00
Allan Jude
9322ac3f6e Remove guards around overwriting loader.rc and menu.rc
There have been .local version of each for user modifications for some time
This allows users to receive future updates to these files

PR:		183765
Submitted by:	Bertram Scharpf, Nikolai Lifanov (patch)
Reviewed by:	dteske, loos, eadler
Approved by:	bapt (mentor)
MFC after:	1 month
Relnotes:	yes
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3176
2015-08-06 16:07:27 +00:00
Enji Cooper
fcc8461cfb Make some debug printf's into DPRINTF's to reduce noise on attach/detach
Differential Revision: https://reviews.freebsd.org/D3306
MFC after: 1 week
Reviewed by: loos
Sponsored by: EMC / Isilon Storage Division
2015-08-06 15:30:14 +00:00
Andrew Turner
756e25bfd7 Fill in dump_avail based on the physical memory from EFI.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-08-06 14:49:23 +00:00
Gleb Smirnoff
e369e734f4 Make it compilable. No idea if it works. 2015-08-06 14:05:17 +00:00
Ed Schouten
0f85ff377b Add file_open(): the underlying system call of openat().
CloudABI purely operates on file descriptor rights (CAP_*). File
descriptor access modes (O_ACCMODE) are emulated on top of rights.

Instead of accepting the traditional flags argument, file_open() copies
in an fdstat_t object that contains the initial rights the descriptor
should have, but also file descriptor flags that should persist after
opening (APPEND, NONBLOCK, *SYNC). Only flags that don't persist (EXCL,
TRUNC, CREAT, DIRECTORY) are passed in as an argument.

file_open() first converts the rights, the persistent flags and the
non-persistent flags to fflags. It then calls into vn_open(). If
successful, it installs the file descriptor with the requested
rights, trimming off rights that don't apply to the type of
the file that has been opened.

Unlike kern_openat(), this function does not support /dev/fd/*. I can't
think of a reason why we need to support this for CloudABI.

Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3235
2015-08-06 06:47:28 +00:00
Conrad Meyer
b5af3f30a7 nfsclient: Protest loudly when GETATTR responses are invalid
BROKEN NFS SERVER OR MIDDLEWARE: Certain WAN "accelerators" attempt to cache
NFS GETATTR traffic, but actually corrupt it (e.g., responding to requests
with attributes for totally different files).

Warn very verbosely when this is detected. Linux' NFS client has a similar
warning.

Adds a sysctl/tunable (vfs.nfs.fileid_maxwarnings) to configure the quantity
of warnings; default to 10. (Zero disables; -1 is unlimited.)

Adds a failpoint to aid in validating the warning / behavior with a
non-broken server. Use something like:

    sysctl 'debug.fail_point.nfscl_force_fileid_warning=10%return(1)'

Reviewed by:	rmacklem
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3304
2015-08-05 22:27:30 +00:00
Alexander Motin
7d0d4342e3 Pass SYNCHRONIZE CACHE command parameters to backends.
At this point IMMED flag is translated to MNT_NOWAIT flag of VOP_FSYNC(),
hoping that file system implements that (ZFS seems doesn't).

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-08-05 22:24:49 +00:00
Alexander Motin
f2a20b166a Relax serialization of SYNCHRONIZE CACHE commands.
Before this change SYNCHRONIZE CACHE commands were executed exclusively,
as if they had ORDERED tag.  But looking through SCSI specs I've found
no any reason to be so strict.  For reads this ordering seems pointless.
For writes it looks less obvious, so I left ordering against preceeding
write commands, while following ones are no longer required to wait.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-08-05 21:58:32 +00:00
Adrian Chadd
70c81b2077 Add a hack-around to this fatal taskqueue running whilst the NIC
is detaching.

This mostly fixes a panic - the reset path shouldn't run whilst
the NIC is being torn down.

It's not locked, so it's "mostly" ok, but most of the rest of
the driver doesn't read sc->invalid with sensible locking. Grr.

The real solution is to cleanly tear down taskqueues in the detach/suspend
phase, but ..
2015-08-05 21:22:25 +00:00
Adrian Chadd
66b870f3cb Add a missing method - ath_hal_settsf64().
This is required for TDMA slave mode.
2015-08-05 21:16:12 +00:00
Navdeep Parhar
d94b89b915 cxgbe(4): Update T5 and T4 firmwares bundled with the driver to 1.14.4.0. The
changes in the firmwares since 1.11.27.0 are listed here (straight copy-paste
from the "Release Notes.txt" accompanying the Chelsio Unified Wire 2.11.1.0
release on the website).

22.1. T5 Firmware
+++++++++++++++++++++++++++++++++

Version : 1.14.4.0
Date    : 08/05/2015
================================================================================

FIXES
-----

BASE:
- Fixes a potential data path hang by properly programming PMTX congestion
  threshold settings.
- Fixes a potential initialization error when accessing a configuration file
  stored on the flash.
- Fixes a regression where SGE resources can be miss-sized if iWARP is disabled.

ETH:
- Fixes a timing issue that would prevent CR4 links from coming up with some
  switches.

FOFCoE:
- Defers fcoe linkdown mailbox command handling till LOGO is sent.
- Updates vlan prio for all outstanding IOs during dcbx update.

ENHANCEMENTS
------------

BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.

ETH:
- Enhances segmentation offload to include VxLAN and Geneve.
- Adds PTP support.
- Adds new interface to allow the driver to query the VI rss table base
  addresses.
- Allows the driver to program the SGE ingrext contxt CongDrop field.

OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
  and rcv scale factors.

iSCSI:
- Adds support for iscsi segmentatation offload (ISO).
- Adds support for iscsi t10-dif offload.

FOiSCSI:
- Sets FORCE_BIT for cut through processing for FOiSCSI.

FOFCoE:
- Adds support for FCoE BB6.
- Improves WRITE performance.

================================================================================
================================================================================

Version : 1.13.32.0
Date    : 03/25/2015
================================================================================

FIXES
-----

BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
   negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
   adapter configurations)
- Fixes config file based PL_TIMEOUT register programming

ETH:
- Fixes a potential EO UDP SEG header corruption
- Fixes an issue where 1000Base-X was not enabled correctly when using QSA
   modules

OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1

FOFCoE:
- Fixes fcoe xchg leaks in linkdown/peer down path
- Fixes cleanup in FCoE linkdown and fixed buf timer flowid abuse
- Fixes fw crash by clearing fcf flowc during bye

FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.

ENHANCEMENTS
------------

BASE:
- Adds support for VFs on PFs 4 to 7
- Adds support for QPs/CQs on any physical and virtual function

ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE
- Adds support for CR4 links (BEAN/AEC on 40G TwinAx cables)

OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software

FOISCSI:
- Adds IPv6 support for foiscsi. Keeps backward compatibility with
   old foiscsi drivers which doesn't support ipv6.

FOFCoE:
- Added fcoe debug support in flowc dump

================================================================================
================================================================================

Version : 1.12.25.0
Date    : 10/22/2014
================================================================================

FIXES
-----

BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Fixes an issue where the link would intermittently fail to come up
- Fixes an issue where adapters with an external PHY couldn't run at 100Mbps
- Fixes an issue where active optical cables were not recognized
- Fixes link advertising issues on T520-BT (speed and pause frames) that would
  cause the link to negotiate unexpected settings
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested

ETH:
- Fixes NVGRE Segmentation Offload network header generation.

DCBX:
- Fixes an issue where some settings were not being sent to the switch
  correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
  FW
- Fixes a firmware crash on DCBX APP information request before link up

FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT

ENHANCEMENTS
-------------

BASE:
- Adds link partner settings reporting when available
- Adds QSA support (in conjunction with QSA VPD)
- Adds T520-BT LED support
- Reports NOTSUPPORTED for modules with an unhandled identifier

DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs

FOiSCSI:
- Add support for multiple iSCSI DDP client
- Sends DHCP renew request when lease expires

================================================================================

22.2. T4 Firmware
+++++++++++++++++

Version : 1.14.4.0
Date    : 08/05/2015
================================================================================

FIXES
-----

BASE:
- Fixes a potential initialization error when accessing a configuration file
  stored on the flash.
- Initialize PCIE_DBG_INDIR_REQ.Enable to 0, as hardware failed to do so and
  register dumps could result in errors.

ETH:
- Fixes an issue that sometimes prevented the link from coming up in CR adapters.

ENHANCEMENTS
------------

BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.

ETH:
- Adds new interface to allow the driver to query the VI rss table base
  addresses.

OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
  and rcv scale factors.

================================================================================
================================================================================

Version : 1.13.32.0
Date    : 03/25/2015
================================================================================

FIXES
-----

BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
    negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
    adapter configurations)
- Fixes config file based PL_TIMEOUT register programming

ETH:
- Fixes a potential EO UDP SEG header corruption

OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1

FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.

ENHANCEMENTS
------------

ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE

OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software

================================================================================
================================================================================

Version : 1.12.25.0
Date    : 10/22/2014
================================================================================

FIXES
-----

BASE:
- Improves precision of the Weight Round Robing Traffic Management Algorithm
- Forces link restart when auto-negotiation is disabled
- Fix an issue where pause frames wouldn't be fully disabled even if requested

DCBX:
- Fixes an issue where some settings were not being sent to the switch
  correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
  FW
- Fixes a firmware crash on DCBX APP information request before link up

FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT

ENHANCEMENTS
------------

BASE:
- Adds link partner settings reporting when available
- Firmware now reports NOTSUPPORTED for modules with an unhandled identifier

DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs

FOiSCSI:
- Adds support for multiple iSCSI DDP clients
- Sends DHCP renew request when lease expires

================================================================================

Obtained from:	Chelsio Communications
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2015-08-05 19:45:11 +00:00
Adrian Chadd
711b0fa045 Add TXOP enforce support to the AR9300 HAL.
This is required for (more) correct TDMA support.  Without it, the
code tries to calculate the required guard interval based on the
current rate, and since this is an 11n NIC and people try using
11n, it calls ath_hal_computetxtime() on an 11n rate which then
panics.

This doesn't fix TDMA slave mode on AR9300 - it just makes it
have one less bug.

Reported by:	Berislav Purgar <bpurgar@gmail.com>
2015-08-05 19:32:35 +00:00
Ed Maste
fc8c856029 Rationalize BSD license on sys/*/include/in_cksum.h
Remove the advertising clause from the Regents of the University of
California's license, per the letter dated July 22, 1999.

Update clause numbering.
2015-08-05 19:05:12 +00:00
Jung-uk Kim
fb396e55da Fix more style issues.
Submitted by:	bde
2015-08-05 17:21:42 +00:00
Ed Maste
96226a9aa7 Rationalize BSD license on sys/*/include/float.h
Remove the advertising clause from the Regents of the University of
California's license, per the letter dated July 22, 1999.

Update clause numbering.
2015-08-05 17:05:35 +00:00
Ed Schouten
aaf53ab2aa Correct the previous commit: remove the DECLARE_MODULE().
It looks like a MODULE_VERSION() can also appear on its own -- there is
no need to use explicitly use DECLARE_MODULE(). Looking at other
modules, this seems common practice.
2015-08-05 16:53:49 +00:00
Ed Schouten
b6efa27589 Add DECLARE_MODULE() to the "cloudabi" kernel module.
This kernel module does not require any explicit initialization, but a
module declaration is needed to let the "cloudabi64" kernel module
automatically pull this in.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:45:47 +00:00
Ed Schouten
36310bcd1d Make fcntl(F_SETFL) work.
The stat_put() system call can be used to modify file descriptor
attributes, such as flags, but also Capsicum permission bits. Support
for changing Capsicum bits will be added as soon as its dependent
changes have been pushed through code review.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:15:43 +00:00
Li-Wen Hsu
79b7e3e2c2 Fix make depend in sys/modules
Reviewed by:	delphij
Differential Revision:	https://reviews.freebsd.org/D3291
2015-08-05 14:45:52 +00:00
Alexander Motin
73942c5ce0 Issue all reads of single XCOPY segment simultaneously.
During vMotion and Clone VMware by default runs multiple sequential 4MB
XCOPY requests same time.  If CTL issues reads sequentially in 1MB chunks
for each XCOPY command, reads from different commands are not detected
as sequential by serseq option code and allowed to execute simultaneously.
Such read pattern confused ZFS prefetcher, causing suboptimal disk access.
Issuing all reads same time make serseq code work properly, serializing
reads both within each XCOPY command and between them.

My tests with ZFS pool of 14 disks in RAID10 shows prefetcher efficiency
improved from 37% to 99.7%, copying speed improved by 10-60%, average
read latency reduced twice on HDD layer and by five times on zvol layer.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-08-05 13:46:15 +00:00
Ed Schouten
2412ae2b8e Regenerate the system call table. 2015-08-05 13:10:13 +00:00
Ed Schouten
2837d9ed43 Import the latest CloudABI system call definitions and table.
We're going to need these for next code I'm going to send out for
review: support for poll() and kqueue() on CloudABI.
2015-08-05 13:09:46 +00:00
Konstantin Belousov
c8fbdcc10d Fix UP build after r286296, ensure that CPU_FOREACH() is defined.
Sponsored by:	The FreeBSD Foundation
2015-08-05 10:50:33 +00:00
Jason A. Harmening
0a3e154709 Properly sort the function declarations added in r286296
Submitted by:	alc
Approved by:	kib (mentor)
2015-08-05 10:48:32 +00:00
Ed Schouten
db1c8ee585 Add the remaining pointer size independent CloudABI socket system calls.
CloudABI uses a structure called cloudabi_sockstat_t. Think of it as
'struct stat' for sockets. It is used by functions such as
getsockname(), getpeername(), some of the getsockopt() values, etc.

This change implements the sock_stat_get() system call that returns a
copy of this structure. The accept() system call should also return a
full copy of this structure eventually, but for now we're only
interested in the peer address. Add a TODO() to make sure this is
patched up later on.

Differential Revision:	https://reviews.freebsd.org/D3218
2015-08-05 08:18:05 +00:00
Ed Schouten
4958fab8cd Allow the creation of polling descriptors (kqueues) on CloudABI. 2015-08-05 07:37:06 +00:00
Ed Schouten
a2034cc98a Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.

By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending OS-specific
behaviour.

Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.

Differential Revision:	https://reviews.freebsd.org/D3259
2015-08-05 07:36:50 +00:00
Ed Schouten
2433a4eb04 Make it possible to implement poll(2) on top of kqueue(2).
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.

Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3303
2015-08-05 07:34:29 +00:00
Justin Hibbits
daebf39a41 Remove one more that crept in unnecessarily from previous commit. 2015-08-05 01:52:52 +00:00
Justin Hibbits
6ee5cf50ee Remove some unnecessary includes. 2015-08-05 01:52:11 +00:00
Jason A. Harmening
713841afb2 Add two new pmap functions:
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)

These will create and destroy a temporary, CPU-local KVA mapping of a specified page.

Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread

Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page

The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.

NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.

Reviewed by:	kib
Approved by:	kib (mentor)
Differential Revision:	http://reviews.freebsd.org/D3013
2015-08-04 19:46:13 +00:00
Rui Paulo
7b80d5ad13 BEAGLEBONE: remove dtrace from MODULES_EXTRA.
This config is already building all modules, so we don't need the
MODULES_EXTRA definition.  It was also causing problems to users who
rely on MODULES_OVERRIDE to do the right thing.

Discussed with:	ian
2015-08-04 19:04:02 +00:00
Jung-uk Kim
ca23ca33f2 Fix style(9) bugs. 2015-08-04 18:59:54 +00:00
John-Mark Gurney
a2bc81bf7c Make IPsec work with AES-GCM and AES-ICM (aka CTR) in OCF... IPsec
defines the keys differently than NIST does, so we have to muck with
key lengths and nonce/IVs to be standard compliant...

Remove the iv from secasvar as it was unused...

Add a counter protected by a mutex to ensure that the counter for GCM
and ICM will never be repeated..  This is a requirement for security..
I would use atomics, but we don't have a 64bit one on all platforms..

Fix a bug where IPsec was depending upon the OCF to ensure that the
blocksize was always at least 4 bytes to maintain alignment... Move
this logic into IPsec so changes to OCF won't break IPsec...

In one place, espx was always non-NULL, so don't test that it's
non-NULL before doing work..

minor style cleanups...

drop setting key and klen as they were not used...

Enforce that OCF won't pass invalid key lengths to AES that would
panic the machine...

This was has been tested by others too...  I tested this against
NetBSD 6.1.5 using mini-test suite in
https://github.com/jmgurney/ipseccfgs and the only things that don't
pass are keyed md5 and sha1, and 3des-deriv (setkey syntax error),
all other modes listed in setkey's man page...  The nice thing is
that NetBSD uses setkey, so same config files were used on both...

Reviewed by:	gnn
2015-08-04 17:47:11 +00:00
Konstantin Belousov
3208d3ff46 Give large kernel stack to the initial thread . Otherwise, ZFS
overflows the stack during root mount in some configurations.

Tested by:	Fabian Keil <freebsd-listen@fabiankeil.de> (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 13:50:52 +00:00
Konstantin Belousov
35dfc644f5 Copy the fencing of the algorithm to do lock-less update and reading
of the timehands, from the kern_tc.c implementation to vdso.  Add
comments giving hints where to look for the algorithm explanation.

To compensate the removal of rmb() in userspace binuptime(), add
explicit lfence instruction before rdtsc.  On i386, add usual
complications to detect SSE2 presence; assume that old CPUs which do
not implement SSE2 also execute rdtsc almost in order.

Reviewed by:	alc, bde (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 12:33:51 +00:00
Edward Tomasz Napierala
72800098bf Fix panic triggered by code like this:
open("/dev/md0", O_EXEC);

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3051
2015-08-04 10:40:08 +00:00
Hans Petter Selasky
5884383f19 Avoid calling into the random subsystem before it is initialized.
Sponsored by:	Mellanox Technologies
2015-08-04 09:45:10 +00:00
Edward Tomasz Napierala
57a73b26e0 Mark vgonel() as static. It was already declared static earlier;
no idea why compilers don't warn about this.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-04 08:51:56 +00:00
Ed Schouten
0c0964844e Let the CloudABI futex code use umtx_keys.
The CloudABI kernel still passes all of the cloudlibc unit tests.

Reviewed by:	vangyzen
Differential Revision:	https://reviews.freebsd.org/D3286
2015-08-04 06:02:03 +00:00
Ed Schouten
dc4b532479 Fix bad arithmetic in umtx_key_get() to compute object offset.
It looks like umtx_key_get() has the addition and subtraction the wrong
way around, meaning that it fails to match in certain cases. This causes
the cloudlibc unit tests to deadlock in certain cases.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3287
2015-08-04 06:01:13 +00:00
Jung-uk Kim
45c2c9a84a Always define __va_list for amd64 and restore pre-r232261 behavior for i386.
Note it allows exotic compilers, e.g., TCC, to build with our stdio.h, etc.

PR:		201749
MFC after:	1 week
2015-08-04 00:11:39 +00:00
Luiz Otavio O Souza
9224217213 Remove the mtx_sleep() from the kqueue f_event filter.
The filter is called from the network hot path and must not sleep.

The filter runs with the descriptor lock held and does not manipulates the
buffers, so it is not necessary sleep when the hold buffer is in use.

Just ignore the hold buffer contents when it is being copied to user space
(when hold buffer in use is set).

This fix the "Sleeping thread owns a non-sleepable lock" panic when the
userland thread is too busy reading the packets from bpf(4).

PR:		200323
MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-08-03 22:14:45 +00:00
Ed Schouten
52942c1eae Add missing const keyword to function parameter.
The umtx_key_get() function does not dereference the address off the
userspace object. The pointer can safely be const.
2015-08-03 21:11:33 +00:00
John Baldwin
92de34df2c kgdb uses td_oncpu to determine if a thread is running and should use
a pcb from stoppcbs[] rather than the thread's PCB.  However, exited threads
retained td_oncpu from the last time they ran, and newborn threads had their
CPU fields cleared to zero during fork and thread creation since they are
in the set of fields zeroed when threads are setup.  To fix, explicitly
update the CPU fields for exiting threads in sched_throw() to reflect the
switch out and reset the CPU fields for new threads in sched_fork_thread()
to NOCPU.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3193
2015-08-03 20:43:36 +00:00
Alan Cox
d8015db3b5 Refinements to r281079's sequential access optimization: Prefetched pages,
which constitute the majority of the pages that are processed by
vm_fault_dontneed(), are already near the tail of the inactive queue.  Only
the pages at faulting virtual addresses are actually moved by
vm_page_advise(..., MADV_DONTNEED).  However, vm_page_advise(...,
MADV_DONTNEED) is simultaneously too aggressive and passive for the moved
pages.  It makes most of these pages too easily reclaimable, and at the same
time it leaves enough pages in the active queue to trigger pageouts by the
page daemon.  Instead, with this change, the pages at faulting virtual
addresses are moved to the tail of the inactive queue, where they are
relatively close to the pages prefetched by the same page fault.

Discussed with:	jeff
Sponsored by:	EMC / Isilon Storage Division
2015-08-03 20:30:27 +00:00
Luiz Otavio O Souza
98fa5d858c Add a KASSERT() to make sure we wont rotate the buffers twice (rotate the
buffers while the hold buffer is in use).

Suggested by:	ed, ghelmer
MFC with:	r286142
2015-08-03 18:22:31 +00:00
Mark Johnston
8f980c016b The mbuf parameter to ip_output_pfil() must be an output parameter since
pfil(9) hooks may modify the chain.

X-MFC-With:	r286028
2015-08-03 17:47:02 +00:00
Mark Johnston
1c9a705223 Remove a couple of unused fields from the FBT probe struct. 2015-08-03 17:39:36 +00:00
Sean Bruno
e5cb6169d3 A misplaced #endif in ixgbe_ioctl() causes interface MTU to become
zero when INET and INET6 are undefined.

PR:		162028
Differential Revision:	https://reviews.freebsd.org/D3187
Submitted by:	hoomanfazaeli@gmail.com pluknet
Reviewed by:	erj hiren gelbius
MFC after:	2 weeks
2015-08-03 16:39:25 +00:00
Edward Tomasz Napierala
d6cc35b287 Fix panic that would happen on forcibly unmounting devfs (note that
as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified
to trigger the panic) with consumers still opened.

Note that this still results in a leak of r/w/e counters.  It seems
to be harmless, though.  If anyone knows a better way to approach
this - please tell.

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3050
2015-08-03 16:35:18 +00:00
Edward Tomasz Napierala
8d9ed17366 Fix a problem which made loader(8) load non-kld files twice.
For example, without this patch, the following three lines
in /boot/loader.conf would result in /boot/root.img being preloaded
twice, and two md(4) devices - md0 and md1 - being created.

initmd_load="YES"
initmd_type="md_image"
initmd_name="/boot/root.img"

Reviewed by:	marcel@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3204
2015-08-03 16:27:36 +00:00
Zbigniew Bodek
4cbca60875 Add missing exception number to EL0 sync. abort on ARM64
When doing a data abort from userland it is possible to get
more than one data abort inside the same exception level.
Add an appropriate exception number to allow nesting of
data_abort handler for EL0.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3276
2015-08-03 14:58:46 +00:00
Warner Losh
75333e6435 Add pmspvc device back to GENERIC. The issues with the device playing
grabby hands with other driver's devices has been solved.

MFC After: 3 weeks
2015-08-03 13:49:46 +00:00
Ed Schouten
ee95773383 Let CloudABI use the SV_CAPSICUM flag.
CloudABI processes will now start up in capabilities mode.

Reviewed by:	kib
2015-08-03 13:42:52 +00:00
Ed Schouten
39f5ebb774 Add sysent flag to switch to capabilities mode on startup.
CloudABI processes should run in capabilities mode automatically. There
is no need to switch manually (e.g., by calling cap_enter()). Add a
flag, SV_CAPSICUM, that can be used to call into cap_enter() during
execve().

Reviewed by:	kib
2015-08-03 13:41:47 +00:00
Konstantin Belousov
f94cc23475 Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUID
reported, on APs.  We already did this on BSP.

Otherwise, the userspace software which depends on the features
reported by the high CPUID levels is misbehaving.  In particular, AVX
detection is non-functional, depending on which CPU thread happens to
execute when doing CPUID.  Another victim is the libthr signal
handlers interposer, which needs to save full FPU extended state.

Reported and tested by:	Andre Meiser <ortadur@web.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-03 12:14:42 +00:00
Julien Charbon
ff9b006d61 Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

It allows to use TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
Edward Tomasz Napierala
e553ca4994 Rework the way iSCSI initiator handles system shutdown. This fixes
hangs on shutdown with LUNs with mounted filesystems over a disconnected
iSCSI session.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3052
2015-08-03 11:57:11 +00:00
Andrew Turner
f692e32555 Pass the pcb to store the vfp state in to vfp_save_state. This fixes a bug
in savectx where it will be used to store the current state however will
pass in a pcb when vfp_save_state expected a thread pointer.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-08-03 11:05:02 +00:00
Steven Hartland
ebbc56ecd6 Fix KSTACK_PAGES check in ZFS module
The check introduced by r285946 failed to add the dependency on
opt_kstack_pages.h which meant the default value for the platform instead
of the customised options KSTACK_PAGES=X was being tested.

Also wrap in #ifdef __FreeBSD__ for portability.

MFC after:	3 days
Sponsored by:	Multiplay
2015-08-03 09:34:09 +00:00
Ed Schouten
75c9f22394 Set p_osrel to __FreeBSD_version on process startup.
Certain system calls have quirks applied to make them work as if called
on an older version of FreeBSD. As CloudABI executables don't have the
FreeBSD OS release number in the ELF header, this value is set to zero,
making the system calls fall back to typically historic, non-standard
behaviour.

Reviewed by:	kib
2015-08-03 07:29:57 +00:00
Oleksandr Tymoshenko
7b25d1d63b Pass correct type of argument to ti_gpio_unmask_irq in ti_gpio_activate_resource 2015-08-03 01:22:49 +00:00
John-Mark Gurney
bba6880eab looks like all archs either have clang or cdefs included before..
drop this include as unnecessary..

Requested by:	bde
2015-08-02 21:33:40 +00:00
Warner Losh
123049cf36 Don't forget to check the vendor when probing. Also, there's no need
to double check for if the card has probed before. In fact, there's no
reason to single check either. Simplify the code as a result.
$FreeBSD$ added to lxutil.c in a non-standard way to help keep the
diffs with upstream to a minimum.

Differential Revision: https://reviews.freebsd.org/D3263
2015-08-02 16:26:41 +00:00
Michael Tuexen
e7e71dd7f3 Don't take the port numbers for packets containing ABORT chunks from
a freed mbuf. Just use them from the stcb.

MFC after: 3 days
2015-08-02 16:07:30 +00:00
Andrey V. Elsukov
51a01baf23 Properly handle IPV6_NEXTHOP socket option in selectroute().
o remove disabled code;
 o if nexthop address is link-local, use embedded scope zone id to
   determine outgoing interface;
 o properly fill ro_dst before doing route lookup;
 o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit.

Sponsored by:	Yandex LLC
2015-08-02 12:40:56 +00:00
Andrey V. Elsukov
a6f7dea1fe Remove redundant check. 2015-08-02 11:58:24 +00:00
John-Mark Gurney
70e47040b0 convert to C11's _Static_assert, and pull in sys/cdefs.h for
compatibility w/ older non-C11 compilers...

passed make tinerdbox..

Suggested by:	imp
2015-08-02 00:15:52 +00:00
Mark Johnston
48fcd357c4 Avoid dereferencing curthread->td_proc->p_cred in DTrace probe context.
When a process is exiting, there is a narrow window where p_cred may be
NULL while its threads are still executing. Specifically, the last thread
to exit a process sets the process state to PRS_ZOMBIE with the proc
spinlock held and then calls thread_exit(). thread_exit() drops the spin
lock, permitting the process to be reaped and thus causing its cred struct
to be released. However, the exiting thread may still cause DTrace probes
to fire by calling sched_throw(), resulting in a double fault if such a
probe enabling attempts to access the GID or UID DIF variables.

The thread's cred reference is not susceptible to this race since it is not
released until after the thread has exited.

MFC after:	1 week
2015-08-02 00:11:56 +00:00
Mark Johnston
ce1c953ee0 Don't modify curthread->td_locks unless INVARIANTS is enabled.
This field is only used in a KASSERT that verifies that no locks are held
when returning to user mode. Moreover, the td_locks accounting is only
correct when LOCK_DEBUG > 0, which is implied by INVARIANTS.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3205
2015-08-02 00:03:08 +00:00
Oleksandr Tymoshenko
c3321180ec Set output pin initial value based on pin's pinmux pullup/pulldown setup
Some of FDT blobs for AM335x-based devices use pinmux pullup/pulldown
flag to setup initial GPIO ouputp value, e.g. 4DCAPE-43 sets LCD DATAEN
signal this way. It works for Linux because Linux driver does not enforce
pin direction until after it's requested by consumer. So input with pullup
flag set acts as output with GPIO_HIGH value

Reviewed by:	loos
2015-08-01 23:10:36 +00:00
Hans Petter Selasky
577c341353 Free mbufs when busdma loading fails.
Reviewed by:	erj, sbruno
MFC after:	1 month
2015-08-01 20:40:37 +00:00
John Baldwin
98685dc8af Clear P_TRACED before reparenting a detached process back to its
original parent. Otherwise the debugee will be set as an orphan of
the debugger.

Add tests for tracing forks via PT_FOLLOW_FORK.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D2809
2015-08-01 16:27:52 +00:00
Ed Schouten
f52c3dd415 Allow CloudABI processes to create shared memory objects.
Summary:
Use the newly created `kern_shm_open()` function to create objects with
just the rights that are actually needed.

Reviewers: jhb, kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3260
2015-08-01 07:51:48 +00:00
Ed Schouten
7ee1b208c3 Add kern_shm_open().
This allows you to specify the capabilities that the new file descriptor
should have. This allows us to create shared memory objects that only
have the rights we're interested in.

The idea behind restricting the rights is that it makes it a lot easier
for CloudABI to get consistent behaviour across different operating
systems. We only need to make sure that a shared memory implementation
consistently implements the operations that are whitelisted.

Approved by:	kib
Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-01 07:21:14 +00:00
Luiz Otavio O Souza
f87e372ef2 Remove two unnecessary sleeps from the hot path in bpf(4).
The first one never triggers because bpf_canfreebuf() can only be true for
zero-copy buffers and zero-copy buffers are not read with read(2).

The second also never triggers, because we check the free buffer before
calling ROTATE_BUFFERS().  If the hold buffer is in use the free buffer
will be NULL and there is nothing else to do besides drop the packet.  If
the free buffer isn't NULL the hold buffer _is_ free and it is safe to
rotate the buffers.

Update the comment in ROTATE_BUFFERS macro to match the logic described
here.

While here fix a few typos in comments.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 21:43:27 +00:00
Luiz Otavio O Souza
faa693cdbe Remove the sleep from the buffer allocation routine.
The buffer must be allocated (or even changed) before the interface is set
and thus, there is no need to verify if the buffer is in use.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 20:25:54 +00:00
Luiz Otavio O Souza
4f42daa4a3 Do not allocate the buffers at opening of the descriptor, because once
the buffer is allocated we are committed to a particular buffer method
(BPF_BUFMODE_BUFFER in this case).

If we are using zero-copy buffers, the userland program must register its
buffers before set the interface.

If we are using kernel memory buffers, we can allocate the buffer at the
time that the interface is being set.

This fix allows the usage of BIOCSETBUFMODE after r235746.

Update the comments to reflect the recent changes.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 20:02:12 +00:00
Andrew Turner
9e2529043b Try to put the CPU into a low power state if we failed to otherwise halt
the system.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 15:54:34 +00:00
Andrew Turner
176739d3f5 Load the stack in stack_save and stack_save_td. This uses the generalised
unwind_frame function to read each stack frame until either the pc or stack
are no longer withing the kernel's address space.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 15:32:32 +00:00
Glen Barber
45e1c1a38d Pull pmspcv (pms(4)) from GENERIC. It has PCI ID conflicts
with ahd(4), mvs(4), and likely other drivers.

MFC after:	immediately
With hat:	re
Sponsored by:	The FreeBSD Foundation
2015-07-31 15:23:48 +00:00
Andrew Turner
36baf858c9 Add support for uma_small_alloc and uma_small_free, and make use of these.
This is copied from the amd64 version with minor changes. These should be
merged into a single file as from a quick look there are other copies of
the same file in other parts of the tree.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 14:17:26 +00:00
Andrew Turner
872df66596 Add memrw. This has had minimal testing, and will likely panic the kernel
when trying to read data from outside the DMAP region. I expect this panic
to be from within uiomove_fromphys, which needs to grow support to support
such addresses.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 13:39:51 +00:00
Andrew Turner
9b8c3c4f0b Add more atomic_swap_* functions.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 13:34:43 +00:00
Andrew Turner
71d72ea14f Add VIRT_IN_DMAP to check if a virtual address is from the DMAP range.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 13:32:25 +00:00
Ed Schouten
6236e71bfe Fix accidental line wrapping introduced in r286122. 2015-07-31 10:46:45 +00:00
Ed Schouten
367a13f905 Limit rights on process descriptors.
On CloudABI, the rights bits returned by cap_rights_get() match up with
the operations that you can actually perform on the file descriptor.

Limiting the rights is good, because it makes it easier to get uniform
behaviour across different operating systems. If process descriptors on
FreeBSD would suddenly gain support for any new file operation, this
wouldn't become exposed to CloudABI processes without first extending
the rights.

Extend fork1() to gain a 'struct filecaps' argument that allows you to
construct process descriptors with custom rights. Use this in
cloudabi_sys_proc_fork() to limit the rights to just fstat() and
pdwait().

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-31 10:21:58 +00:00
Zbigniew Bodek
a2b3dfad08 Apply erratum for mrs ICC_IAR1_EL1 speculative execution on ThunderX
ERRATUM:     22978, 23154
PASS (rev.): 1.0/1.1

Reviewed by:   imp
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3184
2015-07-31 10:00:45 +00:00
Hans Petter Selasky
56d6361d92 Limit the number of times we loop inside the DWC OTG poll handler to
avoid starving other fast interrupts. Fix a comment while at it.

MFC after:	1 week
Suggested by:	Svatopluk Kraus <onwahe@gmail.com>
2015-07-31 09:12:31 +00:00
Andrey V. Elsukov
926381e108 Ansify if_stf.c 2015-07-31 09:04:22 +00:00
Andrey V. Elsukov
cf14ccb0f7 Remove unneded #include "opt_inet.h". 2015-07-31 09:02:28 +00:00
Ed Schouten
42642ecd97 Document the existence of cloudabi_load and cloudabi64_load. 2015-07-31 08:45:35 +00:00
John-Mark Gurney
af024d3b23 temporarily fix build.. This isn't the final fix, and testing is
still on going, but it has passed world for mips and powerpc...

I know this has an extra semicolon, but this is the patch that is
tested...

Looks like better fix is to use _Static_assert...
2015-07-31 07:48:08 +00:00
Navdeep Parhar
3d3169c858 cxgbe(4): initialize debug_flags from the kernel environment.
MFC after:	3 days
2015-07-31 04:50:47 +00:00
Konstantin Belousov
8917728875 vn_io_fault() handling of the LOR for i/o into the file-backed buffers
has observable overhead when the buffer pages are not resident or not
mapped.  The overhead comes at least from two factors, one is the
additional work needed to detect the situation, prepare and execute
the rollbacks.  Another is the consequence of the i/o splitting into
the batches of the held pages, causing filesystems see series of the
smaller i/o requests instead of the single large request.

Note that expected case of the resident i/o buffer does not expose
these issues.  Provide a prefaulting for the userspace i/o buffers,
disabled by default.  I am careful of not enabling prefaulting by
default for now, since it would be detrimental for the applications
which speculatively pass extra-large buffers of anonymous memory to
not deal with buffer sizing (if such apps exist).

Found and tested by:	bde, emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-31 04:12:51 +00:00
John-Mark Gurney
42e5fcbf2b these are comparing authenticators and need to be constant time...
This could be a side channel attack...  Now that we have a function
for this, use it...

jmgurney/ipsecgcm:	24d704cc and 7f37a14
2015-07-31 00:31:52 +00:00
John-Mark Gurney
817c7ed900 Clean up this header file...
use CTASSERTs now that we have them...

Replace a draft w/ RFC that's over 10 years old.

Note that _AALG and _EALG do not need to match what the IKE daemons
think they should be..  This is part of the KABI...  I decided to
renumber AESCTR, but since we've never had working AESCTR mode, I'm
not really breaking anything..  and it shortens a loop by quite
a bit..

remove SKIPJACK IPsec support...  SKIPJACK never made it out of draft
(in 1999), only has 80bit key, NIST recommended it stop being used
after 2010, and setkey nor any of the IKE daemons I checked supported
it...

jmgurney/ipsecgcm: a357a33, c75808b, e008669, b27b6d6

Reviewed by:	gnn (earlier version)
2015-07-31 00:23:21 +00:00
Ermal Luçi
59959de526 Correct IPSec SA statistic keeping
The IPsec SA statistic keeping is used even for decision making on expiry/rekeying SAs.
When there are multiple transformations being done the statistic keeping might be wrong.

This mostly impacts multiple encapsulations on IPsec since the usual scenario it is not noticed due to the code path not taken.

Differential Revision:	https://reviews.freebsd.org/D3239
Reviewed by:		ae, gnn
Approved by:		gnn(mentor)
2015-07-30 20:56:27 +00:00
Mateusz Guzik
4ae1e3c752 Revert r285125 until rmlocks get fixed.
Right now there is a chance that sysctl unregister will cause reader to
block on the sx lock associated with sysctl rmlock, in which case kernels
with debug enabled will panic.
2015-07-30 19:52:43 +00:00
Hiren Panchasara
03041aaac8 Update snd_una description to make it more readable.
Differential Revision:	https://reviews.freebsd.org/D3179
Reviewed by:		gnn
Sponsored by:		Limelight Networks
2015-07-30 19:24:49 +00:00
Oleksandr Tymoshenko
40ebf1626f Add GPIO backlight driver compatible with Linux FDT bindings.
Brightness is controlled through sysctl dev.gpiobacklight.X.brightness:
  - any value greater than 0: backlight is on
  - any value less than or equal to  0: backlight is off

FDT bindings docs in Linux tree:
    Documentation/devicetree/bindings/video/backlight/gpio-backlight.txt
2015-07-30 19:04:14 +00:00
Craig Rodrigues
bc81f73771 Get function prototypes for msg, shm, sem functions
from header files.

Differential Revision: D2669
2015-07-30 18:59:01 +00:00
Mark Johnston
e2e45da0e8 ib mad: fix an incorrect use of list_for_each_entry
In tf_dequeue(), if we reach the end of the list without finding a
non-cancelled element, "tmp" will be a pointer into the list head, so the
tmp->canceled check is bogus. Use a flag instead.

Submitted by:	Tao Liu <Tao.Liu@isilon.com>
Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3244
2015-07-30 18:28:37 +00:00
Konstantin Belousov
6a875bf929 Do not pretend that vm_fault(9) supports unwiring the address. Rename
the VM_FAULT_CHANGE_WIRING flag to VM_FAULT_WIRE.  Assert that the
flag is only passed when faulting on the wired map entry.  Remove the
vm_page_unwire() call, which should be never reachable.

Since VM_FAULT_WIRE flag implies wired map entry, the TRYPAGER() macro
is reduced to the testing of the fs.object having a default pager.
Inline the check.

Suggested and reviewed by:	alc
Tested by:	pho (previous version)
MFC after:	1 week
2015-07-30 18:28:34 +00:00
Andrew Turner
8df0053b7a Add enough of pmap_page_set_memattr to run gstat. It still needs to split
the DMAP 1G pages so we set the attributes only on the specified page.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-30 16:17:44 +00:00
Konstantin Belousov
0b6476ec5b Improve comments.
Submitted by:	bde
MFC after:	2 weeks
2015-07-30 15:47:53 +00:00
Roger Pau Monné
c023d8234b vfs: fill fallout from r286076
This right operator is >= not =>.

Reported by: cem
2015-07-30 15:43:26 +00:00
Roger Pau Monné
8f89a299e2 vfs: fix off-by-one error in vfs_buf_check_mapped
The check added in r285872 can trigger for valid buffers if the buffer space
used happens to be just after unmapped_buf in KVA space.

Discussed with: kib
Sponsored by: Citrix Systems R&D
2015-07-30 15:28:06 +00:00
Ed Maste
c547d650eb Add ARM64TODO markers to unimplemented functionality
Reviewed by:	andrew
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D2389
2015-07-30 14:20:36 +00:00
Zbigniew Bodek
9028b18f75 Enable IRQ during syscalls on ARM64
FreeBSD provides a feature called Adaptive Mutexes, which allows
a thread to spin for a while when the mutex is taken instead of
immediately going to sleep. This causes issues when called from
syscall handler if interrupts are masked. If every other core
also attempts to access the same mutex there is a chance that
all of them are spinning on the same lock at the same time.
If interrupts are disabled, no kernel preemtion can occur and
the system becomes unresponsive.

This patch enables interrupts when syscall is being executed
and masks them as soon as it is completed.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3246
2015-07-30 13:59:38 +00:00
Zbigniew Bodek
4d3523c2f7 Remove obsolete vendor code from Alpine platform support
This is a clean-up patch from a serie delivering support for
Annapurna Labs Alpine PoC.
The HAL files have already been added to sys/contrib/alpine-hal
so there is no need for them in the platform directory.
This patch removes obsolete files.

Reviewed by:    andrew
Obtained from:  Semihalf
Sponsored by:   Annapurna Labs
Differential Revision: https://reviews.freebsd.org/D3248
2015-07-30 13:45:34 +00:00
Andrey V. Elsukov
a5965d1513 Build if_stf(4) module only when both INET and INET6 support are enabled. 2015-07-30 10:26:43 +00:00
Colin Percival
aaebf69062 Add support for Xen blkif indirect segment I/Os. This makes it possible for
the blkfront driver to perform I/Os of up to 2 MB, subject to support from
the blkback to which it is connected and the initiation of such large I/Os
by the rest of the kernel.  In practice, the I/O size is increased from 40 kB
to 128 kB.

The changes to xen/interface/io/blkif.h consist merely of merging updates
from the upstream Xen repository.

In dev/xen/blkfront/block.h we add some convenience macros and structure
fields used for indirect-page I/Os: The device records its negotiated limit
on the number of indirect pages used, while each I/O command structure gains
permanently allocated page(s) for indirect page references and the Xen grant
references for those pages.

In dev/xen/blkfront/blkfront.c we now check in xbd_queue_cb whether a request
is small enough to handle without an indirection page, and either follow the
previous behaviour or use new code for issuing an indirect segment I/O.  In
xbd_connect we read the size of indirect segment I/Os supported by the backend
and select the maximum size we will use; then allocate the pages and Xen grant
references for each I/O command structure.  In xbd_free those grants and pages
are released.

A new loader tunable, hw.xbd.xbd_enable_indirect, can be set to 0 in order to
disable this functionality; it works by pretending that the backend does not
support this feature.  Some backends exhibit a loss of performance with large
I/Os, so users may wish to test with and without this functionality enabled.

Reviewed by:	royger
MFC after:	3 days
Relnotes:	yes
2015-07-30 03:50:01 +00:00
Luiz Otavio O Souza
8b15f615e0 Follow r256586 and rename the kernel version of the Free() macro to
R_Free().  This matches the other macros and reduces the chances to clash
with other headers.

This also fixes the build of radix.c outside of the kernel environment.

Reviewed by:	glebius
2015-07-30 02:09:03 +00:00
Konstantin Belousov
48cae112b5 Use private cache line for the locked nop in *mb() on i386.
Suggested by:	alc
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-30 00:13:20 +00:00
Konstantin Belousov
dd5b64258f MFamd64 r285934: Remove store/load (= full) barrier from the i386
atomic_load_acq_*().

Noted by:	alc (long time ago)
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-29 23:59:17 +00:00
John-Mark Gurney
e381fd293d const'ify an arg that we don't update... 2015-07-29 23:37:15 +00:00
Rick Macklem
25f37276e5 This patch fixes a problem where, if the NFSv4 server has a previous
unconfirmed clientid structure for the same client on the last hash list,
this old entry would not be removed/deleted. I do not think this bug would have
caused serious problems, since the new entry would have been before the old one
on the list. This old entry would have eventually been scavenged/removed.
Detected while reading the code looking for another bug.

MFC after:	3 days
2015-07-29 23:06:30 +00:00
Jim Harris
0e1fd2dda3 nvme: do not notify a consumer about failures that occur during initialization
MFC after:	3 days
Sponsored by:	Intel
2015-07-29 21:29:50 +00:00
Sean Bruno
e0fe6b4835 Add support for BCM5466 PHY
Differential Revision:	D3232
Submitted by:	kevin.bowling@kev009.com
2015-07-29 20:50:48 +00:00
Sean Bruno
79855a57e2 Remove dead functions pmap_pvdump and pads.
Differential Revision:	D3206
Submitted by:	kevin.bowling@kev009.com
Reviewed by:	alc
2015-07-29 20:47:27 +00:00
Ermal Luçi
3c40232395 Avoid double reference decrement when firewalls force relooping of packets
When firewalls force a reloop of packets and the caller supplied a route the reference to the route might be reduced twice creating issues.
This is especially the scenario when a packet is looped because of operation in the firewall but the new route lookup gives a down route.

Differential Revision:	https://reviews.freebsd.org/D3037
Reviewed by:	gnn
Approved by:	gnn(mentor)
2015-07-29 20:10:36 +00:00
Ermal Luçi
d9f2a78249 ip_output normalization and fixes
ip_output has a big chunk of code used to handle special cases with pfil consumers which also forces a reloop on it.
Gather all this code together to make it readable and properly handle the reloop cases.

Some of the issues identified:

M_IP_NEXTHOP is not handled properly in existing code.
route reference leaking is possible with in FIB number change
route flags checking is not consistent in the function

Differential Revision:	https://reviews.freebsd.org/D3022
Reviewed by:	gnn
Approved by:	gnn(mentor)
MFC after:	4 weeks
2015-07-29 18:04:01 +00:00
Patrick Kelsey
4741bfcb57 Revert r265338, r271089 and r271123 as those changes do not handle
non-inline urgent data and introduce an mbuf exhaustion attack vector
similar to FreeBSD-SA-15:15.tcp, but not requiring VNETs.

Address the issue described in FreeBSD-SA-15:15.tcp.

Reviewed by:	glebius
Approved by:	so
Approved by:	jmallett (mentor)
Security:	FreeBSD-SA-15:15.tcp
Sponsored by:	Norse Corp, Inc.
2015-07-29 17:59:13 +00:00
Ed Schouten
8328babdd0 Make pipes in CloudABI work.
Summary:
Pipes in CloudABI are unidirectional. The reason for this is that
CloudABI attempts to provide a uniform runtime environment across
different flavours of UNIX.

Instead of implementing a custom pipe that is unidirectional, we can
simply reuse Capsicum permission bits to support this. This is nice,
because CloudABI already attempts to restrict permission bits to
correspond with the operations that apply to a certain file descriptor.

Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes
a pair of filecaps. These filecaps are passed to the newly introduced
falloc_caps() function that creates the descriptors with rights in
place.

Test Plan:
CloudABI pipes seem to be created with proper rights in place:

https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44

Reviewers: jilles, mjg

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3236
2015-07-29 17:18:27 +00:00
Ed Schouten
e555b4309c Introduce falloc_caps() to create descriptors with capabilties in place.
falloc_noinstall() followed by finstall() allows you to create and
install file descriptors with custom capabilities. Add falloc_caps()
that can do both of these actions in one go.

This will be used by CloudABI to create pipes with custom capabilities.

Reviewed by:	mjg
2015-07-29 17:16:53 +00:00
Sean Bruno
1f6aae90ad Make Broadcom XLR use shared ds1374 RTC driver.
Remove its identical and redundant ds1374u version.

Differential Revision:	D3225
Submitted by:	kevin.bowling@kev009.com
2015-07-29 15:32:59 +00:00
Andrey V. Elsukov
10a0e0bf0a Eliminate the use of m_copydata() in gif_encapcheck().
ip_encap already has inspected mbuf's data, at least an IP header.
And it is safe to use mtod() and do direct access to needed fields.
Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that
mbuf has a packet header.
Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also
remove "martian filters" checks. According to RFC 4213 it is enough to
verify that the source address is the address of the encapsulator, as
configured on the decapsulator.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 14:07:43 +00:00
Ed Schouten
9d2332c9ee Split up Capsicum to CloudABI rights conversion into two separate routines.
CloudABI's openat() ensures that files are opened with the smallest set
of relevant rights. For example, when opening a FIFO, unrelated rights
like CAP_RECV are automatically removed. To remove unrelated rights, we
can just reuse the code for this that was already present in the rights
conversion function.
2015-07-29 12:42:45 +00:00
Zbigniew Bodek
cf89e8c919 Add quirk for ThunderX ITS device table size
Limit the number of supported device IDs to 0x100000
in order to decrease the size of the ITS device table so
that it matches with the HW capabilities.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3131
2015-07-29 11:22:19 +00:00
Andrey V. Elsukov
b13653baf9 Reduce overhead of ipfw's me6 opcode.
Skip checks for IPv6 multicast addresses.
Use in6_localip() for global unicast.
And for IPv6 link-local addresses do search in the IPv6 addresses list.
Since LLA are stored in the kernel internal form, use
IN6_ARE_MASKED_ADDR_EQUAL() macro with lla_mask for addresses comparison.
lla_mask has zero bits in the second word, where we keep sin6_scope_id.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 10:53:42 +00:00
Konstantin Belousov
6cebf7e2be Move bufshutdown() out of the #ifdef INVARIANTS block. 2015-07-29 09:57:34 +00:00