Commit Graph

105579 Commits

Author SHA1 Message Date
Ed Schouten
0f85ff377b Add file_open(): the underlying system call of openat().
CloudABI purely operates on file descriptor rights (CAP_*). File
descriptor access modes (O_ACCMODE) are emulated on top of rights.

Instead of accepting the traditional flags argument, file_open() copies
in an fdstat_t object that contains the initial rights the descriptor
should have, but also file descriptor flags that should persist after
opening (APPEND, NONBLOCK, *SYNC). Only flags that don't persist (EXCL,
TRUNC, CREAT, DIRECTORY) are passed in as an argument.

file_open() first converts the rights, the persistent flags and the
non-persistent flags to fflags. It then calls into vn_open(). If
successful, it installs the file descriptor with the requested
rights, trimming off rights that don't apply to the type of
the file that has been opened.

Unlike kern_openat(), this function does not support /dev/fd/*. I can't
think of a reason why we need to support this for CloudABI.

Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3235
2015-08-06 06:47:28 +00:00
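To illustrate how access modes can be emulated on top of rights, as the file_open() commit above (0f85ff377b) describes, here is a minimal sketch. It is not the code from the commit: the SKETCH_RIGHT_* constants and rights_to_accmode() are hypothetical names invented for this example, and only the O_* values come from <fcntl.h>.

```c
#include <fcntl.h>
#include <stdint.h>

/* Hypothetical rights bits, invented for this sketch. */
#define SKETCH_RIGHT_READ	(UINT64_C(1) << 0)
#define SKETCH_RIGHT_WRITE	(UINT64_C(1) << 1)

/*
 * Derive a traditional access mode from a rights mask.
 * Returns -1 when the mask grants neither read nor write access.
 */
int
rights_to_accmode(uint64_t rights)
{
	int can_read = (rights & SKETCH_RIGHT_READ) != 0;
	int can_write = (rights & SKETCH_RIGHT_WRITE) != 0;

	if (can_read && can_write)
		return (O_RDWR);
	if (can_write)
		return (O_WRONLY);
	if (can_read)
		return (O_RDONLY);
	return (-1);
}
```

The point of the sketch is the direction of the mapping: the rights are the source of truth, and the access mode is computed from them rather than stored separately.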
Conrad Meyer
b5af3f30a7 nfsclient: Protest loudly when GETATTR responses are invalid
BROKEN NFS SERVER OR MIDDLEWARE: Certain WAN "accelerators" attempt to cache
NFS GETATTR traffic, but actually corrupt it (e.g., responding to requests
with attributes for totally different files).

Warn very verbosely when this is detected. Linux' NFS client has a similar
warning.

Adds a sysctl/tunable (vfs.nfs.fileid_maxwarnings) to configure the quantity
of warnings; default to 10. (Zero disables; -1 is unlimited.)

Adds a failpoint to aid in validating the warning / behavior with a
non-broken server. Use something like:

    sysctl 'debug.fail_point.nfscl_force_fileid_warning=10%return(1)'

Reviewed by:	rmacklem
Approved by:	markj (mentor)
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D3304
2015-08-05 22:27:30 +00:00
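The warning limit semantics in the nfsclient commit above (b5af3f30a7) can be sketched as follows: zero disables warnings, -1 is unlimited, and any positive value is an upper bound. This is an illustration only; fileid_warning_allowed() and its parameters are hypothetical names, and treating every negative value like -1 is an assumption of this sketch.

```c
#include <stdbool.h>

/*
 * Decide whether another warning may be printed.
 * maxwarnings: 0 disables warnings, a negative value (e.g. -1) means
 * unlimited, and a positive value is the maximum number of warnings.
 * *printed counts the warnings emitted so far in the bounded case.
 */
bool
fileid_warning_allowed(int maxwarnings, int *printed)
{
	if (maxwarnings == 0)
		return (false);
	if (maxwarnings < 0)
		return (true);	/* unlimited: no need to count */
	if (*printed >= maxwarnings)
		return (false);
	(*printed)++;
	return (true);
}
```

With the default of 10, the first ten calls return true and every later call returns false.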
Alexander Motin
7d0d4342e3 Pass SYNCHRONIZE CACHE command parameters to backends.
At this point the IMMED flag is translated to the MNT_NOWAIT flag of VOP_FSYNC(),
hoping that the file system implements it (ZFS does not seem to).

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-08-05 22:24:49 +00:00
Alexander Motin
f2a20b166a Relax serialization of SYNCHRONIZE CACHE commands.
Before this change SYNCHRONIZE CACHE commands were executed exclusively,
as if they had the ORDERED tag.  But looking through the SCSI specs I found
no reason to be so strict.  For reads this ordering seems pointless.
For writes it is less obvious, so I left ordering against preceding
write commands, while following ones are no longer required to wait.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-08-05 21:58:32 +00:00
Adrian Chadd
70c81b2077 Add a hack-around to this fatal taskqueue running whilst the NIC
is detaching.

This mostly fixes a panic - the reset path shouldn't run whilst
the NIC is being torn down.

It's not locked, so it's "mostly" ok, but most of the rest of
the driver doesn't read sc->invalid with sensible locking. Grr.

The real solution is to cleanly tear down taskqueues in the detach/suspend
phase, but ..
2015-08-05 21:22:25 +00:00
Adrian Chadd
66b870f3cb Add a missing method - ath_hal_settsf64().
This is required for TDMA slave mode.
2015-08-05 21:16:12 +00:00
Navdeep Parhar
d94b89b915 cxgbe(4): Update T5 and T4 firmwares bundled with the driver to 1.14.4.0. The
changes in the firmwares since 1.11.27.0 are listed here (straight copy-paste
from the "Release Notes.txt" accompanying the Chelsio Unified Wire 2.11.1.0
release on the website).

22.1. T5 Firmware
+++++++++++++++++

Version : 1.14.4.0
Date    : 08/05/2015
================================================================================

FIXES
-----

BASE:
- Fixes a potential data path hang by properly programming PMTX congestion
  threshold settings.
- Fixes a potential initialization error when accessing a configuration file
  stored on the flash.
- Fixes a regression where SGE resources can be mis-sized if iWARP is disabled.

ETH:
- Fixes a timing issue that would prevent CR4 links from coming up with some
  switches.

FOFCoE:
- Defers fcoe linkdown mailbox command handling till LOGO is sent.
- Updates vlan prio for all outstanding IOs during dcbx update.

ENHANCEMENTS
------------

BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.

ETH:
- Enhances segmentation offload to include VxLAN and Geneve.
- Adds PTP support.
- Adds new interface to allow the driver to query the VI rss table base
  addresses.
- Allows the driver to program the SGE ingress context CongDrop field.

OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
  and rcv scale factors.

iSCSI:
- Adds support for iscsi segmentation offload (ISO).
- Adds support for iscsi t10-dif offload.

FOiSCSI:
- Sets FORCE_BIT for cut through processing for FOiSCSI.

FOFCoE:
- Adds support for FCoE BB6.
- Improves WRITE performance.

================================================================================
================================================================================

Version : 1.13.32.0
Date    : 03/25/2015
================================================================================

FIXES
-----

BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
   negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
   adapter configurations)
- Fixes config file based PL_TIMEOUT register programming

ETH:
- Fixes a potential EO UDP SEG header corruption
- Fixes an issue where 1000Base-X was not enabled correctly when using QSA
   modules

OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1

FOFCoE:
- Fixes fcoe xchg leaks in linkdown/peer down path
- Fixes cleanup in FCoE linkdown and fixed buf timer flowid abuse
- Fixes fw crash by clearing fcf flowc during bye

FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.

ENHANCEMENTS
------------

BASE:
- Adds support for VFs on PFs 4 to 7
- Adds support for QPs/CQs on any physical and virtual function

ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE
- Adds support for CR4 links (BEAN/AEC on 40G TwinAx cables)

OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software

FOiSCSI:
- Adds IPv6 support for foiscsi. Keeps backward compatibility with
   old foiscsi drivers which don't support IPv6.

FOFCoE:
- Added fcoe debug support in flowc dump

================================================================================
================================================================================

Version : 1.12.25.0
Date    : 10/22/2014
================================================================================

FIXES
-----

BASE:
- Improves precision of the Weighted Round Robin Traffic Management Algorithm
- Fixes an issue where the link would intermittently fail to come up
- Fixes an issue where adapters with an external PHY couldn't run at 100Mbps
- Fixes an issue where active optical cables were not recognized
- Fixes link advertising issues on T520-BT (speed and pause frames) that would
  cause the link to negotiate unexpected settings
- Forces link restart when auto-negotiation is disabled
- Fixes an issue where pause frames wouldn't be fully disabled even if requested

ETH:
- Fixes NVGRE Segmentation Offload network header generation.

DCBX:
- Fixes an issue where some settings were not being sent to the switch
  correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
  FW
- Fixes a firmware crash on DCBX APP information request before link up

FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT

ENHANCEMENTS
-------------

BASE:
- Adds link partner settings reporting when available
- Adds QSA support (in conjunction with QSA VPD)
- Adds T520-BT LED support
- Reports NOTSUPPORTED for modules with an unhandled identifier

DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs

FOiSCSI:
- Adds support for multiple iSCSI DDP clients
- Sends DHCP renew request when lease expires

================================================================================

22.2. T4 Firmware
+++++++++++++++++

Version : 1.14.4.0
Date    : 08/05/2015
================================================================================

FIXES
-----

BASE:
- Fixes a potential initialization error when accessing a configuration file
  stored on the flash.
- Initialize PCIE_DBG_INDIR_REQ.Enable to 0, as hardware failed to do so and
  register dumps could result in errors.

ETH:
- Fixes an issue that sometimes prevented the link from coming up in CR adapters.

ENHANCEMENTS
------------

BASE:
- Adds support for PAUSE OFF watchdog.
- Reports devlog access information in PCIE_FW_PF register 7.

ETH:
- Adds new interface to allow the driver to query the VI rss table base
  addresses.

OFLD:
- Adds new interface for the driver to specify offloaded connections TCP snd
  and rcv scale factors.

================================================================================
================================================================================

Version : 1.13.32.0
Date    : 03/25/2015
================================================================================

FIXES
-----

BASE:
- Fixes FW_CAPS_CONFIG_CMD return value on error (was positive instead of
    negative)
- Fixes FW_PARAMS_PARAM_DEV_FLOWC_BUFFIFO_SZ indication (was wrong on certain
    adapter configurations)
- Fixes config file based PL_TIMEOUT register programming

ETH:
- Fixes a potential EO UDP SEG header corruption

OFLD:
- Fixes timeout issue with half-open connections
- Fixes FW_FLOWC_WR processing when state is set to finwait1

FOiSCSI:
- Don't create a new tcp socket if ERL0 attempt has timed out.

ENHANCEMENTS
------------

ETH:
- Stops sending LACP frames on loopback interface
- Adds an AUTOEQU indication to CPL_SGE_EGR_UPDATE

OFLD:
- Improves default settings of LAN and CLUSTER TCP timer settings
- Sends Negative Advice CPLs to software

================================================================================
================================================================================

Version : 1.12.25.0
Date    : 10/22/2014
================================================================================

FIXES
-----

BASE:
- Improves precision of the Weighted Round Robin Traffic Management Algorithm
- Forces link restart when auto-negotiation is disabled
- Fixes an issue where pause frames wouldn't be fully disabled even if requested

DCBX:
- Fixes an issue where some settings were not being sent to the switch
  correctly
- Fixes an issue where back-to-back DCBX port updates could get overwritten by
  FW
- Fixes a firmware crash on DCBX APP information request before link up

FOiSCSI:
- Fixes abort task leak in tmf response handling
- Fixes TCP RST handling while in iSCSI ERL0
- Fixes a firmware crash on BYE without INIT

ENHANCEMENTS
------------

BASE:
- Adds link partner settings reporting when available
- Firmware now reports NOTSUPPORTED for modules with an unhandled identifier

DCBX:
- Adds version reporting (indicating which version FW is trying to negotiate)
- Adds IEEE support
- Reports LLDP time outs

FOiSCSI:
- Adds support for multiple iSCSI DDP clients
- Sends DHCP renew request when lease expires

================================================================================

Obtained from:	Chelsio Communications
MFC after:	2 weeks
Sponsored by:	Chelsio Communications
2015-08-05 19:45:11 +00:00
Adrian Chadd
711b0fa045 Add TXOP enforce support to the AR9300 HAL.
This is required for (more) correct TDMA support.  Without it, the
code tries to calculate the required guard interval based on the
current rate, and since this is an 11n NIC and people try using
11n, it calls ath_hal_computetxtime() on an 11n rate which then
panics.

This doesn't fix TDMA slave mode on AR9300 - it just makes it
have one less bug.

Reported by:	Berislav Purgar <bpurgar@gmail.com>
2015-08-05 19:32:35 +00:00
Ed Maste
fc8c856029 Rationalize BSD license on sys/*/include/in_cksum.h
Remove the advertising clause from the Regents of the University of
California's license, per the letter dated July 22, 1999.

Update clause numbering.
2015-08-05 19:05:12 +00:00
Jung-uk Kim
fb396e55da Fix more style issues.
Submitted by:	bde
2015-08-05 17:21:42 +00:00
Ed Maste
96226a9aa7 Rationalize BSD license on sys/*/include/float.h
Remove the advertising clause from the Regents of the University of
California's license, per the letter dated July 22, 1999.

Update clause numbering.
2015-08-05 17:05:35 +00:00
Ed Schouten
aaf53ab2aa Correct the previous commit: remove the DECLARE_MODULE().
It looks like a MODULE_VERSION() can also appear on its own -- there is
no need to explicitly use DECLARE_MODULE(). Looking at other
modules, this seems common practice.
2015-08-05 16:53:49 +00:00
Ed Schouten
b6efa27589 Add DECLARE_MODULE() to the "cloudabi" kernel module.
This kernel module does not require any explicit initialization, but a
module declaration is needed to let the "cloudabi64" kernel module
automatically pull this in.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:45:47 +00:00
Ed Schouten
36310bcd1d Make fcntl(F_SETFL) work.
The stat_put() system call can be used to modify file descriptor
attributes, such as flags, but also Capsicum permission bits. Support
for changing Capsicum bits will be added as soon as its dependent
changes have been pushed through code review.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:15:43 +00:00
Li-Wen Hsu
79b7e3e2c2 Fix make depend in sys/modules
Reviewed by:	delphij
Differential Revision:	https://reviews.freebsd.org/D3291
2015-08-05 14:45:52 +00:00
Alexander Motin
73942c5ce0 Issue all reads of single XCOPY segment simultaneously.
During vMotion and Clone, VMware by default runs multiple sequential 4MB
XCOPY requests at the same time.  If CTL issues reads sequentially in 1MB
chunks for each XCOPY command, reads from different commands are not detected
as sequential by the serseq option code and are allowed to execute
simultaneously.  Such a read pattern confused the ZFS prefetcher, causing
suboptimal disk access.  Issuing all reads at the same time makes the serseq
code work properly, serializing reads both within each XCOPY command and
between them.

My tests with a ZFS pool of 14 disks in RAID10 show prefetcher efficiency
improved from 37% to 99.7%, copying speed improved by 10-60%, and average
read latency halved on the HDD layer and reduced five times on the zvol layer.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-08-05 13:46:15 +00:00
Ed Schouten
2412ae2b8e Regenerate the system call table.
2015-08-05 13:10:13 +00:00
Ed Schouten
2837d9ed43 Import the latest CloudABI system call definitions and table.
We're going to need these for the next code I'm going to send out for
review: support for poll() and kqueue() on CloudABI.
2015-08-05 13:09:46 +00:00
Konstantin Belousov
c8fbdcc10d Fix UP build after r286296, ensure that CPU_FOREACH() is defined.
Sponsored by:	The FreeBSD Foundation
2015-08-05 10:50:33 +00:00
Jason A. Harmening
0a3e154709 Properly sort the function declarations added in r286296
Submitted by:	alc
Approved by:	kib (mentor)
2015-08-05 10:48:32 +00:00
Ed Schouten
db1c8ee585 Add the remaining pointer size independent CloudABI socket system calls.
CloudABI uses a structure called cloudabi_sockstat_t. Think of it as
'struct stat' for sockets. It is used by functions such as
getsockname(), getpeername(), some of the getsockopt() values, etc.

This change implements the sock_stat_get() system call that returns a
copy of this structure. The accept() system call should also return a
full copy of this structure eventually, but for now we're only
interested in the peer address. Add a TODO() to make sure this is
patched up later on.

Differential Revision:	https://reviews.freebsd.org/D3218
2015-08-05 08:18:05 +00:00
Ed Schouten
4958fab8cd Allow the creation of polling descriptors (kqueues) on CloudABI.
2015-08-05 07:37:06 +00:00
Ed Schouten
a2034cc98a Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.

By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending on OS-specific
behaviour.

Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.

Differential Revision:	https://reviews.freebsd.org/D3259
2015-08-05 07:36:50 +00:00
Ed Schouten
2433a4eb04 Make it possible to implement poll(2) on top of kqueue(2).
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.

Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3303
2015-08-05 07:34:29 +00:00
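The difference described in the NOTE_FILE_POLL commit above (2433a4eb04) can be modelled in a few lines. This is a simplified userspace model, not kernel code: struct file_model and read_filter_ready() are hypothetical names, and the boolean parameter merely stands in for the effect of the new flag on regular files.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a regular file's read position; not a kernel type. */
struct file_model {
	int64_t size;	/* file length in bytes */
	int64_t offset;	/* current read offset */
};

/*
 * Would a read filter report the regular file as ready?
 * poll_semantics == false: only while data remains before EOF.
 * poll_semantics == true: unconditionally, matching what the commit
 * says POLLRDNORM requires for regular files.
 */
bool
read_filter_ready(const struct file_model *f, bool poll_semantics)
{
	if (poll_semantics)
		return (true);
	return (f->offset < f->size);
}
```

The two modes only disagree once the offset has reached the end of the file, which is the case the commit message singles out.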
Justin Hibbits
daebf39a41 Remove one more that crept in unnecessarily from previous commit.
2015-08-05 01:52:52 +00:00
Justin Hibbits
6ee5cf50ee Remove some unnecessary includes.
2015-08-05 01:52:11 +00:00
Jason A. Harmening
713841afb2 Add two new pmap functions:
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)

These will create and destroy a temporary, CPU-local KVA mapping of a specified page.

Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread

Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page

The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.

NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.

Reviewed by:	kib
Approved by:	kib (mentor)
Differential Revision:	http://reviews.freebsd.org/D3013
2015-08-04 19:46:13 +00:00
Rui Paulo
7b80d5ad13 BEAGLEBONE: remove dtrace from MODULES_EXTRA.
This config is already building all modules, so we don't need the
MODULES_EXTRA definition.  It was also causing problems to users who
rely on MODULES_OVERRIDE to do the right thing.

Discussed with:	ian
2015-08-04 19:04:02 +00:00
Jung-uk Kim
ca23ca33f2 Fix style(9) bugs.
2015-08-04 18:59:54 +00:00
John-Mark Gurney
a2bc81bf7c Make IPsec work with AES-GCM and AES-ICM (aka CTR) in OCF... IPsec
defines the keys differently than NIST does, so we have to muck with
key lengths and nonce/IVs to be standard compliant...

Remove the iv from secasvar as it was unused...

Add a counter protected by a mutex to ensure that the counter for GCM
and ICM will never be repeated..  This is a requirement for security..
I would use atomics, but we don't have a 64bit one on all platforms..

Fix a bug where IPsec was depending upon the OCF to ensure that the
blocksize was always at least 4 bytes to maintain alignment... Move
this logic into IPsec so changes to OCF won't break IPsec...

In one place, espx was always non-NULL, so don't test that it's
non-NULL before doing work..

minor style cleanups...

drop setting key and klen as they were not used...

Enforce that OCF won't pass invalid key lengths to AES that would
panic the machine...

This has been tested by others too...  I tested this against
NetBSD 6.1.5 using mini-test suite in
https://github.com/jmgurney/ipseccfgs and the only things that don't
pass are keyed md5 and sha1, and 3des-deriv (setkey syntax error),
all other modes listed in setkey's man page...  The nice thing is
that NetBSD uses setkey, so same config files were used on both...

Reviewed by:	gnn
2015-08-04 17:47:11 +00:00
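The IPsec commit above (a2bc81bf7c) protects a counter with a mutex so that GCM and ICM counter values are never repeated. A minimal userspace sketch of that idea, using a POSIX mutex in place of a kernel mutex, could look like this. struct sa_counter and sa_counter_next() are hypothetical names, and handling of 64-bit wrap-around is deliberately left out.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-association counter state, invented for this sketch. */
struct sa_counter {
	pthread_mutex_t lock;
	uint64_t next;
};

#define SA_COUNTER_INITIALIZER	{ PTHREAD_MUTEX_INITIALIZER, 0 }

/*
 * Hand out the next counter value. The mutex makes the read-and-increment
 * atomic, so two callers can never receive the same value. A real user
 * must stop using the key before the 64-bit counter wraps; this sketch
 * does not check for that.
 */
uint64_t
sa_counter_next(struct sa_counter *c)
{
	uint64_t v;

	pthread_mutex_lock(&c->lock);
	v = c->next++;
	pthread_mutex_unlock(&c->lock);
	return (v);
}
```

A mutex works on platforms without a 64-bit atomic increment, which is the reason the commit message gives for not using atomics.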
Konstantin Belousov
3208d3ff46 Give large kernel stack to the initial thread. Otherwise, ZFS
overflows the stack during root mount in some configurations.

Tested by:	Fabian Keil <freebsd-listen@fabiankeil.de> (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 13:50:52 +00:00
Konstantin Belousov
35dfc644f5 Copy the fencing of the algorithm to do lock-less update and reading
of the timehands, from the kern_tc.c implementation to vdso.  Add
comments giving hints where to look for the algorithm explanation.

To compensate for the removal of rmb() in userspace binuptime(), add an
explicit lfence instruction before rdtsc.  On i386, add the usual
complications to detect SSE2 presence; assume that old CPUs which do
not implement SSE2 also execute rdtsc almost in order.

Reviewed by:	alc, bde (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 12:33:51 +00:00
Edward Tomasz Napierala
72800098bf Fix panic triggered by code like this:
open("/dev/md0", O_EXEC);

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3051
2015-08-04 10:40:08 +00:00
Hans Petter Selasky
5884383f19 Avoid calling into the random subsystem before it is initialized.
Sponsored by:	Mellanox Technologies
2015-08-04 09:45:10 +00:00
Edward Tomasz Napierala
57a73b26e0 Mark vgonel() as static. It was already declared static earlier;
no idea why compilers don't warn about this.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-04 08:51:56 +00:00
Ed Schouten
0c0964844e Let the CloudABI futex code use umtx_keys.
The CloudABI kernel still passes all of the cloudlibc unit tests.

Reviewed by:	vangyzen
Differential Revision:	https://reviews.freebsd.org/D3286
2015-08-04 06:02:03 +00:00
Ed Schouten
dc4b532479 Fix bad arithmetic in umtx_key_get() to compute object offset.
It looks like umtx_key_get() has the addition and subtraction the wrong
way around, meaning that it fails to match in certain cases. This causes
the cloudlibc unit tests to deadlock in certain cases.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3287
2015-08-04 06:01:13 +00:00
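The umtx_key_get() commit above (dc4b532479) is about the arithmetic that turns a user address into an offset within the backing object. A sketch of that kind of computation, under the assumption that a mapping begins at virtual address entry_start and maps the object starting at entry_offset, is shown below. object_offset() and its parameter names are hypothetical and are not taken from the kernel source.

```c
#include <stdint.h>

/*
 * Offset of addr within the backing object: the distance of addr from the
 * start of the mapping is added to the object offset at which the mapping
 * begins. Swapping the addition and the subtraction gives a value that
 * differs between mappings of the same object.
 */
uint64_t
object_offset(uint64_t entry_start, uint64_t entry_offset, uint64_t addr)
{
	return (entry_offset + (addr - entry_start));
}
```

With this form, the same object location reached through two different mappings produces the same offset, which is what lets two keys match.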
Jung-uk Kim
45c2c9a84a Always define __va_list for amd64 and restore pre-r232261 behavior for i386.
Note it allows exotic compilers, e.g., TCC, to build with our stdio.h, etc.

PR:		201749
MFC after:	1 week
2015-08-04 00:11:39 +00:00
Luiz Otavio O Souza
9224217213 Remove the mtx_sleep() from the kqueue f_event filter.
The filter is called from the network hot path and must not sleep.

The filter runs with the descriptor lock held and does not manipulate the
buffers, so it is not necessary to sleep when the hold buffer is in use.

Just ignore the hold buffer contents when it is being copied to user space
(when hold buffer in use is set).

This fixes the "Sleeping thread owns a non-sleepable lock" panic when the
userland thread is too busy reading the packets from bpf(4).

PR:		200323
MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-08-03 22:14:45 +00:00
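The approach in the bpf(4) kqueue commit above (9224217213), ignoring the hold buffer while it is being copied out instead of sleeping, can be modelled as follows. This is a simplified sketch: struct bpf_model, its fields and readable_bytes() are hypothetical names and do not mirror the real descriptor layout.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a capture descriptor's buffers. */
struct bpf_model {
	size_t store_len;	/* bytes in the store buffer */
	size_t hold_len;	/* bytes in the hold buffer */
	bool hold_in_use;	/* hold buffer is being copied to user space */
};

/*
 * Number of bytes a filter would report as readable, computed without
 * sleeping: the hold buffer is simply not counted while it is in use.
 */
size_t
readable_bytes(const struct bpf_model *d)
{
	size_t n = d->store_len;

	if (!d->hold_in_use)
		n += d->hold_len;
	return (n);
}
```

Because the function never waits, it is safe to call from a path that must not sleep, at the cost of briefly under-reporting the available data.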
Ed Schouten
52942c1eae Add missing const keyword to function parameter.
The umtx_key_get() function does not dereference the address of the
userspace object. The pointer can safely be const.
2015-08-03 21:11:33 +00:00
John Baldwin
92de34df2c kgdb uses td_oncpu to determine if a thread is running and should use
a pcb from stoppcbs[] rather than the thread's PCB.  However, exited threads
retained td_oncpu from the last time they ran, and newborn threads had their
CPU fields cleared to zero during fork and thread creation since they are
in the set of fields zeroed when threads are set up.  To fix, explicitly
update the CPU fields for exiting threads in sched_throw() to reflect the
switch out and reset the CPU fields for new threads in sched_fork_thread()
to NOCPU.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3193
2015-08-03 20:43:36 +00:00
Alan Cox
d8015db3b5 Refinements to r281079's sequential access optimization: Prefetched pages,
which constitute the majority of the pages that are processed by
vm_fault_dontneed(), are already near the tail of the inactive queue.  Only
the pages at faulting virtual addresses are actually moved by
vm_page_advise(..., MADV_DONTNEED).  However, vm_page_advise(...,
MADV_DONTNEED) is simultaneously too aggressive and too passive for the moved
pages.  It makes most of these pages too easily reclaimable, and at the same
time it leaves enough pages in the active queue to trigger pageouts by the
page daemon.  Instead, with this change, the pages at faulting virtual
addresses are moved to the tail of the inactive queue, where they are
relatively close to the pages prefetched by the same page fault.

Discussed with:	jeff
Sponsored by:	EMC / Isilon Storage Division
2015-08-03 20:30:27 +00:00
Luiz Otavio O Souza
98fa5d858c Add a KASSERT() to make sure we won't rotate the buffers twice (rotate the
buffers while the hold buffer is in use).

Suggested by:	ed, ghelmer
MFC with:	r286142
2015-08-03 18:22:31 +00:00
Mark Johnston
8f980c016b The mbuf parameter to ip_output_pfil() must be an output parameter since
pfil(9) hooks may modify the chain.

X-MFC-With:	r286028
2015-08-03 17:47:02 +00:00
Mark Johnston
1c9a705223 Remove a couple of unused fields from the FBT probe struct.
2015-08-03 17:39:36 +00:00
Sean Bruno
e5cb6169d3 A misplaced #endif in ixgbe_ioctl() causes interface MTU to become
zero when INET and INET6 are undefined.

PR:		162028
Differential Revision:	https://reviews.freebsd.org/D3187
Submitted by:	hoomanfazaeli@gmail.com pluknet
Reviewed by:	erj hiren gelbius
MFC after:	2 weeks
2015-08-03 16:39:25 +00:00
Edward Tomasz Napierala
d6cc35b287 Fix panic that would happen on forcibly unmounting devfs (note that
as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified
to trigger the panic) with consumers still opened.

Note that this still results in a leak of r/w/e counters.  It seems
to be harmless, though.  If anyone knows a better way to approach
this - please tell.

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3050
2015-08-03 16:35:18 +00:00
Edward Tomasz Napierala
8d9ed17366 Fix a problem which made loader(8) load non-kld files twice.
For example, without this patch, the following three lines
in /boot/loader.conf would result in /boot/root.img being preloaded
twice, and two md(4) devices - md0 and md1 - being created.

initmd_load="YES"
initmd_type="md_image"
initmd_name="/boot/root.img"

Reviewed by:	marcel@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3204
2015-08-03 16:27:36 +00:00
Zbigniew Bodek
4cbca60875 Add missing exception number to EL0 sync. abort on ARM64
When doing a data abort from userland it is possible to get
more than one data abort inside the same exception level.
Add an appropriate exception number to allow nesting of
data_abort handler for EL0.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3276
2015-08-03 14:58:46 +00:00
Warner Losh
75333e6435 Add pmspvc device back to GENERIC. The issues with the device playing
grabby hands with other drivers' devices have been solved.

MFC After: 3 weeks
2015-08-03 13:49:46 +00:00
Ed Schouten
ee95773383 Let CloudABI use the SV_CAPSICUM flag.
CloudABI processes will now start up in capabilities mode.

Reviewed by:	kib
2015-08-03 13:42:52 +00:00
Ed Schouten
39f5ebb774 Add sysent flag to switch to capabilities mode on startup.
CloudABI processes should run in capabilities mode automatically. There
is no need to switch manually (e.g., by calling cap_enter()). Add a
flag, SV_CAPSICUM, that can be used to call into cap_enter() during
execve().

Reviewed by:	kib
2015-08-03 13:41:47 +00:00
Konstantin Belousov
f94cc23475 Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUID
reported, on APs.  We already did this on BSP.

Otherwise, userspace software which depends on the features
reported by the high CPUID levels misbehaves.  In particular, AVX
detection is non-functional, depending on which CPU thread happens to
execute when doing CPUID.  Another victim is the libthr signal
handlers interposer, which needs to save full FPU extended state.

Reported and tested by:	Andre Meiser <ortadur@web.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-03 12:14:42 +00:00
Julien Charbon
ff9b006d61 Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

This allows the use of the TCP INP_INFO_RLOCK lock in critical paths (e.g.
tcp_input()) and INP_INFO_WLOCK only in occasional operations that walk all
connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
Edward Tomasz Napierala
e553ca4994 Rework the way iSCSI initiator handles system shutdown. This fixes
hangs on shutdown with LUNs with mounted filesystems over a disconnected
iSCSI session.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3052
2015-08-03 11:57:11 +00:00
Andrew Turner
f692e32555 Pass the pcb in which to store the VFP state to vfp_save_state. This
fixes a bug in savectx, which uses it to store the current state but passed
in a pcb when vfp_save_state expected a thread pointer.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-08-03 11:05:02 +00:00
Steven Hartland
ebbc56ecd6 Fix KSTACK_PAGES check in ZFS module
The check introduced by r285946 failed to add the dependency on
opt_kstack_pages.h which meant the default value for the platform instead
of the customised options KSTACK_PAGES=X was being tested.

Also wrap in #ifdef __FreeBSD__ for portability.

MFC after:	3 days
Sponsored by:	Multiplay
2015-08-03 09:34:09 +00:00
Ed Schouten
75c9f22394 Set p_osrel to __FreeBSD_version on process startup.
Certain system calls have quirks applied to make them work as if called
on an older version of FreeBSD. As CloudABI executables don't have the
FreeBSD OS release number in the ELF header, this value is set to zero,
making the system calls fall back to typically historic, non-standard
behaviour.

Reviewed by:	kib
2015-08-03 07:29:57 +00:00
Oleksandr Tymoshenko
7b25d1d63b Pass correct type of argument to ti_gpio_unmask_irq in ti_gpio_activate_resource
2015-08-03 01:22:49 +00:00
John-Mark Gurney
bba6880eab looks like all archs either have clang or cdefs included before..
drop this include as unnecessary..

Requested by:	bde
2015-08-02 21:33:40 +00:00
Warner Losh
123049cf36 Don't forget to check the vendor when probing. Also, there's no need
to double check whether the card has probed before. In fact, there's no
reason to single check either. Simplify the code as a result.
$FreeBSD$ added to lxutil.c in a non-standard way to help keep the
diffs with upstream to a minimum.

Differential Revision: https://reviews.freebsd.org/D3263
2015-08-02 16:26:41 +00:00
Michael Tuexen
e7e71dd7f3 Don't take the port numbers for packets containing ABORT chunks from
a freed mbuf. Just use them from the stcb.

MFC after: 3 days
2015-08-02 16:07:30 +00:00
Andrey V. Elsukov
51a01baf23 Properly handle IPV6_NEXTHOP socket option in selectroute().
 o remove disabled code;
 o if nexthop address is link-local, use embedded scope zone id to
   determine outgoing interface;
 o properly fill ro_dst before doing route lookup;
 o remove LLE lookup, instead check rt_flags for RTF_GATEWAY bit.

Sponsored by:	Yandex LLC
2015-08-02 12:40:56 +00:00
Andrey V. Elsukov
a6f7dea1fe Remove redundant check. 2015-08-02 11:58:24 +00:00
John-Mark Gurney
70e47040b0 convert to C11's _Static_assert, and pull in sys/cdefs.h for
compatibility w/ older non-C11 compilers...

passed make tinderbox..

Suggested by:	imp
2015-08-02 00:15:52 +00:00
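A minimal sketch of the kind of compile-time check C11's _Static_assert
allows. This is not the code from the commit: struct wire_hdr and
wire_hdr_size() are hypothetical names chosen for illustration, and the
sketch assumes a C11 compiler, so it leaves out the sys/cdefs.h
compatibility include that the message mentions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-wire header, used only for illustration. */
struct wire_hdr {
	uint32_t	magic;
	uint16_t	version;
	uint16_t	flags;
};

/* C11 compile-time checks: the build fails if the layout ever changes. */
_Static_assert(sizeof(struct wire_hdr) == 8, "wire_hdr must be 8 bytes");
_Static_assert(offsetof(struct wire_hdr, flags) == 6,
    "flags must be at offset 6");

/* Returns the header size; exists only so the sketch has something to call. */
static size_t
wire_hdr_size(void)
{
	return (sizeof(struct wire_hdr));
}
```

Unlike a runtime assertion, a failed check here stops the build and costs
nothing in the compiled object.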
Mark Johnston
48fcd357c4 Avoid dereferencing curthread->td_proc->p_cred in DTrace probe context.
When a process is exiting, there is a narrow window where p_cred may be
NULL while its threads are still executing. Specifically, the last thread
to exit a process sets the process state to PRS_ZOMBIE with the proc
spinlock held and then calls thread_exit(). thread_exit() drops the spin
lock, permitting the process to be reaped and thus causing its cred struct
to be released. However, the exiting thread may still cause DTrace probes
to fire by calling sched_throw(), resulting in a double fault if such a
probe enabling attempts to access the GID or UID DIF variables.

The thread's cred reference is not susceptible to this race since it is not
released until after the thread has exited.

MFC after:	1 week
2015-08-02 00:11:56 +00:00
Mark Johnston
ce1c953ee0 Don't modify curthread->td_locks unless INVARIANTS is enabled.
This field is only used in a KASSERT that verifies that no locks are held
when returning to user mode. Moreover, the td_locks accounting is only
correct when LOCK_DEBUG > 0, which is implied by INVARIANTS.

Reviewed by:	jhb
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3205
2015-08-02 00:03:08 +00:00
Oleksandr Tymoshenko
c3321180ec Set output pin initial value based on pin's pinmux pullup/pulldown setup
Some of the FDT blobs for AM335x-based devices use the pinmux pullup/pulldown
flag to set up the initial GPIO output value, e.g. 4DCAPE-43 sets the LCD DATAEN
signal this way. It works for Linux because the Linux driver does not enforce
pin direction until after it's requested by a consumer. So an input with the
pullup flag set acts as an output with the GPIO_HIGH value.

Reviewed by:	loos
2015-08-01 23:10:36 +00:00
Hans Petter Selasky
577c341353 Free mbufs when busdma loading fails.
Reviewed by:	erj, sbruno
MFC after:	1 month
2015-08-01 20:40:37 +00:00
John Baldwin
98685dc8af Clear P_TRACED before reparenting a detached process back to its
original parent. Otherwise the debuggee will be set as an orphan of
the debugger.

Add tests for tracing forks via PT_FOLLOW_FORK.

Reviewed by:	kib
MFC after:	2 weeks
Differential Revision:	https://reviews.freebsd.org/D2809
2015-08-01 16:27:52 +00:00
Ed Schouten
f52c3dd415 Allow CloudABI processes to create shared memory objects.
Summary:
Use the newly created `kern_shm_open()` function to create objects with
just the rights that are actually needed.

Reviewers: jhb, kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3260
2015-08-01 07:51:48 +00:00
Ed Schouten
7ee1b208c3 Add kern_shm_open().
This allows you to specify the capabilities that the new file descriptor
should have. This allows us to create shared memory objects that only
have the rights we're interested in.

The idea behind restricting the rights is that it makes it a lot easier
for CloudABI to get consistent behaviour across different operating
systems. We only need to make sure that a shared memory implementation
consistently implements the operations that are whitelisted.

Approved by:	kib
Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-01 07:21:14 +00:00
Luiz Otavio O Souza
f87e372ef2 Remove two unnecessary sleeps from the hot path in bpf(4).
The first one never triggers because bpf_canfreebuf() can only be true for
zero-copy buffers and zero-copy buffers are not read with read(2).

The second also never triggers, because we check the free buffer before
calling ROTATE_BUFFERS().  If the hold buffer is in use the free buffer
will be NULL and there is nothing else to do besides drop the packet.  If
the free buffer isn't NULL the hold buffer _is_ free and it is safe to
rotate the buffers.

Update the comment in ROTATE_BUFFERS macro to match the logic described
here.

While here fix a few typos in comments.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 21:43:27 +00:00
Luiz Otavio O Souza
faa693cdbe Remove the sleep from the buffer allocation routine.
The buffer must be allocated (or even changed) before the interface is set
and thus, there is no need to verify if the buffer is in use.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 20:25:54 +00:00
Luiz Otavio O Souza
4f42daa4a3 Do not allocate the buffers at opening of the descriptor, because once
the buffer is allocated we are committed to a particular buffer method
(BPF_BUFMODE_BUFFER in this case).

If we are using zero-copy buffers, the userland program must register its
buffers before setting the interface.

If we are using kernel memory buffers, we can allocate the buffer at the
time that the interface is being set.

This fix allows the usage of BIOCSETBUFMODE after r235746.

Update the comments to reflect the recent changes.

MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-07-31 20:02:12 +00:00
Andrew Turner
9e2529043b Try to put the CPU into a low power state if we failed to otherwise halt
the system.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 15:54:34 +00:00
Andrew Turner
176739d3f5 Load the stack in stack_save and stack_save_td. This uses the generalised
unwind_frame function to read each stack frame until either the pc or stack
are no longer within the kernel's address space.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 15:32:32 +00:00
Glen Barber
45e1c1a38d Pull pmspcv (pms(4)) from GENERIC. It has PCI ID conflicts
with ahd(4), mvs(4), and likely other drivers.

MFC after:	immediately
With hat:	re
Sponsored by:	The FreeBSD Foundation
2015-07-31 15:23:48 +00:00
Andrew Turner
36baf858c9 Add support for uma_small_alloc and uma_small_free, and make use of these.
This is copied from the amd64 version with minor changes. These should be
merged into a single file as from a quick look there are other copies of
the same file in other parts of the tree.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 14:17:26 +00:00
Andrew Turner
872df66596 Add memrw. This has had minimal testing, and will likely panic the kernel
when trying to read data from outside the DMAP region. I expect this panic
to be from within uiomove_fromphys, which needs to grow support for
such addresses.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 13:39:51 +00:00
Andrew Turner
9b8c3c4f0b Add more atomic_swap_* functions.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 13:34:43 +00:00
Andrew Turner
71d72ea14f Add VIRT_IN_DMAP to check if a virtual address is from the DMAP range.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-31 13:32:25 +00:00
Ed Schouten
6236e71bfe Fix accidental line wrapping introduced in r286122. 2015-07-31 10:46:45 +00:00
Ed Schouten
367a13f905 Limit rights on process descriptors.
On CloudABI, the rights bits returned by cap_rights_get() match up with
the operations that you can actually perform on the file descriptor.

Limiting the rights is good, because it makes it easier to get uniform
behaviour across different operating systems. If process descriptors on
FreeBSD would suddenly gain support for any new file operation, this
wouldn't become exposed to CloudABI processes without first extending
the rights.

Extend fork1() to gain a 'struct filecaps' argument that allows you to
construct process descriptors with custom rights. Use this in
cloudabi_sys_proc_fork() to limit the rights to just fstat() and
pdwait().

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-31 10:21:58 +00:00
Zbigniew Bodek
a2b3dfad08 Apply erratum for mrs ICC_IAR1_EL1 speculative execution on ThunderX
ERRATUM:     22978, 23154
PASS (rev.): 1.0/1.1

Reviewed by:   imp
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3184
2015-07-31 10:00:45 +00:00
Hans Petter Selasky
56d6361d92 Limit the number of times we loop inside the DWC OTG poll handler to
avoid starving other fast interrupts. Fix a comment while at it.

MFC after:	1 week
Suggested by:	Svatopluk Kraus <onwahe@gmail.com>
2015-07-31 09:12:31 +00:00
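A minimal sketch of bounding the work done per poll invocation, as the
message above describes. It is not the DWC OTG driver code: POLL_LOOP_MAX
and poll_bounded() are hypothetical, and the real limit is not stated in
the message.

```c
/* Hypothetical cap on iterations per invocation; the real value is not given. */
#define	POLL_LOOP_MAX	16

/*
 * Service pending events, but give up after POLL_LOOP_MAX rounds so that
 * one busy device cannot monopolise the CPU.  Returns the number of
 * events still pending, which a later invocation would pick up.
 */
static int
poll_bounded(int pending)
{
	int rounds;

	for (rounds = 0; pending > 0 && rounds < POLL_LOOP_MAX; rounds++)
		pending--;	/* stand-in for handling one event */
	return (pending);
}
```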
Andrey V. Elsukov
926381e108 Ansify if_stf.c 2015-07-31 09:04:22 +00:00
Andrey V. Elsukov
cf14ccb0f7 Remove unneeded #include "opt_inet.h". 2015-07-31 09:02:28 +00:00
Ed Schouten
42642ecd97 Document the existence of cloudabi_load and cloudabi64_load. 2015-07-31 08:45:35 +00:00
John-Mark Gurney
af024d3b23 temporarily fix build.. This isn't the final fix, and testing is
still on going, but it has passed world for mips and powerpc...

I know this has an extra semicolon, but this is the patch that is
tested...

Looks like better fix is to use _Static_assert...
2015-07-31 07:48:08 +00:00
Navdeep Parhar
3d3169c858 cxgbe(4): initialize debug_flags from the kernel environment.
MFC after:	3 days
2015-07-31 04:50:47 +00:00
Konstantin Belousov
8917728875 vn_io_fault() handling of the LOR for i/o into the file-backed buffers
has observable overhead when the buffer pages are not resident or not
mapped.  The overhead comes at least from two factors, one is the
additional work needed to detect the situation, prepare and execute
the rollbacks.  Another is the consequence of the i/o splitting into
the batches of the held pages, causing filesystems to see a series of
smaller i/o requests instead of a single large request.

Note that expected case of the resident i/o buffer does not expose
these issues.  Provide a prefaulting for the userspace i/o buffers,
disabled by default.  I am careful not to enable prefaulting by
default for now, since it would be detrimental for the applications
which speculatively pass extra-large buffers of anonymous memory to
not deal with buffer sizing (if such apps exist).

Found and tested by:	bde, emaste
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-31 04:12:51 +00:00
John-Mark Gurney
42e5fcbf2b these are comparing authenticators and need to be constant time...
This could be a side channel attack...  Now that we have a function
for this, use it...

jmgurney/ipsecgcm:	24d704cc and 7f37a14
2015-07-31 00:31:52 +00:00
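A minimal sketch of a constant-time comparison of the kind the message
above calls for. The message does not name the function it switches to,
so ct_compare() here is a hypothetical stand-in, not the kernel routine.

```c
#include <stddef.h>

/*
 * Hypothetical helper: returns 0 when the two buffers are equal and
 * nonzero otherwise.  Every byte is always examined, so the running
 * time does not depend on where the first mismatch occurs.
 */
static int
ct_compare(const void *a, const void *b, size_t len)
{
	const volatile unsigned char *pa = a;
	const volatile unsigned char *pb = b;
	unsigned char diff = 0;
	size_t i;

	/* Accumulate differences instead of returning at the first one. */
	for (i = 0; i < len; i++)
		diff |= pa[i] ^ pb[i];
	return (diff != 0);
}
```

A plain memcmp()-style loop may return as soon as it finds a differing
byte, which can let an attacker learn how many leading bytes of a forged
authenticator were correct.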
John-Mark Gurney
817c7ed900 Clean up this header file...
use CTASSERTs now that we have them...

Replace a draft w/ RFC that's over 10 years old.

Note that _AALG and _EALG do not need to match what the IKE daemons
think they should be..  This is part of the KABI...  I decided to
renumber AESCTR, but since we've never had working AESCTR mode, I'm
not really breaking anything..  and it shortens a loop by quite
a bit..

remove SKIPJACK IPsec support...  SKIPJACK never made it out of draft
(in 1999), only has 80bit key, NIST recommended it stop being used
after 2010, and setkey nor any of the IKE daemons I checked supported
it...

jmgurney/ipsecgcm: a357a33, c75808b, e008669, b27b6d6

Reviewed by:	gnn (earlier version)
2015-07-31 00:23:21 +00:00
Ermal Luçi
59959de526 Correct IPsec SA statistic keeping
The IPsec SA statistic keeping is used even for decision making on expiry/rekeying SAs.
When there are multiple transformations being done the statistic keeping might be wrong.

This mostly impacts multiple encapsulations on IPsec; in the usual scenario it is not noticed because the affected code path is not taken.

Differential Revision:	https://reviews.freebsd.org/D3239
Reviewed by:		ae, gnn
Approved by:		gnn(mentor)
2015-07-30 20:56:27 +00:00
Mateusz Guzik
4ae1e3c752 Revert r285125 until rmlocks get fixed.
Right now there is a chance that sysctl unregister will cause reader to
block on the sx lock associated with sysctl rmlock, in which case kernels
with debug enabled will panic.
2015-07-30 19:52:43 +00:00
Hiren Panchasara
03041aaac8 Update snd_una description to make it more readable.
Differential Revision:	https://reviews.freebsd.org/D3179
Reviewed by:		gnn
Sponsored by:		Limelight Networks
2015-07-30 19:24:49 +00:00
Oleksandr Tymoshenko
40ebf1626f Add GPIO backlight driver compatible with Linux FDT bindings.
Brightness is controlled through sysctl dev.gpiobacklight.X.brightness:
  - any value greater than 0: backlight is on
  - any value less than or equal to 0: backlight is off

FDT bindings docs in Linux tree:
    Documentation/devicetree/bindings/video/backlight/gpio-backlight.txt
2015-07-30 19:04:14 +00:00
Craig Rodrigues
bc81f73771 Get function prototypes for msg, shm, sem functions
from header files.

Differential Revision: D2669
2015-07-30 18:59:01 +00:00
Mark Johnston
e2e45da0e8 ib mad: fix an incorrect use of list_for_each_entry
In tf_dequeue(), if we reach the end of the list without finding a
non-cancelled element, "tmp" will be a pointer into the list head, so the
tmp->canceled check is bogus. Use a flag instead.

Submitted by:	Tao Liu <Tao.Liu@isilon.com>
Reviewed by:	hselasky
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
Differential Revision: https://reviews.freebsd.org/D3244
2015-07-30 18:28:37 +00:00
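A minimal sketch of the "use a flag" pattern described above. It uses
small stand-ins for the Linux-style list primitives rather than the real
headers; struct item, item_of(), list_add_tail_sketch() and
first_active() are hypothetical names, not the tf_dequeue() code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal stand-ins for the Linux-style list primitives (illustrative only). */
struct list_head {
	struct list_head *next, *prev;
};

struct item {
	int			canceled;
	struct list_head	link;
};

#define	item_of(p)	\
	((struct item *)((char *)(p) - offsetof(struct item, link)))

static void
list_add_tail_sketch(struct list_head *n, struct list_head *head)
{
	n->prev = head->prev;
	n->next = head;
	head->prev->next = n;
	head->prev = n;
}

/*
 * Return the first non-canceled item, or NULL if there is none.  The
 * "found" flag records whether the loop exited early; the cursor is never
 * inspected after the loop, because at that point it would be computed
 * from the list head rather than from a real item.
 */
static struct item *
first_active(struct list_head *head)
{
	struct list_head *p;
	struct item *it = NULL;
	bool found = false;

	for (p = head->next; p != head; p = p->next) {
		it = item_of(p);
		if (!it->canceled) {
			found = true;
			break;
		}
	}
	return (found ? it : NULL);
}
```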
Konstantin Belousov
6a875bf929 Do not pretend that vm_fault(9) supports unwiring the address. Rename
the VM_FAULT_CHANGE_WIRING flag to VM_FAULT_WIRE.  Assert that the
flag is only passed when faulting on the wired map entry.  Remove the
vm_page_unwire() call, which should be never reachable.

Since VM_FAULT_WIRE flag implies wired map entry, the TRYPAGER() macro
is reduced to the testing of the fs.object having a default pager.
Inline the check.

Suggested and reviewed by:	alc
Tested by:	pho (previous version)
MFC after:	1 week
2015-07-30 18:28:34 +00:00
Andrew Turner
8df0053b7a Add enough of pmap_page_set_memattr to run gstat. It still needs to split
the DMAP 1G pages so we set the attributes only on the specified page.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-30 16:17:44 +00:00
Konstantin Belousov
0b6476ec5b Improve comments.
Submitted by:	bde
MFC after:	2 weeks
2015-07-30 15:47:53 +00:00
Roger Pau Monné
c023d8234b vfs: fix fallout from r286076
The right operator is >= not =>.

Reported by: cem
2015-07-30 15:43:26 +00:00
Roger Pau Monné
8f89a299e2 vfs: fix off-by-one error in vfs_buf_check_mapped
The check added in r285872 can trigger for valid buffers if the buffer space
used happens to be just after unmapped_buf in KVA space.

Discussed with: kib
Sponsored by: Citrix Systems R&D
2015-07-30 15:28:06 +00:00
Ed Maste
c547d650eb Add ARM64TODO markers to unimplemented functionality
Reviewed by:	andrew
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D2389
2015-07-30 14:20:36 +00:00
Zbigniew Bodek
9028b18f75 Enable IRQ during syscalls on ARM64
FreeBSD provides a feature called Adaptive Mutexes, which allows
a thread to spin for a while when the mutex is taken instead of
immediately going to sleep. This causes issues when called from
syscall handler if interrupts are masked. If every other core
also attempts to access the same mutex there is a chance that
all of them are spinning on the same lock at the same time.
If interrupts are disabled, no kernel preemption can occur and
the system becomes unresponsive.

This patch enables interrupts when syscall is being executed
and masks them as soon as it is completed.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3246
2015-07-30 13:59:38 +00:00
Zbigniew Bodek
4d3523c2f7 Remove obsolete vendor code from Alpine platform support
This is a clean-up patch from a series delivering support for
Annapurna Labs Alpine PoC.
The HAL files have already been added to sys/contrib/alpine-hal
so there is no need for them in the platform directory.
This patch removes obsolete files.

Reviewed by:    andrew
Obtained from:  Semihalf
Sponsored by:   Annapurna Labs
Differential Revision: https://reviews.freebsd.org/D3248
2015-07-30 13:45:34 +00:00
Andrey V. Elsukov
a5965d1513 Build if_stf(4) module only when both INET and INET6 support are enabled. 2015-07-30 10:26:43 +00:00
Colin Percival
aaebf69062 Add support for Xen blkif indirect segment I/Os. This makes it possible for
the blkfront driver to perform I/Os of up to 2 MB, subject to support from
the blkback to which it is connected and the initiation of such large I/Os
by the rest of the kernel.  In practice, the I/O size is increased from 40 kB
to 128 kB.

The changes to xen/interface/io/blkif.h consist merely of merging updates
from the upstream Xen repository.

In dev/xen/blkfront/block.h we add some convenience macros and structure
fields used for indirect-page I/Os: The device records its negotiated limit
on the number of indirect pages used, while each I/O command structure gains
permanently allocated page(s) for indirect page references and the Xen grant
references for those pages.

In dev/xen/blkfront/blkfront.c we now check in xbd_queue_cb whether a request
is small enough to handle without an indirection page, and either follow the
previous behaviour or use new code for issuing an indirect segment I/O.  In
xbd_connect we read the size of indirect segment I/Os supported by the backend
and select the maximum size we will use; then allocate the pages and Xen grant
references for each I/O command structure.  In xbd_free those grants and pages
are released.

A new loader tunable, hw.xbd.xbd_enable_indirect, can be set to 0 in order to
disable this functionality; it works by pretending that the backend does not
support this feature.  Some backends exhibit a loss of performance with large
I/Os, so users may wish to test with and without this functionality enabled.

Reviewed by:	royger
MFC after:	3 days
Relnotes:	yes
2015-07-30 03:50:01 +00:00
Luiz Otavio O Souza
8b15f615e0 Follow r256586 and rename the kernel version of the Free() macro to
R_Free().  This matches the other macros and reduces the chances to clash
with other headers.

This also fixes the build of radix.c outside of the kernel environment.

Reviewed by:	glebius
2015-07-30 02:09:03 +00:00
Konstantin Belousov
48cae112b5 Use private cache line for the locked nop in *mb() on i386.
Suggested by:	alc
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-30 00:13:20 +00:00
Konstantin Belousov
dd5b64258f MFamd64 r285934: Remove store/load (= full) barrier from the i386
atomic_load_acq_*().

Noted by:	alc (long time ago)
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-29 23:59:17 +00:00
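A minimal sketch of what acquire semantics do and do not promise. It uses
C11 <stdatomic.h> rather than the kernel's atomic_load_acq_*() functions,
and payload, ready, publish() and try_consume() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;		/* data published by the writer */
static atomic_bool ready;	/* flag guarding the payload */

/* Writer: store the data, then publish it with release semantics. */
static void
publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, true, memory_order_release);
}

/*
 * Reader: an acquire load orders the later read of payload after the
 * load of the flag.  It does not promise the store/load ordering that
 * a full barrier provides.
 */
static bool
try_consume(int *out)
{
	if (!atomic_load_explicit(&ready, memory_order_acquire))
		return (false);
	*out = payload;
	return (true);
}
```

Code that relied on the old, stronger behaviour would need an explicit
full fence (in C11 terms, sequentially consistent ordering) instead of a
plain acquire load.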
John-Mark Gurney
e381fd293d const'ify an arg that we don't update... 2015-07-29 23:37:15 +00:00
Rick Macklem
25f37276e5 This patch fixes a problem where, if the NFSv4 server has a previous
unconfirmed clientid structure for the same client on the last hash list,
this old entry would not be removed/deleted. I do not think this bug would have
caused serious problems, since the new entry would have been before the old one
on the list. This old entry would have eventually been scavenged/removed.
Detected while reading the code looking for another bug.

MFC after:	3 days
2015-07-29 23:06:30 +00:00
Jim Harris
0e1fd2dda3 nvme: do not notify a consumer about failures that occur during initialization
MFC after:	3 days
Sponsored by:	Intel
2015-07-29 21:29:50 +00:00
Sean Bruno
e0fe6b4835 Add support for BCM5466 PHY
Differential Revision:	D3232
Submitted by:	kevin.bowling@kev009.com
2015-07-29 20:50:48 +00:00
Sean Bruno
79855a57e2 Remove dead functions pmap_pvdump and pads.
Differential Revision:	D3206
Submitted by:	kevin.bowling@kev009.com
Reviewed by:	alc
2015-07-29 20:47:27 +00:00
Ermal Luçi
3c40232395 Avoid double reference decrement when firewalls force relooping of packets
When firewalls force a reloop of packets and the caller supplied a route the reference to the route might be reduced twice creating issues.
This is especially the scenario when a packet is looped because of operation in the firewall but the new route lookup gives a down route.

Differential Revision:	https://reviews.freebsd.org/D3037
Reviewed by:	gnn
Approved by:	gnn(mentor)
2015-07-29 20:10:36 +00:00
Ermal Luçi
d9f2a78249 ip_output normalization and fixes
ip_output has a big chunk of code used to handle special cases with pfil consumers which also forces a reloop on it.
Gather all this code together to make it readable and properly handle the reloop cases.

Some of the issues identified:

M_IP_NEXTHOP is not handled properly in existing code.
route reference leaking is possible with a FIB number change
route flags checking is not consistent in the function

Differential Revision:	https://reviews.freebsd.org/D3022
Reviewed by:	gnn
Approved by:	gnn(mentor)
MFC after:	4 weeks
2015-07-29 18:04:01 +00:00
Patrick Kelsey
4741bfcb57 Revert r265338, r271089 and r271123 as those changes do not handle
non-inline urgent data and introduce an mbuf exhaustion attack vector
similar to FreeBSD-SA-15:15.tcp, but not requiring VNETs.

Address the issue described in FreeBSD-SA-15:15.tcp.

Reviewed by:	glebius
Approved by:	so
Approved by:	jmallett (mentor)
Security:	FreeBSD-SA-15:15.tcp
Sponsored by:	Norse Corp, Inc.
2015-07-29 17:59:13 +00:00
Ed Schouten
8328babdd0 Make pipes in CloudABI work.
Summary:
Pipes in CloudABI are unidirectional. The reason for this is that
CloudABI attempts to provide a uniform runtime environment across
different flavours of UNIX.

Instead of implementing a custom pipe that is unidirectional, we can
simply reuse Capsicum permission bits to support this. This is nice,
because CloudABI already attempts to restrict permission bits to
correspond with the operations that apply to a certain file descriptor.

Replace kern_pipe() and kern_pipe2() by a single kern_pipe() that takes
a pair of filecaps. These filecaps are passed to the newly introduced
falloc_caps() function that creates the descriptors with rights in
place.

Test Plan:
CloudABI pipes seem to be created with proper rights in place:

https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/unistd/pipe_test.c#L44

Reviewers: jilles, mjg

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3236
2015-07-29 17:18:27 +00:00
Ed Schouten
e555b4309c Introduce falloc_caps() to create descriptors with capabilities in place.
falloc_noinstall() followed by finstall() allows you to create and
install file descriptors with custom capabilities. Add falloc_caps()
that can do both of these actions in one go.

This will be used by CloudABI to create pipes with custom capabilities.

Reviewed by:	mjg
2015-07-29 17:16:53 +00:00
Sean Bruno
1f6aae90ad Make Broadcom XLR use shared ds1374 RTC driver.
Remove its identical and redundant ds1374u version.

Differential Revision:	D3225
Submitted by:	kevin.bowling@kev009.com
2015-07-29 15:32:59 +00:00
Andrey V. Elsukov
10a0e0bf0a Eliminate the use of m_copydata() in gif_encapcheck().
ip_encap already has inspected mbuf's data, at least an IP header.
And it is safe to use mtod() and do direct access to needed fields.
Add M_ASSERTPKTHDR() to gif_encapcheck(), since the code expects that
mbuf has a packet header.
Move the code from gif_validate[46] into in[6]_gif_encapcheck(), also
remove "martian filters" checks. According to RFC 4213 it is enough to
verify that the source address is the address of the encapsulator, as
configured on the decapsulator.

Reviewed by:	melifaro
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 14:07:43 +00:00
Ed Schouten
9d2332c9ee Split up Capsicum to CloudABI rights conversion into two separate routines.
CloudABI's openat() ensures that files are opened with the smallest set
of relevant rights. For example, when opening a FIFO, unrelated rights
like CAP_RECV are automatically removed. To remove unrelated rights, we
can just reuse the code for this that was already present in the rights
conversion function.
2015-07-29 12:42:45 +00:00
Zbigniew Bodek
cf89e8c919 Add quirk for ThunderX ITS device table size
Limit the number of supported device IDs to 0x100000
in order to decrease the size of the ITS device table so
that it matches with the HW capabilities.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3131
2015-07-29 11:22:19 +00:00
Andrey V. Elsukov
b13653baf9 Reduce overhead of ipfw's me6 opcode.
Skip checks for IPv6 multicast addresses.
Use in6_localip() for global unicast.
And for IPv6 link-local addresses do search in the IPv6 addresses list.
Since LLA are stored in the kernel internal form, use
IN6_ARE_MASKED_ADDR_EQUAL() macro with lla_mask for addresses comparison.
lla_mask has zero bits in the second word, where we keep sin6_scope_id.

Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
2015-07-29 10:53:42 +00:00
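A minimal sketch of a masked address comparison like the one described
above. It is not the kernel's IN6_ARE_MASKED_ADDR_EQUAL() macro: struct
addr6, masked_equal() and lla_mask_sketch are hypothetical, words are
kept in host order for readability, and "second word" is assumed to mean
the second 16-bit word of the address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 128-bit address viewed as eight 16-bit words. */
struct addr6 {
	uint16_t w[8];
};

/* Compare two addresses, ignoring every bit that is clear in the mask. */
static bool
masked_equal(const struct addr6 *a, const struct addr6 *b,
    const struct addr6 *mask)
{
	int i;

	for (i = 0; i < 8; i++)
		if (((a->w[i] ^ b->w[i]) & mask->w[i]) != 0)
			return (false);
	return (true);
}

/* Mask with the second word cleared, where the scope zone id is embedded. */
static const struct addr6 lla_mask_sketch = {
	{ 0xffff, 0x0000, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff, 0xffff }
};
```

With this mask, two link-local addresses that differ only in the embedded
zone id compare equal, while a difference anywhere else is still detected.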
Konstantin Belousov
6cebf7e2be Move bufshutdown() out of the #ifdef INVARIANTS block. 2015-07-29 09:57:34 +00:00
Andrey V. Elsukov
cc0a3c8ca4 Convert in_ifaddr_lock and in6_ifaddr_lock to rmlock.
Both are used to protect access to IP addresses lists and they can be
acquired for reading several times per packet. To reduce lock contention
it is better to use rmlock here.

Reviewed by:	gnn (previous version)
Obtained from:	Yandex LLC
Sponsored by:	Yandex LLC
Differential Revision:	https://reviews.freebsd.org/D3149
2015-07-29 08:12:05 +00:00
John-Mark Gurney
a09a7146a7 RFC4868 section 2.3 requires that the output be half... This fixes
problems that were introduced in r285336...  I have verified that
HMAC-SHA2-256 both ah only and w/ AES-CBC interoperate w/ a NetBSD
6.1.5 vm...

Reviewed by:	gnn
2015-07-29 07:15:16 +00:00
Kristof Provost
48c29b118e pf: Always initialise pf_fragment.fr_flags
When we allocate the struct pf_fragment in pf_fillup_fragment() we forgot to
initialise the fr_flags field. As a result we sometimes mistakenly thought the
fragment to not be a buffered fragment. This resulted in panics because we'd end
up freeing the pf_fragment but not removing it from V_pf_fragqueue (believing it
to be part of V_pf_cachequeue).
The next time we iterated V_pf_fragqueue we'd use a freed object and panic.

While here also fix a pf_fragment use after free in pf_normalize_ip().
pf_reassemble() frees the pf_fragment, so we can't use it any more.

PR:		201879, 201932
MFC after:	5 days
2015-07-29 06:35:36 +00:00
Ed Schouten
3720b82fa8 Implement CloudABI's readdir().
Summary:
CloudABI's readdir() system call could be thought of as a mixture
between FreeBSD's getdents(2) and pread(). Instead of using the file
descriptor offset, userspace provides a 64-bit cloudabi_dircookie_t to
continue reading at a given point. CLOUDABI_DIRCOOKIE_START, having
value 0, can be used to return entries at the start of the directory.

The file descriptor offset is not used to store the cookie for the
reason that in a file descriptor centric environment, it would make
sense to allow concurrent use of a single file descriptor.

The remaining space returned by the system call should be filled with a
partially truncated copy of the next entry. The advantage of doing this
is that it gracefully deals with long filenames. If the C library
provides a buffer that is too small to hold a single entry, it can still
extract the directory entry header, meaning that it can retry the read
with a larger buffer or skip it using the cookie.

Test Plan:
This implementation passes the cloudlibc unit tests at:

	https://github.com/NuxiNL/cloudlibc/tree/master/src/libc/dirent

Reviewers: marcel, kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3226
2015-07-29 06:31:44 +00:00
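A minimal sketch of the buffer-filling behaviour described above: reading
resumes at a caller-supplied cookie, and the tail of the buffer receives
a truncated copy of the next entry. The layout in struct ent_hdr and the
function fill_dir() are hypothetical; the real CloudABI entry format is
not given in the message, and here a cookie is simply an index into an
array of names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size entry header, followed by the name bytes. */
struct ent_hdr {
	uint64_t next;		/* cookie of the entry after this one */
	uint64_t namlen;	/* length of the name that follows */
};

/*
 * Copy entries starting at index `cookie` into buf.  When an entry does
 * not fit, copy as much of it as there is room for and stop, so the
 * buffer is filled completely whenever more entries remain.  Returns
 * the number of bytes written.
 */
static size_t
fill_dir(const char *const *names, size_t nnames, uint64_t cookie,
    unsigned char *buf, size_t buflen)
{
	size_t used = 0;

	while (cookie < nnames && used < buflen) {
		const char *name = names[cookie];
		struct ent_hdr h = { cookie + 1, strlen(name) };
		size_t n;

		/* Header, truncated if the buffer is nearly full. */
		n = buflen - used < sizeof(h) ? buflen - used : sizeof(h);
		memcpy(buf + used, &h, n);
		used += n;

		/* Name, truncated to whatever space remains. */
		n = buflen - used < h.namlen ? buflen - used : (size_t)h.namlen;
		memcpy(buf + used, name, n);
		used += n;
		cookie++;
	}
	return (used);
}
```

A caller that finds a complete header but an incomplete name at the end
of the buffer can read `next` from that header and either retry with a
larger buffer or skip the entry.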
Jeff Roberson
7d07bfd8a3 - Remove some dead code copied from ffs. 2015-07-29 03:06:08 +00:00
Jeff Roberson
98082691bb - Make 'struct buf *buf' private to vfs_bio.c. Having a global variable
   'buf' is inconvenient and has led me to some irritating-to-discover
   bugs over the years.  It also makes it more challenging to refactor
   the buf allocation system.
 - Move swbuf and declare it as an extern in vfs_bio.c.  This is still
   not perfect but better than it was before.
 - Eliminate the unused ffs function that relied on knowledge of the buf
   array.
 - Move the shutdown code that iterates over the buf array into vfs_bio.c.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-07-29 02:26:57 +00:00
Jean-Sébastien Pédron
133362912c drm/i915: Sort functions in i915_gem.c to match Linux 3.8's ordering
While here, reduce the style diff with Linux.

There is no functional change. The goal is to ease the future update to
Linux 3.8's i915 driver.

MFC after:	2 months
2015-07-28 21:47:37 +00:00
Jeff Roberson
38750ada8f - Eliminate the EMPTYKVA queue. It served as a cache of KVA allocations
   attached to bufs to avoid the overhead of the vm.  This purpose is now
   better served by vmem.  Freeing the kva immediately when a buf is
   destroyed leads to lower fragmentation and a much simpler scan algorithm.

Reviewed by:	kib
Sponsored by:	EMC / Isilon Storage Division
2015-07-28 20:24:09 +00:00
David C Somayajulu
ab97207add - Avoid lock contention in the if_transmit callback by using trylock and
enqueueing the frames when it fails. This way there is some latency
 removed from the transmitting path.
- If IFF_DRV_OACTIVE is set (and also if IFF_DRV_RUNNING is not) just
 enqueue the desired frames and return successful transmit. This way we
 avoid returning errors on the transmit side, which could result in
 out-of-order frames. Please note that IFF_DRV_OACTIVE is set
 every time we get the threshold ring hit, so this can be happening quite
 often.

Submitted by:	Attilio.Rao@isilon.com
MFC after:	5 days
2015-07-28 19:15:44 +00:00
Renato Botelho
299c819a75 Simplify logic added in r285945 as suggested by glebius
Approved by:	glebius
MFC after:	3 days
Sponsored by:	Netgate
2015-07-28 14:59:29 +00:00
Zbigniew Bodek
f4b37ed0f8 Import Annapurna Labs Alpine HAL to sys/contrib/
Import from vendor-sys/alpine-hal/2.7
SVN rev.: 285432
HAL version: 2.7

Obtained from:  Semihalf
Sponsored by:   Annapurna Labs
2015-07-28 14:20:33 +00:00
Zbigniew Bodek
8b21d6ae5a Limit ofw_cpu_early_foreach() to CPUs only
On some platforms, the /cpus node contains cpu-to-cluster
map which definitely is not a CPU node. Its presence was
causing incrementing of "id" variable and reporting more
CPUs available than it should.
To make "id" valid, increment it only when an entry really
is a CPU device.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3216
2015-07-28 13:16:08 +00:00
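A minimal sketch of the counting fix described above: the id advances
only for children that really are CPUs. struct node and count_cpus() are
hypothetical, and treating device_type == "cpu" as the test for a CPU
node is an assumption made for illustration, not taken from the commit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical device-tree child node; only device_type matters here. */
struct node {
	const char *device_type;	/* may be NULL, e.g. for a cpu-map node */
};

/* Count CPU nodes, advancing the id only for entries that really are CPUs. */
static int
count_cpus(const struct node *children, size_t n)
{
	size_t i;
	int id = 0;

	for (i = 0; i < n; i++) {
		if (children[i].device_type == NULL ||
		    strcmp(children[i].device_type, "cpu") != 0)
			continue;	/* not a CPU: leave id untouched */
		id++;
	}
	return (id);
}
```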
Ed Schouten
1d96fd8d9f Implement file attribute modification system calls for CloudABI.
CloudABI uses a system call interface to modify file attributes that is
more similar to KPI's/FUSE, namely where a stat structure is passed back
to the kernel, together with a bitmask of attributes that should be
changed. This would allow us to update any set of attributes atomically.

That said, I'd rather not go as far as to actually implement it that
way, as it would require us to duplicate more code than strictly needed.
Let's just stick to the combinations that are actually used by
cloudlibc.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-28 12:57:19 +00:00
Steven Hartland
1635369e99 Add warning about low KSTACK_PAGES for ZFS use
As ZFS requires more kernel stack pages than is the default on some
architectures e.g. i386, warn if KSTACK_PAGES is less than
ZFS_MIN_KSTACK_PAGES (which is 4 at the time of writing).

MFC after:	3 days
Sponsored by:	Multiplay
2015-07-28 11:19:38 +00:00
Renato Botelho
b1b98a2db7 Respect pf rule log option before logging dropped packets with IP options or
dangerous v6 headers

Reviewed by:	gnn, eri
Approved by:	gnn
Obtained from:	pfSense
MFC after:	3 days
Sponsored by:	Netgate
Differential Revision:	https://reviews.freebsd.org/D3222
2015-07-28 10:31:34 +00:00
Gleb Smirnoff
3e437fd2c6 Fix a typo in r280169. Of course we are interested in deleting nsn only
if we have just created it and we were the last reference.

Submitted by:	dhartmei
2015-07-28 09:36:26 +00:00
Michael Tuexen
9ae56375af Fix a typo reported by Erik Cederstrand.
MFC after:	1 week
2015-07-28 08:50:13 +00:00
Hans Petter Selasky
ed0ed9b424 Optimise the DWC OTG host mode driver's receive path:
Remove NAKing limit and pause IN and OUT transactions for 125us in
case of NAK response for BULK and CONTROL endpoints. This gets the
receive latency down and improves USB network throughput at the cost
of some CPU usage.

MFC after:	1 month
2015-07-28 07:30:07 +00:00
Konstantin Belousov
1d1ec02c44 Remove full barrier from the amd64 atomic_load_acq_*(). Strong
ordering semantic of x86 CPUs makes only the compiler barrier
necessary to give the acquire behaviour.

Existing implementation ensured sequentially consistent semantic for
load_acq, making much stronger guarantee than required by standard's
definition of the load acquire.  Consumers which depend on the barrier
are believed to be identified and already fixed to use proper
operations.

Noted by:	alc (long time ago)
Reviewed by:	alc, bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-28 07:04:51 +00:00
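The acquire semantics discussed in 1d1ec02c44 can be illustrated with portable C11 atomics rather than the kernel's atomic_load_acq_*() family. This is a sketch only: publish(), try_consume(), payload and ready are hypothetical names. On x86, compilers typically emit a plain load for the acquire load below, which matches the commit's point that only a compiler barrier is needed there.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;		/* ordinary data, written before "ready" */
static atomic_int ready;	/* flag that publishes the payload */

/* Writer: the release store orders the payload write before the flag. */
static void
publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Reader: the acquire load guarantees that, once the flag is seen,
 * the payload written before the matching release is visible too.
 * Returns 1 and fills *out on success, 0 if nothing was published.
 */
static int
try_consume(int *out)
{
	assert(out != NULL);
	if (atomic_load_explicit(&ready, memory_order_acquire) == 0)
		return (0);
	*out = payload;
	return (1);
}
```

Acquire gives no ordering against earlier stores by the same thread. Code that relied on the old, stronger behaviour would need a sequentially consistent operation or an explicit fence.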
Konstantin Belousov
b4c0214605 Remove useless acquire semantic from the atomic_add operation before
sosend().  The only release on the xp_snt_cnt is done after sosend(),
with an intent to synchronize with load_acq in svc_vc_ack().

Reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-28 06:58:10 +00:00
Konstantin Belousov
90a2db45eb Add bit names for the IA32_MISC_ENABLE msr.
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-28 06:55:08 +00:00
Ed Schouten
29515a68a5 Implement directory and FIFO creation.
The file_create() system call can be used to create files of a given
type. Right now it can only be used to create directories and FIFOs. As
CloudABI does not expose filesystem permissions, this system call lacks
a mode argument. Simply use 0777 or 0666 depending on the file type.
2015-07-28 06:50:47 +00:00
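The mode selection mentioned in 29515a68a5 might look like the sketch below. It assumes that directories get 0777 and FIFOs get 0666, which the message implies but does not spell out. enum filetype and create_mode() are hypothetical names, not CloudABI identifiers.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical file types accepted by a file_create()-like call. */
enum filetype { FT_DIRECTORY, FT_FIFO, FT_REGULAR };

/*
 * Pick the permissions to create with, since the caller cannot pass a
 * mode.  Returns 0 on success or EINVAL for unsupported types.
 */
static int
create_mode(enum filetype type, mode_t *mode)
{
	assert(mode != NULL);
	switch (type) {
	case FT_DIRECTORY:
		*mode = 0777;	/* directories need the search bit */
		return (0);
	case FT_FIFO:
		*mode = 0666;
		return (0);
	default:
		return (EINVAL);
	}
}
```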
Ed Schouten
cec575201a Make fstat() and friends work.
Summary:
CloudABI provides access to two different stat structures:

- fdstat, containing file descriptor level status: oflags, file
  descriptor type and Capsicum rights, used by cap_rights_get(),
  fcntl(F_GETFL), getsockopt(SO_TYPE).
- filestat, containing your regular file status: timestamps, inode
  number, used by fstat().

Unlike FreeBSD's stat::st_mode, CloudABI file descriptor types don't
have overloaded meanings (e.g., returning S_ISCHR() for kqueues). Add a
utility function to extract the type of a file descriptor accurately.

CloudABI does not work with O_ACCMODEs. File descriptors have two sets
of Capsicum-style rights: rights that apply to the file descriptor
itself ('base') and rights that apply to any new file descriptors
yielded through openat() ('inheriting'). Though not perfect, we can
pretty safely decompose Capsicum rights to such a pair. This is done in
convert_capabilities().

Test Plan: Tests for these system calls are fairly extensive in cloudlibc.

Reviewers: jonathan, mjg, #manpages

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3171
2015-07-28 06:36:49 +00:00
Marcel Moolenaar
f40c76d8de Check the sync operation. 2015-07-28 04:54:05 +00:00
Michael Tuexen
267dbe63a1 Provide consistent error causes whenever an ABORT chunk is sent.
MFC after:	1 week
2015-07-27 22:35:54 +00:00
Marius Strobl
43bc87c459 - Move the remainder of host controller capability registers reading from
  xhci_start_controller() to xhci_init(). These values don't change at run-
  time so there's no point in acquiring them on every USB_HW_POWER_RESUME
  instead of only once during initialization. In r276717, reading the first
  couple of registers in question already had been moved as a prerequisite
  for the changes in that revision.
- Identify ASMedia ASM1042A controllers.
- Use NULL instead of 0 for pointers.

MFC after:	3 days
2015-07-27 15:26:50 +00:00
Marius Strobl
891c57d8a9 - Fix compilation after r285909 with USB_DEBUG defined.
- Regenerate usb.conf.
2015-07-27 14:43:14 +00:00
Marius Strobl
d75accb539 - Use __FBSDID().
- Const'ify cons_to_vga_colors.
- Fix line wrapping.

MFC after:	3 days
2015-07-27 14:34:32 +00:00
Marius Strobl
0309276c28 - Nuke dupe $FreeBSD$.
- Fix whitespace.

MFC after:	3 days
2015-07-27 14:03:34 +00:00
Ed Schouten
b114aa7959 Make shutdown() return ENOTCONN as required by POSIX, part deux.
Summary:
Back in 2005, maxim@ attempted to fix shutdown() to return ENOTCONN in case the socket was not connected (r150152). This had to be rolled back (r150155), as it broke some of the existing programs that depend on this behavior. I reapplied this change on my system and indeed, syslogd failed to start up. I fixed this back in February (279016) and MFC'ed it to the supported stable branches. Apart from that, things seem to work out all right.

Since at least Linux and Mac OS X do the right thing, I'd like to go ahead and give this another try. To keep old copies of syslogd working, only start returning ENOTCONN for recent binaries.

I took a look at the XNU sources and they seem to test against SS_ISCONNECTED, SS_ISCONNECTING and SS_ISDISCONNECTING, instead of just SS_ISCONNECTED. That seems reasonable, so let's do the same.

Test Plan:
This issue was uncovered while writing tests for shutdown() in CloudABI:

https://github.com/NuxiNL/cloudlibc/blob/master/src/libc/sys/socket/shutdown_test.c#L26

Reviewers: glebius, rwatson, #manpages, gnn, #network

Reviewed By: gnn, #network

Subscribers: bms, mjg, imp

Differential Revision: https://reviews.freebsd.org/D3039
2015-07-27 13:17:57 +00:00
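The state test described in b114aa7959 can be sketched as below. The flag values are illustrative, shutdown_check() is a hypothetical helper, and the compatibility check that limits the new behaviour to recent binaries is left out.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative flag values; the kernel's definitions may differ. */
#define SS_ISCONNECTED		0x0002
#define SS_ISCONNECTING		0x0004
#define SS_ISDISCONNECTING	0x0008

/*
 * Return ENOTCONN unless the socket is connected, connecting or
 * disconnecting, mirroring the three-flag test described above.
 */
static int
shutdown_check(int so_state)
{
	assert(so_state >= 0);
	if ((so_state & (SS_ISCONNECTED | SS_ISCONNECTING |
	    SS_ISDISCONNECTING)) == 0)
		return (ENOTCONN);
	return (0);
}
```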
Marius Strobl
fecf9642ba - Probe UICLASS_CDC/UISUBCLASS_ABSTRACT_CONTROL_MODEL/0xff again. This
  variant of Microsoft RNDIS, i. e. their unofficial version of CDC ACM,
  has been disabled in r261544 for resolving a conflict with umodem(4).
  Eventually, in r275790 that problem was dealt with in the right way.
  However, r275790 failed to put probing of RNDIS devices in question
  back.
- Initialize the device prior to querying it, as required by the RNDIS
  specification. Otherwise already determining the MAC address may fail
  rightfully.
- On detach, halt the device again.
- Use UCDC_SEND_ENCAPSULATED_{COMMAND,RESPONSE}. While these macros are
  resolving to the same values as UR_{CLEAR_FEATURE,GET_STATUS}, the
  former set is way more appropriate in this context.
- Report unknown - rather: unimplemented - events unconditionally and
  not just in debug mode. This ensures that we'll get some hint of what
  is going wrong instead of the driver silently failing.
- Deal with the Microsoft ActiveSync requirement of using an input buffer
  the size of the expected reply or larger - except for variably sized
  replies - when querying a device.
- Fix some pointless NULL checks, style bugs etc.

These changes allow urndis(4) to communicate with a Microsoft-certified
USB RNDIS test token.

MFC after:	3 days
Sponsored by:	genua mbh
2015-07-27 12:14:14 +00:00
Ed Schouten
af7e75f59d Add a futex implementation for CloudABI.
Summary:
CloudABI provides two different types of futex objects: read-write locks
and condition variables. There is no need to provide separate support
for once objects and thread joining, as these are efficiently simulated
by blocking on a read-write lock. Mutexes simply use read-write locks.

Condition variables always have a lock object associated to them. They
always know to which lock a thread needs to be migrated if woken up.
This allows us to implement requeueing. A broadcast on a condition
variable will never cause multiple threads to be woken up at once. They
will be woken up iteratively.

This implementation still has lots of room for improvement. Locking is
coarse and right now we use linked lists to store all of the locks and
condition variables, instead of using a hash table. The primary goal of
this implementation was to behave correctly. Performance will be
improved as we go.

Test Plan:
This futex implementation has been in use for the last couple of months
and seems to work pretty well. All of the cloudlibc and libc++ unit
tests seem to pass.

Reviewers: dchagin, kib, vangyzen

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3148
2015-07-27 10:07:29 +00:00
Ed Schouten
533c8a29da Regenerate system call table. 2015-07-27 10:04:28 +00:00
Ed Schouten
f4c06d124f Sync in latest upstream system call definitions.
Futex object scopes have been renamed from using their own constants to
simply reusing the existing CLOUDABI_MAP_{PRIVATE,SHARED} flags, as they
are more accurate in this context.
2015-07-27 10:04:06 +00:00
Marcel Moolenaar
b2ce196ca1 o make sure the boundary is a power of 2, when not zero.
o   don't convert 0 to ~0 just so that we can use MIN. ~0 is not a
    valid boundary. Introduce BNDRY_MIN that deals with 0 values
    that mean no boundary.
2015-07-26 16:39:37 +00:00
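The two rules in b2ce196ca1 can be sketched as small helpers. bndry_valid() and bndry_min() are hypothetical function forms; the commit itself only names a BNDRY_MIN macro, whose exact definition is not shown here.

```c
#include <assert.h>
#include <stdint.h>

/* A boundary is valid when it is 0 (no boundary) or a power of two. */
static int
bndry_valid(uintmax_t b)
{
	return ((b & (b - 1)) == 0);
}

/*
 * Minimum of two boundaries where 0 means "no boundary", so 0 never
 * wins against a real constraint and is never turned into ~0.
 */
static uintmax_t
bndry_min(uintmax_t a, uintmax_t b)
{
	assert(bndry_valid(a) && bndry_valid(b));
	if (a == 0)
		return (b);
	if (b == 0)
		return (a);
	return (a < b ? a : b);
}
```

Treating 0 specially keeps an invalid all-ones boundary from ever being stored, which is the problem the second item in the commit describes.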
Andrey V. Elsukov
da6c24e123 Report the scheme and provider names in warning message about unaligned
partition.

PR:		201873
MFC after:	1 week
2015-07-26 11:16:48 +00:00
Andrey V. Elsukov
41f5f69f96 Build debug version of rmlock's methods only when LOCK_DEBUG > 0.
Currently LOCK_DEBUG is always defined in sys/lock.h (0 or 1).
This means that debugging code is always built. In addition the kernel
modules have always defined LOCK_DEBUG as 1. So, debugging rmlock code
is always used by kernel modules.

MFC after:	1 week
2015-07-26 10:53:32 +00:00
Michael Tuexen
cf9e47b2f0 Improve locking on Mac OS X. This does not change the functionality
on FreeBSD.

Reviewed by:	rrs
MFC after:	1 week
2015-07-26 10:37:40 +00:00
Michael Tuexen
6247db3541 Fix and improve a debug message. The SID was reported as an SSN.
MFC after:	1 week
2015-07-26 10:17:17 +00:00
Christian Brueffer
382353e2e8 In tmpfs_chtimes(), remove checks on the nanosecond level when
determining whether a node changed.

Other filesystems, e.g., UFS, only check on seconds, when determining
whether something changed.

This also corrects the birthtime case, where we checked tv_nsec
twice, instead of tv_sec and tv_nsec (PR).

PR:			201284
Submitted by:		David Binderman
Patch suggested by:	kib
Reviewed by:		kib
MFC after:		2 weeks
Committed from:		Essen FreeBSD Hackathon
2015-07-26 08:33:46 +00:00
Michael Gmelin
ca2e4ecd73 isl(4), driver for Intersil I2C ISL29018 Digital Ambient Light Sensor
Differential Revision:	https://reviews.freebsd.org/D2811
Reviewed by:	adrian, wblock
Approved by:	adrian, wblock
Relnotes:	yes
2015-07-25 20:17:19 +00:00
Edward Tomasz Napierala
46a8ca51e3 Use consistent spacing.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-07-25 20:17:19 +00:00
Edward Tomasz Napierala
caf9bbecdc Add md_root example to defaults/loader.conf.
Note that this doesn't quite work yet - the preloaded image
gets loaded twice for some reason.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-07-25 20:15:29 +00:00
Sean Bruno
a82cd51680 Remove unused txd_saved.
Initialize txd_upper, txd_lower and txd_used at declaration.

Differential Revision:	D3174
Reviewed by:	erj hiren
MFC after:	2 weeks
Sponsored by:	Limelight Networks
2015-07-25 19:24:33 +00:00
Konstantin Belousov
6195b24a79 Revert r173708's modifications to vm_object_page_remove().
Assume that a vnode is mapped shared and mlocked(), and then the vnode
is truncated, or truncated and then again extended past the mapping
point EOF.  Truncation removes the pages past the truncation point,
and if pages are later created at this range, they are not properly
mapped into the mlocked region, and their wiring count is wrong.

The revert leaves the invalidated but wired pages on the object queue,
which means that the pages are found by vm_object_unwire() when the
mapped range is munlock()ed, and reused by the buffer cache when the
vnode is extended again.

The changes in r173708 were required because, at the time, vm_map_unwire()
looked at the page tables to find the page to unwire.  This is no longer
needed with the vm_object_unwire() introduction, which follows the
object's shadow chain.

Also eliminate OBJPR_NOTWIRED flag for vm_object_page_remove(), which
is now redundant, since we do not remove wired pages.

Reported by:	trasz, Dmitry Sivachenko <trtrmitya@gmail.com>
Suggested and reviewed by:	alc
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	1 week
2015-07-25 18:29:06 +00:00
Michael Tuexen
4ff815b71c Move including netinet/icmp6.h around to avoid a problem when including
netinet/icmp6.h and net/netmap.h. Both use ni_flags...
This allows building multistack with SCTP support.

MFC after: 1 week
2015-07-25 18:26:09 +00:00
Michael Gmelin
46f07718f7 cyapa(4), driver for the Cypress APA I2C trackpad
Differential Revision:	https://reviews.freebsd.org/D3068
Reviewed by:	kib, wblock
Approved by:	kib
Relnotes:	yes
2015-07-25 18:14:35 +00:00
Edward Tomasz Napierala
371583f6ca Use double newlines consistently.
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-07-25 16:20:04 +00:00
Kristof Provost
fc4443a1d5 Remove stale comment.
The IPv6 pseudo header checksum was added by bz in r235961.

Sponsored by:	Essen FreeBSD Hackathon
2015-07-25 16:14:55 +00:00
Konstantin Belousov
6fd04eff66 With the removal of b_saveaddr in the r285819, b_data must be reset to
b_kvabase when the buffer is reclaimed.  Otherwise, if b_data for the
mapped buffer was adjusted with the page-offset portion of b_offset,
nothing would re-adjust the b_data, which breaks buffer management
code which expects page-aligned b_data (see e.g. bpman_qenter(), which
skips partial pages).

Fix a minor issue with the GB_KVAALLOC requests, which could result in
returning the mapped buffer if the reused buffer is mapped and has
the right amount of KVA reserved.

Improve assertion in the vfs_buf_check_mapped() to catch unmapped
buffers which have their b_data incorrectly adjusted with offset.

Reported and tested by:	pho (previous version)
Reviewed by:	jeff (previous version)
Sponsored by:	The FreeBSD Foundation
2015-07-25 15:00:14 +00:00
Edward Tomasz Napierala
933333caf8 Document md_root in loader(8). The md(4) manual page mentions it,
but it's hard to find and easy to miss.

Reviewed by:	wblock@
MFC after:	2 weeks
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3183
2015-07-25 13:02:41 +00:00
Oleksandr Tymoshenko
4f5f0f288f Fix color mapping for TDA19988. Values for VIP_CNTRL_1 and VIP_CNTRL_2
registers were mixed up
2015-07-25 03:19:02 +00:00
Oleksandr Tymoshenko
b8397a9f01 Synchronize PIN input/output modes with gnu/dts/include/dt-bindings/pinctrl/am33xx.h
gpio driver requires exact value to match SoC pin mode with GPIO pin direction
2015-07-25 03:03:32 +00:00
Oleksandr Tymoshenko
5625a3e560 If there is panel info in DTB do not wait for HDMI event and setup
framebuffer immediately
2015-07-25 02:59:45 +00:00
Oleksandr Tymoshenko
7339f7821b OF_getencprop_alloc shouldn't be used to get string value. If string
length + 1 is not divisible by 4 this function returns NULL property
value. Otherwise it returns the string with each group of 4 characters reversed.
2015-07-25 00:58:50 +00:00
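The failure mode described in 7339f7821b can be reproduced with a stand-alone sketch. swap_cells() is a hypothetical helper that mimics what decoding a property as 32-bit cells does to its bytes on a little-endian host; it is not the OF_getencprop_alloc() implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Treat buf as an array of 32-bit cells and reverse the bytes of each
 * cell in place.  Returns -1 without touching buf when len is not a
 * multiple of the cell size.
 */
static int
swap_cells(unsigned char *buf, size_t len)
{
	unsigned char tmp;

	assert(buf != NULL);
	if (len % 4 != 0)
		return (-1);
	for (size_t i = 0; i < len; i += 4) {
		tmp = buf[i];
		buf[i] = buf[i + 3];
		buf[i + 3] = tmp;
		tmp = buf[i + 1];
		buf[i + 1] = buf[i + 2];
		buf[i + 2] = tmp;
	}
	return (0);
}
```

A property such as "okay" occupies 5 bytes with its terminator and is rejected, while "abcdefg" occupies 8 bytes and comes back with each 4-byte group reversed.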
Xin LI
1a7c14aec7 Fix a typo in comment.
Submitted by:	Yanhui Shen via twitter
MFC after:	3 days
2015-07-24 22:13:39 +00:00
Alan Cox
d8b56c8eab Add a comment discussing the appropriate use of the atomic_*() functions
with acquire and release semantics versus the *mb() functions on amd64
processors.

Reviewed by:	bde (an earlier version), kib
Sponsored by:	EMC / Isilon Storage Division
2015-07-24 19:43:18 +00:00
Marius Strobl
86fb540033 - Since r253161, uart_intr() abuses FILTER_SCHEDULE_THREAD for signaling
  uart_bus_attach() during its test that 20 iterations weren't sufficient
  for clearing all pending interrupts, assuming this means that hardware
  is broken and doesn't deassert interrupts. However, under pressure, 20
  iterations also can be insufficient for clearing all pending interrupts,
  leading to a panic as intr_event_handle() tries to schedule an interrupt
  handler not registered. Solve this by introducing a flag that is set in
  test mode and otherwise restores pre-r253161 behavior of uart_intr(). The
  approach of additionally registering uart_intr() as handler as suggested
  in PR 194979 is not taken as that in turn would abuse special pccard and
  pccbb handling code of intr_event_handle(). [1]
- Const'ify uart_driver_name.
- Fix some minor style bugs.

PR:		194979 [1]
Reviewed by:	marcel (earlier version)
MFC after:	3 days
2015-07-24 17:01:16 +00:00
Ed Maste
119b75925c Add RISC-V ELF machine type definition
EM_RISCV is now officially registered as e_machine 243.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-07-24 16:52:21 +00:00
Marius Strobl
e18e2adaae - In mpt_send_handshake_cmd(), use bus_space_write_stream_4(9) for writing
  raw data to the doorbell offset in order to clarify the intent and for
  avoiding unnecessarily converting the endianness back and forth.
  Unfortunately, the same can't be done in mpt_recv_handshake_reply() as
  16-bit data needs to be read using 32-bit bus accessors.
- In mpt_recv_handshake_reply(), get rid of a redundant variable.

MFC after:	1 fortnight
2015-07-24 16:00:35 +00:00
Marius Strobl
7815d3948c o Revert the other functional half of r239864, i. e. the merge of r134227
  from x86 to use smp_ipi_mtx spin lock not only for smp_rendezvous_cpus()
  but also for the MD cache invalidation, TLB demapping and remote register
  reading IPIs due to the following reasons:
  - The cross-IPI SMP deadlock x86 otherwise is subject to can't happen on
    sparc64. That's because on sparc64, spin locks don't disable interrupts
    completely but only raise the processor interrupt level to PIL_TICK. This
    means that IPIs still get delivered and direct dispatch IPIs such as the
    cache invalidation etc. IPIs in question are still executed.
  - In smp_rendezvous_cpus(), smp_ipi_mtx is held not only while sending an
    IPI_RENDEZVOUS, but until all CPUs have processed smp_rendezvous_action().
    Consequently, smp_ipi_mtx may be locked for an extended amount of time as
    queued IPIs (as opposed to the direct ones) such as IPI_RENDEZVOUS are
    scheduled via a soft interrupt. Moreover, given that this soft interrupt
    is only delivered at PIL_RENDEZVOUS, processing of smp_rendezvous_action()
    on a target may be interrupted by f. e. a tick interrupt at PIL_TICK, in
    turn leading to the target in question trying to send an IPI by itself
    while IPI_RENDEZVOUS isn't fully handled, yet, and, thus, resulting in a
    deadlock.
o As mentioned in the commit message of r245850, on at least some sun4u platforms
  concurrent sending of IPIs by different CPUs is fatal. Therefore, hold the
  reintroduced MD ipi_mtx also while delivering cross-traps via MI helpers,
  i. e. ipi_{all_but_self,cpu,selected}().
o Akin to x86, let the last CPU to process cpu_mp_bootstrap() set smp_started
  instead of the BSP in cpu_mp_unleash(). This ensures that all APs actually
  are started, when smp_started is no longer 0.
o In all MD and MI IPI helpers, check for smp_started == 1 rather than for
  smp_cpus > 1 or nothing at all. This avoids races during boot causing IPIs
  trying to be delivered to APs that in fact aren't up and running, yet.
  While at it, move setting of the cpu_ipi_{selected,single}() pointers to
  the appropriate delivery functions from mp_init() to cpu_mp_start() where
  it's better suited and allows getting rid of the global isjbus variable.
o Given that now concurrent IPI delivery no longer is possible, also nuke
  the delays before completely disabling interrupts again in the CPU-specific
  cross-trap delivery functions, previously giving other CPUs a window for
  sending IPIs on their part. Actually, we now should be able to entirely get
  rid of completely disabling interrupts in these functions. Such a change
  needs more testing, though.
o In {s,}tick_get_timecount_mp(), make the {s,}tick variable static. While not
  necessary for correctness, this avoids page faults when accessing the stack
  of a foreign CPU as {s,}tick now is locked into the TLBs as part of static
  kernel data. Hence, {s,}tick_get_timecount_mp() always execute as fast as
  possible, avoiding jitter.

PR:		201245
MFC after:	3 days
2015-07-24 15:13:21 +00:00
Randall Stewart
5f98acb594 Fix silly syntax error emacs chugged in for me.. gesh.
MFC after:	3 weeks
2015-07-24 14:13:43 +00:00
Randall Stewart
c616859963 Fix an issue with Mac OS locking and also optimize the case
where we are sending back a stream-reset and a sack timer is running, in
that case we should just send the SACK.

MFC after:	3 weeks
2015-07-24 14:09:03 +00:00
Ed Schouten
4615998165 Implement the basic system calls that operate on pathnames.
Summary:
Unlike FreeBSD, CloudABI does not use null terminated strings for its
pathnames. Introduce a function called copyin_path() that can be used by
all of the filesystem system calls that use pathnames. This change
already implements the system calls that don't depend on any additional
functionality (e.g., conversion of struct stat).

Also implement the socket system calls that operate on pathnames, namely
the ones used by the C library functions bindat() and connectat(). These
don't receive a 'struct sockaddr_un', but just the pathname, meaning
they could be implemented in such a way that they don't depend on the
size of sun_path. For now, just use the existing interfaces.

Add a missing #include to cloudabi_syscalldefs.h to get this code to
build, as one of its macros depends on UINT64_C().

Test Plan:
These implementations have already been tested in the CloudABI branch on
GitHub. They pass all of the tests.

Reviewers: kib, pjd

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3097
2015-07-24 07:46:02 +00:00
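The idea behind a copyin_path()-style helper in 4615998165 can be sketched in userland as follows. copy_path() is a hypothetical function; rejecting embedded NUL bytes with EINVAL is an assumption made for this sketch and is not taken from the commit.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Turn a length-delimited pathname into a freshly allocated,
 * NUL-terminated C string.  The caller frees *out on success.
 * Returns 0, EINVAL (embedded NUL) or ENOMEM.
 */
static int
copy_path(const char *src, size_t len, char **out)
{
	char *buf;

	assert(src != NULL && out != NULL);
	if (memchr(src, '\0', len) != NULL)
		return (EINVAL);
	buf = malloc(len + 1);
	if (buf == NULL)
		return (ENOMEM);
	memcpy(buf, src, len);
	buf[len] = '\0';
	*out = buf;
	return (0);
}
```

Centralising the conversion in one helper lets the rest of the pathname code keep working with ordinary C strings.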
Warner Losh
d2e3ed5af6 Panic when a device is trying to recursively acquire rather than hang
indefinitely. Improve error messages from other panics.
2015-07-24 04:56:46 +00:00
Sergey Kandaurov
ef88ae77ea Call ksem_get() with initialized 'rights'.
ksem_get() consumes fget(), and it's mandatory there.

Reported by:	truckman
Reviewed by:	mjg
2015-07-23 23:18:03 +00:00
Jeff Roberson
fade8dd714 Refactor unmapped buffer address handling.
 - Use pointer assignment rather than a combination of pointers and
   flags to switch buffers between unmapped and mapped.  This eliminates
   multiple flags and generally simplifies the logic.
 - Eliminate b_saveaddr since it is only used with pager bufs which have
   their b_data re-initialized on each allocation.
 - Gather up some convenience routines in the buffer cache for
   manipulating buf space and buf malloc space.
 - Add an inline, buf_mapped(), to standardize checks around unmapped
   buffers.

In collaboration with: mlaier
Reviewed by:	kib
Tested by:	pho (many small revisions ago)
Sponsored by:	EMC / Isilon Storage Division
2015-07-23 19:13:41 +00:00
Jim Harris
cbdec09c1c nvme: ensure csts.rdy bit is cleared before returning from nvme_ctrlr_disable
PR:		200458
MFC after:	3 days
Sponsored by:	Intel
2015-07-23 15:50:39 +00:00
Jim Harris
de9a58f4ee nvme: properly handle case where pci_alloc_msix does not alloc all vectors
Reported by: Sean Kelly <smkelly@smkelly.org>
MFC after:	3 days
Sponsored by:	Intel
2015-07-23 15:35:08 +00:00
Ed Schouten
fef97e09d9 Allow us to create UNIX sockets and socketpairs in CloudABI processes. 2015-07-23 13:52:53 +00:00
Ed Schouten
cf6b9e9b07 Allow cap_rights_{set,clear,is_set} to be called with no arguments.
In the CloudABI code I sometimes call into cap_rights_* without
providing any arguments. Though one could argue that this doesn't make
sense, in this specific case it's hard to avoid, as the rights that
should be tested against are forwarded by a couple of wrapper macros.
2015-07-23 11:11:01 +00:00
Jeff Roberson
1c1ddc0351 - Don't defeat the FIFO nature of the buffer cache by eliminating the
   most recently used buffer when we are under paging pressure.  This is
   a perversion of the buffer and page replacement algorithms and recent
   improvements to the page daemon have rendered it unnecessary.  In the
   event that low-memory deadlocks become an issue it would be possible
   to make a daemon or event handler that performs a similar action on
   the oldest buffers rather than the newest.  Since the buf cache
   is analogous to the page cache and some minimum working set is desired
   another possibility is to simply shrink the minimum working set which
   has less downside now that file pages are not directly mapped.

Sponsored by:	EMC / Isilon
Reviewed by:	alc, kib (with some minor objection)
Tested by:	pho
2015-07-23 02:20:41 +00:00
Conrad Meyer
6b8c5d92a4 vt: cpu logos: Correct reversed 0/1 beastie descriptions
Differential Revision:	https://reviews.freebsd.org/D3158
Approved by:	markj (mentor)
Obtained from:	Pavel Timofeev
MFC after:	1 week
2015-07-22 23:30:54 +00:00
Conrad Meyer
f39130e75d vt: Change default CPU logo to Orb
Differential Revision:	https://reviews.freebsd.org/D3156
Approved by:	markj (mentor)
MFC after:	1 week
2015-07-22 23:23:12 +00:00
Conrad Meyer
6d2b01fc54 vt: Default to cpu logos off
Apologies, this was how it was supposed to land. Mea culpa.

Differential Revision:	https://reviews.freebsd.org/D3157
Reviewed by:	gnn, hiren
Approved by:	markj (mentor)
MFC after:	1 week
2015-07-22 23:19:53 +00:00
Conrad Meyer
8ef2f53c59 vt_core.c: Use do/while to highlight missed semi-colon errors
Also, fix some nearby #define whitespace while here.

(Style cleanup for r285794.)

Suggested by:	jmg

Differential Revision:	https://reviews.freebsd.org/D3154
Approved by:	markj (mentor)
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-07-22 18:50:47 +00:00
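The do/while idiom referred to in 8ef2f53c59 is shown below with hypothetical macros. LOG_TWICE_BAD(), LOG_TWICE(), calls and demo() are illustrative names and not taken from vt_core.c.

```c
#include <assert.h>

static int calls;	/* counts how often the macro body ran */

/*
 * Bare braces: "LOG_TWICE_BAD()" compiles even when the trailing
 * semicolon is forgotten, and "LOG_TWICE_BAD();" before an "else"
 * breaks the if/else pairing.
 */
#define LOG_TWICE_BAD()	{ calls++; calls++; }

/*
 * do/while (0): the expansion is one statement that requires the
 * semicolon, so a missing one becomes a compile error.
 */
#define LOG_TWICE()	do { calls++; calls++; } while (0)

/* Uses the safe macro in the if/else position that trips the bad one. */
static int
demo(int cond)
{
	calls = 0;
	if (cond)
		LOG_TWICE();
	else
		calls = -1;
	return (calls);
}
```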
Jung-uk Kim
0594dadeb8 Catch up with ACPICA 20150717. 2015-07-22 16:26:17 +00:00
Andrew Rybchenko
e31b688a57 sfxge: added fallbacks for pre 4.2.1 firmware support
Driver must be able to start against older firmware that is missing
recently added MCDI calls, otherwise firmware upgrade will not be
possible.

Submitted by:   Richard Houldsworth <rhouldsworth at solarflare.com>
Sponsored by:   Solarflare Communications, Inc.
MFC after:      2 days
Differential Revision: https://reviews.freebsd.org/D3145
2015-07-22 16:25:18 +00:00
Jung-uk Kim
fe0f0bbb19 Merge ACPICA 20150717. 2015-07-22 16:25:07 +00:00
Conrad Meyer
5f7d6682c5 vt: Unbreak build on no-splash configurations
PR:		201751
Differential Revision:	https://reviews.freebsd.org/D3151
Tested by:	Andrey Fesenko
Approved by:	markj (mentor)
MFC after:	1 week
2015-07-22 15:30:10 +00:00
Randall Stewart
7cca17758c Fix several problems with Stream Reset.
1) We were not handling (or sending) the IN_PROGRESS case if
   the other side (or our side) was not able to reset (awaiting more data).
2) We would improperly send a stream-reset when we should not. Not
   waiting until the TSN had been assigned when data was in queue.

Reviewed by:	tuexen
2015-07-22 11:30:37 +00:00
Ed Schouten
c989441af6 Regenerate system call table. 2015-07-22 10:05:46 +00:00
Ed Schouten
73dcd7db56 Import upstream changes to the system call definitions.
Support has been added for providing the scope of a futex operation,
whether the futex is local to the process or shared between processes.
2015-07-22 10:04:53 +00:00
Zbigniew Bodek
0af6011a92 Introduce support for MSI-X interrupts in AHCI
- Allocate resources for MSI-X table and PBA if necessary
- Add function ahci_free_mem() to free all resources

Reviewed by:   jhb, mav
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3009
2015-07-22 09:46:22 +00:00
Randall Stewart
f260c1b939 Fix inverted logic bug that David Wolfskill found (thanks David!)
MFC after:	3 Weeks
2015-07-22 09:29:50 +00:00
Konstantin Belousov
c48c590f63 Remove duplicate and useless declarations.
Submitted by:	bde
2015-07-22 09:12:40 +00:00
Ed Schouten
8bc7851803 Add Makefiles for CloudABI kernel modules.
Place all of the machine/pointer size independent code in a kernel
module called 'cloudabi'. All of the 64-bit specific code goes in a
separate module called 'cloudabi64'. The latter is only enabled on
amd64, as it is the only architecture supported.
2015-07-22 07:32:49 +00:00
Wei Hu
5f302628d0 Do not enable UDP checksum offloading when running on the Hyper-V on
Windows Server 2012 and earlier hosts.

Submitted by: whu
Reviewed by: royger
Approved by: royger
MFC after: 3 days
Relnotes: No
Sponsored by: Microsoft OSTC
Differential Revision:  https://reviews.freebsd.org/D3086
2015-07-22 05:05:01 +00:00
Luiz Otavio O Souza
315dbfb053 Cosmetic change. When printing the child's mapped pins, use the plural
only when necessary.

Reported by:	Daniel O'Connor <darius@dons.net.au>,
		Sulev-Madis Silber (ketas)
2015-07-22 04:18:33 +00:00
John Baldwin
9a2d6ab990 Various changes to the registers displayed in DDB for x86.
- Fix segment registers to only display the low 16 bits.
- Remove unused handlers and entries for the debug registers.
- Display xcr0 (if valid) in 'show sysregs'.
- Add '0x' prefix to MSR values to match other values in 'show sysregs'.
- MFamd64: Display various MSRs in 'show sysregs'.
- Add a 'show dbregs' to display the value of debug registers.
- Dynamically size the column width for register values to properly
  align columns on 64-bit platforms.
- Display %gs for i386 in 'show registers'.

Differential Revision:	https://reviews.freebsd.org/D2784
Reviewed by:	kib, markj
MFC after:	2 weeks
2015-07-22 01:09:02 +00:00
Mark Johnston
d258fd1d98 Remove checks for a NULL return value from M_WAITOK allocations. 2015-07-21 23:44:36 +00:00
Xin LI
47a8e86509 Fix resource exhaustion due to sessions stuck in LAST_ACK state.
Submitted by:	Jonathan Looney (Juniper SIRT)
Reviewed by:	lstewart
Security:	CVE-2015-5358
Security:	SA-15:13.tcp
2015-07-21 23:42:15 +00:00
Mark Johnston
a5cbf8b9c0 Let the unwinder handle faults during function prologues or epilogues.
The i386 and amd64 DDB stack unwinders contain code to detect and handle
the case where the first frame is not completely set up or torn down. This
code was accidentally unused however, since db_backtrace() was never called
with a non-NULL trap frame. This change fixes that.

Also remove get_rsp() from the amd64 code. It appears to have come from
i386, which needs to take into account whether the exception triggered a
CPL switch, since SS:ESP is only pushed onto the stack if so. On amd64,
SS:RSP is pushed regardless, so get_rsp() was doing the wrong thing for
kernel-mode exceptions. As a result, we can also remove custom print
functions for these registers.

Reviewed by:	jhb
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D2881
2015-07-21 23:22:23 +00:00
Mark Johnston
f8a757d016 Improve stack unwinding on i386 and amd64 after an IP fault.
If we can't find a symbol corresponding to the faulting instruction, assume
that the previously-executed function is a call and attempt to find the
calling function using the return address on the stack. Otherwise we end
up associating the last stack frame with the current call, which is
incorrect and causes the unwinder to skip printing of the calling function,
resulting in a confusing backtrace.

Reviewed by:	jhb
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D2859
2015-07-21 23:13:11 +00:00
Mark Johnston
e31a60b486 Don't return undefined symbols to a DDB symbol lookup.
Undefined symbols have a value of zero, so it makes no sense to return
such a symbol when performing a lookup by value. This occurs for example
when unwinding the stack after calling a NULL function pointer, and we
confusingly report the faulting function as uart_sab82532_class() on
amd64.

Convert db_print_loc_and_inst() to only attempt disassembly if we managed
to find a symbol corresponding to the IP. Otherwise we may fault and
re-enter the debugger.

Reviewed by:	jhb
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D2858
2015-07-21 23:07:55 +00:00
Mark Johnston
1a5bee0849 Remove some dead code from DDB's amd64 stack unwinder.
The amd64 port copied some code from i386 to fetch function arguments and
display them in backtraces. However, it was commented out and can't easily
be implemented since the function arguments are passed in
registers rather than on the stack in amd64. Remove it in preparation for
some bug fixes in this area.

Reviewed by:	jhb
Sponsored by:	EMC / Isilon Storage Division
Differential Revision:	https://reviews.freebsd.org/D2857
2015-07-21 23:03:21 +00:00
Konstantin Belousov
e637a6e3f9 The smp_rendezvous_cpus() function should ensure that all accesses
done by the functions called on other CPUs, are visible to the caller.
Pair the otherwise useless acquire on smp_rv_waiters[3] with a release add
to ensure a synchronizes-with relation, which guarantees visibility.
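
As an illustration only, the release/acquire pairing described above can be
sketched with C11 atomics. The names below (struct rendezvous,
rv_worker_finish, rv_caller_collect) are hypothetical and are not the kernel's
smp_rendezvous code; the sketch only shows why a release increment paired with
an acquire load makes the workers' plain stores visible to the caller.

```c
#include <assert.h>
#include <stdatomic.h>

#define RV_MAX_CPUS 8

struct rendezvous {
	int        results[RV_MAX_CPUS]; /* plain data written by each worker */
	atomic_int done;                 /* number of workers that finished */
};

/* Called on each "remote CPU": publish a result, then signal completion. */
static void
rv_worker_finish(struct rendezvous *rv, int cpu, int value)
{
	assert(cpu >= 0 && cpu < RV_MAX_CPUS);
	rv->results[cpu] = value;
	/* Release: the plain store above is ordered before the increment. */
	atomic_fetch_add_explicit(&rv->done, 1, memory_order_release);
}

/* Called by the initiator: wait for all workers, then read their results. */
static int
rv_caller_collect(struct rendezvous *rv, int ncpus)
{
	int sum = 0;

	assert(ncpus >= 0 && ncpus <= RV_MAX_CPUS);
	/* Acquire: pairs with the release increments done by the workers. */
	while (atomic_load_explicit(&rv->done, memory_order_acquire) < ncpus)
		;
	for (int i = 0; i < ncpus; i++)
		sum += rv->results[i];
	return (sum);
}
```

Without the release on the increment, the acquire load alone would have
nothing to synchronize with, which is the "otherwise useless acquire" the
message refers to.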

Reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-07-21 22:56:46 +00:00
Ermal Luçi
705f4d9c6a IPSEC, remove variable argument function, it's already due.
Differential Revision:		https://reviews.freebsd.org/D3080
Reviewed by:	gnn, ae
Approved by:	gnn(mentor)
2015-07-21 21:46:24 +00:00
Eric Joyner
39020fdfa0 Fix for a customer issue with ixl(4):
- Add required MAC/VLAN filter when adding an LAA
- Fix bug where code did not check for I40E_SUCCESS from a successful
  i40e_validate_mac_address() call in ixl_init_locked(), when setting
  an LAA.

PR: 201240
Differential Revision: https://reviews.freebsd.org/D3111
Submitted by: Gregory Rose <gregory.v.rose@intel.com>
Reviewed by: gnn, rstone
Approved by: gnn
MFC after: 2 weeks
2015-07-21 21:07:18 +00:00
Jim Harris
70fb74bd12 nvd: set d_delmaxsize to full capacity of NVMe namespace
The NVMe specification has no ability to specify a maximum delete size
that is less than the full capacity of the namespace - so just using the
namespace size is the correct value here.

This fixes reported issues where ZFS trim on init looked like it was
hanging the system - previously the default I/O max size (128KB on
Intel NVMe controllers) was used for delete operations which worked out
to only about 8MB/s.  With this patch I can add an 800GB DC P3700
drive to a ZFS pool in about 15-20 seconds.

Reported by: Dylan Just <dylan@techtangents.com>
MFC after:	3 days
Sponsored by:	Intel
2015-07-21 20:53:21 +00:00
Conrad Meyer
75ac3a7359 vt: Draw logos per CPU core
This feature is inspired by another Unix-alike OS commonly found on
airplane headrests.

A number of beasties[0] are drawn at top of framebuffer during boot,
based on the number of active SMP CPUs[1]. Console buffer output
continues to scroll in the screen area below beastie(s)[2].

After some time[3] has passed, the beasties are erased leaving the
entire terminal for use.

Includes two 80x80 vga16 beastie graphics and an 80x80 vga16 orb
graphic. (The graphics are RLE compressed to save some space -- 3x 3200
bytes uncompressed, or 4208 compressed.)

[0]: The user may select the style of beastie with

    kern.vt.splash_cpu_style=(0|1|2)

[1]: Or the number may be overridden with tunable kern.vt.splash_ncpu.
[2]: https://www.youtube.com/watch?v=UP2jizfr3_o
[3]: Configurable with kern.vt.splash_cpu_duration (seconds, def. 10).

Differential Revision:	https://reviews.freebsd.org/D2181
Reviewed by:	dumbbell, emaste
Approved by:	markj (mentor)
MFC after:	2 weeks
2015-07-21 20:33:36 +00:00
Conrad Meyer
bcfb2e3dd2 vt: De-static VT_SYSCTL_INT-defined objects
Explicitly mark existing VT_SYSCTL_INTs static. This is in preparation for
D2181.

Reviewed by:	dumbbell, emaste
Approved by:	markj (mentor)
MFC after:	1 week
2015-07-21 20:30:06 +00:00
Andrew Turner
4027d3d62a Teach the GICv2 driver about the Qualcomm GICv2 compatible string.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-21 18:08:10 +00:00
Zbigniew Bodek
3ed97a1a52 Add some more explanation to r285752
Add brief commentary to vendor-specific devid function in ITS
and remove redundant spaces by the way.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
2015-07-21 17:14:24 +00:00
Zbigniew Bodek
9920b3aa95 Don't allow malloc() to wait for resource while holding a lock in ITS
malloc() should not go to sleep in case of lack of resource while
the kernel thread is holding a non-sleepable lock.

- change malloc() flags to M_NOWAIT in such cases
- implement lpi_free_chunk() routine as it will be needed when ITT
  allocation fails in its_device_alloc_locked()
- do not increase verbosity of this code since upper layers will
  communicate an error if the interrupt setup fails

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3121
2015-07-21 15:28:07 +00:00
Ed Schouten
072cb63ddc Make clock_gettime() and clock_getres() work for CloudABI programs.
Though the standard C library uses a 'struct timespec' using a 64-bit
'time_t', there is no need to use such a type at the system call level.
CloudABI uses a simple 64-bit unsigned timestamp in nanoseconds. This is
sufficient to express any time value from 1970 to 2554.

The CloudABI low-level interface also supports fetching timestamp values
with a lower precision. Instead of overloading the clock ID argument for
this purpose, the system call provides a precision argument that may be
used to specify the maximum slack. The current system call
implementation does not use this information, but it's good to already
have this available.

Expose cloudabi_convert_timespec(), as we're going to need this for
fstat() as well.
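
A minimal sketch of the conversion described above, assuming a hypothetical
helper named timespec_to_ns() (this is not the actual
cloudabi_convert_timespec() code). An unsigned 64-bit nanosecond counter holds
about 2^64 / 10^9 = 1.8 * 10^10 seconds, roughly 584.5 years, which is where
the 1970 to 2554 range comes from.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

#define SKETCH_NSEC_PER_SEC UINT64_C(1000000000)

/*
 * Convert a struct timespec to an unsigned 64-bit nanosecond count.
 * Returns 0 on success, -1 if the value is negative, malformed, or
 * does not fit in 64 bits.
 */
static int
timespec_to_ns(const struct timespec *ts, uint64_t *out)
{
	uint64_t ns;

	assert(ts != NULL && out != NULL);
	if (ts->tv_sec < 0 || ts->tv_nsec < 0 ||
	    (uint64_t)ts->tv_nsec >= SKETCH_NSEC_PER_SEC)
		return (-1);
	/* Reject second counts whose nanosecond value would overflow. */
	if ((uint64_t)ts->tv_sec > UINT64_MAX / SKETCH_NSEC_PER_SEC)
		return (-1);
	ns = (uint64_t)ts->tv_sec * SKETCH_NSEC_PER_SEC;
	if (ns > UINT64_MAX - (uint64_t)ts->tv_nsec)
		return (-1);
	*out = ns + (uint64_t)ts->tv_nsec;
	return (0);
}
```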

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-21 15:08:13 +00:00
Zbigniew Bodek
1fe6a1a25a Add support for vendor specific function for PCI devid acquisition in ITS
It is possible that some HW will use different PCI devids,
hence allow replacing the default domain:bus:slot:func scheme
by implementing and registering a custom function.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3118
2015-07-21 14:47:23 +00:00
Konstantin Belousov
01f5e0866b The part of r285680 which removed release semantic for two stores to
it_need was wrong [*].  Restore the releases and add a comment
explaining why it is needed.

Noted by:	alc [*]
Reviewed by:	bde [*]
Sponsored by:	The FreeBSD Foundation
2015-07-21 14:39:34 +00:00
Ed Schouten
d0da90b198 Describe COMPAT_CLOUDABI64 in the amd64 configuration NOTES file. 2015-07-21 12:53:47 +00:00
Zbigniew Bodek
52b584bc15 Implement get_cyclecount() on ARM64
Use Virtual Counter register associated with Generic Timer to
read the cyclecount.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3134
2015-07-21 12:50:45 +00:00
Ed Schouten
21d30b29d5 Make thread creation work for CloudABI processes.
Summary:
Remove the stub system call that was put in place during the system call
import and replace it by a target-dependent version stored in sys/amd64.
Initialize the thread in a way similar to cpu_set_upcall_kse(). We
provide the entry point with two arguments: the thread ID and the
argument pointer.

Test Plan:
Thread creation still seems to work, both for FreeBSD and CloudABI
binaries.

Reviewers: dchagin, mjg, kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3110
2015-07-21 12:47:15 +00:00
Zbigniew Bodek
13aaea2fd7 Improve ARM64 CPU_MATCH
Add a method to identify CPU based on RAW MIDR value.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3117
2015-07-21 12:15:00 +00:00
Sergey Kandaurov
94df6fad1d Fix sb_state constant names as used e.g. to display in DDB ``show sockbuf''.
MFC after:	1 week
2015-07-21 09:57:13 +00:00
Randall Stewart
c0d1be08f6 When a tunneling protocol is being used with UDP we must release the
lock on the INP before calling the tunnel protocol, else a LOR
may occur (it does with SCTP for sure). Instead we must acquire a
ref count and release the lock, taking care to allow for the case
where the UDP socket has gone away and *not* unlocking since the
refcnt decrement on the inp will do the unlock in that case.

Reviewed by:	tuexen
MFC after:	3 weeks
2015-07-21 09:54:31 +00:00
Hiren Panchasara
4d5e6ef665 Remove a couple of TUNABLE_INT() calls which are unnecessary after r267961.
r267961 did remove them but they "reappeared" when ixgbe(4) rewrite happened in
r280182.

Sponsored by:		Limelight Networks
2015-07-21 06:48:36 +00:00
Konstantin Belousov
9b3df93bf1 Typo in comment. 2015-07-20 19:51:41 +00:00
Alexander Motin
d575325b81 Increase output amp on ASUS UX31A by +5dB.
While there, implement couple helper functions.
2015-07-20 17:48:00 +00:00
Ed Schouten
62c31cffae Make forking of CloudABI processes work.
Just like FreeBSD+Capsicum, CloudABI uses process descriptors. Return
the file descriptor number to the parent process.

To the child process we return a special value for the file
descriptor number (CLOUDABI_PROCESS_CHILD). We also return the thread ID
of the new thread in the copied process, so the threading library can
reinitialize itself.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-20 13:46:22 +00:00
Ed Schouten
5a170c1b0e Add an API for easily creating userspace threads in kernelspace.
This change refactors the existing create_thread() function to be more
generic. It replaces almost all of its arguments by a callback that can
be used to extract the thread ID and copy it out to the right place, but
also to perform additional initialization steps, such as setting the
trapframe. This also makes the difference between thr_new() and
thr_create() more clear in my opinion.

This function is going to be used by the CloudABI compatibility layer.

It looks like the OpenSolaris compatibility framework already provides a
function called thread_create(). Rename this function to
do_thread_create() and use a macro to deal with the namespacing
conflict. A similar approach is already used for thread_exit().

MFC after:	1 month
2015-07-20 10:20:04 +00:00
Alexander Motin
d3e2e28e74 Fix typo in comment.
Submitted by:	Masao Uebayashi
2015-07-20 09:37:42 +00:00
Marko Zec
22a9384098 Prevent null-pointer dereferencing.
MFC after:	3 days
2015-07-20 08:21:51 +00:00
Andrey V. Elsukov
af9aa0a837 Add helper functions for IP checksum adjusting. Use these functions in
dummynet code and for setdscp. This fixes wrong checksums in some cases.
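
For illustration, incremental checksum adjustment can be sketched along the
lines of RFC 1624 (equation 3). The function names below are hypothetical and
are not the helpers added by this commit; the sketch only shows that adjusting
the old checksum for one changed 16-bit word gives the same result as
recomputing the checksum over the whole header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fold a 32-bit sum into 16 bits with end-around carry. */
static uint16_t
cksum_fold(uint32_t sum)
{
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);
	return ((uint16_t)sum);
}

/* Full one's-complement checksum over a short array of 16-bit words. */
static uint16_t
cksum_full(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;

	assert(words != NULL);
	for (size_t i = 0; i < n; i++)
		sum += words[i];
	return ((uint16_t)~cksum_fold(sum));
}

/* RFC 1624, eqn. 3: HC' = ~(~HC + ~m + m') for one changed word m -> m'. */
static uint16_t
cksum_adjust(uint16_t oldsum, uint16_t oldword, uint16_t newword)
{
	uint32_t sum;

	sum = (uint16_t)~oldsum;
	sum += (uint16_t)~oldword;
	sum += newword;
	return ((uint16_t)~cksum_fold(sum));
}
```

A DSCP rewrite changes only the 16-bit word holding the version, header length
and TOS byte, so one cksum_adjust() call would be enough in that case.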

Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 07:26:31 +00:00
Andrey V. Elsukov
30aee13117 Add LLE event handler to report ND6 events to userland via rtsock.
Obtained from:	Yandex LLC
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:58:32 +00:00
Andrey V. Elsukov
585753c432 Invoke LLE event handler when entry is deleted.
MFC after:	2 weeks
Sponsored by:	Yandex LLC
2015-07-20 06:54:50 +00:00
Mark Johnston
bd2519480e Declare lockstat_enabled even when KDTRACE_HOOKS is not defined.
Reported by:	bz
X-MFC-With:	r285704
2015-07-20 04:41:25 +00:00
Marcel Moolenaar
be00e09818 Check the hw.proto.attach environment variable for devices that
proto(4) should attach to instead of the normal driver.

Document the variable.
2015-07-19 23:37:45 +00:00
Mark Johnston
97cc6870f6 Don't increment the spin count until after the first attempt to acquire a
rwlock read lock. Otherwise the lockstat:::rw-spin probe will fire
spuriously.

MFC after:	1 week
2015-07-19 22:26:02 +00:00
Kirk McKusick
1b79b9498b Restructure code for readability improvement. No functional change.
Reviewed by: kib
2015-07-19 22:25:16 +00:00
Mark Johnston
de2c95cc00 Consistently use a reader/writer flag for lockstat probes in rwlock(9) and
sx(9), rather than using the probe function name to determine whether a
given lock is a read lock or a write lock. Update lockstat(1) accordingly.
2015-07-19 22:24:33 +00:00
Mark Johnston
32cd0147fa Implement the lockstat provider using SDT(9) instead of the custom provider
in lockstat.ko. This means that lockstat probes now have typed arguments and
will utilize SDT probe hot-patching support when it arrives.

Reviewed by:	gnn
Differential Revision:	https://reviews.freebsd.org/D2993
2015-07-19 22:14:09 +00:00
Mark Murray
eda4aaeb3f Fix some untidy logic. I committed the wrong local fix; please pass the pointy hat.
Approved by:        so (/dev/random blanket)
2015-07-19 18:07:35 +00:00
Luigi Rizzo
847adfb7b3 add a use count so the netmap module cannot be unloaded while in use. 2015-07-19 18:07:25 +00:00
Luigi Rizzo
10b8ef3d6a properly destroy persistent vale ports 2015-07-19 18:06:30 +00:00
Luigi Rizzo
9694aad375 do not free NULL if pipe allocation fails 2015-07-19 18:05:49 +00:00
Luigi Rizzo
05f7605789 release a reference when stopping a monitor 2015-07-19 18:04:51 +00:00
Luigi Rizzo
85fe4e7c6b small documentation update 2015-07-19 17:54:42 +00:00
Andrew Turner
70888b7ed5 Fix atomic_store_64, it should write the value passed in, not the value
read by the load.

Pointy Hat:	andrew
2015-07-19 16:55:47 +00:00
Mark Murray
f703e79990 Remove out-of-date comments.
Approved by:        so (/dev/random blanket)
2015-07-19 16:05:34 +00:00
Mark Murray
dbefaadca8 Fix the read blocking so that it is interruptible and slow down the rate of console warning spamming while blocked.
Approved by:	so (/dev/random blanket)
2015-07-19 16:05:30 +00:00
Mark Murray
d657959305 Clarify the intent of the RANDOM_* options.
Approved by:	so (/dev/random blanket)
2015-07-19 16:05:26 +00:00
Mark Murray
95b184a048 Optimise the buffer-size calculation. It was possible to get one block too many.
Approved by:	so (/dev/random blanket)
2015-07-19 16:05:23 +00:00
Andrew Turner
a612bbfa12 Clean up the style of the armv6 atomic code.
Sponsored by:	ABT Systems Ltd
2015-07-19 15:44:51 +00:00
Andrew Turner
d6a2102846 Sort the ARM atomic functions to be in alphabetical order.
Sponsored by:	ABT Systems Ltd
2015-07-19 13:10:47 +00:00
Konstantin Belousov
a8e1bc2e14 Revert bit of the r285627, locore.s does not need include of
opt_kstack_pages.h.  The asm gets the right KSTACK_PAGES from the
assym.s.

Reported by:	bz
Sponsored by:	The FreeBSD Foundation
2015-07-19 10:45:58 +00:00
Marcelo Araujo
f19e47d691 Add support to the jail framework to be able to mount linsysfs(5) and
linprocfs(5).

Differential Revision:	D2846
Submitted by:		Nikolai Lifanov <lifanov@mail.lifanov.com>
Reviewed by:		jamie
2015-07-19 08:52:35 +00:00
John-Mark Gurney
02bee582d0 move the prototype to the lib.h header.. This makes more sense, and
it's an API between boot2.c and arm_init.S which calls it..
2015-07-18 22:47:46 +00:00
John-Mark Gurney
c09626461f other fixes to make boot2 compile for IXP... Properly end the asm
sections, and for some reason, main needs a prototype... If someone
has a better fix, I'm all ears...

Pointed out by:	Berislav Purgar
2015-07-18 20:21:25 +00:00
John-Mark Gurney
8d0440e04b revert r278579, this is in a different compile environment than the
kernel, and needs to be named cpu_id...

Pointed out by:	Berislav Purgar
2015-07-18 20:19:51 +00:00
Konstantin Belousov
283dfee925 Further cleanup after r285607.
Remove useless release semantic for some stores to it_need.  For
stores where the release is needed, add a comment explaining why.

Fence after the atomic_cmpset() op on the it_need should be acquire
only, release is not needed (see above).  The combination of
atomic_cmpset() + fence_acq() is better expressed there as
atomic_cmpset_acq().

Use atomic_cmpset() for swi' ih_need read and clear.

Discussed with:	alc, bde
Reviewed by:	bde
Comments wording provided by:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-18 19:59:29 +00:00
Ian Lepore
e7b25f9168 Deselect the sd card before re-selecting it when working around a problem
with some cards that causes them to become deselected after probing for
switch capabilities.  The old workaround fixes the behavior with some cards,
but causes problems with the cards that behave correctly and don't become
deselected.  Forcing a deselect then reselect appears to work correctly
with all cards in initial testing.
2015-07-18 16:56:51 +00:00
Luigi Rizzo
a6e8e92404 fix a typo in a comment 2015-07-18 15:28:32 +00:00
Konstantin Belousov
b4490c6e93 The si_status field of the siginfo_t, provided by the waitid(2) and
SIGCHLD signal, should keep full 32 bits of the status passed to the
_exit(2).

Split the combined p_xstat of the struct proc into the separate exit
status p_xexit for normal process exit, and signalled termination
information p_xsig.  Kernel-visible macro KW_EXITCODE() reconstructs
old p_xstat from p_xexit and p_xsig.  p_xexit contains the complete status
and is copied out into si_status.
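
A rough sketch of the split, using hypothetical names (struct
exit_info_sketch, legacy_wait_status, siginfo_status) rather than the real
struct proc fields or the KW_EXITCODE() macro. It assumes the traditional
wait(2) status layout, with the low 8 bits of the exit code in bits 8..15 and
the signal number in the low 7 bits, and it ignores the core dump flag.

```c
#include <assert.h>
#include <stddef.h>

struct exit_info_sketch {
	int xexit;	/* full value passed to _exit(2) */
	int xsig;	/* terminating signal, or 0 for a normal exit */
};

/* Rebuild the traditional, truncated wait(2) status word. */
static int
legacy_wait_status(const struct exit_info_sketch *e)
{
	assert(e != NULL);
	return (((e->xexit & 0xff) << 8) | (e->xsig & 0x7f));
}

/* Value reported in si_status: the signal, or the untruncated exit code. */
static int
siginfo_status(const struct exit_info_sketch *e)
{
	assert(e != NULL);
	return (e->xsig != 0 ? e->xsig : e->xexit);
}
```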

Requested by:	Joerg Schilling
Reviewed by:	jilles (previous version), pho
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
2015-07-18 09:02:50 +00:00
Kevin Lo
ddee45244d Since the IETF has redefined the meaning of the tos field to accommodate
a set of differentiated services, set IPTOS_PREC_* macros using
IPTOS_DSCP_* macro definitions.

While here, add IPTOS_DSCP_VA macro according to RFC 5865.
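
As a small worked example (helper names are hypothetical, not part of the
header being changed): the DSCP occupies the upper six bits of the former TOS
byte, so a codepoint is shifted left by two, and class selector n lines up
with the old precedence value n.

```c
#include <assert.h>
#include <stdint.h>

/* Place a 6-bit DSCP codepoint into the upper bits of the TOS byte. */
static uint8_t
dscp_to_tos(unsigned dscp)
{
	assert(dscp < 64);
	return ((uint8_t)(dscp << 2));
}

/* Class selector n (0..7) is DSCP n * 8, matching IP precedence n. */
static uint8_t
class_selector_tos(unsigned n)
{
	assert(n < 8);
	return (dscp_to_tos(n * 8));
}
```

With these helpers, the VOICE-ADMIT codepoint from RFC 5865 (DSCP 44) maps to
a TOS byte of 0xb0, and class selector 7 maps to 0xe0, the old network control
precedence.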

Differential Revision:	https://reviews.freebsd.org/D3119
Reviewed by:	gnn
2015-07-18 06:48:30 +00:00
Mark Johnston
c6d48c8752 Fix the !KDTRACE_HOOKS build.
X-MFC-With:	r285664
2015-07-18 04:38:11 +00:00
Mark Johnston
e2b25737ee Pass the lock object to lockstat_nsecs() and return immediately if
LO_NOPROFILE is set. Some timecounter handlers acquire a spin mutex, and
we don't want to recurse if lockstat probes are enabled.

PR:		201642
Reviewed by:	avg
MFC after:	3 days
2015-07-18 00:57:30 +00:00
Mark Johnston
efe8b26b82 Modify lockstat_nsecs() to just return unless lockstat probes are actually
enabled. The cost of a timecounter read can be quite significant, and the
problem became more apparent after r284297, since that change resulted in
a call to lockstat_nsecs() for each acquisition of an rwlock read lock.

PR:		201642
Reviewed by:	avg
Tested by:	Jason Unovitch
MFC after:	3 days
Differential Revision:	https://reviews.freebsd.org/D3073
2015-07-18 00:22:00 +00:00
Benno Rice
eacbeb2b95 Merge driver for PMC Sierra's range of SAS/SATA HBAs.
Submitted by:	Achim Leubner <Achim.Leubner@pmcs.com>
Reviewed by:	scottl
2015-07-17 23:30:43 +00:00
Ed Schouten
fd054c2df9 Undo r285656.
It turns out that the CDDL sources already introduce a function called
thread_create(). I'll investigate what we can do to make these functions
coexist.

Reported by: Ivan Klymenko
2015-07-17 22:26:45 +00:00
Benno Rice
a650d8699f Enable pms module on amd64 for now. 2015-07-17 20:30:30 +00:00
Benno Rice
5894064d12 Disable debugging.
Submitted by:	Vasanthalakshmi Tharmarajan <Vasanthalakshmi.Tharmarajan@pmcs.com>
Reviewed by:	scottl
2015-07-17 20:29:47 +00:00
Patrick Kelsey
d57724fd46 Check TCP timestamp option flag so that the automatic receive buffer
scaling code does not use an uninitialized timestamp echo reply value
from the stack when timestamps are not enabled.

Differential Revision: https://reviews.freebsd.org/D3060
Reviewed by:	hiren
Approved by:	jmallett (mentor)
MFC after:	3 days
Sponsored by:	Norse Corp, Inc.
2015-07-17 17:36:33 +00:00
Ed Schouten
82a3d2cbfc Add an API for easily creating userspace threads in kernelspace.
This change refactors the existing create_thread() function to be more
generic. It replaces almost all of its arguments by a callback that can
be used to extract the thread ID and copy it out to the right place, but
also to perform additional initialization steps, such as setting the
trapframe. This also makes the difference between thr_new() and
thr_create() more clear in my opinion.

This function is going to be used by the CloudABI compatibility layer.

Reviewed by:	kib
MFC after:	1 month
2015-07-17 16:34:01 +00:00
Zbigniew Bodek
b8bbefed30 Fix possible coherency issues between PEs related to I-cache
Based on B.2.3.4:
Synchronization and coherency issues between data and
instruction accesses.

To ensure that modified instructions are visible to all PEs
(Processing Elements) in a shareability domain one needs to
perform the following sequence:
    1. Clean D-cache
    2. Ensure the visibility of data cleaned from cache
    3. Invalidate I-cache
    4. Ensure completion
    5. In SMP system PE must issue isb to ensure execution of the
       modified instructions

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3106
2015-07-17 14:33:47 +00:00
Zbigniew Bodek
ab89029bd0 Fix secondary stacks calculation on ARM64
Secondary stack calculation is modified to provide
stack_top = secondary_stacks + (cpu_id) * PAGE_SIZE * KSTACK_PAGES
because on ARM64 the stack grows to lower memory addresses.
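
The formula can be sketched in C as follows. The constants and the function
name are placeholders for illustration, and the sketch assumes secondary CPUs
are numbered from 1, so that CPU 1 owns the first slot and its initial stack
pointer is the end of that slot.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE	4096u	/* placeholder value */
#define SKETCH_KSTACK_PAGES	4u	/* placeholder value */
#define SKETCH_STACK_BYTES \
	((size_t)SKETCH_PAGE_SIZE * SKETCH_KSTACK_PAGES)

/*
 * Return the initial stack pointer for a secondary CPU.  The stack grows
 * towards lower addresses, so the top is the end of the CPU's slot.
 */
static char *
secondary_stack_top(char *secondary_stacks, unsigned cpu_id)
{
	assert(secondary_stacks != NULL && cpu_id >= 1);
	return (secondary_stacks + (size_t)cpu_id * SKETCH_STACK_BYTES);
}
```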

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3107
2015-07-17 14:08:08 +00:00
Zbigniew Bodek
d5dfc8ad00 Increase DMAP (Direct Map) size on ARM64
Previous DMAP size was too small for systems with more than 64GB
of RAM. Increase it to 128GB to support ThunderX CRB.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3113
2015-07-17 13:58:00 +00:00
Ed Schouten
6256e57ba9 Implement CloudABI memory management system calls.
Add support for the <sys/mman.h> functions by wrapping around our own
implementations. There are no kern_*() variants of these system calls,
but we also don't need them in this case. It is sufficient to just call
into the sys_*() functions.

Differential Revision:	https://reviews.freebsd.org/D3033
Reviewed by:		brooks
2015-07-17 09:00:38 +00:00
Navdeep Parhar
a1ed88571f cxgbe(4): Ask the firmware for the start of the RSS slice for a port and
save it for later.  This enables direct manipulation of the indirection
tables (although the stock driver doesn't do that right now).

MFC after:	1 month
2015-07-17 06:46:18 +00:00
Konstantin Belousov
888e282ab4 When checking for the valid value of the frame pointer, verify that it
belongs to the kernel stack address range for the thread.  Right now, the
code checks that the new frame is not farther than KSTACK_PAGES pages from
the current frame, which allows the address to point past the top of
the stack.
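
A minimal sketch of such a range check, with hypothetical names and parameters
(the real unwinder works on the thread's kernel stack fields): the frame
pointer is accepted only if a whole frame record fits between the base and the
top of the stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Return true if a frame record of frame_size bytes starting at fp lies
 * entirely inside [kstack_base, kstack_base + kstack_size).
 */
static bool
frame_in_kstack(uintptr_t fp, uintptr_t kstack_base, size_t kstack_size,
    size_t frame_size)
{
	assert(kstack_size >= frame_size);
	return (fp >= kstack_base &&
	    fp <= kstack_base + kstack_size - frame_size);
}
```

Compared with a "within N pages of the current frame" test, this rejects an
address just past the top of the stack.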

Reviewed by:	andrew, emaste, markj
Differential revision:	https://reviews.freebsd.org/D3108
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-16 19:40:18 +00:00
Ed Schouten
6e5fcd99df Add a sysentvec for CloudABI on x86-64.
Summary:
For CloudABI we need to put two things on the stack of new processes:
the argument data (a binary blob; not strings) and a startup data
structure. The startup data structure contains interesting things such
as a pointer to the ELF program header, the thread ID of the initial
thread, a stack smashing protection canary, and a pointer to the
argument data.

Fetching system call arguments and setting the return value is similar
to FreeBSD. The only differences are that system call 0 does not exist
and that we call into cloudabi_convert_errno() to convert the error
code. We also need this function in a couple of other places, so we'd
better reuse it here.

Reviewers: dchagin, kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3098
2015-07-16 18:24:06 +00:00
Sean Bruno
f46fb03de7 Add an adapter CORE lock in the DDB hook em_dump_queue to avoid WITNESS
panic in em_init_locked() while debugging.

MFC after:	2 weeks
Sponsored by:	Limelight Networks
2015-07-16 16:32:57 +00:00
Hans Petter Selasky
a529288d65 Optimise the DWC OTG host mode driver's transmit path:
1) Use the TX FIFO empty interrupts to poll the transmit FIFO usage,
instead of using own software counters and waiting for SOF
interrupts. Assume that enough FIFO space is available to execute one
USB OUT transfer of any kind when the TX FIFO is empty.

2) Use the host channel halted event to asynchronously wait for host
channels to be disabled instead of waiting for SOF interrupts. This
results in less turnaround time for re-using host channels and at the
same time increases the performance.

The network transmit performance measured by "iperf" for the "RPi-B v1
2011/12" board, increased from 45 Mbit/s to 65 Mbit/s after applying the
changes above.

No regressions seen using:
 - High Speed (BULK, CONTROL, INTERRUPT)
 - Full Speed (All transfer types)
 - Low Speed (Control and Interrupt)

MFC after:	1 month
Submitted by:	Daisuke Aoyama <aoyama@peach.ne.jp>
2015-07-16 16:08:40 +00:00
Mateusz Guzik
2919a0c5c1 fd: partially deduplicate fdescfree and fdescfree_remapped
This also moves vrele of cdir/rdir/jdir vnodes earlier, which should not
matter.
2015-07-16 15:26:37 +00:00
Mateusz Guzik
cd672ca60f Get rid of lim_update_thread and cred_update_thread.
Their primary use was in thread_cow_update to free up old resources.
Freeing had to be done with proc lock held and _cow_ funcs already knew
how to free old structs.
2015-07-16 14:30:11 +00:00
Mateusz Guzik
752fc07d33 vfs: implement v_holdcnt/v_usecount manipulation using atomic ops
Transitions 0->1 and 1->0 (which decide e.g. on putting the vnode on the free
list) of either counter are still guarded with vnode interlock.
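
The general pattern can be sketched with C11 atomics. The function names are
hypothetical and this is not the vnode code itself; it only illustrates
lock-free counting that refuses the 0->1 and 1->0 transitions, which the
caller would then perform with the lock (here, the vnode interlock) held.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Try to take a reference without the lock; fails if the count is 0. */
static bool
ref_acquire_if_not_zero(atomic_uint *count)
{
	unsigned old = atomic_load_explicit(count, memory_order_relaxed);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak_explicit(count, &old,
		    old + 1, memory_order_acquire, memory_order_relaxed))
			return (true);
	}
	return (false);		/* 0->1 must be done under the lock */
}

/* Try to drop a reference without the lock; fails on the last one. */
static bool
ref_release_if_not_last(atomic_uint *count)
{
	unsigned old = atomic_load_explicit(count, memory_order_relaxed);

	assert(old > 0);
	while (old > 1) {
		if (atomic_compare_exchange_weak_explicit(count, &old,
		    old - 1, memory_order_release, memory_order_relaxed))
			return (true);
	}
	return (false);		/* 1->0 must be done under the lock */
}
```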

Reviewed by:	kib (earlier version)
Tested by:	pho
2015-07-16 13:57:05 +00:00
Andrew Turner
8fa2222f46 Split out the arm and armv6 parts of atomic.h to new files. While here use
__ARM_ARCH to determine which revision of the architecture is applicable.

Sponsored by:	ABT Systems Ltd
2015-07-16 13:33:03 +00:00
Konstantin Belousov
1ef630fb33 Fix warnings about unused functions for UP build.
Sponsored by:	The FreeBSD Foundation
2015-07-16 12:16:42 +00:00
Christian Brueffer
16858c207b Actually recognize all Intel Lynx Point devices we have device IDs for.
PR:		195851
Submitted by:	ftigeot@wolfpond.org
MFC after:	1 week
2015-07-16 11:14:59 +00:00
Zbigniew Bodek
721555e7ee Fix KSTACK_PAGES issue when the default value was changed in KERNCONF
If KSTACK_PAGES was changed to anything other than the default,
the value from param.h was taken instead in some places and
the value from KERNCONF in some others. This resulted in
inconsistency which caused corruption in SMP environment.

Ensure that opt_kstack_pages.h is included in all places where
KSTACK_PAGES is used.

The file opt_kstack_pages.h could not be included in param.h
because it was breaking the toolchain compilation.

Reviewed by:   kib
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3094
2015-07-16 10:46:52 +00:00
Zbigniew Bodek
1038d102c4 Set-up proper TCR values for memory related to Translation Table Walking
This commit adds proper cache and shareability attributes to
the TCR register.
Set memory attributes to Normal, outer and inner cacheable WBWA.
Set shareability to inner and outer shareable when SMP is enabled.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3093
2015-07-16 10:22:57 +00:00
Kevin Lo
f7c698e20d Fix typo in register definition.
Submitted by:	James Hung
Reviewed by:	sbruno
2015-07-16 08:03:23 +00:00
Ed Schouten
457f7e23b1 Implement CloudABI's exec() call.
Summary:
In a runtime that is purely based on capability-based security, there is
a strong emphasis on how programs start their execution. We need to make
sure that we execute a new program with an exact set of file
descriptors, ensuring that credentials are not leaked into the process
accidentally.

Providing the right file descriptors is just half the problem. There
also needs to be a framework in place that gives meaning to these file
descriptors. How does a CloudABI mail server know which of the file
descriptors corresponds to the socket that receives incoming emails?
Furthermore, how will this mail server acquire its configuration
parameters, as it cannot open a configuration file from a global path on
disk?

CloudABI solves this problem by replacing traditional string command
line arguments with a tree-like data structure consisting of scalars,
sequences and mappings (similar to YAML/JSON). In this structure, file
descriptors are treated as a first-class citizen. When calling exec(),
file descriptors are passed on to the new executable if and only if they
are referenced from this tree structure. See the cloudabi-run(1) man
page for more details and examples (sysutils/cloudabi-utils).

Fortunately, the kernel does not need to care about this tree structure
at all. The C library is responsible for serializing and deserializing,
but also for extracting the list of referenced file descriptors. The
system call only receives a copy of the serialized data and a layout of
what the new file descriptor table should look like:

    int proc_exec(int execfd, const void *data, size_t datalen, const int *fds,
              size_t fdslen);

This change introduces a set of fd*_remapped() functions:

- fdcopy_remapped() pulls a copy of a file descriptor table, remapping
  all of the file descriptors according to the provided mapping table.
- fdinstall_remapped() replaces the file descriptor table of the process
  by the copy created by fdcopy_remapped().
- fdescfree_remapped() frees the table in case we aborted before
  fdinstall_remapped().

We then add a function exec_copyin_data_fds() that builds on top of these
functions. It copies in the data and constructs a new remapped file
descriptor. This is used by cloudabi_sys_proc_exec().
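
The remapping idea can be illustrated with a toy table, where plain integers
stand in for open file objects. The function name and the SLOT_EMPTY marker
are assumptions made for this sketch; the real fdcopy_remapped() operates on
the kernel's file descriptor table structures.

```c
#include <assert.h>
#include <stddef.h>

#define SLOT_EMPTY (-1)		/* marks an unused descriptor slot */

/*
 * Build a new table in which slot i refers to whatever old slot fds[i]
 * referred to.  Returns 0 on success, or -1 if any entry of fds names a
 * descriptor that is out of range or not open.
 */
static int
table_copy_remapped(const int *old_table, size_t old_len,
    const int *fds, size_t fdslen, int *new_table)
{
	assert(old_table != NULL && fds != NULL && new_table != NULL);
	for (size_t i = 0; i < fdslen; i++) {
		if (fds[i] < 0 || (size_t)fds[i] >= old_len ||
		    old_table[fds[i]] == SLOT_EMPTY)
			return (-1);
		new_table[i] = old_table[fds[i]];
	}
	return (0);
}
```

Descriptors that are not listed in fds simply do not appear in the new table,
which matches the "if and only if they are referenced" rule described earlier
in this message.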

Test Plan:
cloudabi-run(1) is capable of spawning processes successfully, providing
it data and file descriptors. procstat -f seems to confirm all is good.
Regular FreeBSD processes also work properly.

Reviewers: kib, mjg

Reviewed By: mjg

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3079
2015-07-16 07:05:42 +00:00
Justin Hibbits
96f3c2adbe Fix userland program exception handling for powerpc64.
It appears that the linker will not handle 64-bit relocations at addresses that
are not aligned to 8-byte boundaries.  Prior to this change the line:

  .llong generictrap

was aligned to a 4-byte address, and the linker replaced that with an 8-byte
0x0.  Aligning that address to 8 bytes caused the linker to generate the proper
relocation.  As a follow-through, the dblow from trap_subr33.S used the code
sequence 'lwz %r1, TRAP_GENTRAP(0)', so this reproduces the analogue of that for
64-bit.
2015-07-16 05:13:08 +00:00
Neel Natu
62145ff347 If uart interrupts are not functioning then schedule the callout to do the
polling at device attach time [1].

Add tunables 'debug.uart_force_poll' and 'debug.uart_poll_freq' to control
uart polling.

Submitted by:	Aleksey Kuleshov (rndfax@yandex.ru) [1]
2015-07-16 04:15:22 +00:00
Konstantin Belousov
70a3efc14f Do not use atomic_swap_int(9), it is not available on all
architectures.  Atomic_cmpset_int(9) is a direct replacement, due to
loop.  The change fixes arm, arm64, mips and sparc64, which lack
atomic_swap().
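
A sketch of how a swap can be built from a compare-and-set loop, written with
C11 atomics rather than the kernel's atomic(9) primitives. The function name
is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Atomically store newval into *p and return the previous value. */
static int
swap_via_cmpset(atomic_int *p, int newval)
{
	int old;

	assert(p != NULL);
	old = atomic_load_explicit(p, memory_order_relaxed);
	/* On failure, old is refreshed with the current value; retry. */
	while (!atomic_compare_exchange_weak(p, &old, newval))
		;
	return (old);
}
```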

Suggested and reviewed by:	alc
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-15 21:44:16 +00:00
Konstantin Belousov
615b6ea2c8 Reset non-zero it_need indicator to zero atomically with fetching its
current value.  It is believed that the change is the real fix for the
issue which was covered over by r252683.

With the current code, if the interrupt handler sets it_need between
read and consequent reset, the update could be lost and
ithread_execute_handlers() would not be called in response to the lost
update.
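
The difference between "read, then reset" and a single fetch-and-reset
can be sketched in user space as follows. The function names and the use
of C11 atomics are assumptions of this example; the flag passed in plays
the role of it_need.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Producer side: record that there is work pending. */
void
mark_needed(atomic_int *need)
{
	atomic_store(need, 1);
}

/*
 * Consumer side: fetch the indicator and clear it in one atomic step.
 * With a separate load followed by a store of zero, a mark_needed()
 * call landing between the two would be overwritten and lost.
 */
bool
take_need(atomic_int *need)
{
	return (atomic_exchange(need, 0) != 0);
}
```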

r252683 could have hidden the issue since, at the time of that commit,
atomic_load_acq_int() did a locked cmpxchg on the variable, which puts
the cache line into the exclusive owned state and clears store
buffers.  Then the immediate store of zero has a very high chance of
reusing the exclusive state of the cache line and making the load and
store sequence operate as an atomic swap.

For now, add the acq+rel fence immediately after the swap, to not
disturb current (but excessive) ordering.  Acquire is needed for the
ih_need reads after the load, while release does not serve a useful
purpose [*].

Reviewed by:	alc
Noted by:	alc [*]
Discussed with:	bde
Tested by:	pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-15 17:36:35 +00:00
Konstantin Belousov
03bbcb2f0c Style. Remove excessive brackets. Compare non-boolean with zero.
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-15 17:14:05 +00:00
Andrew Turner
63e8633e80 Fix an infinite loop when a node doesn't have an interrupt-parent property.
Submitted by:	Aleksey Kuleshov <rndfax@yandex.ru>
Differential Revision: https://reviews.freebsd.org/D3041
2015-07-15 13:28:25 +00:00
Alexander Motin
7dbe8f175b MULTI_ID supported does not mean it is used.
2015-07-15 12:04:12 +00:00
Ed Schouten
952c6e1010 Implement the trivial socket system calls: shutdown() and listen().
2015-07-15 11:27:34 +00:00
Zbigniew Bodek
b49baf8065 Add identify_cpu() to ARM64 init_secondary routine
Identify current CPU. This is necessary to set up
affinity registers and to provide support for
runtime chip identification.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3095
2015-07-15 09:24:45 +00:00
Ed Schouten
4fa92fb538 Make posix_fallocate() and posix_fadvise() work.
We can map these system calls directly to the FreeBSD counterparts. The
other filesystem related system calls will be sent out for review
separately, as they are a bit more complex to get right.
2015-07-15 09:14:06 +00:00
Allan Jude
ce808c7ad8 Add a new option to gpart(8) to fix Lenovo BIOS boot issue
PR:		184910
Reviewed by:	ae, wblock
Approved by:	marcel
MFC after:	3 days
Relnotes:	yes
Sponsored by:	ScaleEngine Inc.
Differential Revision:	https://reviews.freebsd.org/D3065
2015-07-15 02:23:55 +00:00
Patrick Kelsey
2ec930efea Revert inadvertent change to amd64/GENERIC.
2015-07-15 01:04:54 +00:00
Patrick Kelsey
8aa7fdbd78 Add netmap support for ixgbe SRIOV VFs (that is, to if_ixv).
Differential Revision: https://reviews.freebsd.org/D2923
Reviewed by: erj, gnn
Approved by: jmallett (mentor)
Sponsored by: Norse Corp, Inc.
2015-07-15 01:02:01 +00:00
Hiren Panchasara
fd3e9bafbd Remove FreeBSD version check for deprecated M_FLOWID.
Reviewed by:	    erj
Sponsored by:	    Limelight Networks
2015-07-15 01:01:17 +00:00
Patrick Kelsey
c8ed84db3a Fix ixgbe SRIOV VF (if_ixv) initialization bugs. The MAC address for
an if_ixv instance can now be set at creation time, and the receive ring
tail pointer is correctly initialized (previously, things still worked
because the receive ring tail pointer was being fixed up as a side
effect of other activity).

Differential Revision: https://reviews.freebsd.org/D2922
Reviewed by: erj, gnn
Approved by: jmallett (mentor)
Sponsored by: Norse Corp, Inc.
2015-07-15 00:35:50 +00:00
Ed Schouten
bc41a24735 Fix the build after breaking it in r285549.
I performed the commit on a different system than the one where I wrote
the change. After pulling in the change from Phabricator, I didn't notice
that a single chunk did not apply.

Approved by:	secteam (implicit, as intended change was approved)
Pointy hat to:	me
2015-07-14 20:45:24 +00:00
Andrew Turner
f3856d8fcb Also accept "ok" to enable a device; some vendor device trees use this when
they mean "okay".
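
A small sketch of such a check, assuming the status string has already
been read from the node. status_is_enabled() is a hypothetical helper
written for this example and is not the function changed by the commit.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: treat a device tree "status" value as enabled
 * when it is "okay" or the vendor shorthand "ok".  A missing property
 * (NULL) is treated as enabled, which is an assumption of this sketch.
 */
bool
status_is_enabled(const char *status)
{
	if (status == NULL)
		return (true);
	return (strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0);
}
```

Exact string comparison is used on purpose, so values such as "disabled"
or "okayish" are not accepted.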
2015-07-14 19:11:16 +00:00
Ed Schouten
707d98fe2f Implement the CloudABI random_get() system call.
The random_get() system call works similarly to getentropy()/getrandom()
on OpenBSD/Linux. It fills a buffer with random data.

This change introduces a new function, read_random_uio(), that is used
to implement read() on the random devices. We can call into this
function from within the CloudABI compatibility layer.

Approved by:	secteam
Reviewed by:	jmg, markm, wblock
Obtained from:	https://github.com/NuxiNL/freebsd
Differential Revision:	https://reviews.freebsd.org/D3053
2015-07-14 18:45:15 +00:00
Mark Johnston
02d131ad11 Fix some error-handling bugs when core dump compression is enabled:
- Ensure that core dump parameters are initialized in the error path.
- Don't call gzio_fini() on a NULL stream.

Reported by:	rpaulo
2015-07-14 18:24:05 +00:00
Ed Schouten
460ac6370a Regenerate system call table for r285540.
2015-07-14 15:12:24 +00:00
Ed Schouten
1eb7c7cae3 Implement thread_tcb_set() and thread_yield().
The first system call is used to set the user TLS address. Right now
this system call is invoked by the C library for both the initial thread
and additional threads unconditionally, but in the future we'll only
call this if the architecture does not support this. On recent x86-64
CPUs we could use the WRFSBASE instruction.

This system call was erroneously placed in sys/compat/cloudabi64, even
though it does not depend on any pointer-size-dependent data structure.
Move it to the right place.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-14 15:11:50 +00:00
Ed Schouten
03744d7c8d Implement {,p}{read,write}{,v}().
Add a routine similar to copyinuio() and freebsd32_copyinuio() that
copies in CloudABI's struct iovecs. These are then translated into
FreeBSD format and placed in a 'struct uio', so we can call into the
kern_*() functions.
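
The translation step can be sketched in user space as below. The
foreign_iovec layout and the function name are assumptions for
illustration only; they are not CloudABI's actual definitions, and the
kernel version would additionally copy the array in from user memory.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical foreign-ABI iovec with fixed-width fields. */
struct foreign_iovec {
	uint64_t iov_base;
	uint64_t iov_len;
};

/*
 * Translate 'n' foreign iovecs into native 'struct iovec' entries and
 * report the total byte count in '*total'.  Returns 0 on success or -1
 * if a length or the running total does not fit in size_t.
 */
int
foreign_iov_to_native(const struct foreign_iovec *in, struct iovec *out,
    size_t n, size_t *total)
{
	size_t i, sum = 0;

	for (i = 0; i < n; i++) {
		if (in[i].iov_len > SIZE_MAX - sum)
			return (-1);
		out[i].iov_base = (void *)(uintptr_t)in[i].iov_base;
		out[i].iov_len = (size_t)in[i].iov_len;
		sum += out[i].iov_len;
	}
	*total = sum;
	return (0);
}
```

Once the entries are in native form, they can be handed to any routine
that already accepts a 'struct iovec' array.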

Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-14 14:33:21 +00:00
Andrew Turner
b7fbd410ab Set memory to be inner-shareable. This isn't needed on device memory as the
MMU will ignore the attribute there, however it simplifies the code to always
set it.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-14 12:37:47 +00:00
Ed Schouten
f9675092b8 Let proc_raise() call into pksignal() directly.
Summary:
As discussed with kib@ in response to r285404, don't call into
kern_sigaction() within proc_raise() to reset the signal to the default
action before delivery. We'd better do that during image execution.

Change the code to simply use pksignal(), so we don't waste cycles on
functions like pfind() to look up the currently running process itself.

Test Plan:
This change has also been pushed into the cloudabi branch on GitHub. The
raise() tests still seem to pass.

Reviewers: kib

Reviewed By: kib

Subscribers: imp

Differential Revision: https://reviews.freebsd.org/D3076
2015-07-14 12:16:14 +00:00
Zbigniew Bodek
d1be8e59e2 Fix secondary PIC initialization order
Call arm_init_secondary before any other PIC-related functions
are called. This is necessary for GICv3 where PIC_INIT_SECONDARY
allocates resources needed for all further operations.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3066
2015-07-14 12:02:56 +00:00
Zbigniew Bodek
b7ac293f44 Fix intr_machdep.c for ARM64
On ARMv8 IPIs are mapped to 0-15. Incrementing the number by 16
is wrong, because it sets a reserved bit in the IPI register.
This patch removes all "+16" to comply with specs.

Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3029
2015-07-14 11:59:43 +00:00
Christian Brueffer
f4c1eac7cd Spell crypto correctly.
2015-07-14 10:47:56 +00:00
Hiren Panchasara
df7b11fa09 Expose full 32bit RSS hash from card regardless of whether RSS is defined or
not. When doing multiqueue, we are all set up to have full 32bit RSS hash from
the card. We do not need to hide that under "ifdef RSS" and should expose that
by default so others like lagg(4) can use that and avoid hashing the traffic by
themselves.

While here, delete the FreeBSD version check and use of deprecated M_FLOWID.

Reviewed by:	adrian, erj
MFC after:	1 week
Sponsored by:	Limelight Networks
2015-07-14 09:13:18 +00:00
Navdeep Parhar
c7dbd80213 cxgbe(4): Update T4 and T5 firmwares to 1.14.2.0.
Obtained from:	Chelsio Communications
MFC after:	3 days
2015-07-14 08:02:05 +00:00
John-Mark Gurney
577f7474b0 Fix XTS, and name things a bit better...
Though confusing, it is correct that GCM uses ICM_BLOCK_LEN but ICM does
not...  GCM is built on ICM, but uses a function other than
swcr_encdec...  swcr_encdec cannot handle partial blocks, which is
why it must still use AES_BLOCK_LEN and is why XTS was broken by the
commit...

Thanks to the tests for helping make sure I didn't break GCM w/ an earlier
patch...

I did run the tests w/o this patch, and need to figure out why they
did not fail, clearly more tests are needed...

Prodded by:	peter
2015-07-14 07:45:18 +00:00
John-Mark Gurney
e0b231cbc8 fix typos..
Submitted by:	brueffer
2015-07-14 06:34:57 +00:00
Adrian Chadd
85b543e06d Populate hw.model with the CPU model information.
Now you see something like:

# sysctl hw.model
hw.model: Atheros AR9330 rev 1

Tested:

* Carambola 2, AR9331 SoC
2015-07-14 05:14:10 +00:00
John-Mark Gurney
b65946c631 cryptodev is not needed for TCP_SIGNATURE...
Comment that cryptodev shouldn't be used unless you know what you're
doing...

The various arm/mips and one powerpc configs that have cryptodev in
them need to be addressed, audited if they provide benefit and removed
if they don't...
2015-07-14 05:09:58 +00:00
Conrad Meyer
0c40f3532d Fix cleanup race between unp_dispose and unp_gc
unp_dispose and unp_gc could race to teardown the same mbuf chains, which
can lead to dereferencing freed filedesc pointers.

This patch adds an IGNORE_RIGHTS flag on unpcbs marking the unpcb's RIGHTS
as invalid/freed. The flag is protected by UNP_LIST_LOCK.

To serialize against unp_gc, unp_dispose needs the socket object. Change the
dom_dispose() KPI to take a socket object instead of an mbuf chain directly.

PR:		194264
Differential Revision:	https://reviews.freebsd.org/D3044
Reviewed by:	mjg (earlier version)
Approved by:	markj (mentor)
Obtained from:	mjg
MFC after:	1 month
Sponsored by:	EMC / Isilon Storage Division
2015-07-14 02:00:50 +00:00
Mateusz Guzik
6161705823 exec: textvp -> oldtextvp; binvp -> newtextvp
This makes it consistent with the rest of the naming in do_execve.

No functional changes.
2015-07-14 01:13:37 +00:00
Mateusz Guzik
853be5ffef exec: plug a redundant vref + vrele of the image vnode
2015-07-14 00:43:08 +00:00
Mateusz Guzik
e94e50af1d racct: perform a lockless check for p_throttled
This reduces proc lock contention.

Reviewed by:	trasz
2015-07-13 22:52:11 +00:00
Alexander Motin
d4f3ad3a26 Switch initiator IDs in target mode to the same address space as target
IDs in initiator mode -- index in port database instead of handlers.

This makes initiator IDs persist across role changes and firmware resets,
when handlers previously assigned by firmware are lost and reused.

Sponsored by:	iXsystems, Inc.
2015-07-13 21:01:24 +00:00
Luiz Otavio O Souza
fb54940587 Bring a few simplifications to a10_gpio:
 o Return the real hardware state in gpio_pin_getflags() instead of keeping
   the last state in an internal table.  Now the driver returns the real
   state of pins (input/output and pull-up/pull-down) at all times.
 o Use a spin mutex.  This is required by interrupts and the 1-wire code.
 o Use better variable names and place parentheses around them in MACROS.
 o Do not lock the driver when returning static data.

Tested with gpioled(4) and DS1820 (1-wire) sensors on banana pi.
2015-07-13 18:19:26 +00:00
Conrad Meyer
c578e0fb48 pipe_direct_write: Fix mismatched pipelock/unlock
If a signal is caught in pipelock, causing it to fail, pipe_direct_write
should not try to pipeunlock.
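
The shape of the fix can be sketched with a simulated lock. Everything
below (struct fake_pipe, fake_lock(), fake_unlock(), do_write()) is
hypothetical and only models the control flow; it is not the pipe code.

```c
#include <errno.h>
#include <stdbool.h>

struct fake_pipe {
	bool locked;
	bool signal_pending;	/* simulates a signal caught while waiting */
};

/* Fails with EINTR, without taking the lock, if a signal is pending. */
static int
fake_lock(struct fake_pipe *p)
{
	if (p->signal_pending)
		return (EINTR);
	p->locked = true;
	return (0);
}

static void
fake_unlock(struct fake_pipe *p)
{
	p->locked = false;
}

int
do_write(struct fake_pipe *p)
{
	int error;

	error = fake_lock(p);
	if (error != 0)
		return (error);	/* lock was never taken: do not unlock */
	/* ... work under the lock would go here ... */
	fake_unlock(p);
	return (0);
}
```

The early return keeps every unlock paired with a successful lock, which
is the invariant the commit restores.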

Reported by:	pho
Differential Revision:	https://reviews.freebsd.org/D3069
Reviewed by:	kib
Approved by:	markj (mentor)
MFC after:	1 week
Sponsored by:	EMC / Isilon Storage Division
2015-07-13 17:45:22 +00:00