105189 Commits

Author SHA1 Message Date
Ed Schouten
aaf53ab2aa Correct the previous commit: remove the DECLARE_MODULE().
It looks like a MODULE_VERSION() can also appear on its own -- there is
no need to explicitly use DECLARE_MODULE(). Looking at other
modules, this seems to be common practice.
2015-08-05 16:53:49 +00:00
Ed Schouten
b6efa27589 Add DECLARE_MODULE() to the "cloudabi" kernel module.
This kernel module does not require any explicit initialization, but a
module declaration is needed to let the "cloudabi64" kernel module
automatically pull this in.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:45:47 +00:00
Ed Schouten
36310bcd1d Make fcntl(F_SETFL) work.
The stat_put() system call can be used to modify file descriptor
attributes, such as flags, but also Capsicum permission bits. Support
for changing Capsicum bits will be added as soon as its dependent
changes have been pushed through code review.

Obtained from:	https://github.com/NuxiNL/freebsd
2015-08-05 16:15:43 +00:00
Li-Wen Hsu
79b7e3e2c2 Fix make depend in sys/modules
Reviewed by:	delphij
Differential Revision:	https://reviews.freebsd.org/D3291
2015-08-05 14:45:52 +00:00
Alexander Motin
73942c5ce0 Issue all reads of single XCOPY segment simultaneously.
During vMotion and Clone, VMware by default runs multiple sequential 4MB
XCOPY requests at the same time.  If CTL issues reads sequentially in 1MB
chunks for each XCOPY command, reads from different commands are not detected
as sequential by the serseq option code and are allowed to execute
simultaneously.  Such a read pattern confused the ZFS prefetcher, causing
suboptimal disk access.  Issuing all reads at the same time makes the serseq
code work properly, serializing reads both within each XCOPY command and
between them.

My tests with a ZFS pool of 14 disks in RAID10 show prefetcher efficiency
improved from 37% to 99.7%, copying speed improved by 10-60%, and average
read latency reduced by a factor of two on the HDD layer and by a factor of
five on the zvol layer.

MFC after:	2 weeks
Sponsored by:	iXsystems, Inc.
2015-08-05 13:46:15 +00:00
Ed Schouten
2412ae2b8e Regenerate the system call table.
2015-08-05 13:10:13 +00:00
Ed Schouten
2837d9ed43 Import the latest CloudABI system call definitions and table.
We're going to need these for the next code I'm going to send out for
review: support for poll() and kqueue() on CloudABI.
2015-08-05 13:09:46 +00:00
Konstantin Belousov
c8fbdcc10d Fix UP build after r286296, ensure that CPU_FOREACH() is defined.
Sponsored by:	The FreeBSD Foundation
2015-08-05 10:50:33 +00:00
Jason A. Harmening
0a3e154709 Properly sort the function declarations added in r286296
Submitted by:	alc
Approved by:	kib (mentor)
2015-08-05 10:48:32 +00:00
Ed Schouten
db1c8ee585 Add the remaining pointer size independent CloudABI socket system calls.
CloudABI uses a structure called cloudabi_sockstat_t. Think of it as
'struct stat' for sockets. It is used by functions such as
getsockname(), getpeername(), some of the getsockopt() values, etc.

This change implements the sock_stat_get() system call that returns a
copy of this structure. The accept() system call should also return a
full copy of this structure eventually, but for now we're only
interested in the peer address. Add a TODO() to make sure this is
patched up later on.

Differential Revision:	https://reviews.freebsd.org/D3218
2015-08-05 08:18:05 +00:00
Ed Schouten
4958fab8cd Allow the creation of polling descriptors (kqueues) on CloudABI.
2015-08-05 07:37:06 +00:00
Ed Schouten
a2034cc98a Allow the creation of kqueues with a restricted set of Capsicum rights.
On CloudABI we want to create file descriptors with just the minimal set
of Capsicum rights in place. The reason for this is that it makes it
easier to obtain uniform behaviour across different operating systems.

By explicitly whitelisting the operations, we can return consistent
error codes, but also prevent applications from depending on OS-specific
behaviour.

Extend kern_kqueue() to take an additional struct filecaps that is
passed on to falloc_caps(). Update the existing consumers to pass in
NULL.

Differential Revision:	https://reviews.freebsd.org/D3259
2015-08-05 07:36:50 +00:00
Ed Schouten
2433a4eb04 Make it possible to implement poll(2) on top of kqueue(2).
It looks like EVFILT_READ and EVFILT_WRITE trigger under the same
conditions as poll()'s POLLRDNORM and POLLWRNORM as described by POSIX.
The only difference is that POLLRDNORM has to be triggered on regular
files unconditionally, whereas EVFILT_READ only triggers when not EOF.

Introduce a new flag, NOTE_FILE_POLL, that can be used to make
EVFILT_READ and EVFILT_WRITE behave identically to poll(). This flag
will be used by cloudlibc's poll() function.

Reviewed by:	jmg
Differential Revision:	https://reviews.freebsd.org/D3303
2015-08-05 07:34:29 +00:00
Justin Hibbits
daebf39a41 Remove one more that crept in unnecessarily from previous commit.
2015-08-05 01:52:52 +00:00
Justin Hibbits
6ee5cf50ee Remove some unnecessary includes.
2015-08-05 01:52:11 +00:00
Jason A. Harmening
713841afb2 Add two new pmap functions:
vm_offset_t pmap_quick_enter_page(vm_page_t m)
void pmap_quick_remove_page(vm_offset_t kva)

These will create and destroy a temporary, CPU-local KVA mapping of a specified page.

Guarantees:
--Will not sleep and will not fail.
--Safe to call under a non-sleepable lock or from an ithread

Restrictions:
--Not guaranteed to be safe to call from an interrupt filter or under a spin mutex on all platforms
--Current implementation does not guarantee more than one page of mapping space across all platforms. MI code should not make nested calls to pmap_quick_enter_page.
--MI code should not perform locking while holding onto a mapping created by pmap_quick_enter_page

The idea is to use this in busdma, for bounce buffer copies as well as virtually-indexed cache maintenance on mips and arm.

NOTE: the non-i386, non-amd64 implementations of these functions still need review and testing.

Reviewed by:	kib
Approved by:	kib (mentor)
Differential Revision:	http://reviews.freebsd.org/D3013
2015-08-04 19:46:13 +00:00
Rui Paulo
7b80d5ad13 BEAGLEBONE: remove dtrace from MODULES_EXTRA.
This config is already building all modules, so we don't need the
MODULES_EXTRA definition.  It was also causing problems to users who
rely on MODULES_OVERRIDE to do the right thing.

Discussed with:	ian
2015-08-04 19:04:02 +00:00
Jung-uk Kim
ca23ca33f2 Fix style(9) bugs.
2015-08-04 18:59:54 +00:00
John-Mark Gurney
a2bc81bf7c Make IPsec work with AES-GCM and AES-ICM (aka CTR) in OCF... IPsec
defines the keys differently than NIST does, so we have to muck with
key lengths and nonce/IVs to be standard compliant...

Remove the iv from secasvar as it was unused...

Add a counter protected by a mutex to ensure that the counter for GCM
and ICM will never be repeated..  This is a requirement for security..
I would use atomics, but we don't have a 64bit one on all platforms..

Fix a bug where IPsec was depending upon the OCF to ensure that the
blocksize was always at least 4 bytes to maintain alignment... Move
this logic into IPsec so changes to OCF won't break IPsec...

In one place, espx was always non-NULL, so don't test that it's
non-NULL before doing work..

minor style cleanups...

drop setting key and klen as they were not used...

Enforce that OCF won't pass invalid key lengths to AES that would
panic the machine...

This has been tested by others too...  I tested this against
NetBSD 6.1.5 using mini-test suite in
https://github.com/jmgurney/ipseccfgs and the only things that don't
pass are keyed md5 and sha1, and 3des-deriv (setkey syntax error),
all other modes listed in setkey's man page...  The nice thing is
that NetBSD uses setkey, so same config files were used on both...

Reviewed by:	gnn
2015-08-04 17:47:11 +00:00
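
The "counter protected by a mutex" idea in the commit above can be sketched in userland C. This is an illustration under stated assumptions, not the IPsec code: `struct sa_sketch`, `sa_init` and `sa_next_counter` are hypothetical names, and a pthread mutex stands in for the kernel mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

/* Hypothetical per-association state; names are illustrative only. */
struct sa_sketch {
	pthread_mutex_t	lock;	/* serializes access to cntr */
	uint64_t	cntr;	/* next counter value to hand out */
};

static void
sa_init(struct sa_sketch *sa)
{
	pthread_mutex_init(&sa->lock, NULL);
	sa->cntr = 0;
}

/*
 * Hand out a counter value that no other caller receives.  The read and
 * the increment happen under the lock, so two threads can never observe
 * the same value even without a 64-bit atomic.
 */
static uint64_t
sa_next_counter(struct sa_sketch *sa)
{
	uint64_t v;

	pthread_mutex_lock(&sa->lock);
	/* Refuse to wrap around: a wrapped counter would repeat values. */
	assert(sa->cntr != UINT64_MAX);
	v = sa->cntr++;
	pthread_mutex_unlock(&sa->lock);
	return (v);
}
```

The mutex is used here for the reason the commit message gives: a 64-bit atomic increment is not available on every platform, while a lock around a plain 64-bit integer is.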
Konstantin Belousov
3208d3ff46 Give a large kernel stack to the initial thread. Otherwise, ZFS
overflows the stack during root mount in some configurations.

Tested by:	Fabian Keil <freebsd-listen@fabiankeil.de> (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 13:50:52 +00:00
Konstantin Belousov
35dfc644f5 Copy the fencing of the algorithm to do lock-less update and reading
of the timehands, from the kern_tc.c implementation to vdso.  Add
comments giving hints where to look for the algorithm explanation.

To compensate for the removal of rmb() in userspace binuptime(), add an
explicit lfence instruction before rdtsc.  On i386, add the usual
complications to detect SSE2 presence; assume that old CPUs which do
not implement SSE2 also execute rdtsc almost in order.

Reviewed by:	alc, bde (previous version)
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-04 12:33:51 +00:00
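
The lock-less update and read that the commit above refers to can be illustrated with a generation counter. The following is a minimal C11 sketch, assuming a scheme in which generation zero marks an update in progress and there is a single writer. The type `struct th_sketch` and the functions `th_init`, `th_write` and `th_read` are hypothetical and much simpler than the real timehands code in kern_tc.c.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical, simplified timehands-like record. */
struct th_sketch {
	atomic_uint	 gen;		/* 0 means "update in progress" */
	_Atomic uint64_t offset;	/* payload */
	_Atomic uint64_t scale;		/* payload */
};

static void
th_init(struct th_sketch *th)
{
	atomic_init(&th->gen, 1);
	atomic_init(&th->offset, 0);
	atomic_init(&th->scale, 0);
}

/* Single writer: invalidate, update the payload, publish a new generation. */
static void
th_write(struct th_sketch *th, uint64_t offset, uint64_t scale)
{
	unsigned ogen = atomic_load_explicit(&th->gen, memory_order_relaxed);

	assert(ogen != 0);	/* no other update may be in progress */
	atomic_store_explicit(&th->gen, 0, memory_order_relaxed);
	/* Order the invalidation before the payload stores. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&th->offset, offset, memory_order_relaxed);
	atomic_store_explicit(&th->scale, scale, memory_order_relaxed);
	if (++ogen == 0)	/* skip the reserved value on wraparound */
		ogen = 1;
	atomic_store_explicit(&th->gen, ogen, memory_order_release);
}

/* Reader: retry until the generation is nonzero and unchanged. */
static void
th_read(struct th_sketch *th, uint64_t *offset, uint64_t *scale)
{
	unsigned gen;

	do {
		gen = atomic_load_explicit(&th->gen, memory_order_acquire);
		*offset = atomic_load_explicit(&th->offset,
		    memory_order_relaxed);
		*scale = atomic_load_explicit(&th->scale,
		    memory_order_relaxed);
		/* Order the payload loads before the generation re-check. */
		atomic_thread_fence(memory_order_acquire);
	} while (gen == 0 ||
	    gen != atomic_load_explicit(&th->gen, memory_order_relaxed));
}
```

The fences are the part that matters for this commit: without them, the payload loads could be reordered around the generation checks, and a reader could accept a half-updated record.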
Edward Tomasz Napierala
72800098bf Fix panic triggered by code like this:
open("/dev/md0", O_EXEC);

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3051
2015-08-04 10:40:08 +00:00
Hans Petter Selasky
5884383f19 Avoid calling into the random subsystem before it is initialized.
Sponsored by:	Mellanox Technologies
2015-08-04 09:45:10 +00:00
Edward Tomasz Napierala
57a73b26e0 Mark vgonel() as static. It was already declared static earlier;
no idea why compilers don't warn about this.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
2015-08-04 08:51:56 +00:00
Ed Schouten
0c0964844e Let the CloudABI futex code use umtx_keys.
The CloudABI kernel still passes all of the cloudlibc unit tests.

Reviewed by:	vangyzen
Differential Revision:	https://reviews.freebsd.org/D3286
2015-08-04 06:02:03 +00:00
Ed Schouten
dc4b532479 Fix bad arithmetic in umtx_key_get() to compute object offset.
It looks like umtx_key_get() has the addition and subtraction the wrong
way around, meaning that it fails to match in certain cases. This causes
the cloudlibc unit tests to deadlock in certain cases.

Reviewed by:	kib
Differential Revision:	https://reviews.freebsd.org/D3287
2015-08-04 06:01:13 +00:00
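
The arithmetic being fixed in the commit above can be illustrated with a minimal sketch, assuming a simplified map entry. `struct map_entry` and `object_offset` are hypothetical names, not the kernel's: the offset of an address within the backing object is the entry's object offset plus the distance of the address from the start of the entry.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a VM map entry. */
struct map_entry {
	uintptr_t start;	/* first virtual address covered */
	uintptr_t end;		/* one past the last address covered */
	uint64_t  offset;	/* offset of 'start' in the backing object */
};

/*
 * Translate a virtual address into an offset within the backing object:
 * add the distance from the start of the mapping to the entry's offset.
 */
static uint64_t
object_offset(const struct map_entry *e, uintptr_t addr)
{
	assert(addr >= e->start && addr < e->end);
	return (e->offset + (uint64_t)(addr - e->start));
}
```

With the operations in this order, two mappings of the same object at different virtual addresses yield the same offset for the same underlying location, so keys derived from that offset can match. Swapping the addition and subtraction would make the result depend on where each mapping happens to start.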
Jung-uk Kim
45c2c9a84a Always define __va_list for amd64 and restore pre-r232261 behavior for i386.
Note it allows exotic compilers, e.g., TCC, to build with our stdio.h, etc.

PR:		201749
MFC after:	1 week
2015-08-04 00:11:39 +00:00
Luiz Otavio O Souza
9224217213 Remove the mtx_sleep() from the kqueue f_event filter.
The filter is called from the network hot path and must not sleep.

The filter runs with the descriptor lock held and does not manipulate the
buffers, so it is not necessary to sleep when the hold buffer is in use.

Just ignore the hold buffer contents when it is being copied to user space
(when hold buffer in use is set).

This fixes the "Sleeping thread owns a non-sleepable lock" panic when the
userland thread is too busy reading the packets from bpf(4).

PR:		200323
MFC after:	2 weeks
Sponsored by:	Rubicon Communications (Netgate)
2015-08-03 22:14:45 +00:00
Ed Schouten
52942c1eae Add missing const keyword to function parameter.
The umtx_key_get() function does not dereference the address of the
userspace object. The pointer can safely be const.
2015-08-03 21:11:33 +00:00
John Baldwin
92de34df2c kgdb uses td_oncpu to determine if a thread is running and should use
a pcb from stoppcbs[] rather than the thread's PCB.  However, exited threads
retained td_oncpu from the last time they ran, and newborn threads had their
CPU fields cleared to zero during fork and thread creation since they are
in the set of fields zeroed when threads are setup.  To fix, explicitly
update the CPU fields for exiting threads in sched_throw() to reflect the
switch out and reset the CPU fields for new threads in sched_fork_thread()
to NOCPU.

Reviewed by:	kib
MFC after:	1 week
Differential Revision:	https://reviews.freebsd.org/D3193
2015-08-03 20:43:36 +00:00
Alan Cox
d8015db3b5 Refinements to r281079's sequential access optimization: Prefetched pages,
which constitute the majority of the pages that are processed by
vm_fault_dontneed(), are already near the tail of the inactive queue.  Only
the pages at faulting virtual addresses are actually moved by
vm_page_advise(..., MADV_DONTNEED).  However, vm_page_advise(...,
MADV_DONTNEED) is simultaneously too aggressive and passive for the moved
pages.  It makes most of these pages too easily reclaimable, and at the same
time it leaves enough pages in the active queue to trigger pageouts by the
page daemon.  Instead, with this change, the pages at faulting virtual
addresses are moved to the tail of the inactive queue, where they are
relatively close to the pages prefetched by the same page fault.

Discussed with:	jeff
Sponsored by:	EMC / Isilon Storage Division
2015-08-03 20:30:27 +00:00
Luiz Otavio O Souza
98fa5d858c Add a KASSERT() to make sure we won't rotate the buffers twice (rotate the
buffers while the hold buffer is in use).

Suggested by:	ed, ghelmer
MFC with:	r286142
2015-08-03 18:22:31 +00:00
Mark Johnston
8f980c016b The mbuf parameter to ip_output_pfil() must be an output parameter since
pfil(9) hooks may modify the chain.

X-MFC-With:	r286028
2015-08-03 17:47:02 +00:00
Mark Johnston
1c9a705223 Remove a couple of unused fields from the FBT probe struct.
2015-08-03 17:39:36 +00:00
Sean Bruno
e5cb6169d3 A misplaced #endif in ixgbe_ioctl() causes interface MTU to become
zero when INET and INET6 are undefined.

PR:		162028
Differential Revision:	https://reviews.freebsd.org/D3187
Submitted by:	hoomanfazaeli@gmail.com pluknet
Reviewed by:	erj hiren gelbius
MFC after:	2 weeks
2015-08-03 16:39:25 +00:00
Edward Tomasz Napierala
d6cc35b287 Fix panic that would happen on forcibly unmounting devfs (note that
as it is now, devfs ignores MNT_FORCE anyway, so it needs to be modified
to trigger the panic) with consumers still opened.

Note that this still results in a leak of r/w/e counters.  It seems
to be harmless, though.  If anyone knows a better way to approach
this - please tell.

Discussed with:	kib@, mav@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3050
2015-08-03 16:35:18 +00:00
Edward Tomasz Napierala
8d9ed17366 Fix a problem which made loader(8) load non-kld files twice.
For example, without this patch, the following three lines
in /boot/loader.conf would result in /boot/root.img being preloaded
twice, and two md(4) devices - md0 and md1 - being created.

initmd_load="YES"
initmd_type="md_image"
initmd_name="/boot/root.img"

Reviewed by:	marcel@
MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3204
2015-08-03 16:27:36 +00:00
Zbigniew Bodek
4cbca60875 Add missing exception number to EL0 sync. abort on ARM64
When doing a data abort from userland it is possible to get
more than one data abort inside the same exception level.
Add an appropriate exception number to allow nesting of
data_abort handler for EL0.

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3276
2015-08-03 14:58:46 +00:00
Warner Losh
75333e6435 Add pmspvc device back to GENERIC. The issues with the device playing
grabby hands with other drivers' devices have been solved.

MFC After: 3 weeks
2015-08-03 13:49:46 +00:00
Ed Schouten
ee95773383 Let CloudABI use the SV_CAPSICUM flag.
CloudABI processes will now start up in capabilities mode.

Reviewed by:	kib
2015-08-03 13:42:52 +00:00
Ed Schouten
39f5ebb774 Add sysent flag to switch to capabilities mode on startup.
CloudABI processes should run in capabilities mode automatically. There
is no need to switch manually (e.g., by calling cap_enter()). Add a
flag, SV_CAPSICUM, that can be used to call into cap_enter() during
execve().

Reviewed by:	kib
2015-08-03 13:41:47 +00:00
Konstantin Belousov
f94cc23475 Clear the IA32_MISC_ENABLE MSR bit, which limits the max CPUID
reported, on APs.  We already did this on BSP.

Otherwise, the userspace software which depends on the features
reported by the high CPUID levels is misbehaving.  In particular, AVX
detection is non-functional, depending on which CPU thread happens to
execute when doing CPUID.  Another victim is the libthr signal
handlers interposer, which needs to save full FPU extended state.

Reported and tested by:	Andre Meiser <ortadur@web.de>
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-08-03 12:14:42 +00:00
Julien Charbon
ff9b006d61 Decompose TCP INP_INFO lock to increase short-lived TCP connections scalability:
- The existing TCP INP_INFO lock continues to protect the global inpcb list
  stability during full list traversal (e.g. tcp_pcblist()).

- A new INP_LIST lock protects inpcb list actual modifications (inp allocation
  and free) and inpcb global counters.

This allows the use of the TCP INP_INFO_RLOCK lock in critical paths (e.g. tcp_input())
and INP_INFO_WLOCK only in occasional operations that walk all connections.

PR:			183659
Differential Revision:	https://reviews.freebsd.org/D2599
Reviewed by:		jhb, adrian
Tested by:		adrian, nitroboost-gmail.com
Sponsored by:		Verisign, Inc.
2015-08-03 12:13:54 +00:00
Edward Tomasz Napierala
e553ca4994 Rework the way iSCSI initiator handles system shutdown. This fixes
hangs on shutdown with LUNs with mounted filesystems over a disconnected
iSCSI session.

MFC after:	1 month
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3052
2015-08-03 11:57:11 +00:00
Andrew Turner
f692e32555 Pass the pcb in which to store the vfp state to vfp_save_state. This fixes
a bug in savectx, which uses vfp_save_state to store the current state but
passed in a pcb when vfp_save_state expected a thread pointer.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-08-03 11:05:02 +00:00
Steven Hartland
ebbc56ecd6 Fix KSTACK_PAGES check in ZFS module
The check introduced by r285946 failed to add the dependency on
opt_kstack_pages.h which meant the default value for the platform instead
of the customised options KSTACK_PAGES=X was being tested.

Also wrap in #ifdef __FreeBSD__ for portability.

MFC after:	3 days
Sponsored by:	Multiplay
2015-08-03 09:34:09 +00:00
Ed Schouten
75c9f22394 Set p_osrel to __FreeBSD_version on process startup.
Certain system calls have quirks applied to make them work as if called
on an older version of FreeBSD. As CloudABI executables don't have the
FreeBSD OS release number in the ELF header, this value is set to zero,
making the system calls fall back to typically historic, non-standard
behaviour.

Reviewed by:	kib
2015-08-03 07:29:57 +00:00
Oleksandr Tymoshenko
7b25d1d63b Pass correct type of argument to ti_gpio_unmask_irq in ti_gpio_activate_resource
2015-08-03 01:22:49 +00:00
John-Mark Gurney
bba6880eab looks like all archs either have clang or cdefs included before..
drop this include as unnecessary..

Requested by:	bde
2015-08-02 21:33:40 +00:00
Warner Losh
123049cf36 Don't forget to check the vendor when probing. Also, there's no need
to double check whether the card has probed before. In fact, there's no
reason to single check either. Simplify the code as a result.
$FreeBSD$ added to lxutil.c in a non-standard way to help keep the
diffs with upstream to a minimum.

Differential Revision: https://reviews.freebsd.org/D3263
2015-08-02 16:26:41 +00:00