201974 Commits

Author SHA1 Message Date
Ed Schouten
2491302a04 Add implementations for some of the CloudABI file descriptor system calls.
All of the CloudABI system calls that operate on file descriptors of an
arbitrary type are prefixed with fd_. This change adds wrappers for
most of these system calls around their FreeBSD equivalents.

The dup2() system call present on CloudABI deviates from POSIX, in the
sense that it can only be used to replace existing file descriptor. It
cannot be used to create new ones. The reason for this is that this is
inherently thread-unsafe. Furthermore, there is no need on CloudABI to
use fixed file descriptor numbers. File descriptors 0, 1 and 2 have no
special meaning.

This change exposes the kern_dup() through <sys/syscallsubr.h> and puts
the FDDUP_* flags in <sys/filedesc.h>. It then adds a new flag,
FDDUP_MUSTREPLACE to force that file descriptors are replaced -- not
allocated.

Differential Revision:	https://reviews.freebsd.org/D3035
Reviewed by:	mjg
2015-07-09 16:07:01 +00:00
Mateusz Guzik
efdc25304c fd: prepare do_dup for being exported
- rename it to kern_dup.
- prefix flags with FD
- assert that correct flags were passed
2015-07-09 15:19:45 +00:00
Mateusz Guzik
d19ba50e12 vfs: avoid spurious vref/vrele for absolute lookups
namei used to vref fd_cdir, which was immediatley vrele'd on entry to
the loop.

Check for absolute lookup and vref the right vnode the first time.

Reviewed by:	kib
2015-07-09 15:06:58 +00:00
Mateusz Guzik
a03f1b2970 vfs: plug a use-after-free of fd_rdir in namei
fd_rdir vnode was stored in ni_rootdir without refing it in any way,
after which the filedsc lock was being dropped.

The vnode could have been freed by mountcheckdirs or another thread doing
chroot.

VREF the vnode while the lock is held.

Reviewed by:	kib
MFC after:	1 week
2015-07-09 15:06:24 +00:00
Baptiste Daroussin
0fc58d1446 Do not try to set password on group if the group is added as a consequence of
of creating a user (regression from r285136)

Reported by:	Fabian Keil <fk@fabiankeil.de>
2015-07-09 14:14:44 +00:00
Andrew Turner
b2b5507779 Add support for SMP. This uses the FDT data to find the CPUs to start on,
and psci to start them. I expect ACPI support to be added later.

This has been tested on qemu with 2 cpus as that is the current value of
MAXCPUS. This is expected to be increased in the future as FreeBSD has
been tested on 48 cores on the Cavium ThunderX hardware.

Partially based on a patch from Robin Randhawa from ARM.

Approved by:	ABT Systems Ltd
Relnotes:	yes
Sponsored by:	The FreeBSD Foundation
Differential Revision:	https://reviews.freebsd.org/D3024
2015-07-09 13:23:29 +00:00
Andrew Turner
3ad7e84ef5 Add logging of synchronous exceptions.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-09 13:07:12 +00:00
Andrew Turner
7df38eabdc Add the definition of the shareable bits in the pagetables
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-09 12:56:09 +00:00
Andrew Turner
144aa0b7f5 Clean up the types used in <machine/ucontext.h> on arm64. As some ports
include this file without first including the headers needed for uint32_t
and the like use the __foo type.

Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-09 12:51:50 +00:00
Ed Schouten
3a41ec6af7 Don't clobber td->td_retval[0] in proc_reap().
While writing tests for CloudABI, I noticed that close() on process
descriptors returns the process ID of the child process. This is
interesting, as close() is only allowed to return 0 or -1. It turns out
that we clobber td->td_retval[0] in proc_reap(), so that wait*()
properly returns the process ID.

Change proc_reap() to leave td->td_retval[0] alone. Set the return value
in kern_wait6() instead, by keeping track of the PID before we
(potentially) reap the process.

Differential Revision:	https://reviews.freebsd.org/D3032
Reviewed by:	kib
2015-07-09 12:04:45 +00:00
Zbigniew Bodek
6c03ba71f8 Rework CPU identification on ARM64
This commit reworks the code responsible for identification of
the CPUs during runtime.
It is necessary to provide a way for workarounds and erratums
to be applied only for certain HW versions.

The copy of MIDR is now stored in pcpu to provide a fast and
convenient way for assambly code to read it (pcpu is used quite often
so there is a chance it's inside the cache).
The MIDR is also better way of identification than using user-friendly
cpu_desc structure, because it can be compiled into comparision of
single u32 with only one access to the memory - this is crucial
for some erratums which are called from performance-critical
places.

Changes in cpu_identify makes this function safe to be called
on non-boot CPUs.

New function CPU_MATCH was implemented which returns boolean
value based on mathing masked MIDR with chip identification.
Example of usage:

printf("is thunder: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
        CPU_IMPL_CAVIUM, CPU_PART_THUNDER, 0, 0));
printf("is generic: %d\n", CPU_MATCH(CPU_IMPL_MASK | CPU_PART_MASK,
        CPU_IMPL_ARM, CPU_PART_FOUNDATION, 0, 0));

Reviewed by:   andrew
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3030
2015-07-09 11:32:29 +00:00
Konstantin Belousov
fcb5b3a419 Cover a race between doselwakeup() and selfdfree(). If doselwakeup()
loop finds the selfd entry and clears its sf_si pointer, which is
handled by selfdfree() in parallel, NULL sf_si makes selfdfree() free
the memory.  The result is the race and accesses to the freed memory.

Refcount the selfd ownership.  One reference is for the sf_link
linkage, which is unconditionally dereferenced by selfdfree().
Another reference is for sf_threads, both selfdfree() and
doselwakeup() race to deref it, the winner unlinks and than frees the
selfd entry.

Reported by:	Larry Rosenman <ler@lerctr.org>
Tested by:	Larry Rosenman <ler@lerctr.org>, pho
Sponsored by:	The FreeBSD Foundation
MFC after:	2 weeks
2015-07-09 09:22:21 +00:00
Ed Schouten
39b160b3e2 Add forward declaration of struct thread.
This structure is used in some of the functions in this header, but we
don't depend on any header that pulls it i.
2015-07-09 07:31:40 +00:00
Ed Schouten
f355e810cf Generate CloudABI system call table with proper $FreeBSD$ tags. 2015-07-09 07:21:33 +00:00
Ed Schouten
6d338f9a81 Import the CloudABI datatypes and create a system call table.
CloudABI is a pure capability-based runtime environment for UNIX. It
works similar to Capsicum, except that processes already run in
capabilities mode on startup. All functionality that conflicts with this
model has been omitted, making it a compact binary interface that can be
supported by other operating systems without too much effort.

CloudABI is 'secure by default'; the idea is that it should be safe to
run arbitrary third-party binaries without requiring any explicit
hardware virtualization (Bhyve) or namespace virtualization (Jails). The
rights of an application are purely determined by the set of file
descriptors that you grant it on startup.

The datatypes and constants used by CloudABI's C library (cloudlibc) are
defined in separate files called syscalldefs_mi.h (pointer size
independent) and syscalldefs_md.h (pointer size dependent). We import
these files in sys/contrib/cloudabi and wrap around them in
cloudabi*_syscalldefs.h.

We then add stubs for all of the system calls in sys/compat/cloudabi or
sys/compat/cloudabi64, depending on whether the system call depends on
the pointer size. We only have nine system calls that depend on the
pointer size. If we ever want to support 32-bit binaries, we can simply
add sys/compat/cloudabi32 and implement these nine system calls again.

The next step is to send in code reviews for the individual system call
implementations, but also add a sysentvec, to allow CloudABI executabled
to be started through execve().

More information about CloudABI:
- GitHub: https://github.com/NuxiNL/cloudlibc
- Talk at BSDCan: https://www.youtube.com/watch?v=SVdF84x1EdA

Differential Revision:	https://reviews.freebsd.org/D2848
Reviewed by:	emaste, brooks
Obtained from:	https://github.com/NuxiNL/freebsd
2015-07-09 07:20:15 +00:00
Patrick Kelsey
358fd7dabb MFV r285292:
Merge upstream fix to eliminate build-breaking gcc warnings of no
importance.

commit: cab33b7a0acba7d2268a23c4383be6167106e549

Update ND_TTEST2 to fix issue 443

Add IS_NOT_NEGATIVE macro.
Avoid these warnings:
- comparison of unsigned expression >= 0 is always true [-Wtype-limits],
- comparison is always true due to limited range of data type [-Wtype-limits].

Reviewed by: adrian
Approved by: jmallett (mentor)
MFC after: 1 month
2015-07-08 23:57:58 +00:00
John-Mark Gurney
275a0a97ed upon further examination, it turns out that _unregister_all already
provides the guarantee that no threads will be in the _newsession code..
This is provided by the CRYPTODRIVER lock...  This makes the pause
unneeded...
2015-07-08 22:48:41 +00:00
John-Mark Gurney
94b591186d yet more documentation improvements... Many changes were made to the
OCF w/o documentation...

Document the new (8+ year old) device_t way of handling things, that
_unregister_all will leave no threads in newsession, the _SYNC flag,
the requirement that a flag be specified...

Other minor changes like breaking up a wall of text into paragraphs...
2015-07-08 22:46:45 +00:00
Baptiste Daroussin
f9b9c07087 Fix typo which breaks build of manpages when WITHOUT_MANCOMPRESS is set
PR:		201153
Reported by:	Andriy Voskoboinyk <s3erios@gmail.com>
2015-07-08 22:24:55 +00:00
Mateusz Guzik
06d1ada870 seq: use seq_consistent_nomb in seq_consistent
Constify seqp argument for seq_consistent_nomb.

No functional changes.
2015-07-08 22:21:25 +00:00
Zbigniew Bodek
0a3f65a107 Style cleanups after r285270
There should be no semicolons in added macro definitions.
Define empty macro as "do {} while (0)".

Pointed out by: jmg
2015-07-08 22:09:47 +00:00
Patrick Kelsey
1bf4ba1024 Merge upstream fix to eliminate build-breaking gcc warnings of no
importance.

commit: cab33b7a0acba7d2268a23c4383be6167106e549

Update ND_TTEST2 to fix issue 443

Add IS_NOT_NEGATIVE macro.
Avoid these warnings:
- comparison of unsigned expression >= 0 is always true [-Wtype-limits],
- comparison is always true due to limited range of data type [-Wtype-limits].

Approved by: jmallett (mentor)
2015-07-08 21:32:57 +00:00
John-Mark Gurney
e808e13b8b Now that aesni won't reuse fpu contexts (D3016), add seatbelts to the
fpu code to prevent other reuse of the contexts in the future...

Differential Revision:        https://reviews.freebsd.org/D3015
Reviewed by:	kib, gnn
2015-07-08 19:26:36 +00:00
John-Mark Gurney
9d38fd076e address an issue where consumers, like IPsec, can reuse the same
session in multiple threads w/o locking..  There was a single fpu
context shared per session, if multiple threads were using the session,
and both migrated away, they could corrupt each other's fpu context...

This patch adds a per cpu context and a lock to protect it...

It also tries to better address unloading of the aesni module...
The pause will be removed once the OpenCrypto Framework provides a
better method for draining callers into _newsession...

I first discovered the fpu context sharing issue w/ a flood ping over
an IPsec tunnel between two bhyve machines...  The patch in D3015
was used to verify that this fix does fix the issue...

Reviewed by:	gnn, kib (both earlier versions)
Differential Revision:        https://reviews.freebsd.org/D3016
2015-07-08 19:15:29 +00:00
Mark Murray
4cbf30133e Address review.
Differential Revision: https://reviews.freebsd.org/D2924
2015-07-08 18:46:44 +00:00
Konstantin Belousov
f4b5a9725a Reimplement the ordering requirements for the timehands updates, and
for timehands consumers, by using fences.

Ensure that the timehands->th_generation reset to zero is visible
before the data update is visible [*].  tc_setget() allowed data update
writes to become visible before generation (but not on TSO
architectures).

Remove tc_setgen(), tc_getgen() helpers, use atomics inline [**].

Noted by:	alc [*]
Requested by:	bde [**]
Reviewed by:	alc, bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-07-08 18:42:08 +00:00
Konstantin Belousov
261fda00cd Use atomic_fence_fence_rel() to ensure ordering in the
seq_write_begin(), instead of the load_rmb/rbm_load functions.  The
update does not need to be atomic due to the write lock owned.

Similarly, in seq_write_end(), update of *seqp needs not be atomic.
Only store must be atomic with release.

For seq_read(), the natural operation is the load acquire of the
sequence value, express this directly with atomic_load_acq_int()
instead of using custom partial fence implementation
atomic_load_rmb_int().

In seq_consistent, use atomic_thread_fence_acq() which provides the
desired semantic of ordering reads before fence before the re-reading
of *seqp, instead of custom atomic_rmb_load_int().

Reviewed by:	alc, bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-07-08 18:37:08 +00:00
Luigi Rizzo
b775c213c2 only enable immintrin when clang is used. The base gcc does not support it.
Reviewed by:	delphij
2015-07-08 18:36:37 +00:00
Konstantin Belousov
8954a9a4e6 Add the atomic_thread_fence() family of functions with intent to
provide a semantic defined by the C11 fences with corresponding
memory_order.

atomic_thread_fence_acq() gives r | r, w, where r and w are read and
write accesses, and | denotes the fence itself.

atomic_thread_fence_rel() is r, w | w.

atomic_thread_fence_acq_rel() is the combination of the acquire and
release in single operation.  Note that reads after the acq+rel fence
could be made visible before writes preceeding the fence.

atomic_thread_fence_seq_cst() orders all accesses before/after the
fence, and the fence itself is globally ordered against other
sequentially consistent atomic operations.

Reviewed by:	alc
Discussed with:	bde
Sponsored by:	The FreeBSD Foundation
MFC after:	3 weeks
2015-07-08 18:12:24 +00:00
Alan Cox
22cf98d1f3 The intention of r254304 was to scan the active queue continuously.
However, I've observed the active queue scan stopping when there are
frequent free page shortages and the inactive queue is steadily refilled
by other mechanisms, such as the sequential access heuristic in vm_fault()
or madvise(2).  To remedy this problem, record the time of the last active
queue scan, and always scan a number of pages proportional to the time
since the last scan, regardless of whether that last scan was a
timeout-triggered ("pass == 0") or free-page-shortage-triggered ("pass >
0") scan.

Also, on a timeout-triggered scan, allow a full scan of the active queue
when the system is short of inactive pages.

Reviewed by:	kib
MFC after:	6 weeks
Sponsored by:	EMC / Isilon Storage Division
2015-07-08 17:45:59 +00:00
Luigi Rizzo
613ab60283 add an extra tty for picobsd builds 2015-07-08 16:42:28 +00:00
Luigi Rizzo
e4405be58b trap some errors when building picobsd 2015-07-08 16:41:25 +00:00
Hiroki Sato
64bb8a3881 Implement PF_IMMUTABLE flag and apply it to "name" and "jid" in
jail.conf parameters.  This flag disallows redefinition of the parameter.

"name" and/or "jid" are automatically defined in jail.conf by using
the jail names at the front of jail parameter definitions.  However,
one could override them by using a variable with the same name like
$name = "foo".  This confused the parser and could end up with SIGSEGV.

Note that this change also affects a case when all of parameters are
defined in the command line arguments, not in jail.conf.  Specifically,
"jail -c name=j1 name=j2" no longer works.  This should be harmless.

PR:		196574
Reviewed by:	jamie
Differential Revision:	https://reviews.freebsd.org/D3017
2015-07-08 16:37:48 +00:00
Pedro F. Giffuni
850c6f5fd2 cosmetic: whitespaces-tab before EOL
Obtained from:	cpi-llvm project
2015-07-08 16:35:24 +00:00
Pedro F. Giffuni
5d0bef91b7 Use the __sentinel attribute.
Start using the gcc sentinel attribute, which can be used to
mark varargs function that need a NULL pointer to mark argument
termination, like execl(3).

Relnotes:	yes
2015-07-08 16:21:10 +00:00
Patrick Kelsey
8bdc5a6251 MFV r285191: tcpdump 4.7.4.
Also, the changes made in r272451 and r272653 that were lost in the
merge of 4.6.2 (r276788) have been restored.

PR: 199568
Differential Revision: https://reviews.freebsd.org/D3007
Reviewed by: brooks, hiren
Approved by: jmallett (mentor)
MFC after: 1 month
2015-07-08 16:19:32 +00:00
Andrew Turner
6bae05d951 Correctly set __WCHAR_MIN, there is no __UINT_MIN, it's 0.
Sponsored by:	ABT Systems Ltd
2015-07-08 16:18:28 +00:00
Patrick Kelsey
fe3ff217dd Replace use of .Po Pc with the preferred .Pq for single line
enclosures in iovctl.conf(5), iovctl(8), pci(9), and
pci_iov_schema(9).

Differential Revision: https://reviews.freebsd.org/D3000
Reviewed by: wblock
Approved by: jmallett (mentor)
2015-07-08 16:16:44 +00:00
Andrew Turner
ded32d88f1 Add support for ipi_all_but_self on arm64.
Obtained from:	ABT Systems Ltd
Sponsored by:	The freeBSD Foundation
2015-07-08 15:32:59 +00:00
Andrew Turner
80ad08a3e9 Add an implementation of savectx that doesn't just call panic.
Obtained from:	ABT Systems Ltd
Sponsored by:	The FreeBSD Foundation
2015-07-08 14:07:06 +00:00
Zbigniew Bodek
4981c60e07 Add memory barrier to bus_dmamap_sync()
On platforms which are fully IO-coherent, the map might be null.
We need to guarantee that all data is observable after the
sync operation is called. Add a memory barrier to ensure that on ARM.

Reviewed by:   andrew, kib
Obtained from: Semihalf
Sponsored by:  The FreeBSD Foundation
Differential Revision: https://reviews.freebsd.org/D3012
2015-07-08 13:52:59 +00:00
Konstantin Belousov
69d11def74 Handle copyout for the fcntl(F_OGETLK) using oflock structure.
Otherwise, kernel overwrites a word past the destination.

Submitted by:	walter@pelissero.de
PR:	196718
MFC after:	1 week
2015-07-08 13:19:13 +00:00
Andrew Turner
cb02f6b942 Send the correct signal when vm_fault fails. While here also set the code
and address fields.

Sponsored by:	ABT Systems Ltd
2015-07-08 12:42:44 +00:00
Glen Barber
e90b4979bb Document r283961, pw(8) '-R' option.
Sponsored by:	The FreeBSD Foundation
2015-07-08 12:07:50 +00:00
Conrad Meyer
80f295a415 Add myself to committers-src.dot
Approved by:	markj (mentor)
2015-07-08 03:20:28 +00:00
Hiroki Sato
882efc9ac2 Fix offset calculation in variable substitution
in jail.conf.  The following did not work correctly:

 A="A_${B}_C_${D}"
 B="BBBBB"
 D="DDDD_${E}_FFFFF"
 E="EEEEE"

PR:		189139
Reviewed by:	jamie
Differential Revision:	https://reviews.freebsd.org/D3018
2015-07-08 00:51:53 +00:00
Rick Macklem
c088e62e34 Since the case where secflavor < 0 indicates the security flavor is
to be negotiated, it could be a Kerberized mount. As such, filling
in the "principal" argument using the canonized host name makes sense.
If it is negotiated as AUTH_SYS, the "principal" argument is meaningless
but harmless.

Requested by:	masato@itc.naist.jp
Tested by:	masato@itc.naist.jp
PR:		201073
MFC after:	1 month
2015-07-07 23:41:25 +00:00
Baptiste Daroussin
59856c7d26 pw: fail if an invalid entry is found while parsing master.passwd and group
PR:		198554
Reported by:	diaran <fbsd@centraltech.co.uk>
MFC after:	2 days
2015-07-07 21:05:20 +00:00
John-Mark Gurney
a13589bc47 unroll the loop slightly... This improves performance enough to
justify, especially for CBC performance where we can't pipeline..  I
don't happen to have my measurements handy though...

Sponsored by:	Netflix, Inc.
2015-07-07 20:31:09 +00:00
Hiroki Sato
754f368cda - Add IPv6 support in quota(1). While rpc.rquotad has supported PF_INET6
for a long time, quota(1) utility supported only PF_INET.

- Clean up confusing changes in f_mntfromname.

- Add an entry for rquotad with rpc/udp6 to inetd.conf.

PR:	194084
2015-07-07 20:15:09 +00:00